We compared genomics with three other major generators of Big Data: astronomy, YouTube, and Twitter. Astronomy has faced the challenges of Big Data for over 20 years and continues with ever-more ambitious studies of the universe. YouTube burst on the scene in 2005 and has sparked extraordinary worldwide interest in creating and sharing huge numbers of videos. Twitter, created in 2006, has become the poster child of the burgeoning movement in computational social science, with unprecedented opportunities for new insights by mining the enormous and ever-growing amount of textual data. Particle physics also produces massive quantities of raw data, although the footprint is surprisingly limited since the vast majority of data are discarded soon after acquisition using the processing power that is coupled to the sensors. Consequently, we do not include the domain in full detail here, although that model of rapid filtering and analysis will surely play an increasingly important role in genomics as the field matures. To compare these four disparate domains, we considered the four components that comprise the “life cycle” of a dataset: acquisition, storage, distribution, and analysis (Table 1).

The four Big Data domains differ sharply in how data are acquired. Most astronomy data are acquired from a few highly centralized facilities. By contrast, YouTube and Twitter acquire data in a highly distributed manner, but under a few standardized protocols. Astronomy, YouTube, and Twitter are expected to show continued dramatic growth in the volume of data to be acquired. For example, the Australian Square Kilometre Array Pathfinder (ASKAP) project currently acquires 7.5 terabytes/second of sample image data, a rate projected to increase 100-fold to 750 terabytes/second (~25 zettabytes per year) by 2025. YouTube currently has 300 hours of video being uploaded every minute, and this could grow to 1,000–1,700 hours per minute (1–2 exabytes of video data per year) by 2025 if we extrapolate from current trends (S1 Note). Today, Twitter generates 500 million tweets/day, each about 3 kilobytes including metadata (S2 Note). While this figure is beginning to plateau, a projected logarithmic growth rate would suggest a 2.4-fold growth by 2025, to 1.2 billion tweets per day, 1.36 petabytes/year. In short, data acquisition in these domains is expected to grow by up to two orders of magnitude in the next decade.

For genomics, data acquisition is highly distributed and involves heterogeneous formats. The rate of growth over the last decade has also been truly astonishing, with the total amount of sequence data produced doubling approximately every seven months (Fig 1). The OmicsMaps catalog of all known sequencing instruments in the world reports that currently there are more than 2,500 high-throughput instruments, manufactured by several different companies, located in nearly 1,000 sequencing centers in 55 countries in universities, hospitals, and other research laboratories.
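The acquisition figures quoted above can be sanity-checked with a few lines of arithmetic. The sketch below is only an illustration, not the authors' calculation: it assumes decimal SI units (1 kilobyte = 1,000 bytes), a 365-day year, and exactly 3 kilobytes per tweet, and all function and constant names are hypothetical.

```python
# Back-of-the-envelope check of the data-acquisition figures in the text.
# Assumptions (not from the source): SI decimal units and a 365-day year.

SECONDS_PER_DAY = 24 * 60 * 60
DAYS_PER_YEAR = 365

KB = 1e3    # bytes in a kilobyte (decimal)
TB = 1e12   # bytes in a terabyte
PB = 1e15   # bytes in a petabyte
ZB = 1e21   # bytes in a zettabyte


def yearly_bytes_from_daily_items(items_per_day, bytes_per_item):
    """Bytes acquired per year from a daily item count and item size."""
    return items_per_day * bytes_per_item * DAYS_PER_YEAR


def yearly_bytes_from_rate(bytes_per_second):
    """Bytes acquired per year from a sustained per-second rate."""
    return bytes_per_second * SECONDS_PER_DAY * DAYS_PER_YEAR


def growth_factor(doubling_months, elapsed_months):
    """Multiplicative growth after elapsed_months at a fixed doubling time."""
    return 2 ** (elapsed_months / doubling_months)


# Twitter: 500 million tweets/day today, 1.2 billion/day projected for 2025.
twitter_today_pb = yearly_bytes_from_daily_items(500e6, 3 * KB) / PB
twitter_2025_pb = yearly_bytes_from_daily_items(1.2e9, 3 * KB) / PB

# ASKAP: projected 750 terabytes/second by 2025.
askap_2025_zb = yearly_bytes_from_rate(750 * TB) / ZB

# Sequencing: total data doubling roughly every seven months.
sequencing_growth_per_year = growth_factor(7, 12)

if __name__ == "__main__":
    print(f"Twitter today:  {twitter_today_pb:.2f} PB/year")
    print(f"Twitter 2025:   {twitter_2025_pb:.2f} PB/year")
    print(f"ASKAP 2025:     {askap_2025_zb:.1f} ZB/year")
    print(f"Sequencing:     {sequencing_growth_per_year:.2f}x per year")
```

Under these assumptions the projected Twitter volume comes out near 1.31 petabytes/year and the ASKAP volume near 23.7 zettabytes/year, slightly below the 1.36 petabytes and ~25 zettabytes quoted in the text, presumably because the source used slightly different inputs or rounding. A seven-month doubling time corresponds to roughly a 3.3-fold increase per year.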