Big Data Genomics: Storing & Protecting Your Data





Tags: big data, genome sequencing, dna sequencing, data storage, storing genomic data, cloud data storage
5/27/2020 12:16:34 PM



Over the past two decades, gene sequencing technology has improved so dramatically that the life sciences industry is on its way to becoming a big data business. During this genomic data boom, many researchers are running short of accessible storage.

As they advance in their fields, researchers are finding an urgent need for storage resources to hold and back up all of their genomic data. Without large amounts of accessible storage, scientists cannot use their data efficiently and accurately.

To maintain backups of all their genomic data and research, life sciences institutions are shifting to private, hybrid, and public cloud storage to meet their needs.

Big Data Genome Sequencing

In 2003, the Human Genome Project completed the first sequence of the human genome. It cost roughly $3 billion and took 13 years. Today, with the help of big data genomics, a human genome can be sequenced for about $1,000 in less than two days, and within 10 years the cost could drop by another factor of 10. The change came from unprecedented advances in computing and engineering: computers gained more processing power, and servers could serve more data.
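A tenfold cost drop over ten years is easier to picture as a yearly rate. The sketch below assumes the cost falls at a smooth, steady exponential rate, which is a simplifying assumption rather than a forecast; the function names are illustrative only. Starting from the $1,000 figure above, it shows that "10x in 10 years" works out to roughly a 21% decline per year.

```python
def projected_cost(current_cost: float, years: float,
                   factor: float = 10.0, period_years: float = 10.0) -> float:
    """Cost after `years`, assuming a steady drop by `factor` every `period_years`."""
    return current_cost / factor ** (years / period_years)


def annual_decline(factor: float = 10.0, period_years: float = 10.0) -> float:
    """Fraction by which cost falls each year under the same steady-decline model."""
    return 1 - factor ** (-1 / period_years)


print(round(projected_cost(1000, 10), 2))  # 100.0 after ten years
print(round(annual_decline(), 3))          # 0.206, i.e. about 21% per year
```

Real sequencing costs have not fallen this smoothly, so treat the yearly rate as an average over the period, not a year-by-year prediction.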

DNA Sequencing and Data Storage

Today's gene sequencing technology, Next-Generation Sequencing (NGS), produces around 15,000 times more data than the technology of 2003. That increase in data output requires equally large and capable storage resources. A single NGS machine can create 1 terabyte of data in one day, a rate that is largely unsustainable from a data storage point of view.
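To see why 1 terabyte per day is hard to sustain, it helps to multiply it out. This minimal sketch assumes a single machine running every day of the year with no compression; the `copies` parameter is a hypothetical knob for counting backup copies (for example, one production copy plus one backup).

```python
def yearly_storage_tb(tb_per_day: float, days: int = 365, copies: int = 1) -> float:
    """Raw storage one sequencer fills in `days`, multiplied by the number of stored copies."""
    return tb_per_day * days * copies


# 1 TB/day comes from the text; copies=2 is a hypothetical production copy plus one backup.
print(yearly_storage_tb(1.0))            # 365.0 TB per year
print(yearly_storage_tb(1.0, copies=2))  # 730.0 TB per year
```

Because backups scale the total linearly, every extra copy an institution keeps adds another full year's worth of raw output to the bill.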

As their data outgrew tape and disk, researchers moved to universally accessible databases and cloud storage platforms that could, in theory, store the DNA data of 1 in 12 Americans. These databases are safely backed up by offsite data storage providers such as Net2Vault.

Challenges Storing Genomic Data

Bioinformatics has exploded in recent years, and the scientific community is facing a data crisis. The biggest problem with gene sequencing in the age of big data genomics is the sheer amount of space the data takes up and the cost of storing it for decades; the cost of storing raw data can exceed the cost of generating and analyzing it. Research facilities are organizing, sorting, and storing massive amounts of information with inadequate storage methods. Huge volumes of data are produced every day, but without high-capacity storage infrastructure, most of it is lost, which slows gene sequencing research.
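The claim that storage can outgrow the cost of generating data is a question of recurring versus one-time spending. The sketch below finds how many whole years of storage it takes for cumulative storage spend to reach a one-time generation cost. Every input in the example is a hypothetical placeholder, not a quoted price, and the model assumes flat yearly pricing with no discounting.

```python
import math


def years_until_storage_matches(generation_cost: float, size_tb: float,
                                cost_per_tb_year: float, copies: int = 1) -> int:
    """Smallest whole number of years after which cumulative storage spend
    reaches the one-time cost of generating the data (flat pricing assumed)."""
    annual_storage_cost = size_tb * copies * cost_per_tb_year
    return math.ceil(generation_cost / annual_storage_cost)


# Hypothetical placeholder inputs, not real prices.
print(years_until_storage_matches(1000.0, 1.0, 100.0))            # 10
print(years_until_storage_matches(1000.0, 1.0, 100.0, copies=3))  # 4
```

Keeping extra copies shortens the break-even point proportionally, which is one reason long retention periods make storage the dominant cost.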

Mitigating Data Storage Issues

Because many research institutions produce hundreds of terabytes every month, production data and backups together take up far more room than research data centers have available. The best way to curb storage use is to delete the raw data files once they have been processed into smaller text files that can be compressed for storage, often no more than 100 gigabytes per genome. This is similar in spirit to how Git compresses the files it stores into a compact binary form so they take up less space. Compressing files this way can reduce their size by a factor of 20 without losing any important data, which makes genomic data storage more manageable. These compressed files can be held for years by cloud-based, long-term backup storage solutions like Net2Vault and accessed worldwide.
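One simple way to encode DNA text into a smaller binary form is to store each base in 2 bits instead of an 8-bit character, packing four bases into one byte. This is an illustrative sketch, not the method any particular tool or Net2Vault uses: it assumes the sequence contains only A, C, G, and T, and it ignores the `N` calls and quality scores that real sequencing files also carry. Production formats combine several, more sophisticated techniques.

```python
# Map each nucleotide to a 2-bit code (assumes only A, C, G, T are present).
CODE = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}
BASES = "ACGT"  # index -> base, the inverse of CODE


def pack(seq: str) -> bytes:
    """Pack a DNA string into bytes, four bases per byte."""
    out = bytearray()
    for i in range(0, len(seq), 4):
        chunk = seq[i:i + 4]
        byte = 0
        for base in chunk:
            byte = (byte << 2) | CODE[base]
        # Left-align a final partial chunk so decoding reads bits in the same order.
        byte <<= 2 * (4 - len(chunk))
        out.append(byte)
    return bytes(out)


def unpack(data: bytes, length: int) -> str:
    """Recover the original string; `length` discards padding in the last byte."""
    bases = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            bases.append(BASES[(byte >> shift) & 0b11])
    return "".join(bases[:length])


seq = "GATTACAGATTACAGG"
packed = pack(seq)
assert unpack(packed, len(seq)) == seq  # lossless round trip
print(len(seq.encode("ascii")), "bytes as text ->", len(packed), "bytes packed")
```

The round trip is lossless, and the 4x reduction comes purely from using fewer bits per symbol; larger savings require general-purpose or domain-specific compression on top.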

Store Your Genomic Research Data With Net2Vault

Net2Vault is a unique cloud-based data storage solution focused on storing and protecting NetApp backups safely and securely. As scientific research facilities produce more and more data through DNA sequencing, they will need storage beyond their own data centers. Net2Vault provides the space needed to store vast volumes of genomic data for years to come and keeps it accessible to institutions. We store replicated data in three North American data center locations.

Ask our experts how Net2Vault can help your institution safely store the data it needs to keep its research moving forward.


Net2Vault is a cloud service provider delivering enterprise-level solutions to NetApp customers through native replication. Headquartered in Portland, Oregon, we offer Data Backup, Disaster Recovery, Tier 4 Archive, and Managed Services to customers across North America. Our goal is to provide a simple, cost-effective solution customizable for any NetApp environment.


© Net2Vault - all rights reserved