Mar 28, 2022

Read time: 4 mins

How Big Data is Changing Genetic Research

Big data is affecting the way we do business across the globe. The digital era has enabled enterprises to gather and integrate data so that it can be used to inform business decisions, as well as artificial intelligence (AI).1 Regardless of what field you operate in, or the size of your business, data collection, analytics, and a basic understanding of data have become more accessible. In the technology-enabled world, the variety of data-producing platforms now offers vast business insight. Be it from a website, social media, or online shopping, data can be used to improve business processes and strategies.

The impact of big data is not confined to commercial sectors; it’s also improving how genetic data analysis is done.

Big data is defined as large datasets that are too vast or complicated to be processed by traditional data applications.2 Businesses depend on storage and processing power, as well as robust data analytics and skills, to harvest the value from these large datasets. The insights produced by big data are especially useful for industries that deal with large amounts of information, such as healthcare, biomedical research, and genetic research sectors.3

Big data and genetic research

Technological advancements have enabled scientists to quickly create, store, and analyze data that, until recently, would have taken years to compile and interpret.4 New biomedical techniques, such as next-generation genome sequencing, are generating large volumes of data and leading to scientific breakthroughs. However, big data in medicine faces a considerable hurdle: both its volume and the number of sources from which it’s derived.5

For example, precision medicine is a novel approach to healthcare that uses personal information, including genetic, environmental, and lifestyle data, to help prevent, diagnose, and treat common and complex diseases.6 Scientists have attempted to put this into practice in a study aimed at gathering and linking the electronic health records and data of one million Americans, in order to categorize and capture entire genome sequences, cell populations, proteins, metabolites, ribonucleic acid (RNA), and deoxyribonucleic acid (DNA), as well as behavioral data. That’s a lot of data. The practical application of data science in genetic research is vast, but translating big data into useful insight that can be used for research and innovation is a big challenge.7

In today’s fast-changing, big data-fueled world, being a genetic researcher means working with algorithms that process swaths of genetic data and managing data processing software. Dr. Anne Corcoran, Group Leader at the Babraham Institute in Cambridge, U.K., says, “When I started hiring Ph.D. students 15 years ago, they were entirely wet lab [specialized laboratories dealing with hazardous substances]. Now when we recruit them, the first thing we look for is if they can cope with complex bioinformatic analysis.”8

Machine learning (ML), a part of AI, also has a place in genetic research. It applies data-analysis techniques to multi-dimensional datasets so that predictive models can be built and insights gained.9 ML helps scientists study and understand complex cellular systems, including those involved in genome or gene editing, and allows them to create models that learn to extract information from big datasets and predict outcomes.

What is gene editing?

Genome or gene editing is a cluster of technologies that allow scientists to change an organism’s DNA by adding, removing, or altering genetic material at specific locations within the genome.

There are several reasons for doing so, from improving understanding of how genes function to developing methods to treat genetic and acquired diseases. Genome editing can correct, introduce, or delete just about any DNA sequence in many types of cells and organisms.10

What is CRISPR?

There are several approaches to gene editing, among which is CRISPR-Cas9 (which stands for clustered regularly interspaced short palindromic repeats, and CRISPR-associated protein 9).11

CRISPR is a specialized region in the DNA strand with two unique characteristics:12

  1. The presence of nucleotide repeats. Repeated sequences of nucleotides13 (the basic building blocks of nucleic acids, which make DNA) are evident throughout a CRISPR region.
  2. The presence of spacers. Spacers are pieces of DNA that occur among these repeated sequences. Bacteria take their spacers from viruses that have attacked them previously, and this allows the bacteria to recognize the virus DNA and defend themselves from future attacks of that virus.
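
The repeat-and-spacer layout described above can be illustrated with a toy example. This is only a sketch: the repeat, spacer, and flanking sequences are made up for illustration, and real CRISPR arrays are identified with dedicated bioinformatics tools rather than a plain string split.

```python
def extract_spacers(array: str, repeat: str) -> list[str]:
    """Return the sequences that sit between consecutive copies of `repeat`."""
    parts = array.split(repeat)
    # parts[0] and parts[-1] are flanking DNA outside the array;
    # everything in between lies between two repeats, i.e. a spacer.
    return [p for p in parts[1:-1] if p]


def recognizes(spacers: list[str], incoming_dna: str) -> bool:
    """True if any stored spacer also appears in the incoming DNA."""
    return any(spacer in incoming_dna for spacer in spacers)


# Hypothetical sequences, invented for this example.
REPEAT = "GTTTTAGAGCTA"
SPACERS = ["ACGTACGTAA", "TTGACCGGTA", "CCATGGATCC"]

# Build a toy array: flank + repeat + spacer + repeat + ... + repeat + flank.
ARRAY = "ATATAT" + REPEAT + REPEAT.join(SPACERS) + REPEAT + "CGCGCG"

found = extract_spacers(ARRAY, REPEAT)
known_virus = "GGG" + SPACERS[1] + "TTT"   # contains a remembered spacer
new_virus = "GGGAAACCCTTTGGGAAACCC"        # shares no spacer with the array
```

Here `recognizes(found, known_virus)` stands in for the bacterium spotting a virus it has met before, while `new_virus` goes unrecognized because none of its DNA has been stored as a spacer.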

The CRISPR-Cas9 system works much the same way. Genetic researchers create a small piece of RNA with a “guide” sequence that binds to a target sequence of DNA in a genome. This guide RNA also binds to the Cas9 enzyme and is used to recognize the DNA sequence.14 The Cas9 enzyme then snips the DNA at the specified location, allowing researchers to utilize the cell’s own DNA repair mechanism to add, delete, or alter pieces of genetic material.
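
As a rough sketch of the targeting step, the search for a guide's target site can be modeled as pattern matching on a DNA string. Several things here are assumptions that go beyond the text above: the sequences are invented, only one strand is searched, the guide is written as its DNA equivalent, and the code uses the commonly described behavior of the widely used Streptococcus pyogenes Cas9, which requires an "NGG" motif (the PAM) right after the target and cuts about three bases before it. Real guide design tools also score mismatches and off-target sites, which this ignores.

```python
import re


def find_cut_sites(genome: str, guide: str) -> list[int]:
    """Return cut positions for each exact guide match followed by an NGG PAM."""
    # Lookahead so that overlapping candidate sites are not skipped.
    pattern = re.compile(f"(?={re.escape(guide)}[ACGT]GG)")
    sites = []
    for match in pattern.finditer(genome):
        pam_start = match.start() + len(guide)
        sites.append(pam_start - 3)  # assumed cut point: 3 bases before the PAM
    return sites


def insert_at(genome: str, position: int, payload: str) -> str:
    """Model an 'add' edit: splice new DNA in at the cut position."""
    return genome[:position] + payload + genome[position:]


# Hypothetical 20-base guide and toy genome, invented for this example.
GUIDE = "GATTACAGATTACAGATTAC"
GENOME = "TTGCA" + GUIDE + "TGG" + "ACCTA"      # target followed by a PAM
NO_PAM_GENOME = "TTGCA" + GUIDE + "TAA" + "ACCTA"  # same target, no PAM

cut_sites = find_cut_sites(GENOME, GUIDE)
edited = insert_at(GENOME, cut_sites[0], "AAA")
```

In this toy model the guide matches in both genomes, but only `GENOME` yields a cut site, because the match in `NO_PAM_GENOME` is not followed by the assumed "NGG" motif.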

The CRISPR-Cas9 gene-editing technology is proving popular in the genetic research community and amongst scientists at large, as it provides a more affordable, accurate, and efficient means of gene editing than other genome editing technologies.15

Big data in genetic research with CRISPR

Genetic research and data science are working together to advance science’s understanding of diseases. The large volumes of data that are now available to scientists, as well as new technologies, are accelerating the development of new drugs and personalized therapies. This is showcased by the rise of customized treatments based on an individual’s unique genetic profile.16

Big data also provides healthcare professionals with access to the information needed to prescribe doses that are tailored to each patient, reducing the risk of side effects and drug resistance. However, the prohibitive cost of this personalized-medicine approach has resulted in widespread resistance to its uptake.17 Even so, the costs of sequencing and genome editing technologies, such as CRISPR, are consistently dropping. CRISPR’s ability to edit genomes cost-effectively means tailored solutions will become more affordable to develop and produce, and more accessible to the public.

Examples of how big data is changing genetic research

The uptake of CRISPR amongst researchers is growing around the world. For instance, cancer researchers have applied CRISPR as a mainstream methodology in many cancer biology studies, and its use has been extended into trials with human subjects.18 By combining big data, large computing systems, and CRISPR with more traditional genetic mapping, researchers are now able to identify a minute-by-minute playbook of what takes place when the immune system reacts to the presence of a virus within a cell.

In America, agricultural scientists have combined CRISPR genome editing nucleases with big data and ML to optimize innovation in the agricultural sector.19 Using CRISPR technology and machine learning-based predictive analytics, they have created a genome editing system that allows for the improvement of plant properties such as flavor and nutrient density. This effectively mitigates the typically high research and development costs that have restricted advanced genomic innovation to a small selection of researchers in the past.

CRISPR is proving to be a revolutionary technology, with the power to treat cancer and enable food security. But in order to capture its market potential, you’ll need to understand the possibilities it brings to genetic engineering.


  • 1 Ku, L. (Oct, 2021). ‘The impact of big data in business’. Retrieved from PlugandPlay.
  • 2 (Nd). ‘What is Big Data?’. Retrieved from Oracle. Accessed March 14, 2022.
  • 3 (Mar, 2021). ‘Applications and examples of big data in healthcare’. Retrieved from Touro College Illinois.
  • 4 (Mar, 2021). ‘Applications and examples of big data in healthcare’. Retrieved from Touro College Illinois.
  • 5 Durcevic, S. (Oct, 2020). ‘18 examples of big data analytics in healthcare that can save people’. Retrieved from Data Pine.
  • 6 Doxzen, K., et al. (Jan, 2022). ‘Advancing Precision Medicine Through Agile Governance’. Retrieved from Brookings.
  • 7 Stedman, C. (Feb, 2022). ‘The ultimate guide to big data for businesses’. Retrieved from TechTarget.
  • 8 Chivers, T. (Oct, 2018). ‘How big data is changing science’. Retrieved from Medium.
  • 9 Lecca, P. (Sep, 2021). ‘Machine learning for causal inference in biological networks: Perspectives of this challenge’. Retrieved from Frontiers in Bioinformatics.
  • 10 (Nd). ‘Gene editing – Digital media kit’. Retrieved from National Institutes of Health. Accessed March 14, 2022.
  • 11 (Sep, 2020). ‘What are genome editing and CRISPR-Cas9?’. Retrieved from Medline Plus.
  • 12 Vidyasagar, A. & Lanese, N. (Oct, 2021). ‘What is CRISPR’. Retrieved from Live Science.
  • 13 (Nd). ‘Nucleotide’. Retrieved from National Human Genome Research Institute. Accessed March 16, 2022.
  • 14 (Sep, 2020). ‘What are genome editing and CRISPR-Cas9?’. Retrieved from Medline Plus.
  • 15 (Sep, 2020). ‘What are genome editing and CRISPR-Cas9?’. Retrieved from Medline Plus.
  • 16 Kent, J. (Mar, 2021). ‘Exploring the intersection of genomic data and AI in healthcare’. Retrieved from HealthITAnalytics.
  • 17 Kent, J. (Mar, 2021). ‘Exploring the intersection of genomic data and AI in healthcare’. Retrieved from HealthITAnalytics.
  • 18 (Jul, 2020). ‘How CRISPR is changing cancer research and treatment’. Retrieved from National Cancer Institute.
  • 19 (Jan, 2021). ‘Plant genome editing expanded with newly engineered variant of CRISPR-Cas9’. Retrieved from Science Daily.