Big data and its derived uses are affecting the way we do business in every part of the world, from start-ups to Fortune 500 enterprises.1 Regardless of the field you operate in or the size of your business, data collection, data analytics, and the understanding of that data have become more accessible, with wide-reaching impacts. In the technology-enabled world we live in, a variety of data-producing platforms, be it a website, social media, or online shopping, now offer business insight that can be used to improve business processes and interaction. The impact of big data is not confined to commercial sectors; it is also improving how genetic research is done.
‘Big data’ is defined as large data sets that are too vast or complicated to be processed by traditional data applications.2 Businesses depend on storage and processing power, as well as robust data analytics and skills, to harvest the value from these large datasets. The value produced by big data means it is utilised in almost every industry all over the world, including the healthcare, biomedical research, and genetic research sectors.3
Big data and genetic research
Technological advancements have enabled scientists to quickly create, store and analyse data that, until recently, would have taken years to compile.4 New biomedical techniques, such as next-generation genome sequencing, are generating large volumes of data and leading to scientific breakthroughs, but researchers are struggling to keep up with the masses of data.5 For example, the National Institutes of Health started the Big Data to Knowledge (BD2K) Initiative and the ‘Precision Medicine Initiative’ with a view to developing genetically guided, personalised precision medicine for improved prevention, early detection, and treatment of common complex diseases.6 They aim to do this by gathering and linking the electronic health records and data of a group of one million Americans in order to capture and categorise entire genome sequences, cell populations, proteins, metabolites, RNA and DNA, as well as behavioural data. That’s a lot of data. The practical applications of data mining in genetic research are vast, but translating big data into useful insight that can be used for research and innovation is the main challenge.7
In today’s fast-changing, big data-fuelled world, being a genetic researcher means working with data processing software and with algorithms that process big data in genetics. Dr Anne Corcoran, a group leader at the Babraham Institute in Cambridge, says, “When I started hiring PhD students 15 years ago, they were entirely wet lab. Now when we recruit them, the first thing we look for is if they can cope with complex bioinformatic analysis.”8
Machine learning (ML) is the application of data-analytical techniques to multi-dimensional datasets so that predictive models can be built and insights gained from the data.9 ML helps scientists study and understand complex cellular systems and processes, such as genome or gene editing, and allows them to create models that learn from big data sets and predict outcomes.
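To make the idea of a predictive model built from a multi-dimensional dataset more concrete, here is a minimal sketch of one of the simplest possible classifiers, a nearest-centroid model. It is not a method described in the sources cited here; the gene-expression values, the labels and the function names are all invented for illustration.

```python
from statistics import mean


def fit_centroids(samples, labels):
    """Learn one mean vector (centroid) per class label."""
    grouped = {}
    for features, label in zip(samples, labels):
        grouped.setdefault(label, []).append(features)
    return {
        label: [mean(column) for column in zip(*rows)]
        for label, rows in grouped.items()
    }


def predict(centroids, features):
    """Return the label whose centroid is closest to `features`."""
    def squared_distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(features, centroid))
    return min(centroids, key=lambda label: squared_distance(centroids[label]))


# Invented toy data: each row is a hypothetical three-gene expression profile.
training_profiles = [
    [0.9, 0.1, 0.8],
    [1.0, 0.2, 0.7],
    [0.1, 0.9, 0.2],
    [0.2, 1.0, 0.1],
]
training_labels = ["responder", "responder", "non-responder", "non-responder"]

model = fit_centroids(training_profiles, training_labels)
prediction = predict(model, [0.8, 0.3, 0.9])  # a new, unseen profile
```

Real genomic datasets have thousands of dimensions and far more samples, and researchers would normally reach for an established ML library, but the principle is the same: learn from labelled data, then predict the label of new data.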
What is CRISPR?
Genome or gene editing is a cluster of technologies that allow scientists to change an organism’s DNA by adding, removing, or altering genetic material at specific locations within the genome.10 There are several approaches to gene editing, among which is CRISPR-Cas9, which stands for clustered regularly interspaced short palindromic repeats, and CRISPR-associated protein 9.11
CRISPR is a specialised region in the deoxyribonucleic acid (DNA) strand with two unique characteristics:12
- The presence of nucleotide repeats. Repeated sequences of nucleotides are evident throughout a CRISPR region.
- The presence of spacers. Spacers are pieces of DNA that are interspersed among these repeated sequences. Bacteria take their spacers from viruses that have attacked the organism previously, and this allows the bacteria to recognise the virus DNA and defend themselves against future attacks by that virus.
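The repeat-and-spacer layout can be sketched in a few lines of code. The sequences below are invented toy strings, far shorter than real repeats and spacers, and the function names are hypothetical; the sketch only shows how spacers sit between identical repeats and how a stored spacer can be matched against viral DNA.

```python
def extract_spacers(crispr_region, repeat):
    """Return the sequences found between consecutive copies of `repeat`."""
    pieces = crispr_region.split(repeat)
    # The first and last pieces lie outside the repeats, so they are not spacers.
    return [piece for piece in pieces[1:-1] if piece]


def recognised_spacers(spacers, viral_dna):
    """Return the spacers that also occur in the given viral DNA."""
    return [spacer for spacer in spacers if spacer in viral_dna]


# Invented toy sequences (assumptions for illustration only).
REPEAT = "GGATCC"
crispr_region = "TT" + REPEAT + "ATTACGA" + REPEAT + "CTTGAAT" + REPEAT + "AA"
viral_dna = "CCGT" + "CTTGAAT" + "ACGG"

spacers = extract_spacers(crispr_region, REPEAT)
matches = recognised_spacers(spacers, viral_dna)
```

Here `spacers` holds the two sequences stored between the repeats, and `matches` holds the one that reappears in the toy viral DNA, which mirrors how a bacterium recognises a virus it has met before.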
The CRISPR-Cas9 system works much the same way. Genetic researchers create a small piece of ribonucleic acid (RNA) with a short “guide” sequence that binds to a specific target sequence of DNA in a genome. The RNA also binds to the Cas9 enzyme, and the guide is used to recognise the target DNA sequence.13 The Cas9 enzyme then snips the DNA at the specified location, allowing researchers to utilise the cell’s own DNA repair mechanism to add, delete or alter pieces of genetic material.
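The recognise, cut and edit steps can be sketched as simple string operations. This is a deliberately simplified illustration, not a model of the real biochemistry: the guide is written in DNA letters rather than RNA, only exact matches on one strand are considered, and the position of the cut inside the match (`cut_offset`) is a hypothetical parameter chosen for the example. All sequences and function names are invented.

```python
def cut_at_target(genome, guide, cut_offset):
    """Split `genome` inside the first exact match of `guide`.

    Returns a (left, right) pair, or None when the target is absent.
    """
    start = genome.find(guide)
    if start == -1:
        return None
    cut = start + cut_offset
    return genome[:cut], genome[cut:]


def insert_at_cut(genome, guide, cut_offset, new_material):
    """Add `new_material` at the cut site; leave the genome unchanged if no match."""
    halves = cut_at_target(genome, guide, cut_offset)
    if halves is None:
        return genome
    left, right = halves
    return left + new_material + right


# Invented toy sequences (assumptions for illustration only).
genome = "AACCGGTTACGTACGGA"
guide = "GGTTACG"

halves = cut_at_target(genome, guide, cut_offset=4)
edited = insert_at_cut(genome, guide, cut_offset=4, new_material="TTT")
untouched = insert_at_cut(genome, "CCCCCCC", cut_offset=4, new_material="TTT")
```

The last call shows the role of the guide: when no matching target exists in the genome, nothing is cut and the sequence is returned as it was.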
The CRISPR-Cas9 gene editing technology is proving popular in the genetic research community, and amongst scientists at large, as it provides a more affordable, accurate and efficient means of genetic editing than other genome editing technologies.14
Big data in genetic research with CRISPR
Genetic research and data science are working together to advance science’s understanding of diseases. The large volumes of data that are now available to scientists, and that ML and data technology are able to process, are accelerating the development of new drugs and personalised therapies. This is showcased by the rise of customised treatment based on an individual’s unique genetic profile.15
Big data also provides healthcare professionals with access to the information needed to prescribe doses that are tailored to each patient, reducing the risk of side effects and drug resistance. The prohibitive cost of this personalised-medicine approach has resulted in widespread resistance to its uptake.16 However, the costs of sequencing and genome editing technologies, such as CRISPR, are consistently dropping. CRISPR’s ability to edit genomes and DNA cost-effectively means tailored solutions will become more affordable to develop and produce, and more accessible to the public.
Examples of how big data is changing genetic research
The uptake of CRISPR amongst researchers is growing around the world. Associate Professor Richard Kandasamy at the Norwegian University of Science and Technology’s (NTNU) Centre of Molecular Inflammation Research (CEMIR) has conducted research on the inflammatory reactions that frequently take place in a multitude of diseases.17 With the use of big data, large computing systems, and CRISPR, Kandasamy has combined modern technologies with more traditional genetic mapping, and has identified a minute-by-minute playbook of what takes place when the immune system reacts to the presence of a virus within a cell.
In America, CropOS has combined CRISPR genome editing nucleases with big data and machine learning to optimise innovation in the agricultural sector and give decision-making information to plant researchers.18 Using CRISPR technology and machine learning-based predictive analytics, they have created a genome editing system that allows for the improvement of plant properties such as flavour, nutrient density, and sustainability. This effectively mitigates the typically high research and development costs that have restricted advanced genomic innovation to a small selection of researchers in the past.
Big data is having an impact on every field of science, from the social and political sciences to genetics and personalised medicine, and it offers ongoing opportunities to genetic research and its associated sciences. Genetic researchers should retain an agile approach to big data and stay up to date on advances in data analysis tools and open resources. This way they can draw on real-time big data to make better-informed research decisions.
- 1 (Mar, 2018). ‘How is big data changing the business landscape?’. Retrieved from Forbes.
- 2 (Nd). ‘Definition: what is big data?’. Retrieved from Oracle. Accessed 15 May 2019.
- 3 Hulsen, T. et al. (Mar, 2019). ‘From big data to precision medicine’. Retrieved from Frontiers.
- 4 Hulsen, T. et al. (Mar, 2019). ‘From big data to precision medicine’. Retrieved from Frontiers.
- 5 Chivers, T. (Oct, 2018). ‘How big data is changing genetic research’. Retrieved from Digg.
- 6 (Apr, 2019). ‘Big data to knowledge’. Retrieved from NIH.
- 7 Mouratidis, Y. (Apr, 2019). ‘AI unlocks the mysteries of clinical data’. Retrieved from Forbes.
- 8 Mosaic. (Oct, 2018). ‘How big data is changing science’. Retrieved from Medium.
- 9 Camacho, D. et al. (Jun, 2018). ‘Next-generation machine learning for biological networks’. Retrieved from NCBI.
- 10 (Apr, 2019). ‘What are genome editing and CRISPR-Cas9?’. Retrieved from GHR.
- 11 (Apr, 2019). ‘What are genome editing and CRISPR-Cas9?’. Retrieved from GHR.
- 12 Vidyasagar, A. (Apr, 2018). ‘What is CRISPR?’. Retrieved from LiveScience.
- 13 (Apr, 2019). ‘What are genome editing and CRISPR-Cas9?’. Retrieved from GHR.
- 14 (Apr, 2019). ‘What are genome editing and CRISPR-Cas9?’. Retrieved from GHR.
- 15 (Nd). ‘Genomics and big data – unlocking the code to new therapies’. Retrieved from Sandoz. Accessed 22 April 2019.
- 16 (Nd). ‘Genomics and big data – unlocking the code to new therapies’. Retrieved from Sandoz. Accessed 22 April 2019.
- 17 Midling, A. (Jan, 2017). ‘Using big data to understand immune system responses’. Retrieved from Physorg.
- 18 Sterling, J. (Oct, 2017). ‘Fully enabling genome-editing system for crop improvement launched’. Retrieved from GenEngNews.