[dropcap style=”1″ size=”3″]RH[/dropcap]Ramesh Hariharan: We are at the threshold of a major revolution in health care: thanks to two decades of explosive research in tools and techniques that interrogate living cells at the molecular level, doctors will soon have an invaluable tool in their arsenal to help diagnose and cure disease, namely the genome of the patient. Several success stories have already emerged, for instance, a little boy who underwent several futile operations before sequencing his genome revealed a defect in the immune system, which was then treated with a cord blood transplant.
The genome and its associated paraphernalia are quite large, and that naturally calls for Big Data techniques to manage and deliver genomic information to clinicians, consumers, and researchers. To give you a feel for the scale, sequencing machines generate upwards of 150GB of compressed data for a single individual, and analysing this data is equivalent to sifting through 30 finely shredded copies of a 200,000-page telephone directory![quote style=”2″]The next few years will see all of this translate from research lab to hospital and eventually affect all our lives. The goal of this session will be to introduce attendees to this area and share the excitement that the next few years hold in store.[/quote]
AIM: Could you provide some insights into the award-winning Avadis® platform by Strand Life Sciences?
RH: Avadis was conceptualized as an analytics platform for data resulting from Life Sciences experiments. It specializes in making large amounts of data visually accessible to scientists and in bringing together analytical methods with knowledge mined from literature for effective discovery. Over the last several years, Avadis has fuelled several thousand discoveries published in scientific literature and is the leading platform worldwide on this score.
AIM: How was Strand Life Sciences founded, how has it evolved over the years, and what is the next step?
RH: Strand was conceived in 2000 by 4 founders, all faculty at the Indian Institute of Science. Strand’s vision was to bring together the best of Computer Science and the best of Biology to further research in understanding how life works, and in using this research for better health. In the last 10 years, Strand has achieved a leading position worldwide providing analytical tools for biological research. The next 5 years will see Strand broaden its activities towards clinical applications, i.e., applying genomics knowledge to avoid or better treat diseases.
AIM: Though you will be speaking about it in your talk, can you briefly provide an overview of techniques for handling large volumes of genomic data?[pullquote align=”left”]Now imagine a situation where costs drop to the point that millions of people can get their genomes sequenced. This means many petabytes of data and a huge amount of computation. To handle these quickly, better algorithms and better parallel computing paradigms are key.[/pullquote]RH: Today, you can walk into our labs and get your whole genome sequenced for about 10 lakhs. This will result in 150 gigabytes of data. This data has to be stitched together much like a jigsaw puzzle, a task that takes several tens of hours to run. Then unique features in this data are identified and interpreted based on the wealth of knowledge that exists in scientific literature.
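To put the "many petabytes" figure in perspective, here is a rough back-of-the-envelope sketch in Python. It assumes the 150 GB per genome quoted above and decimal units (1 PB = 1,000,000 GB); the function name and the population sizes are illustrative choices, not figures from Strand.

```python
# Assumption: 150 GB of compressed data per genome, as quoted in the interview.
GB_PER_GENOME = 150

def total_petabytes(num_genomes, gb_per_genome=GB_PER_GENOME):
    """Return total storage in petabytes, using decimal units (1 PB = 1,000,000 GB)."""
    return num_genomes * gb_per_genome / 1_000_000

# Hypothetical population sizes, chosen only to show how the total scales.
for n in (1_000, 1_000_000, 10_000_000):
    print(f"{n:>10,} genomes -> {total_petabytes(n):,.2f} PB")
```

Under these assumptions, one million genomes come to roughly 150 PB of raw data, before counting any intermediate files produced during analysis.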
AIM: A brief overview of how genomic measurement has evolved over time?
RH: The microscope was invented in the 1500s. It allowed us to see microbes and cells and some cellular features like the nucleus, but not much more. In the 1800s, chemists figured out that the nucleus contained a substance rich in phosphorus and nitrogen, and simultaneously, great observations made by Mendel and Darwin showed that this material was the basis for heredity and for individual variations.
It wasn’t until the 1950s that the gross structure of this DNA was determined, and not until the 1970s that the program encoded by this DNA could be read, albeit on a small scale. Continuous improvements led to the whole human genome being read in 2002. This was DNA pooled from 5 individuals, and sequencing took hundreds of millions of dollars; this clearly didn’t scale to sequencing individuals. Further technology improvements over the last few years have brought the cost down to a few thousand dollars (and everyone expects the cost to be below $1000 soon), making it possible to sequence large numbers of individuals and study individual variations.
AIM: Would you like to share an example of a data-driven insight that turned into a big success story in this area?
RH: A few recent examples:
There is a disease called CHIME disease, characterized by holes in the eye, scaling of skin, mental retardation, and ear anomalies or epilepsy. Researchers used our tools to sequence individuals with this disease and identify the causative mutation in the PIGL gene. In each case, both parents had one copy of the mutation (so they were carriers), while the affected patients had two copies, one from each parent. Now that the causative mutation is known, it is possible to screen couples who are carriers so that this disease can effectively be abolished.
Another example is a young boy with a digestive system disorder who couldn’t tolerate whatever he ate. A hundred operations were performed on him in vain. Finally, doctors sequenced his genome and found an unusual mutation in the XIAP gene, which pointed to an immune system problem; this suggested a cord blood transplant, which worked, and the boy is now stable and growing up healthy.
AIM: What challenges have handling genome data posed for your team?
RH: Multi-disciplinarity is a challenge: handling genome data requires really skilled Computer Scientists as well as good biologists. It requires making the right algorithmic choices, the right hardware platform choices, and the right methods to interpret results. This being an emerging and highly dynamic area, one has to evolve with the field and learn at a fast rate. All of these are challenges that make this area very lively.
[spoiler title=”Biography of Ramesh Hariharan” open=”0″ style=”2″]
Dr. Ramesh Hariharan is an academic entrepreneur responsible for software-based technology development and implementation at Strand Life Sciences. He is also the chief architect for all of Strand’s products, including the award-winning Avadis® platform. He is a recipient of the TR100 Award (2002) of Young Innovators by MIT’s Technology Review Magazine and in 2003 received the Global Indus Technovator Award from MIT, instituted to recognize the top 20 Indian technology innovators worldwide.
Ramesh is an IIT – Delhi Computer Science alumnus, has a Ph.D. in Computer Science from the Courant Institute of Mathematical Sciences, New York University, and did postdoctoral work in Computer Science at the Max Planck Institute, Saarbrücken, Germany. His research interests are in sequence analysis, string algorithms, computational biology, computational geometry and foundations of computing. He was on the faculty of the Computer Science Department of the Indian Institute of Science from 1995 to 2005, and currently serves as adjunct faculty there.[/spoiler]