By Tannavee Kumar, Genetics & Genomics '20
Author’s Note: As an undergraduate studying genetics and genomics as well as computer science, I wanted to interview a former professor to find out the steps he took to do computational research in the biological sciences. I was interested in learning more about the growing field of computational biology and wanted to shed light on the field for students who may be similarly interested.
Background
You received a Bachelor's in Math, then a Master's in Biochemistry, then a PhD in Computer Science; did you always know that you wanted to do research in biology? If so, what made you want to start off with a technical education rather than something more traditional like biology or chemistry? If not, how did you come to discover applications of computer science in biology, even 15 years ago?
No, I did not start off wanting to do anything related to biology. I started my undergrad thinking I would make computer games. How I got into this was that a week before my undergrad started, I got an email from my university asking if I wanted to be part of the first cohort of a bioinformatics program. I initially declined.
As I was looking for my third internship in my co-op program, a friend found a job with a professor in Toronto and asked if I wanted to work on this cool project about predicting how proteins fold into 3D structures. I told him I didn't know anything about protein structures, but sure! It was a lot of fun.
Is that what inspired you to pursue Biochemistry for further education?
I think that first internship was very pivotal, because it really nurtured my interest in protein structures. When I finished my undergrad, I was kind of bored of computer science, which is why I thought I would do a PhD studying protein structures.
How do you see life sciences research evolving and progressing in the coming years, given the inclusion of this new field?
In my opinion, we will see more and more blurring of boundaries. In 25 years, there are going to be more undergraduate programs less defined by walls like “life sciences” and “chemistry,” and better recognition that everybody borrows knowledge and skills from many fields. It will be very difficult to do a life sciences degree without learning anything about math or statistics. Similarly, more people in traditionally quantitative disciplines will want to take classes in the life sciences. Essentially, there will be fewer walls.
How would undergraduates studying quantitative subjects, like mathematics, statistics, or computer science be made aware of the growing demand for such skills in the Life Sciences?
The classic ways to become introduced to such areas are through coursework, an internship with a company, or research with a professor. The last two are not optimal. At the end of the day, in a standard undergraduate program, you have summers, and if you are ambitious you can try to do research during the school year. However, you can only do so many internships before you graduate. Even if you did one every summer of college, and two beforehand, that would not be enough to get a representative sample of all the things you could work on.
That is where universities need to do a better job of creating opportunities for students to engage with people from industry and research so that they don’t need 4-5 months to figure many essential things out.
Research and Beyond
Can you briefly describe some of the research that your lab does?
We are currently working in a few different directions. A large project at the moment is studying the genetics of mental disorders and neurodegeneration; for example, we look at the genetics of Alzheimer's disease, schizophrenia, autism, etc. Our main goal is to mechanistically understand how genetic variants associated with these conditions modify disease risk. Most of those mechanistic studies currently look at events that happen at the molecular level. This is great and very useful; however, because the majority of the research is geared toward the molecular level, we don't have a good understanding of what variants do functionally at the level of the cell. How do they affect the functional properties of the cell, such as neuron electrophysiology? Or how is the organization of the tissue affected?
Other areas we work on include building better models of how cells are spatially organized in the brain, as well as models that quantitatively describe cell population behavior. We know that cells behave differently when put into different contexts. It is of interest to build a model that predicts what happens when you put different kinds of cells together in different combinations, orientations, or conditions.
Lastly, a third project is on the therapeutic end. We are essentially trying to identify the druggable region of the genome. There are a lot of computational problems in trying to determine what is druggable.
How do you think the integration of the computational sciences has shed light on how biological processes are interconnected, and what do they make clear that a molecular approach may not be able to?
In human genetics, computational models play a huge role in hypothesis generation. They do a good job of leveraging big data, such as genomics, to prioritize which variants should be tested using molecular approaches, especially when those approaches are too costly or slow to systematically test many variants across the genome. The role of computation is parsing through the many possibilities that you can't explore molecularly.
For example, in a study we worked on four years ago, we tried to find a causal variant for obesity. Most human genetic studies only point to a region of the genome where causal variants might hide, but don't tell you exactly which one is truly causal. When these regions are big, hundreds of kilobases long, you need computational tools to identify the precise variant to test experimentally. In that study, computational tools played the pivotal role of identifying the causal variant that was ultimately tested and shown to drive large changes in obesity risk.
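To make the idea of computational prioritization concrete, here is a minimal, hypothetical sketch; it is not the method used in the obesity study, and all variant names, annotations, and weights are illustrative. The idea is simply that each candidate variant in an associated region gets a combined score from its association strength and functional evidence, and the top-ranked variants become the ones to test at the bench.

```python
# Hypothetical sketch of in-silico variant prioritization within a GWAS region.
# The scoring scheme, annotations, and weights are illustrative only.
import math
from dataclasses import dataclass

@dataclass
class Variant:
    rsid: str
    gwas_pvalue: float   # association strength from the GWAS
    conservation: float  # 0-1, cross-species conservation of the site
    in_enhancer: bool    # overlaps a putative regulatory element

def priority_score(v: Variant) -> float:
    """Combine association and functional evidence into a single rank score."""
    assoc = -math.log10(v.gwas_pvalue)            # stronger association -> larger score
    functional = v.conservation + (1.0 if v.in_enhancer else 0.0)
    return assoc + 2.0 * functional               # arbitrary illustrative weighting

candidates = [
    Variant("rs0001", 1e-8, 0.9, True),
    Variant("rs0002", 1e-6, 0.2, False),
    Variant("rs0003", 5e-9, 0.4, False),
]

# Rank the candidates; the top few are the ones worth testing experimentally.
for v in sorted(candidates, key=priority_score, reverse=True):
    print(v.rsid, round(priority_score(v), 2))
```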
How does computational research like your own lead to the progression of curated care in the health industry?
At a superficial level, in some ways it accelerates some of the biomedical discoveries that are being done today. The obesity study is one example. If you didn’t have the computational resources, you would spend years and years trying to find the right variant. However, we found it relatively quickly with computation.
For healthcare specifically, fields such as machine learning are revolutionizing care today. People from statistics, computer science, and math are working directly with clinicians and hospitals to develop highly accurate 'digital pathology' software and to predict when patients will need to come into the hospital or whether they are at high risk for a disease.
Oftentimes, conditions and diseases are misdiagnosed, which leads to inappropriate treatments. How would research in this area begin to remedy this common problem in the healthcare industry?
Most diseases are heterogeneous, which means that a group of people diagnosed with the same condition might actually have different underlying conditions and need different treatments. Many computational approaches based on molecular and clinical data are being developed to identify more homogeneous groups of patients and help achieve precision medicine. These homogeneous groups help identify the underlying disease phenotype, which allows for more accurately targeted medications and treatments.
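As a generic illustration of what such stratification can look like computationally (a simplified sketch under assumed synthetic data, not the lab's specific method), one common approach is to cluster patients by their molecular profiles and treat each cluster as a candidate, more homogeneous subgroup.

```python
# Generic sketch: cluster patients by molecular profiles to find candidate subgroups.
# The data and cluster count are synthetic and illustrative, not from any real study.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Toy expression matrix: 100 patients x 50 genes, with two hidden subgroups.
group_a = rng.normal(loc=0.0, scale=1.0, size=(50, 50))
group_b = rng.normal(loc=1.5, scale=1.0, size=(50, 50))
expression = np.vstack([group_a, group_b])

# Cluster patients; each cluster is a more homogeneous candidate subgroup
# that could, in principle, be examined for distinct treatment responses.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(expression)
print("Patients per subgroup:", np.bincount(labels))
```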
During your time as a PhD student, you also “explored the application of models built from deconvolving gene expression profiles, for personalized medicine.” Can you go more in depth to how these models were built and how it can advance our ability to provide a more accurate prognosis to patients?
During my PhD, we were trying to predict the prognosis of early-stage lung cancer patients. If you are diagnosed with early-stage lung cancer, say stage 1B, clinicians have to make important decisions, such as how much chemotherapy to give you. If they give you too much, it will get rid of the primary tumor but at the cost of unnecessary side effects; if they don't give you enough, you don't get rid of the primary tumor and the risk of recurrence increases.
Back then, fourteen years ago or so, gene expression profiling was just becoming popular. People were thinking that maybe we could predict whether these stage 1B patients were going to be at high risk of recurrence or not. Our motivation was essentially to build a computational model that predicts, based on molecular signatures, whether they should be given extra therapy. That in itself is a hard problem. Additionally, before single-cell sequencing was available, it was hard to take a sample of a tumor and sequence only the tumor cells. Oftentimes you would have contamination from normal cells that would mess up the signatures you got. We had to develop a computational method to extract only the signatures due to tumor cells, and we showed that once you do that, it is much easier to predict prognosis.
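As a rough illustration of the kind of deconvolution being described (a simplified two-component sketch with made-up data, not the actual model from the thesis), one can treat each bulk sample as a purity-weighted mixture of a tumor expression profile and a normal expression profile; given estimated tumor purities and a normal reference, the tumor-specific signature can then be recovered gene by gene with least squares.

```python
# Simplified two-component deconvolution sketch (not the thesis model):
#   observed[i, j] = purity[i] * tumor[j] + (1 - purity[i]) * normal[j]
# for sample i and gene j. Given purities and a normal reference,
# solve for the tumor profile by per-gene least squares.
import numpy as np

rng = np.random.default_rng(1)
n_samples, n_genes = 30, 200

true_tumor = rng.gamma(shape=2.0, scale=2.0, size=n_genes)
normal_ref = rng.gamma(shape=2.0, scale=2.0, size=n_genes)
purity = rng.uniform(0.3, 0.9, size=n_samples)   # fraction of tumor cells per sample

# Simulate noisy bulk measurements of mixed tumor/normal tissue.
observed = (purity[:, None] * true_tumor[None, :]
            + (1 - purity)[:, None] * normal_ref[None, :]
            + rng.normal(scale=0.1, size=(n_samples, n_genes)))

# Subtract the normal contribution, then fit tumor[j] by least squares across samples.
residual = observed - (1 - purity)[:, None] * normal_ref[None, :]
tumor_hat = (purity @ residual) / (purity @ purity)

print("Correlation with true tumor profile:",
      round(float(np.corrcoef(tumor_hat, true_tumor)[0, 1]), 3))
```

The point of the toy example is only that removing the normal-cell contribution leaves a cleaner tumor signature, which is what made prognosis easier to predict in the study described above.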
Where do you think research will be in the next 10-20 years?
We are going to see a lot more connections across previously isolated fields. For example, with respect to human psychiatric genetics, a lot of the focus right now is on the molecular impact of genetic variants, but in the near future I would expect much closer integration with clinicians to also study the impact on behavior, and with experimental biologists to study the impact on brain development and organization.
***Special thanks to Dr. Gerald Quon for this interview