By Isabella Krzesniak.
INTRODUCTION
John Davis is a fifth-year Ph.D. candidate in the Integrative Genetics and Genomics graduate group at UC Davis. He works in the Maloof Lab, where he uses bioinformatics to analyze genetic variation among native California wildflowers of the Streptanthus clade across different environments, and uses the resulting data to create gene models.
The project he is working on has two main goals. First, he aims to create genomic resources for Streptanthus clade species, namely reference genomes and transcriptomes, which can be used to analyze differential gene expression among individuals. Second, he aims to examine the germination niche of Streptanthus clade species (the conditions they require to germinate) and the gene networks expressed during this life stage.
These models have many applications for understanding adaptation to climate change; for instance, they can help ecologists make informed decisions, such as whether a crop will still grow well in a given region as the climate warms. Davis’ work is part of a collaborative study among the Maloof, Gremer, Strauss, and Schmitt labs in the Department of Plant Biology.
What does your research consist of and what are its potential applications?
We’re looking at how plant populations persist in different environments. Even though these wildflowers are closely related, you can look at how they differ in their survival across environments. If you have an environment that’s great for one crop, but it’s getting either wetter or hotter, the crop might not survive very well. But if you know which genes it has or how it functions, you can move it to a different location or potentially bring in a different crop that will grow well in that region. From an ecological standpoint, it’s a matter of which species will survive and which ones will die off. Underlying all of it is which genes the plant has.
What work are you doing with the project in particular?
The main thing I’m doing right now is building genomic references. We’re trying to do gene expression studies, but if we don’t know what the genes are, we can’t compare differences in gene expression. So one of the things I’m doing is building reference genomes to determine which genes are in each species. Another is building transcriptomes, which are collections of just the genes that are expressed. From there, I hope to build gene models, construct coexpression networks, and predict germination based on gene expression profiles. Ideally, my goal would be to develop gene networks that tell us which species have the genes needed to survive in these environments and which ones don’t. To analyze the data, I use Python, Linux, Excel, and R.
Why are you studying Streptanthus in particular?
The Streptanthus clade has a well-documented phylogeny of closely related species. Adding genomic resources will improve our ability to perform genetic analyses.
After seed collection, what steps do you take to analyze your data?
We sent our seeds to a collaborating company, which extracted the RNA and prepared RNA-seq libraries. The libraries were then sent to the UC Davis Genome Center for sequencing, and the Genome Center sends us back the sequence reads. We use those reads to assemble transcripts and to do gene expression analysis, where we start building models that relate gene expression to climate variables like precipitation, temperature, and elevation. They’re all correlated.
What kind of models do you employ for data analysis?
It’s just basic linear models and other types of models. You have your response variable, which in our case is germination proportion, and that is a function of gene expression. Gene expression, in turn, is affected by temperature, genotype, and precipitation, so it’s models on models.
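The "models on models" idea can be sketched as two chained regressions: one predicting gene expression from temperature, precipitation, and genotype, and a second predicting germination proportion from gene expression. This is a minimal illustration on synthetic, made-up data, assuming ordinary least squares via NumPy's `np.linalg.lstsq`. The variable names, coefficients, and two-genotype coding are hypothetical; this is not the lab's actual pipeline or data.

```python
import numpy as np


def fit_ols(X, y):
    """Ordinary least squares with an intercept; returns [intercept, slopes...]."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


rng = np.random.default_rng(0)
n = 200

# Hypothetical synthetic predictors (NOT study data).
temperature = rng.uniform(5.0, 25.0, n)        # degrees C
precipitation = rng.uniform(100.0, 800.0, n)   # mm per season
genotype = rng.integers(0, 2, n)               # two made-up genotypes coded 0/1

# Simulated "truth" for stage 1: expression depends on environment and genotype.
expression = (2.0 + 0.3 * temperature - 0.002 * precipitation
              + 1.5 * genotype + rng.normal(0.0, 0.2, n))

# Simulated "truth" for stage 2: germination proportion depends on expression.
# With these made-up coefficients the values stay inside [0, 1]; the clip is a guard.
germination = np.clip(0.1 + 0.07 * expression + rng.normal(0.0, 0.02, n), 0.0, 1.0)

# Stage 1: expression ~ temperature + precipitation + genotype
stage1 = fit_ols(np.column_stack([temperature, precipitation, genotype]), expression)

# Stage 2: germination proportion ~ expression
stage2 = fit_ols(expression.reshape(-1, 1), germination)


def predict_germination(temp, precip, geno):
    """Chain the models: environment and genotype -> predicted expression -> germination."""
    expr_hat = stage1 @ np.array([1.0, temp, precip, geno])
    return float(stage2 @ np.array([1.0, expr_hat]))


print("stage 1 coefficients:", stage1.round(4))
print("stage 2 coefficients:", stage2.round(4))
print("predicted germination at 15 C, 400 mm, genotype 1:",
      round(predict_germination(15.0, 400.0, 1), 3))
```

Because germination is a proportion, a binomial (logistic) generalized linear model is often a better fit in practice. Plain least squares is used here only to keep the chaining of the two models easy to follow.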
Has the project been successful?
We did what we set out to accomplish with the funding. Right now, the final batch of sequencing data is coming in, and we’re starting to dive into it and produce actual results.
What are the difficulties of working with plants?
I love genetics and genomics, and I just fell into working with plants. Compared to bacteria and humans, plants are the hardest. Plant genomes are ridiculous, and weird things happen all the time. Humans are diploid; we’re boring. I finished working on a project with Brassica napus. It’s an allotetraploid (having four sets of chromosomes derived from different species): a hybrid of two different plants, Brassica rapa and Brassica oleracea, so it carries two separate diploid genomes. Those two genomes swap segments with each other through homeologous exchange. So when you try to assemble that genome, you don’t know whether a given region came from the rapa genome or the oleracea genome. I think strawberries have up to eight copies of each chromosome, which makes it a lot harder to find alleles. When you’re doing an experiment where you’re trying to knock out a gene, you have to knock out every copy of it. In humans, you only have to knock out two copies to make the knockout homozygous. But in a strawberry, you have to knock out all eight. Plants just seem like the hardest of the group. And then you have pine trees, whose genomes are 22.5 gigabases (22.5 billion base pairs), while humans only have 3.2 gigabases.
How has extreme weather (wildfires, flooding) over the past years affected the study?
One of the struggles of our project is that we’re looking at how the climate affects germination, but at the beginning of our project there were severe droughts and wildfires, and those affect the genetics of the population: what survives and what doesn’t.
You’re trying to do all these environmental studies that look at long-term effects compared to now, but when you’re a grad student on a grant, the grant only lasts four to five years. So how do you take four to five years of data and project it out decades ahead without having data from decades prior? It just gets difficult when you only have four seasons to collect data from, and two of those are on fire and one of them is flooding. None of these are historically normal conditions. So it can be a little tough to tease apart long-term genetic variation from responses to what’s happening in the environment right now.
What makes ecological research, as opposed to transgenic research, difficult?
With our studies, we don’t knock out any genes or use any transgenics. Ours is all ecology, and that’s the difficulty of our project. With Arabidopsis (a model organism in plant biology), the genes are pretty much homozygous, and it’s a lot easier. In our case, all the seeds are collected in the wild, so they’re going to be heterozygous. We can try to make more seed by breeding plants in the greenhouse to expand our seed stock, but we can only do so much, since making more seed takes up space. The field is always going to be changing, too. When you collect seeds one year, the genetics could be completely different from the genetics the next year.
Why don’t you use transgenics in your studies?
You don’t want to dive into transgenics (organisms whose genomes have been genetically altered) because there’s so much pushback against it. These are all native California species, and you don’t want to put something in the environment that could outcompete the natural population.
We’re trying not to affect the environment we’re studying. When we do seed collections, we don’t collect from at-risk populations of a species, and we only take a percentage of the seeds from each plant. We don’t want to affect growth for the next season, so ultimately, we’re trying to cause the minimum amount of disruption to the environment that we’re studying. We hope to potentially use our results for rehabilitation efforts: we’ll be able to tell which ones need more help to survive and which ones are fine.