Plant Genomes of 2016.

Why Genomes?

Genome sequencing is routine now. Determining the order of A–T and C–G base pairs has become much cheaper, and the software pipelines that process the data have improved too.

However, that doesn’t make genomes any less important. Before the era of physical maps (as in the actual base-pair sequence), geneticists relied on genetic maps built from markers, defining locations in a genome by how frequently markers recombined relative to one another in crosses. That is not always possible with long-lived or hard-to-manipulate species, in other words, non-model organisms, as opposed to models like fruit flies, C. elegans, yeast, Arabidopsis thaliana, and mice.
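To make the idea of a recombination map concrete, here is a minimal Python sketch (my own illustration, not taken from any of the papers below). It estimates the recombination frequency between two markers from hypothetical counts of parental-type and recombinant offspring, then converts it to a map distance in centiMorgans using Haldane's mapping function, one common choice that assumes crossovers happen independently of each other. The function names and offspring counts are made up for illustration.

```python
import math


def recombination_frequency(parental: int, recombinant: int) -> float:
    """Fraction of offspring in which a crossover separated the two markers."""
    total = parental + recombinant
    if total == 0:
        raise ValueError("need at least one offspring")
    return recombinant / total


def haldane_cm(r: float) -> float:
    """Convert a recombination frequency r (0 <= r < 0.5) to centiMorgans.

    Haldane's mapping function, d = -0.5 * ln(1 - 2r) Morgans, assumes
    crossovers occur independently (no interference).
    """
    if not 0.0 <= r < 0.5:
        raise ValueError("r must be in [0, 0.5)")
    return -50.0 * math.log(1.0 - 2.0 * r)


# Hypothetical cross: 180 offspring look like a parent, 20 are recombinant.
r = recombination_frequency(parental=180, recombinant=20)
print(f"r = {r:.2f}, distance = {haldane_cm(r):.1f} cM")
```

Markers that rarely recombine (r near 0) sit close together on the map, while values approaching 0.5 mean the markers behave as if unlinked. Reliable counts take many offspring from controlled crosses, which is exactly what long-lived species make difficult.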

Several more plant genomes were sequenced in 2016 (more than 100 have been sequenced so far), and improved, higher-quality assemblies of some previously sequenced plants came out as well.

One way to think of it is genetic mining; however, it’s not just gene sequences we get, but also insight into how the organism’s biology works, how it evolved, and how it fits in with other plants at the level of DNA.

Being more routine also doesn’t mean genomes are *easy* to sequence and assemble. Each genome presents its own challenges, from choosing the individual(s) to sequence through the final annotation of what is in the genome. Each project is often backed by a large team of scientists, frequently affiliated with genomics centers like the DOE’s Joint Genome Institute.

Genomes matter because they are the platform on which life is based. Knowing the genome of a few members of a species makes further genetic analyses of other populations easier as well.

Specific locations in genomes, called loci, underlie specific physical traits, or phenotypes. A sequenced genome can help map the loci underlying any measurable physical trait that varies. Alternatively, knowing a genome can also help determine which physical traits are *not* genetic and are instead shaped by the environment (though most traits are a mix of environment and the genes underlying them). Having that first full genome of a species is a gateway to exploring the genetic diversity of entire populations (genomes vary between individuals).
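The simplest version of mapping a locus to a trait is a single-marker association test: does the trait value track how many copies of an allele (0, 1, or 2) each individual carries? Here is a minimal Python sketch of that idea, with invented genotypes and plant heights. Real genome-wide association studies test huge numbers of markers and correct for population structure and multiple testing, none of which is shown here.

```python
from statistics import mean


def pearson_r(xs, ys):
    """Pearson correlation between allele dosage and trait value."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


def additive_effect(dosages, traits):
    """Least-squares slope: average change in the trait per extra allele copy."""
    mx, my = mean(dosages), mean(traits)
    num = sum((x - mx) * (y - my) for x, y in zip(dosages, traits))
    den = sum((x - mx) ** 2 for x in dosages)
    return num / den


# Hypothetical data: allele copies at one marker and plant height (cm) for 8 plants.
dosage = [0, 0, 1, 1, 1, 2, 2, 2]
height = [30.0, 32.0, 35.0, 36.0, 34.0, 40.0, 41.0, 39.0]
print(f"r = {pearson_r(dosage, height):.2f}, "
      f"effect = {additive_effect(dosage, height):.1f} cm per allele")
```

A strong correlation at a marker flags a nearby locus as a candidate; a trait that correlates with no marker at all may be shaped mostly by the environment.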

The first plant genome sequenced, published around the same time as the human genome, was Arabidopsis thaliana at 125 megabases (125 million A–T and C–G base pairs). It has since been resequenced many times; there’s even a 1001 Arabidopsis genomes project cataloging natural variation in Arabidopsis.

Since then many crops have been sequenced, including rice, maize, tomato, sweet potato, soybean, and wheat, and more and more species that are culturally iconic or ecological keystones are being sequenced.

Genomes and other global, systems-level datasets are an important platform for innovation, interaction, and informing the future of our planet’s life in terms of agriculture, conservation, and understanding ecological interactions.

This year saw a lot of tree genomes, as well as genomes of more ecologically important organisms.

Below is the list of plant genomes I found sequenced in 2016 (I may have missed some, and I didn’t include algae/cyanobacteria here). I also write about a few of the insights gained; however, it’s important to note that a genome is not an end, but an opportunity for further exploration, comparison, and discovery.

What We’ve Learned So Far from 2016 Plant Genomes

Sugar pine (Pinus lambertiana):

The 31-gigabase genome (31 billion base pairs, about 10x larger than the human genome, but with only around half as many annotated genes) allowed the authors to develop a better marker for, and identify a candidate gene underlying, resistance to white pine blister rust (the resistance gene is called Cr1), making it easier to assess rust resistance in the sugar pine population. Often, genomics projects are motivated by a specific problem. Sugar pine has the biggest plant genome (and possibly the biggest genome of any kind) sequenced to date. It is also one of only a handful of conifers sequenced so far; conifers are among the most ancient seed plants on Earth, so this genome has something to tell us about evolution.

Citation: Stevens, K. A., Wegrzyn, J. L., Zimin, A., Puiu, D., Crepeau, M., Cardeno, C., et al. (2016). Sequence of the Sugar Pine Megagenome. Genetics. doi: 10.1534/genetics.116.193227.

Venus flytrap (Dionaea muscipula):

This isn’t a full genome project but a transcriptome, that is, the sequencing of all the expressed genes of the genome. The authors also did other large-scale work, namely proteomics, to get a picture of all the proteins present in a given tissue at a given time. This gives one of the first genome-wide pictures of a carnivorous plant. They compared gene expression across several different tissues as well. Combining this with a look at the plant’s structure and a close examination of gene expression after the trap is triggered, the authors discovered that a plant hormone associated with defense is correlated with activating carnivory: initial touch signaling is transduced into an electrical current that rapidly closes the trap, and the defense hormone is then associated with digesting the insect prey. This paper highlights the power of combining whole-genome studies with physiology, and shows that new sequencing technologies make understanding non-model organisms feasible.

Citation: Bemm, F., Becker, D., Larisch, C., Kreuzer, I., Escalante-Perez, M., Schulze, W. X., et al. (2016). Venus flytrap carnivorous lifestyle builds on herbivore defense strategies. Genome Research. doi: 10.1101/gr.202200.115.

Carrot (Daucus carota subsp. carota):

The carrot is familiar to many: an orange taproot crop pulled up from the ground. Wild carrot roots, however, are white, and carrot taproots come in many varieties, including purple. The orange/yellow in carrots is due to molecules called carotenoids. Carotenoids are behind a lot of fall colors, such as orange and yellow leaves. Their biological role is as light-absorbing molecules that feed photons to the photosystems driving carbon fixation (read: making sugar). They also protect the plant from light damage. In terms of nutrition, carotenoids are precursors to vitamin A, something we all need in our diet.

It is somewhat odd that carrot roots, which aren’t typically photosynthetic, accumulate these compounds. Besides producing a complete carrot genome assembly, the authors reported in Nature Genetics that they identified a gene that may play a major role in carotenoid accumulation in carrot taproots. It seems to work by having the roots undergo at least partial light-induced development, though without requiring light (light-mediated development appears to be the default, and many plants have imposed a brake on it until light is actually perceived). Again, the carrot genome is another resource for other plant scientists, deepening our understanding of nature.

Citation: Iorizzo, M., Ellison, S., Senalik, D., Zeng, P., Satapoomin, P., Huang, J., et al. (2016). A high-quality carrot genome assembly provides new insights into carotenoid accumulation and asterid genome evolution. Nature Genetics. doi: 10.1038/ng.3565.

Eelgrass (Zostera marina):

I wrote about this one earlier this year. Find out about it in “A Flowering Plant Under the sea” and the follow-up, “Habitat Loss, Climate Change, and the Story of Three Plants”.

Maize (Zea mays):

At the end of 2016, a preprint reported a re-sequencing of the maize genome using one of the newer sequencing technologies, PacBio’s SMRT sequencing, which reads long stretches of DNA (as opposed to short-read sequencing, with reads of around 100 bp, which is most typical today). The new assembly gives what is essentially a higher-resolution view of the maize genome. The maize genome is largely made up of transposons (the ‘jumping genes’ Barbara McClintock discovered in maize), which give it a high proportion of repetitive sequence and make it a challenge to assemble: how do you know where each repeat belongs, and in what order? Longer reads help get around this problem. The authors also compared their new assembly with those of other maize varieties and found a lot of variation between them, underscoring just how varied genomes within a species can be. Even with this new, better assembly, there are still gaps. Genomes are often first published as drafts that improve over time, and this maize example shows just how hard it can be to say a genome is ever truly complete (though having one is usually better than not).
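A toy calculation shows why read length matters for repeats (this is a simplification for illustration, not the assembly method used in the maize preprint). A read can only tell you where a repeat copy belongs if it covers the entire repeat and reaches into unique sequence on both sides. The 5 kb repeat, the 500 bp flank requirement, and the read lengths below are illustrative assumptions.

```python
def spanning_starts(read_len: int, repeat_start: int, repeat_len: int, anchor: int) -> int:
    """Count read start positions whose read covers the whole repeat plus
    `anchor` bases of unique sequence on each side (enough to place that copy).

    Assumes the repeat sits far from the chromosome ends, so no start
    position is cut off by the edge of the sequence.
    """
    # Earliest start that still reaches `anchor` bases past the repeat's right end.
    first = repeat_start + repeat_len + anchor - read_len
    # Latest start that still covers `anchor` bases before the repeat's left end.
    last = repeat_start - anchor
    return max(0, last - first + 1)


# Hypothetical 5 kb transposon copy at position 100,000, needing 500 bp of unique flank.
for read_len in (100, 1_000, 10_000):
    print(read_len, spanning_starts(read_len, repeat_start=100_000, repeat_len=5_000, anchor=500))
```

Reads of 100 bp or 1,000 bp never span this repeat, so no single short read can say which copy it came from; 10,000 bp reads can span it from thousands of start positions.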

Citation: Jiao, Y., Peluso, P., Shi, J., Liang, T., Stitzer, M. C., Wang, B., et al. (2016). The complex sequence landscape of maize revealed by single molecule technologies. bioRxiv, 19 Dec. doi: 10.1101/079004.

Ginkgo (Ginkgo biloba):

Ginkgo is the only living representative of a lineage of plants otherwise long extinct. Ginkgos are seed plants, most closely related to cycads and conifers (like the sugar pine). Based on fossil evidence, ginkgos have changed little physically in 200 million years. This year, the 10.3-gigabase genome (~3x bigger than the human genome) was sequenced to better understand this hardy survivor, which has become popular in cities worldwide. One factor in the history of evolution is whole-genome duplication, which is just what it sounds like: a duplication of the entire genome that can send lineages of plants down new evolutionary paths. Ginkgo shows evidence of two such events, one correlated with its split from conifers and the other specific to the ginkgo line. With new genetic material to work with, ginkgo has expanded (and maintained) many genes involved in defense pathways, which may account for some of its hardiness as a species: it is built to resist pathogens.
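A quick back-of-the-envelope check of the size comparisons for sugar pine and ginkgo. The plant genome sizes are the ones quoted in this post; the human genome size of about 3.1 billion base pairs is an approximate figure I’m assuming, since it isn’t given in these papers.

```python
# Genome sizes in base pairs, as quoted in this post.
HUMAN_BP = 3.1e9  # assumption: commonly cited approximate (haploid) human genome size

genome_sizes_bp = {
    "Arabidopsis thaliana": 125e6,  # 125 Mb
    "Ginkgo biloba": 10.3e9,        # 10.3 Gb
    "Pinus lambertiana": 31e9,      # 31 Gb (sugar pine)
}

for species, size in genome_sizes_bp.items():
    # Print each size in gigabases and as a multiple of the human genome.
    print(f"{species}: {size / 1e9:.3g} Gb, {size / HUMAN_BP:.2f}x human")
```

This gives about 10x for sugar pine and a little over 3x for ginkgo, consistent with the figures in the text, while Arabidopsis comes in at roughly 4% of the human genome.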

Citation: Guan, R., Zhao, Y., Zhang, H., Fan, G., Liu, X., Zhou, W., et al. (2016). Draft genome of the living fossil Ginkgo biloba. GigaScience. doi: 10.1186/s13742-016-0154-1.

Peanut (Arachis hypogaea):

Peanuts are legumes and an important source of nutrition. The domestic peanut is a hybrid of two wild ancestors and carries both of their genomes, known as the A and B subgenomes: the A genome derives from the wild relative A. duranensis and the B genome from the wild relative A. ipaensis. The two subgenomes are highly similar and hard to distinguish when sequencing DNA from a domesticated peanut variety. The peanut genome authors sequenced the wild relatives as well as a domesticated cultivar to get at the domesticated tetraploid genome and to identify wild-relative features that may be useful to bring into domesticated varieties. Overall, they found that the wild relatives mapped well onto the domestic tetraploid, with the B-genome relative mapping better than the A-genome relative. This suggests that the B genome comes from a single lineage, whereas the A genome may reflect multiple hybridizations with a B-genome plant, accounting for the difference (there is evidence that humans transported the B-genome species A. ipaensis into the native range of A. duranensis).

Citation: Bertioli, D. J., Cannon, S. B., Froenicke, L., Huang, G., Farmer, A. D., Cannon, E. K. S., et al. (2016). The genome sequences of Arachis duranensis and Arachis ipaensis, the diploid ancestors of cultivated peanut. Nature Genetics. doi: 10.1038/ng.3517.

Cassava (Manihot esculenta):

Cassava is an important calorie source for 500 million people, largely in Africa. However, cassava originally comes from South America, where most of the crop’s diversity exists. As a result, the limited genetic diversity in Africa leaves less flexibility for improving the crop in terms of nutrition, yield, and resistance to factors like pathogens. In a report in Nature Biotechnology this year, a better draft genome of cassava was published. The authors found a past whole-genome duplication shared with a related genus, that of the rubber tree (see below), and found signatures of domestication and of hybridization, both intentional and natural. This assembly should improve prospects for importing diversity into this vegetatively propagated crop.

Citation: Bredeson, J. V., Lyons, J. B., Prochnik, S. E., Wu, G. A., Ha, C. M., Edsinger-Gonzales, E., et al. (2016). Sequencing wild and cultivated cassava and related species reveals extensive interspecific hybridization and genetic diversity. Nature Biotechnology. doi: 10.1038/nbt.3535.

Rubber tree (Hevea brasiliensis):

Rubber has only really been commercially important to people for a few hundred years. There just wasn’t much need for it before the era of tires, hoses, pencil erasers, etc., though native populations in South America likely took advantage of the material in some ways. Its use and domestication were certainly driven by the industrial revolution. Though there are synthetic methods of making rubber from petroleum, natural rubber remains important and may well be a good target for synthetic biology (creating the pathway to make rubber in yeast cells, for instance). Doing all that, however, requires knowing the genes at work in the laticifers, the cells that generate the rubber/latex. In the plant, rubber is a defense mechanism, stimulated by ethylene, a gaseous hormone known to be involved in defense. The authors found, however, that laticifers themselves have little capacity to make ethylene, suggesting that ethylene from other parts of the plant (or artificially supplied ethylene) may trigger rubber production. In the Nature Plants paper, the authors improve the quality of the genomic resources already available, carry out global gene expression analyses, and show that the genes responsible for rubber biosynthesis are present in greater numbers than in other plants and are mostly located in the same region of the rubber genome.

Citation: Tang, C., Yang, M., Fang, Y., Luo, Y., Gao, S., Xiao, X., et al. (2016). The rubber tree genome reveals new insights into rubber production and species adaptation. Nature Plants. doi: 10.1038/nplants.2016.73.

Ash Tree (Fraxinus excelsior):

The European ash is facing dieback due to a fungal disease ravaging Europe. Scientists sequenced one tree with low heterozygosity, meaning little variation between the two copies of the genome that a diploid plant like ash carries, one from each parent, to get a better assembly. In a highly heterozygous genome, it is harder to align sequences that differ slightly but really do belong to the same place (for example, a gene at the same location in the genome can vary in sequence between the two copies, and that variation can underlie functional differences in the gene). This genome, together with sequencing of a population of ash trees from across Europe, revealed that ash populations in some regions of England may have more resistance than those in continental Europe. The story is similar to the sugar pine story, though in this case the authors also identified compounds exuded by resistant trees that may mediate resistance to the fungus. They also identified a potential trade-off between resistance to herbivores and resistance to the fungus, which makes it a real challenge to find trees protected from both threats. Finally, in comparison with other genomes, they found that about a quarter of ash genes are unique to ash, again underscoring how much genetic diversity is out there that we are simply not aware of.
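Here is a minimal Python sketch of what “low heterozygosity” means in practice: at each aligned position, compare the two parental copies of the genome and count how often they differ. The sequences are invented toy examples, and the function is a hypothetical helper; real estimates come from aligning sequencing reads and calling variants, not from two tidy strings.

```python
def heterozygosity(hap1: str, hap2: str) -> float:
    """Fraction of aligned sites where the two parental copies (haplotypes) differ.

    Assumes the haplotypes are already aligned to the same length, with '-'
    marking gaps; gapped sites are skipped rather than counted as differences.
    """
    if len(hap1) != len(hap2):
        raise ValueError("haplotypes must be aligned to the same length")
    compared = differing = 0
    for a, b in zip(hap1.upper(), hap2.upper()):
        if a == "-" or b == "-":
            continue
        compared += 1
        differing += a != b  # True counts as 1
    return differing / compared if compared else 0.0


# Hypothetical 20 bp stretches from a low- and a high-heterozygosity individual.
low = heterozygosity("ACGTACGTACGTACGTACGT", "ACGTACGTACGAACGTACGT")
high = heterozygosity("ACGTACGTACGTACGTACGT", "ACTTACGAACGTTCGTACCT")
print(f"low: {low:.2f}, high: {high:.2f}")
```

The higher that fraction, the more often an assembler has to decide whether two slightly different sequences are separate loci or two versions of the same one, which is why a low-heterozygosity tree makes for a cleaner assembly.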

Citation: Sollars, E. S. A., Harper, A. L., Kelly, L. J., Sambles, C. M., Ramirez-Gonzalez, R. H., Swarbreck, D., et al. (2016). Genome sequence and genetic diversity of European ash trees. Nature. doi: 10.1038/nature20786.

Almond & Peach (Prunus dulcis & P. persica):

This isn’t entirely new genome sequencing of either tree: in a paper studying the divergence and domestication of almond and peach (in West and East Asia, respectively), the authors re-sequenced the genomes of several different cultivars of each to compare population structure. They found some overlap in the genomic regions selected during domestication in the two species. However, for peach, they didn’t find signs that traits related to the fruit itself were selected during domestication; instead, those traits seem to have arisen naturally long before humans were around. This study highlights the power of genomics for reconstructing the history of life, as well as providing a resource for future breeders.

Citation: Velasco, D., Hough, J., Aradhya, M., & Ross-Ibarra, J. (2016). Evolutionary Genomics of Peach and Almond Domestication. G3: Genes|Genomes|Genetics. doi: 10.1534/g3.116.032672.

Japanese Morning Glory (Ipomoea nil):

Scientists generated the highest-quality genome yet of the culturally important Japanese morning glory, a plant in the morning glory family (Convolvulaceae), which includes sweet potatoes and is closely related to the nightshade family of potatoes and tomatoes. This plant is used in education in Japan and already has many genetic resources available, including a genetic map. The authors also found that CT, the gene that confers dwarfing, is involved in the growth-promoting brassinosteroid pathway (without brassinosteroids, plants are often dwarfed). The authors also compare this genome with other related genomes and provide a map of transposons (which can change genes when they are activated and jump around the genome).

Citation: Hoshino, A., Jayakumar, V., Nitasaka, E., Toyoda, A., Noguchi, H., Itoh, T., et al. (2016). Genome sequence and analysis of the Japanese morning glory Ipomoea nil. Nature Communications. doi: 10.1038/ncomms13295.

Olive tree (Olea europaea):

A 1,200-year-old olive tree named Santander had its genome sequenced in 2016. Knowing the olive genome is important in the search for solutions to a devastating bacterial disease currently ravaging the olive industry in Italy. Perhaps of note here is an instance of a relatively small genome having an apparently large number of genes (1.38 billion base pairs with a predicted 56,000 protein-coding genes). Genome size does not necessarily correlate with gene number, and complexity is not necessarily predicted by gene number.
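To put the olive numbers in perspective, here is a rough genes-per-megabase calculation. The olive figures come from the text above; the human figures (about 3.1 billion base pairs and roughly 20,000 protein-coding genes) are approximate values I’m assuming for comparison, not numbers from the olive paper.

```python
def genes_per_mb(genome_bp: float, protein_coding_genes: int) -> float:
    """Protein-coding genes per million base pairs of genome."""
    return protein_coding_genes / (genome_bp / 1e6)


olive = genes_per_mb(1.38e9, 56_000)  # figures quoted for the olive genome
human = genes_per_mb(3.1e9, 20_000)   # assumption: approximate human values
print(f"olive: {olive:.1f} genes/Mb, human: {human:.1f} genes/Mb")
```

By this rough measure, olive packs about six times as many predicted genes into each megabase as the human genome does, despite being less than half its size.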

Citation: Cruz, F., Julca, I., Gómez-Garrido, J., Loska, D., Marcet-Houben, M., Cano, E., et al. (2016). Genome sequence of the olive tree, Olea europaea. GigaScience. doi: 10.1186/s13742-016-0134-5.

Jujube (Ziziphus jujuba):

Jujube is an important fruit in China, with 7,000 years of domestication history. One reason to get genome-level information is to work out the self-incompatibility mechanisms in jujube. Many flowering plants prevent self-fertilization through genetically encoded self-incompatibility mechanisms: if matching alleles at the S-locus (or another incompatibility locus) come together in pollen and pistil, no fertilization occurs, so outcrossing is favored. This is an issue in the jujube cultivation industry. The authors also explored the history of jujube domestication and the selection of sweetness and acidity in the fruit, providing population resources for future breeding.
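A toy model of one common form of self-incompatibility shows why self-pollination fails while crosses between plants with different S-alleles can succeed. This sketch uses the generic textbook gametophytic model, where each pollen grain’s single S-allele is checked against both S-alleles of the pistil; the allele names and helper functions are hypothetical, and it is not a claim about which mechanism jujube actually uses.

```python
from __future__ import annotations

from itertools import product


def compatible_pollen(pistil: tuple[str, str], pollen_parent: tuple[str, str]) -> list[str]:
    """Gametophytic self-incompatibility: each pollen grain carries one S-allele
    from its parent and is rejected if that allele matches either pistil allele."""
    return [allele for allele in pollen_parent if allele not in pistil]


def offspring_genotypes(mother: tuple[str, str], father: tuple[str, str]) -> list[tuple[str, ...]]:
    """All S-locus genotypes that can form when `mother` is pollinated by `father`."""
    pollen = compatible_pollen(mother, father)
    return sorted({tuple(sorted((egg, sperm))) for egg, sperm in product(mother, pollen)})


plant_a = ("S1", "S2")
plant_b = ("S2", "S3")
plant_c = ("S3", "S4")
print(offspring_genotypes(plant_a, plant_a))  # self-pollination: no compatible pollen
print(offspring_genotypes(plant_a, plant_b))  # only S3 pollen is accepted
print(offspring_genotypes(plant_a, plant_c))  # fully compatible cross
```

In this model, self-pollen always matches the pistil, so a planting of a single genotype cannot fertilize itself, while partner plants that share one S-allele are only partly compatible.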

Citation: Huang, J., Zhang, C., Zhao, X., Fei, Z., Wan, K., Zhang, Z., et al. (2016). The Jujube Genome Provides Insights into Genome Evolution and the Domestication of Sweetness/Acidity Taste in Fruit Trees. PLoS Genetics. doi: 10.1371/journal.pgen.1006433.

In progress – Bauhinia (Bauhinia blakeana):

A project in Hong Kong to sequence the city’s symbol, Bauhinia blakeana (it appears on the flag and money there, and there are 25,000 of the trees around the city), got under way in 2016. It involves BGI, the biggest genomics institute in the world, scientists at a local university, and participating citizens. The Bauhinia project can be followed via their website. Bauhinia blakeana is a sterile hybrid of two wild species, also in the Bauhinia genus. It was initially discovered in the wild, though no wild hybrids have been seen in a long time. One thing citizens have been asked to look for is a B. blakeana tree that produces seed pods, which would mean that life found a way to break the sterility and produce new seeds. (Right now, the plant is propagated via cuttings and likely has low genetic diversity, making it potentially more susceptible to diseases.)

In progress – Joshua Tree (Yucca brevifolia):

The Joshua tree genome project was funded in 2016 and is under way. This is a project that brings together molecular biologists as well as ecologists studying the Joshua Tree population that is under threat from climate change. They are ultimately seeking variants that can tolerate higher temperatures and be more resilient in the face of a warmer desert. Joshua trees are iconic and are important habitat for many species in the desert.

The era of genomes

I’m sure there are a lot more genomes out there that will join the club in 2016. Many plants are already represented in sequence databases in some form, though not necessarily at the whole-genome level; that will become more common as sequencing costs continue to drop and computing power continues to catch up with the needs of the life sciences community. The limiting factor in genomics may be training people to process and use all the data they generate, which is no small task. However, there are resources making genomics and genomic experiments more accessible, like the new easyGWAS (genome-wide association study) tool recently published in Plant Cell.


The header image is by Anna Atkins, an early photography pioneer, and comes from the digital collections of the New York Public Library. I chose it because it shows an alga that resembles a phylogenetic tree, something that genomes help illuminate.

I’ll update this post with more images.


5 thoughts on “Plant Genomes of 2016.”

  1. The genome of Trillium sp. is over 50 Gb in size. I do not think the entire genome has been sequenced yet, but it is certainly larger than that of the sugar pine, although 31 Gb is huge!


    1. Quite possible! There are larger genomes out there; I just didn’t find any sequenced plant genomes bigger than sugar pine’s, which is the authors’ claim: it’s currently the largest whole-genome assembly done. There is, I think, a single-celled organism with an estimated genome size of something like 600 Gb, but it hasn’t been sequenced, and I think the genome size claim might be a bit controversial.

