Ongoing Project: The Transposable Elements In The Cannabis Genome
By Rahul Bharadwaj

Transposable elements (TEs) are mobile genetic sequences that are especially abundant in eukaryotic genomes (eukaryotes, such as plants and animals, have cells with a nucleus; bacteria are prokaryotes and lack one). TEs use different mechanisms to move around a genome. They were first identified more than 50 years ago by Barbara McClintock, who was awarded the Nobel Prize in 1983 for their discovery, yet TEs are still not fully understood. Many have dismissed TEs as useless or “junk” DNA; however, these mysterious elements may be able to turn genes on and off and may help regulate nearby genes.

There is huge variation in genome size among eukaryotes, especially plants. Most plants have a similar number of genes, but their overall genome size varies, in part because of differences in repetitive content. With the growth of technologies like Next Generation Sequencing, we are able to produce millions of genomic sequences at increasing speed and decreasing cost, which also allows us to explore the repetitive regions of genomes. A major task lies in annotating these repetitive sequences and identifying their possible regulatory regions.

At CGRI, we’ve been analyzing the repetitive content of the Cannabis Purple Kush genome, which was published in 2011 (Van Bakel et al. 2011) and is publicly available at NCBI. What we have found so far is that 64.5% of the Cannabis genome is composed of repetitive sequences, and so far we’ve been able to annotate only 10%, consisting of repeats that are also present in Arabidopsis (a model organism for plant biology that is only distantly related to Cannabis). For comparison, the genomes of maize (Zea mays) and sunflower contain approximately 71.2% and 78% repetitive sequences, respectively. The repetitive fraction of Cannabis is higher than those of mulberry (Morus notabilis, 47%), peach (Prunus persica, 37.14%) and apple (Malus domestica, 48.8%), and much higher than that of Arabidopsis (18.5%), but lower than those of maize and sunflower.
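Genome-wide figures like the 64.5% above are typically read off a repeat-masked assembly. As a minimal sketch, assuming the assembly is soft-masked (repeats written in lowercase, a convention repeat annotation tools such as RepeatMasker can produce) and that ambiguous N bases are left out of the denominator, the repetitive fraction can be computed as below. The function names and the toy sequences are hypothetical, and this is not necessarily how the CGRI figure was obtained.

```python
def read_fasta(lines):
    """Yield (name, sequence) pairs from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:].strip(), []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def genome_repeat_fraction(fasta_lines) -> float:
    """Fraction of called (non-N) bases that are soft-masked, i.e. lowercase.

    Assumes repeats were soft-masked beforehand; N/n gaps are excluded.
    """
    masked = called = 0
    for _, seq in read_fasta(fasta_lines):
        for base in seq:
            if base in "Nn":
                continue  # assembly gaps say nothing about repeat content
            called += 1
            if base.islower():
                masked += 1
    return masked / called if called else 0.0


if __name__ == "__main__":
    # Hypothetical toy assembly: 12 of 24 called bases are lowercase.
    toy = [">chr1", "ACGTacgtac", "NNNNgtACGT", ">chr2", "ttttAAAA"]
    print(genome_repeat_fraction(toy))  # 0.5
```

Reading the file line by line keeps memory use low, which matters for a plant genome several hundred megabases long.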

Repetitive elements can evolve faster than other parts of the genome, so the repeats of different organisms can vary considerably, particularly when the organisms are not closely related. Van Bakel and colleagues found a high degree of similarity between Arabidopsis and Cannabis in the major functional classes of genes, but the repeated regions of the two genomes are far more divergent than their functional genes.
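One common way to put a number on how far two repeat copies have diverged is the proportion of differing sites (the p-distance), corrected for multiple substitutions at the same site with the Jukes-Cantor model, d = -(3/4) ln(1 - 4p/3). If the time T since two lineages split is known, a per-site rate follows from d = 2rT, because both lineages accumulate changes independently. The sketch below assumes the sequences are already aligned (same length, '-' for gaps) and skips gapped or ambiguous columns; the function names are hypothetical, and this is an illustration rather than the analysis used in this project.

```python
import math


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences.

    Columns with a gap or an ambiguous base in either sequence are skipped.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = diffs = 0
    for x, y in zip(a.upper(), b.upper()):
        if x not in "ACGT" or y not in "ACGT":
            continue
        compared += 1
        if x != y:
            diffs += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return diffs / compared


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor corrected distance, in substitutions per site."""
    if p >= 0.75:
        raise ValueError("p >= 0.75: too divergent for the Jukes-Cantor correction")
    return -0.75 * math.log(1 - 4 * p / 3)


def rate_per_site_per_year(d: float, divergence_years: float) -> float:
    """Substitution rate r from d = 2rT (both lineages change after the split)."""
    return d / (2 * divergence_years)


if __name__ == "__main__":
    p = p_distance("ACGTACGTAC", "ACGTTCGTAA")  # 2 differences in 10 sites
    print(p, round(jukes_cantor(p), 4))         # 0.2 0.2326
```

The correction matters most for old, highly diverged repeats, where raw p-distance underestimates how many substitutions actually occurred.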

Mulberry is the closest relative of Cannabis whose genome has been sequenced. Since the repeated portions of both the mulberry and Arabidopsis genomes are known, we are comparing them with the repeats in Cannabis. Of the 31 thousand repeats found in Cannabis, only 200 match repeats in mulberry and only 16 match repeats in Arabidopsis.
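The post does not say how matches between the repeat sets were called. As one hypothetical, alignment-free way to illustrate the idea, the sketch below counts a Cannabis repeat as shared when its k-mer set, checked in both orientations since TEs can insert on either strand, has a Jaccard similarity of at least a threshold with some repeat in the other species. The k of 8, the 0.5 threshold, and all function names are assumptions; alignment-based similarity searches are the more usual tool for this task. For scale, 200 of 31,000 repeats is roughly 0.65%, and 16 is roughly 0.05%.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def kmers(seq: str, k: int) -> set[str]:
    """All length-k substrings of seq (empty if seq is shorter than k)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Shared k-mers divided by all distinct k-mers in the two sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def count_shared_repeats(query: dict[str, str], reference: dict[str, str],
                         k: int = 8, threshold: float = 0.5) -> int:
    """Number of query repeats similar to at least one reference repeat."""
    ref_sets = [kmers(s, k) for s in reference.values()]
    shared = 0
    for seq in query.values():
        fwd, rev = kmers(seq, k), kmers(revcomp(seq), k)
        if any(max(jaccard(fwd, r), jaccard(rev, r)) >= threshold for r in ref_sets):
            shared += 1
    return shared


if __name__ == "__main__":
    # Hypothetical toy repeat libraries, not real consensus sequences.
    cannabis = {"rep1": "ACGTACGTTGCAAC", "rep2": "GGGGCCCCAAAATTTT"}
    mulberry = {"m1": revcomp("ACGTACGTTGCAAC")}  # rep1 on the other strand
    print(count_shared_repeats(cannabis, mulberry))  # 1
```

Comparing every query against every reference is quadratic, which is fine for a sketch but would call for indexing (or a dedicated search tool) at the scale of 31 thousand repeats.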

In the future, we would like to establish the rates at which repetitive sequences diverge between close relatives (i.e., Cannabis and mulberry) to understand the dynamic nature of TEs. We are also interested in exploring genes located near TEs, which could shed light on the regulatory roles of these elements.
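Finding genes that lie near TEs comes down to an interval-proximity query on each chromosome. A minimal sketch, assuming TE and gene coordinates have already been parsed into (chromosome, start, end) tuples and using an arbitrary 2 kb window: TEs are sorted by start, and a running maximum of their end coordinates lets each gene be checked with a single binary search. The function name, the window, and the toy coordinates are hypothetical and are not results from this project.

```python
from bisect import bisect_right
from collections import defaultdict
from itertools import accumulate


def genes_near_tes(genes, tes, window=2000):
    """Names of genes within `window` bp of at least one TE (overlap counts).

    genes: iterable of (chrom, start, end, name); tes: iterable of (chrom, start, end).
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in tes:
        by_chrom[chrom].append((start, end))

    index = {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        starts = [s for s, _ in intervals]
        # max_ends[i] = furthest end among the first i + 1 TEs (sorted by start)
        max_ends = list(accumulate((e for _, e in intervals), max))
        index[chrom] = (starts, max_ends)

    hits = []
    for chrom, start, end, name in genes:
        if chrom not in index:
            continue
        starts, max_ends = index[chrom]
        # Only TEs starting before the padded gene end can be close enough...
        i = bisect_right(starts, end + window)
        # ...and one of them must reach past the padded gene start.
        if i > 0 and max_ends[i - 1] >= start - window:
            hits.append(name)
    return hits


if __name__ == "__main__":
    genes = [("chr1", 10_000, 12_000, "geneA"),
             ("chr1", 50_000, 52_000, "geneB"),
             ("chr2", 1_000, 2_000, "geneC")]
    tes = [("chr1", 13_500, 15_000), ("chr1", 60_000, 61_000)]
    print(genes_near_tes(genes, tes))  # ['geneA']
```

Proximity alone only nominates candidates; whether a nearby TE actually affects a gene's expression would need separate evidence.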

H. van Bakel, J. M. Stout, A. G. Cote, C. M. Tallon, A. G. Sharpe, T. R. Hughes, J. E. Page. Genome Biology 12 (2011).