INFORMATION ABOUT THE PROJECT
SUPPORTED BY THE RUSSIAN SCIENCE FOUNDATION

This information was prepared on the basis of data from the RSF information and analytical system; the substantive part is presented as edited by the authors. All rights belong to the authors; use or reprinting of these materials is permitted only with the prior consent of the authors.

 

COMMON PART


Project Number: 17-14-01138

Project title: Advancing Felidae genomics to larger scale

Project Lead: O'Brien Stephen

Affiliation: Federal State Budgetary Educational Institution of Higher Education "Saint-Petersburg State University"

Implementation period: 2017 - 2019

Research area: 04 - BIOLOGY AND LIFE SCIENCES, 04-201 - Structural, functional, and evolutionary genomics

Keywords: genomics, Felidae genomics, comparative genomics, next-generation sequencing technologies, genome assembly, genome annotation


 

PROJECT CONTENT


Annotation
Domestic cats benefit from extensive veterinary medical surveillance, which has led to the description of nearly 250 genetic diseases analogous to human disorders. Feline infectious agents, including feline immunodeficiency virus, feline sarcoma virus and feline leukemia virus, offer powerful natural models of deadly human diseases. A rich veterinary literature on feline disease pathogenesis and the demonstration of a highly conserved ancestral mammalian genome organization make the annotated cat genome a highly informative resource for multifaceted biomedical, physiological and evolutionary research. Recently, we reported a preliminary reannotation of the whole-genome sequence of Cinnamon, a domestic cat living in Columbia (MO, USA), as well as bisulfite sequencing and shotgun sequencing of Boris, a male cat from St. Petersburg (Russia). Since then, we have collected, sequenced and assembled 12 other Felidae genomes in collaboration with other groups around the world. The current reference cat genome assembly remains highly fragmented, containing many gaps, ambiguities and assembly errors that make it difficult to produce a high-quality annotation of genomic features for the cat and for other Felidae species. Recent advances in next-generation sequencing technologies include mate-pair scaffolding, long-read scaffolding, compartmentalized shearing and barcoding, chromatin interaction mapping (Hi-C), and optical mapping. Together, these technologies can produce the most contiguous de novo mammalian assemblies to date, with chromosome-length scaffolds and very few remaining gaps, mostly located in constitutive heterochromatin. Our proposal has three principal parts. First, we propose to apply current state-of-the-art techniques to de novo sequencing and assembly of several Felidae species.
We will use long sequence reads for contig formation, short reads for consensus validation, and optical and chromatin interaction mapping for scaffolding. Based on proven prior applications, we expect these combined technologies to produce a highly contiguous de novo chromosome-level assembly and a robust physical map of the domestic cat genome, a vast improvement required for anticipated empirical studies. Second, we will annotate genes, variants and repeat families in each genome using a state-of-the-art annotation pipeline that our laboratory developed for multiple vertebrate genomes from the Genome 10K project. The pipeline will be extended with further annotation features, including detailed repeat annotation; multi-evidence gene annotation based on transcriptome data, protein homology, gene homology and gene synteny; non-coding RNA annotation; DNA variation annotation; detection of genes under selection; analysis of population demography; DNA methylation annotation; and other features. Third, we will use both the high-quality genome assemblies and the high-quality annotations for a comparative study of the domestic cat, Felis catus, and the genomes of 15 additional Felidae species. These comparative analyses will reveal reliable genome organization characters for a valuable medical model and resolve the explicit genomic determinants that drove the exquisite predatory, digestive, sensory and behavioral adaptations over the brief ten-million-year adaptive radiation of the Felidae family.

Expected results
To the best of our knowledge, we will be the first to apply evolutionary analysis approaches to resolved haploid genomes within a single group of closely related species. Using state-of-the-art genome sequencing and assembly technologies, we will address technical problems of assembly and annotation that were previously considered either intractable or too expensive to solve. With the new sequencing and assembly pipeline, we aim to: 1) obtain chromosome-level assemblies of Felidae genomes from four main lineages (the leopard cat lineage, the domestic cat (Felis) lineage, the Panthera lineage and the Puma lineage); 2) improve the linear assembly of introns and exons at the gene level; 3) improve the assembly, annotation and correct ordering of multicopy genes; 4) upgrade the assembly of complex regions containing gene clusters or other functional elements; 5) improve the assembly and localization of long repetitive elements such as endogenous retroviruses (ERVs); 6) improve the annotation and analysis of genomic regions with low or high rates of variation using haploid chromosomes; and 7) estimate biases in Illumina-only assemblies. With the resulting high-quality genome assemblies, we will be able to address fundamental biological questions, including: 1) the evolution of chromosome rearrangements among closely related species, using information about chromosome architecture and proximity; 2) transposon evolution and multiplication based on the spatial configuration of DNA; and 3) more precise analysis of signatures of selection (positive, negative and neutral) using more accurate gene assemblies and annotations. Previous analyses were based on mosaic gene sequences joined from two or more haploid variants, which collapsed multicopy genes and confounded their accurate annotation; our approach will circumvent this problem by generating high-quality phased genomes.
This will allow a detailed analysis of multicopy gene families involved in disease resistance, such as the immunoglobulin and T-cell receptor gene clusters, the cytokine and chemokine receptor clusters, and the major histocompatibility complex (MHC). Our results will be published in high-impact journals, and all datasets will be deposited in publicly accessible databases and in the Garfield genome browser (http://garfield.dobzhanskycenter.org/).


 

REPORTS


Annotation of the results obtained in 2017
Domestic cats benefit from extensive veterinary medical surveillance, which has led to the description of nearly 250 genetic diseases analogous to human disorders. Feline infectious agents, including feline immunodeficiency virus, feline sarcoma virus and feline leukemia virus, offer powerful natural models of deadly human diseases. A rich veterinary literature on feline disease pathogenesis and the demonstration of a highly conserved ancestral mammalian genome organization make the annotated cat genome a highly informative resource for multifaceted biomedical, physiological and evolutionary research. We have recently collected, sequenced and assembled 12 Felidae genomes in collaboration with other groups around the world. These 12 genomes include the caracal, cheetah, domestic cat, fishing cat, sand cat, Iberian lynx, jaguar, leopard, leopard cat, lion, puma and tiger. Of the four lineages considered in this study, three (the domestic cat, leopard cat and puma lineages) form a group of closely related species, while the Panthera lineage serves as an outgroup to them. As we expected, during the first year several feline genetics groups began trying new sequencing methods, both those proposed in our project and others. Our first task was therefore to avoid duplicating their work, especially given its cost. After a thorough evaluation of the pros and cons of the commercially available options, as well as analysis of the results obtained in our other genomic projects, we chose the following sequencing strategy for 2017. Initially, we planned to use the PacBio Sequel, but its combination of cost and quality was unacceptable for us. We therefore chose a strategy based on 10X Genomics technology in conjunction with optical maps from BioNano Genomics. To accomplish our tasks, we decided to focus on one branch of the Felidae.
The puma (Puma concolor), jaguarundi (Puma yagouaroundi) and African cheetah are representatives of the Puma lineage of the family Felidae. Although cheetahs are currently found only in Africa and Iran, there is ample evidence that members of the Puma lineage arose from a common ancestor that lived in North America during the Miocene. This evidence comes from comparative analyses of mitochondrial and nuclear DNA markers. The divergence of cheetahs, pumas and jaguarundis into separate species is estimated at around 6.7 million years ago. Fossil evidence of cheetahs is found in the Americas, Europe and Asia until the late Pleistocene (10,000-12,000 years ago). This time roughly corresponds to the megafaunal extinction that affected more than 40 species of large mammals in North America, including cheetahs (Miracinonyx) and pumas. The rich population history of these species and the variety of their specialized adaptations make all three extremely interesting subjects for genome-level study, in particular for detecting the signals these adaptations have left in the genome. We collected samples and sent them for sequencing, so sequences and assemblies for two of the three selected species will be ready in the near future. One of the most important parts of the project is the ambitious goal of sequencing the genomes of all 37 Felidae species. At the moment, we already have 12 genomes assembled and 5 more that are not yet assembled. During 2017, we completed the assemblies of 3 Felidae species: the caracal, the fishing cat and the Bengal cat. Over the next year, thanks to our collaborators and other groups working on felid genomes, we hope to reach about 20 of the 37 felid genomes.
During 2017, we annotated the genomes of 12 Felidae species and are now preparing an article based on these data. In parallel with the assembly and annotation of these 12 species, we continued work on four tools to facilitate comparative analysis of cat genomes: 1) Chromosomer, a tool for reference-assisted chromosome assembly using the genome of a closely related species; 2) Panthera, a tool for gene annotation and subsequent analysis of the results; 3) add-ons to the Cactus multiple aligner for the analysis of multiple whole-genome alignments; and 4) Lyrebird, a tool for assessing the assembly quality of tandem repeats, which are among the most complex parts of the genome. Panthera is our own program for gene search and annotation. We have extensive experience annotating genes in various genomes and, unfortunately, none of the available tools gave us satisfactory results. We use Panthera to annotate genes both in cat genomes and in our other genomic projects. Because we not only identify genes but also analyze them further, we added the more advanced analyses found in most genome papers: reconstruction of a phylogenetic tree, estimation of species divergence times, search for genes under positive selection, and analysis of gene family evolution (gene family expansion and contraction). One of the goals of the project was to collaborate with other groups of scientists around the world working on cat genomes and on conservation biology in general. The results of our joint work on the jaguar genome were published in the journal Science Advances. In this work, we presented de novo assemblies and annotations of the jaguar (Panthera onca) and the leopard (Panthera pardus), together with a comparative analysis covering all living Panthera species.
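Chromosomer is described above only as a tool for reference-assisted chromosome assembly. As a minimal sketch of that general idea (not Chromosomer's actual code or algorithm), assume each fragment of the new assembly has already been aligned to the genome of a closely related species, and that the alignments are available as records with a hypothetical layout (fragment, ref_chrom, ref_start, strand, length). Each fragment is then assigned to the reference chromosome carrying most of its aligned bases, placed at the position of its longest alignment there, and oriented by that alignment's strand. The chromosome names in the example are illustrative only.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Tuple


class Alignment(NamedTuple):
    """One alignment of an assembly fragment to the reference (hypothetical layout)."""
    fragment: str   # contig or scaffold name in the new assembly
    ref_chrom: str  # chromosome of the closely related reference genome
    ref_start: int  # alignment start on the reference chromosome
    strand: str     # "+" or "-" relative to the reference
    length: int     # number of aligned bases


def anchor_fragments(alignments: Iterable[Alignment]) -> Dict[str, List[Tuple[str, str]]]:
    """Assign, order and orient fragments along reference chromosomes.

    Returns {ref_chrom: [(fragment, strand), ...]} sorted by reference position.
    """
    alignments = list(alignments)

    # Total aligned bases for every (fragment, chromosome) pair.
    support: Dict[Tuple[str, str], int] = defaultdict(int)
    for a in alignments:
        support[(a.fragment, a.ref_chrom)] += a.length

    # Each fragment goes to the chromosome with the most aligned bases.
    best_chrom: Dict[str, str] = {}
    for (frag, chrom), total in support.items():
        current = best_chrom.get(frag)
        if current is None or total > support[(frag, current)]:
            best_chrom[frag] = chrom

    # The longest alignment on that chromosome supplies position and orientation.
    anchor: Dict[str, Alignment] = {}
    for a in alignments:
        if best_chrom[a.fragment] != a.ref_chrom:
            continue
        previous = anchor.get(a.fragment)
        if previous is None or a.length > previous.length:
            anchor[a.fragment] = a

    # Group by chromosome, ordering fragments by their anchor position.
    chromosomes: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for a in sorted(anchor.values(), key=lambda aln: aln.ref_start):
        chromosomes[a.ref_chrom].append((a.fragment, a.strand))
    return dict(chromosomes)


if __name__ == "__main__":
    example = [
        Alignment("ctg1", "chrA1", 5_000, "+", 9_000),
        Alignment("ctg2", "chrA1", 1_000, "-", 3_000),
        Alignment("ctg2", "chrB2", 40_000, "+", 500),  # weaker, ignored
        Alignment("ctg3", "chrB2", 2_000, "+", 7_000),
    ]
    print(anchor_fragments(example))
    # {'chrA1': [('ctg2', '-'), ('ctg1', '+')], 'chrB2': [('ctg3', '+')]}
```

A real reference-assisted assembler also has to deal with fragments that span rearrangement breakpoints between the two species and with gaps between placed fragments; this sketch deliberately ignores both.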
Demographic reconstructions showed that all these species experienced multiple episodes of population decline during the Pleistocene, ultimately leading to small effective population sizes today. We identified multiple signatures of species-specific positive selection affecting genes involved in craniofacial and limb bone development, protein metabolism, hypoxia, reproduction, pigmentation and sensory perception. The pathways enriched in genomic segments involved in interspecies introgression overlapped significantly with those enriched in segments under positive selection. Based on these data, we hypothesized that the two processes are connected. We tested this hypothesis by developing an exome capture probe set targeting ~19,000 genes and using it to analyze the genomes of 30 jaguars. We found at least two genes (DOCK3 and COL4A5, both associated with optic nerve development) that bear significant signals of both interspecies introgression and intraspecific positive selection. These data indicate that post-speciation introgression introduced genetic material that contributed to the adaptive evolution of big cat lineages (Figueiró et al., 2017). Another important issue is why we need to sequence and assemble Felidae genomes at all. To address this question, our group published an article in the Journal of Heredity. The article emphasizes that the decline in the diversity of our planet's wildlife has led to growing interest in, and efforts toward, its preservation. Applying powerful new genetic sequencing and data analysis technologies to threatened mammal populations has revolutionized our ability to recognize the underlying threats they face. Through the use of genomic data in hundreds of similar studies, we have learned much about survival, adaptation and evolution.
Within the framework of this RSF grant, one of the species we selected is the cheetah. This species is threatened, and our work may help to save it and shed light on the long-term history of the challenges of preserving it. Earlier, when analyzing the cheetah genome, we found a number of genetically determined problems. We hope that our synthesis of three decades of data, interpretations and contradictions, together with our analysis of whole-genome sequences of cheetahs, provides a convincing account of the relevance of conservation and of action to protect this species and other endangered wildlife (O'Brien et al., 2017). The third article, which will appear in January 2018 in the journal Current Biology, concerns the protection of another threatened species with a similar history: the rhinoceros. Black and white rhinoceroses (Diceros bicornis and Ceratotherium simum) are well-known African species classified by the International Union for Conservation of Nature (IUCN) as endangered. We applied the approach developed and tested earlier on the cheetah genome, first to build a genome assembly based on the genome of a close relative using the Chromosomer program, and then to design a set of genetic typing probes that can facilitate the monitoring of endangered populations. This demonstrates the practical application of the methodology developed for cats to an unrelated taxon (Harper et al., 2018).

 

Publications

1. Figueiró H.V., Li G., Trindade F.J., Assis J., Pais F., Fernandes G., Santos S.H.D., Hughes G.M., Komissarov A., O'Brien S.J., et al. Genome-wide signatures of complex introgression and adaptive evolution in the big cats. Science Advances, 3(7): e1700299 (year - 2017). https://doi.org/10.1126/sciadv.1700299

2. Harper C., Ludwig A., Clarke A., Makgopela K., Yurchenko A., Guthrie A., Dobrynin P., Tamazian G., Koepfli K.P., Thompson P., O'Brien S., et al. Robust forensic matching of confiscated horns to individual poached African rhinoceros. Current Biology, 28(1): R13-R14 (year - 2018). https://doi.org/10.1016/j.cub.2017.11.005

3. O'Brien S.J., Johnson W.E., Driscoll C.A., Dobrynin P., Marker L. Conservation Genetics of the Cheetah: Lessons Learned and New Opportunities. Journal of Heredity, 108(6): 671-677 (year - 2017). https://doi.org/10.1093/jhered/esx047


Annotation of the results obtained in 2018
Domestic cats benefit from extensive veterinary medical surveillance, which has led to the description of nearly 250 genetic diseases analogous to human disorders. Feline infectious agents, including feline immunodeficiency virus, feline sarcoma virus and feline leukemia virus, offer powerful natural models of deadly human diseases. A rich veterinary literature on feline disease pathogenesis and the demonstration of a highly conserved ancestral mammalian genome organization make the annotated cat genome a highly informative resource for multifaceted biomedical, physiological and evolutionary research. We previously collected, sequenced and assembled 12 Felidae genomes in collaboration with other groups around the world. As a result of work by several research teams, including our laboratory, genomes of five more Felidae species were added in 2018 to the 12 already available, giving 17 sequenced and assembled genomes out of 37 Felidae species. These genomes include the caracal, cheetah, domestic cat, fishing cat, Pallas's cat, sand cat, Iberian lynx, jaguar, leopard, leopard cat, lion, puma, jaguarundi, black-footed cat, Bengal cat, Canada lynx, clouded leopard and tiger. Of the four lineages considered in this study, three (the domestic cat, leopard cat and puma lineages) form a group of closely related species, while the Panthera lineage serves as an outgroup to them. To accomplish our tasks, we decided to focus on one Felidae lineage. The puma (Puma concolor), jaguarundi (Puma yagouaroundi) and African cheetah (Acinonyx jubatus) are representatives of the Puma lineage. Although cheetahs are currently found only in Africa and Iran, there is ample evidence that members of the Puma lineage arose from a common ancestor that lived in North America during the Miocene.
This evidence comes from comparative analyses of mitochondrial and nuclear DNA markers. The divergence of cheetahs, pumas and jaguarundis into separate species is estimated at around 6.7 million years ago. Fossil evidence of cheetahs is found in the Americas, Europe and Asia until the late Pleistocene (10,000-12,000 years ago). This time roughly corresponds to the megafaunal extinction that affected more than 40 species of large mammals in North America, including cheetahs (Miracinonyx) and pumas. The rich population history of these species and the variety of their specialized adaptations make all three extremely interesting subjects for genome-level study, in particular for detecting the signals these adaptations have left in the genome. Over the past year, several groups working on cat genetics have begun to apply new sequencing methods, both those proposed in our project and others. After a comprehensive assessment of the advantages and disadvantages of the commercially available options, as well as analysis of the results obtained in our other genomic projects, we chose a sequencing strategy for 2017-2018. Initially we planned to use PacBio Sequel, but its combination of cost and quality turned out to be unacceptable for us. We therefore chose a strategy based on 10X Genomics technology combined with the Hi-C method, instead of the previously planned optical maps. In 2018, these methods allowed us to assemble the cheetah genome to the pseudo-chromosome level, in addition to the puma pseudo-chromosomes. For the jaguarundi, we obtained biological samples for 10X Genomics sequencing, and the sequencing results were received in early December 2018. Thus, by the end of 2018, we had high-quality assemblies for three representatives of the Puma lineage.
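Hi-C is used here to order 10X Genomics scaffolds into pseudo-chromosomes. The principle behind this is that Hi-C contacts are far more frequent between sequences lying close together on the same chromosome. The following is a deliberately simplified sketch of that principle, not the pipeline used in the project: it assumes a precomputed table of Hi-C link counts between scaffold pairs (a hypothetical input) and greedily joins the most strongly linked scaffolds into linear chains, refusing any join that would give a scaffold more than two neighbours or close a cycle. Real Hi-C scaffolders additionally normalize counts for scaffold length, work at scaffold ends to infer orientation, and correct misjoins.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def greedy_hic_chains(
    links: Dict[Tuple[str, str], int],
    scaffolds: Iterable[str],
    min_links: int = 1,
) -> List[List[str]]:
    """Greedily join scaffolds into linear chains by Hi-C link strength.

    links: {(scaffold_a, scaffold_b): contact_count}; every name must be in scaffolds.
    Returns chains (lists of scaffold names); unlinked scaffolds form singletons.
    """
    parent = {s: s for s in scaffolds}  # union-find forest, one tree per chain

    def find(x: str) -> str:
        # Root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    neighbours: Dict[str, List[str]] = defaultdict(list)
    # Strongest links first; accept a join only if every chain stays linear.
    for (a, b), count in sorted(links.items(), key=lambda kv: -kv[1]):
        if count < min_links:
            break  # all remaining links are weaker
        if len(neighbours[a]) >= 2 or len(neighbours[b]) >= 2:
            continue  # a scaffold can touch at most two others
        root_a, root_b = find(a), find(b)
        if root_a == root_b:
            continue  # joining would close a cycle
        parent[root_a] = root_b
        neighbours[a].append(b)
        neighbours[b].append(a)

    # Walk each chain starting from an end (a scaffold with fewer than two neighbours).
    chains: List[List[str]] = []
    seen = set()
    for start in parent:
        if start in seen or len(neighbours[start]) == 2:
            continue
        chain, prev, cur = [], None, start
        while cur is not None:
            chain.append(cur)
            seen.add(cur)
            nxt = [n for n in neighbours[cur] if n != prev]
            prev, cur = cur, (nxt[0] if nxt else None)
        chains.append(chain)
    return chains


if __name__ == "__main__":
    counts = {("s1", "s2"): 120, ("s2", "s3"): 90, ("s1", "s3"): 40, ("s3", "s4"): 5}
    print(greedy_hic_chains(counts, ["s1", "s2", "s3", "s4", "s5"], min_links=10))
    # [['s1', 's2', 's3'], ['s4'], ['s5']]
```

In the example, the s1-s3 link is rejected because s1, s2 and s3 are already one chain, and the weak s3-s4 link falls below the illustrative `min_links` threshold, so s4 stays unplaced.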
One of the most important parts of the project is the ambitious goal of sequencing and annotating the genomes of all 37 felid species. To date, 17 genomes have been collected. During 2018, we completed the assemblies of 2 felid species: the sand cat (Felis margarita) and the clouded leopard (Neofelis nebulosa). In 2019, we plan to go even further in collecting biosamples and in sequencing, assembling and annotating the missing felid genomes. In parallel with the assembly and annotation of 17 felid species, we continued work on four tools to facilitate comparative analysis of felid genomes: 1) Chromosomer 2, a tool for reference-assisted chromosome assembly using a closely related genome; 2) Panthera, a tool for gene annotation and subsequent analysis of the results; 3) add-ons to the Cactus multiple aligner for the analysis of multiple whole-genome alignments; and 4) Lyrebird, a tool for assessing the assembly quality of tandem repeats, which are among the most complex parts of the genome. One of the goals of the project was to collaborate with other groups of scientists around the world working on cat genomes and on conservation biology in general. The results of our joint work on tiger genomes were published in the journal Current Biology. In this paper, we presented an analysis of 32 whole-genome sequences of tigers from six different subspecies, showed that the time to the most recent common ancestor of modern tigers is about 110 kya, and identified signals of selection associated with adaptation (Liu et al., 2018). Our other collaboration is devoted to the population genetics of mountain lions. Mountain lions provide a rare opportunity to explore the potential of population genomics for restoring genetic diversity and to observe the long-term effects of translocation.
The results of this joint work thus provide a basis for whole-genome analysis of mountain lions, and the data obtained can be used to manage small, isolated populations for conservation purposes (Saremi et al., 2018). Our third joint work concerned adaptive genomic evolution in early mammals. Using integrative comparative genomic and phylogenetic methods applied to the photoreceptive opsin gene family in 154 mammals, we showed that mammals have genomic structures consistent with a nocturnal origin. The loss of the opsins RH2, VA, PARA, PARIE and OPN4x in all mammals led us to propose a global bottleneck hypothesis that explains the loss of these genes in the mammalian lineage (≥215.5 million years ago). In addition, the analysis provided strong evidence that ancestral mammals were nocturnal, with ultraviolet-sensitive vision, low visual acuity and low orbital convergence (i.e., panoramic vision) (Borges et al., 2018). The fourth article, accepted in 2017 and published in 2018 in the journal Current Biology, concerns the protection of another threatened species with a similar history: the rhinoceros. Black and white rhinoceroses (Diceros bicornis and Ceratotherium simum) are well-known African species classified by the International Union for Conservation of Nature (IUCN) as endangered. We applied the approach developed and tested earlier on the cheetah genome, first to build a genome assembly based on the genome of a close relative using the Chromosomer program, and then to design a set of genetic typing probes that can facilitate the monitoring of endangered populations. This demonstrates the practical application of the methodology developed for cats to an unrelated taxon (Harper et al., 2018).

 

Publications

1. Harper C., Ludwig A., Clarke A., Makgopela K., Yurchenko A., Guthrie A., Dobrynin P., Tamazian G., Koepfli K.P., Thompson P., O'Brien S., et al. Robust forensic matching of confiscated horns to individual poached African rhinoceros. Current Biology, 28(1): R13-R14 (year - 2018). https://doi.org/10.1016/j.cub.2017.11.005

2. Liu Y.C., Sun X., Driscoll C., Miquelle D.G., Xu X., Martelli P., Uphyrkina O., Smith J.L.D., O'Brien S.J., Luo S.J. Genome-Wide Evolutionary Analysis of Natural History and Adaptation in the World's Tigers. Current Biology, 28(23): 3840-3849.e6 (year - 2018). https://doi.org/10.1016/j.cub.2018.09.019

3. Saremi N.F., Supple M.A., Byrne A., Cahill J.A., Coutinho L.L., Dalén L., Figueiró H.V., Johnson W.E., Milne H.J., O'Brien S.J., O'Connell B., et al. Mountain lion genomes provide insights into genetic rescue of inbred populations. bioRxiv, preprint (year - 2018). https://doi.org/10.1101/482315

4. Borges R., Johnson W.E., O'Brien S.J., Gomes C., Heesy C.P., Antunes A. Adaptive genomic evolution of opsins reveals that early mammals flourished in nocturnal environments. BMC Genomics, 19: 121 (year - 2018). https://doi.org/10.1186/s12864-017-4417-8

5. O'Brien S.J., Tamazian G., Komissarov A., Dobrynin P., Krasheninnikova K., Kliver S., Cherkasov N., Koepfli K.P. A Moving Landscape for Comparative Genomics in Mammals. Comparative Cytogenetics, 12(3): 299-360 (year - 2018). https://doi.org/10.3897/CompCytogen.v12i3.27748