Development of bioinformatics platforms and tools to facilitate the analysis of epigenomic and genetic data

Pierre-Étienne Jacques

Université de Sherbrooke

 

Field: human genetics

Research Scholars Program - Junior 2

2018-2019 Competition

Over the last few years, the cost of DNA sequencing has dropped drastically, allowing many life scientists to use these technologies to address important biological questions. However, the large amounts of data they generate require advanced computing infrastructure. Every Canadian researcher can create a free account to access the world-class infrastructure of Compute Canada, but most life scientists are not familiar with the command-line interface required to use it. We specifically designed the Genetics and genomics Analysis Platform (GenAP) to fill this gap. We now propose to enhance the capabilities and increase the usability of GenAP.

We are also contributing to the development of another important platform, this one targeting the clinical community. Tens of thousands of human genomes can now be sequenced each year in Canada alone, mainly from patients diagnosed with a wide range of disorders. Identifying novel mutations responsible for a genetic disorder often requires comparing thousands of patients with healthy controls. However, the number of patients with the same disease in a single hospital is usually too low to allow the identification of causal mutations. The Canadian Distributed cyber-Infrastructure for Genomics (CanDIG) platform will maximize the utility of clinical genome sequences by building robust software that securely connects computational infrastructure and data across hospitals, without moving the data out of the hospital. CanDIG will therefore enable national-scale analysis over locally controlled data.

We are also developing and applying new, efficient bioinformatics tools to address more specific biological questions. These tools will be used to leverage thousands of public, high-quality epigenomic datasets to help characterize new datasets, to study the regulation of gene expression, and to analyze the DNA sequences of cancer samples to better classify tumours.