Reverse Engineering Gene Regulatory Networks by Integrating Multiple Types of High-Dimensional Biological Datasets

From REU@MU

Title: Reverse engineering gene regulatory networks by integrating multiple types of high-dimensional biological datasets

Description: The underlying biological processes in living organisms can be summarized by a gene regulatory network (GRN). A GRN describes which genes regulate which other genes in the cell. Given that the DNA in all of our cells is essentially the same, the reason hand tissue differs from eye tissue is the difference between the GRNs in those cells. Changes in GRNs can also lead to abnormal biological states such as cancer. Building accurate GRNs of organisms is a key step toward better characterizing the system and pinpointing potential drivers of disease.

With the advent of high-dimensional biological datasets, various types of evidence for regulatory interactions between genes can be computed. The goal of this project is to integrate various types of datasets to reverse engineer GRNs with high accuracy. Students will work with a team of PhD students and the faculty mentor to build a computational tool that leverages various high-dimensional biological datasets to infer regulatory interactions between genes under various experimental conditions.

Students are expected to be proficient in programming. Experience in molecular biology, basic Linux commands, and high-performance computing is preferred but not required.

Student learning objectives: After this project, students will

  • Have a basic understanding of molecular biology and high-dimensional biological datasets.
  • Be familiar with the R or Python programming language and some bioinformatics libraries in that language.
  • Learn to gather biological data from public repositories.
  • Build a computational pipeline that pre-processes and integrates high-dimensional biological datasets.
  • Be familiar with data visualization tools to analyze and visualize gene networks.
  • Learn methods to evaluate predictive models by computing true positive rate, false positive rate, precision, recall, ROC curves, etc.
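
As an illustration of the evaluation metrics listed in the last objective, the following is a minimal sketch in Python, not the project's actual tooling. It assumes that a GRN inference method assigns a confidence score to each candidate regulator-to-target edge and that a gold-standard set of true edges is available. The gene names, scores, and function names (confusion_counts, rates, roc_curve, auc) are hypothetical and chosen only for this example.

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]  # (regulator, target); hypothetical representation


def confusion_counts(scores: Dict[Edge, float], gold: Set[Edge],
                     threshold: float) -> Tuple[int, int, int, int]:
    """Count TP, FP, FN, TN when edges scoring >= threshold are predicted."""
    tp = fp = fn = tn = 0
    for edge, score in scores.items():
        predicted = score >= threshold
        actual = edge in gold
        if predicted and actual:
            tp += 1
        elif predicted:
            fp += 1
        elif actual:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def rates(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Derive the standard rates; an empty denominator yields 0.0."""
    tpr = tp / (tp + fn) if tp + fn else 0.0        # true positive rate
    fpr = fp / (fp + tn) if fp + tn else 0.0        # false positive rate
    precision = tp / (tp + fp) if tp + fp else 0.0
    return {"tpr": tpr, "fpr": fpr, "precision": precision, "recall": tpr}


def roc_curve(scores: Dict[Edge, float],
              gold: Set[Edge]) -> List[Tuple[float, float]]:
    """Return (FPR, TPR) points, sweeping the threshold from high to low."""
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores.values()), reverse=True):
        r = rates(*confusion_counts(scores, gold, threshold))
        points.append((r["fpr"], r["tpr"]))
    return points


def auc(points: List[Tuple[float, float]]) -> float:
    """Area under a piecewise-linear curve using the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Made-up scores for six candidate edges among three hypothetical genes.
example_scores: Dict[Edge, float] = {
    ("G1", "G2"): 0.9, ("G1", "G3"): 0.8, ("G2", "G3"): 0.7,
    ("G2", "G1"): 0.4, ("G3", "G1"): 0.3, ("G3", "G2"): 0.1,
}
# Made-up gold standard: the edges assumed to be true regulatory interactions.
example_gold: Set[Edge] = {("G1", "G2"), ("G2", "G3"), ("G3", "G1")}

if __name__ == "__main__":
    counts = confusion_counts(example_scores, example_gold, threshold=0.5)
    print("TP, FP, FN, TN:", counts)
    print("Rates at 0.5:", rates(*counts))
    print("ROC AUC:", auc(roc_curve(example_scores, example_gold)))
```

In practice, a library such as scikit-learn offers equivalent routines; the hand-written version above is only meant to make the definitions explicit. Recall and true positive rate are the same quantity, which is why the sketch reports one value under both names.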