User:Laurajp

From REU@MU
Revision as of 20:10, 21 June 2017 by Laurajp

Personal

  • Bucknell University Class of 2018
  • Computer Engineering Major

Logs

Log 0: Orientation Day - 5/31

Today we met the REU program coordinators and our project mentors. We also got a tour of the labs we will be working in.

I spoke with my faculty mentor, Dr. Serdar Bozdag, about the different projects that he is currently working on. He works in the field of bioinformatics, which I am very interested in but have very little experience with. During the first couple of weeks of the program I will probably be reading a lot of background information on molecular biology. I will likely be working on developing a ranked list of the transcription factors that most affect the expression of certain genes, using a computational model.


Log 1: 5/31

Today I met with a PhD student, Duc, who is also working with Dr. Bozdag. He helped me:

  • Download R and RStudio onto my laptop
  • Start a tutorial for learning R

R is one of the primary programming languages used in the field of bioinformatics, so it is important for me to be familiar with it. It is a very high-level language, so I don't think it will take long to learn.

Today the other REU students and I also got a tour of the library and learned how to search the library catalog and the many online databases that the library subscribes to. This information will be helpful when we search for published papers related to our summer research. I checked out four books from the library: Gene Transcription; RNA Motifs and Regulatory Elements; R Programming for Bioinformatics; and Bioinformatics for Biomedical Science and Clinical Applications. I am hoping that these books will help me gain the background knowledge necessary to begin my research project.


Log 2: 6/01

  • Continued working on the R tutorial for most of the day
  • Listened to a lecture by Dr. Factor on good research practices


Log 3: 6/02

  • Finished the R tutorial
  • Began reading background information

Today I began reading through the books I checked out on Wednesday. Dr. Bozdag also sent me a PDF of a book chapter called "Molecular Biology for Computer Scientists". I am currently about a third of the way through taking notes on it, and I am refreshing my memory on the concepts I learned in my high school biology course. My mentor and I also developed a list of specific goals and milestones for the rest of the summer. This list can be found on my user page.


Log 4: 6/03 and 6/04

  • Continued reading through background information
  • Reformatted wiki page
  • Goals and Milestones are now easily accessible from the 2017 projects list


Log 5: 6/05

  • Continued reading through "Molecular Biology for Computer Scientists"


Log 6: 6/06

  • Completed ethics training with other REU students
  • Finished reading and taking notes on "Molecular Biology for Computer Scientists"
  • Met with Dr. Bozdag and Duc to discuss more details of the project and goals for the next few weeks

Over the next week or so, I will be conducting a literature search. Duc has sent me some initial papers to read, as well as a tutorial for some of the main R libraries used for RNA sequencing in bioinformatics.


Log 7: 6/07

  • Completed a tutorial going over some major R functions created to help computational biologists model gene expression data

There are several open-source R libraries and packages compiled on bioconductor.org, along with tutorials and sample data sets for learning how to use them. Today I read through a tutorial for modeling RNA sequencing data using the limma, Glimma, and edgeR libraries. I don't completely understand the details of every function used in the tutorial, but I now know that it is possible to filter anomalous data, normalize gene expression data and plot its distributions, and create detailed boxplots, multi-dimensional scaling plots (both static and interactive), mean-variance plots, Venn diagrams, and even heatmaps to highlight statistically significant differences in the data between samples.
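
As a rough illustration of the filtering and normalization steps mentioned above, here is a minimal sketch in plain Python. It does not use or reproduce the edgeR or limma functions from the tutorial. The function names, the toy count matrix, and the thresholds (`min_cpm`, `min_samples`) are all hypothetical and chosen only for illustration. The sketch scales raw read counts to counts per million (CPM) for each sample and keeps only genes that are expressed above a threshold in enough samples.

```python
def counts_per_million(counts):
    """Scale each sample (column) so that its counts sum to one million."""
    n_samples = len(counts[0])
    # Library size = total number of reads in each sample (column sum).
    lib_sizes = [sum(row[j] for row in counts) for j in range(n_samples)]
    return [
        [row[j] / lib_sizes[j] * 1e6 for j in range(n_samples)]
        for row in counts
    ]


def filter_low_expression(genes, counts, min_cpm=1.0, min_samples=2):
    """Keep genes whose CPM exceeds min_cpm in at least min_samples samples."""
    cpm = counts_per_million(counts)
    keep = [sum(value > min_cpm for value in row) >= min_samples for row in cpm]
    kept_genes = [g for g, k in zip(genes, keep) if k]
    kept_counts = [r for r, k in zip(counts, keep) if k]
    return kept_genes, kept_counts


# Hypothetical toy data: rows are genes, columns are samples.
genes = ["geneA", "geneB", "geneC"]
counts = [
    [500, 400, 450],           # moderately expressed in every sample
    [0, 1, 0],                 # barely detected, should be filtered out
    [499500, 599599, 549550],  # highly expressed in every sample
]

cpm = counts_per_million(counts)
kept_genes, kept_counts = filter_low_expression(genes, counts)
print(kept_genes)
```

Dividing by the library size puts samples with different sequencing depths on a common scale before they are compared. The Bioconductor packages go further than this (edgeR, for example, also estimates normalization factors), so this sketch should be read only as the basic idea.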


Log 8: 6/08

  • Created a git repository for this research project
  • Wrote a program that organizes all of the code from the tutorial on bioconductor.org into separate functions
  • Made sure I understood the code and that it is well-commented so I can reference it later


Log 9: 6/09

  • Met with Duc to discuss papers I should start with for the literature survey

Duc explained that the key points I should be looking for with each paper are 1) What research question is being asked? 2) What data sets are being used? and 3) How are the results being evaluated? Duc also sent me a number of tutorials so that I can work on starting to download and model gene expression data.


Log 10: 6/12

  • Read through and took notes on two research papers, including one review paper which compared several methods both for pre-processing data and evaluating predictions of microRNA - target gene interactions and microRNA - transcription factor - target gene interactions.


Log 11: 6/13

  • Read and took notes on three more papers

I will be making a small presentation next Thursday on common techniques for predicting transcription factor - target gene interactions based on what I am learning this week.


Log 12: 6/14

  • Continued reading and taking notes on papers, preparing for the presentation


Log 13: 6/15

  • Completed all three modules of the Responsible Conduct of Research course on the Collaborative Institutional Training Initiative (CITI) website


Log 14: 6/20

  • Continued reading and taking notes on papers, preparing for the presentation


Log 15: 6/21

  • Completed a tutorial on the TCGAbiolinks R package on the Bioconductor website

I now understand how to search for, download, format, and display the many different types of data that The Cancer Genome Atlas (TCGA) provides for several tumor types through the Genomic Data Commons. Because the files are so large, I can currently download only small portions of the data on my laptop. Once I am able to access a remote server, I will be able to download and analyze the data from several samples.