Difference between revisions of "User:Feidler"
From REU@MU
Latest revision as of 05:42, 5 August 2022
Week 1
5/31
- Attended orientation
- Met with advisor (Dr. Michael Zimmer)
- Explored research papers related to Zuckerberg Files
6/1
- Completed CITI modules
- Reviewed Dr. Zimmer's paper
- Explored analytics tools
- Continued reviewing papers related to Zuckerberg Files
6/2
- Met with Dr. Zimmer for further discussion, followed by lunch
- Continued exploring potential tools
- Continued reviewing papers related to Zuckerberg Files
6/3
- Reviewed papers on quantitative textual analysis
- Explored WordStat tool and quanteda R package
Week 2
6/6
- Attended RCR training
- Watched tutorials on WordStat tool
6/7
- Read Yazeed Alhumaidan's dissertation methodology
- Met with mentor to discuss potential tools
- Explored possible tools for analysis and data cleaning
6/8
- Attended talk on technical writing
- Continued searching for possible tools
6/9
- Met with advisor to touch base
- Started the ordering process for WordStat
- Reviewed WordStat tutorials
6/10
- Installed trial version of WordStat
Week 3
6/13
- Obtained xml file of the archive
- Created archive account
- Began work on script to automate download and collection of transcripts
6/14
- Continued working on script
- Tested draft documents in WordStat
6/15
- Attended research presentation given by Dr. Madiraju
- Began debugging script
6/16
- Finished debugging script
- Organized all blog posts into one file for analysis
6/17
- Automated conversion of files from .pdf to .txt
Week 4
6/20
- Cleaned the set of files to exclude interviews and video transcripts
- Obtained official license for WordStat
6/21
- Ran initial analysis in WordStat
- Fine-tuned text processing
- Corrected some transcript errors
6/22
- Attended research talk given by Dr. Bialkowski
- Finished cleaning typos/errors in transcripts
- Explored frequencies and topic extraction
6/23
- Explored dendrogram analysis
- Explored proximity plots
- Experimented with different aspects of preprocessing
6/24
- Explored link analysis
- Experimented with postprocessing of text
- Tested graphing for frequencies
Week 5
6/27
- Attended talk on presentation
- Produced graphs of initial findings
- Reviewed relevant literature for presentation
6/28
- Made PowerPoint for presentation
- Rehearsed presentation
- Revised relevant graphs
6/29
- Gave presentation of work so far
- Gathered creation dates of posts
- Began process of changing post creation dates
6/30
- Continued coding the change of creation dates
- Experimented with cluster mapping
7/1
- Successfully changed creation dates
- Began experimenting with crosstab tool
Week 6
7/5
- Continued experimenting with crosstab
- Graphed frequencies over time
- Explored bubble plotting
7/6
- Attended student check-in
- Continued graphing, trying different time intervals
- Explored heatmap tool
7/7
- Explored deviation table
- Examined key words in context
7/8
- Gathered more transcripts from the archive (non-social-media)
- Continued graphing different clusters
- Continued examining key words in context
Week 7
7/11
- Converted new PDFs into .txt files
- Began graphing topics instead of words
7/12
- Continued graphing topics
- Began cleaning new transcripts
7/13
- Attended student check-in
- Continued cleaning transcripts
7/14
- Redid conversion of PDFs to .txt with new packages
- Continued cleaning transcripts
7/15
- Attended Dr. Zimmer's talk on Data Ethics
- Continued cleaning transcripts
Week 8
7/18
- Completed attempt at cleaning transcripts
- Examined WordStat results of interview transcripts
7/19
- Continued examining WordStat results for interview transcripts
- Made error corrections to interview transcripts via WordStat
- Modeled topics over time for interview transcripts
7/20
- Attended talk on creating effective research posters
- Experimented with lemmatization on interview transcripts
- Modeled frequencies and dendrograms of interview transcripts
7/21
- Combined social media posts and interviews into one large data set
- Ran crosstabs on complete data set, mainly frequencies
- Cleaned and corrected the complete data set
7/22
- Attempted co-occurrence analysis of complete set (WordStat crashed)
- Made some final fixes to complete data set
- Experimented with lemmatizing complete set
Week 9
7/25
- Created a time series chart of topics in complete set
7/26
- Created a proximity plot for social media posts
- Experimented with time series chart for phrases in complete set
7/27
- Attended talk on graduate school
- Experimented with different conversion process (PDF to TXT)
7/28
- Graphed new time series charts for social media posts
- Explored context of filler words
- Continued topic modeling with interviews
7/29
- Graphed time series of word frequencies in complete set
- Tweaked preprocessing settings for complete set
- Examined topics in complete set and modified them based on errors found