User:Feidler
From REU@MU
Latest revision as of 05:42, 5 August 2022
Week 1
5/31
- Attended orientation
- Met with advisor (Dr. Michael Zimmer)
- Explored research papers related to Zuckerberg Files
6/1
- Completed CITI modules
- Reviewed Dr. Zimmer's paper
- Explored analytics tools
- Continued reviewing papers related to Zuckerberg Files
6/2
- Met with Dr. Zimmer for further discussion, followed by lunch
- Continued exploring potential tools
- Continued reviewing papers related to Zuckerberg Files
6/3
- Reviewed papers on quantitative textual analysis
- Explored WordStat tool and quanteda R package
Week 2
6/6
- Attended RCR training
- Watched tutorials on WordStat tool
6/7
- Read Yazeed Alhumaidan's dissertation methodology
- Met with mentor to discuss potential tools
- Explored possible tools for analysis and data cleaning
6/8
- Attended talk on technical writing
- Continued searching for possible tools
6/9
- Met with advisor to touch base
- Started the process of ordering WordStat
- Reviewed WordStat tutorials
6/10
- Installed trial version of WordStat
Week 3
6/13
- Obtained XML file of the archive
- Created archive account
- Began work on script to automate download and collection of transcripts
6/14
- Continued working on script
- Tested draft documents in WordStat
6/15
- Attended research presentation given by Dr. Madiraju
- Began debugging script
6/16
- Finished debugging script
- Organized all blog posts into one file for analysis
6/17
- Automated conversion of files from .pdf to .txt
Week 4
6/20
- Cleaned the set of files to exclude interview and video transcripts
- Obtained official license for WordStat
6/21
- Ran initial analysis in WordStat
- Fine-tuned text processing
- Corrected some transcript errors
6/22
- Attended research talk given by Dr. Bialkowski
- Finished cleaning typos/errors in transcripts
- Explored frequencies and topic extraction
6/23
- Explored dendrogram analysis
- Explored proximity plots
- Experimented with different aspects of preprocessing
6/24
- Explored link analysis
- Experimented with postprocessing of text
- Tested graphing for frequencies
Week 5
6/27
- Attended talk on giving presentations
- Produced graphs of initial findings
- Reviewed relevant literature for presentation
6/28
- Made PowerPoint for presentation
- Rehearsed presentation
- Revised relevant graphs
6/29
- Gave presentation of work so far
- Gathered creation dates of posts
- Began process of changing post creation dates
6/30
- Continued coding the change of creation dates
- Experimented with cluster mapping
7/1
- Creation dates successfully changed
- Began experimenting with crosstab tool
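The log does not say how the creation dates were changed. As a minimal sketch, assuming the goal was to stamp each .txt file with the date of its post, Python's `os.utime` can set a file's access and modification times (a true creation time cannot be set portably). The names `stamp_file` and `stamp_all` and the file-name-to-date mapping are hypothetical.

```python
import os
from datetime import datetime
from pathlib import Path


def stamp_file(path: Path, when: datetime) -> None:
    """Set the access and modification times of `path` to `when`."""
    ts = when.timestamp()
    os.utime(path, (ts, ts))


def stamp_all(folder: Path, dates: dict[str, datetime]) -> int:
    """Stamp every file in `folder` whose name appears in `dates`.

    Returns the number of files that were found and stamped.
    """
    count = 0
    for name, when in dates.items():
        target = folder / name
        if target.is_file():  # skip names with no matching file
            stamp_file(target, when)
            count += 1
    return count
```

Calling `stamp_all(Path("posts"), {"post1.txt": datetime(2010, 5, 1)})` would stamp `posts/post1.txt` if it exists; the folder and file names here are placeholders.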
Week 6
7/5
- Continued experimenting with crosstab
- Graphed frequencies over time
- Explored bubble plotting
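The graphing itself was done in WordStat. Purely as an illustration of what "frequencies over time" involves, the sketch below counts how often one word appears per month across dated posts. The function name `monthly_counts`, the simple tokenizer, and the `(date, text)` input shape are assumptions, not the project's actual method.

```python
import re
from collections import defaultdict
from datetime import date


def monthly_counts(posts, word):
    """Count occurrences of `word` per (year, month).

    `posts` is an iterable of (date, text) pairs.
    """
    totals = defaultdict(int)
    target = word.lower()
    for when, text in posts:
        # Lowercase and split into simple alphabetic tokens.
        tokens = re.findall(r"[a-z']+", text.lower())
        totals[(when.year, when.month)] += tokens.count(target)
    return dict(sorted(totals.items()))
```

Changing the dictionary key (for example to `when.year` alone) gives a different time interval.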
7/6
- Attended student check-in
- Continued graphing with different time intervals
- Explored heatmap tool
7/7
- Explored deviation table
- Examined key words in context
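A keyword-in-context (KWIC) view lists each occurrence of a word together with a few words on either side. WordStat provides this directly; the sketch below only illustrates the idea, and the function name `kwic`, the tokenizer, and the default window size are assumptions.

```python
import re


def kwic(text, keyword, window=3):
    """Return (left, match, right) tuples for each occurrence of `keyword`.

    `left` and `right` hold up to `window` words of surrounding context.
    """
    tokens = re.findall(r"\w+(?:'\w+)?", text)
    hits = []
    for i, tok in enumerate(tokens):
        if tok.lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            hits.append((left, tok, right))
    return hits
```

Matching is case-insensitive and exact, so `kwic(text, "privacy")` would not match "private"; that would require stemming or lemmatization first.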
7/8
- Gathered more transcripts from the archive (non-social-media)
- Continued graphing different clusters
- Continued examining key words in context
Week 7
7/11
- Converted new PDFs into .txt files
- Began graphing topics instead of words
7/12
- Continued graphing topics
- Began cleaning new transcripts
7/13
- Attended student check-in
- Continued cleaning transcripts
7/14
- Redid conversion of PDFs to .txt with new packages
- Continued cleaning transcripts
7/15
- Attended Dr. Zimmer's talk on Data Ethics
- Continued cleaning transcripts
Week 8
7/18
- Completed attempt at cleaning transcripts
- Examined WordStat results of interview transcripts
7/19
- Continued examining WordStat results for interview transcripts
- Made error corrections to interview transcripts via WordStat
- Modeled topics over time for interview transcripts
7/20
- Attended talk on creating effective research posters
- Experimented with lemmatization on interview transcripts
- Modeled frequencies and dendrograms of interview transcripts
7/21
- Combined social media posts and interviews into one large data set
- Ran crosstabs on complete data set, mainly frequencies
- Cleaned and corrected the complete data set
7/22
- Tried running co-occurrence analysis on complete set (WordStat crashed)
- Made some final fixes to complete data set
- Experimented with lemmatizing complete set
Week 9
7/25
- Created a time series chart of topics in complete set
7/26
- Created a proximity plot for social media posts
- Experimented with time series chart for phrases in complete set
7/27
- Attended talk on graduate school
- Experimented with different conversion process (PDF to TXT)
7/28
- Graphed new time series charts for social media posts
- Explored context of filler words
- Continued topic modeling with interviews
7/29
- Graphed time series of word frequencies in complete set
- Tweaked preprocessing settings for complete set
- Examined topics in complete set and modified them based on errors found