User:Feidler

From REU@MU
Latest revision as of 05:42, 5 August 2022

Week 1

5/31

  • Attended orientation
  • Met with advisor (Dr. Michael Zimmer)
  • Explored research papers related to Zuckerberg Files

6/1

  • Completed CITI modules
  • Reviewed Dr. Zimmer's paper
  • Explored analytics tools
  • Continued reviewing papers related to Zuckerberg Files

6/2

  • Met with Dr. Zimmer for further discussion, followed by lunch
  • Continued exploring potential tools
  • Continued reviewing papers related to Zuckerberg Files

6/3

  • Reviewed papers on quantitative textual analysis
  • Explored WordStat tool and quanteda R package

Week 2

6/6

  • Attended RCR training
  • Watched tutorials on WordStat tool

6/7

  • Read Yazeed Alhumaidan's dissertation methodology
  • Met with mentor to discuss potential tools
  • Explored possible tools for analysis and data cleaning

6/8

  • Attended talk on technical writing
  • Continued searching for possible tools

6/9

  • Met with advisor and touched base
  • Set in motion ordering for WordStat
  • Reviewed WordStat tutorials

6/10

  • Installed trial version of WordStat

Week 3

6/13

  • Obtained XML file of the archive
  • Created archive account
  • Began work on script to automate download and collection of transcripts
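
The download-automation script could begin by parsing the archive's XML listing to gather transcript URLs. A minimal sketch (the tag names, sample XML, and helper name are invented for illustration; the real archive schema may differ):

```python
# Hypothetical sketch: parse an XML listing of the archive and collect
# the transcript download URLs. The <item>/<link> structure and the
# example URLs are assumptions, not the archive's actual schema.
import xml.etree.ElementTree as ET

def collect_transcript_urls(xml_text: str) -> list[str]:
    """Return the download URL found in each <item> of the listing."""
    root = ET.fromstring(xml_text)
    return [item.findtext("link") for item in root.iter("item")]

sample = """
<archive>
  <item><title>Post 1</title><link>https://example.org/t/1.pdf</link></item>
  <item><title>Post 2</title><link>https://example.org/t/2.pdf</link></item>
</archive>
"""
urls = collect_transcript_urls(sample)
# Each URL could then be fetched (e.g. with urllib.request) and saved to disk.
```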

6/14

  • Continued working on script
  • Tested draft documents in WordStat

6/15

  • Attended research presentation given by Dr. Madiraju
  • Began debugging script

6/16

  • Finished debugging script
  • Organized all blog posts into one file for analysis

6/17

  • Automated conversion of files from .pdf to .txt files
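
The batch PDF-to-TXT conversion could be organized along these lines. The text extraction itself would come from a PDF library (e.g. pypdf's `PdfReader` with per-page `extract_text()`); here the extractor is passed in as a function so the directory-walking logic stands on its own, and the function name is an invented placeholder:

```python
# Hypothetical batch-conversion sketch: walk a directory of PDFs and
# write a .txt file for each. The extract() callable is injected; in
# practice it would wrap a PDF library's text extraction.
from pathlib import Path

def convert_pdfs(src_dir: Path, dst_dir: Path, extract) -> list[Path]:
    """Convert every .pdf in src_dir to a .txt file in dst_dir."""
    dst_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for pdf in sorted(src_dir.glob("*.pdf")):
        out = dst_dir / (pdf.stem + ".txt")
        out.write_text(extract(pdf), encoding="utf-8")
        written.append(out)
    return written
```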

Week 4

6/20

  • Cleaned set of files, removing interviews/video transcripts
  • Obtained official license for WordStat

6/21

  • Ran initial analysis in WordStat
  • Fine-tuned text processing
  • Refined some transcript errors

6/22

  • Attended research talk given by Dr. Bialkowski
  • Finished cleaning typos/errors in transcripts
  • Explored frequencies and topic extraction

6/23

  • Explored dendrogram analysis
  • Explored proximity plots
  • Experimented with different aspects of preprocessing

6/24

  • Explored link analysis
  • Experimented with postprocessing of text
  • Tested graphing for frequencies

Week 5

6/27

  • Attended talk on presentations
  • Produced graphs of initial findings
  • Reviewed relevant literature for presentation

6/28

  • Made PowerPoint for presentation
  • Rehearsed presentation
  • Revised relevant graphs

6/29

  • Gave presentation of work so far
  • Gathered creation dates of posts
  • Began process of changing post creation dates

6/30

  • Continued coding the change of creation dates
  • Experimented with cluster mapping

7/1

  • Creation dates successfully changed
  • Began experimenting with crosstab tool
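
One way the creation-date change could work is by stamping each transcript file with its post's original publication date, so that time-based tools read the real timeline rather than the download date. A sketch under that assumption (the helper name and the use of file timestamps are illustrative, not the actual method used):

```python
# Hypothetical sketch of the creation-date step: set a file's timestamps
# to the post's original publication date (treated as UTC here).
import os
from datetime import datetime, timezone
from pathlib import Path

def stamp_creation_date(path: Path, posted: datetime) -> None:
    """Set the file's access and modification times to the post date."""
    ts = posted.replace(tzinfo=timezone.utc).timestamp()
    os.utime(path, (ts, ts))
```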

Week 6

7/5

  • Continued experimenting with crosstab
  • Graphed frequencies over time
  • Explored bubble plotting
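
The idea behind graphing frequencies over time can be reduced to bucketing documents by interval and counting a target word in each bucket; the counts then feed a chart. A minimal sketch (the month labels and sample texts are invented; WordStat does this internally):

```python
# Minimal sketch of "frequencies over time": group documents by month
# and count occurrences of one word per group.
from collections import Counter

def word_freq_by_month(docs, word):
    """docs: iterable of (month, text) pairs; returns month -> count."""
    freq = Counter()
    for month, text in docs:
        freq[month] += text.lower().split().count(word.lower())
    return freq

docs = [("2010-05", "privacy matters and privacy rules"),
        ("2010-06", "growth growth privacy")]
```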

7/6

  • Attended student check-in
  • Continued graphing, trying different time intervals
  • Explored heatmap tool

7/7

  • Explored deviation table
  • Examined key words in context

7/8

  • Gathered more transcripts from the archive (non-social-media)
  • Continued graphing different clusters
  • Continued examining key words in context

Week 7

7/11

  • Converted new PDFs into .txt files
  • Began graphing topics instead of words

7/12

  • Continued graphing topics
  • Began cleaning new transcripts

7/13

  • Attended student check-in
  • Continued cleaning transcripts

7/14

  • Redid conversion of PDFs to .txt with new packages
  • Continued cleaning transcripts

7/15

  • Attended Dr. Zimmer's talk on Data Ethics
  • Continued cleaning transcripts

Week 8

7/18

  • Completed a full pass at cleaning the transcripts
  • Examined WordStat results of interview transcripts

7/19

  • Continued examining WordStat results for interview transcripts
  • Made error corrections to interview transcripts via WordStat
  • Modeled topics over time for interview transcripts

7/20

  • Attended talk on creating effective research posters
  • Experimented with lemmatization on interview transcripts
  • Modeled frequencies and dendrograms of interview transcripts
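
What the lemmatization experiment normalizes can be shown in miniature: inflected forms collapse to one dictionary form before counting, so "posts", "posted", and "posting" all tally under "post". WordStat applies its own lemmatizer; the lookup table below is a toy stand-in for illustration only:

```python
# Toy illustration of lemmatization: a lookup table maps inflected
# forms to their dictionary form. A real lemmatizer covers the whole
# vocabulary; this table is an invented stand-in.
LEMMAS = {"posts": "post", "posted": "post", "posting": "post", "users": "user"}

def lemmatize(tokens):
    """Map each token to its dictionary form, lowercasing along the way."""
    return [LEMMAS.get(t.lower(), t.lower()) for t in tokens]
```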

7/21

  • Combined social media posts and interviews into one large data set
  • Ran crosstabs on complete data set, mainly frequencies
  • Cleaned and corrected the complete data set

7/22

  • Attempted co-occurrence analysis of complete set (WordStat crashed)
  • Made some final fixes to complete data set
  • Experimented with lemmatizing complete set

Week 9

7/25

  • Created a time series chart of topics in complete set

7/26

  • Created a proximity plot for social media posts
  • Experimented with time series chart for phrases in complete set

7/27

  • Attended talk on graduate school
  • Experimented with a different conversion process (PDF to TXT)

7/28

  • Graphed new time series charts for social media posts
  • Explored context of filler words
  • More topic modeling with interviews

7/29

  • Graphed time series of word frequencies in complete set
  • Tweaked preprocessing settings for complete set
  • Examined topics in complete set, modified topics based on any errors found

Week 10