Difference between revisions of "User:Feidler"

From REU@MU

Latest revision as of 05:42, 5 August 2022

Week 1

5/31

  • Attended orientation
  • Met with advisor (Dr. Michael Zimmer)
  • Explored research papers related to Zuckerberg Files

6/1

  • Completed CITI modules
  • Reviewed Dr. Zimmer's paper
  • Explored analytics tools
  • Continued reviewing papers related to Zuckerberg Files

6/2

  • Met with Dr. Zimmer for further discussion, followed by lunch
  • Continued exploring potential tools
  • Continued reviewing papers related to Zuckerberg Files

6/3

  • Reviewed papers on quantitative textual analysis
  • Explored WordStat tool and quanteda R package

Week 2

6/6

  • Attended RCR training
  • Watched tutorials on WordStat tool

6/7

  • Read Yazeed Alhumaidan's dissertation methodology
  • Met with mentor to discuss potential tools
  • Explored possible tools for analysis and data cleaning

6/8

  • Attended talk on technical writing
  • Continued searching for possible tools

6/9

  • Met with advisor to touch base
  • Started the ordering process for WordStat
  • Reviewed WordStat tutorials

6/10

  • Installed trial version of WordStat

Week 3

6/13

  • Obtained XML file of the archive
  • Created archive account
  • Began work on script to automate download and collection of transcripts
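
The download script itself is not recorded in this log. As a rough sketch only, assuming the archive XML lists each transcript as an `item` element with `title` and `url` children (a hypothetical layout; the real export may differ), the links to download could be collected with Python's standard library:

```python
import xml.etree.ElementTree as ET

def collect_transcript_links(xml_text):
    """Return (title, url) pairs for every <item> that has a <url> child.

    The element names are assumptions for illustration, not the archive's
    actual schema.
    """
    root = ET.fromstring(xml_text)
    links = []
    for item in root.iter("item"):
        url = item.findtext("url")
        if url:  # skip entries with no downloadable file
            title = item.findtext("title", default="untitled")
            links.append((title.strip(), url.strip()))
    return links

# Made-up sample input for illustration only.
SAMPLE = """<archive>
  <item><title>Post A</title><url>https://example.org/a.pdf</url></item>
  <item><title>No file</title></item>
  <item><title>Post B</title><url>https://example.org/b.pdf</url></item>
</archive>"""

links = collect_transcript_links(SAMPLE)
```

Each returned pair could then be handed to whatever download step the real script uses.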

6/14

  • Continued working on script
  • Tested draft documents in WordStat

6/15

  • Attended research presentation given by Dr. Madiraju
  • Began debugging script

6/16

  • Finished debugging script
  • Organized all blog posts into one file for analysis

6/17

  • Automated conversion of files from .pdf to .txt

Week 4

6/20

  • Cleaned the set of files to exclude interview and video transcripts
  • Obtained official license for WordStat

6/21

  • Ran initial analysis in WordStat
  • Fine-tuned text processing
  • Corrected some transcript errors

6/22

  • Attended research talk given by Dr. Bialkowski
  • Finished cleaning typos/errors in transcripts
  • Explored frequencies and topic extraction

6/23

  • Explored dendrogram analysis
  • Explored proximity plots
  • Experimented with different aspects of preprocessing

6/24

  • Explored link analysis
  • Experimented with postprocessing of text
  • Tested graphing for frequencies

Week 5

6/27

  • Attended talk on giving presentations
  • Produced graphs of initial findings
  • Reviewed relevant literature for presentation

6/28

  • Made PowerPoint for presentation
  • Rehearsed presentation
  • Revised relevant graphs

6/29

  • Gave presentation of work so far
  • Gathered creation dates of posts
  • Began process of changing post creation dates

6/30

  • Continued coding the change of creation dates
  • Experimented with cluster mapping

7/1

  • Successfully changed creation dates
  • Began experimenting with crosstab tool
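
The log does not say how the dates were changed. One possible approach, sketched here under the assumption that the analysis tool reads a file's modification timestamp, is `os.utime`. Note that this sets the access and modification times, not a true creation time, which Python's standard library cannot set portably. The helper name and the date are made up for illustration.

```python
import os
import tempfile
from datetime import datetime

def set_file_date(path, date_string, fmt="%Y-%m-%d"):
    """Set a file's access and modification times to the given date."""
    timestamp = datetime.strptime(date_string, fmt).timestamp()
    os.utime(path, (timestamp, timestamp))
    return timestamp

# Demonstrate on a throwaway file with a made-up date.
with tempfile.NamedTemporaryFile(suffix=".txt", delete=False) as handle:
    demo_path = handle.name
expected = set_file_date(demo_path, "2010-05-24")
actual = os.path.getmtime(demo_path)
os.remove(demo_path)
```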

Week 6

7/5

  • Continued experimenting with crosstab
  • Graphed frequencies over time
  • Explored bubble plotting
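
The graphs in this log were produced in WordStat. Purely as an illustration of the underlying idea, and not the tool's method, the sketch below counts how often a term appears per year in a list of dated posts. The function name and the sample posts are hypothetical.

```python
import re
from collections import defaultdict

def term_frequency_by_year(posts, term):
    """Count occurrences of `term` per year across (date, text) pairs.

    Dates are ISO strings such as "2010-05-24"; matching is
    case-insensitive on whole words.
    """
    counts = defaultdict(int)
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    for date, text in posts:
        year = int(date[:4])
        counts[year] += len(pattern.findall(text))
    return dict(sorted(counts.items()))

# Made-up posts for illustration only.
posts = [
    ("2009-02-01", "Privacy matters. We take privacy seriously."),
    ("2009-11-15", "New sharing controls."),
    ("2010-05-24", "Simpler privacy settings."),
]
by_year = term_frequency_by_year(posts, "privacy")
```

The resulting year-to-count mapping is the kind of series a frequency-over-time chart plots.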

7/6

  • Attended student check-in
  • Continued graphing, trying different time intervals
  • Explored heatmap tool

7/7

  • Explored deviation table
  • Examined key words in context

7/8

  • Gathered more transcripts from the archive (non-social-media)
  • Continued graphing different clusters
  • Continued examining key words in context

Week 7

7/11

  • Converted new PDFs into .txt files
  • Began graphing topics instead of words

7/12

  • Continued graphing topics
  • Began cleaning new transcripts

7/13

  • Attended student check-in
  • Continued cleaning transcripts

7/14

  • Redid conversion of PDFs to .txt with new packages
  • Continued cleaning transcripts

7/15

  • Attended Dr. Zimmer's talk on Data Ethics
  • Continued cleaning transcripts

Week 8

7/18

  • Completed attempt at cleaning transcripts
  • Examined WordStat results of interview transcripts

7/19

  • Continued examining WordStat results for interview transcripts
  • Made error corrections to interview transcripts via WordStat
  • Modeled topics over time for interview transcripts

7/20

  • Attended talk on creating effective research posters
  • Experimented with lemmatization on interview transcripts
  • Modeled frequencies and dendrograms of interview transcripts

7/21

  • Combined social media posts and interviews into one large data set
  • Ran crosstabs on complete data set, mainly frequencies
  • Cleaned and corrected the complete data set

7/22

  • Tried running co-occurrence analysis on complete set (WordStat crashed)
  • Made some final fixes to complete data set
  • Experimented with lemmatizing complete set

Week 9

7/25

  • Created a time series chart of topics in complete set

7/26

  • Created a proximity plot for social media posts
  • Experimented with time series chart for phrases in complete set

7/27

  • Attended talk on graduate school
  • Experimented with different conversion process (PDF to TXT)

7/28

  • Graphed new time series charts for social media posts
  • Explored context of filler words
  • Continued topic modeling with interviews

7/29

  • Graphed time series of word frequencies in complete set
  • Tweaked preprocessing settings for complete set
  • Examined topics in complete set and modified them based on errors found

Week 10