Difference between revisions of "User:Feidler"

Latest revision as of 05:42, 5 August 2022

Week 1

5/31

Attended orientation
Met with advisor (Dr. Michael Zimmer)
Explored research papers related to Zuckerberg Files

6/1

Completed CITI modules
Reviewed Dr. Zimmer's paper
Explored analytics tools
Continued reviewing papers related to Zuckerberg Files

6/2

Met with Dr. Zimmer for further discussion, followed by lunch
Continued exploring potential tools
Continued reviewing papers related to Zuckerberg Files

6/3

Reviewed papers on quantitative textual analysis
Explored WordStat tool and quanteda R package

Week 2

6/6

Attended RCR training
Watched tutorials on WordStat tool

6/7

Read Yazeed Alhumaidan's dissertation methodology
Met with mentor to discuss potential tools
Explored possible tools for analysis and data cleaning

6/8

Attended talk on technical writing
Continued searching for possible tools

6/9

Met with advisor to touch base
Began the ordering process for WordStat
Reviewed WordStat tutorials

6/10

Installed trial version of WordStat

Week 3

6/13

Obtained xml file of the archive
Created archive account
Began work on script to automate download and collection of transcripts

6/14

Continued working on script
Tested draft documents in WordStat

6/15

Attended research presentation given by Dr. Madiraju
Began debugging script

6/16

Finished debugging script
Organized all blog posts into one file for analysis

6/17

Automated conversion of files from .pdf to .txt

Week 4

6/20

Cleaned the set of files to exclude interviews and video transcripts
Obtained official license for WordStat

6/21

Ran initial analysis in WordStat
Fine-tuned text processing
Refined some transcript errors

6/22

Attended research talk given by Dr. Bialkowski
Finished cleaning typos/errors in transcripts
Explored frequencies and topic extraction

6/23

Explored dendrogram analysis
Explored proximity plots
Experimented with different aspects of preprocessing

6/24

Explored link analysis
Experimented with postprocessing of text
Tested graphing for frequencies

Week 5

6/27

Attended talk on presentation
Produced graphs of initial findings
Reviewed relevant literature for presentation

6/28

Made PowerPoint for presentation
Rehearsed presentation
Revised relevant graphs

6/29

Gave presentation of work so far
Gathered creation dates of posts
Began process of changing post creation dates

6/30

Continued coding the change of creation dates
Experimented with cluster mapping

7/1

Creation dates successfully changed
Began experimenting with crosstab tool

Week 6

7/5

Continued experimenting with crosstab
Graphed frequencies over time
Explored bubble plotting

7/6

Attended student check-in
Continued graphing, trying different time intervals
Explored heatmap tool

7/7

Explored deviation table
Examined key words in context

7/8

Gathered more transcripts from the archive (non-social-media)
Continued graphing different clusters
Continued examining key words in context

Week 7

7/11

Converted new PDFs into .txt files
Began graphing topics instead of words

7/12

Continued graphing topics
Began cleaning new transcripts

7/13

Attended student check-in
Continued cleaning transcripts

7/14

Redid conversion of PDFs to .txt with new packages
Continued cleaning transcripts

7/15

Attended Dr. Zimmer's talk on Data Ethics
Continued cleaning transcripts

Week 8

7/18

Completed attempt at cleaning transcripts
Examined WordStat results of interview transcripts

7/19

Continued examining WordStat results for interview transcripts
Made error corrections to interview transcripts via WordStat
Modeled topics over time for interview transcripts

7/20

Attended talk on creating effective research posters
Experimented with lemmatization on interview transcripts
Modeled frequencies and dendrograms of interview transcripts

7/21

Combined social media posts and interviews into one large data set
Ran crosstabs on complete data set, mainly frequencies
Cleaned and corrected the complete data set

7/22

Attempted co-occurrence analysis of the complete set (WordStat crashed)
Made some final fixes to complete data set
Experimented with lemmatizing complete set

Week 9

7/25

Created a time series chart of topics in complete set

7/26

Created a proximity plot for social media posts
Experimented with time series chart for phrases in complete set

7/27

Attended talk on graduate school
Experimented with different conversion process (PDF to TXT)

7/28

Graphed new time series charts for social media posts
Explored context of filler words
Continued topic modeling with interviews

7/29

Graphed time series of word frequencies in complete set
Tweaked preprocessing settings for complete set
Examined topics in complete set, modified topics based on any errors found
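
As a rough illustration of the word-frequency time series mentioned above, here is a minimal Python sketch. It is not the WordStat workflow used in this log. The function name, the (date, text) input layout, and the sample texts are all hypothetical and made up for illustration; it only assumes each document carries an ISO-format date.

```python
import re
from collections import Counter, defaultdict


def monthly_term_frequencies(documents, terms):
    """Count how often each tracked term appears per "YYYY-MM" bucket.

    documents: iterable of (iso_date, text) pairs (hypothetical layout).
    terms: words to track, matched case-insensitively as whole tokens.
    """
    wanted = {t.lower() for t in terms}
    buckets = defaultdict(Counter)
    for iso_date, text in documents:
        month = iso_date[:7]  # "YYYY-MM" prefix of an ISO date string
        for token in re.findall(r"[a-z']+", text.lower()):
            if token in wanted:
                buckets[month][token] += 1
    # Return months in chronological order, as plain dicts
    return {month: dict(counts) for month, counts in sorted(buckets.items())}


# Made-up placeholder documents, not taken from the archive
sample = [
    ("2010-05-24", "Privacy matters. Simpler privacy controls are coming."),
    ("2010-05-30", "People want to share and connect."),
    ("2011-09-22", "Community first, and every community needs privacy."),
]
series = monthly_term_frequencies(sample, ["privacy", "community"])
for month, counts in series.items():
    print(month, counts)
```

Months in which no tracked term appears are omitted from the result, so a plotting step would need to fill those gaps with zeros.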

