Count or Context: Investigating Methods of Text Analysis

Last Modified
  • February 26, 2019
  • Neal, Anissa
    • Affiliation: College of Arts and Sciences, Department of Linguistics
  • Using text as a source of psychological and cognitive information has become a popular subject (Robinson, Navea & Ickes, 2013; Donahue, Liang & Druckman, 2014; Wolfe & Goldman, 2003). Researchers use a variety of methods to analyze text for this purpose, and Linguistic Inquiry and Word Count (LIWC) has become one of the more common techniques. LIWC is a token-based method that contains multiple dictionaries representing various psychological states (positive affect, leisure, religion, social words) and keeps a running tabulation of how many words in a given text occur in each category. Latent Semantic Analysis (LSA) is a context-based method that uses statistics to calculate the similarity between different texts based on the surrounding words. Because LIWC is a common strategy for analyzing text for psychological states, it is important that it be truly representative of the aspects it explores: the dictionaries must accurately represent the categories they measure for LIWC to be an authentic assessment of the analyzed psychological and cognitive states. The current study seeks to use LSA to improve LIWC. The hypothesis is that a method combining the two will perform better than a token-based method applied alone. LIWC and two other token-based methods were compared to a combined LSA-token method. Both approaches were applied to a set of headlines that had previously been judged by humans in terms of emotion and positive/negative valence. The first part of the experiment compared the token-based methods to confirm that they differed from each other but were still successful measures of the stimuli. The second part of the experiment compared the correlation between the token-based method and the correct responses in the pre-tagged data against the correlation between the combination method and the same pre-tagged stimuli. The findings did not support the hypothesis, as the combination method performed worse than the token-based methods.
These results, however, suggest that the power of LSA and its reliance on context merit further investigation. Specifically, LSA may be better suited to the analysis of longer, more semantically complex texts than to short, basic samples like the headlines used in this study.
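To illustrate the token-based approach described in the abstract, here is a minimal Python sketch of dictionary-based category counting. It is not the LIWC software and does not reflect the thesis's actual implementation: the category word lists, the tokenizer, and the function names (`TOY_CATEGORIES`, `tokenize`, `category_counts`, `category_proportions`) are hypothetical stand-ins chosen for illustration, and the real LIWC dictionaries are far larger.

```python
import re
from collections import Counter

# Hypothetical toy dictionaries for illustration only; these are NOT
# the real LIWC word lists.
TOY_CATEGORIES = {
    "positive_affect": {"happy", "win", "joy", "love"},
    "negative_affect": {"sad", "loss", "fear", "crash"},
    "social": {"friend", "family", "talk", "team"},
}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def category_counts(text, categories=TOY_CATEGORIES):
    """Tally how many tokens fall into each category.

    Returns (Counter of category hits, total number of tokens).
    """
    tokens = tokenize(text)
    counts = Counter()
    for token in tokens:
        for name, words in categories.items():
            if token in words:
                counts[name] += 1
    return counts, len(tokens)


def category_proportions(text, categories=TOY_CATEGORIES):
    """Express each category tally as a share of all tokens in the text."""
    counts, total = category_counts(text, categories)
    if total == 0:
        return {name: 0.0 for name in categories}
    return {name: counts[name] / total for name in categories}
```

Calling `category_counts("Family team celebrates joy after win")` tallies each token against every category independently of its neighbors. That independence from surrounding words is what distinguishes a token-based count from a context-based method such as LSA.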
Date of publication
Resource type
Rights statement
  • In Copyright
  • Funding: None
  • Pertsova, Katya
  • Bachelor of Arts
Honors level
  • Honors
Degree granting institution
  • University of North Carolina at Chapel Hill
  • 39
