New publication on automated text classification methods
1 November 2018
The research group achieved another publication success. Jochen Hartmann, Juliana Huppertz, Christina Schamp, and Mark Heitmann published their paper on “Comparing automated text classification methods” in the International Journal of Research in Marketing.
This paper addresses one of the focus topics of project A2 in the first funding period. The authors compare the effectiveness of several methods for classifying unstructured text, an ongoing challenge for many marketing researchers. In particular, unstructured text data from social media is increasingly available and may provide great insights if analyzed properly. For instance, detecting shifts in the sentiment of communication is a widely used application of text classification in social media.
In their paper, Jochen Hartmann et al. compare the performance of ten different approaches (five lexicon-based, five machine learning algorithms) across 41 social media datasets covering major platforms, different classification problems, various sample sizes, and languages. Although support vector machines and Linguistic Inquiry and Word Count (LIWC) dominate marketing research, the authors show that these methods are outperformed by alternatives and, hence, provide less reliable results. Instead, their results reveal that random forest and naïve Bayes perform best in terms of correctly uncovering human intuition, and that this superiority holds across various settings. In sum, this paper provides valuable insights into the principles and applications of different automated text classification methods, which are relevant for both researchers and practitioners.
Hartmann, J., Huppertz, J., Schamp, C., & Heitmann, M. (2018): Comparing automated text classification methods, International Journal of Research in Marketing, forthcoming.