Text mining and social media: when quantitative meets qualitative, and software meets humans [Working paper]

The ongoing production of staggeringly large volumes of digital data is a ubiquitous part of life in the early twenty-first century. A large proportion of this data is text. This development has serious implications for almost all scholarly endeavour. It is now possible for researchers from a wide range of disciplines to use text mining techniques and software tools in their daily practice. In our own field of political communication, the prospect of cheap access to what very large numbers of citizens communicate in social media environments, how they communicate it, and to whom, provides opportunities that are often too good to miss as we seek to understand how and why citizens think and feel the way they do about policies, political organizations, and political events. But what are the methods and tools on offer, how should they best be used, and what sorts of ethical issues are raised by their use? In this article we proceed as follows. First, we provide a basic definition of text mining. Second, we provide examples of how text mining has been used recently in a diverse range of analytical contexts, from business to media to politics. Third, we discuss the challenges of conducting text mining in online social media environments, focusing on issues such as the problem of gaining access to social media data, research ethics, and the integrity of the data corpora that are available from social media companies. Fourth, we present a basic but comprehensive survey of the text mining tools that are currently available. Finally, we present two brief case studies of the application of text mining in our field of political communication. We conclude with some observations about the proper place of text mining in social science research.