Big Data, Machine Learning, and Mixed Methods

The study of social media holds great promise, but we always need to understand its limitations. This sounds rather basic, but it is often not thought about reflexively. Though social media is not as shiny as it was several years ago, the zeitgeist persists, and it often clouds our ability to frame exactly what it is we are doing with all the social data we have access to.[1] Specifically, if we use Twitter data, it is not enough to leave research at the level of frequency counts (top hashtags, top retweets, most engaged-with comments, etc.). David De Roure [2] warns that this type of analysis of social media misses the social aspects of web technologies. Ultimately, social media spaces are sociotechnical systems, and the social that is (re)produced within them, like face-to-face communication, is highly nuanced. I think it is fundamentally important for researchers of social media data across the disciplines to think critically beyond the literal results of brute-force machine learning and to treat this as an opportunity to ask large and important social questions. My point is epistemological: I think our results should contribute to our understanding of these social questions. This is not to say that quantitative methods such as natural language processing, n-grams (and other co-occurrence methods), and various descriptive statistics are unimportant to the study of social media. Rather, they are often the starting point or midpoint of a research project. In my work, Big Data analytical models provide a great way to get a bird's-eye view of social media data. However, they cannot answer social questions as such. They are nonetheless valuable to, for example, grounded theory approaches, which can help produce valuable research questions or social insights.
Additionally, the mixing of methods this encourages is exciting, as it gives us opportunities to develop new research methods rather than only trying to fit traditional research methods to social media data (though doing this is, of course, valuable too).

[1] Ramesh Jain, in his talk at the NUS Web Science & Big Data Analytics workshop, described this as data being everywhere, with billions of data streams available to us.

[2] In his talk at the NUS Web Science & Big Data Analytics workshop (December 8th, 2014).
