The question of enhancing diversity in big data and computational social science is fundamentally important. I think the importance of diversity in computational areas is often ignored, and as scholars we ignore it at our peril.
First, we often bring biases to how we interpret data, specifically biases that stem from particular subject positions (e.g. a researcher’s position coming from a dominant group). These biases often marginalize minority voices in big data projects or simply ‘other’ them.
Second, how data is analyzed can be biased by subject positions, even when the data is aggregated or identities are difficult to discern. For example, we treat social media data as able to tell stories, but researchers often are not looking for diverse stories or asking diverse research questions. Diverse stories need to be actively sought out.
Third, the types of data sets we collect generally lack diversity, and APIs do not make it easy to showcase underrepresented groups. Social media data is often collected around a particular hashtag or set of categories, which may not represent racial/ethnic or other diversity well. Again, racial/ethnic, gender, socioeconomic, and other forms of diversity need to be actively worked on.
Ultimately, it is important to recognize the lack of diversity in these areas. It is also critical for students and faculty from diverse areas to be literate in big data and computational research methods.
As recent government hearings in the US and Europe around data privacy have underscored, there are deep consequences to the types of data being circulated. Moreover, the decisions that algorithms make tend to be based on how privileged people (e.g. the software developers designing the algorithms) see the world. This often comes at the expense of diverse views (which are often seen as threatening).