Critical Sociology of Big Data
The possibilities for social research, commercial enterprise, and efficient government offered by the massive digital data sets (big data) that are now collected via individuals’ online activities have been widely discussed and publicized in recent years.
Much is made of the potential offered by these ever-expanding data sets in the popular media and in data science, business, global development, policing and security, politics, healthcare, education, and agriculture.
Big data is thought to provide greater precision and predictive capabilities in order to improve efficiency, safety, wealth generation, and resource management. The ability of digital technologies to harvest, mine, store, and analyze data is regarded as superior to other forms of knowledge, allowing for greater insight into human behavior than ever before.
However, from a critical sociological perspective, there is much more to say about big data as sociocultural artifacts. After providing an overview of how big data discourses and practices have come to dominate many social spheres, I discuss how digital data assemblages and algorithms wield power and authority, the metaphors used to describe big data and what they reveal about our anxieties and concerns about the phenomenon, big data hubris and rotted data, and big data ethical issues.
Data that is digitally generated or stored has been around since the dawn of computing. The term “big data” refers to the massive recent increase in the quantity of digital data generated as a result of users’ transactions with and content creation via digital media technologies, as well as digital surveillance technologies like CCTV cameras, RFID chips, traffic monitors, and sensors monitoring the natural environment.
Digital data objects are not only constantly generated, but they are also extremely detailed, allowing for precise pinpointing of many users’ activities. Smartphones collect information about whom the user calls, what websites and platforms they visit, and what search terms they use.
Through their embedded GPS receivers, compasses, gyroscopes, and accelerometers, they can track their users’ location and body movements.
Because of their ever-increasing volume, a constant state of generation, variety of sites from which they are produced, ability to search within and compare data sets, and potential to link to each other to create new and more detailed data sets, these data are considered ‘bigger’ than other types of data. These characteristics of digital data, it is argued, necessitate new approaches to data storage, processing, and analysis (Boyd and Crawford 2012; Dumbill 2013).
The term ‘big data’ is becoming increasingly common in the popular press, government reports, and business blogs. I created a Google Trends graph of the frequency of searches for the term “big data” from January 2004 to March 2014 (appropriately enough, using a big data tool to research big data).
This indicated that until the end of 2010, the frequency of searches remained low. However, the term began to be searched for more frequently in 2011, and it has steadily increased since then, reaching a peak (at the time of writing) in March 2014. The Google Trends analysis also revealed that Asia had by far the greatest regional interest in big data, as evidenced by Google searches, with India showing the most relative interest, followed by Singapore, South Korea, Taiwan, and Hong Kong.
As individuals, businesses, and government agencies amass more data and recognize its apparent value, frenetic rhetoric has sprung up around the concept of big data. The more data collected and analyzed, the better, it is assumed. Big Data: A Revolution That Will Transform How We Live, Work, and Think, the first book to be published about the potential of big data for a general audience, demonstrates this approach (Mayer-Schonberger and Cukier 2013).
A Critical Sociology of Big Data: Dummies’ Guide to Big Data
The book’s dramatic title reflects the authors’ belief that big data is a transformative phenomenon. Big Data for Dummies (Hurwitz et al. 2013) is also now available to educate lay readers about big data’s uses and potential.
In a report with a little more heft, the British House of Commons Public Administration Select Committee (2014) states that digital data sets contain “unused knowledge that would otherwise go to waste, which can be used to empower citizens, improve public services, and benefit the economy and society as a whole.” The federal government of the United States has also backed open digital data initiatives.
With over 85,000 searchable data sets available, the Data.gov website has been established as a platform for centralizing government data and providing access to these data.
A common assumption in public discourse on big data, according to the editor of the new data science journal Big Data, is “the notion that we might compute our way to better decisions” (Dumbill 2013: 1).
Big data will become a key basis of competition, underpinning new waves of productivity growth, innovation, and consumer surplus, according to the authors of a report by the McKinsey Global Institute, a research arm of a large global management firm: ‘Leaders in every sector, not just a few data-oriented managers, will have to grapple with the implications of big data,’ they wrote (Manyika et al. 2011: n.p.).
The authors go on to say that big data can ‘make information transparent and usable at a much higher frequency,’ that it can provide more accurate and detailed performance information for organizations that collect and analyze data, that it can ‘help make better management decisions,’ that it can allow for ‘ever-narrower segmentation of customers’ for more targeted marketing efforts, that it can ‘substantially improve decision-making,’ and that it can ‘improve the next generation of products and services’ (Manyika et al. 2011: n.p.).
Data scientists are now frequently depicted as the newest hot profession in news reports and blogs, and their scarcity is lamented. Data science is, according to the Harvard Business Review, “the sexiest job of the twenty-first century” (Davenport and Patil 2013).
Social media and eCommerce, Big Data Determination
Social media and digital information companies like Facebook, Microsoft, and Google, as well as major retailing companies like Amazon, Target, and Walmart, have paved the way for understanding how the data that users voluntarily share about themselves can be used to tailor and customize product development and advertising for users. These businesses are currently constructing massive digital data storage facilities (Lesk 2013).
Acxiom, one of the largest database marketing companies, claims to have digital data records on hundreds of millions of Americans culled from a variety of sources. It can create digital profiles based on these data sets that reveal details like a person’s age, gender, ethnicity or race, the number of children they have, their education level, where they live, the type of car they drive, and so on.
Large banks, credit card issuers, telecom/media companies, and insurance companies are among the companies to which Acxiom sells data profiles (Marwick 2014). Many retailers now offer customer loyalty programs in which customers are given cards to swipe at the register when paying for their purchases.
The supermarket then stores the information about the purchases and uses it for marketing or sells it to its own customers. Customers are enticed to join by the prospect of receiving discounts or free products if they accumulate enough points. If retailers can connect enough databases, they can market their products to customers in ever more detailed and personalized ways.
As more objects become digitized and ‘smart,’ attached to sensors and connected to the internet, devices have been developed that can closely monitor and measure human behavior for commercial or administrative purposes. As previously stated, retailers such as Walmart use wi-fi to track the movements of customers in their stores. A growing number of health self-tracking app and platform developers are selling their data to third parties.
Algorithm Authority And Digital Data Assemblages
Sociologists and other media and communication scholars have developed a unique perspective on the big data phenomenon and the algorithms that collect, classify, and process big data. They emphasize that big data is not as objective, complete, or neutral as it is portrayed in popular culture. Big data generation and use are political, social, and cultural processes.
Numbers, in this view, are sociotechnical devices that are inextricably linked to the practices that seek to count the materials they measure (Uprichard 2013; Verran 2012). They are ‘semiotically agential,’ meaning that they are used for specific rhetorical and discursive purposes: ‘the real is deeply embedded in and constitutive of the workings of numbers; they lubricate its happening’ (Verran 2012: 112).
To put it another way, numbers can play a role in forming phenomena, bringing them into being, and making sense of them. Despite the popular belief that numbers are neutral and objective, especially when compared with qualitative sources of knowledge, they are neither. They are inextricably linked to what is deemed valuable, serving as both a source of value and a measure of it, as well as representing what is deemed valuable to quantify in the first place (West 2014). Digital data objects that are converted to numbers by digital technologies are both products of socio-technical devices and devices in and of themselves, with agency and power.
There is no such thing as “raw” data; in fact, “raw data” is an oxymoron, according to the memorable title of one book on the subject (Gitelman 2013). The conventions and practices for locating, recording, archiving, and categorizing data are themselves configured by specific beliefs, judgments, values, and cultural assumptions that ‘cook’ the data from the start, never leaving it in a ‘raw’ state (Baym 2013; Boyd and Crawford 2012; Gitelman and Jackson 2013; Räsänen and Nyce 2013).
Rather than being pre-existing items of information, digital data is co-produced or co-authored by those who create the software and devices that elicit and archive it, the coders who create the algorithms in the software, and those who use these technologies. Individuals or institutions that archive data have a significant impact on how the data is organized and classified, and thus on how potential users can access and retrieve the data (Beer 2013a).
Big data discourses and practices have also spawned new ways of conceptualizing people and their actions. Indeed, it has been argued that our ‘data selves,’ as defined by the data we and others collect on ourselves, represent human subjects as data archives: ‘digitized humans’ or ‘data-generating machines’ (McFedries 2013).
According to some commentators, this has the effect of not only turning people into data but also encouraging them to see themselves as data assemblages rather than other ways of defining identity and selfhood: ‘We are becoming data… So we need to be able to understand ourselves as data too’ (Watson 2013).
Not only are people portrayed as data-generating objects in these discourses, but they are also portrayed as commodities due to the commercially valuable data that consumers generate. The phrase “you are the product” has become a catchphrase in the digital data economy.
Algorithms are the building blocks of new kinds of selfhood: they give rise to ‘algorithmic identities’ (Cheney-Lippold 2011). The digital data collected on populations is a specific way of constructing particular types of assemblages of individuals or populations from various sources. Algorithms connect disparate data fragments.
Individuals’ actions and interactions are both drawn from and shaped by digital data, which can be used by external agencies to influence or act on individuals, or by individuals themselves to change their behavior in response to the data.
Furthermore, as discussed earlier in this chapter, big data analysis is becoming increasingly important in identifying which behaviors, activities, or outcomes are appropriate or ‘normal,’ and which are outliers. Because of the rhetorical power bestowed upon big data, they are now regarded as arbiters of acceptable and unacceptable practices and behaviors, effectively shaping definitions of ‘normality.’
Anxiety about Big Data
While big data has been praised in many forums, some popular representations have raised concerns. The rhetorical descriptions of big digital data reveal a lot about their current social and cultural meanings. Organic metaphors drawn from the natural world have been used to describe computer technologies since their emergence, according to Thomas (2013) in the book Technobiophilia: Nature and Cyberspace. The web, the cloud, the bug, the virus, the root, the mouse, and the spider have all been used to try to conceptualize and describe these technologies.
These have occasionally resulted in a mash-up of metaphors, such as ‘surfing the web.’ Because of our ambivalence toward these technologies, Thomas claims, we try to make them more “natural” and thus less threatening and alienating.
This approach to naturalizing computer technologies may take a nurturing and beneficial view of nature. Nature, on the other hand, is not always benign: it can be wild, chaotic, and threatening at times, and digital technologies can take on these characteristics as well.
There has long been a metaphorical link between digital technologies and living creatures, including human bodies. I previously discussed how popular cultural representations of computer virus threats in the 1990s depicted personal computers as human entities becoming ill as a result of viral infection.
This metaphor implied that the computer was infiltrated by a malevolent alien invader who was causing problems (Lupton 1994). While the term ‘virus’ has become commonplace in the context of digital technologies, its use underpins our desire to think of computers as living entities similar to ourselves.
In a previous analysis, I argued that computer virus discourses reveal our ambivalence toward computer technologies: our desire to seamlessly integrate them into everyday life and strip them of their alienating meanings as complex machines, but also our awareness of our reliance on them and their technological complexity, which many of us do not comprehend.
Ethics of Big Data
Big data has numerous significant ethical and political implications. The terms ‘good data’ and ‘bad data’ are now commonly used to describe the consequences of corporations and government agencies using big data (Lesk 2013).
‘Good data’ benefits commercial enterprises and government agencies, contributes to the development of new technologies and to important research (such as on medical topics), and assists security and safety measures without harming consumers and citizens or infringing on their privacy or civil liberties (when such harms occur, the data is viewed as ‘bad data’).
Concerns about privacy and data security issues are fueled by discussions of data “deluges” and “tsunamis” or, less dramatically, of the dynamic, multiplying, and interconnected nature of digital data.
According to estimates, data about the average American is collected in more than 20 different ways, twice as many as 15 years ago, owing to the introduction of digital surveillance methods (Angwin and Valentino-Devries 2012).
Third-party data brokers have sold private information from databases, such as the addresses of police officers’ homes, whether or not someone has ever been a suspect, and details about people who have been raped or who have a genetic disease, cancer, or HIV/AIDS. Although many digital data sets remove personal information such as names and addresses, joining several data sets that include the same people’s details can be used to de-anonymize data (Crawford 2014).
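The de-anonymization risk described above can be made concrete with a small sketch. The following is a minimal illustration, not a description of any broker's actual method: every record, name, and field (`zip`, `birth_year`, `sex`, `diagnosis`) is invented for the example. It assumes that one data set has had names removed while a second data set still carries names alongside the same indirect details.

```python
# Hypothetical toy data: every record, name, and field here is invented.
# The health records have had names removed ("anonymized").
health_records = [
    {"zip": "02138", "birth_year": 1975, "sex": "F", "diagnosis": "asthma"},
    {"zip": "02139", "birth_year": 1982, "sex": "M", "diagnosis": "diabetes"},
    {"zip": "02139", "birth_year": 1982, "sex": "F", "diagnosis": "migraine"},
]

# A second data set (for example, a loyalty program) still carries names.
loyalty_records = [
    {"name": "Alice Example", "zip": "02138", "birth_year": 1975, "sex": "F"},
    {"name": "Bob Example", "zip": "02139", "birth_year": 1982, "sex": "M"},
    {"name": "Carol Example", "zip": "02140", "birth_year": 1990, "sex": "F"},
]

# Indirect details shared by both data sets (an assumption of this sketch).
QUASI_IDENTIFIERS = ("zip", "birth_year", "sex")


def linkage_key(record):
    """Build a comparable key from the shared indirect details."""
    return tuple(record[field] for field in QUASI_IDENTIFIERS)


def link_records(anonymous, named):
    """Attach a name to each anonymous record whose key matches exactly one person."""
    names_by_key = {}
    for record in named:
        names_by_key.setdefault(linkage_key(record), []).append(record["name"])

    matches = []
    for record in anonymous:
        candidates = names_by_key.get(linkage_key(record), [])
        if len(candidates) == 1:  # a unique match re-identifies the record
            matches.append((candidates[0], record["diagnosis"]))
    return matches


reidentified = link_records(health_records, loyalty_records)
for name, diagnosis in reidentified:
    print(name, diagnosis)
```

In this toy case, two of the three nameless health records are tied back to a named person even though neither data set alone links a name to a diagnosis. The more indirect details two data sets share, the more likely such matches are to be unique.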
Many app developers upload their data to the cloud, and not all name identifiers are removed from the data. Several companies that have developed self-tracking technologies are now selling their devices and data as a package to businesses for workplace ‘wellness programs,’ as well as to health insurance companies looking to identify patterns in their clients’ health-related behaviors (McCarthy 2013).
Some health insurance companies provide users with the technology to upload their health and medical information to platforms that they have created. People who self-track are collecting data on their own biometrics, which is seen as an opportunity for private companies and government agencies to monitor individuals as part of lowering healthcare costs.
In the United States, health insurance companies and employers have already begun to use self-tracking devices and websites for the disclosure of health information, as well as topics such as whether or not clients are separated or divorced, their financial status, whether or not they feel safe, their work-related stress, and the nature of their relationships with coworkers, as a way of ‘incentivizing’ people to engage in health-promoting behaviors. Those who refuse to participate may face a significant surcharge from their health insurance provider (Dredge 2013; Shahani 2012; Singer 2013).
More seriously, big data has the potential to have a direct impact on people’s freedoms and rights as citizens. Crawford and Schultz (2014) have identified what they refer to as the ‘predictive privacy harms’ that predictive analytics may cause. Big data analytics can operate outside of current legal privacy protections because they rely on metadata rather than data content (Polonetsky and Tene 2013).
Individuals or groups identified by big data predictive analytics and data set cross-referencing may face bias or discrimination as a result of predictive privacy harm. People are rarely aware of how, by combining disparate and previously discrete data sets, their metadata can be used to reveal their identity, habits, preferences, location, and health status, and to generate information about them that could affect their employment and/or access to state benefits or insurance (Crawford and Schultz 2014).
There have been concerns raised about the use of digital data for racial and other profiling, which could lead to discrimination, overcriminalization, and other restrictions. It has been argued that the big data era has created a significant policy challenge: determining the best way to use big data to improve health, well-being, security, and law enforcement cost-effectively while ensuring that these data uses do not violate people’s rights to privacy, fairness, equality, or freedom of expression (Crawford and Schultz 2014; Laplante 2013; Polonetsky and Tene 2013).
Deborah Lupton, 2015