You may not have heard the phrase “Data-Intensive Technology” before this blog post. You wouldn’t be alone – I stumbled across this phrase for the first time while doing research for my first blog post, “American Public Libraries and Data Privacy.” But as I learned more about DITs (Data-Intensive Technologies), it became clear that I interact with them daily. You probably do, too. 

DIT is an umbrella term for applications and systems that can “process, analyze, and manage large data sets” (Paradkar). But a given technology is not necessarily data-intensive just because it collects a lot of data – it only becomes data-intensive when the data it collects is “constantly changing and needs to be processed in real time” (Firebolt).
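To make that definition concrete, here is a deliberately simplified sketch (my own toy illustration, not taken from any real system) of the difference: a data-intensive system updates its picture of the world continuously as new events arrive, rather than analyzing a fixed file after the fact.

```python
from collections import Counter

def process_stream(events):
    """Update running statistics the moment each event arrives.

    In a real data-intensive system, this stream never ends and the
    running totals feed decisions in real time.
    """
    counts = Counter()
    for event in events:
        counts[event["type"]] += 1
    return counts

# A simulated stream of incoming user events
stream = [{"type": "click"}, {"type": "view"}, {"type": "click"}]
print(process_stream(stream))  # Counter({'click': 2, 'view': 1})
```

The point is not the code itself but the pattern: the data is always changing, and the system is built around keeping up with it.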

DITs are pervasive. They are used by businesses, governments, healthcare providers, and a wide variety of other organizations to make decisions that can impact individuals, communities, laws and regulations, and bottom lines.

So, let’s look at two DITs that you may interact with in your day-to-day life, to get a sense of how DITs affect your privacy: artificial intelligence and social media.

Artificial Intelligence 

I have generally run into two schools of thought when I talk about Artificial Intelligence (AI). The first approach can be appropriately described as “Run Away Screaming.” The second might be called “This Thing Sure Is Neat.” Whatever your opinion may be about AI, the fact remains: It’s an incredibly data-intensive system. 

AI has existed in one form or another since the 1950s, but there’s been a recent surge of interest in generative AI, which “can create stories, essays, images, audio, video, and more by learning from existing content” (Collins). All AI systems “rely on huge amounts of data, including sensitive personal data, to train algorithms and enhance performance” (Thomson).

AI is an interesting case study in privacy because there are two ways this DIT might be used to violate your privacy: with data you give it, and with data you don’t. When you use generative AI, any information you provide is stored for later use. If you love pulling up ChatGPT just to have a chat, remember that anything you reveal about yourself, your beliefs, your work, and so on is used to train that AI and make it better at its task. But even if you’ve been avoiding AI, AI systems may still have access to your personally identifying information. That’s because training a generative AI takes a lot of data, and much of that data is scraped from the web.

Artificial intelligence is being used in decision-making processes in all kinds of fields. For example, Edina, Minnesota’s parks and recreation department uses a platform called Placer.ai, “which uses location analytics to provide data on the movement of people, such as where they travel to/from and how long they stay at a given location,” using cellphone data (Collins). While the data is seen in aggregate to “maintain anonymity,” there is still concern over the general lack of policy and norms protecting park users from invasion of privacy (Collins).

AI is also starting to appear in healthcare settings, where it is being used to strengthen clinical trials, improve medical diagnosis and treatment, and supplement health care professionals’ knowledge and skills (WHO). However, “the use of AI systems in healthcare is directly associated with access to individuals’ health data” (Uygun). If these AI systems were hosted internally, there might be fewer privacy risks. But that’s not the case – AIs are generally created and owned by corporate entities, and healthcare providers run the risk of patient data being commercialized by AI companies, which retain access to the data the AI is asked to analyze (Uygun).

These are just two examples of the countless environments in which AI utilizes the data of people who aren’t interacting with the software directly. AI programs like these analyze constantly updated data sets, which may contain private information, to inform decisions in a variety of spheres of public life. Even if you’ve never used ChatGPT or Gemini, for example, your data might be fed into an AI system to train it, to answer questions about you as an individual, or to gather insight into the behaviors of people like you.

Social Media 

This section may very well be called “DITs You’ve Definitely Used at Some Point.” Social media and communications applications are becoming ever more data-intensive. “Companies like Google, X, and Meta collect vast amounts of user data, in part to better understand and improve their platforms, but largely to be able to sell targeted advertising” (Elliott). This data can be used in aggregate to determine what people are generally interested in purchasing, or how users are generally engaging with a platform. But it is often used on an individual level, utilizing data like a user’s sexuality, race, age, political beliefs, ethnicity, or other identifiers, to sell products and services (Elliott). This is because many social media and communications platforms monetize your personal engagement with their technology.  

In most cases, social media services don’t cost money. Instead, these companies “make money through monetizing the engagement of their users” (Richards). When you use one of these applications, your data is collected. That data is then used to show you content that aligns with your interests, including specifically targeted advertisements. These advertisements provide the businesses with revenue. The app or website is monetarily free for you to use, but you’re paying with your data.
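At its simplest, interest-based targeting works by matching what a platform knows about you against what advertisers want to reach. The sketch below is a toy illustration of that matching step only; all of the names and data are invented, and real ad platforms use vastly more data and machine-learned models rather than simple set overlap.

```python
# Invented example profile built from a user's collected activity data
user_profile = {"interests": {"gardening", "privacy"}}

# Invented advertisers, each targeting a set of interests
ads = [
    {"name": "Seed Catalog", "targets": {"gardening", "outdoors"}},
    {"name": "Sports Drink", "targets": {"fitness"}},
    {"name": "VPN Service", "targets": {"privacy", "security"}},
]

# Show the user any ad whose targets overlap their inferred interests
matched = [ad["name"] for ad in ads
           if ad["targets"] & user_profile["interests"]]
print(matched)  # ['Seed Catalog', 'VPN Service']
```

Every data point the platform collects about you makes that profile, and therefore the match, more precise.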

Engagement-based business models like this are becoming standard. American legal scholar Jack Balkin has called this the “grand bargain”: social media and communications services are provided “in exchange for participation in a regime of fine-grained surveillance of your activity, desires, and psychological pressure points” (Richards). As technology gets better at processing and analyzing ever larger data sets, the surveillance data gathered by social media companies is used to advertise to you ever more specifically.

As these engagement-based applications and websites become more and more pervasive in our everyday lives, it can be incredibly difficult, if not impossible, to opt out of data collection (Elliott). Privacy policies and Terms of Use agreements for these companies “remain complicated and vague, and many users don’t have the time or knowledge of legalese to parse through them” (Elliott). It’s become a pop-culture in-joke that no one reads or understands Terms of Use agreements, yet contract law assumes the opposite, and “because legal doctrines apply to online contracts, consumers routinely find themselves legally bound to contracts they have not – and often could not – read” (Samples). Even dedicated readers of Terms of Use can sign up for a service after agreeing to a privacy policy or Terms of Use that then changes over time (Elliott).

Conclusion: You Care About Privacy, So What Do You Do Now? 

Data-Intensive Technologies can seem scary. If you’ve read this far and thought, “This sure looks grim,” you’re not alone. But you also aren’t alone in wanting to protect your data. This week is National Data Privacy Week, sponsored by the National Cybersecurity Alliance, and the 2025 theme is, aptly, Take Control of Your Data.

There are ways to do just that – take control of your data and retain your privacy. 

When it comes to Data-Intensive Technologies, typical data privacy recommendations like strong passwords, multifactor authentication, and phishing awareness won’t address the issue (although they are all great practices).  

Instead, make sure you understand the “bargain” that you are making with these companies for your convenience. Is using the DIT worth the data it will gather about you? Make sure that you make this decision with as much information as possible by reading (or at least skimming) the fine print. 

Also consider whether you can control how much data a DIT gathers from you by adjusting your privacy settings. Many DITs let you manage your privacy settings, and it is best to opt out of data collection when you can.

Beyond these steps, practice general data safety. Be selective with the personal data you share online. Consider using privacy tools like VPNs, ad blockers, and privacy-focused browsers to generally reduce your digital footprint. Finally, stay up to date on data privacy laws and regulations as you are able. Knowing your privacy rights allows you to advocate for stronger privacy protection from the services and technologies you use. 

Bibliography and Further Reading 

Cirrito, Chris. 2024. “An Introduction to the Ethical Use of Artificial Intelligence in Corrections.” Corrections Today 86 (2): 56–57. https://research.ebsco.com/linkprocessor/plink?id=7ef3a56b-ff7a-3cf7-a29e-0dc4bb94212a.

Collins, Lindsay. 2024. “Here and Now: How the Rise of Artificial Intelligence Will Impact the Field of Parks and Recreation.” Parks & Recreation 58 (13): 36–41. 

Paradkar, Sameer. “Data-Intensive Applications – Part 1.” Medium, September 27, 2024. https://medium.com/oolooroo/data-intensive-applications-part-1-87d9b46aa2b9.

Richards, Neil, and Woodrow Hartzog. “Against Engagement.” Boston University Law Review 104 (2024): 1151. https://scholarship.law.bu.edu/faculty_scholarship/3818.

Samples, Tim, Katherine Ireland, and Caroline Kraczon. “TL;DR: The Law and Linguistics of Social Platform Terms-of-Use.” Berkeley Technology Law Journal 39 (2024): 47.

Thomson, Lucy L., and Trooper Sanders. 2024. “Human Rights Challenges with Artificial Intelligence.” Human Rights 49 (4): 24–25. 

Uygun İli̇khan, Sevil, Mahmut Özer, Hande Tanberkan, and Veysel Bozkurt. 2024. “How to Mitigate the Risks of Deployment of Artificial Intelligence in Medicine?” Turkish Journal of Medical Sciences 54 (3): 483–92. doi:10.55730/1300-0144.5814. 

Elliott, Vittoria. “The New Era of Social Media Looks as Bad for Privacy as the Last One.” Wired, November 1, 2023. https://www.wired.com/story/x-alternatives-user-privacy-report/. 

“What Is a Data-Intensive Application?” Firebolt Glossary. Accessed January 8, 2025. https://www.firebolt.io/glossary-items/data-intensive-application. 

“WHO Outlines Considerations for Regulation of Artificial Intelligence for Health.” World Health Organization. Accessed January 8, 2025. https://www.who.int/news/item/19-10-2023-who-outlines-considerations-for-regulation-of-artificial-intelligence-for-health.