The Opt Out: The rewards and risks of lying to tech companies
DIY data poisoning can feel subversive, but is it?
Algorithms are what they eat. These intricate pieces of code need nourishment to thrive and do accurate work, and when they don’t get enough bytes of good-quality data, they struggle and fail.
I encountered a malnourished algorithm when I looked at my 2022 Spotify Wrapped and saw my favorite artist was Peppa Pig. I frowned, befuddled. Why did Spotify think the cartoon piglet was my latest obsession? Then I remembered I’d spent a week with my 2-year-old niece over the summer, and how playing Peppa Pig songs on my phone was the only way to keep her entertained.
Well, that made more sense.
But I soon realized that the little porker had mucked up even more than my year in review: My recommendation algorithm was a mess as well. For weeks, at least one out of the four Daily Mix playlists the platform put together for me included compilations of music for kids.
It was annoying, but I wondered if, maybe, my niece’s obsession was actually a useful way to deal a staggering blow to the detailed profile tech companies have on each of us. After all, if Spotify, Instagram, Google, or any other platform thinks I’m someone I’m not, they’ll show me ads that are relevant to that fake version of me—but not to the real me. And if they happen to provide my data to a third party, like a data broker, they’ll be handing them details describing someone who doesn’t exist, with my true likes and interests buried in a mountain of Peppa Pig songs. Weaponizing this mistaken identity can help us hide in plain sight and, by extension, protect our privacy.
A camouflage suit made out of bad data
Feeding the algorithms in your life bad data is called data poisoning or obfuscation: a technique that obscures your true identity by burying it under a large quantity of inaccurate information. In its original, technical sense, the term refers to coordinated attacks that deliberately erase or alter the datasets fueling a platform’s algorithms to make them underperform and fail, which requires specific skills and know-how, as well as a lot of computing power.
You may not have any of those things, but you can use the same principle to protect yourself from constant surveillance online. The images you see, the posts you like, the videos you play, the songs you listen to, and the places where you check in—that’s all data that platforms collect and use to build a profile of who you are. Their goal is to understand you as much as possible (better than you know yourself) so they can predict what you’ll want and need. Tech companies and advertisers don’t do this for altruistic reasons, of course, but to show us ads that they hope will manipulate us into spending money—or make us feel or vote a certain way.
The easiest way to engage in data poisoning is to use a name, gender, location, and date of birth that aren’t yours when you sign up for a service. To go beyond that baseline, you can like posts you don’t actually like, randomly click on ads that don’t interest you, or play content (videos, music, movies, etc.) that’s not to your taste. For that last option, just press play on whatever platform you’re using, turn off your screen, turn down the volume, and let it run overnight. If you want to throw off YouTube, turn on autoplay and let the site go deep down a rabbit hole of content for hours and hours while you sleep or work. Finally, whenever you have to answer a question, like why you’re returning an item you bought online, use “other” as your default response and write whatever you want as a reason.
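The logic behind these tricks can be sketched in a few lines of Python. This is a toy illustration only, under the assumption that a platform profiles you from your raw activity log: the catalogs, function names, and noise ratio are all made up for the example, and no real platform or API is involved.

```python
import random

# Real interests we want to hide, and a pool of decoy content that
# describes someone who doesn't exist. Both lists are invented for
# this sketch.
REAL_INTERESTS = ["indie rock", "jazz", "film scores"]
DECOY_POOL = ["kids' songs", "polka", "whale sounds", "speed metal",
              "audiobooks", "white noise", "mariachi"]

def obfuscated_log(real_plays, noise_ratio=1.0, rng=random):
    """Return a shuffled activity log mixing real plays with decoys.

    noise_ratio=1.0 adds one randomly chosen decoy play per real play,
    so half of the resulting log points at a fake version of you.
    """
    n_decoys = int(len(real_plays) * noise_ratio)
    decoys = [rng.choice(DECOY_POOL) for _ in range(n_decoys)]
    log = list(real_plays) + decoys
    rng.shuffle(log)
    return log

log = obfuscated_log(REAL_INTERESTS * 10, noise_ratio=1.0)
# Exactly half of the 60 log entries are now decoys, so any profile
# built from this log has the real tastes diluted by noise.
```

Turning up `noise_ratio` buries the genuine signal deeper, at the cost of the degraded recommendations discussed below.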
Where data poisoning can fail
If this all sounds too simple, you’re right—there are some caveats. Using fake information when you sign up for something might be pointless if the platform builds and refines your profile by aggregating numerous data points. For example, if you say you’re in California but consume local news from Wisconsin, list your workplace in Milwaukee, and tag a photo of yourself on the shore of Lake Michigan, the platform’s baseline assumption that you live in the Golden State won’t matter much. The same thing will happen if you say you were born in 1940, but you like content and hashtags typically associated with Generation Z. Let’s face it: it’s totally possible for an 82-year-old to be a huge Blackpink fan, but it’s not terribly likely. And then there’s the risk that a service or site will require you to provide real identification if you ever get locked out or hacked.
Playing content that doesn’t interest you while you sleep may throw off the recommendation algorithms on whatever platform you’re using, but doing so will also require resources you may not have at your disposal. You’ll need a device consuming electricity for hours on end, and an uncapped internet connection fast enough to stream whatever comes through the tubes. Messing with the algorithms also messes up your user experience. If you depend on Netflix to tell you what to watch next or Instagram to keep you updated on emerging fashion trends, you’re not likely to enjoy what shows up if the platform doesn’t actually know what you’re interested in. It could even ruin an entire app for you: just think what would happen if you started swiping left and rejecting all the people you actually liked on a dating app.
Also, just as eating one salad doesn’t make you healthy, your data poisoning schemes must be constant to make a long-lasting impression. It’s not enough to click on a couple of uninteresting ads here and there and hope that’s enough to throw off the algorithm—you need to do it repeatedly to reinforce that aspect of your fake profile. You’ve probably noticed that after browsing an online store and seeing the brand or product you were interested in plastered on every website you visited afterward, the ads were eventually replaced by others. That’s because online ads are cyclical, which makes sense, as human interest comes and goes.
But the biggest caveat of all is uncertainty—we just don’t know how much damage we’re doing to the data tech companies and advertisers are collecting from us. Studies suggest that poisoning a minimal amount of data (1 to 3 percent) can significantly affect the performance of an algorithm that’s trying to figure out what you like. This means that even clicking on a small percentage of uninteresting ads might prompt an algorithm to put you in the wrong category and assume, for example, that you’re a parent when you’re not. But these are only estimates. The engineers behind Google, Facebook, and other big online platforms are constantly updating their algorithms, making them an ever-moving target. Not to mention that this code is proprietary, so the only people who know for sure how effective data poisoning is are working for these companies, and it’s highly unlikely they would reveal their vulnerability to this technique. In the case of Google’s AdSense, for example, advertisers pay per click, and if they knew their money was paying for fake clicks (even just a few), it could jeopardize Google’s credibility as a way to reach audiences and sell products.
Does any of this matter?
Not knowing whether poisoning your data actually does anything to protect your privacy might make you think there’s no point in trying. But not all is lost. Anecdotal evidence makes it clear that platforms are not immune to our white lies, and that bad data is not innocuous: my Spotify Wrapped, YouTube’s sometimes wacky recommendations, Netflix’s occasionally baffling genre suggestions, and ads that assume you want to buy a product because you clicked on it accidentally. There’s also a very telling experiment by privacy researchers Helen Nissenbaum and Lee McGuigan at Cornell Tech, which found that AdNauseam, a browser extension banned from the Chrome Web Store that automatically clicks every ad on a page to throw off Google’s profiling algorithm, is effective, and that the Big G cannot tell the difference between real and fake clicks.
Maybe you need to read this to believe it, but we don’t have to comply with everything online platforms ask of us. Data poisoning is neither dishonest nor unethical: it’s us users reclaiming our information in any way we can. As Jon Callas, a computer security expert with the Electronic Frontier Foundation, told me, we have no moral obligation to answer questions tech companies have no right to ask. They’re already accumulating thousands of data points on each and every one of us, so why help them?
At the end of the day, it doesn’t matter whether data poisoning is highly or barely effective. We know it does something. And at a time when companies don’t have our best interests at heart and regulation is light years behind thanks to the billions of dollars tech companies spend lobbying elected officials, we the users are on our own. We might as well use every strategy we can to protect ourselves from constant surveillance.