How language is hiding the real internet from you
Most of the internet is out of your reach, but the barrier isn't just algorithms. In another language, the same platforms turn into whole other worlds.
When you go online, it feels like you're accessing all the world's information. But you form social media relationships based on shared language. You search Google in the language you think in. And algorithms built to maximise attention have no reason to recommend what you won't understand. So, most of the internet remains out of sight, on the other side of a language filter – and you're missing far more than content.
Most internet activity is concentrated on a small number of large platforms, and from our linguistically siloed perspectives, it's easy to assume that everyone uses them in similar ways. But why should that be true? We expect music, literature and cuisine to vary between cultures, after all, so why not the internet?
In an upcoming paper, our team at the University of Massachusetts Amherst's Initiative for Digital Public Infrastructure has uncovered stark differences in how different cultures harness the internet. With more research, these findings may reshape how we think about the services that dominate the web. We're only just beginning to understand the implications.
The history of the internet offers some examples. Take the Russian social media/blogging platform LiveJournal. When it was popular in the mid-2000s, English-speaking users knew it as a space for young people to share their feelings or geek out about Harry Potter. But if you're a Russian speaker, you probably know LiveJournal very differently – as an important site of public intellectualism and political discourse, playing a rare role in hosting voices from the opposition.
With the biggest technology companies based in the US, a cultural blind spot has emerged: we often assume that the English-language internet is representative of the rest of the world. Research about YouTube in particular has a significant English-speaking bias – it is typically written in English, published in English-speaking countries and focused on English-language videos.
Comment & analysis
Ryan McGrady is a senior research fellow at the University of Massachusetts Amherst's Initiative for Digital Public Infrastructure.
The internet's leading platforms are more difficult to study than you might think. Computers can blaze through text, but video is harder to parse at scale. Platforms like YouTube, the world's most popular video service, don't offer tools to create the large representative samples necessary to understand the platform as a whole, or big swaths of it, such as linguistic communities.
As a result, YouTube is often understood through the easily accessible tip of the iceberg: its most popular videos. Between the language bias and this popularity bias, when users, creators, academics, parents, teachers and even policymakers talk about platforms like YouTube, we're typically just talking about the part that's most visible to us – a small, unrepresentative piece of it. (For more, read Thomas Germain's story on the hidden world beneath the shadows of YouTube's algorithm.)
So, how do you study what's under the surface? A couple years ago, we came up with a way to do what YouTube's tools couldn't: we
© BBC
