The state of data generation

The internet is exploding, and content platforms are ground zero. Every day, roughly 1B stories are posted to Facebook, 95M photos and videos are posted to Instagram, 400K hours of video are streamed on Netflix, 650M tweets are posted, and 2B videos are watched on Twitter. The data produced and consumed through 2022 adds up to roughly 9.4e+22 bytes (94 trillion GB). By 2025, this is projected to reach 5.3e+20 bytes per day (~500 billion GB/day).

The state of ranking algorithms on content platforms

The success of any platform is ultimately determined by how well it can present interesting, relevant, and engaging content to its users. Platforms currently do this with content filtering and ranking algorithms: these algorithms filter the set of all content a user could be shown, rank each candidate item for that user, and determine what ultimately appears in the feed.
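
In rough terms, that pipeline has three stages: filter, rank, select. The sketch below illustrates the shape of it in Python; the Item fields, the engagement_score signal, and blocked_topics are hypothetical stand-ins for whatever signals a real platform actually uses.

```python
from dataclasses import dataclass

@dataclass
class Item:
    item_id: str
    topics: list[str]
    engagement_score: float  # platform-predicted likelihood of engagement

def build_feed(candidates: list[Item], blocked_topics: set[str], k: int = 10) -> list[Item]:
    """Filter out disallowed content, rank what remains, and return the top-k feed."""
    # 1. Filter: drop items the user should never be shown.
    eligible = [item for item in candidates if not set(item.topics) & blocked_topics]
    # 2. Rank: order the survivors by the platform's predicted engagement.
    ranked = sorted(eligible, key=lambda item: item.engagement_score, reverse=True)
    # 3. Select: the head of the ranking becomes the feed.
    return ranked[:k]
```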

Ranking algorithms on content platforms are centralized, i.e., they are controlled by the individual platforms. Platforms currently apply rigid and inconsistent policies to determine how content is filtered and ranked. Recommendations are largely passively determined: they are inferred from how users interact with and behave on the platform, which assumes that users fully communicate their interests through those interactions. This approach to content ranking has contributed to negative feedback loops for users, reinforced by cheap dopamine.

Content platforms have also given rise to public concerns about disinformation, election rigging, and social engineering. The Cambridge Analytica scandal, which broke in 2018 and centered on data harvested around the 2016 US election, is a good example, and we’re seeing the pattern play out again today with privacy, free speech, and social engineering concerns around TikTok and Twitter.

We’re at a point where many people believe the downsides of social media (cheap dopamine, user exploitation, increased social anxiety) outweigh the positives (personal connection, thought distribution, community building). In a sense, whether intentionally or not, social media has drifted from its mission of enhancing real-world experiences and maintaining and building connections toward a simpler question: how addicted can we make you? Time spent on a platform has become the target, and when a measure becomes a target, it ceases to be a good measure (Goodhart’s law).

Implication of generative AI

Generative AI is driving the cost of content creation toward zero. This will compound the already exponential growth of data generation on the internet. As the set of content a platform can present to a user explodes, effective filtering and ranking algorithms will become even more important to the platform and will draw even more attention from the public.

Legislative pressure

Since their inception, social media companies have insisted on self-governance. Today, users and legislators are calling for laws instead of platform guidelines. We are already seeing this in full effect: state-required social media literacy education in public schools, lawsuits alleging that school districts and students have been harmed by social media’s effects on youth mental health, and laws requiring audits of AI systems used for hiring.

This trend will continue. We can expect increased demand for broad legislative measures that provide greater transparency into algorithmic bias, safeguards for users, and accountability among platforms. I imagine legislators will even call these bills something like the Algorithmic Transparency Act or the Platform Responsibility Bill.

A path forward

We should give users a higher degree of control over the algorithms that filter and rank content. Rather than training algorithms passively on user activity, allow users to directly embed their values and interests into the ranking algorithms. This offers benefits to both sides: users can actively shape the content they are recommended and ensure it is aligned with their preferences, and platforms can demonstrate their commitment to responsible AI and transparency.

By giving users the ability to parameterize content recommendations, platforms can ensure that their algorithms are not driven solely by observed user activity but also reflect the values and interests users explicitly declare. Users get more personalized online experiences, while platforms reduce the risk of exposing users to content they find inappropriate or irrelevant.
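
As a concrete illustration of what parameterizing could look like, the sketch below blends the platform's engagement prediction with explicit, user-chosen topic weights, reusing the hypothetical Item type from the earlier sketch. The user_weights mapping, the alpha blending knob, and the scoring formula are all assumptions made for illustration, not any platform's actual API.

```python
def preference_score(item: Item, user_weights: dict[str, float]) -> float:
    """Average the user's declared topic weights (-1.0 to 1.0) over the item's topics."""
    weights = [user_weights.get(topic, 0.0) for topic in item.topics]
    return sum(weights) / len(weights) if weights else 0.0

def rank_with_user_values(
    candidates: list[Item],
    user_weights: dict[str, float],
    alpha: float = 0.5,
    k: int = 10,
) -> list[Item]:
    """Rank by a blend of platform engagement and user-declared preferences.

    alpha=1.0 reproduces pure engagement ranking; alpha=0.0 ranks purely on the
    user's stated values. Exposing alpha and user_weights to the user is one way
    to hand over direct control of the feed.
    """
    def blended(item: Item) -> float:
        return alpha * item.engagement_score + (1 - alpha) * preference_score(item, user_weights)
    return sorted(candidates, key=blended, reverse=True)[:k]

# Example: a user who boosts science content and down-weights outrage bait.
feed = rank_with_user_values(
    candidates=[
        Item("a", ["science"], engagement_score=0.4),
        Item("b", ["outrage"], engagement_score=0.9),
    ],
    user_weights={"science": 1.0, "outrage": -1.0},
    alpha=0.3,
)  # "a" outranks "b" despite its lower engagement score
```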

Linking recommendations directly to explicitly defined user interests embeds transparency into the recommendation algorithm itself. It also raises user accountability for the content they receive, allowing platforms to transfer part of that responsibility to users. The result is a more personalized and accountable online experience for users and a more transparent recommendation system.