Big Tech CEOs have become a regular sight on Capitol Hill, called in time and time again to testify before Congress about their misinformation practices and policies. These debates often revolve around what’s been called the “false take-down/leave-up binary,” where the central question is whether platforms should allow misleading content (however that’s defined) or not. A quick scroll through platform policies, however, reveals a variety of intervention tactics beyond simple removal, including labeling, downranking, and information panels. When this range of approaches to misinformation is considered, far more fundamental questions arise: When should each approach be used, why, and who gets to decide?
To date, there has been no public resource to understand and interrogate the landscape of options that might be used by platforms to act on misinformation. At the Partnership on AI (PAI), we have heard from our Partner community across civil society, academia, and industry that one obstacle to understanding what is and isn’t working is the difficulty of comparing what platforms are doing in the first place. This blog post is presented as a resource for doing just that. Building on our previous research on labeling manipulated and AI-generated media, we now turn our attention to identifiable patterns in the variety of tactics, or “interventions,” used to classify and act on information credibility across platforms. This can then provide a broader view of the intervention landscape and help us assess what’s working.
In this post, we will look at several interventions: labeling, downranking, removal, and other external approaches. Within these interventions, we will look at patterns in what type of content they’re applied to and where the information for the intervention is coming from. Finally, we will turn to platform policies and transparency reports to understand what information is available about the impact of these interventions and the motivations behind them.
This post is intended as a first step, providing common language and reference points for the intervention options that have been used to address false and misleading information. Given the breadth of platforms and examples, we recognize that our references are far from comprehensive, and not all fields are complete. With that in mind, we invite readers to explore and add to our public database with additional resources to include. As a result of our collective work, platforms and policymakers can learn from these themes to design more informed and valuable interventions in the future and better debate what it means for an intervention to be valuable in the first place.
Our process for defining interventions
It seems like every other day a platform announces a new design to fight misinformation, so how did we decide which interventions to categorize? We started by comparing a non-comprehensive subset of several dozen interventions (and counting) on top social media and messaging platforms, selected by usage statistics, for an initial categorization of intervention patterns. We also included some platforms with lower usage statistics, including Twitter, due to its prominent interventions and the interest amongst Partnership on AI Partners in our AI and Media Integrity Program.
We included any intervention we found that was related to improving the overall credibility of information on a platform. That means the focus of interventions is not always limited to “misinformation” (the inadvertent sharing of false information) but also disinformation (the deliberate creation and sharing of information known to be false), as well as more general approaches that aim to amplify credible information. Note that public documentation of these interventions varies widely, and in some cases may be outdated or incomplete. In general, we based our intervention findings on conversations with PAI Partners, available press releases, platform product blogs, and external press coverage. If you see something to add or correct, let us know in the submission form for our intervention database!
In order to organize the patterns across interventions, we classified them according to three characteristics: 1) type of intervention, 2) element being targeted, and 3) the source of information for the intervention. These characteristics emerged as we noted the key differences between each intervention. Apart from the surface design features of the intervention, we realized it was key to address what aspect of a platform the design is applied to (for example, labeling on individual posts vs. accounts) as well as where the information was coming from, or the source. We’ll walk through these high-level patterns, summarized in the table below:
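To make this three-part classification concrete, each intervention in our database can be thought of as a small record. The sketch below is our own illustration (the field names, enum values, and example entry are assumptions for clarity, not a platform API or the database’s actual schema):

```python
from dataclasses import dataclass
from enum import Enum

class InterventionType(Enum):
    LABEL = "label"
    DOWNRANK = "downrank"
    REMOVAL = "removal"
    OTHER = "other"  # e.g. digital literacy campaigns

class TargetElement(Enum):
    POST = "post"
    ACCOUNT_OR_GROUP = "account_or_group"
    FEED = "feed"
    EXTERNAL = "external"

@dataclass
class Intervention:
    platform: str
    kind: InterventionType   # 1) type of intervention
    target: TargetElement    # 2) element being targeted
    source: str              # 3) source of information for the intervention

# Hypothetical example entry: TikTok's generic COVID-19 vaccine banner
banner = Intervention(
    platform="TikTok",
    kind=InterventionType.LABEL,
    target=TargetElement.POST,
    source="authoritative health organizations",
)
```

Structuring entries this way is what lets patterns emerge: grouping records by any one of the three fields surfaces the comparisons discussed in the sections that follow.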
1. What is the type of intervention?
There are many ways platforms might intervene on posts that they classify as false or misleading (more on the complexities of such classifications in part three). You might already be familiar with some tactics, such as fact-checking labels or the removal of posts and accounts. Others, like downranking posts so they appear less often in social media feeds, you may not have thought of or even be aware of. We refer to these high-level approaches employed by platforms as “interventions,” or “intervention types.”
Note that the visual and interaction design of interventions can vary widely, even for interventions of similar types (e.g. veracity labels on Facebook compared to those on Twitter feature different terminology, colors, shapes, and positions). In this post we focus on general approaches, rather than comparing specific design choices within types.
Labels are one of the more noticeable and varied types of interventions, especially as platforms like Facebook have ramped up labeling efforts, applying labels to millions of COVID-19-related posts since 2020. We define labels as any kind of partial or full overlay on a piece of content that is applied by platforms to communicate information credibility to users.
Informational banner on TikTok about COVID-19 vaccines.
However, labels are far from alike in their design: in particular, we differentiate between credibility labels and contextual labels. Credibility labels provide explicit information about credibility, including factual corrections (for example, a “false” label, also known as “veracity label” in a review by Morrow and colleagues in 2021). Contextual labels, on the other hand, simply provide more related information without making any explicit claim or judgement about the credibility of the content being labeled (for example, TikTok detects videos with words related to COVID-19 vaccines and applies a generic informational banner to “Learn more about COVID-19 vaccines”).
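The contextual-label pattern can be illustrated with a toy keyword trigger. This is purely a sketch under our own assumptions (the term list and banner text are illustrative; real detection systems are far more sophisticated than substring matching):

```python
from typing import Optional

# Hypothetical trigger terms for a vaccine-related contextual banner
COVID_VACCINE_TERMS = {"covid vaccine", "covid-19 vaccine", "pfizer", "moderna"}

def contextual_banner(caption: str) -> Optional[str]:
    """Return a generic informational banner if the post mentions a
    vaccine-related term. Note what this does NOT do: it makes no
    judgement about whether the post itself is true or false."""
    text = caption.lower()
    if any(term in text for term in COVID_VACCINE_TERMS):
        return "Learn more about COVID-19 vaccines"
    return None
```

A credibility label, by contrast, would require a verdict about the specific claim, which is why (as discussed below) it typically depends on fact-checkers or other intervention sources rather than a simple topic match.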
Beyond this, label designs can vary in other crucial ways, such as the extent to which they create friction for a user or cover a piece of content. Labels may be a small tag added alongside content or may make it more difficult to open the content. Each choice may have profound implications for how any given user will react to that content. For a more thorough discussion of the tradeoffs in design choices around labeling posts, you can check out our 12 Principles for Labeling Manipulated Media.
Platforms with user-generated content, such as Facebook and TikTok, use various signals to rank what content appears to users and how. The same ranking infrastructure used to enhance user engagement has also been used to prioritize content based on credibility signals. For example, Facebook has used a “news ecosystem quality” (NEQ) score to uprank certain news sources over others. Conversely, downranking can reduce the number of times content appears in other users’ social media feeds, often algorithmically. For example, Facebook downranks “exaggerated or sensational health claims, as well as those trying to sell products or services based on health-related claims.” At the extreme end of this spectrum, content may even be downranked to 0, or “no ranking,” meaning content will not be taken off of a platform, but it will not be algorithmically delivered to other users in their feeds. The ranking score of any given piece of content remains opaque across platforms, making it hard to point to examples that received a low ranking (that is, were “downranked”) versus those with no ranking.
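The spectrum from upranking to “no ranking” can be sketched as a toy scoring function. This is our own simplification, not Facebook’s actual NEQ implementation; the score names and multipliers are invented:

```python
def feed_score(engagement_score: float, credibility_multiplier: float) -> float:
    """Combine an engagement prediction with a credibility signal.

    A multiplier > 1.0 upranks (e.g. a high news-quality score),
    a multiplier < 1.0 downranks, and 0.0 is "no ranking": the post
    stays on the platform but is never algorithmically delivered.
    """
    return engagement_score * credibility_multiplier

posts = [
    ("trusted outlet article", feed_score(0.6, 1.5)),          # upranked
    ("sensational health claim", feed_score(0.9, 0.2)),        # downranked
    ("repeat misinformation post", feed_score(0.8, 0.0)),      # no ranking
]

# Only posts with a positive score are delivered to feeds, highest first
ranked = [name for name, score in
          sorted(posts, key=lambda p: p[1], reverse=True) if score > 0]
```

Note how the “no ranking” post is indistinguishable from removal in the feed, yet the content itself remains on the platform, which is exactly why downranking is so hard to observe from the outside.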
Removal is perhaps the most self-explanatory—and often most controversial—approach. We define removal as the temporary or permanent removal of any type of content on a platform. For example, Facebook, YouTube, and others removed all detected instances of “Plandemic,” a COVID-19 conspiracy theory video, from their platforms in May 2020.
Other (e.g. digital literacy campaigns)
Though labeling, downranking, and removal are the most prevalent types of approaches, platforms also employ other methods related to promoting digital literacy and reducing conflict in relationships. We’ll discuss more specific examples in the next section.
2. What element is being targeted?
While a lot of attention has been given to platform actions on individual posts, interventions act on many different levels. To understand the intervention landscape, it’s worth knowing and considering what element on a platform is being targeted. In assessing interventions, we found that different approaches act on different scopes of content: posts, accounts and groups, entire feeds, and external efforts.
Post-level interventions are arguably the most visible and salient to users, as platforms indicate that specific posts of interest have been flagged and removed. This sometimes seems to trigger a “Streisand effect” in which the flagged posts receive additional attention for having been flagged. (This is especially true when the poster is a prominent public figure, such as former President Donald Trump.) In addition to credibility labels with explicit corrections such as Facebook and Instagram “false information” ratings, interventions on posts can also include contextual labels that simply provide more information, such as TikTok’s labels on posts tagged with vaccine information encouraging users to “Learn more about the COVID-19 vaccine.”
Additionally, some post-level interventions like downranking are by definition less visible, as posts flagged for downranking, for example on Facebook due to exaggerated health claims, are distributed less on social media feeds. In these cases, users may only suspect that an intervention has taken place without being able to confirm it. Finally, post-level interventions also include sharing or engagement restrictions, such as WhatsApp’s limits on forwarding messages to more than five chats at once.
In many cases, these post-level interventions may be applied in tandem. For example, when Facebook adds a fact-checking label, the post is also downranked, and when Twitter labeled certain Trump tweets following the 2020 election for containing misleading information, liking and sharing were also disabled.
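The bundling of post-level actions can be sketched as a single routine that applies a label, a downrank, and engagement restrictions together. All names here are hypothetical (platforms do not expose such an API), and the specific values are illustrative:

```python
def apply_false_rating(post: dict) -> dict:
    """Bundle the post-level actions described above: a credibility
    label, downranking, and engagement restrictions applied together."""
    post = dict(post)  # copy so the caller's original is untouched
    post["label"] = "False information. Checked by independent fact-checkers."
    post["rank_multiplier"] = 0.2   # distributed less in feeds
    post["liking_enabled"] = False  # engagement restrictions
    post["sharing_enabled"] = False
    return post

original = {"id": 123, "text": "..."}
restricted = apply_false_rating(original)
```

Bundling matters for evaluation: when several actions always co-occur, it becomes difficult to attribute any observed effect to the label, the downranking, or the sharing restriction individually.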
Accounts and Groups
Account/group interventions target a specific user or group of users. When labeled, they are typically contextual in nature, offering identity verification according to platform-specific processes, or else surfacing relevant information about an account or group’s origin, such as the account’s country or if it is state-sponsored.
Accounts and groups are also subject to downranking and removal. Sometimes this is temporary or conditional until certain changes are made, such as the deletion of an offending post. Other times it is permanent. For example, platforms like Twitter have released guidelines detailing different account actions taken according to a five-strike system.
Five strikes leading to permanent suspension on Twitter.
Instead of targeting individual posts or accounts, some interventions affect an entire platform ecosystem. Examples of feed-level interventions include the “shadow banning” of certain tags, keywords, or accounts across a platform, preventing them from appearing in search results. It is not always clear what feed-level actions are taking place, leading to widespread suspicion and speculation of bias—for example, the debunked idea that conservative accounts and keywords are systematically downranked and banned across platforms like Facebook for ideological reasons.
Users on Pinterest who searched terms like “2020 Census” saw a banner linking to the U.S. Census Bureau’s website.
There are feed-level labels as well, such as information hubs and information panels that are displayed prominently on platforms without being attached to particular posts. The banners shown on Twitter, Facebook, and Instagram ahead of the 2020 U.S. elections, which linked to election resources, are one prominent example. Other feed-level labels only appear when triggered by search, and these can take the form of both credibility and contextual labels. Google, for example, highlights a fact-check if a query matches an entry in the ClaimReview database. And on Pinterest, merely searching for a keyword related to a misinformation-prone topic like “census” results in a banner linking to additional information.
Finally, in some cases, platforms don’t depend on labels, removal, or ranking, and instead aim to promote digital literacy education, either using embedded digital literacy educators and fact-checkers or working outside of a platform environment entirely. This tactic is particularly useful in closed messaging environments where content can’t be easily monitored for privacy reasons. For this reason, platforms like WhatsApp have announced funding for seven fact-checking organizations to embed themselves in groups and find other relational approaches to promote credibility. In other cases, the intervention involves direct support of partner sources identified by a platform as credible to create ads or other content to be amplified to users.
3. What are the intervention sources?
In making intervention decisions, platforms must decide what to intervene on. They currently rely on a variety of sources both to identify the need to intervene and to provide what they consider authoritative information. We refer to these actors and institutions as “intervention sources,” and in many ways the quality of an intervention can only be as good or trustworthy as its source, regardless of other design factors. These intervention sources include different systems, both human and algorithmic. Below we describe sources including crowds, fact-checkers, authoritative sources, and user metadata.
Very few crowd-based rating systems for misinformation currently exist publicly. In 2021, Twitter released Birdwatch in beta. The feature allows users to add notes with ratings about the credibility of posts. Others may then rate these notes, and the most “helpful” notes are surfaced first.
An example “note” on the Birdwatch platform.
An early study from Poynter observed very low engagement with the feature, as well as evidence of politicized notes. Indeed, ensuring quality of notes and preventing organized gaming of ratings by motivated political actors remains a challenge for any crowd-based intervention at scale—a challenge that Twitter itself is attentive to.