Data Access

Social media research is powered by data, but understanding which data is available and how to access it can be a challenging task for researchers. Our Data Access Resource simplifies this process by outlining the types of data offered by each platform, their vetting procedures, and how these connect to the data access obligations under the Digital Services Act.

 

DATA ACCESS JOURNEY

Social media platforms play a pivotal role in shaping public discourse, whether during electoral campaigns or events that impact citizens' lives. To analyse these dynamics and understand the internet's influence on society, researchers need reliable data. But how can they access it?

Under Article 40(12) of the Digital Services Act (DSA), very large online platforms (VLOPs) and very large online search engines (VLOSEs) must give researchers, including those affiliated with non-profit organisations, access without undue delay to publicly accessible data, including real-time data where technically possible. Previously, platforms like Meta offered such access voluntarily through tools like the now-defunct CrowdTangle, but the DSA has made it a mandatory obligation with certain standards. This change has also influenced smaller platforms, many of which now provide dedicated programmes to facilitate data access for researchers.

 

Accessing social media data often involves navigating platform-specific criteria and vetting processes. Researchers may also need to use technologies such as APIs, user interfaces, or Secure Processing Environments (SPEs). Only after meeting these requirements and receiving approval can they access the data for in-depth, systematic analysis.

 

DATA ACCESS OVERVIEW

The table above provides information about possible points of access that researchers can consider when designing their research. In 2024, we expanded our analysis to include new platforms, such as Threads and Bluesky, the latter of which has seen significant growth in users. Reflecting the changes introduced by the DSA, this year’s focus is exclusively on data access programmes designed specifically for researchers, excluding APIs meant for developers or commercial purposes.

APIs remain the most common method for platforms to provide research access. However, some platforms also offer other data access mechanisms, such as user interfaces and limited scraping.

 

Most platforms provide data access without location-based restrictions, covering all countries and regions. Some exceptions exist, though. Platforms that grant access solely to meet DSA requirements, such as LinkedIn and X, may limit access to research focused on systemic risks within the European Union.

 

WHAT DATA ARE ACCESSIBLE FOR RESEARCHERS?

In 2024, we took a closer look at the data points social media platforms offer to researchers. Using API codebooks (the documentation in which platforms outline what data they provide), we gained insight into what is available. Because our analysis is based on what platforms claim to offer, we cannot verify the actual quality or consistency of the data researchers receive after gaining access.

The good news? Many platforms are expanding the range of data points and metadata they make available. LinkedIn stands out as a notable exception, offering significantly less detailed documentation.

 

That said, important gaps persist. Key features such as shorts, reels, and stories, all hugely popular with users, remain off-limits for researchers. These are critical components of today’s social media landscape, and platforms should prioritise making them accessible. Additionally, some platforms do not include URLs for posts or comments, making it difficult for researchers to verify API data against what is visible on the platform itself.
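As a rough illustration of the URL gap, the following sketch flags records in an API export that carry no link back to the platform and therefore cannot easily be checked against the live page. The record layout and the field names (`id`, `url`) are hypothetical; each platform's codebook defines its own.

```python
from typing import Iterable


def records_missing_url(records: Iterable[dict], url_field: str = "url") -> list[dict]:
    """Return the records that carry no usable link back to the platform."""
    missing = []
    for record in records:
        value = record.get(url_field)
        # Treat an absent field, None, and a blank string alike as "no URL".
        if not isinstance(value, str) or not value.strip():
            missing.append(record)
    return missing


# Hypothetical sample export; real field names vary by platform.
sample = [
    {"id": "post-1", "text": "...", "url": "https://example.org/post/1"},
    {"id": "post-2", "text": "..."},
    {"id": "comment-3", "text": "...", "url": ""},
]

unverifiable = records_missing_url(sample)
print([record["id"] for record in unverifiable])
```

A check like this only shows which records lack a link; it says nothing about whether the linked content still matches what the API returned.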

 

Access methods also present challenges. While platforms often allow researchers to view posts and comments, downloading this data is sometimes restricted. Access to richer media, like photos and videos, is often limited to Secure Processing Environments (SPEs), also known as cleanrooms, where data is deleted regularly, which makes replicating results nearly impossible. Some platforms also require VPN connections or track researcher queries, raising valid concerns about surveillance.

 

AN INSIDE LOOK AT VLOPS' VETTING PROCESSES

The table above highlights key points about VLOPs' vetting processes that researchers should consider before applying for data access. Overall, platforms have made the vetting process for researchers more demanding, a shift from tools like CrowdTangle, which once allowed simple applications.

Transparency Reports under the Code of Practice on Disinformation and the platform audit reports offer a slightly clearer understanding of these processes. These reports reveal how many researchers apply for access, how many are approved, what criteria platforms use to assess applications, and how long decisions take.

 

DATA ACCESS SCORE

Building on our earlier analyses, we developed a methodology to evaluate platforms based on two key factors: the data points they make available and the vetting processes researchers must navigate. While platforms are making strides in offering more data points, the vetting process has become noticeably more restrictive and complex.
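The scoring rubric itself is not reproduced on this page, so the following is only a minimal sketch of how two such factors could be combined into a single score. The 0-to-1 scales, the equal default weighting, the function name `data_access_score`, and the sample values are all illustrative assumptions, not the methodology behind the published scores.

```python
def data_access_score(data_points: float, vetting: float,
                      weight_data: float = 0.5) -> float:
    """Combine two sub-scores in [0, 1] into one weighted score in [0, 1].

    data_points: how much data a platform makes available (assumed scale).
    vetting:     how accessible the vetting process is (assumed scale).
    weight_data: share of the total given to data_points (assumed 0.5).
    """
    inputs = {"data_points": data_points, "vetting": vetting,
              "weight_data": weight_data}
    for name, value in inputs.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")
    return weight_data * data_points + (1.0 - weight_data) * vetting


# Hypothetical platform: generous data points, restrictive vetting.
print(round(data_access_score(0.75, 0.25), 2))
```

A weighted average is only one possible design. It lets a strong data offer offset a restrictive vetting process, which may or may not be desirable in a real rubric.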

 

Another critical issue is the quality and consistency of the data provided, which can only be assessed once researchers gain access. While this was beyond the scope of our current analysis, it remains a vital area for future investigation.




 
More resources about data access:

The Data Access Problem: Limitations on Access to Public Data on VLOPs

Decoding Access to Social Media Data: Insights from the CoP Compliance Report

The DSA must ensure public data for public interest research

What the Scientific Community Needs from Data Access under Art. 40 DSA

DRI’s Feedback to the Delegated Regulation on Data Access

Access Granted: Why the European Commission Should Issue Guidance on Access to Publicly Available Data Now