Research · Qatar

Arabic social media index

A dialect-aware index of Arabic social media for a leading policy research institute: more than one hundred million posts across five networks, queryable in near real time.

100M+ Arabic posts ingested and indexed
5 networks unified into one corpus
~10× finer granularity than public APIs
~70% reduction in research workload

The challenge

What was in the way.

  • Platform APIs are rate-limited or paywalled, and Arabic content is under-represented in most open corpora.
  • Researchers manually scraped posts or relied on periodic surveys that captured sentiment only in snapshots.
  • Posts are frequently deleted or made private, leaving gaps in longitudinal studies.

What we built

The system, in brief.

Multi-platform collection

A crawler mesh across five networks

Dialect-aware parsing

LLM-guided parsers detect Arabic dialects and extract text

Vector enrichment

Sentiment

Researcher access

A query API and dashboards — keyword

Compliance layer

Personal data anonymized
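
One way a compliance step like this could look, as a minimal sketch only: the source does not say how personal data is anonymized, so the keyed-hash approach, the `SECRET_KEY` value, and the `pseudonym` and `scrub_post` helpers below are all illustrative assumptions. Strictly speaking this is pseudonymization (stable tokens in place of handles), which keeps accounts linkable across posts for longitudinal study without exposing the handle itself.

```python
import hashlib
import hmac
import re

# Hypothetical key for illustration; a real deployment would load it
# from a secrets manager and never hard-code it.
SECRET_KEY = b"example-key-not-for-production"

# \w is Unicode-aware in Python 3, so Arabic-script handles match too.
MENTION_RE = re.compile(r"@(\w+)")


def pseudonym(handle: str, key: bytes = SECRET_KEY) -> str:
    """Map a handle to a stable keyed token (same handle, same token)."""
    digest = hmac.new(key, handle.lower().encode("utf-8"), hashlib.sha256)
    return "user_" + digest.hexdigest()[:12]


def scrub_post(text: str, key: bytes = SECRET_KEY) -> str:
    """Replace every @mention in a post with its pseudonym."""
    return MENTION_RE.sub(lambda m: "@" + pseudonym(m.group(1), key), text)


if __name__ == "__main__":
    print(scrub_post("شكرا @Ahmed_Q على التوضيح"))
```

Because the hash is keyed, tokens cannot be recomputed from a list of known handles without the key; rotating or destroying the key severs the link entirely.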

Outcomes

What changed.

  • More than one hundred million Arabic posts indexed, combining historical backfill with ongoing collection.
  • A unified five-network corpus with roughly ten times the granularity of public APIs.
  • The first Arabic index offering near-real-time sentiment and topic scores for policy and academic use.
  • Ad-hoc scraping and manual cleaning eliminated, cutting research workload by about 70%.

Client referenced by sector and country · detailed references on request