Research · Qatar

Arabic social media index

A dialect-aware index of Arabic social media for a leading policy research institute — one hundred million posts across five networks, queryable in near real time.

100M+ Arabic posts ingested and indexed

5 networks unified into one corpus

~10× the granularity of public APIs

~70% research workload reduction

The challenge

What was in the way.

Platform APIs are rate-limited or paywalled, and Arabic content is under-represented in most open corpora.
Researchers scraped posts by hand or relied on periodic surveys, which capture sentiment only as snapshots.
Posts are frequently deleted or made private, leaving gaps in longitudinal studies.

What we built

The system, in brief.

Multi-platform collection

A crawler mesh across five networks

Dialect-aware parsing

LLM-guided parsers detect Arabic dialects and extract text

Vector enrichment

Sentiment

Researcher access

A query API and dashboards — keyword

Compliance layer

Personal data anonymized
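
As an illustration of what a compliance step of this kind might look like, here is a minimal sketch that pseudonymizes author handles and in-text mentions with a keyed hash. The case study says only that personal data is anonymized, so everything here is an assumption: the post fields (`author`, `text`), the function names, the token format and the choice of HMAC-SHA256 are all hypothetical, not a description of the delivered system.

```python
import hashlib
import hmac
import re

# Hypothetical secret; a real deployment would load it from a secrets store.
SECRET_KEY = b"replace-with-a-real-secret"

# Matches @handles made of word characters. In Python 3, \w on str patterns
# is Unicode-aware, so Arabic-script handles are matched as well.
MENTION_RE = re.compile(r"@\w+")


def pseudonymize(handle: str) -> str:
    """Map a handle to a stable token that cannot be reversed without the key."""
    digest = hmac.new(SECRET_KEY, handle.lower().encode("utf-8"), hashlib.sha256)
    return "user_" + digest.hexdigest()[:16]


def anonymize_post(post: dict) -> dict:
    """Return a copy of the post with author and @mentions pseudonymized."""
    cleaned = dict(post)  # shallow copy, the input post is left unchanged
    cleaned["author"] = pseudonymize(post["author"])
    cleaned["text"] = MENTION_RE.sub(
        lambda m: "@" + pseudonymize(m.group()[1:]),  # drop the leading "@"
        post["text"],
    )
    return cleaned
```

Because the same handle always maps to the same token, researchers could still follow an account across posts without seeing who it belongs to. Keyed hashing is pseudonymization rather than full anonymization, so a production pipeline would likely need further steps (for example, handling names in free text).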

Outcomes

What changed.

One hundred million Arabic posts indexed — historical backfill plus ongoing collection.
A unified five-network corpus with roughly ten times the granularity of public APIs.
The first Arabic index offering near-real-time sentiment and topic scores for policy and academic use.
Ad-hoc scraping and manual cleaning eliminated — research workload down about 70%.

Client referenced by sector and country · detailed references on request
