The corpus
We ingest continuously from a few hundred sources. As of this moment, the corpus stands at:
Language coverage spans the usual English-tier plus Arabic, Russian, Mandarin, Spanish, French, Portuguese, German, Hindi, Urdu, Persian, Turkish, Indonesian, Swahili, Korean, Japanese, Vietnamese, Bengali, Tamil, and a long tail of regional dailies. The 47-language figure is the count of ISO-639 codes we have active ingestion for — not a vanity claim about reading every language on Earth (see §7).
Publication coverage runs across wire services, named national and regional outlets, think tanks, open-source intelligence collectives, and social-surfaced primary sources. The top 20 by ingestion volume are published on request; the full list is in our internal config.
Cadence: continuous for wire-grade sources, hourly for named secondaries, daily-batched for the long tail. Retention: articles indefinitely, structured events indefinitely, entity-extraction cache 90 days.
The pipeline
Article becomes event becomes profile becomes report. The stack:
SOURCE (RSS · API · scrape)
│
▼
INGEST (normalize · dedupe · language-detect · translate-hint)
│
▼
ENTITY EXTRACTION (people · places · orgs · events · relations)
│
▼
STRUCTURED EVENT (geocoded · timestamped · typed · credibility-weighted)
│
▼
PROFILE AGGREGATION (country · conflict · person · publication)
│
▼
REPORT GENERATION (Sonnet 4.6 · Haiku · Llama-planner)
│
▼
EDITORIAL (human-authored · cited against the corpus)

Ingest normalizes HTML to canonical text, dedupes by URL and content hash, runs fastText language detection, and flags translation needs for downstream.
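The URL-plus-content-hash dedupe can be sketched as follows. This is illustrative only: `content_key`, the normalization, and the in-memory `seen` set are assumptions, not the production store.

```python
import hashlib

# In production this would be a persistent index; a set is enough for a sketch.
seen: set = set()

def content_key(canonical_text: str) -> str:
    # Hash the whitespace-normalized, lowercased canonical text so trivially
    # reformatted copies of the same article collide on the same key.
    normalized = " ".join(canonical_text.split()).lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

def is_duplicate(url: str, canonical_text: str) -> bool:
    # A document is a duplicate if we have already seen either its URL
    # or its content hash; otherwise record both keys and let it through.
    keys = {url, content_key(canonical_text)}
    if keys & seen:
        return True
    seen.update(keys)
    return False
```

Hashing normalized text rather than raw HTML is what lets the same wire story, syndicated under different URLs, collapse to one document.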
Entity extraction uses NER with a custom gazetteer for geopolitical proper nouns that off-the-shelf models misclassify. Geocoding resolves place mentions to ISO-3166 codes where the context is unambiguous; ambiguous references are left as text.
Structured event is our internal unit — a timestamped, geocoded, typed record of a thing that happened, with links back to the originating articles. Event typing is a closed taxonomy (~52 categories in V3, down from 8,200+ LLM-invented types in V2).
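As a sketch, a structured event record might look like the dataclass below. The field names and the example category string are invented for illustration; the production schema is not published here.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional

@dataclass
class StructuredEvent:
    event_type: str                 # one of the ~52 closed-taxonomy categories
    occurred_at: datetime           # timestamped
    country_iso: Optional[str]      # geocoded ISO-3166 code; None if ambiguous
    credibility_weight: float       # derived from source publication scores
    source_article_ids: List[str] = field(default_factory=list)  # links back

event = StructuredEvent(
    event_type="sanctions.announcement",   # hypothetical category name
    occurred_at=datetime(2025, 3, 14, 9, 30, tzinfo=timezone.utc),
    country_iso="IR",
    credibility_weight=0.82,
    source_article_ids=["a-10293", "a-10311"],
)
```

Leaving `country_iso` as `None` for ambiguous place mentions mirrors the geocoding rule above: resolve only when context is unambiguous.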
Profile aggregation rolls events into country and conflict rolling windows, plus person and publication profiles that feed credibility scoring.
Report generation (see §4) is the model-authored step. Always cited, always reviewed by a human before publish.
Editorial is human-authored on top of the same corpus, with the same citation discipline. We don’t publish model-written editorial.
The scoring
Credibility, confidence, stability — all run through explicit formulas, not model judgment.
Publication credibility score
A composite in [0, 100]:
- `age_score` · weight 0.15 · log of years-since-first-observed-byline, clipped at 30
- `volume_score` · weight 0.20 · log of distinct author count, clipped at 500
- `cross_ref_score` · weight 0.30 · fraction of our ingested articles that cite this publication by name
- `whitelist_bonus` · weight 0.20 · flat bonus for known-reliable outlets (wire services, academic presses)
- `recency_score` · weight 0.15 · how recently the publication has published in our window
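The composite can be sketched as a weighted sum. The weights are the published ones; the per-component scaling functions (`age_score`, `volume_score`) are assumptions made to keep every component in [0, 100], not the production formulas.

```python
import math

# Weights as published above; they sum to 1.0.
WEIGHTS = {
    "age_score": 0.15,
    "volume_score": 0.20,
    "cross_ref_score": 0.30,
    "whitelist_bonus": 0.20,
    "recency_score": 0.15,
}

def age_score(years_since_first_byline: float) -> float:
    # Log of years since first observed byline, clipped at 30, scaled to [0, 100].
    years = min(max(years_since_first_byline, 0.0), 30.0)
    return 100.0 * math.log1p(years) / math.log1p(30.0)

def volume_score(distinct_authors: int) -> float:
    # Log of distinct author count, clipped at 500, scaled to [0, 100].
    authors = min(max(distinct_authors, 0), 500)
    return 100.0 * math.log1p(authors) / math.log1p(500)

def credibility(components: dict) -> float:
    """Weighted composite in [0, 100]; each component already in [0, 100]."""
    return sum(WEIGHTS[name] * components[name] for name in WEIGHTS)

score = credibility({
    "age_score": age_score(12),
    "volume_score": volume_score(240),
    "cross_ref_score": 35.0,   # citation fraction, expressed on a 0-100 scale
    "whitelist_bonus": 100.0,  # flat bonus: outlet is whitelisted
    "recency_score": 80.0,
})
```

Because the weights sum to 1 and each component lives in [0, 100], the composite is guaranteed to stay in [0, 100] without a final clamp.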
Weights are reviewed quarterly and published in the “State of the dataset” post (§8). When we change a weight, we announce it, restate the formula, and regenerate scores. We do not retroactively alter published reports; the score at time-of-publish is what stands.
Seven-pillar stability score
A country-level composite, weighted across political, security, economic, regulatory, operational, institutional, and societal pillars. Each pillar pulls from structured corpus signals over a rolling 90-day window. Known weakness: over-rewards authoritarian stability (Gulf monarchies score higher than we believe they should). Rewrite on the Q3 roadmap.
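Structurally, the pillar composite is another weighted sum over the seven named pillars. The equal weights below are a placeholder assumption; the production weights are not stated on this page.

```python
PILLARS = ("political", "security", "economic", "regulatory",
           "operational", "institutional", "societal")

def stability_score(pillar_scores: dict, weights: dict = None) -> float:
    # Each pillar score is assumed to be in [0, 100] over the 90-day window.
    if weights is None:
        # Placeholder: equal weights, since the real ones are unpublished.
        weights = {p: 1.0 / len(PILLARS) for p in PILLARS}
    return sum(weights[p] * pillar_scores[p] for p in PILLARS)

# A country scoring 50 on every pillar composites to 50 under equal weights.
score = stability_score({p: 50.0 for p in PILLARS})
```

The over-rewarding of authoritarian stability noted above is a property of the pillar inputs, not the weighting: a regime that suppresses visible political and societal churn scores well on those pillars regardless of underlying fragility.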
Report confidence
Classified HIGH · MEDIUM · LOW: HIGH requires external-source count ≥ 10 and total-citation count ≥ 20; MEDIUM requires external ≥ 5 or total ≥ 12; otherwise LOW. Always visible on the report detail page.
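The thresholds translate directly into a classifier; the function name is ours, but the cutoffs are exactly the published ones.

```python
def report_confidence(external_sources: int, total_citations: int) -> str:
    # HIGH needs both conditions; MEDIUM needs either; everything else is LOW.
    if external_sources >= 10 and total_citations >= 20:
        return "HIGH"
    if external_sources >= 5 or total_citations >= 12:
        return "MEDIUM"
    return "LOW"
```

Note the asymmetry: a report with 12 external sources but only 18 total citations is still MEDIUM, because HIGH requires both thresholds.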
The models
| Model | What we use it for | What we don’t |
|---|---|---|
| Claude Sonnet 4.6 | Primary report authoring · editorial-room suggestions · composer side-car | Fact-assertion without dataset citation · off-corpus reasoning at load-bearing weight |
| Claude Haiku | Fallback author for cost-sensitive batches · tag suggestion · short summaries | Anything shown to a paying customer under “Sonnet output” branding |
| Llama (planner) | Decomposing a report prompt into structured corpus queries | Writing the final report copy |
We use LLMs to assemble citations faster. We do not use LLMs to decide what is true.
Model choices get updated when the frontier moves. When we change the primary, we say so in the next “State of the dataset” post along with the eval deltas that justified the swap.
Sourcing hierarchy
Every article citation and report citation carries one of these five labels inline. Labels are visible to readers, not internal-only.
- Primary — the subject’s own statement, the raw document, the court filing, the satellite image
- Wire — Reuters, AP, AFP, Bloomberg — high-recency, factual, low-analysis
- Secondary — named regional publications with track records in our
publication_scorestable - Opinion — columnists, think-tank papers, blog posts — identified as opinion inline
- Counter-narrative — explicitly sourced from the perspective we’re arguing against, required in Red Team pieces
When a piece argues against a consensus, counter-narrative sources are non-negotiable. A Red Team piece without a named counter-narrative source doesn’t ship.
Corrections policy
Three commitments:
- Itemized, not aggregated. When we’re wrong, we say what we got wrong, in a post of its own or as a dated update block on the original piece. No stealth edits.
- Dated-update blocks. A correction to a report increments the
versionon the row and renders a visible “Updated — see change log” strip at top of the report page. - The “we were wrong” editorial. When a call was substantially wrong, we publish a standalone piece on /editorial analyzing why. These are pinned for 30 days.
What we don't cover
Named gaps. Not a comprehensive list — just the ones we get asked about.
- Minor-language coverage — Pashto, Uzbek, Tigrinya, Quechua, Māori, and about a dozen more. Roadmap Q3.
- Closed-regime internal dynamics — North Korea, Turkmenistan, Eritrea. We read what leaks and cite it; we do not pretend to insider knowledge.
- Tactical military analysis — we cover strategic posture, not battle-damage-assessment-on-the-hour. Other people do it better.
- Financial market calls — we read market signals as inputs to geopolitical analysis; we do not publish trading views.
- US domestic politics — except where it materially affects foreign-policy execution.
State of the dataset
Quarterly, we publish a long-form methodology post on /editorial that opens up the dataset. Each issue itemizes:
- Corpus deltas — articles, events, reports, authors · absolute + QoQ
- Language coverage delta — what was added, what’s still missing
- Publication credibility distribution — histogram, top additions and top demotions
- Scoring formula changes — any weight change in
publication_scoresor the stability score, with before/after and rationale - Model changes — primary · fallback · planner
- Corrections ledger — every correction issued this quarter, itemized, with links
- Forecast scoring — last quarter’s calls, graded against outcomes
- Gaps fixed · gaps still open
The first “State of the dataset” ships within 30 days of this page going live. It is the first major editorial piece in the queue after launch.
We publish these four times a year — mid-January, mid-April, mid-July, mid-October — pinned to /editorial for 14 days before sliding into the archive.
