Methodology
This project analyzes party platforms with the benefit of hindsight,
grading policies, politicians, and governments on long-term outcomes
rather than on intentions or popularity at the time.
Data Sources
Analysis Approach
Each platform is analyzed using GPT 5.1 with a structured prompt (a sketch of the prompt assembly follows the two lists below) that focuses on:
- Distinctive ideas and mechanisms - What unique proposals were made, beyond generic party positions?
- How ideas aged - Did they align with the subsequent direction of travel on their issues?
- Policy diffusion - Were ideas later adopted by others, even if not by the proposing party?
- Directional prescience - Did the idea correctly anticipate future problems, even if unpopular or reversed?
- Durability and execution - If enacted, did it survive, scale, and produce intended effects?
The analysis separates two concepts that are often confused. A policy can score highly on prescience even if it was later undone, since reversal can reflect an idea being "ahead of its time" or having weak coalition support; conversely, a popular or enacted policy can still score poorly if it aged badly.
For platforms from parties that did not win the presidency, the analysis focuses on:
- Agenda-setting influence - Did it shift the Overton window or policy conversation?
- Adoption and diffusion - Were parts later implemented by others, and in what form?
- Diagnostic accuracy - Was it right about what the coming problems would be?
- Mechanism quality - Would the proposed approach likely have worked given constraints at the time?
- How it aged - Does it look wiser or more naïve in hindsight?
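The two rubrics above are essentially alternative checklists handed to the model. A minimal sketch of how they might be assembled into a structured prompt, assuming a simple template approach (the function name, variable names, and prompt wording are illustrative, not the project's actual code):

```python
# Illustrative sketch: the dimension lists mirror the rubrics above,
# but all names and the prompt wording are assumptions.

WINNER_DIMENSIONS = [
    "Distinctive ideas and mechanisms",
    "How ideas aged",
    "Policy diffusion",
    "Directional prescience",
    "Durability and execution",
]

NON_WINNER_DIMENSIONS = [
    "Agenda-setting influence",
    "Adoption and diffusion",
    "Diagnostic accuracy",
    "Mechanism quality",
    "How it aged",
]

def build_prompt(platform_text: str, won_presidency: bool) -> str:
    """Assemble a structured grading prompt for one platform."""
    dimensions = WINNER_DIMENSIONS if won_presidency else NON_WINNER_DIMENSIONS
    rubric = "\n".join(f"- {d}" for d in dimensions)
    return (
        "Analyze this platform with the benefit of hindsight, grading on "
        "long-term outcomes rather than intentions or popularity.\n\n"
        f"Assess each of the following dimensions:\n{rubric}\n\n"
        f"Platform text:\n{platform_text}"
    )
```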
Each analysis produces a summary of key commitments, a retrospective account of what actually happened, and machine-readable grades for individuals, policies, and the party overall.
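The machine-readable portion might be shaped roughly like this (a sketch only; the field names and example values are invented for illustration, not the project's actual schema):

```python
# Hypothetical shape of one analysis record; every name and value
# below is an invented placeholder, not real project output.
analysis_record = {
    "summary": "Key commitments the platform made ...",
    "retrospective": "What actually happened in the years after ...",
    "grades": {
        "individuals": {"Jane Doe": "B"},
        "policies": {"Universal widget subsidy": "C"},
        "party_overall": "B",
    },
}
```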
Grading Criteria
All grades are based on outcomes and long-term impact, not intentions or contemporary reception:
| Grade | Meaning | Criteria |
| --- | --- | --- |
| A | Transformative | Prescient, effective, lasting positive legacy. Policy endured and is now consensus. |
| B | Competent | Generally successful with some limitations. Achieved most objectives. |
| C | Mixed | Significant gaps between promises and delivery. Partial implementation or temporary impact. |
| D | Failed | Largely failed to deliver. Major broken promises or negative consequences. |
| F | Disastrous | Significantly harmful, completely abandoned, or actively repudiated. |
What We're NOT Measuring
- Popularity at the time - A policy can be popular yet fail in practice, earning a low grade
- Good intentions - Noble aims don't count if execution failed
- Electoral success - Winning elections is noted but doesn't affect policy grades
What We ARE Measuring
- Implementation rate - Were promises kept?
- Durability - Did subsequent governments retain or reverse the policy?
- Effectiveness - Did the policy achieve its stated goals?
- Unintended consequences - What unforeseen effects emerged?
- Historical verdict - How does history judge this decision?
Hall of Fame Aggregation
The Hall of Fame ranks individuals, policies, and parties by averaging their grades across all manifesto analyses in which they appear (a sketch of this aggregation follows the list below). This identifies:
- Consistently prescient figures who made good calls repeatedly
- Consistently wrong figures whose judgment proved flawed
- Durable policies that stood the test of time
- Failed experiments that were abandoned or reversed
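A minimal sketch of the averaging step, assuming a conventional A-F to 4-0 point mapping (the project's actual numeric mapping is not specified above):

```python
# Sketch: average letter grades across analyses. The A=4 .. F=0
# mapping is a conventional assumption, not a documented choice.
GRADE_POINTS = {"A": 4, "B": 3, "C": 2, "D": 1, "F": 0}
POINTS_TO_GRADE = {v: k for k, v in GRADE_POINTS.items()}

def average_grade(letter_grades: list[str]) -> str:
    """Average a figure's grades and round back to a letter."""
    mean = sum(GRADE_POINTS[g] for g in letter_grades) / len(letter_grades)
    return POINTS_TO_GRADE[round(mean)]

# e.g. average_grade(["A", "B", "B"]) -> "B"
```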
⚠️ Limitations & Caveats
- LLM Analysis: Grades are generated by GPT 5.1, which may have biases or make errors. This is an experimental analysis, not definitive historical judgment.
- Hindsight Bias: It's easy to judge past decisions when you know the outcomes. Some policies failed due to external factors beyond anyone's control.
- Incomplete Data: Not all manifestos have full text; some analyses rely on summaries, which may miss nuance.
- Subjectivity: What counts as "success" or "failure" often depends on political perspective. We try to focus on measurable outcomes.
- Context Collapse: Historical decisions were made in specific contexts that may not be fully captured in the analysis.
Technical Details
- LLM Model: GPT 5.1 (OpenAI)
- PDF Extraction: PyMuPDF (fitz)
- Text Limit: Manifestos truncated to ~50,000 characters (extraction and truncation sketched below)
- Source Code: Python pipeline with HTML/JS rendering
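Given those details, the extraction step probably looks something like this minimal sketch (the function name and structure are assumptions; only the PyMuPDF calls and the ~50,000-character limit come from the list above):

```python
import fitz  # PyMuPDF

TEXT_LIMIT = 50_000  # approximate cap noted in the list above

def extract_manifesto_text(pdf_path: str) -> str:
    """Pull plain text from a manifesto PDF and truncate it."""
    with fitz.open(pdf_path) as doc:  # fitz.Document supports `with`
        text = "".join(page.get_text() for page in doc)
    return text[:TEXT_LIMIT]
```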
Contact
This is an experimental project exploring what LLMs can tell us about political history.
Feedback, corrections, and contributions are welcome.