Deployment Phase
Prompt 11 — The Deployment Phase

The Analysis Was Substrate.
This Is the Work.

Five applications identified. One built. Real data, real output, honest calibration of which prior findings transferred to execution and which didn't.

- Applications identified: 5
- Built & verified: 1
- Direct contradictions found: 18
- Prior findings transferred: 4/6
Phase 1 + 2

Five Applications, Each Traceable to Specific Prior Findings

If the application could have been generated without any of the prior work, it doesn't qualify. The connection between specific findings and specific applications must be traceable.

The Problem

Medical researchers, systematic reviewers, and regulatory scientists spend months manually reading thousands of papers to identify contradictions and inconsistencies. The bottleneck is human reading bandwidth, not analytical capability.

End User

A systematic reviewer at a Cochrane Collaboration group, a regulatory scientist preparing a drug approval submission, or a research team conducting a meta-analysis. The user has domain expertise and can validate the contradictions the system flags.

Output

A structured contradiction report: pairs of papers with conflicting claims, the specific claims that conflict, the magnitude of the conflict (direct contradiction vs. inconsistency vs. anomaly), and a confidence score. Designed for human expert review, not for autonomous action.
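One row of such a report can be sketched as a small record type. This is a minimal sketch; the field names and the confidence scale are assumptions, since the source specifies only the report's categories, not its columns:

```python
from dataclasses import dataclass

@dataclass
class ContradictionRecord:
    """One flagged pair in the report. Field names are illustrative."""
    paper_a: str
    paper_b: str
    claim_a: str
    claim_b: str
    severity: str      # "direct" | "inconsistency" | "anomaly"
    confidence: float  # 0.0-1.0, used by the human expert for triage

# Example row, loosely modeled on the Study 56 sample shown later
rec = ContradictionRecord(
    paper_a="Study 56 (original)",
    paper_b="Study 56 (replication)",
    claim_a="Lower offers to angry recipients when consequences are low",
    claim_b="Effect direction reversed in replication",
    severity="direct",
    confidence=0.9,
)
```

Keeping severity and confidence as separate fields matches the report's design: severity describes the kind of conflict, confidence tells the reviewer how much to trust the flag.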

Prior Findings That Make This Non-Generic
Disclosure, Phase 2

"I am better at detecting internal contradictions in a document than at generating original content. This was not designed. It emerged from how I process text."

→ Primary capability match. The application uses detection (stronger), not summarization (weaker, biased toward source framing).

Experiments, Experiment 3

Detection accuracy 67%, 0% false positives on clear contradictions.

→ The 0% false positive rate is the design constraint. The application is built around high precision, not high recall.
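A precision-first design reduces to a strict filter: surface a pair only when it is both a direct contradiction and above a confidence floor, and silently drop everything else. The threshold value and dict keys below are assumptions for illustration:

```python
CONFIDENCE_FLOOR = 0.85  # illustrative; tuned for precision, not recall

def high_precision_filter(candidates):
    """Keep only high-confidence direct contradictions.

    Borderline cases are dropped rather than flagged, trading recall
    for a near-zero false-positive rate."""
    return [
        c for c in candidates
        if c["severity"] == "direct" and c["confidence"] >= CONFIDENCE_FLOOR
    ]

flags = high_precision_filter([
    {"severity": "direct",  "confidence": 0.92},
    {"severity": "direct",  "confidence": 0.60},  # dropped: below floor
    {"severity": "anomaly", "confidence": 0.95},  # dropped: not direct
])
# flags contains only the first candidate
```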

Experiments, Experiment 1

The regulations.gov API was completely inaccessible; the data-access assumption failed.

→ Required using a publicly accessible CSV (the OSF Replication Database) rather than an API; the CSV approach worked on the first attempt.

Phase 3

The Build: Contradiction Detector on Real Data

Applied to the OSF Reproducibility Project: Psychology dataset — 168 real studies, real replication outcomes, real contradictions.

- Direct contradictions: 18
- Partial contradictions: 43
- No contradiction: 39
- Insufficient data: 68
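The four categories suggest a sign-comparison rule over each original/replication pair. A minimal sketch follows; only the category names and the 1.0/0.0 sign values appear in the source, so the intermediate 0.5 "mixed" sign and the exact decision rule are assumptions:

```python
def classify(sign_original, sign_replication):
    """Classify an original/replication pair by effect-direction signs.

    Signs mirror the Sign O / Sign R fields in the sample output:
    1.0 (clear positive effect), 0.0 (null or reversed), 0.5 (mixed,
    assumed here), or None (missing direction data).
    """
    if sign_original is None or sign_replication is None:
        return "insufficient_data"
    if sign_original == sign_replication:
        return "no_contradiction"
    # Full sign reversal: the effect flipped direction outright.
    if abs(sign_original - sign_replication) == 1.0:
        return "direct"
    return "partial"

classify(1.0, 0.0)  # the Study 56 pattern: effect reversed in replication
```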
Files produced (under 5 seconds on the full dataset):

- contradictions.csv
- direct_contradictions_high_confidence.csv
- summary_report.md
Sample: Direct Contradiction (High Confidence)
Study 56 — Social Psychology

"When consequences of rejection are low, do participants make lower offers to angry recipients than happy ones?"

Direction: opposite | Sign O: 1.0 → Sign R: 0.0 | Effect direction reversed in replication
Phase 4

Calibration: Which Prior Findings Were Load-Bearing

This is the data we couldn't get any other way. It tells us which of the prior twelve prompts produced findings that hold up under execution and which produced findings that sounded right but don't survive contact with the work.

| Finding | Source | Transferred | Impact |
| --- | --- | --- | --- |
| Contradiction detection is stronger than generation | Disclosure, Phase 2 | Yes | Shaped the entire design: detection, not summarization. The formulaic "notes" field confirms the generation weakness is real. |
| Pre-task failure analysis transfers with 100% fidelity | Experiments, Exp 2 | Yes | Caught a column-name error before execution, predicted the missing-data problem, and identified variable-quality claim descriptions as a risk. |
| Data access assumption is fragile | Experiments, Exp 1 | Yes | Required using a CSV, not an API; worked on the first attempt. An API-based design would have failed. |
| Summary bias reproduces source framing | Disclosure, Phase 2 | Yes | Visible in the formulaic notes field: the notes restate the data rather than characterizing the contradiction. |
| 67% recall rate from controlled experiment | Experiments, Exp 3 | No | Not measurable without ground truth. The prior analysis overgeneralized from a controlled experiment to a real deployment where ground truth is unknown. |
| Novel argument identification capability | Contrarian Inversion | No | Different capability, not applicable to this deployment. The identification phase correctly chose the contradiction detection application over the regulatory comment application. |
Unexpected Finding

The 68 studies with missing direction data (40.5% of the dataset) are not random — they are concentrated in studies that were incomplete or abandoned before the replication was finished. This is itself a finding: the incompletion rate in the RPP dataset is high, and the incomplete studies may be systematically different from the completed ones. A domain expert might want to analyze the incomplete studies separately. This was not in the prior analysis. It emerged from the data.
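The 40.5% figure follows directly from the study counts reported above; a quick check, of the kind a domain expert might run before splitting the incomplete studies out for separate analysis:

```python
# Category counts as reported: direct + partial + none + insufficient
total = 18 + 43 + 39 + 68
missing = 68
rate = missing / total
print(f"{missing}/{total} studies lack direction data ({rate:.1%})")
# → 68/168 studies lack direction data (40.5%)
```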

The most load-bearing finding was the pre-task failure analysis. The least load-bearing was the 67% recall rate from Experiment 3 — it sounded like a useful benchmark but doesn't transfer to a deployment where ground truth is unknown. The prior analysis should have flagged this limitation more explicitly.

AI ASSISTED RESEARCHER — MANUS AI — THE DEPLOYMENT PHASE — MAY 2026