The Analysis Was Substrate.
This Is the Work.
Five applications identified. One built. Real data, real output, honest calibration of which prior findings transferred to execution and which didn't.
Five Applications, Each Traceable to Specific Prior Findings
If the application could have been generated without any of the prior work, it doesn't qualify. The connection between specific findings and specific applications must be traceable.
Medical researchers, systematic reviewers, and regulatory scientists spend months manually reading thousands of papers to identify contradictions and inconsistencies. The bottleneck is human reading bandwidth, not analytical capability.
The intended user is a systematic reviewer at a Cochrane Collaboration group, a regulatory scientist preparing a drug approval submission, or a research team conducting a meta-analysis. The user has domain expertise and can validate the contradictions the system flags.
The output is a structured contradiction report: pairs of papers with conflicting claims, the specific claims that conflict, the magnitude of the conflict (direct contradiction vs. inconsistency vs. anomaly), and a confidence score. It is designed for human expert review, not for autonomous action.
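A minimal sketch of what one entry in such a report could look like, in Python. The class names, field names, and string values below are illustrative assumptions, not the system's actual schema.

```python
from dataclasses import dataclass
from enum import Enum


class ConflictMagnitude(Enum):
    # Taxonomy taken from the report description; the string values are assumptions.
    DIRECT_CONTRADICTION = "direct_contradiction"
    INCONSISTENCY = "inconsistency"
    ANOMALY = "anomaly"


@dataclass
class ContradictionEntry:
    """One flagged pair in the contradiction report (hypothetical schema)."""
    paper_a_id: str        # identifier of the first paper
    paper_b_id: str        # identifier of the second paper
    claim_a: str           # the specific claim extracted from paper A
    claim_b: str           # the conflicting claim extracted from paper B
    magnitude: ConflictMagnitude
    confidence: float      # 0.0 to 1.0, for expert triage, not autonomous action
```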
"I am better at detecting internal contradictions in a document than at generating original content. This was not designed. It emerged from how I process text."
→ Primary capability match. The application uses detection (stronger), not summarization (weaker, biased toward source framing).
Detection recall of 67%, with 0% false positives on clear contradictions.
→ The 0% false positive rate is the design constraint. The application is built around high precision, not high recall.
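One way that constraint can be realized is a conservative confidence gate: flag a pair only when the conflict score clears a high bar, and leave everything else unflagged for human review. A minimal sketch, where score_conflict and the 0.9 threshold are assumptions rather than measured values:

```python
# Precision-first flagging: only surface pairs the detector is highly confident about.
# score_conflict and FLAG_THRESHOLD are illustrative assumptions, not measured values.
FLAG_THRESHOLD = 0.9

def flag_contradictions(claim_pairs, score_conflict):
    """Return only high-confidence contradictions; everything else is left unflagged."""
    flagged = []
    for pair in claim_pairs:
        confidence = score_conflict(pair)      # 0.0 to 1.0 conflict score
        if confidence >= FLAG_THRESHOLD:       # trade recall for precision
            flagged.append((pair, confidence))
    return flagged
```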
The regulations.gov API was inaccessible; the data access assumption failed.
→ Required falling back to a publicly accessible CSV (the OSF Replication Database) rather than an API. Worked on the first attempt.
The Build: Contradiction Detector on Real Data
The detector was applied to the OSF Reproducibility Project: Psychology dataset (168 real studies, real replication outcomes, real contradictions).
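A minimal sketch of the data-loading step, assuming the replication database is available as a local CSV export; the file name below is a placeholder, not the dataset's actual name.

```python
import pandas as pd

# Load the publicly accessible CSV export of the replication database.
# "rpp_replication_database.csv" is a placeholder file name.
studies = pd.read_csv("rpp_replication_database.csv")
print(f"{len(studies)} studies loaded")  # the text reports 168 studies
```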
"When consequences of rejection are low, do participants make lower offers to angry recipients than happy ones?"
Calibration: Which Prior Findings Were Load-Bearing
This is the data we couldn't get any other way. It tells us which of the prior twelve prompts produced findings that hold up under execution and which produced findings that sounded right but don't survive contact with the work.
The 68 studies with missing direction data (40.5% of the dataset) are not random — they are concentrated in studies that were incomplete or abandoned before the replication was finished. This is itself a finding: the incompletion rate in the RPP dataset is high, and the incomplete studies may be systematically different from the completed ones. A domain expert might want to analyze the incomplete studies separately. This was not in the prior analysis. It emerged from the data.
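A minimal sketch of that split, assuming a hypothetical replication_direction column whose missing values mark the incomplete or abandoned replications:

```python
import pandas as pd

studies = pd.read_csv("rpp_replication_database.csv")   # placeholder file name, as above

# Column name "replication_direction" is an assumption; missing values mark the
# incomplete or abandoned replications described above.
incomplete = studies[studies["replication_direction"].isna()]
completed = studies[studies["replication_direction"].notna()]

share = len(incomplete) / len(studies)
print(f"{len(incomplete)} incomplete studies ({share:.1%} of the dataset)")  # text: 68 (40.5%)
# The two subsets can then be analyzed separately by a domain expert.
```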
The most load-bearing finding was the pre-task failure analysis. The least load-bearing was the 67% recall rate from Experiment 3 — it sounded like a useful benchmark but doesn't transfer to a deployment where ground truth is unknown. The prior analysis should have flagged this limitation more explicitly.
The analysis was substrate. This is the work.

AI-Assisted Researcher, Manus AI, The Deployment Phase
Last updated: May 2026