Multi-turn.ai
Preview · In Development

AI Scientist

Autonomous research agent that transforms scientific papers into verifiable, machine-actionable knowledge. Accelerating discovery by automating the tedious parts of science.

(01) — The Problem

Scientific knowledge is locked in human-optimized PDFs

Researchers waste enormous amounts of time on repetitive work that could be automated:

Literature Triage

Hours spent reading papers to extract key claims and their supporting evidence

Reproduction Setup

Days configuring environments to reproduce published experiments

Citation Tracking

Manually building citation networks and tracking research lineage

Extension Discovery

Missing low-hanging fruit: easy improvements hiding in plain sight

(02) — The Pipeline

Paper → Research Object → Verification → Extension

01 Ingestion · PDF / arXiv URL
02 Citation · Knowledge Graph
03 RO Compiler · Structured DB
04 Verification · Docker Execution
05 Synthesis · Extensions
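The five stages above can be sketched as a simple sequential pipeline. This is an illustrative sketch only: the function names, signatures, and dict keys are assumptions, not the production implementation.

```python
# Illustrative five-stage pipeline; every name here is hypothetical.

def ingest(source: str) -> dict:
    """Stage 01: parse a PDF or arXiv URL into raw text + metadata."""
    return {"source": source, "text": "..."}

def build_citation_graph(doc: dict) -> dict:
    """Stage 02: attach a citation knowledge graph."""
    doc["citations"] = []
    return doc

def compile_research_object(doc: dict) -> dict:
    """Stage 03: compile the Claim-Evidence-Method-Artifact structure."""
    doc["ro"] = {"claims": [], "evidence": [], "methods": [], "artifacts": []}
    return doc

def verify(doc: dict) -> dict:
    """Stage 04: reproduce experiments (e.g. in Docker) and record status."""
    doc["verified"] = False  # flipped to True when reproduction matches
    return doc

def synthesize_extensions(doc: dict) -> dict:
    """Stage 05: propose follow-up experiments from verified claims."""
    doc["extensions"] = []
    return doc

PIPELINE = [ingest, build_citation_graph, compile_research_object,
            verify, synthesize_extensions]

def run(source: str) -> dict:
    """Thread a document through all five stages in order."""
    doc = ingest(source)
    for stage in PIPELINE[1:]:
        doc = stage(doc)
    return doc
```

Each stage only adds keys to the shared document, so stages stay independently testable and reorderable.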
(03) — Research Object

Claim–Evidence–Method–Artifact Schema

Papers are compiled into a machine-actionable format that preserves provenance while reducing context cost by 50%+ for downstream queries.

{
  "paper_id": "arxiv:2405.21060",
  "title": "Transformers are SSMs",
  
  "claims": [
    {
      "id": "C1",
      "statement": "Mamba-2 achieves 8× throughput",
      "evidence": ["E1", "E2"],
      "confidence": 0.95
    }
  ],
  
  "evidence": [
    {
      "id": "E1",
      "type": "table",
      "location": "Table 3",
      "reproducible": true
    }
  ],
  
  "methods": [
    {
      "id": "M1",
      "description": "Structured State Space Duality",
      "artifacts": ["A1"]
    }
  ],
  
  "artifacts": [
    {
      "id": "A1",
      "type": "code",
      "url": "github.com/state-spaces/mamba",
      "verified": true
    }
  ]
}
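A Research Object like the one above is directly queryable. The sketch below resolves a claim's evidence IDs into full evidence records; the second evidence entry (`E2`) is an invented example added so the filter has something to exclude, and the helper name is an assumption.

```python
import json

# Abbreviated Research Object mirroring the schema above; the E2
# record is a hypothetical addition for illustration.
RO = json.loads("""
{
  "claims": [{"id": "C1", "statement": "Mamba-2 achieves 8x throughput",
              "evidence": ["E1", "E2"], "confidence": 0.95}],
  "evidence": [{"id": "E1", "type": "table", "location": "Table 3",
                "reproducible": true},
               {"id": "E2", "type": "figure", "location": "Figure 5",
                "reproducible": false}]
}
""")

def evidence_for(ro: dict, claim_id: str) -> list[dict]:
    """Resolve a claim's evidence IDs into full evidence records."""
    by_id = {e["id"]: e for e in ro["evidence"]}
    claim = next(c for c in ro["claims"] if c["id"] == claim_id)
    return [by_id[eid] for eid in claim["evidence"] if eid in by_id]

# Only reproducible evidence would be handed to the verification runner.
reproducible = [e for e in evidence_for(RO, "C1") if e["reproducible"]]
```

Because claims reference evidence by ID instead of inlining it, downstream queries can load only the records they need, which is where the context-cost savings come from.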
(04) — Execution-Based Verification

Run, Compare, Verify

Not just summaries, but actual execution: when code and data are available, the system reproduces experiments in Docker containers and validates the reported metrics.

Auto Environment Setup

Docker templates automatically configured from paper requirements

Metric Comparison

Reported vs reproduced numbers with ≤5% acceptable error threshold

Mismatch Analysis

When results differ, provides plausible causes and minimal fix suggestions
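The metric comparison step above reduces to a relative-error check against the 5% threshold. A minimal sketch, with illustrative function names and example numbers:

```python
# Sketch of the metric check: a reproduction passes when the relative
# error between reported and reproduced values is within 5%.

THRESHOLD = 0.05  # <=5% acceptable relative error

def relative_error(reported: float, reproduced: float) -> float:
    """Relative deviation of the reproduced value from the reported one."""
    return abs(reported - reproduced) / abs(reported)

def verify_metric(name: str, reported: float, reproduced: float) -> dict:
    """Return a verification record for one metric."""
    err = relative_error(reported, reproduced)
    return {"metric": name, "error": err, "verified": err <= THRESHOLD}

# Hypothetical example: paper reports an 8x speedup, rerun measures 7.7x.
result = verify_metric("throughput_speedup", reported=8.0, reproduced=7.7)
```

A failed check (error above the threshold) is what triggers the mismatch analysis described above.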

Cloud-Based Reproduction

In-silico experiments are automatically reproduced and verified on AWS infrastructure. From GPU instances to distributed computing, we restore the exact experimental environment from the paper.

Reproduce · EC2 + SageMaker
Verify · Automated Testing
Deploy · One-Click API
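Auto environment setup, mentioned above, amounts to turning a paper's stated requirements into a container template. The sketch below is a loose illustration of that idea; the function, the requirement fields, and the emitted Dockerfile lines are all assumptions, not the actual template system.

```python
# Hypothetical sketch: generate a Dockerfile from a paper's declared
# requirements. Field names ("base", "cuda", "pip", "entry") are invented.

def dockerfile_from_requirements(reqs: dict) -> str:
    """Render a minimal Dockerfile string for a reproduction run."""
    lines = [f"FROM {reqs.get('base', 'python:3.11-slim')}"]
    if reqs.get("cuda"):
        # GPU papers would start from a CUDA base image instead.
        lines[0] = f"FROM nvidia/cuda:{reqs['cuda']}-runtime-ubuntu22.04"
    for pkg in reqs.get("pip", []):
        lines.append(f"RUN pip install {pkg}")
    lines.append(f'CMD ["python", "{reqs.get("entry", "main.py")}"]')
    return "\n".join(lines)

df = dockerfile_from_requirements(
    {"pip": ["torch==2.3.0", "mamba-ssm"], "entry": "benchmark.py"})
```

In practice the generated template would then be built and run on the EC2/SageMaker instances listed above.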
(05) — Technical Details

Multi-Model Orchestration

Foundation Models as Sub-Agents

We orchestrate multiple frontier AI models as specialized sub-agents, each handling tasks best suited to their capabilities. A supervisor agent dynamically routes work and synthesizes results.

Claude Opus 4.5 · Deep Reasoning
GPT 5.2 Pro · Code Execution
Gemini 3.0 Pro · Long Context

Agent Architecture

  • Supervisor/Planner Agent
  • Ingestion Agent (PDF/URL parsing)
  • Citation Agent (multi-source retrieval)
  • RO Compiler Agent
  • Verification Runner Agent
  • Critic/QA Agent
  • Human-in-the-Loop Gate

Infrastructure

  • APIs: arXiv, Crossref, Semantic Scholar, PubMed
  • Execution: Docker, Jupyter, pytest
  • Storage: JSON RO DB + Vector indices
  • Orchestration: LangGraph + Custom routing
  • Optional: RLVR for policy optimization
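The supervisor's dynamic routing over the sub-agents listed above can be sketched in plain Python. The real system is built on LangGraph; the routing table, agent stubs, and task shape here are illustrative only.

```python
# Minimal sketch of supervisor-based routing over sub-agents.
# Agent names follow the architecture list above; logic is hypothetical.

def supervisor(task: dict) -> str:
    """Route a task to the sub-agent best suited to it."""
    routes = {
        "ingest": "ingestion_agent",
        "citations": "citation_agent",
        "compile": "ro_compiler_agent",
        "verify": "verification_runner",
        "review": "critic_qa_agent",
    }
    # Anything the supervisor cannot route goes to the human gate.
    return routes.get(task["kind"], "human_in_the_loop")

AGENTS = {
    "ingestion_agent": lambda t: {"parsed": t.get("payload")},
    "citation_agent": lambda t: {"citations": []},
    "ro_compiler_agent": lambda t: {"ro": {}},
    "verification_runner": lambda t: {"verified": False},
    "critic_qa_agent": lambda t: {"approved": True},
}

def dispatch(task: dict) -> dict:
    """Send a task through the supervisor to its sub-agent."""
    agent = supervisor(task)
    if agent == "human_in_the_loop":
        return {"needs_human": True}
    return AGENTS[agent](task)
```

The explicit fallback to the human-in-the-loop gate mirrors the last item in the agent list: unroutable or low-confidence work is escalated rather than guessed at.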

Target Metrics

≥95% · Citation Extraction
≥70% · Retrieval Rate
50%+ · Token Reduction
≤5% · Reproduction Error

Interested in AI Scientist?

We're actively developing this system. If you're a researcher interested in early access or collaboration, reach out.

Get in Touch
Multi-turn - Context is Intelligence