multi-turn inc.

AI Scientist

An autonomous research agent that reads papers, structures them, reproduces experiments, and explores extensions.

Preview — In Development

Problem

Most scientific knowledge is trapped in PDFs, where it is hard to search, cross-reference, or reproduce.

Task                 | Time           | Automation potential
Literature Triage    | hours / paper  | high
Reproduction Setup   | days / paper   | medium
Citation Network     | hours          | high
Extension Discovery  | varies         | medium

These tasks are closer to structured information processing than to deep intellectual judgment, which makes them a good fit for agents.

Research Object Schema

Papers are compiled into a machine-readable Research Object with a structure like the one below. Compared with the raw paper, this cuts context cost (the tokens a model must read) by more than 50% while fully preserving the source information.

{
  "paper_id": "arxiv:2405.21060",
  "title": "Transformers are SSMs",
  
  "claims": [{
    "id": "C1",
    "statement": "Mamba-2 achieves 8× throughput",
    "evidence": ["E1", "E2"],
    "confidence": 0.95
  }],
  
  "evidence": [{
    "id": "E1",
    "type": "table",
    "location": "Table 3",
    "reproducible": true
  }],
  
  "methods": [{
    "id": "M1",
    "description": "Structured State Space Duality",
    "artifacts": ["A1"]
  }],
  
  "artifacts": [{
    "id": "A1",
    "type": "code",
    "url": "github.com/state-spaces/mamba",
    "verified": true
  }]
}
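To make the schema concrete, here is a minimal Python sketch that loads a Research Object and checks that every ID referenced by a claim or method is actually defined. The `ResearchObject` class, the `validate_research_object` helper and the rule that confidence must lie in [0, 1] are illustrative assumptions, not part of a published schema.

```python
import json
from dataclasses import dataclass, field


# Hypothetical container mirroring the top-level keys of the example above.
@dataclass
class ResearchObject:
    paper_id: str
    title: str
    claims: list = field(default_factory=list)
    evidence: list = field(default_factory=list)
    methods: list = field(default_factory=list)
    artifacts: list = field(default_factory=list)


def load_research_object(text: str) -> ResearchObject:
    # Unknown top-level keys raise TypeError, which keeps the loader strict.
    return ResearchObject(**json.loads(text))


def validate_research_object(ro: ResearchObject) -> list[str]:
    """List problems: IDs referenced but never defined, or out-of-range confidences."""
    evidence_ids = {e["id"] for e in ro.evidence}
    artifact_ids = {a["id"] for a in ro.artifacts}
    problems = []
    for claim in ro.claims:
        for ref in claim.get("evidence", []):
            if ref not in evidence_ids:
                problems.append(f"{claim['id']} -> missing evidence {ref}")
        if not 0.0 <= claim.get("confidence", 0.0) <= 1.0:
            problems.append(f"{claim['id']} -> confidence outside [0, 1]")
    for method in ro.methods:
        for ref in method.get("artifacts", []):
            if ref not in artifact_ids:
                problems.append(f"{method['id']} -> missing artifact {ref}")
    return problems


example = """{
  "paper_id": "arxiv:2405.21060",
  "title": "Transformers are SSMs",
  "claims": [{"id": "C1", "statement": "Mamba-2 achieves 8× throughput",
              "evidence": ["E1", "E2"], "confidence": 0.95}],
  "evidence": [{"id": "E1", "type": "table", "location": "Table 3", "reproducible": true}],
  "methods": [{"id": "M1", "description": "Structured State Space Duality", "artifacts": ["A1"]}],
  "artifacts": [{"id": "A1", "type": "code", "url": "github.com/state-spaces/mamba", "verified": true}]
}"""

print(validate_research_object(load_research_object(example)))  # ['C1 -> missing evidence E2']
```

Run against the abbreviated example, the check flags that C1 cites E2, which the excerpt never defines. A complete Research Object would presumably include it. Storing links as IDs rather than nested objects is what makes this kind of cross-check cheap.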

Multi-Model Orchestration

Rather than relying on a single model, the system uses several foundation models as sub-agents, each matched to the kind of task it handles best. A Supervisor Agent routes work among them dynamically and synthesizes their results.

Role            | Model            | Strength
Deep Reasoning  | Claude Opus 4.5  | complex argument structure analysis
Code Execution  | GPT 5.2 Pro      | code generation and execution
Long Context    | Gemini 3.0 Pro   | full-paper processing
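The routing idea can be sketched as a small rule table in Python. The role-to-model mapping comes from the table above. Everything else is hypothetical: the task kinds, the 100,000-token cutoff, the fallback to the reasoning model, and the `call_model` callable, which stands in for real vendor SDK calls.

```python
from dataclasses import dataclass
from typing import Callable

# Role -> model assignment copied from the table above.
ROLE_TO_MODEL = {
    "deep_reasoning": "Claude Opus 4.5",
    "code_execution": "GPT 5.2 Pro",
    "long_context": "Gemini 3.0 Pro",
}

# Hypothetical task kinds and their roles; not taken from the source.
KIND_TO_ROLE = {
    "analyze_argument": "deep_reasoning",
    "write_code": "code_execution",
    "run_experiment": "code_execution",
    "read_full_paper": "long_context",
}

LONG_CONTEXT_TOKENS = 100_000  # hypothetical size cutoff


@dataclass
class Task:
    kind: str
    input_tokens: int


def route(task: Task) -> str:
    """Pick a model: oversized inputs first, then task kind, else the reasoning model."""
    if task.input_tokens > LONG_CONTEXT_TOKENS:
        return ROLE_TO_MODEL["long_context"]
    role = KIND_TO_ROLE.get(task.kind, "deep_reasoning")
    return ROLE_TO_MODEL[role]


def supervise(tasks: list[Task], call_model: Callable[[str, Task], str]) -> dict[str, list[str]]:
    """Dispatch every task and group outputs by model for a later synthesis step (not shown)."""
    by_model: dict[str, list[str]] = {}
    for task in tasks:
        model = route(task)
        by_model.setdefault(model, []).append(call_model(model, task))
    return by_model


fake_call = lambda model, task: f"{task.kind} done by {model}"
print(supervise([Task("run_experiment", 2_000), Task("read_full_paper", 250_000)], fake_call))
```

Checking input size before task kind keeps a very long paper from being sent to a model that cannot hold it, whatever the task.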

Agent Architecture

Agent                   | Role
Supervisor / Planner    | overall workflow management and task routing
Ingestion Agent         | PDF/URL parsing
Citation Agent          | multi-source search (arXiv, Crossref, Semantic Scholar, PubMed)
RO Compiler Agent       | transforms parsed papers into structured Research Objects (RO)
Verification Runner     | Docker execution and metric comparison
Critic / QA Agent       | quality verification of agent outputs
Human-in-the-Loop Gate  | final approval
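Read top to bottom, the table describes a hand-off chain that ends at a human. Below is a minimal Python sketch of that flow, assuming a shared state dict and placeholder agent bodies. The function names, the state keys and the fixed stage order are illustrative, not the system's actual interfaces.

```python
from typing import Any, Callable

State = dict[str, Any]


# Placeholder agents: each reads and extends a shared state dict.
# Real work (PDF parsing, API search, Docker runs) is out of scope here.
def ingestion_agent(state: State) -> State:
    state["text"] = f"<parsed text of {state['source']}>"
    return state


def ro_compiler_agent(state: State) -> State:
    state["research_object"] = {"paper_id": state["source"], "claims": []}
    return state


def citation_agent(state: State) -> State:
    state["citations"] = []  # would query arXiv, Crossref, Semantic Scholar, PubMed
    return state


def verification_runner(state: State) -> State:
    state["metrics"] = {}  # would run artifacts in Docker and compare metrics
    return state


def critic_agent(state: State) -> State:
    state["qa_issues"] = [] if state.get("research_object") else ["no research object"]
    return state


# Fixed order standing in for the Supervisor's dynamic routing (a simplification).
PIPELINE: list[Callable[[State], State]] = [
    ingestion_agent, ro_compiler_agent, citation_agent, verification_runner, critic_agent,
]


def run(source: str, human_approves: Callable[[State], bool]) -> State:
    state: State = {"source": source, "trace": []}
    for agent in PIPELINE:
        state = agent(state)
        state["trace"].append(agent.__name__)
    # Human-in-the-Loop Gate: only QA-clean results reach the reviewer,
    # and nothing counts as approved without an explicit yes.
    state["approved"] = not state["qa_issues"] and human_approves(state)
    return state


result = run("arxiv:2405.21060", human_approves=lambda s: True)
print(result["trace"], result["approved"])
```

In the described system the Supervisor decides the order at run time. The fixed list here only shows where the human gate sits relative to the automated agents.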

Infrastructure

APIs: arXiv, Crossref, Semantic Scholar, PubMed
Execution: Docker, Jupyter, pytest
Storage: JSON Research Object database with vector indices
Orchestration: LangGraph with custom routing
Policy optimization (optional): RLVR (reinforcement learning with verifiable rewards)
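RLVR needs a reward that can be checked mechanically, and the Verification Runner's metric comparison is one natural candidate. Below is a minimal Python sketch, assuming a binary reward and a 5% relative tolerance. Both choices and the function names are hypothetical, not documented parts of the system.

```python
import math


def metrics_match(reported: dict[str, float], reproduced: dict[str, float],
                  rel_tol: float = 0.05) -> dict[str, bool]:
    """For each metric the paper reports, check the reproduced value is within rel_tol."""
    return {
        name: name in reproduced and math.isclose(reproduced[name], value, rel_tol=rel_tol)
        for name, value in reported.items()
    }


def verifiable_reward(reported: dict[str, float], reproduced: dict[str, float],
                      rel_tol: float = 0.05) -> float:
    """Binary reward: 1.0 only if every reported metric was reproduced, else 0.0."""
    checks = metrics_match(reported, reproduced, rel_tol)
    return 1.0 if checks and all(checks.values()) else 0.0


# Hypothetical numbers, not results from any paper.
reported = {"accuracy": 0.80, "speedup": 8.0}
print(metrics_match(reported, {"accuracy": 0.79, "speedup": 6.5}))  # {'accuracy': True, 'speedup': False}
print(verifiable_reward(reported, {"accuracy": 0.79, "speedup": 7.8}))  # 1.0
```

A binary reward is easy to verify but coarse. Rewarding the fraction of matched metrics would give a denser signal, at the cost of rewarding partial reproductions.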

Current Status

Each pipeline stage runs independently today; fully autonomous end-to-end orchestration is still being stabilized.

Paper structuring and citation graph construction are production-ready. Autonomous experiment reproduction and extension discovery are in progress. Reproducing papers that release no code, working from the described method alone, is still at the research stage.
