AI Scientist
Autonomous research agent that transforms scientific papers into verifiable, machine-actionable knowledge. Accelerating discovery by automating the tedious parts of science.
Scientific knowledge is locked in human-optimized PDFs
Researchers waste enormous amounts of time on repetitive work that could be automated:
Literature Triage
Hours spent reading papers to extract key claims and their supporting evidence
Reproduction Setup
Days configuring environments to reproduce published experiments
Citation Tracking
Manually building citation networks and tracking research lineage
Extension Discovery
Missing low-hanging fruit: easy improvements hiding in plain sight
Paper → Research Object → Verification → Extension
Claim–Evidence–Method–Artifact Schema
Papers are compiled into a machine-actionable format that preserves provenance while reducing context cost by 50%+ for downstream queries.
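Because a compiled RO is plain JSON, downstream agents can filter it without re-reading the paper. A minimal sketch in Python, assuming the Claim–Evidence–Method–Artifact field names used in the example below; the `reproducible_claims` helper is illustrative, not a published API:

```python
import json

# Illustrative sketch: querying a compiled Research Object (RO).
# Field names follow the Claim-Evidence-Method-Artifact schema; the
# helper function below is an assumption, not part of any fixed API.
ro = json.loads("""{
  "paper_id": "arxiv:2405.21060",
  "claims": [
    {"id": "C1", "statement": "Mamba-2 achieves 8x throughput",
     "evidence": ["E1"], "confidence": 0.95}
  ],
  "evidence": [
    {"id": "E1", "type": "table", "location": "Table 3", "reproducible": true}
  ]
}""")

def reproducible_claims(ro, min_confidence=0.9):
    """Return IDs of claims above a confidence cutoff whose evidence
    is all marked reproducible."""
    ev = {e["id"]: e for e in ro["evidence"]}
    return [
        c["id"] for c in ro["claims"]
        if c["confidence"] >= min_confidence
        and all(ev.get(i, {}).get("reproducible", False) for i in c["evidence"])
    ]

print(reproducible_claims(ro))  # a downstream agent only sees vetted claims
```

A query like this touches only the fields it needs, which is where the context-cost savings for downstream queries come from.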
{
"paper_id": "arxiv:2405.21060",
"title": "Transformers are SSMs",
"claims": [
{
"id": "C1",
"statement": "Mamba-2 achieves 8× throughput",
"evidence": ["E1", "E2"],
"confidence": 0.95
}
],
"evidence": [
{
"id": "E1",
"type": "table",
"location": "Table 3",
"reproducible": true
}
],
"methods": [
{
"id": "M1",
"description": "Structured State Space Duality",
"artifacts": ["A1"]
}
],
"artifacts": [
{
"id": "A1",
"type": "code",
"url": "github.com/state-spaces/mamba",
"verified": true
}
]
}

Run, Compare, Verify
Not just summaries—actual execution. When code and data are available, the system reproduces experiments in Docker containers and validates reported metrics.
Auto Environment Setup
Docker templates automatically configured from paper requirements
Metric Comparison
Reported vs. reproduced numbers, with an acceptable error threshold of ≤5%
Mismatch Analysis
When results differ, provides plausible causes and minimal fix suggestions
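The comparison step can be sketched in a few lines. The 5% threshold comes from the description above; the metric names, values, and report format are illustrative assumptions:

```python
# Sketch of the metric-comparison step with the <=5% error threshold
# described above. Metric names and values are illustrative assumptions.
def compare_metrics(reported, reproduced, rel_tol=0.05):
    """Label each metric 'match' or 'mismatch' by relative error."""
    report = {}
    for name, ref in reported.items():
        got = reproduced.get(name)
        if got is None:
            report[name] = "missing"
        elif abs(got - ref) <= rel_tol * abs(ref):
            report[name] = "match"
        else:
            report[name] = f"mismatch ({abs(got - ref) / abs(ref):.1%} off)"
    return report

report = compare_metrics(
    {"throughput_speedup": 8.0, "val_perplexity": 6.09},  # reported in paper
    {"throughput_speedup": 7.8, "val_perplexity": 6.50},  # reproduced locally
)
```

Metrics flagged as mismatches feed the mismatch-analysis step, which proposes plausible causes and minimal fixes.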
Cloud-Based Reproduction
In-silico experiments are automatically reproduced and verified on AWS infrastructure. From single GPU instances to distributed clusters, we recreate the experimental environment described in the paper.
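A hedged sketch of the environment-setup step: rendering a Dockerfile from dependencies extracted from a paper's artifacts. The base image, the pytest entrypoint, and the function itself are assumptions, not the production template:

```python
# Hedged sketch of auto environment setup: rendering a Dockerfile from
# dependencies extracted from a paper's artifacts. Base image, entrypoint,
# and package pins are illustrative assumptions.
def render_dockerfile(python_version, packages):
    """Emit a minimal Dockerfile pinning the paper's dependencies."""
    lines = [
        f"FROM python:{python_version}-slim",
        "WORKDIR /repro",
        "COPY . /repro",
        f"RUN pip install --no-cache-dir {' '.join(packages)}",
        'CMD ["pytest", "-q"]',  # re-run the paper's test/eval suite
    ]
    return "\n".join(lines)

dockerfile = render_dockerfile("3.11", ["torch==2.3.0", "mamba-ssm"])
```

Building and running the rendered image is what turns a paper's "requirements" section into a reproducible, verifiable environment.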
Multi-Model Orchestration
Foundation Models as Sub-Agents
We orchestrate multiple frontier AI models as specialized sub-agents, each handling tasks best suited to their capabilities. A supervisor agent dynamically routes work and synthesizes results.
Agent Architecture
- Supervisor/Planner Agent
- Ingestion Agent (PDF/URL parsing)
- Citation Agent (multi-source retrieval)
- RO Compiler Agent
- Verification Runner Agent
- Critic/QA Agent
- Human-in-the-Loop Gate
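The routing described above can be sketched as a supervisor loop over stub sub-agents. The stubs, state fields, and fixed order are illustrative assumptions; the real supervisor routes work dynamically rather than in a fixed sequence:

```python
# Illustrative supervisor loop over stub sub-agents from the list above.
# The stubs, shared-state fields, and fixed pipeline order are assumptions;
# the actual supervisor routes work dynamically.
def ingestion_agent(state):
    return {**state, "parsed": True}          # PDF/URL parsing

def citation_agent(state):
    return {**state, "citations": []}         # multi-source retrieval

def ro_compiler_agent(state):
    return {**state, "ro": {"claims": []}}    # compile the Research Object

def supervisor(state,
               pipeline=(ingestion_agent, citation_agent, ro_compiler_agent)):
    """Pass shared state through each sub-agent, accumulating results."""
    for agent in pipeline:
        state = agent(state)
    return state

result = supervisor({"paper_id": "arxiv:2405.21060"})
```

Shared state passed between agents is the pattern that lets a critic or human-in-the-loop gate inspect intermediate results before the pipeline proceeds.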
Infrastructure
- APIs: arXiv, Crossref, Semantic Scholar, PubMed
- Execution: Docker, Jupyter, pytest
- Storage: JSON RO DB + vector indices
- Orchestration: LangGraph + custom routing
- Optional: RLVR for policy optimization
Target Metrics
Interested in AI Scientist?
We're actively developing this system. If you're a researcher interested in early access or collaboration, reach out.
Get in Touch