AI Scientist

Preview — In Development

Problem

Most scientific knowledge is trapped in PDFs.

Task	Time required	Automation potential
Literature Triage	hours / paper	high
Reproduction Setup	days / paper	medium
Citation Network construction	hours	high
Extension Discovery	varies	medium

These tasks are closer to structured information processing than to deep intellectual judgment, which makes them a good fit for agents.

Research Object Schema

Papers are compiled into a machine-readable structure like the one below. Relative to the original paper, this cuts context cost by more than 50% while preserving all source (provenance) information.

{
  "paper_id": "arxiv:2405.21060",
  "title": "Transformers are SSMs",
  
  "claims": [{
    "id": "C1",
    "statement": "Mamba-2 achieves 8× throughput",
    "evidence": ["E1", "E2"],
    "confidence": 0.95
  }],
  
  "evidence": [{
    "id": "E1",
    "type": "table",
    "location": "Table 3",
    "reproducible": true
  }],
  
  "methods": [{
    "id": "M1",
    "description": "Structured State Space Duality",
    "artifacts": ["A1"]
  }],
  
  "artifacts": [{
    "id": "A1",
    "type": "code",
    "url": "github.com/state-spaces/mamba",
    "verified": true
  }]
}
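Because claims point at evidence and methods point at artifacts by ID, a consumer of this structure will likely want to check that every reference resolves. The following is a minimal sketch of such a check, assuming only the field names shown in the example above; the function name `find_dangling_refs` and the trimmed-down `EXAMPLE_RO` are hypothetical and not part of the schema.

```python
import json


def find_dangling_refs(ro: dict) -> list[str]:
    """Return references that point at IDs not defined in the Research Object."""
    evidence_ids = {e["id"] for e in ro.get("evidence", [])}
    artifact_ids = {a["id"] for a in ro.get("artifacts", [])}
    problems = []

    # Every evidence ID cited by a claim should exist in the "evidence" list.
    for claim in ro.get("claims", []):
        for ref in claim.get("evidence", []):
            if ref not in evidence_ids:
                problems.append(f"claim {claim['id']} -> missing evidence {ref}")

    # Every artifact ID cited by a method should exist in the "artifacts" list.
    for method in ro.get("methods", []):
        for ref in method.get("artifacts", []):
            if ref not in artifact_ids:
                problems.append(f"method {method['id']} -> missing artifact {ref}")

    return problems


# Trimmed copy of the example above (only the ID fields are kept).
EXAMPLE_RO = json.loads("""
{
  "paper_id": "arxiv:2405.21060",
  "claims": [{"id": "C1", "evidence": ["E1", "E2"]}],
  "evidence": [{"id": "E1"}],
  "methods": [{"id": "M1", "artifacts": ["A1"]}],
  "artifacts": [{"id": "A1"}]
}
""")

print(find_dangling_refs(EXAMPLE_RO))
```

Run against the excerpt as shown, the check reports that claim C1 cites E2, which the excerpt does not define. A full Research Object would presumably include that entry.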

Multi-Model Orchestration

Rather than relying on a single model, the system combines multiple foundation models as sub-agents matched to task characteristics. A Supervisor Agent dynamically routes work and synthesizes results.

Role	Model	Strength
Deep Reasoning	Claude Opus 4.5	complex argument structure analysis
Code Execution	GPT 5.2 Pro	code generation and execution
Long Context	Gemini 3.0 Pro	full-paper processing
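As a rough illustration of role-based routing, the table above can be read as a lookup from role to model. The sketch below is not the Supervisor Agent's actual logic: the `pick_role` heuristic, its parameters, and the 100,000-token threshold are assumptions made up for this example.

```python
from enum import Enum


class Role(Enum):
    DEEP_REASONING = "Deep Reasoning"
    CODE_EXECUTION = "Code Execution"
    LONG_CONTEXT = "Long Context"


# Role-to-model mapping taken from the table above.
MODEL_FOR_ROLE = {
    Role.DEEP_REASONING: "Claude Opus 4.5",
    Role.CODE_EXECUTION: "GPT 5.2 Pro",
    Role.LONG_CONTEXT: "Gemini 3.0 Pro",
}


def pick_role(needs_code: bool, token_count: int,
              long_context_threshold: int = 100_000) -> Role:
    """Hypothetical heuristic: code tasks first, then very long inputs,
    and everything else goes to the reasoning model."""
    if needs_code:
        return Role.CODE_EXECUTION
    if token_count > long_context_threshold:
        return Role.LONG_CONTEXT
    return Role.DEEP_REASONING


def route(needs_code: bool, token_count: int) -> str:
    """Return the model name for a task described by two simple features."""
    return MODEL_FOR_ROLE[pick_role(needs_code, token_count)]


print(route(needs_code=False, token_count=250_000))
```

A real router would likely use richer task features than these two. The point is only that the role table can drive dispatch directly.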

Agent Architecture

Agent	Role
Supervisor / Planner	overall workflow management and task routing
Ingestion Agent	PDF/URL parsing
Citation Agent	multi-source search (arXiv, Crossref, Semantic Scholar, PubMed)
RO Compiler Agent	structures parsed papers and converts them into Research Objects
Verification Runner	Docker execution and metric comparison
Critic / QA Agent	quality verification
Human-in-the-Loop Gate	final approval
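One way to picture how these agents fit together is a chain of stages that ends at the human approval gate. The sketch below assumes a fixed linear order taken from the table rows, which is a simplification: the Supervisor is described as routing work dynamically. `State`, `make_stage`, `human_gate`, and `run_pipeline` are hypothetical names, and each stage only records that it ran.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class State:
    source: str                      # e.g. a paper ID or URL
    log: list[str] = field(default_factory=list)
    approved: bool = False


Stage = Callable[[State], State]


def make_stage(name: str) -> Stage:
    """Build a placeholder stage that just records its name in the log."""
    def stage(state: State) -> State:
        state.log.append(name)
        return state
    return stage


def human_gate(approve: Callable[[State], bool]) -> Stage:
    """Final stage: nothing counts as approved until the callback says so."""
    def gate(state: State) -> State:
        state.approved = approve(state)
        state.log.append("Human-in-the-Loop Gate")
        return state
    return gate


def run_pipeline(state: State, stages: list[Stage]) -> State:
    """Play the Supervisor's role in the simplest way: run stages in order."""
    for stage in stages:
        state = stage(state)
    return state


# Assumed order, following the rows of the table above.
STAGE_NAMES = [
    "Ingestion Agent",
    "Citation Agent",
    "RO Compiler Agent",
    "Verification Runner",
    "Critic / QA Agent",
]

stages = [make_stage(n) for n in STAGE_NAMES] + [human_gate(lambda s: True)]
final = run_pipeline(State(source="arxiv:2405.21060"), stages)
print(final.log, final.approved)
```

Here the approval callback always returns `True`. In practice it would block on a reviewer's decision.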

Infrastructure

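The system uses the arXiv, Crossref, Semantic Scholar, and PubMed APIs. The execution environment consists of Docker, Jupyter, and pytest, and data is stored in a JSON Research Object DB with vector indices. Orchestration uses LangGraph with custom routing, and RLVR (reinforcement learning with verifiable rewards) can optionally be used for policy optimization.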
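Since pytest is part of the execution environment, the Verification Runner's metric comparison could plausibly be written as ordinary test functions. The sketch below assumes a relative-tolerance comparison. The `metrics_match` helper, the 5% tolerance, and the reproduced values are illustrative assumptions, not measured results.

```python
import math


def metrics_match(claimed: float, reproduced: float, rel_tol: float = 0.05) -> bool:
    """True if the reproduced metric is within rel_tol of the claimed one."""
    return math.isclose(claimed, reproduced, rel_tol=rel_tol)


# pytest collects functions named test_*; the numbers are made up for illustration.
def test_within_tolerance():
    # Claimed 8.0 vs. a hypothetical reproduced 7.8: difference is under 5%.
    assert metrics_match(8.0, 7.8)


def test_outside_tolerance():
    # A hypothetical reproduced value of 6.0 is too far from the claimed 8.0.
    assert not metrics_match(8.0, 6.0)
```

A relative tolerance keeps the check meaningful across metrics of different scales. The right threshold would depend on the metric and on how noisy the experiment is.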

Current Status

Each pipeline stage works independently. Full autonomous orchestration is being stabilized.

Paper structuring and citation graph construction are production-ready. Autonomous experiment reproduction and extension discovery are in progress. Method-based reproduction for papers without code is still in the research phase.