How this system works

From clinical trial registries and publications to extracted data, pooled statistics, and narrative reports.

This application takes clinical trials and their publications from registries and bibliographic databases, downloads and parses full text, extracts outcome data with a constrained AI + math engine, and then aggregates and reports the results. The pipeline is designed to be transparent, auditable, and safe against hallucinated data.

High‑level workflow

  1. Search & ingest trials and publications (manually or via the Search Agent with PICO‑based precision feedback).
  2. Link trials to their publications (NCT ↔ PMID).
  3. Screen titles and abstracts against PICO/PEO criteria, resolving unsure studies via confidence bands, custom instructions, and escalation.
  4. Retrieve and parse full text, supplements, and companion papers (auto‑download or manual/bulk upload).
  5. Triage documents to decide which ones likely contain numerical results.
  6. Batch‑extract endpoint‑level data across all studies using the extraction pipeline — with confidence scoring, provenance tracking, and human review gates.
  7. Compute study‑level and pooled effect sizes with standard meta‑analysis methods.
  8. Pin key evidence and generate narrative reports, then edit and export them.
End-to-end pipeline: from search to report in eight stages.
PRISMA flow diagram showing study counts through the pipeline
The PRISMA home: a live flow diagram tracking study counts from identification through screening, eligibility, and inclusion.

Study ingestion & linking

The system works with abstract “studies”. A study can be a trial registration (e.g., ClinicalTrials.gov) or a publication (e.g., PubMed article). Studies can be added manually by NCT ID or PMID, or discovered automatically by the Search Agent. Trials and publications are linked so that each trial becomes a canonical unit that may have multiple reports.

Search Agent

The Search Agent is an AI‑powered assistant that constructs and executes optimized queries against PubMed and ClinicalTrials.gov:

  • It uses your PICO/PEO criteria to build syntactically correct search queries with proper field tags, MeSH terms, Boolean operators, and date filters.
  • Precision feedback — the agent can screen a sample of scouted results against your PICO criteria in real time, measuring what fraction are on‑topic before committing. This prevents low‑precision queries from flooding your library.
  • Auto‑screening — newly discovered trials are automatically screened through your PICO/PEO criteria before being linked to the analysis, so only relevant studies enter the pipeline.
  • If the agent exhausts its step budget without committing a query, partial results are salvaged so no scouted trials are lost.
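The precision-feedback gate can be sketched as a tiny check: screen a scouted sample against PICO, and only commit the query if enough of the sample is on-topic. This is an illustrative sketch; the function name and the 0.5 minimum-precision threshold are assumptions, not the agent's actual settings.

```typescript
// Illustrative precision gate for scouted results. The minimum-precision
// threshold (0.5) and the function name are assumptions for this sketch.
function queryPrecisionOk(
  sampleDecisions: boolean[], // true = on-topic per PICO screening
  minPrecision = 0.5,
): boolean {
  if (sampleDecisions.length === 0) return false; // nothing scouted yet
  const onTopic = sampleDecisions.filter((d) => d).length;
  return onTopic / sampleDecisions.length >= minPrecision;
}
```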

Studies

  • Each study has basic metadata (title, year, journal) and identifiers such as:
    • nctId for ClinicalTrials.gov trials.
    • pmid / doi for publications.
  • A study's recordType indicates whether it is a trial (registry record) or a publication.

CT.gov ↔ PubMed linkage

For each trial with an nctId, we query ClinicalTrials.gov v2 to discover linked publications.

  • We use the protocolSection and resultsSection reference modules to find PMIDs associated with the trial.
  • Each PMID is mapped to a publication study and linked to the trial via a studyLink.
  • Links have a type such as:
    • primary_result – main trial results.
    • secondary_analysis – additional analyses.
    • protocol – protocol or methods paper.
    • other – other related publications.
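A sketch of how a ClinicalTrials.gov v2 reference entry might be mapped to a studyLink is shown below. The reference `type` values (`RESULT`, `DERIVED`, `BACKGROUND`) and the mapping to the link types above are assumptions for illustration, not the exact production logic.

```typescript
// Sketch: map a CT.gov v2 reference entry to a studyLink type.
// The v2 `type` values and the mapping below are illustrative assumptions.
type LinkType = "primary_result" | "secondary_analysis" | "protocol" | "other";

interface CtgovReference {
  pmid?: string;     // present only for indexed publications
  type?: string;     // e.g. "RESULT", "DERIVED", "BACKGROUND" (assumed values)
  citation?: string; // free-text citation string
}

function mapReferenceToLink(
  ref: CtgovReference,
): { pmid: string; type: LinkType } | null {
  if (!ref.pmid) return null; // only references with a PMID can be linked
  const citation = (ref.citation ?? "").toLowerCase();
  if (citation.includes("protocol")) return { pmid: ref.pmid, type: "protocol" };
  switch (ref.type) {
    case "RESULT":  return { pmid: ref.pmid, type: "primary_result" };
    case "DERIVED": return { pmid: ref.pmid, type: "secondary_analysis" };
    default:        return { pmid: ref.pmid, type: "other" };
  }
}
```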

Canonical trials

Trials with an nctId are treated as canonical study units. All linked publications are considered reports for that canonical trial.

  • When computing pooled effects, we aim to have one effect size per canonical trial per outcome, even if there are multiple reports.
  • Report‑level extraction results are grouped under the canonical trial; a primary report is chosen for labels (title/year/journal in plots and tables).
Trial ↔ publication linking: each trial becomes a canonical study unit with one or more linked reports.
Search step showing study import by NCT ID and PMID
The Search step: add trials by NCT ID and publications by PMID — Axelium automatically resolves CT.gov ↔ PubMed links.

Screening & unsure resolution

Screening decides which candidate studies proceed to full‑text retrieval and extraction. The AI screener evaluates each title/abstract against your PICO or PEO criteria and assigns one of three statuses: Included, Excluded, or Unsure.

How screening works

  • For each study, the AI receives your PICO/PEO criteria, any configured synonyms, and the abstract text.
  • It assesses whether the population, intervention (or exposure), and comparator match your criteria, returning a confidence score (0–1) for each element.
  • The decision is mapped to a status using configurable confidence thresholds: studies above the include threshold with matching criteria are included; studies with clear mismatches are excluded; the rest are marked unsure.

Confidence bands

By default, a single confidence threshold (0.7) separates “included” from “unsure”. Graduated confidence bands give you finer control:

  • Include threshold (default 0.7): studies at or above this confidence with matching PICO elements are included.
  • Exclude threshold (default 0.4): studies below this confidence where PICO elements are unconfirmed are auto‑excluded rather than left as unsure.
  • Studies between the two thresholds remain unsure for manual review or escalation.

Auto‑excluded studies are tagged with the reason confidence_below_threshold so the decision is auditable and reversible.
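A minimal sketch of the band rule, assuming a single overall confidence score and a boolean criteria-match flag (the real screener scores each PICO element separately, so names and shapes here are illustrative):

```typescript
// Illustrative confidence-band classifier. Field and function names are
// assumptions; the app scores each PICO element, not a single value.
type ScreeningStatus = "included" | "excluded" | "unsure";

interface BandConfig {
  includeThreshold: number; // default 0.7
  excludeThreshold: number; // default 0.4
}

function classify(
  confidence: number,     // overall PICO-match confidence, 0–1
  criteriaMatch: boolean, // did the PICO elements match?
  cfg: BandConfig = { includeThreshold: 0.7, excludeThreshold: 0.4 },
): { status: ScreeningStatus; reason?: string } {
  // Clear mismatches are excluded regardless of the include threshold.
  if (!criteriaMatch && confidence >= cfg.includeThreshold) {
    return { status: "excluded", reason: "criteria_mismatch" };
  }
  if (criteriaMatch && confidence >= cfg.includeThreshold) {
    return { status: "included" };
  }
  if (confidence < cfg.excludeThreshold) {
    // Auto-excluded, tagged so the decision stays auditable and reversible.
    return { status: "excluded", reason: "confidence_below_threshold" };
  }
  return { status: "unsure" }; // between the two thresholds
}
```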

Custom screening instructions

If recurring ambiguity patterns cause many studies to be marked unsure, you can add free‑text instructions that are appended to the AI screening prompt. For example:

“Include community-based self-management programs as valid interventions even if delivered outside clinical settings. Exclude pilot studies that report only recruitment rates without clinical outcomes.”

These instructions take precedence over default screening behavior when they apply, and are limited to 2,000 characters. You can update them at any time from the screening configuration panel and re‑screen unsure studies with the new instructions.

Resolving unsure studies

After initial screening, some studies will remain unsure. The system provides several paths to resolution:

1. Unsure diagnostics

The system can analyze the reasoning behind unsure decisions and categorize them (e.g., “15 studies unsure because PD‑L1 status not mentioned”). It surfaces actionable suggestions such as broadening the population criterion or lowering the confidence threshold, along with specific PICO edit proposals.

2. PICO refinement and re‑screening

Based on the diagnostic suggestions, you can edit your PICO criteria and re‑screen only the unsure studies. The updated criteria flow into the prompt automatically since the screener reads from the live configuration.

3. Escalation (two‑pass screening)

For remaining unsure studies, an escalation pass uses a specialized prompt that includes the first‑pass reasoning and forces a definitive include/exclude decision. The escalation prompt has a configurable tie‑breaker bias (default: include) and a lower confidence threshold. All escalated decisions are tagged as pass 2 for auditability.

4. Manual review

You can always manually include or exclude any study from the screening workbench using keyboard shortcuts (Y/N/U). Manual decisions override AI decisions and are preserved across re‑screening runs.

Typical unsure resolution flow

  1. Run initial screening with custom instructions and confidence bands to minimize the unsure bucket upfront.
  2. If the unsure rate is still high (>25%), run unsure diagnostics to understand recurring patterns.
  3. Refine PICO criteria based on suggestions and re‑screen, or add custom instructions targeting the identified ambiguities.
  4. Trigger escalation for remaining unsure studies to force definitive decisions.
  5. Manually review any residual unsure studies (typically <5% after escalation).
Screening decision tree: confidence bands, custom instructions, and escalation reduce the unsure bucket.
Screening tab with AI-assisted title/abstract decisions
Screening workbench: AI-suggested decisions with rationale and evidence highlights. Custom instructions and confidence bands reduce the unsure bucket before escalation.

Full text retrieval, supplements & upload

The Full Text tab manages the lifecycle of documents for each study: resolving a main PDF and any supplementary materials, parsing them, and making them available for extraction.

Automatic retrieval

  • For each study, the system tries to resolve a PDF from:
    • Open access repositories (e.g., PubMed Central).
    • Open Access publisher links (via CrossRef/Unpaywall‑like logic).
    • Known publisher patterns based on DOI/PMID (for paywall‑aware checks).
  • When automatic resolvers fail or return too few candidates, an additional discovery step searches PMC, CrossRef, and publisher sites to find alternative PDF sources.
  • Successfully downloaded PDFs are stored and then parsed into:
    • Page‑level text.
    • Structured paragraphs.
    • Tables and figures metadata.
  • Parsed documents are marked Parsed and can be used as full‑text context for extraction.

Supplementary materials

After downloading a main PDF, the system automatically looks for supplementary files associated with the article:

  • Auto‑fetch — supplement links are scanned from PubMed Central, publisher landing pages, and ClinicalTrials.gov. PDF, Excel, and CSV supplements are downloaded; videos and slide decks are skipped.
  • DOCX/Office parsing — non‑PDF supplements (Word documents, spreadsheets) are automatically parsed and included in the extraction context alongside the main PDF.
  • Manual supplement upload — you can upload supplement files directly from the Full Text tab. Uploaded supplements are parsed through the same pipeline and become available for extraction.
  • Supplements appear as sub‑rows beneath the main document and are labeled with descriptive tags (e.g., “Supplementary Table S1”, “Appendix A”).

Failures & paywalls

  • If automatic download fails (e.g., no OA version, paywalled, network issues), the study is marked as Manual upload needed or Error.
  • These rows always expose an Upload PDF manually action in the Full Text table.
  • The failure reason is recorded (e.g. “paywalled” vs “not found”), but the user can override by uploading their own PDF.

Manual upload & bulk import

Manual upload is available in several places:

  • In the Full Text tab:
    • Use the row's Actions menu → Upload PDF manually (e.g. if paywalled).
  • In the Extraction tab:
    • If a study has no PDF attached, the full‑text panel offers a small upload form.
  • Bulk Upload — drag‑and‑drop up to 50 PDFs at once. The system auto‑matches each PDF to a study using PMID patterns in the filename, first‑author/year patterns, or fuzzy title matching. A summary shows total, matched, and unmatched files so you can quickly identify gaps.

Uploaded PDFs are parsed using the same pipeline as automatically downloaded PDFs and become eligible for extraction if they pass triage.
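The bulk-upload matching heuristics can be sketched roughly as follows. The tiered order (PMID, author/year, fuzzy title) follows the description above; the regexes, field names, and the 0.6 fuzzy-match threshold are illustrative assumptions.

```typescript
// Rough sketch of bulk PDF-to-study matching. Heuristic order follows the
// docs; patterns and the 0.6 threshold are assumptions for illustration.
interface Study {
  id: string;
  pmid?: string;
  firstAuthor?: string;
  year?: number;
  title: string;
}

function matchPdfToStudy(filename: string, studies: Study[]): Study | null {
  const base = filename.replace(/\.pdf$/i, "").toLowerCase();

  // 1. PMID pattern: a 7–8 digit run in the filename.
  const pmidMatch = base.match(/\b(\d{7,8})\b/);
  if (pmidMatch) {
    const byPmid = studies.find((s) => s.pmid === pmidMatch[1]);
    if (byPmid) return byPmid;
  }

  // 2. First-author + year pattern, e.g. "smith_2021_main".
  for (const s of studies) {
    if (s.firstAuthor && s.year &&
        base.includes(s.firstAuthor.toLowerCase()) &&
        base.includes(String(s.year))) {
      return s;
    }
  }

  // 3. Fuzzy title match: fraction of substantive title words in the filename.
  let best: Study | null = null;
  let bestScore = 0;
  for (const s of studies) {
    const words = s.title.toLowerCase().split(/\W+/).filter((w) => w.length > 3);
    if (words.length === 0) continue;
    const hits = words.filter((w) => base.includes(w)).length;
    const score = hits / words.length;
    if (score > bestScore) { best = s; bestScore = score; }
  }
  return bestScore >= 0.6 ? best : null; // below threshold = unmatched
}
```

Unmatched files surface in the bulk-upload summary so you can assign them by hand.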

Full Text tab showing PDF and supplement download status for each study
The Full Text tab: auto-download status (Parsed, Manual upload needed, Error), supplement sub-rows, bulk upload, and actions for paywalled papers.

Triage & extraction eligibility

Not all full texts are appropriate for data extraction. Protocol papers, editorials, and trials without posted results should not be treated as data sources. The system applies triage rules so that only data‑bearing full texts appear in the Extraction tab.

High‑level rule

A study will only appear in the Extraction tab if it has parsed full text and there is strong evidence that the full text contains numerical results (not just planned methods).

Trials

A trial (registry study) is considered eligible for extraction if any of the following are true:

  • The trial's ClinicalTrials.gov registry indicates posted results:
    • hasResults == true, and/or
    • a resultsSection with baseline or outcome modules is present.
  • The trial has a parsed PDF attached and at least one linked publication with type === 'primary_result' or 'secondary_analysis'.
  • The trial has a linked primary_result or secondary_analysis publication that itself has parsed full text.

A trial is not eligible if, for example:

  • Its status is “not yet recruiting”, or it is still recruiting with completion expected in the future, and hasResults == false.
  • It only has protocol or feasibility publications and no linked results publications.

Publications

A publication is eligible for extraction if:

  • It has parsed full text, and
  • It is not clearly protocol‑only or review‑only, based on:
    • Link types (e.g., protocol vs primary_result / secondary_analysis).
    • Conservative title keywords (e.g., “Study protocol”) and publication types (e.g., “Review”, “Editorial”) where available.
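The trial rules above can be sketched as a single predicate. Field names and shapes here are illustrative assumptions, not the app's actual triage API:

```typescript
// Hedged sketch of trial triage eligibility. Field names are illustrative.
interface Link {
  type: "primary_result" | "secondary_analysis" | "protocol" | "other";
  targetHasParsedText: boolean; // does the linked publication have parsed full text?
}

interface TrialRecord {
  hasResults: boolean;        // CT.gov registry flag
  hasResultsSection: boolean; // baseline/outcome modules posted
  hasParsedPdf: boolean;
  links: Link[];
}

function trialEligibleForExtraction(t: TrialRecord): boolean {
  const resultLinks = t.links.filter(
    (l) => l.type === "primary_result" || l.type === "secondary_analysis",
  );
  // Rule 1: the registry indicates posted results.
  if (t.hasResults || t.hasResultsSection) return true;
  // Rule 2: parsed PDF attached plus at least one results-type link.
  if (t.hasParsedPdf && resultLinks.length > 0) return true;
  // Rule 3: a linked results publication that itself has parsed full text.
  return resultLinks.some((l) => l.targetHasParsedText);
}
```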

Examples

  • Trial recruiting, hasResults=false → Not eligible. No posted results yet; extraction would be speculative.
  • Trial completed, CT.gov has result tables → Eligible. Registry result modules are present; data can be extracted.
  • Trial with only a protocol/feasibility paper → Not eligible. Methods only; no actual participant outcomes reported.
  • Trial with a primary result paper and parsed PDF → Eligible. The linked result paper provides baseline and outcome data.
  • Standalone “Study protocol” publication → Not eligible. Protocol papers describe planned methods and are not treated as results sources.
Screening tab with AI-assisted title/abstract decisions
Screening: AI-suggested include/exclude decisions with rationale. Exclusion reasons are captured for PRISMA reporting.

Extraction & math engine

Once a study is eligible and has parsed full text, endpoint‑level data can be extracted. The extraction pipeline uses a multi‑stage workflow that orchestrates document acquisition, multi‑document context building, AI extraction with specialist agents, deterministic math validation, and human review — in a single batch operation across all studies.

Batch extraction pipeline

Rather than extracting one study at a time, the system processes all eligible studies in parallel via Auto‑Extract All Outcomes:

  • Document acquisition — the pipeline first ensures each study has its main PDF plus any supplements and companion papers. The system fills gaps automatically when standard source resolution fails.
  • Companion resolution — for trials with multiple publications, the pipeline automatically discovers sibling papers (secondary analyses, long‑term follow‑ups) and includes their text in the extraction context so the AI sees the full trial narrative.
  • Document mapping — the system identifies which sections, tables, and pages in the document bundle are relevant to each endpoint, creating focused context for the extraction agents.
  • AI extraction — specialist extraction agents scan narrative text and tables independently, then structured data is merged:
    • Means, SDs, and Ns for continuous outcomes.
    • Events and totals for dichotomous outcomes (or percentages converted to events/total).
    • Contrast estimates (e.g., HR/RR/OR) and their SE or confidence intervals.
  • Quality scoring — a scoring step evaluates each extraction for evidence adequacy, arm alignment, schema completeness, and provenance quality. Extractions that fall below thresholds are routed to the human review queue rather than auto‑accepted.
  • Output must conform to strict schemas; the models can explicitly signal “no data found” if the endpoint is not reported.

Confidence & provenance

Every extracted value carries metadata about where it came from and how confident the system is:

  • Confidence badges — color‑coded shields (green ≥ 80%, amber 50–79%, red < 50%) appear on each extracted field, giving reviewers an instant signal of extraction reliability.
  • Field provenance badges — inline indicators show whether a value was extracted from narrative text, a table, or derived via calculation. For derived values, the method and input values are shown.
  • Evidence trail — each field links to its source snippet in the original document, grouped by arm/role, so you can verify any number against the PDF.

Conflict resolution & review queue

When multiple sources (e.g., registry data vs. PDF, or different extraction runs) disagree on a value:

  • A conflict side‑by‑side view shows the competing values with their confidence scores. You can accept either source with one click.
  • The system can automatically resolve straightforward conflicts (e.g., when one source has much higher confidence), escalating ambiguous cases to the review queue.
  • The Review Queue panel lists all outcomes flagged for human review, with reasons (low confidence, missing evidence, conflicts). Each item supports approve, reject, or override decisions, and batch submission.

Quality dashboard

A dedicated quality dashboard provides aggregate metrics across all extractions in the analysis:

  • Four quality dimensions — evidence adequacy, arm alignment, schema completeness, and provenance quality — each with a color‑coded health score.
  • Status distribution (validated, auto‑extracted, draft, rejected) and average confidence across outcomes.
  • Per‑outcome detail rows for targeted review of weak spots.

Math engine & validation

  • The extracted data are passed through a deterministic math & QA layer which:
    • Checks basic invariants (e.g., events ≤ total, SD > 0, plausible ranges).
    • Derives missing SD from SE, CI, IQR or range using standard formulae (e.g., Wan, Hozo).
    • Derives SE from CIs for HR/RR/OR and computes log‑scale effect sizes.
    • Detects arm swaps (treatment/control labeling errors) and deduplicates hazard ratios repeated across outcomes.
    • Derives events from percentages when only percentage‑based reporting is available.
  • Records that fail constraints are marked as invalid (e.g., invalid_output or specialized codes for missing mean/SD) and are not passed to the statistics engine by default.
  • This ensures that even if the AI extracts plausible‑looking numbers, they must be internally consistent to influence meta‑analysis results.
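The derivations listed above follow standard formulas; a minimal sketch (function names and the invariant helper are illustrative, and the Wan/Hozo approximations shown are the simple constant-factor forms, not the full sample-size-adjusted versions):

```typescript
// Standard derivations the math layer applies, as simple sketches.
const Z95 = 1.96;

// SD from standard error and sample size: SD = SE * sqrt(n)
function sdFromSe(se: number, n: number): number {
  return se * Math.sqrt(n);
}

// SD from a 95% CI around a mean: SD = sqrt(n) * (upper - lower) / (2 * 1.96)
function sdFromCi(lower: number, upper: number, n: number): number {
  return (Math.sqrt(n) * (upper - lower)) / (2 * Z95);
}

// SD from IQR (Wan-style normal approximation): SD ≈ IQR / 1.35
function sdFromIqr(q1: number, q3: number): number {
  return (q3 - q1) / 1.35;
}

// SD from range (Hozo-style approximation for moderate n): SD ≈ range / 4
function sdFromRange(min: number, max: number): number {
  return (max - min) / 4;
}

// SE of log(HR/RR/OR) from the ratio's 95% CI:
// SE = (ln(upper) - ln(lower)) / (2 * 1.96)
function seFromRatioCi(lower: number, upper: number): number {
  return (Math.log(upper) - Math.log(lower)) / (2 * Z95);
}

// Basic invariants: events ≤ total, SD > 0 when present.
function passesInvariants(events: number, total: number, sd?: number): boolean {
  if (events < 0 || events > total) return false;
  if (sd !== undefined && sd <= 0) return false;
  return true;
}
```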
Extraction tab with AI-assisted data capture, live effect stripe, and outcome status bars
Extraction tab: AI-populated fields with source snippets. The green Live Effect Stripe at top shows the derived effect estimate in real time. Colored outcome status segments in the study list (left) give at-a-glance completeness per endpoint.

Extraction status & progress tracking

Three layers of visual feedback help you track extraction progress without switching screens:

  • Live Effect Stripe — a colored banner above each outcome's data grid. Gray when empty, green when the derived effect (Est, SE, 95% CI) is plausible, amber when warnings are detected (implausible magnitude, wide CI, or SE ratio issues). Updates instantly as fields change.
  • Outcome Segment Bar — a thin horizontal bar on each study in the left‑panel study list. Each segment represents one configured outcome and is colored by its status: slate (empty), amber (partial), blue (AI‑extracted), green (complete), red (needs review). Hover any segment to see a tooltip with the outcome name, status, and effect estimate (if available). Click a segment to jump directly to that outcome. A color legend below the study list explains all five statuses at a glance.
  • Triage Matrix — a full studies × outcomes grid accessible via the view‑mode toggle button in the center panel header. Filter by incomplete or suspicious cells, sort by study name, completeness, or year. Cells use the same five‑color scheme as the Segment Bar, and hovering a cell shows a tooltip with status details. Click any cell to navigate directly to that study's outcome.
Live Effect Stripe showing real-time derived effect estimate above the data grid
Live Effect Stripe: green (plausible) state showing the derived HR estimate, SE, and 95% CI. Amber state appears when warnings are detected.
Study list with outcome segment bars showing per-outcome extraction status
Outcome Segment Bars in the study list: each colored segment represents one endpoint's extraction status. Hover a segment to see a tooltip with the outcome name, status, and effect estimate. The color legend below the list explains all five statuses.
Triage Matrix showing a studies-by-outcomes grid with color-coded extraction status
Triage Matrix: cross-study × outcome overview with filtering, sorting, per-outcome progress footer, and per-study completion column. Click any cell to navigate to that extraction.

Statistics & canonical studies

Validated extraction results are converted into effect sizes and pooled using standard meta‑analytic models. Trials are treated as canonical units that may have multiple reports.

Dataset building

  • For a given endpoint and timepoint, we collect all validated extraction records that match the analysis filters.
  • Each record is converted into a study‑level effect size (e.g., log(RR) and its variance for dichotomous outcomes, mean difference for continuous).
  • Publication‑level records are grouped under their canonical trial using studyLinks.
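For a dichotomous outcome, the conversion uses the standard large-sample formula for the log risk ratio and its variance. A sketch (names are illustrative; zero-cell continuity corrections are omitted):

```typescript
// Standard log risk ratio and its variance from two arms' events/totals.
// Continuity corrections for zero cells are omitted for brevity.
interface ArmData { events: number; total: number; }

function logRiskRatio(
  treat: ArmData,
  ctrl: ArmData,
): { yi: number; vi: number } {
  const p1 = treat.events / treat.total;
  const p2 = ctrl.events / ctrl.total;
  // yi = ln(RR); vi = 1/a - 1/n1 + 1/c - 1/n2
  const yi = Math.log(p1 / p2);
  const vi =
    1 / treat.events - 1 / treat.total + 1 / ctrl.events - 1 / ctrl.total;
  return { yi, vi };
}
```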

Canonical grouping

Because a single trial can have multiple reports, the stats layer groups them into canonical studies:

  • All extraction rows for publications linked to the same trial are grouped by a shared canonical ID.
  • For each canonical trial and outcome, one effect size is chosen from the available candidates (typically the first valid one, with plans to prioritize primary result reports).
  • The chosen effect size is labeled using metadata from the primary result report (title, year, journal) where available.
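The grouping step above can be sketched as a keyed selection, one row per (trial, outcome) pair. Field names here are assumptions; the sketch also applies the primary-report preference the docs describe as planned:

```typescript
// Illustrative canonical grouping: keep one effect row per trial per outcome,
// preferring the primary result report. Field names are assumptions.
interface EffectRow {
  canonicalId: string; // shared trial ID derived from studyLinks
  outcome: string;
  isPrimaryReport: boolean;
  yi: number; // effect size
  vi: number; // variance
}

function pickOnePerTrial(rows: EffectRow[]): EffectRow[] {
  const chosen = new Map<string, EffectRow>();
  for (const row of rows) {
    const key = `${row.canonicalId}::${row.outcome}`;
    const existing = chosen.get(key);
    // Prefer the primary result report; otherwise keep the first valid row.
    if (!existing || (row.isPrimaryReport && !existing.isPrimaryReport)) {
      chosen.set(key, row);
    }
  }
  return Array.from(chosen.values());
}
```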

Stats Agent & pooling

The Analysis page is powered by a Stats Agent — an AI that orchestrates tools to run meta-analyses conversationally. It discovers available outcomes, selects effect measures, and executes R/metafor models in the browser.

  • The empty state shows a data summary (study count, outcomes, meta-readiness), an outcomes table, and a tool catalog listing all five agent tool groups and supported effect measures.
  • Effect sizes are pooled using fixed‑effect and/or random‑effects models, producing a pooled estimate with 95% CI, τ², I², and the Q statistic.
  • Results include a Run Summary card with color-coded heterogeneity (Low/Moderate/High) and a Model Fit Summary with full statistics.
  • Visualizations include forest plots, funnel plots, and Baujat plots for outlier detection.
  • Past runs are accessible via the History panel with search, date grouping, preview snippets, and run duration.
  • By default, only validated extraction records are used; you can optionally include unvalidated data for exploratory analysis.
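The actual models run in R via metafor; the arithmetic behind a random-effects pool (inverse-variance weights, Cochran's Q, DerSimonian–Laird τ², I²) can be sketched in TypeScript for illustration:

```typescript
// Sketch of inverse-variance pooling with a DerSimonian–Laird random-effects
// model. The app runs metafor in R; this only illustrates the math.
interface Effect { yi: number; vi: number; } // effect size and its variance

function poolRandomEffects(effects: Effect[]) {
  const k = effects.length;
  const w = effects.map((e) => 1 / e.vi); // fixed-effect weights
  const sumW = w.reduce((a, b) => a + b, 0);
  const fixed = effects.reduce((a, e, i) => a + w[i] * e.yi, 0) / sumW;

  // Cochran's Q and DerSimonian–Laird tau²
  const q = effects.reduce(
    (a, e, i) => a + w[i] * Math.pow(e.yi - fixed, 2), 0);
  const c = sumW - w.reduce((a, b) => a + b * b, 0) / sumW;
  const tau2 = Math.max(0, (q - (k - 1)) / c);
  // I² = max(0, (Q - df) / Q), expressed as a percentage
  const i2 = q > 0 ? Math.max(0, ((q - (k - 1)) / q) * 100) : 0;

  // Random-effects weights and pooled estimate
  const wStar = effects.map((e) => 1 / (e.vi + tau2));
  const sumWStar = wStar.reduce((a, b) => a + b, 0);
  const est = effects.reduce((a, e, i) => a + wStar[i] * e.yi, 0) / sumWStar;
  const se = Math.sqrt(1 / sumWStar);
  return {
    est,
    ciLower: est - 1.96 * se,
    ciUpper: est + 1.96 * se,
    q,
    tau2,
    i2,
  };
}
```

With τ² = 0 the random-effects weights reduce to the fixed-effect weights, so homogeneous data give the same pooled estimate under both models.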
Stats agent showing data summary, outcomes table, and starter prompts
Analysis page: data-at-a-glance summary, outcomes readiness table, tool catalog, and starter prompts for common analyses.
Forest plot with per-study effects and pooled random-effects estimate
Forest plot: study-level effect sizes, 95% CIs, weights, and the pooled estimate. Heterogeneity statistics (I², τ², Q) are shown in the Run Summary card.

Reports & Evidence

The Reports & Evidence area helps you turn extracted data and statistical results into narrative reports. It is built around an Evidence Board, a constrained report generator, and a manual editor.

Evidence Board

The Evidence Board is a curated list of items that you explicitly pin for reporting:

  • Chat messages – e.g., explanations of findings, manual notes, methodological decisions.
  • Snapshots – plots and tables from the stats view (e.g., forest plots, effect tables).
  • PRISMA summaries – flow diagrams and screening summaries.
  • Notes – free‑form text you add via “Add Note”, e.g., interpretation or discussion points.

Each evidence item has:

  • A kind (config, prisma, message, snapshot).
  • A label – a short description you can edit.
  • A sectionHint – where in the report it roughly belongs (Introduction, Methods, Results, Discussion, Conclusions, or General).

The Evidence Board interface lets you filter, group, and edit these items before feeding them into a report.

Report generation

When you click Generate Draft, a wizard lets you choose which evidence items to include in a new report. You also select a target audience that controls the tone, terminology, and emphasis of the generated text.

  • Audience presets – seven built-in options tailor the output for different readers:
    • Academic / Scientific – formal journal-style tone with precise statistical language.
    • Clinical – practitioner-focused; emphasizes treatment implications and NNT.
    • Regulatory (FDA/EMA) – structured for submissions; highlights safety signals, benefit-risk framing, and ICH/PRISMA-P compliance.
    • Internal Stakeholders – executive-accessible language with minimal jargon.
    • Health Technology Assessment – comparative effectiveness and GRADE certainty framing for HTA bodies.
    • Payer / Formulary Committee – value proposition, NNT/NNH, and budget-impact context.
    • Patient / Public – plain-language summary in lay terms.
  • A Custom option lets you describe any audience in free text (e.g., “graduate students in epidemiology”).
  • The selected audience is persisted with the report so it can be displayed as a badge and referenced later.

The system constructs a rich input for the report generator that includes:

  • PICO and configuration snapshots for context.
  • PRISMA flow and descriptive summaries.
  • The selected messages and snapshots, grouped by section.
  • A reference library mapping study_id to formatted citations.
  • A programmatically generated “Characteristics of included studies” table.

The AI receives a strict system prompt that:

  • Only uses the evidence provided in the context.
  • Does not invent numerical values or confidence intervals.
  • Uses the reference library instead of making up study citations.
  • Does not generate the characteristics table (it is handled separately).
  • Applies audience-specific tone and style instructions matching the selected audience preset (or the custom description).

The generated draft is stored as a report and can be edited manually in the app.

Report persistence & exports

  • Each report stores its title, body, and structured sections (including tables).
  • Reports also store the IDs of the evidence items used to generate them, enabling future reproducibility and auditability.
  • Reports can be exported as:
    • DOCX – narrative text with pinned plots and tables embedded.
    • XLSX – structured tables for further use (e.g., characteristics of included studies).
New report creation dialog for selecting evidence sections
Generate Draft wizard: choose a report name and the evidence items (PRISMA, snapshots, messages) to include — the AI synthesizes only from what you select.

Data provenance & audit

Every extracted value is linked to the source document and location it came from. A unified provenance layer merges saved database records, in‑session AI proposals, and manual edits into a single source of truth — with automatic conflict detection when values diverge.

  • Each extraction record stores the source snippet (page text, table cell, or paragraph) alongside the extracted value.
  • The provenance system tracks per‑field origin (AI‑extracted, manual edit, or database) and flags conflicts when a new AI proposal differs from the current saved value. The Live Effect Stripe displays the pending conflict count so reviewers can resolve differences without leaving the extraction view.
  • From any stat model run card in the Analysis tab, you can open the data lineage panel to trace each study's contribution back through extraction to the PDF.
  • Corrections are tracked; the audit trail shows who changed what and when, supporting PRISMA and journal reproducibility requirements.
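The merge-with-conflict-detection behavior can be sketched as a per-field rule: manual edits win, and a differing AI proposal is flagged rather than silently overwriting the saved value. Types, names, and the exact-equality comparison are illustrative assumptions:

```typescript
// Sketch of per-field provenance merging with conflict detection.
// Types and names are illustrative assumptions.
type Origin = "database" | "ai" | "manual";

interface FieldValue { value: number; origin: Origin; }

interface MergeResult {
  current: FieldValue;
  conflict?: { proposed: FieldValue }; // surfaced in the review UI
}

function mergeField(
  saved: FieldValue | null,
  proposal: FieldValue | null,
): MergeResult {
  if (!saved && proposal) return { current: proposal }; // nothing saved yet
  if (saved && !proposal) return { current: saved };    // nothing proposed
  if (!saved || !proposal) throw new Error("no value for field");
  // Manual edits always win; differing AI proposals become conflicts
  // instead of silently overwriting the saved value.
  if (saved.origin === "manual") return { current: saved };
  if (Math.abs(saved.value - proposal.value) > 1e-9) {
    return { current: saved, conflict: { proposed: proposal } };
  }
  return { current: saved };
}
```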
Data lineage panel tracing a stat model run back to extracted source values
Data lineage: accessible from the stat model run card in the Analysis tab — trace any pooled statistic back to its extracted values and source snippets.

Edge cases & limitations

A few scenarios deserve special mention so you know what to expect when data are missing or unusual.

Protocol‑only trials

Some trials have protocol or feasibility papers but no published results yet. In these cases:

  • These studies may appear in the screening and Full Text tabs.
  • They will not appear in the Extraction tab, because they do not contain participant outcome data.
  • They can still be mentioned qualitatively in reports via manual notes, but they are not included in pooled effect estimates.

Registry‑only results

Some trials post summary results on ClinicalTrials.gov but have no corresponding publication.

  • The system can use CT.gov results tables as a data source when they are structured and complete.
  • Registry result formats may differ from journal articles (e.g., less detail on timepoints or subgroups).
  • Future versions may give you more direct control over whether to include registry‑only results in pooled analyses.

Several publications for one trial

When a trial has multiple publications (e.g., main results, subgroup analyses, long‑term follow‑up), the system:

  • Links all publications to a single canonical trial ID.
  • Chooses one effect size per trial per outcome when building pooled datasets.
  • Uses link types and recency to identify a primary report for labeling in plots and tables.

Manual overrides

In rare situations, you may want to override the system's decisions (e.g., treat a borderline feasibility paper as a data source). Where available, manual flags can:

  • Force a study to appear in the Extraction tab even if triage would normally exclude it.
  • Exclude a study from pooled analyses even if data are present.

These overrides should be used sparingly and documented in your reports for transparency.

Unsure studies and incomplete screening

Studies left as “unsure” are not counted as included or excluded in PRISMA reporting, which means the meta‑analysis is incomplete. To minimize this:

  • Add custom screening instructions for domain ambiguities.
  • Configure confidence bands to auto‑exclude low‑confidence matches.
  • Use the unsure diagnostics to identify systematic PICO issues.
  • Run the escalation pass to force remaining unsure studies to a definitive decision.
  • Manually review residual unsure studies (<5% expected after escalation).

Current limitations

  • Time‑to‑event curves and complex survival data may not yet be fully supported; results usually need to be reported as hazard ratios or similar summary measures.
  • Risk‑of‑bias assessments and GRADE‑style certainty ratings are not yet integrated into the statistical model (but can be noted in reports).
  • AI extraction can still fail or misinterpret complex tables; the math engine and validation layer reduce, but do not entirely eliminate, the need for human review.

If you are unsure about how the system handled a specific trial or publication, you can usually trace it from the Full Text tab through Extraction and Stats, and into the Evidence Board and Reports.

FAQ

Common questions about how trials, full text, extraction, and reporting behave in this system.

Why is my trial not showing up in the Extraction tab?

Only studies that have parsed full text and are judged to contain numerical results are shown in the Extraction tab. Trials that are still recruiting, have no posted results on ClinicalTrials.gov, or only have protocol/feasibility papers are filtered out. Uploading a PDF alone is not enough if the trial has no result data yet.

See Triage & extraction eligibility and Triage rules at a glance for the exact rules.

Why doesn't the report mention a particular study?

Narrative reports are generated only from the evidence items you pin and select in the Reports & Evidence workflow. If a study is not represented in any pinned plot, table, or message, it may not be mentioned in the draft report, even if it contributed to the pooled statistics.

See Reports & Evidence and Data provenance & audit for how evidence and reports are linked.

What happens if the AI extracts incorrect or inconsistent data?

Extracted values are always passed through a deterministic math and validation layer. This layer checks basic invariants (e.g., events ≤ total, SD > 0), derives missing quantities from CIs or SEs, and rejects outputs that are internally inconsistent. By default, only validated records are used in pooled analyses.

See Extraction & math engine and Data provenance & audit for details on validation and tracing results.

How are multiple publications for the same trial handled?

Multiple publications can be linked to a single trial (e.g., main results, secondary analyses, long‑term follow‑up). The system groups all such reports under a canonical trial. For each outcome, one effect size is chosen per canonical trial when building pooled datasets, so a trial is not double‑counted.

See Study ingestion & linking and Statistics & canonical studies for more on canonical trials and aggregation.

Why do so many studies end up as “Unsure”?

A high unsure rate usually means the PICO criteria are too narrow or the abstract text lacks details the screener needs (e.g., a specific biomarker or age qualifier). You can address this by: (1) adding custom screening instructions to resolve recurring ambiguities, (2) adjusting the confidence thresholds to auto‑exclude very low‑confidence matches, (3) running the unsure diagnostics to get specific PICO refinement suggestions, or (4) triggering escalation to force remaining unsure studies to a definitive decision.

See Screening & unsure resolution for the full workflow.

Can I override the system's decisions about inclusion?

In some workflows you may want to manually include or exclude studies regardless of triage (for example, to include a borderline feasibility paper or exclude a registry‑only result). Where manual override controls are available, they should be used sparingly and documented in your report.

See Edge cases & limitations for how protocol‑only and registry‑only scenarios are treated, and refer to your app's controls for override options.

This page describes the current behavior of the system. When the triage, extraction, or reporting logic changes in a way that affects which studies are included or how results are computed, this document should be updated as part of the same release.