How accurate is SelfDecode AI imputation compared to real sequencing?

SelfDecode's AI imputation achieves 98-99% accuracy for common variants (MAF >5%) in European populations. Accuracy drops to 80-90% for rare variants (MAF <1%) in European populations and to 70-85% for rare variants in non-European ancestries. Novel/private variants are impossible to impute. For clinical decisions (BRCA, APOE), real WGS is always recommended. Imputation is best for polygenic risk scores and trait exploration.

Can AI imputation replace whole genome sequencing?

No. AI imputation predicts unmeasured variants statistically — it fills gaps using reference panels. It cannot detect truly novel mutations, structural variants, or family-specific private variants. Use imputation for general wellness insights and polygenic risk scores. Use 30x WGS for clinical-grade health decisions, pharmacogenomics, and rare disease screening.

What is an R² imputation confidence score and why does it matter?

R² (or INFO score) ranges from 0 to 1 and measures how reliably an imputed variant matches what real sequencing would show. R² > 0.9 = high confidence (safe for most analyses). R² 0.7-0.9 = moderate (use with caution, aggregate only). R² < 0.7 = low (exclude from analysis). Always check R² before acting on any health insight from imputed data.

SelfDecode AI Imputation Explained 2026 | Can 700k SNPs Predict 80 Million?

SelfDecode claims to expand your ~700,000 SNP ancestry test to 83 million variants using AI. We break down the science, the accuracy, and when imputation falls short.

Your input: 700k measured SNPs. AI imputation output claims: 83M predicted variants. That is a ~118x expansion ratio, but at what accuracy?

Short Answer: How Accurate Is SelfDecode AI Imputation?

SelfDecode expands ~700,000 SNPs to 83 million variants using AI imputation — a statistical technique that predicts unmeasured genotypes from reference panels (1000 Genomes, TOPMed). Accuracy is 98–99% for common variants (MAF >5%) in European populations but drops to 70–85% for rare variants in non-European ancestries. Imputation cannot detect novel/private mutations. Use it for polygenic risk scores and wellness insights; use 30x WGS ($379–€399) for clinical decisions like BRCA1/2 or APOE ε4 screening.

What is Genotype Imputation?

Genotype imputation is a statistical technique that predicts unobserved genetic variants based on patterns of linkage disequilibrium (LD)—the tendency for nearby genetic variants to be inherited together.
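
To make linkage disequilibrium concrete, here is a minimal Python sketch that computes the standard pairwise LD statistic r² between two sites from a handful of phased haplotypes. The six haplotypes and the function name `ld_r2` are made up for illustration; they are not taken from any real reference panel.

```python
def ld_r2(haplotypes, i, j):
    """Squared allelic correlation (LD r^2) between sites i and j.

    haplotypes: sequence of phased haplotypes, each a sequence of 0/1 alleles.
    """
    n = len(haplotypes)
    p_a = sum(h[i] for h in haplotypes) / n   # alt-allele frequency at site i
    p_b = sum(h[j] for h in haplotypes) / n   # alt-allele frequency at site j
    p_ab = sum(1 for h in haplotypes if h[i] == 1 and h[j] == 1) / n
    d = p_ab - p_a * p_b                      # LD coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return 0.0  # a monomorphic site carries no LD information
    return d * d / denom


# Toy panel (illustrative only): sites 0 and 1 always travel together,
# site 2 is only weakly related to them.
haps = [
    (0, 0, 1),
    (0, 0, 0),
    (1, 1, 0),
    (1, 1, 1),
    (0, 0, 1),
    (1, 1, 0),
]

print(ld_r2(haps, 0, 1))  # 1.0: knowing site 0 tells you site 1
print(ld_r2(haps, 0, 2))  # roughly 0.11: site 0 says little about site 2
```

A measured SNP in high LD with an unmeasured one (the first pair) is what makes that unmeasured variant predictable; a weakly linked pair (the second) is where imputation has little to work with.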

The "Jigsaw Puzzle" Analogy

Imagine trying to reconstruct a 1,000-piece puzzle:

Microarray (700k SNPs): You have only the 50 edge pieces. You can see the outline, but the middle is empty.
Imputation (AI Guessing): The AI looks at the "box cover" (reference panels like 1000 Genomes) and paints in the missing 950 pieces based on what the picture should look like.
WGS (Real Sequencing): You actually have all 1,000 pieces in the box. No guessing required.

How SelfDecode's Pipeline Works

Input: Your Raw Data

Upload your 23andMe, Ancestry, or other microarray file (~700k SNPs measured directly).

Reference Panel Matching

Your genotypes are compared against large reference panels (1000 Genomes, TOPMed, UK Biobank) containing WGS data from thousands of individuals.

Haplotype Phasing

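A phasing algorithm estimates which alleles sit together on the same chromosome copy, reconstructing your two haplotypes (one inherited from each parent). Without parental data it cannot say which haplotype came from your mother and which from your father.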

Statistical Inference

Hidden Markov Models (HMMs) and machine learning predict genotypes at unmeasured positions based on LD patterns.

Output: 83 Million Variants

Each imputed variant includes a confidence score (imputation quality, R²). Low-confidence calls are flagged.
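
The statistical inference step can be sketched with a toy haplotype-copying HMM in the spirit of the Li-Stephens model that underlies many imputation tools. This is not SelfDecode's actual pipeline: the three-haplotype panel, the function name `impute_haplotype`, and the `switch` and `err` parameters are all assumptions chosen for illustration. Production tools work on far larger panels with many refinements.

```python
def impute_haplotype(ref, target, switch=0.05, err=0.01):
    """Posterior probability of the alt allele (1) at each site of one haplotype.

    ref    : list of reference haplotypes, each a list of 0/1 alleles
    target : list with 0/1 at genotyped sites and None at unmeasured sites
    switch : assumed per-site probability of jumping to a random reference haplotype
    err    : assumed probability that an observed allele mismatches the copied one
    """
    k, m = len(ref), len(target)

    def emit(state, site):
        obs = target[site]
        if obs is None:
            return 1.0  # unmeasured site: no evidence either way
        return 1.0 - err if ref[state][site] == obs else err

    def normalise(values):
        total = sum(values)
        return [v / total for v in values]

    # Forward pass: evidence from sites to the left.
    fwd = [normalise([emit(s, 0) / k for s in range(k)])]
    for site in range(1, m):
        prev = fwd[-1]  # sums to 1, so the jump mass per state is switch / k
        fwd.append(normalise(
            [((1 - switch) * prev[s] + switch / k) * emit(s, site) for s in range(k)]
        ))

    # Backward pass: evidence from sites to the right.
    bwd = [[1.0] * k for _ in range(m)]
    for site in range(m - 2, -1, -1):
        nxt = [bwd[site + 1][s] * emit(s, site + 1) for s in range(k)]
        total = sum(nxt)
        bwd[site] = normalise(
            [(1 - switch) * nxt[s] + switch * total / k for s in range(k)]
        )

    # Combine both passes and average the reference alleles by posterior weight.
    probs = []
    for site in range(m):
        post = normalise([fwd[site][s] * bwd[site][s] for s in range(k)])
        probs.append(sum(post[s] * ref[s][site] for s in range(k)))
    return probs


# Illustrative panel: the target matches the second haplotype at measured sites.
panel = [
    [0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1],
    [0, 1, 0, 1, 0],
]
measured = [1, None, 1, None, 1]
probs = impute_haplotype(panel, measured)
calls = [round(p) for p in probs]
```

The per-site probabilities in `probs` are what a confidence score is ultimately derived from: values near 0 or 1 mean the panel agrees on the missing allele, while values near 0.5 mean the reference haplotypes disagree and the call is a guess.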

Where Does AI Imputation Succeed — And Where Does It Fail?

Variant Category	MAF	European Accuracy	African Accuracy	Clinical Use?
Common SNPs	>5%	98-99%	95-98%	✓ PRS, traits
Low-Frequency	1-5%	90-95%	85-92%	◐ Caution
Rare Variants	<1%	80-90%	70-85%	✗ Not reliable
Novel/Private	<0.1%	IMPOSSIBLE	IMPOSSIBLE	✗ Never

Critical Limitation: Population Bias

Reference panels are heavily skewed toward European ancestry (~80% of WGS data). If you have African, South Asian, or Indigenous American ancestry, imputation accuracy drops significantly for rare and low-frequency variants. This is a fundamental limitation of all current imputation methods, not just SelfDecode.

When Is Imputation Sufficient vs. When Do You Need Real WGS?

Imputation is Sufficient

✓ Polygenic Risk Scores (aggregate of 1000s of common variants)
✓ Trait predictions (eye color, hair texture, taste preferences)
✓ Nutrigenomics (caffeine, lactose, alcohol metabolism)
✓ Ancestry refinement beyond microarray ethnicity estimates
✓ General wellness insights and supplement guidance
✓ Research and exploration (non-actionable)
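
The polygenic risk score item at the top of this list is the clearest case where imputed data holds up, because the score is a weighted sum over many variants and individual errors tend to average out. A minimal sketch follows, assuming per-variant effect weights, imputed dosages, and R² values are available as dictionaries. The variant names and numbers are invented for illustration, and `polygenic_score` is a hypothetical helper, not a SelfDecode function.

```python
def polygenic_score(dosages, weights, r2, min_r2=0.7):
    """Weighted sum of alt-allele dosages over variants passing the R^2 cut.

    dosages : variant id -> imputed alt-allele dosage (0.0 to 2.0)
    weights : variant id -> effect size (for example, a GWAS beta)
    r2      : variant id -> imputation quality score
    Returns (score, number of variants used).
    """
    score = 0.0
    used = 0
    for variant, beta in weights.items():
        if variant not in dosages:
            continue  # variant missing from the imputed output
        if r2.get(variant, 0.0) < min_r2:
            continue  # low-confidence call: exclude rather than add noise
        score += beta * dosages[variant]
        used += 1
    return score, used


# Illustrative values only.
weights = {"snp_a": 0.12, "snp_b": -0.05, "snp_c": 0.30}
dosages = {"snp_a": 1.9, "snp_b": 1.0, "snp_c": 0.4}
quality = {"snp_a": 0.98, "snp_b": 0.85, "snp_c": 0.55}

score, used = polygenic_score(dosages, weights, quality)
print(score, used)  # about 0.178 from 2 variants; snp_c is dropped for low R^2
```

The default cut of 0.7 is a design choice in this sketch: it admits moderate-confidence variants because they are used only in aggregate, and drops anything below that.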

Real WGS Required

✗ BRCA1/2 (breast/ovarian cancer risk)
✗ APOE ε4 (Alzheimer's risk)
✗ Lynch Syndrome genes (colorectal cancer)
✗ Pharmacogenomics for critical drugs (warfarin, clopidogrel)
✗ Rare disease diagnosis
✗ Family planning / carrier screening for rare conditions

Understanding Imputation Confidence (R² / INFO Score)

Every imputed variant comes with a confidence metric, typically expressed as R² or INFO score (0 to 1). This represents how well the imputed genotype correlates with what real sequencing would show.
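
When sequencing truth is available for the same people, that correlation can be measured directly as the squared Pearson correlation between imputed dosages and sequenced genotypes. The sketch below uses six invented samples at a single variant. Scores reported by imputation software are estimates of this quantity, since no sequencing truth exists at imputation time.

```python
def dosage_r2(imputed, truth):
    """Squared Pearson correlation between imputed dosages and true genotypes.

    imputed : imputed alt-allele dosages (0.0 to 2.0), one per sample
    truth   : sequenced genotypes coded 0, 1, or 2, one per sample
    """
    n = len(imputed)
    mean_x = sum(imputed) / n
    mean_y = sum(truth) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(imputed, truth))
    var_x = sum((x - mean_x) ** 2 for x in imputed)
    var_y = sum((y - mean_y) ** 2 for y in truth)
    if var_x == 0 or var_y == 0:
        return 0.0  # no variation, so correlation is undefined; treat as uninformative
    return cov * cov / (var_x * var_y)


# Illustrative data for one variant across six samples.
truth = [0, 1, 2, 1, 0, 2]
well_imputed = [0.1, 0.9, 1.9, 1.1, 0.0, 2.0]    # tracks the truth closely
poorly_imputed = [0.9, 1.0, 1.2, 0.8, 1.1, 1.0]  # hovers near the population mean

good = dosage_r2(well_imputed, truth)
poor = dosage_r2(poorly_imputed, truth)
```

The poorly imputed variant shows the typical failure mode: every dosage sits close to 1.0, which looks harmless in any single sample but carries almost no information about who actually has the variant.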

R² Score	Confidence Level	Recommended Use
>0.9	High confidence	Safe for most analyses
0.7-0.9	Moderate confidence	Use with caution, aggregate only
<0.7	Low confidence	Exclude from analysis

Pro Tip: Check the R² Before Trusting a Variant

SelfDecode reports include confidence levels. Before acting on any health insight, verify that the underlying variants have R² > 0.9. If a critical variant falls below that threshold, do not make health decisions based on it; get clinical confirmation.
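
If you export imputed variants with their scores, the three tiers from the table above can be applied with a few lines of Python. This is a sketch: the variant names are placeholders, `confidence_tier` is a hypothetical helper, and the boundaries simply follow this article's table.

```python
def confidence_tier(r2):
    """Map an R^2 / INFO score to the tiers used in this article."""
    if not 0.0 <= r2 <= 1.0:
        raise ValueError(f"R^2 must be between 0 and 1, got {r2}")
    if r2 > 0.9:
        return "high"      # safe for most analyses
    if r2 >= 0.7:
        return "moderate"  # use with caution, aggregate only
    return "low"           # exclude from analysis


def group_by_tier(scores):
    """Group variant ids by confidence tier. scores: variant id -> R^2."""
    tiers = {"high": [], "moderate": [], "low": []}
    for variant, r2 in scores.items():
        tiers[confidence_tier(r2)].append(variant)
    return tiers


# Placeholder variants and scores for illustration.
example_scores = {"variant_1": 0.97, "variant_2": 0.82, "variant_3": 0.41}
tiers = group_by_tier(example_scores)
```

Only the variants in the "high" list should inform a single-variant health insight; the "moderate" list belongs in aggregate scores, and the "low" list should be dropped.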

The Assessment: Is SelfDecode Imputation Worth It?

ChronosGen Assessment

Strengths

✓ Maximizes value from existing $99 DNA test
✓ 500+ health reports for general wellness
✓ Strong for polygenic risk scores
✓ Privacy-focused (no pharma data sales)
✓ Continuous updates as reference panels improve

Limitations

✗ Cannot detect truly novel variants
✗ Accuracy varies by ancestry
✗ Not clinical-grade for rare disease
✗ "83 million variants" is marketing—most are low-confidence
✗ No structural variant detection

Our Recommendation: Use SelfDecode for exploration and general wellness insights. If you find something concerning or want to make clinical decisions, confirm with 30x WGS from Dante Labs or a clinical lab. The $99/year subscription is excellent value for what it provides—just understand its boundaries.

Sources & Methodology

Research Methodology

This technical analysis synthesizes data from peer-reviewed imputation studies, official SelfDecode methodology documentation, and NIH Imputation Server benchmark data. Accuracy figures are based on published R² metrics from TOPMed and HRC reference panels. Pricing verified from selfdecode.com on March 15, 2026.

Last verified: March 2026 · License: CC BY 4.0 — Cite freely with attribution to ChronosGenomics.