What is Genotype Imputation?
Genotype imputation is a statistical technique that predicts unobserved genetic variants based on patterns of linkage disequilibrium (LD)—the tendency for nearby genetic variants to be inherited together.
The "Jigsaw Puzzle" Analogy
Imagine trying to reconstruct a 1,000-piece puzzle:
- Microarray (700k SNPs): You have only the 50 edge pieces. You can see the outline, but the middle is empty.
- Imputation (AI Guessing): The AI looks at the "box cover" (reference panels like 1000 Genomes) and paints in the missing 950 pieces based on what the picture should look like.
- WGS (Real Sequencing): You actually have all 1,000 pieces in the box. No guessing required.
How SelfDecode's Pipeline Works
Input: Your Raw Data
Upload your 23andMe, Ancestry, or other microarray file (~700k SNPs measured directly).
Reference Panel Matching
Your genotypes are compared against large reference panels (1000 Genomes, TOPMed, UK Biobank) containing WGS data from thousands of individuals.
Haplotype Phasing
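A phasing algorithm works out which alleles sit together on the same chromosome copy, reconstructing your two haplotypes. Without parental genotypes it cannot tell which haplotype came from your mother and which from your father; it only separates the two.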
Statistical Inference
Hidden Markov Models (HMMs) and machine learning predict genotypes at unmeasured positions based on LD patterns.
Output: 83 Million Variants
Each imputed variant includes a confidence score (imputation quality, R²). Low-confidence calls are flagged.
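The inference step can be illustrated with a deliberately simplified sketch. This is not SelfDecode's pipeline, and it is not a full HMM: it ignores recombination and copies from whole reference haplotypes, weighting each one by how well it matches the alleles typed on the array. The function name, the `mismatch_prob` parameter, and the tiny four-haplotype panel are all assumptions made for illustration.

```python
import numpy as np

def impute_alt_probability(ref_haps, target_obs, missing_idx, mismatch_prob=0.01):
    """Toy estimate of the alternate-allele probability at one untyped site.

    ref_haps:    (n_haplotypes, n_sites) array of 0/1 alleles (reference panel).
    target_obs:  {site_index: allele} for sites measured on the microarray.
    missing_idx: index of the site that was not measured.
    """
    ref_haps = np.asarray(ref_haps)
    sites = np.array(sorted(target_obs))
    alleles = np.array([target_obs[s] for s in sites])

    # Count disagreements between each reference haplotype and the typed alleles.
    mismatches = (ref_haps[:, sites] != alleles).sum(axis=1)
    matches = len(sites) - mismatches

    # Likelihood of the observed alleles if the target copied each haplotype.
    weights = (1 - mismatch_prob) ** matches * mismatch_prob ** mismatches
    weights = weights / weights.sum()

    # Weighted average of the reference alleles at the untyped site.
    return float(weights @ ref_haps[:, missing_idx])

# Hypothetical panel: two haplotypes carry the reference allele everywhere,
# two carry the alternate allele everywhere (perfect LD across four sites).
panel = [
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [1, 1, 1, 1],
    [1, 1, 1, 1],
]
typed = {0: 1, 1: 1, 3: 1}  # site 2 was not on the array
prob = impute_alt_probability(panel, typed, missing_idx=2)
print(f"P(alt allele at site 2) = {prob:.4f}")
```

Because the typed alleles match only the alternate-carrying haplotypes, the estimate at the untyped site lands very close to 1. Production tools add a recombination model so the copied haplotype can switch along the chromosome, which is what the HMM contributes.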
Where Does AI Imputation Succeed — And Where Does It Fail?
| Variant Category | Minor Allele Frequency (MAF) | European Accuracy | African Accuracy | Clinical Use? |
|---|---|---|---|---|
| Common SNPs | >5% | 98-99% | 95-98% | ✓ PRS, traits |
| Low-Frequency | 1-5% | 90-95% | 85-92% | ◐ Caution |
| Rare Variants | <1% | 80-90% | 70-85% | ✗ Not reliable |
| Novel/Private | <0.1% | IMPOSSIBLE | IMPOSSIBLE | ✗ Never |
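As a rough sketch of how a variant ends up in one of the rows above, the snippet below computes a minor allele frequency from allele counts in a panel and maps it to the table's categories. The function names are illustrative, and the cut-offs are simply the ones in the MAF column, checked from the most common bin downward so the overlapping "<1%" and "<0.1%" rows resolve to a single label.

```python
def minor_allele_frequency(alt_count, total_alleles):
    """Frequency of the less common allele (total_alleles = 2 x individuals)."""
    freq = alt_count / total_alleles
    return min(freq, 1 - freq)

def frequency_category(alt_count, total_alleles):
    """Map a variant to the table's categories using its MAF."""
    maf = minor_allele_frequency(alt_count, total_alleles)
    if maf > 0.05:
        return "Common SNPs"
    if maf >= 0.01:
        return "Low-Frequency"
    if maf >= 0.001:
        return "Rare Variants"
    return "Novel/Private"

# Hypothetical panel of 1,000 individuals (2,000 alleles).
for alt_count in (1500, 40, 10, 1):
    print(alt_count, frequency_category(alt_count, 2000))
```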
Critical Limitation: Population Bias
Reference panels are heavily skewed toward European ancestry (~80% of WGS data). If you have African, South Asian, or Indigenous American ancestry, imputation accuracy drops significantly for rare and low-frequency variants. This is a fundamental limitation of all current imputation methods, not just SelfDecode.
When Is Imputation Sufficient vs. When Do You Need Real WGS?
Imputation is Sufficient
- ✓ Polygenic Risk Scores (aggregates of thousands of common variants)
- ✓ Trait predictions (eye color, hair texture, taste preferences)
- ✓ Nutrigenomics (caffeine, lactose, alcohol metabolism)
- ✓ Ancestry refinement beyond microarray ethnicity estimates
- ✓ General wellness insights and supplement guidance
- ✓ Research and exploration (non-actionable)
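To see why polygenic risk scores from the list above tolerate imputation noise, it helps to write one out. In its simplest form a PRS is a weighted sum of allele dosages, so a small error at any single variant moves the total only slightly. The sketch below assumes that simple additive form; the variant IDs and effect weights are made up for illustration and do not come from any published score.

```python
def polygenic_risk_score(dosages, weights):
    """Weighted sum of allele dosages over variants present in both inputs.

    dosages: {variant_id: expected count of the effect allele, 0.0 to 2.0}
    weights: {variant_id: per-allele effect size}
    """
    shared = dosages.keys() & weights.keys()
    return sum(dosages[v] * weights[v] for v in shared)

# Hypothetical imputed dosages and effect weights.
dosages = {"variant_a": 1.98, "variant_b": 0.03, "variant_c": 1.00}
weights = {"variant_a": 0.10, "variant_b": -0.05, "variant_c": 0.20}

score = polygenic_risk_score(dosages, weights)

# Perturb one imputed dosage to see how little the total changes.
perturbed = dict(dosages, variant_b=0.0)
shift = polygenic_risk_score(perturbed, weights) - score
print(f"score = {score:.4f}, shift from one corrected dosage = {shift:.4f}")
```

A single-gene finding such as a BRCA1/2 variant has no such averaging: one wrong call is the whole result, which is why the list that follows asks for real sequencing.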
Real WGS Required
- ✗ BRCA1/2 (breast/ovarian cancer risk)
- ✗ APOE ε4 (Alzheimer's risk)
- ✗ Lynch Syndrome genes (colorectal cancer)
- ✗ Pharmacogenomics for critical drugs (warfarin, clopidogrel)
- ✗ Rare disease diagnosis
- ✗ Family planning / carrier screening for rare conditions
Understanding Imputation Confidence (R² / INFO Score)
Every imputed variant comes with a confidence metric, typically expressed as R² or INFO score (0 to 1). It estimates the squared correlation between the imputed genotype and the genotype that real sequencing would show, so values near 1 mean the call is well supported by the reference panel.
| R² Score | Confidence Level | Recommended Use |
|---|---|---|
| >0.9 | High confidence | Safe for most analyses |
| 0.7-0.9 | Moderate confidence | Use with caution, aggregate only |
| <0.7 | Low confidence | Exclude from analysis |
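The sketch below shows what the metric measures and how the tiers in the table could be applied. It computes the squared correlation directly against known genotypes, which is only possible in a validation setting; imputation software has no access to the truth and estimates the value from its own posterior probabilities. The function names and the six-person example are assumptions for illustration.

```python
import numpy as np

def dosage_r2(imputed_dosages, true_genotypes):
    """Squared Pearson correlation between imputed dosages and true genotypes."""
    r = np.corrcoef(imputed_dosages, true_genotypes)[0, 1]
    return float(r ** 2)

def confidence_tier(r2):
    """Map an R² value to the tiers used in the table above."""
    if r2 > 0.9:
        return "High confidence"
    if r2 >= 0.7:
        return "Moderate confidence"
    return "Low confidence"

# Hypothetical validation data for one variant across six people:
# imputed dosages (0.0 to 2.0) versus genotypes from real sequencing (0, 1, 2).
imputed = [0.1, 0.9, 1.8, 0.2, 1.1, 2.0]
truth = [0, 1, 2, 0, 1, 2]

r2 = dosage_r2(imputed, truth)
print(f"R² = {r2:.3f} -> {confidence_tier(r2)}")
```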
Pro Tip: Check the R² Before Trusting a Variant
SelfDecode reports include confidence levels. Before acting on any health insight, verify that the underlying variants have R² > 0.9. If a critical variant falls below that threshold, do not make health decisions based on it; get clinical confirmation.
The Assessment: Is SelfDecode Imputation Worth It?
ChronosGen Assessment
Strengths
- ✓ Maximizes value from existing $99 DNA test
- ✓ 500+ health reports for general wellness
- ✓ Strong for polygenic risk scores
- ✓ Privacy-focused (no pharma data sales)
- ✓ Continuous updates as reference panels improve
Limitations
- ✗ Cannot detect truly novel variants
- ✗ Accuracy varies by ancestry
- ✗ Not clinical-grade for rare disease
- ✗ "83 million variants" is marketing—most are low-confidence
- ✗ No structural variant detection
Our Recommendation: Use SelfDecode for exploration and general wellness insights. If you find something concerning or want to make clinical decisions, confirm with 30x WGS from Dante Labs or a clinical lab. The $99/year subscription is excellent value for what it provides—just understand its boundaries.