[Articles] A universal SNP and small-indel variant caller using deep neural networks

About this Article

Authors: Ryan Poplin, Pi-Chuan Chang, David Alexander, Scott Schwartz, et al.
Journal: Nature Biotechnology
Year: 2018
Official Citation: Poplin, R., Chang, PC., Alexander, D. et al. A universal SNP and small-indel variant caller using deep neural networks. Nat Biotechnol 36, 983-987 (2018).

Step 1: By scanning BAM(text file), figure out suspicious spots of variations.
Step 2: By using the information from step 1, make pileup image. Pileup image is an image that reads from Next Generation Sequencing are aligned along the reference sequence.
Step 3: Train Deep CNN by using true genotype and pileup images. Now, preparation for a new prediction is complete.
Step 4: Enter a new pileup image as an input.
Step 5: Deep Variant returns the variation likelihood of the genotype.

Won ‘highest performance’ award for SNP in FDA-sponsored Truth Challenge 2016 (evaluated on sample NA24385).
This blind test result shows that Deep Variant is also powerful to unseen dataset.
Powerful than all other methods with more than 50% fewer errors per genome (sample NA24385). (Table 1)
Outperformed all other methods even in artificially synthesized sample CHM1-CHM13 (Table 2).

Robust to changes in sequencing depth. Performed well on other version of genome without additional training.
Can applied to other mammalian species (e.g. mouse).
No further performance loss in other instrument types, preparation protocols.
If retraining is conducted in parallel, Deep Variant can be applied to exome data (gene on exon) which is hard because of shortage of data.

Changing to RGB image is not always optimal.
Rely on sensitivity of candidate callset (created by step 1).
The process (finding candidates -> making images -> apply CNN) is applicable to other fields.