Broad WGS Single Sample pipeline

Identifier: WF_63e0c3.bc


This WDL pipeline implements data pre-processing and initial variant calling (GVCF generation) according to the GATK Best Practices (June 2016) for germline SNP and indel discovery in human whole-genome data.

Requirements/expectations:

  • Human whole-genome paired-end sequencing data in unmapped BAM (uBAM) format
  • One or more read groups, one per uBAM file, all belonging to a single sample (SM)
  • Input uBAM files must additionally comply with the following requirements:
    • filenames all have the same suffix (we use ".unmapped.bam")
    • files must pass validation by ValidateSamFile
    • reads are provided in queryname-sorted order
    • all reads must have an RG tag
  • GVCF output names must end in ".g.vcf.gz"
  • Reference genome must be Hg38 with ALT contigs
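
Some of the requirements above can be checked before submitting a job. The following is a minimal Python sketch, not part of the pipeline itself. The function names check_names and check_header are hypothetical, and the header check assumes you pass in the SAM-format header text of one uBAM. It reads "query-sorted" as SO:queryname in the @HD line and "one per uBAM file" as exactly one @RG line per file; both readings are assumptions. It does not replace validation by ValidateSamFile.

```python
from pathlib import PurePath

UBAM_SUFFIX = ".unmapped.bam"  # suffix quoted in the requirements
GVCF_SUFFIX = ".g.vcf.gz"      # required ending for GVCF output names


def check_names(ubam_paths, gvcf_name, suffix=UBAM_SUFFIX):
    """Return a list of problems found in input and output file names."""
    problems = []
    if not ubam_paths:
        problems.append("no uBAM files given")
    for path in ubam_paths:
        if not PurePath(path).name.endswith(suffix):
            problems.append(f"{path}: name does not end in {suffix}")
    if not gvcf_name.endswith(GVCF_SUFFIX):
        problems.append(f"{gvcf_name}: name does not end in {GVCF_SUFFIX}")
    return problems


def check_header(header_text):
    """Check the SAM-format header text of one uBAM.

    Returns (problems, samples), where samples is the set of SM values
    found on @RG lines. Assumes tab-separated TAG:VALUE header fields.
    """
    problems = []
    samples = set()
    sort_order = None
    read_groups = 0
    for line in header_text.splitlines():
        fields = line.split("\t")
        tags = dict(f.split(":", 1) for f in fields[1:] if ":" in f)
        if fields[0] == "@HD":
            sort_order = tags.get("SO")
        elif fields[0] == "@RG":
            read_groups += 1
            if "SM" in tags:
                samples.add(tags["SM"])
            else:
                problems.append(f"read group {tags.get('ID')} has no SM tag")
    if sort_order != "queryname":
        problems.append(f"sort order is {sort_order!r}, expected 'queryname'")
    if read_groups != 1:
        problems.append(f"found {read_groups} read groups, expected 1")
    return problems, samples
```

The header text can be obtained with, for example, `samtools view -H file.unmapped.bam`. To confirm that all files belong to a single sample, collect the SM sets returned for each file and check that their union contains exactly one value.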

Job Costs*

There is not yet enough data to determine cost estimates for this workflow. A single job should not cost more than a dollar or two.

*Job cost examples are estimates only. To get a more accurate idea of job costs, try running a single job before running many jobs.