Files for genomics format with tag GZIP format



30 files

PrecisionFDA Truth Challenge V2 submission Submission name: Seven Bridges GRAF - Illumina Participant: Seven Bridges Genomics Technology: Illumina Category: MHC Individual: HG002-son

PrecisionFDA Truth Challenge V2 submission lifted over to hg19 with Crossmap.py v0.5.2 and chain file hg38ToHg19.over.chain.gz from UCSC: Total entries: 4913962, Failed to map: 138409 Submission name: Seven Bridges GRAF - Illumina Participant: Seven Bridges Genomics Technology: Illumina Category: MHC Individual: HG002-son

PrecisionFDA Truth Challenge V2 submission Submission name: Seven Bridges GRAF - Illumina Participant: Seven Bridges Genomics Technology: Illumina Category: MHC Individual: HG003-father

PrecisionFDA Truth Challenge V2 submission lifted over to hg19 with Crossmap.py v0.5.2 and chain file hg38ToHg19.over.chain.gz from UCSC: Total entries: 4893407, Failed to map: 140879 Submission name: Seven Bridges GRAF - Illumina Participant: Seven Bridges Genomics Technology: Illumina Category: MHC Individual: HG003-father

PrecisionFDA Truth Challenge V2 submission Submission name: Seven Bridges GRAF - Illumina Participant: Seven Bridges Genomics Technology: Illumina Category: MHC Individual: HG004-mother

PrecisionFDA Truth Challenge V2 submission lifted over to hg19 with Crossmap.py v0.5.2 and chain file hg38ToHg19.over.chain.gz from UCSC: Total entries: 4967806, Failed to map: 140465 Submission name: Seven Bridges GRAF - Illumina Participant: Seven Bridges Genomics Technology: Illumina Category: MHC Individual: HG004-mother

PrecisionFDA Truth Challenge V2 submission lifted over to hg19 with Crossmap.py v0.5.2 and chain file hg38ToHg19.over.chain.gz from UCSC: Total entries: 5351977, Failed to map: 447651 Submission name: DeepVariant PacBio Participant: The Genomics Team in Google Health Category: All Benchmark Regions Technology: PacBio Individual: HG002-son

PrecisionFDA Truth Challenge V2 submission Submission name: DeepVariant PacBio Participant: The Genomics Team in Google Health Category: All Benchmark Regions Technology: PacBio Individual: HG002-son

PrecisionFDA Truth Challenge V2 submission lifted over to hg19 with Crossmap.py v0.5.2 and chain file hg38ToHg19.over.chain.gz from UCSC: Total entries: 5332475, Failed to map: 451795 Submission name: DeepVariant PacBio Participant: The Genomics Team in Google Health Category: All Benchmark Regions Technology: PacBio Individual: HG003-father

PrecisionFDA Truth Challenge V2 submission Submission name: DeepVariant PacBio Participant: The Genomics Team in Google Health Category: All Benchmark Regions Technology: PacBio Individual: HG003-father

PrecisionFDA Truth Challenge V2 submission lifted over to hg19 with Crossmap.py v0.5.2 and chain file hg38ToHg19.over.chain.gz from UCSC: Total entries: 5402308, Failed to map: 455720 Submission name: DeepVariant PacBio Participant: The Genomics Team in Google Health Category: All Benchmark Regions Technology: PacBio Individual: HG004-mother

PrecisionFDA Truth Challenge V2 submission Submission name: DeepVariant PacBio Participant: The Genomics Team in Google Health Category: All Benchmark Regions Technology: PacBio Individual: HG004-mother

PrecisionFDA Truth Challenge V2 submission lifted over to hg19 with Crossmap.py v0.5.2 and chain file hg38ToHg19.over.chain.gz from UCSC: Total entries: 4791098, Failed to map: 63026 Submission name: Combination of Illumina, PacBio HIFI, and Oxford Nanopore submission Model2 Participant: Sentieon Technology: Multi Category: MHC Individual: HG002-son

PrecisionFDA Truth Challenge V2 submission Submission name: Combination of Illumina, PacBio HIFI, and Oxford Nanopore submission Model2 Participant: Sentieon Technology: Multi Category: MHC Individual: HG002-son

PrecisionFDA Truth Challenge V2 submission Submission name: Combination of Illumina, PacBio HIFI, and Oxford Nanopore submission Model2 Participant: Sentieon Technology: Multi Category: MHC Individual: HG003-father

PrecisionFDA Truth Challenge V2 submission lifted over to hg19 with Crossmap.py v0.5.2 and chain file hg38ToHg19.over.chain.gz from UCSC: Total entries: 4771162, Failed to map: 64227 Submission name: Combination of Illumina, PacBio HIFI, and Oxford Nanopore submission Model2 Participant: Sentieon Technology: Multi Category: MHC Individual: HG003-father

PrecisionFDA Truth Challenge V2 submission lifted over to hg19 with Crossmap.py v0.5.2 and chain file hg38ToHg19.over.chain.gz from UCSC: Total entries: 4808694, Failed to map: 64465 Submission name: Combination of Illumina, PacBio HIFI, and Oxford Nanopore submission Model2 Participant: Sentieon Technology: Multi Category: MHC Individual: HG004-mother

PrecisionFDA Truth Challenge V2 submission Submission name: Combination of Illumina, PacBio HIFI, and Oxford Nanopore submission Model2 Participant: Sentieon Technology: Multi Category: MHC Individual: HG004-mother

PrecisionFDA Truth Challenge V2 submission Submission name: PacBio HIFI only submission Participant: Sentieon Technology: PacBio Category: MHC/Difficult-to-Map Regions Individual: HG002-son

PrecisionFDA Truth Challenge V2 submission lifted over to hg19 with Crossmap.py v0.5.2 and chain file hg38ToHg19.over.chain.gz from UCSC: Total entries: 7662474, Failed to map: 2132757 Submission name: PacBio HIFI only submission Participant: Sentieon Technology: PacBio Category: MHC/Difficult-to-Map Regions Individual: HG002-son

PrecisionFDA Truth Challenge V2 submission Submission name: PacBio HIFI only submission Participant: Sentieon Technology: PacBio Category: MHC/Difficult-to-Map Regions Individual: HG003-father

PrecisionFDA Truth Challenge V2 submission lifted over to hg19 with Crossmap.py v0.5.2 and chain file hg38ToHg19.over.chain.gz from UCSC: Total entries: 7666238, Failed to map: 2156696 Submission name: PacBio HIFI only submission Participant: Sentieon Technology: PacBio Category: MHC/Difficult-to-Map Regions Individual: HG003-father

PrecisionFDA Truth Challenge V2 submission Submission name: PacBio HIFI only submission Participant: Sentieon Technology: PacBio Category: MHC/Difficult-to-Map Regions Individual: HG004-mother

PrecisionFDA Truth Challenge V2 submission lifted over to hg19 with Crossmap.py v0.5.2 and chain file hg38ToHg19.over.chain.gz from UCSC: Total entries: 7710351, Failed to map: 2147400 Submission name: PacBio HIFI only submission Participant: Sentieon Technology: PacBio Category: MHC/Difficult-to-Map Regions Individual: HG004-mother
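The lifted-over entries above each report a "Total entries" and a "Failed to map" count. As a rough illustration of how to compare them, the sketch below computes the fraction of entries that failed liftover for the four HG002-son submissions. The counts are copied from the descriptions above; the function name and the dictionary labels are hypothetical and are not part of Crossmap.py or of the submissions.

```python
def failed_fraction(total_entries: int, failed_to_map: int) -> float:
    """Return the fraction of entries that could not be lifted over."""
    if total_entries <= 0:
        raise ValueError("total_entries must be positive")
    if not 0 <= failed_to_map <= total_entries:
        raise ValueError("failed_to_map must be between 0 and total_entries")
    return failed_to_map / total_entries


# (total entries, failed to map) for the HG002-son liftovers listed above.
# The labels are shorthand chosen for this example only.
HG002_LIFTOVER_COUNTS = {
    "Seven Bridges GRAF - Illumina": (4913962, 138409),
    "DeepVariant PacBio": (5351977, 447651),
    "Sentieon Multi Model2": (4791098, 63026),
    "Sentieon PacBio HIFI only": (7662474, 2132757),
}

if __name__ == "__main__":
    for label, (total, failed) in HG002_LIFTOVER_COUNTS.items():
        print(f"{label}: {failed_fraction(total, failed):.2%} failed to map")
```

A high failure fraction does not by itself indicate a problem with a submission; it only shows how much of the hg38 file has no hg19 counterpart after liftover.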

This GCA_000001405.15_GRCh38_no_alt_analysis_set.fasta.gz is the same as the GRC one, except it is compressed with htslib bgzip to enable indexing. For GRCh38 (aka hg38), GIAB generally uses a masked GRCh38 reference with no ALT loci nor hs38d1 decoy sequences. We do not use ALT loci because there are currently no ALT-aware mappers for long reads, and we currently represent all variants with respect to the primary reference. We do not include the hs38d1 decoy because it has minimal effect and includes some alternate loci that are not compatible with long read mappers.

Index file for GCA_000001405.15_GRCh38_no_alt_analysis_set.fasta.gz (FL_e65105.fe)

For GRCh38 (aka hg38), GIAB generally uses a masked GRCh38 reference with no ALT loci nor hs38d1 decoy sequences. We do not use ALT loci because there are currently no ALT-aware mappers for long reads, and we currently represent all variants with respect to the primary reference. We do not include the hs38d1 decoy because it has minimal effect and includes some alternate loci that are not compatible with long read mappers. In 2021, GIAB worked with the GRC to demonstrate improved performance when masking falsely duplicated regions on chr21 in GRCh38. Masking these false duplications dramatically improves performance in several chr21 regions, which contain some medically relevant genes (e.g., CBS, CRYAA, and KCNE1). To create the masked reference, we started with the GRCh38 reference with no ALT loci nor decoy from ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/001/405/GCA_000001405.15_GRCh38/seqs_for_alignment_pipelines.ucsc_ids/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna.gz. Our GCA_000001405.15_GRCh38_no_alt_analysis_set.fasta.gz is the same as the GRC one, except it is compressed with htslib bgzip to enable indexing. To generate the v1 masked GRCh38, we ran the bedtools (https://github.com/arq5x/bedtools2) command:
    maskFastaFromBed \
      -fi GCA_000001405.15_GRCh38_no_alt_analysis_set.fasta \
      -bed GCA_000001405.15_GRCh38_GRC_exclusions.bed \
      -fo GCA_000001405.15_GRCh38_no_alt_analysis_set_maskedGRC_exclusions.fasta
This uses the GRC BED file for masking false duplications, downloaded from: https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/001/405/GCA_000001405.15_GRCh38/seqs_for_alignment_pipelines.ucsc_ids/GCA_000001405.15_GRCh38_GRC_exclusions.bed

Index file for GCA_000001405.15_GRCh38_no_alt_analysis_set_maskedGRC_exclusions.fasta.gz (FL_2f5bb1.3e)

For GRCh38 (aka hg38), GIAB generally uses a masked GRCh38 reference with no ALT loci nor hs38d1 decoy sequences. We do not use ALT loci because there are currently no ALT-aware mappers for long reads, and we currently represent all variants with respect to the primary reference. We do not include the hs38d1 decoy because it has minimal effect and includes some alternate loci that are not compatible with long read mappers. In 2021, GIAB worked with the GRC to demonstrate improved performance when masking falsely duplicated regions on chr21 in GRCh38. Masking these false duplications dramatically improves performance in several chr21 regions, which contain some medically relevant genes (e.g., CBS, CRYAA, and KCNE1). To create the masked reference, we started with the GRCh38 reference with no ALT loci nor decoy from ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/001/405/GCA_000001405.15_GRCh38/seqs_for_alignment_pipelines.ucsc_ids/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna.gz. Our GCA_000001405.15_GRCh38_no_alt_analysis_set.fasta.gz is the same as the GRC one, except it is compressed with htslib bgzip to enable indexing. To generate the v2 masked GRCh38, we ran the bedtools (https://github.com/arq5x/bedtools2) command:
    maskFastaFromBed \
      -fi GCA_000001405.15_GRCh38_no_alt_analysis_set.fasta \
      -bed GCA_000001405.15_GRCh38_GRC_exclusions_T2Tv2.bed \
      -fo GCA_000001405.15_GRCh38_no_alt_analysis_set_maskedGRC_exclusions_v2.fasta
This uses a BED file generated by the Telomere to Telomere Consortium Variants team to mask false duplications, located at https://ftp-trace.ncbi.nlm.nih.gov/ReferenceSamples/giab/release/references/GRCh38/GCA_000001405.15_GRCh38_GRC_exclusionsv2.bed.
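To make the masking step concrete, the following is a minimal Python sketch of what hard-masking a sequence from BED intervals does: every base inside a BED interval (0-based start, end-exclusive) is replaced with N, which is the default behaviour of maskFastaFromBed. It illustrates the idea only; it is not the bedtools implementation and is not how the GIAB references were produced. The function names and the toy sequence "chrToy" are hypothetical.

```python
from collections import defaultdict


def parse_bed(text: str) -> dict[str, list[tuple[int, int]]]:
    """Parse BED text into {chrom: [(start, end), ...]} (0-based, half-open)."""
    intervals: dict[str, list[tuple[int, int]]] = defaultdict(list)
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, comments, and UCSC track/browser header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        intervals[chrom].append((int(start), int(end)))
    return dict(intervals)


def mask_sequence(sequence: str, intervals: list[tuple[int, int]]) -> str:
    """Replace every base covered by an interval with 'N' (hard masking)."""
    bases = list(sequence)
    for start, end in intervals:
        # Clamp to the sequence bounds so out-of-range intervals are harmless.
        for i in range(max(start, 0), min(end, len(bases))):
            bases[i] = "N"
    return "".join(bases)


def mask_records(records: dict[str, str], bed_text: str) -> dict[str, str]:
    """Mask each named sequence with the BED intervals for that name."""
    by_chrom = parse_bed(bed_text)
    return {
        name: mask_sequence(seq, by_chrom.get(name, []))
        for name, seq in records.items()
    }


if __name__ == "__main__":
    toy = {"chrToy": "ACGTACGTAC"}
    print(mask_records(toy, "chrToy\t2\t5\n"))  # {'chrToy': 'ACNNNCGTAC'}
```

Because masking only substitutes bases in place, sequence lengths and coordinates are unchanged, so variant positions remain comparable between the unmasked and masked references.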

Index file for GCA_000001405.15_GRCh38_no_alt_analysis_set_maskedGRC_exclusions_v2.fasta.gz (FL_1d71ba.bb)