Identifier: WF_254045.d5


This repository contains all components of the RNA-seq pipeline used by the GTEx Consortium, including alignment, expression quantification, and quality control.

  • SamToFastq: BAM to FASTQ conversion
  • STAR: spliced alignment of RNA sequence reads (v2.5.3a)
  • Picard MarkDuplicates: mark duplicate reads
  • RSEM: transcript expression quantification (v1.3.0)
  • bamsync: utility for transferring QC flags from the input BAM and for re-generating read group IDs
  • RNA-SeQC: QC metrics and gene-level expression quantification (v1.1.9)

Reference indexes for STAR and RSEM are needed to run the pipeline. All reference files are available at gs://gtex-resources.

GTEx releases from V8 onward are based on the GRCh38/hg38 reference genome. Please see for details and links for this reference. Releases up to V7 were based on the GRCh37/hg19 reference genome (download).

Release V8 uses the GENCODE v26 annotation. Releases V6/V6p and V7 used GENCODE v19.

For hg19-based analyses, the GENCODE annotation should be patched to use Ensembl chromosome names:

zcat gencode.v19.annotation.gtf.gz | \
  sed 's/chrM/chrMT/;s/chr//' > gencode.v19.annotation.patched_contigs.gtf
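
To make the effect of the contig patch concrete, the sketch below applies the same sed expression to two made-up, GTF-like lines. The helper name patch_contigs and the sample lines are hypothetical and are not part of the pipeline; only the sed expression comes from the command above.

```shell
# Hypothetical helper wrapping the same sed expression as the command above.
# It renames chrM to chrMT first, then strips the first "chr" on each line,
# so chrM ends up as MT and chr1 ends up as 1.
patch_contigs() {
  sed 's/chrM/chrMT/;s/chr//'
}

# Made-up, truncated GTF-like lines, for illustration only.
printf 'chr1\tHAVANA\tgene\t11869\t14412\n' | patch_contigs
printf 'chrM\tENSEMBL\tgene\t577\t647\n' | patch_contigs
```

After patching a real annotation, listing the distinct contig names (for example with grep -v '^#' gencode.v19.annotation.patched_contigs.gtf | cut -f1 | sort -u) is a quick way to confirm that no "chr" prefixes remain.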

For a 2x76 bp paired-end sequencing protocol, build the STAR index with an sjdbOverhang of 75 (read length minus 1).
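
The overhang value follows from the per-mate read length minus one. The sketch below derives it for other read lengths and shows where it would go in a STAR index build. The helper name sjdb_overhang, the file names, the output directory, and the thread count are placeholders assumed for illustration, and the STAR call is left commented out because it needs the reference files.

```shell
# Hypothetical helper: sjdbOverhang = per-mate read length - 1.
sjdb_overhang() {
  echo $(( $1 - 1 ))
}

read_length=76   # per-mate read length of a 2x76 bp protocol
overhang=$(sjdb_overhang "$read_length")
echo "sjdbOverhang=${overhang}"

# Sketch of an index build using that value (placeholder paths, not executed):
# STAR --runMode genomeGenerate \
#   --genomeDir "star_index_oh${overhang}" \
#   --genomeFastaFiles genome.fa \
#   --sjdbGTFfile annotation.gtf \
#   --sjdbOverhang "${overhang}" \
#   --runThreadN 4
```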

Job Costs*

Public example            Job cost   Notes
Quertermous SRR7058289    $1.68

*Job cost examples are for estimates only. To get a more accurate idea of job costs, try running a single job before running many jobs.


