Identifier: TL_e24fa5.c8


A set of command line tools for manipulating high-throughput sequencing (HTS) data in formats such as SAM/BAM/CRAM and VCF. Available as a standalone program or within the GATK4 program.


  • picard GatherVcfs

    Gathers multiple VCF files from a scatter operation into a single VCF file. Input files must be supplied in genomic order and must not have events at overlapping positions.

  • picard SamFormatConverter

    Convert a BAM file to a SAM file, or SAM to BAM. Input and output formats are determined by file extension.

  • picard GenotypeConcordance

    Evaluate genotype concordance between callsets. This tool evaluates the concordance between genotype calls for samples in different callsets where one is being considered as the truth (aka standard, or reference) and the other as the call that is being evaluated for accuracy.

  • picard RenameSampleInVcf

    Renames a sample within a VCF or BCF. This tool enables the user to rename a sample in either a VCF or BCF file. It is intended to change the name of a sample in a VCF prior to merging with VCF files in which one or more samples have similar names. Note that the input VCF file must be single-sample VCF and that the NEW_SAMPLE_NAME is required.

  • picard VcfToIntervalList

    Converts a VCF or BCF file to a Picard Interval List.

  • picard AddCommentsToBam

    Adds comments to the header of a BAM file. This tool makes a copy of the input bam file, with a modified header that includes the comments specified at the command line (prefixed by @CO). Use double quotes to wrap comments that include whitespace or special characters. Note that this tool cannot be run on SAM files.

  • picard BamToBfq

    Creates BFQ files from a BAM file for use by the maq aligner. BFQ is a binary version of the FASTQ file format.

  • picard ReorderSam

    Not to be confused with SortSam which sorts a SAM or BAM file with a valid sequence dictionary, ReorderSam reorders reads in a SAM/BAM file to match the contig ordering in a provided reference file, as determined by exact name matching of contigs. Reads mapped to contigs absent in the new reference are dropped. Runs substantially faster if the input is an indexed BAM file.

  • picard ReplaceSamHeader

    Replaces the SAMFileHeader in a SAM or BAM file. This tool makes it possible to replace the header of a SAM or BAM file with the header of another file, or a header block that has been edited manually (in a stub SAM file). The sort order (@SO) of the two input files must be the same. Note that validation is minimal, so it is up to the user to ensure that all the elements referred to in the SAMRecor

  • picard RevertSam

    Reverts SAM or BAM files to a previous state. This tool removes or restores certain properties of the SAM records, including alignment information, which can be used to produce an unmapped BAM (uBAM) from a previously aligned BAM. It is also capable of restoring the original quality scores of a BAM file that has already undergone base quality score recalibration (BQSR) if the original qualities wer

  • picard SetNmMdAndUqTags

    Fixes the NM, MD, and UQ tags in a SAM file. This tool takes in a SAM or BAM file (sorted by coordinate) and calculates the NM, MD, and UQ tags by comparing with the reference. This may be needed when MergeBamAlignment was run with SORT_ORDER different from 'coordinate' and thus could not fix these tags then.

  • picard SortSam

    Sorts a SAM or BAM file. This tool sorts the input SAM or BAM file by coordinate, queryname (QNAME), or some other property of the SAM record. The SortOrder of a SAM/BAM file is found in the SAM file header tag @HD in the field labeled SO. For a coordinate sorted SAM/BAM file, read alignments are sorted first by the reference sequence name (RNAME) field using the reference sequence dictionary (@SQ

  • picard SortVcf

    Sorts one or more VCF files. This tool sorts the records in VCF files according to the order of the contigs in the header/sequence dictionary and then by coordinate. It can accept an external sequence dictionary. If no external dictionary is supplied, the VCF file headers of multiple inputs must have the same sequence dictionaries. If running on multiple inputs (originating from e.g. some scatter-

  • picard SplitSamByLibrary

    Takes a SAM or BAM file and separates all the reads into one SAM or BAM file per library name. Reads that do not have a read group specified or whose read group does not have a library name are written to a file called 'unknown.' The format (SAM or BAM) of the output files matches that of the input file.
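
    The grouping rule described above can be sketched in a few lines of Python. This is only an illustration working on SAM text lines, not Picard's implementation; the function names library_by_read_group and split_by_library are hypothetical. It assumes tab-separated @RG header lines with ID and LB fields and an RG:Z: optional tag on each record, as laid out in the SAM specification.

```python
from collections import defaultdict


def library_by_read_group(header_lines):
    """Map read-group ID -> library name (LB) using the @RG header lines."""
    mapping = {}
    for line in header_lines:
        if not line.startswith("@RG"):
            continue
        fields = dict(
            f.split(":", 1) for f in line.rstrip("\n").split("\t")[1:] if ":" in f
        )
        if "ID" in fields and "LB" in fields:
            mapping[fields["ID"]] = fields["LB"]
    return mapping


def split_by_library(header_lines, record_lines):
    """Bucket SAM records by library; records without a usable library go to 'unknown'."""
    rg_to_lib = library_by_read_group(header_lines)
    buckets = defaultdict(list)
    for line in record_lines:
        cols = line.rstrip("\n").split("\t")
        # Optional tags start after the 11 mandatory SAM columns.
        rg = next((c[5:] for c in cols[11:] if c.startswith("RG:Z:")), None)
        buckets[rg_to_lib.get(rg, "unknown")].append(line)
    return dict(buckets)
```

    Each bucket would then be written to its own output file, one per library name.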

  • picard SplitVcfs

    Splits SNPs and INDELs into separate files. This tool reads in a VCF or BCF file and writes out the SNPs and INDELs it contains to separate files. The headers of the two output files will be identical and index files will be created for both outputs. If records other than SNPs or INDELs are present, set the STRICT option to "false", otherwise the tool will raise an exception and quit.
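
    As a rough illustration of the split and of the STRICT behaviour, the sketch below classifies records by allele length. The rule is a simplification chosen for this example and may differ from the classification Picard actually applies; classify_variant and split_records are hypothetical names, and records are assumed to be (chrom, pos, ref, alts) tuples.

```python
def classify_variant(ref, alts):
    """Return 'SNP', 'INDEL', or 'OTHER' for one record (simplified rule)."""
    bases = set("ACGTN")
    alleles = [ref] + list(alts)
    # Symbolic alleles (e.g. <DEL>), breakends, and '*' are treated as OTHER here.
    if any(not a or set(a.upper()) - bases for a in alleles):
        return "OTHER"
    if all(len(a) == 1 for a in alleles):
        return "SNP"
    if any(len(a) != len(ref) for a in alts):
        return "INDEL"
    return "OTHER"  # e.g. multi-nucleotide substitutions of equal length


def split_records(records, strict=True):
    """Split (chrom, pos, ref, alts) tuples into SNP and INDEL lists."""
    snps, indels = [], []
    for rec in records:
        kind = classify_variant(rec[2], rec[3])
        if kind == "SNP":
            snps.append(rec)
        elif kind == "INDEL":
            indels.append(rec)
        elif strict:
            # Mirrors the described behaviour: stop on records of another type.
            raise ValueError(f"record is neither SNP nor INDEL: {rec}")
    return snps, indels
```

    With strict=False, records of other types are skipped silently in this sketch.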

  • picard FilterVcf

    Applies one or more hard filters to a VCF file to filter out genotypes and variants.

  • picard ViewSam

    Prints a SAM or BAM file to the screen.

  • picard FifoBuffer

    Provides a large, configurable, FIFO buffer that can be used to buffer input and output streams between programs with a buffer size that is larger than that offered by native unix FIFOs (usually 64k).

  • picard UpdateVcfSequenceDictionary

    Takes a VCF and a second file that contains a sequence dictionary and updates the VCF with the new sequence dictionary.

  • picard SamToFastq

    Converts a SAM or BAM file to FASTQ. This tool extracts read sequences and base quality scores from the input SAM/BAM file and outputs them in FASTQ format. This can be used by way of a pipe to run BWA MEM on unmapped BAM (uBAM) files efficiently.
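
    To make the extraction concrete, here is a minimal Python sketch that turns one SAM text record into a FASTQ entry. It is not Picard's code. It relies on the SAM convention that reverse-strand reads (flag 0x10) are stored reverse-complemented, so the original orientation is restored before output; the /1 and /2 name suffixes and the function name sam_record_to_fastq are assumptions made for this example.

```python
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def sam_record_to_fastq(sam_line):
    """Convert one tab-separated SAM record into a four-line FASTQ entry."""
    cols = sam_line.rstrip("\n").split("\t")
    name, flag, seq, qual = cols[0], int(cols[1]), cols[9], cols[10]
    if flag & 0x10:
        # Read was aligned to the reverse strand: undo the reverse complement.
        seq = seq.translate(_COMPLEMENT)[::-1]
        qual = qual[::-1]
    if flag & 0x40:
        name += "/1"  # first read of a pair (suffix style is an assumption)
    elif flag & 0x80:
        name += "/2"  # second read of a pair
    return f"@{name}\n{seq}\n+\n{qual}\n"
```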

  • picard LiftOverIntervalList

    Lifts over an interval list from one reference build to another. This tool adjusts the coordinates in an interval list derived from one reference to match a new reference, based on a chain file that describes the correspondence between the two references. It is based on the UCSC liftOver tool (see: http://genome.ucsc.edu/cgi-bin/hgLiftOver) and uses a UCSC chain file to guide its operation. It acc

  • picard VcfFormatConverter

    Converts VCF to BCF or BCF to VCF. This tool converts files between the plain-text VCF format and its binary compressed equivalent, BCF. Input and output formats are determined by file extensions specified in the file names. For best results, it is recommended to ensure that an index file is present and set the REQUIRE_INDEX option to true.

  • picard RevertOriginalBaseQualitiesAndAddMateCigar

    Reverts the original base qualities and adds the mate cigar tag to read-group BAMs.

  • picard CollectInsertSizeMetrics

    This tool provides useful metrics for validating library construction including the insert size distribution and read orientation of paired-end libraries. The expected proportions of these metrics vary depending on the type of library preparation used, resulting from technical differences between paired-end libraries and mate-pair libraries. For a brief primer on paired-end sequencing and mate-pair

  • picard MeanQualityByCycle

    Collect mean quality by cycle. This tool generates a data table and chart of mean quality by cycle from a BAM file. It is intended to be used on a single lane or a read group's worth of data, but can be applied to merged BAMs if needed. This metric gives an overall snapshot of sequencing machine performance. For most types of sequencing data, the output is expected to show a slight reduction in ov

  • picard CollectJumpingLibraryMetrics

    Collect jumping library metrics. This tool collects high-level metrics about the presence of outward-facing (jumping) and inward-facing (non-jumping) read pairs within a SAM or BAM file. For a brief primer on jumping libraries, see the GATK Dictionary. This program gets all data for computation from the first read in each pair in which the mapping quality (MQ) tag is set with the mate's mapping qu

  • picard CheckFingerprint

    Computes a fingerprint from the supplied input file (SAM/BAM or VCF) file and compares it to the expected fingerprint genotypes provided. The key output is a LOD score which represents the relative likelihood of the sequence data originating from the same sample as the genotypes vs. from a random sample. Two outputs are produced: (1) a summary metrics file that gives metrics at the single sample l

  • picard CollectVariantCallingMetrics

    Collects per-sample and aggregate (spanning all samples) metrics from the provided VCF file.

  • picard ScatterIntervalsByNs

    Writes an interval list based on splitting a reference by Ns. This tool identifies positions in a reference where the bases are 'no-calls' and writes out an interval-list using the resulting coordinates. This can be used to create an interval list for whole genome sequence (WGS) for e.g. scatter-gather purposes, as an alternative to using fixed-length intervals. The number of contiguous nocalls th

  • picard AddOrReplaceReadGroups

    Replace read groups in a BAM file. This tool enables the user to replace all read groups in the INPUT file with a single new read group and assign all reads to this read group in the OUTPUT BAM file. For more information about read groups, see the GATK Dictionary entry. This tool accepts INPUT BAM and SAM files or URLs from the Global Alliance for Genomics and Health (GA4GH) (see http://ga4gh.org/

  • picard BaitDesigner

    Designs oligonucleotide baits for hybrid selection reactions. This tool is used to design custom bait sets for hybrid selection experiments. The following files are input into BaitDesigner: a (TARGET) interval list indicating the sequences of interest, e.g. exons with their respective coordinates, a reference sequence, and a unique identifier string (DESIGN_NAME). The tool will output interval_lis

  • picard BamIndexStats

    Generate index statistics from a BAM file. This tool calculates statistics from a BAM index (.bai) file, emulating the behavior of the "samtools idxstats" command. The statistics collected include counts of aligned and unaligned reads as well as all records with no start coordinate. The input to the tool is the BAM file name but it must be accompanied by a corresponding index file.

  • picard BedToIntervalList

    Converts a BED file to a Picard Interval List. This tool provides easy conversion from BED to the Picard interval_list format which is required by many Picard processing tools. Note that the coordinate system of BED files is such that the first base or position in a sequence is numbered "0", while in interval_list files it is numbered "1". BED files contain sequence data displayed in a flexible fo

  • picard BuildBamIndex

    Generates a BAM index ".bai" file. This tool creates an index file for the input BAM that allows fast look-up of data in a BAM file, like an index on a database. Note that this tool cannot be run on SAM files, and that the input BAM file must be sorted in coordinate order.

  • picard CalculateReadGroupChecksum

    Creates a hash code based on the read groups (RG). This tool creates a hash code based on identifying information in the read groups (RG) of a SAM or BAM file header. Addition or removal of RGs changes the hash code, enabling the user to quickly determine if changes have been made to the read group information.
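
    The idea can be sketched as follows. This is an illustration only: the fields Picard hashes and the hash function it uses are not stated here, so hashing the sorted @RG header lines with MD5 is an assumption made for this example, and read_group_checksum is a hypothetical name.

```python
import hashlib


def read_group_checksum(header_lines):
    """Hash the @RG lines of a SAM header; sorting makes the result order-independent."""
    rg_lines = sorted(
        line.rstrip("\n") for line in header_lines if line.startswith("@RG")
    )
    return hashlib.md5("\n".join(rg_lines).encode("utf-8")).hexdigest()
```

    Two headers with the same read groups give the same digest, while adding or removing a read group changes it.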

  • picard CheckIlluminaDirectory

    Asserts the validity for specified Illumina basecalling data. This tool will check that the basecall directory and the internal files are available, exist, and are reasonably sized for every tile and cycle. Reasonably sized means non-zero sized for files that exist per tile and equal size for binary files that exist per cycle or per tile. If DATA_TYPES {Position, BaseCalls, QualityScores, PF, or B

  • picard CheckTerminatorBlock

    Asserts the provided gzip file's (e.g., BAM) last block is well-formed; RC 100 otherwise

  • picard CleanSam

    Cleans the provided SAM/BAM, soft-clipping beyond-end-of-reference alignments and setting MAPQ to 0 for unmapped reads

  • picard ClusterCrosscheckMetrics

    Clusters the results from a CrosscheckFingerprints into groups that are connected according to a large enough LOD score.

  • picard CollectAlignmentSummaryMetrics

    Produces a summary of alignment metrics from a SAM or BAM file. This tool takes a SAM/BAM file input and produces metrics detailing the quality of the read alignments as well as the proportion of the reads that passed machine signal-to-noise threshold quality filters. Note that these quality filters are specific to Illumina data; for additional information, please see the corresponding GATK Dictio

  • picard CollectIlluminaBasecallingMetrics

    Collects Illumina Basecalling metrics for a sequencing run. This tool will produce per-barcode and per-lane basecall metrics for each sequencing run. Mean values for each metric are determined using data from all of the tiles. This tool requires the following data, LANE(#), BASECALLS_DIR, READ_STRUCTURE, and an input file listing the sample barcodes. Program will provide metrics including: the tot

  • picard CollectIlluminaLaneMetrics

    Collects Illumina lane metrics for the given BaseCalling analysis directory. This tool produces quality control metrics on cluster density for each lane of an Illumina flowcell. This tool takes Illumina TileMetrics data and places them into directories containing lane- and phasing-level metrics. In this context, phasing refers to the fraction of molecules that fall behind or jump ahead (prephasing

  • picard CollectBaseDistributionByCycle

    Chart the nucleotide distribution per cycle in a SAM or BAM file. This tool produces a chart of the nucleotide distribution per cycle in a SAM or BAM file in order to enable assessment of systematic errors at specific positions in the reads. Interpretation notes: Increased numbers of miscalled bases will be reflected in base distribution changes and increases in the number of Ns. In general, we expe

  • picard CollectGcBiasMetrics

    Collect metrics regarding GC bias. This tool collects information about the relative proportions of guanine (G) and cytosine (C) nucleotides in a sample. Regions of high and low G + C content have been shown to interfere with mapping/aligning, ultimately leading to fragmented genome assemblies and poor coverage in a phenomenon known as 'GC bias'. Detailed information on the effects of GC bias on t

  • picard CollectHiSeqXPfFailMetrics

    Classify PF-Failing reads in a HiSeqX Illumina Basecalling directory into various categories. This tool categorizes the reads that did not pass filter (PF-Failing) into four groups. These groups are based on a heuristic that was derived by looking at a few titration experiments. After examining the called bases from the first 24 cycles of each read, the PF-Failed reads are grouped into the followi

  • picard CollectWgsMetrics

    Collect metrics about coverage and performance of whole genome sequencing (WGS) experiments. This tool collects metrics about the fractions of reads that pass base- and mapping-quality filters as well as coverage (read-depth) levels for WGS analyses. Both minimum base- and mapping-quality values as well as the maximum read depths (coverage cap) are user defined. Note: Metrics labeled as percentage

  • picard CollectHsMetrics

    Collects hybrid-selection (HS) metrics for a SAM or BAM file. This tool takes a SAM/BAM file input and collects metrics that are specific for sequence datasets generated through hybrid-selection. Hybrid-selection (HS) is the most commonly used technique to capture exon-specific sequences for targeted sequencing experiments such as exome sequencing; for more information, please see the correspondin

  • picard CollectMultipleMetrics

    Collect multiple classes of metrics. This 'meta-metrics' tool runs one or more of the metrics collection modules at the same time to cut down on the time spent reading in data from input files. Available modules include CollectAlignmentSummaryMetrics, CollectInsertSizeMetrics, QualityScoreDistribution, MeanQualityByCycle, CollectBaseDistributionByCycle, CollectGcBiasMetrics, RnaSeqMetrics, Collect

  • picard CollectOxoGMetrics

    Collect metrics to assess oxidative artifacts. This tool collects metrics quantifying the error rate resulting from oxidative artifacts. For a brief primer on oxidative artifacts, see the GATK Dictionary. This tool calculates the Phred-scaled probability that an alternate base call results from an oxidation artifact. This probability score is based on base context, sequencing read orientation, and

  • picard CollectQualityYieldMetrics

    Collect metrics about reads that pass quality thresholds and Illumina-specific filters. This tool evaluates the overall quality of reads within a bam file containing one read group. The output indicates the total numbers of bases within a read group that pass a minimum base quality score threshold and (in the case of Illumina data) pass Illumina quality filters as described in the GATK Dictionary

  • picard CollectRawWgsMetrics

    Collect whole genome sequencing-related metrics. This tool computes metrics that are useful for evaluating coverage and performance of whole genome sequencing experiments. These metrics include the percentages of reads that pass minimal base- and mapping- quality filters as well as coverage (read-depth) levels. The histogram output is optional and for a given run, displays two separate outputs on

  • picard CollectRnaSeqMetrics

    Produces RNA alignment metrics for a SAM or BAM file. This tool takes a SAM/BAM file containing the aligned reads from an RNAseq experiment and produces metrics describing the distribution of the bases within the transcripts. It calculates the total numbers and the fractions of nucleotides within specific genomic regions including untranslated regions (UTRs), introns, intergenic sequences (between

  • picard CollectRrbsMetrics

    Collects metrics from reduced representation bisulfite sequencing (Rrbs) data. This tool uses reduced representation bisulfite sequencing (Rrbs) data to determine cytosine methylation status across all reads of a genomic DNA sequence. For a primer on bisulfite sequencing and cytosine methylation, please see the corresponding GATK Dictionary entry. Briefly, bisulfite reduction converts un-methylate

  • picard CollectSequencingArtifactMetrics

    Collect metrics to quantify single-base sequencing artifacts. This tool examines two sources of sequencing errors associated with hybrid selection protocols. These errors are divided into two broad categories, pre-adapter and bait-bias. Pre-adapter errors can arise from laboratory manipulations of a nucleic acid sample e.g. shearing and occur prior to the ligation of adapters for PCR amplification

  • picard CollectTargetedPcrMetrics

    Calculate PCR-related metrics from targeted sequencing data. This tool calculates a set of PCR-related metrics from an aligned SAM or BAM file containing targeted sequencing data. It is appropriate for data produced with multiple small-target technologies including exome sequencing and custom amplicon panels such as the Illumina TruSeq Custom Amplicon (TSCA) kit. If a reference sequence is provided

  • picard CollectWgsMetricsWithNonZeroCoverage

    Collect metrics about coverage and performance of whole genome sequencing (WGS) experiments. This tool collects metrics about the percentages of reads that pass base- and mapping- quality filters as well as coverage (read-depth) levels. Both minimum base- and mapping-quality values as well as the maximum read depths (coverage cap) are user defined. This extends CollectWgsMetrics by including metri

  • picard CompareMetrics

    Compare two metrics files. This tool compares the metrics and histograms generated from metric tools to determine if the generated results are identical. This tool is useful to test and compare outputs when code changes are implemented. It is not meant for use by end-users of this toolkit. The tool's output simply indicates whether two metrics files are equal or not equal.

  • picard CompareSAMs

    Compare two input ".sam" or ".bam" files. This tool initially compares the headers of SAM or BAM files. If the file headers are comparable, the tool will examine and compare readUnmapped flag, reference name, start position and strand between the SAMRecords. The tool summarizes information on the number of read pairs that match or mismatch, and of reads that are missing or unmapped (stratified by

  • picard ConvertSequencingArtifactToOxoG

    Extract OxoG metrics from generalized artifacts metrics. This tool extracts 8-oxoguanine (OxoG) artifact metrics from the output of CollectSequencingArtifactMetrics (a tool that provides detailed information on a variety of artifacts found in sequencing libraries) and converts them to the CollectOxoGMetrics tool's output format. This conveniently eliminates the need to run CollectOxoGMetrics if w

  • picard CreateSequenceDictionary

    Creates a sequence dictionary for a reference sequence. This tool creates a sequence dictionary file (with ".dict" extension) from a reference sequence provided in FASTA format, which is required by many processing and analysis tools. The output file contains a header but no SAMRecords, and the header contains only sequence records. The reference sequence can be gzipped (both .fasta and .fasta.gz

  • picard CrosscheckFingerprints

    Checks if all fingerprints within a set of files appear to come from the same individual. The fingerprints are calculated initially at the readgroup level (if present) but can be "rolled-up" by library, sample or file, to increase power and provide results at the desired resolution. Regular output is in a "Moltenized" format, one row per comparison. In this format the output will include the LOD s

  • picard GatherBamFiles

    Concatenate one or more BAM files as efficiently as possible. This tool performs a rapid "gather" operation on BAM files after scatter operations where the same process has been performed on different regions of a BAM file creating many smaller BAM files that now need to be concatenated (reassembled) back together. Assumes that the list of BAM files provided as INPUT are in the order that they shoul

  • picard DownsampleSam

    Downsample a SAM or BAM file. This tool applies a random downsampling algorithm to a SAM or BAM file to retain only a random subset of the reads. Reads in a mate-pair are either both kept or both discarded. Reads marked as not primary alignments are all discarded. Each read is given a probability P of being retained so that runs performed with the exact same input in the same order and with the sa

  • picard EstimateLibraryComplexity

    Estimates the numbers of unique molecules in a sequencing library. This tool outputs quality metrics for a sequencing library preparation. Library complexity refers to the number of unique DNA fragments present in a given library. Reductions in complexity resulting from PCR amplification during library preparation will ultimately compromise downstream analyses via an elevation in the number of dup

  • picard ExtractIlluminaBarcodes

    Tool determines the barcode for each read in an Illumina lane. This tool determines the numbers of reads containing barcode-matching sequences and provides statistics on the quality of these barcode matches. Illumina sequences can contain at least two types of barcodes, sample and molecular (index). Sample barcodes (B in the read structure) are used to demultiplex pooled samples while index barcod

  • picard ExtractSequences

    Subsets intervals from a reference sequence to a new FASTA file. This tool takes a list of intervals, reads the corresponding subsequences from a reference FASTA file and writes them to a new FASTA file as separate records. Note that the reference FASTA file must be accompanied by an index file and the interval list must be provided in Picard list format. The names provided for the intervals will b

  • picard FastqToSam

    Converts a FASTQ file to an unaligned BAM or SAM file. This tool extracts read sequences and base qualities from the input FASTQ file and writes them out to a new file in unaligned BAM (uBAM) format. Read group information can be provided on the command line. Three versions of FASTQ quality scales are supported: FastqSanger, FastqSolexa and FastqIllumina (see http://maq.sourceforge.net/fastq.shtml

  • picard FilterSamReads

    Subset read data from a SAM or BAM file. This tool takes a SAM or BAM file and subsets it to a new file that either excludes or only includes either aligned or unaligned reads (set using FILTER), or specific reads based on a list of read names supplied in the READ_LIST_FILE.
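
    The read-list mode can be pictured with the short Python sketch below, which works on SAM text lines. It is an illustration rather than Picard's implementation; filter_by_read_names and its include parameter are hypothetical names, and header lines (starting with "@") are always passed through.

```python
def filter_by_read_names(sam_lines, read_names, include=True):
    """Keep (include=True) or drop (include=False) records whose QNAME is listed."""
    names = set(read_names)
    kept = []
    for line in sam_lines:
        if line.startswith("@"):
            kept.append(line)  # header lines are always retained
            continue
        qname = line.split("\t", 1)[0]
        if (qname in names) == include:
            kept.append(line)
    return kept
```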

  • picard FindMendelianViolations

    Finds Mendelian violations of all types within a VCF. Takes in VCF or BCF and a pedigree file and looks for high confidence calls where the genotype of the offspring is incompatible with the genotypes of the parents. Assumes the existence of the format fields AD, DP, GT, GQ, and PL. Take note that the implementation assumes that reads from the PAR will be mapped to the female chromosome rather

  • picard FixMateInformation

    Verify mate-pair information between mates and fix if needed. This tool ensures that all mate-pair information is in sync between each read and its mate pair. If no OUTPUT file is supplied then the output is written to a temporary file and then copied over the INPUT file. Reads marked with the secondary alignment flag are written to the output file unchanged.

  • picard IlluminaBasecallsToFastq

    Generate FASTQ file(s) from Illumina basecall read data. This tool generates FASTQ files from data in an Illumina BaseCalls output directory. Separate FASTQ files are created for each template, barcode, and index (molecular barcode) read. Briefly, the template reads are the target sequence of your experiment, the barcode sequence reads facilitate sample demultiplexing, and the index reads help mit

  • picard IlluminaBasecallsToSam

    Transforms raw Illumina sequencing data into an unmapped SAM or BAM file. The IlluminaBasecallsToSam program collects, demultiplexes, and sorts reads across all of the tiles of a lane via barcode to produce an unmapped SAM/BAM file. An unmapped BAM file is often referred to as a uBAM. All barcode, sample, and library data is provided in the LIBRARY_PARAMS file. Note, this LIBRARY_PARAMS file shoul

  • picard IntervalListTools

    Manipulates interval lists. This tool offers multiple interval list file manipulation capabilities including sorting, merging, subtracting, padding, customizing, and other set-theoretic operations. If given one or more inputs, the default operation is to merge and sort them. Other options e.g. interval subtraction are controlled by the arguments. The tool lists intervals with respect to a reference

  • picard LiftoverVcf

    Lifts over a VCF file from one reference build to another. This tool adjusts the coordinates of variants within a VCF file to match a new reference. The output file will be sorted and indexed using the target reference build. To be clear, REFERENCE_SEQUENCE should be the target reference build. The tool is based on the UCSC liftOver tool (see: http://genome.ucsc.edu/cgi-bin/hgLiftOver) and uses a

  • picard MakeSitesOnlyVcf

    Reads a VCF/VCF.gz/BCF and removes all genotype information from it while retaining all site level information, including annotations based on genotypes (e.g. AN, AF). Output can be any supported variant format including .vcf, .vcf.gz or .bcf.
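
    For plain-text VCF, the effect can be sketched by keeping only the eight fixed site-level columns (CHROM through INFO) and dropping FORMAT and the per-sample columns. This Python sketch is illustrative only; to_sites_only is a hypothetical name, and unlike a real tool it leaves the "##" meta-information lines untouched.

```python
def to_sites_only(vcf_lines):
    """Yield VCF lines (without trailing newline) reduced to the 8 site-level columns."""
    for line in vcf_lines:
        line = line.rstrip("\n")
        if line.startswith("##"):
            yield line  # meta-information lines are passed through unchanged
            continue
        # Both the #CHROM header line and data lines lose FORMAT and sample columns.
        yield "\t".join(line.split("\t")[:8])
```

    INFO annotations such as AN and AF live in the eighth column, which is why they survive the trimming.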

  • picard MarkDuplicates

    Identifies duplicate reads. This tool locates and tags duplicate reads in a BAM or SAM file, where duplicate reads are defined as originating from a single fragment of DNA. Duplicates can arise during sample preparation e.g. library construction using PCR. See also EstimateLibraryComplexity for additional notes on PCR duplication artifacts. Duplicate reads can also result from a single amplificati

  • picard MarkDuplicatesWithMateCigar

    Identifies duplicate reads, accounting for mate CIGAR. This tool locates and tags duplicate reads (both PCR and optical) in a BAM or SAM file, where duplicate reads are defined as originating from the same original fragment of DNA, taking into account the CIGAR string of read mates. It is intended as an improvement upon the original MarkDuplicates algorithm, from which it differs in several ways,

  • picard MarkIlluminaAdapters

    Reads a SAM or BAM file and rewrites it with new adapter-trimming tags. This tool clears any existing adapter-trimming tags (XT:i:) in the optional tag region of a SAM file. The SAM/BAM file must be sorted by query name. Outputs a metrics file histogram showing counts of bases_clipped per read.

  • picard MergeBamAlignment

    Merge alignment data from a SAM or BAM with data in an unmapped BAM file. This tool produces a new SAM or BAM file that includes all aligned and unaligned reads and also carries forward additional read attributes from the unmapped BAM (attributes that are otherwise lost in the process of alignment). The purpose of this tool is to use information from the unmapped BAM to fix up aligner output. The

  • picard MergeSamFiles

    Merges multiple SAM and/or BAM files into a single file. This tool is used for combining SAM and/or BAM files from different runs or read groups, similarly to the "merge" function of Samtools (http://www.htslib.org/doc/samtools.html). Note that to prevent errors in downstream processing, it is critical to identify/label read groups appropriately. If different samples contain identical read group I

  • picard MergeVcfs

    Merges multiple VCF or BCF files into one VCF file. Input files must be sorted by their contigs and, within contigs, by start position. The input files must have the same sample and contig lists. An index file is created and a sequence dictionary is required by default.
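
    The sorting requirement is what makes a streaming merge possible. As a rough Python sketch (not Picard's implementation), records from inputs that are already sorted can be interleaved with heapq.merge, using the contig order from the sequence dictionary as the primary key. The name merge_sorted_records and the (contig, position, ...) tuple layout are assumptions made for this example.

```python
import heapq


def merge_sorted_records(inputs, contig_order):
    """Merge already-sorted iterables of (contig, pos, ...) tuples into one sorted list."""
    rank = {name: i for i, name in enumerate(contig_order)}

    def sort_key(record):
        # Order by contig position in the dictionary, then by start position.
        return (rank[record[0]], record[1])

    return list(heapq.merge(*inputs, key=sort_key))
```

    If an input is not sorted in this order, heapq.merge will not detect it and the output order is simply wrong, which is why the inputs must be sorted beforehand.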

  • picard NormalizeFasta

    Normalizes lines of sequence in a FASTA file to be of the same length. This tool takes any FASTA-formatted file and reformats the sequence to ensure that all of the sequence record lines are of the same length (with the exception of the last line). Although the default setting is 100 bases per line, a custom line_length can be specified by the user. In addition, record names can be truncated at the

  • picard PositionBasedDownsampleSam

    Downsamples a BAM file while ensuring that either both ends of a pair are discarded or neither is. In addition, this program uses the read name to extract the position within the tile from which the read came. The downsampling is based on this position. Runs with the exact same input will produce the same results. Note 1: This is technology and read-name depende

  • picard QualityScoreDistribution

    Chart the distribution of quality scores. This tool is used for determining the overall 'quality' for a library in a given run. To that effect, it outputs a chart and tables indicating the range of quality scores and the total numbers of bases corresponding to those scores. Options include plotting the distribution of all of the reads, only the aligned reads, or reads that have passed the Illumina

  • picard UmiAwareMarkDuplicatesWithMateCigar

    Identifies duplicate reads using information from read positions and UMIs. This tool locates and tags duplicate reads in a BAM or SAM file, where duplicate reads are defined as originating from a single fragment of DNA. It is based on the MarkDuplicatesWithMateCigar tool, with added logic to leverage Unique Molecular Identifier (UMI) information. In addition to assuming that all members of a dupli

  • picard ValidateSamFile

    Validates a SAM or BAM file. This tool reports on the validity of a SAM or BAM file relative to the SAM format specification. This is useful for troubleshooting errors encountered with other tools that may be caused by improper formatting, faulty alignments, incorrect flag values, etc. By default, the tool runs in VERBOSE mode and will exit after finding 100 errors and output them to the console (