picard ExtractIlluminaBarcodes

ExtractIlluminaBarcodes determines the barcode for each read in an Illumina lane. It counts the reads that contain barcode-matching sequences and reports statistics on the quality of those matches.

Illumina sequences can contain at least two types of barcodes, sample and molecular (index). Sample barcodes (B in the read structure) are used to demultiplex pooled samples, while index barcodes (M in the read structure) are used to differentiate multiple reads of a template when carrying out paired-end sequencing. Note that this tool extracts only sample barcodes (B), not molecular barcodes (M).

Barcodes can be provided as a file listing them (BARCODE_FILE) or as a string giving the barcode itself (BARCODE). The BARCODE_FILE contains multiple fields, including 'barcode_sequence_1', 'barcode_sequence_2' (optional), 'barcode_name', and 'library_name'. In contrast, the BARCODE argument is used for runs with reads containing a single barcode (nonmultiplexed) and can be added directly as a string of text, e.g. BARCODE=CAATAGCG.

Data is output per lane/tile within the BaseCalls directory, with the file name format 's_{lane}_{tile}_barcode.txt'. These files contain the following tab-separated columns:

  1. Read subsequence at the barcode position
  2. Y or N, indicating whether there was a barcode match
  3. Matched barcode sequence (empty if the read did not match one of the barcodes)
  4. The number of mismatches, if there was a barcode match
  5. The number of mismatches to the second-best barcode, if there was a barcode match

If there is no match but the read is close to the threshold for calling a match, the barcode that would have been matched is output in lower case. Threshold values can be adjusted to accommodate barcode sequence mismatches from the reads.

The metrics file produced by ExtractIlluminaBarcodes indicates the number of matches (and mismatches) between the barcode reads and the actual barcodes. These metrics are provided both per barcode and per lane and can be found in the BaseCalls directory.

For poorly matching barcodes, the order in which barcodes are specified can cause arbitrary differences in the output.
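
To illustrate the per-tile output format described above, here is a minimal Python sketch that reads one line of an 's_{lane}_{tile}_barcode.txt' file into a record. It is not part of Picard: the names BarcodeRecord, parse_barcode_line, and read_barcode_file are hypothetical, and the sketch assumes that the mismatch columns may be empty when there is no match and that a near-miss is recognizable as an N line with a lower-case barcode.

```python
from typing import Iterator, NamedTuple, Optional


class BarcodeRecord(NamedTuple):
    """One line of a per-tile barcode file (hypothetical helper type)."""
    read_subsequence: str                      # column 1
    matched: bool                              # column 2: Y or N
    barcode: str                               # column 3: may be empty
    mismatches: Optional[int]                  # column 4
    mismatches_to_second_best: Optional[int]   # column 5
    near_miss: bool                            # N line with a lower-case barcode


def _to_int(field: str) -> Optional[int]:
    """Convert a numeric column, treating an empty field as missing (assumption)."""
    field = field.strip()
    return int(field) if field else None


def parse_barcode_line(line: str) -> BarcodeRecord:
    """Split one tab-separated line into its five documented columns."""
    fields = line.rstrip("\n").split("\t")
    # Pad short lines so that missing trailing columns become empty strings.
    fields += [""] * (5 - len(fields))
    read_seq, flag, barcode, mismatches, second_best = fields[:5]
    matched = flag == "Y"
    # Per the description, a near-miss is reported as the barcode in lower case.
    near_miss = (not matched) and barcode.islower()
    return BarcodeRecord(read_seq, matched, barcode,
                         _to_int(mismatches), _to_int(second_best), near_miss)


def read_barcode_file(path: str) -> Iterator[BarcodeRecord]:
    """Yield one record per non-empty line of a per-tile barcode file."""
    with open(path, encoding="ascii") as handle:
        for line in handle:
            if line.strip():
                yield parse_barcode_line(line)
```

A caller could, for example, count matches in a tile with sum(rec.matched for rec in read_barcode_file(path)), where path points to a file in the BaseCalls directory.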


