picard

picard CollectHiSeqXPfFailMetrics

picard CollectHiSeqXPfFailMetrics

Version:
2.20.x
Identifier: TL_e24fa5_e4.c8
Tool

Description


Classify PF-Failing reads in a HiSeqX Illumina Basecalling directory into various categories. This tool categorizes the reads that did not pass filter (PF-Failing) into four groups. These groups are based on a heuristic that was derived by looking at a few titration experiments. After examining the called bases from the first 24 cycles of each read, the PF-Failed reads are grouped into the following four categories: * MISALIGNED - The first 24 basecalls of a read are uncalled (numNs~24). These types of reads appear to be flow cell artifacts because reads were only found near tile boundaries and were concentration (library) independent * EMPTY - All 24 bases are called (numNs~0) but the number of bases with quality scores greater than two is less than or equal to eight (numQGtTwo<=8). These reads were location independent within the tiles and were inversely proportional to the library concentration * POLYCLONAL - All 24 bases were called and numQGtTwo>=12, were independent of their location with the tiles, and were directly proportional to the library concentration. These reads are likely the result of PCR artifacts * UNKNOWN - The remaining reads that are PF-Failing but did not fit into any of the groups listed above The tool defaults to the SUMMARY output which indicates the number of PF-Failed reads per tile and groups them into the categories described above accordingly. A DETAILED metrics option is also available that subdivides the SUMMARY outputs by the x- y- position of these reads within each tile. To obtain the DETAILED metric table, you must add the PROB_EXPLICIT_READS option to your command line and set the value between 0 and 1. This value represents the fractional probability of PF-Failed reads to send to output. For example, if PROB_EXPLICIT_READS=0, then no metrics will be output. If PROB_EXPLICIT_READS=1, then it will provide detailed metrics for all (100%) of the reads. It follows that setting the PROB_EXPLICIT_READS=0.5, will provide detailed metrics for half of the PF-Failed reads. Note: Metrics labeled as percentages are actually expressed as fractions!

Tags

No Tags found

Biostars

No Biostars posts found