I recently had to inspect some genomic alignments as part of a project. Usually I work only with BAM files, and if inspection is needed, I visualize the pileups to see what is going on.
In this case, I just wanted a quick answer to how the reads were aligning to the reference, and I didn’t want to go through the process of subsetting and copying the BAM files to my local machine.
The SAM file is the plain-text, human-readable record of the read alignments produced by an aligner (STAR, TopHat, BWA, etc.). This file can get very large, so it is usually converted to BAM, a compressed binary format that is faster for machines to parse but not human readable, and the SAM file is then discarded.
In my case, I still had the SAM files around to inspect. If you find yourself needing to read a SAM file, here are three helpful reference tools to make the process less painful:
1) This page has an enormous amount of detail about SAM files including this helpful chart that enumerates all of the fields that you can expect to find specified within each alignment:
2) This post from the blog “zenfractal.com” contains a great exposition on CIGAR strings and how to decode them:
3) And finally, if you’re trying to make sense of the SAM bitwise flags, you can decode them using this tool from the Broad Institute:
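If you would rather poke at a few records directly in a terminal, here is a rough Python sketch that touches on all three topics above: splitting an alignment line into its 11 mandatory fields, breaking a CIGAR string into its operations, and listing which bits are set in a flag. The field names and flag bit values follow the SAM specification as I understand it; the function names (`parse_sam_line`, `decode_flag`, `decode_cigar`) and the example read are made up for illustration, and this is not a substitute for a real parser such as samtools or pysam.

```python
import re

# The 11 mandatory SAM columns, in order (names as used in the SAM specification).
SAM_FIELDS = ["QNAME", "FLAG", "RNAME", "POS", "MAPQ", "CIGAR",
              "RNEXT", "PNEXT", "TLEN", "SEQ", "QUAL"]

# Meaning of each bit in the FLAG column (short descriptions are my own wording).
FLAG_BITS = {
    0x1: "read paired",
    0x2: "read mapped in proper pair",
    0x4: "read unmapped",
    0x8: "mate unmapped",
    0x10: "read on reverse strand",
    0x20: "mate on reverse strand",
    0x40: "first in pair",
    0x80: "second in pair",
    0x100: "not primary alignment",
    0x200: "read fails quality checks",
    0x400: "PCR or optical duplicate",
    0x800: "supplementary alignment",
}


def parse_sam_line(line):
    """Split one alignment line (not an '@' header line) into a dict of fields."""
    cols = line.rstrip("\n").split("\t")
    record = dict(zip(SAM_FIELDS, cols[:11]))
    # These mandatory columns are integers; the rest stay as strings.
    for key in ("FLAG", "POS", "MAPQ", "PNEXT", "TLEN"):
        record[key] = int(record[key])
    # Anything after column 11 is an optional TAG:TYPE:VALUE entry.
    record["TAGS"] = cols[11:]
    return record


def decode_flag(flag):
    """Return the description of every bit that is set in the flag."""
    return [name for bit, name in FLAG_BITS.items() if flag & bit]


def decode_cigar(cigar):
    """Turn a CIGAR string such as '5S45M' into [(5, 'S'), (45, 'M')]."""
    if cigar == "*":  # '*' means no CIGAR is available
        return []
    return [(int(n), op) for n, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar)]


# A made-up alignment line, purely for demonstration.
example = "\t".join([
    "read1", "99", "chr1", "100", "60", "5S45M",
    "=", "300", "250", "A" * 50, "I" * 50, "NM:i:0",
])

record = parse_sam_line(example)
print(record["QNAME"], record["RNAME"], record["POS"])
print(decode_flag(record["FLAG"]))
print(decode_cigar(record["CIGAR"]))
```

For the made-up read, flag 99 is 1 + 2 + 32 + 64, so it decodes to a paired read, mapped in a proper pair, with its mate on the reverse strand, and first in the pair. The CIGAR `5S45M` reads as 5 soft-clipped bases followed by 45 aligned bases.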