
What are unmapped reads?

I am mapping some reads to a reference genome (hg19). I used bamtools to get the percentage of mapped vs. unmapped reads:

    Total reads: 1150004
    Mapped reads: 1052983 (91.5634%)
    Forward strand: 624067 (54.2665%)
    Reverse strand: 525937 (45.7335%)
    Failed QC: 0 (0%)
    Duplicates: 0 (0%)
    Paired-end reads: 1150004 (100%)
    'Proper-pairs': 1046208 (90.9743%)
    Both pairs mapped: 1049400 (91.2519%)
    Read 1: 575002
    Read 2: 575002
    Singletons: 3583 (0.311564%)
    Average insert size (absolute value): 534.479
    Median insert size (absolute value): 215

My question: what exactly do these unmapped reads represent? What information do they give us? How can we analyze them? Thanks

In transcriptome (RNA-Seq) libraries aligned against annotated transcripts, unmapped reads are reads that fail to map to known exons. More often than not, they come from genomic DNA carried over into the library. RSeQC has tools to help identify reads from genomic (sometimes also called intergenic) sequence.

For the most part, the percentage of unmapped reads just provides you with QC information, like how much gDNA you purified along with your mRNA. It's unlikely to be very useful beyond that, but it really comes down to what you're looking for in those sequences.
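
For the numbers in the question, the unmapped fraction and how it splits up can be worked out directly from the bamtools summary. Here is a minimal sketch in plain Python. It assumes that "Singletons" means mapped reads whose mate is unmapped, which is at least consistent with the posted output (mapped reads = both pairs mapped + singletons). The variable names are mine, not bamtools'.

```python
# Figures copied from the bamtools stats output in the question.
total_reads = 1150004
mapped_reads = 1052983
both_mates_mapped = 1049400
singletons = 3583  # assumption: mapped reads whose mate is unmapped

# Sanity check on that assumption: the two mapped categories add up.
assert mapped_reads == both_mates_mapped + singletons

unmapped_reads = total_reads - mapped_reads
unmapped_pct = 100.0 * unmapped_reads / total_reads

# Each singleton has exactly one unmapped mate; the remaining unmapped
# reads come from pairs in which neither mate mapped.
unmapped_with_mapped_mate = singletons
unmapped_both_mates = unmapped_reads - unmapped_with_mapped_mate

print(f"unmapped reads: {unmapped_reads} ({unmapped_pct:.4f}%)")
print(f"  mate mapped:        {unmapped_with_mapped_mate}")
print(f"  mate also unmapped: {unmapped_both_mates}")
```

So 97021 reads (about 8.44%) are unmapped, and under the assumption above nearly all of them (93438) sit in pairs where neither mate aligned. Those are the reads most likely to come from something absent from the reference rather than from a single low-quality read end.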

If you're doing DNA-Seq, then unmapped reads generally represent reads that failed to map unambiguously to the reference sequence. Where that cutoff falls depends on the alignment stringency threshold (usually set by the user).
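
As for how to analyze them: the aligner records mapping status in the FLAG field of each SAM/BAM record, so a first step is usually to separate unmapped reads by flag. Below is a rough sketch in plain Python that classifies SAM text lines using the standard FLAG bits (0x1 paired, 0x4 read unmapped, 0x8 mate unmapped). The function names, category labels and example records are made up for illustration; they are not real data from the question.

```python
# FLAG bits as defined in the SAM specification.
PAIRED = 0x1
UNMAPPED = 0x4
MATE_UNMAPPED = 0x8


def classify(flag):
    """Return a coarse mapping category for one SAM FLAG value."""
    if not (flag & UNMAPPED):
        return "mapped"
    if (flag & PAIRED) and not (flag & MATE_UNMAPPED):
        # The read itself is unmapped, but its mate aligned.
        return "unmapped_mate_mapped"
    return "unmapped"


def count_sam(lines):
    """Count records per category from an iterable of SAM text lines."""
    counts = {"mapped": 0, "unmapped_mate_mapped": 0, "unmapped": 0}
    for line in lines:
        if line.startswith("@"):  # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:  # a SAM record has 11 mandatory fields
            continue
        counts[classify(int(fields[1]))] += 1
    return counts


# Hypothetical records (invented read names, positions and sequences).
example = [
    "@HD\tVN:1.6\tSO:unsorted",
    "r1\t99\tchr1\t100\t60\t4M\t=\t300\t204\tACGT\tIIII",
    "r1\t147\tchr1\t300\t60\t4M\t=\t100\t-204\tACGT\tIIII",
    "r2\t73\tchr1\t500\t60\t4M\t=\t500\t0\tACGT\tIIII",
    "r2\t133\tchr1\t500\t0\t*\t=\t500\t0\tACGT\tIIII",
    "r3\t77\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
    "r3\t141\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
]

print(count_sam(example))
```

In practice you would feed this the text output of your alignment file. One common way to pull out only the unmapped records is `samtools view -f 4`, after which the sequences can be inspected further (for example, checked against contaminant or non-human databases), depending on what you are looking for.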
