1. Question: prep_reads.info vs. align_summary.txt
###Reference: https://www.biostars.org/p/163356/
I used TopHat to map my reads against the corresponding reference genome.
When I look inside prep_reads.info, I see:
- left_min_read_len=90
- left_max_read_len=90
- left_reads_in =24995053
- left_reads_out=24994132
- right_min_read_len=90
- right_max_read_len=90
- right_reads_in =24995053
- right_reads_out=24994422
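To make the in/out comparison explicit, here is a minimal Python sketch that parses the key=value lines shown above and subtracts `reads_out` from `reads_in` for each mate. It assumes prep_reads.info is plain `key=value` text with integer values (the stray space in `left_reads_in =` is tolerated). `parse_prep_reads_info` is a hypothetical helper, not part of TopHat.

```python
from __future__ import annotations


def parse_prep_reads_info(text: str) -> dict[str, int]:
    """Parse key=value lines like those shown from prep_reads.info.

    Assumes one key=value pair per line with an integer value; surrounding
    whitespace and list markers such as "- " are tolerated.
    """
    stats: dict[str, int] = {}
    for line in text.splitlines():
        line = line.strip().lstrip("-").strip()
        if "=" not in line:
            continue  # skip blank or unrelated lines
        key, value = line.split("=", 1)
        stats[key.strip()] = int(value.strip())
    return stats


info = """\
left_min_read_len=90
left_max_read_len=90
left_reads_in =24995053
left_reads_out=24994132
right_min_read_len=90
right_max_read_len=90
right_reads_in =24995053
right_reads_out=24994422
"""

stats = parse_prep_reads_info(info)
for side in ("left", "right"):
    removed = stats[f"{side}_reads_in"] - stats[f"{side}_reads_out"]
    print(f"{side}: {removed} reads removed by prep_reads")
# left: 921 reads removed by prep_reads
# right: 631 reads removed by prep_reads
```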
Then when I open align_summary.txt, I see:
Left reads:
Input: 24995053
Mapped: 22715900 (90.9% of input)
of these: 2106892 ( 9.3%) have multiple alignments (89 have >20)
Right reads:
Input: 24995053
Mapped: 22310498 (89.3% of input)
of these: 2088630 ( 9.4%) have multiple alignments (148 have >20)
90.1% overall read alignment rate.
Aligned pairs: 21074559
of these: 1469415 ( 7.0%) have multiple alignments
and: 107380 ( 0.5%) are discordant alignments
83.9% concordant pair alignment rate.
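Each percentage in align_summary.txt can be reproduced from the raw counts, which the Python sketch below does. The denominators are inferred from the numbers themselves, not taken from TopHat's documentation: mapped reads over Input, multiple alignments over Mapped (or over Aligned pairs), the overall rate over both mates' Input combined, and the concordant rate as (Aligned pairs - discordant) over Input. `pct` is a hypothetical helper.

```python
# Raw counts copied from align_summary.txt above.
INPUT = 24995053  # reads per mate; Input is the same for left and right
left_mapped, left_multi = 22715900, 2106892
right_mapped, right_multi = 22310498, 2088630
pairs, pairs_multi, discordant = 21074559, 1469415, 107380


def pct(part: int, whole: int) -> str:
    """Format part / whole as a percentage with one decimal place."""
    return f"{100 * part / whole:.1f}%"


print("left mapped:", pct(left_mapped, INPUT))                   # 90.9%
print("left multi:", pct(left_multi, left_mapped))               # 9.3%
print("right mapped:", pct(right_mapped, INPUT))                 # 89.3%
print("right multi:", pct(right_multi, right_mapped))            # 9.4%
print("overall:", pct(left_mapped + right_mapped, 2 * INPUT))    # 90.1%
print("pairs multi:", pct(pairs_multi, pairs))                   # 7.0%
print("discordant:", pct(discordant, pairs))                     # 0.5%
print("concordant:", pct(pairs - discordant, INPUT))             # 83.9%
```

Every printed value matches the corresponding line of align_summary.txt, which supports (but does not prove) the inferred formulas.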
In align_summary.txt, I understand that the difference between the "Input" and "Mapped" numbers comes from reads that do not map to the reference genome. That part is clear.
But for prep_reads.info, I do not know why the "_reads_out" numbers differ from the "_reads_in" numbers.
And if this difference is due to unmapped reads, why is it not equal to the difference between the Input and Mapped numbers in align_summary.txt?
Differences:

| | prep_reads.info (reads_in - reads_out) | align_summary.txt (Input - Mapped) |
|---|---|---|
| left | 24995053 - 24994132 = 921 | 24995053 - 22715900 = 2279153 |
| right | 24995053 - 24994422 = 631 | 24995053 - 22310498 = 2684555 |
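One way to see why the two columns differ: in this output the align_summary.txt "Input" equals prep_reads.info's `reads_in`, i.e. it counts reads before filtering. The reads dropped by prep_reads are therefore a small part of the Input - Mapped gap, not a separate measure of it. The Python sketch below splits that gap into reads filtered before mapping and reads that passed the filter but did not align, using only the counts shown above.

```python
# Counts from prep_reads.info and align_summary.txt above.
counts = {
    "left": {"reads_in": 24995053, "reads_out": 24994132, "mapped": 22715900},
    "right": {"reads_in": 24995053, "reads_out": 24994422, "mapped": 22310498},
}

for side, c in counts.items():
    not_mapped = c["reads_in"] - c["mapped"]        # Input - Mapped in align_summary.txt
    filtered = c["reads_in"] - c["reads_out"]       # removed by prep_reads before mapping
    failed_to_align = c["reads_out"] - c["mapped"]  # passed the filter but did not map
    assert filtered + failed_to_align == not_mapped
    print(f"{side}: {not_mapped} = {filtered} filtered + {failed_to_align} unaligned")
# left: 2279153 = 921 filtered + 2278232 unaligned
# right: 2684555 = 631 filtered + 2683924 unaligned
```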
The difference is due to filtering for things such as read length. Some reads are too short, so they're excluded. This occurs before any mapping takes place.
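As an illustration of length filtering that happens before alignment, here is a minimal Python sketch of a MINLEN-style filter over a FASTQ stream. It is not TopHat's prep_reads code: the `read_fastq` parser (which only handles unwrapped four-line records), the `drop_short_reads` helper, and the `min_len=5` threshold are all hypothetical.

```python
import io
from typing import Iterable, Iterator, NamedTuple, TextIO


class Read(NamedTuple):
    name: str
    seq: str
    qual: str


def read_fastq(handle: TextIO) -> Iterator[Read]:
    """Yield records from a plain FASTQ stream with four lines per record."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of input
        seq = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line
        qual = handle.readline().rstrip("\n")
        yield Read(header[1:].rstrip("\n"), seq, qual)


def drop_short_reads(reads: Iterable[Read], min_len: int) -> Iterator[Read]:
    """Keep only reads at least min_len bases long (a MINLEN-style filter)."""
    return (r for r in reads if len(r.seq) >= min_len)


fastq = io.StringIO(
    "@r1\nACGTACGTAC\n+\nIIIIIIIIII\n"
    "@r2\nACG\n+\nIII\n"
)
kept = list(drop_short_reads(read_fastq(fastq), min_len=5))
print([r.name for r in kept])  # ['r1']
```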
I see! I didn't know that. I thought short reads could only be removed with Trimmomatic (MINLEN); I didn't know mapping tools also remove some reads.
Well, note that I said "things such as read length": it filters for other things too. In your case, one of these "other things" must be what is causing reads to be dropped, since your input reads are all 90 bases long.
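The point about uniform 90-base input can be checked with a toy filter. The thread does not say which other criteria prep_reads applies, so the N-fraction rule below is purely a hypothetical stand-in for "other things", as are the cutoffs `MIN_LEN` and `MAX_N_FRACTION`. With every read 90 bases long, the length check removes nothing, so any read that is dropped must fail some other test.

```python
from typing import Iterable, Tuple

# Hypothetical pre-mapping filters. The N-fraction rule is only an example of
# a non-length criterion; it is NOT a documented TopHat prep_reads setting.
MIN_LEN = 20          # assumed length cutoff
MAX_N_FRACTION = 0.1  # assumed cutoff for ambiguous (N) bases


def classify(seqs: Iterable[str]) -> Tuple[int, int, int]:
    """Return (kept, dropped_for_length, dropped_for_other) counts."""
    kept = too_short = other = 0
    for seq in seqs:
        if len(seq) < MIN_LEN:
            too_short += 1  # checked first, so len(seq) > 0 below
        elif seq.upper().count("N") / len(seq) > MAX_N_FRACTION:
            other += 1
        else:
            kept += 1
    return kept, too_short, other


# Every read is 90 bases, as in prep_reads.info (min = max = 90).
reads = ["ACGT" * 22 + "AC", "N" * 30 + "ACG" * 20, "T" * 90]
print(classify(reads))  # (2, 0, 1)
```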