https://www.biostars.org/p/198405/

The quick question: I have some mapped reads in a BAM file that have good read quality, but they carry the SAM flag 0x200, which means they did not pass the vendor quality check. Should I include them in downstream analysis or not?

The long question: what is the relationship between the read quality score and the Chastity score?

First, the read quality score, which most people already know:

The read quality score (Phred score) is calculated as Q = -10 * log10(P(error_base)), where P(error_base) is the probability that the base call is incorrect.
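As a small illustration of the formula above, here is a minimal Python sketch. The function names are made up for this example, and the ASCII offset of 33 assumes the common Sanger/Illumina 1.8+ FASTQ encoding, which may not apply to older files.

```python
import math


def phred_from_error_prob(p_error: float) -> float:
    """Phred score Q = -10 * log10(P(error_base))."""
    if not 0.0 < p_error <= 1.0:
        raise ValueError("error probability must be in (0, 1]")
    return -10.0 * math.log10(p_error)


def error_prob_from_phred(q: float) -> float:
    """Inverse of the formula: P(error_base) = 10 ** (-Q / 10)."""
    return 10.0 ** (-q / 10.0)


def phred_from_fastq_char(ch: str, offset: int = 33) -> int:
    """Decode one FASTQ quality character (offset 33 is an assumption)."""
    return ord(ch) - offset
```

For example, an error probability of 1 in 1000 corresponds to Q30, and Q20 corresponds to an error probability of 1 in 100.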

Second, I want to talk about the Chastity score used in the vendor check:

For reads in FASTQ format, the header has a 'Y/N' field that indicates whether the read was filtered out (Y) or passed the filtering step (N). The corresponding SAM flag is 0x200, indicating "not passing filters, such as platform/vendor quality controls". How does Illumina set the filtering criteria?
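To make the two markers concrete, here is a hedged Python sketch that tests the 0x200 bit of a SAM flag and reads the 'Y/N' field from a FASTQ header. It assumes the Casava 1.8+ header layout, where the part after the space is `read:is_filtered:control:index`; the function names and the header used in the usage note are hypothetical.

```python
QC_FAIL = 0x200  # SAM flag bit: not passing filters (decimal 512)


def is_vendor_failed(flag: int) -> bool:
    """True if the 0x200 bit is set in a SAM FLAG value."""
    return bool(flag & QC_FAIL)


def fastq_header_is_filtered(header: str) -> bool:
    """True if the header's filter field is 'Y' (read did not pass).

    Assumes a Casava 1.8+ style header such as
    '@<instrument>:<run>:<flowcell>:<lane>:<tile>:<x>:<y> <read>:<Y|N>:<control>:<index>'.
    """
    parts = header.strip().split(" ")
    if len(parts) < 2:
        raise ValueError("header has no description part after a space")
    fields = parts[1].split(":")
    if len(fields) < 2 or fields[1] not in ("Y", "N"):
        raise ValueError("header has no Y/N filter field")
    return fields[1] == "Y"
```

With a made-up header like `@SIM:1:FCX:1:15:6329:1045 1:Y:2:ATCACG`, `fastq_header_is_filtered` returns `True`, and a read with FLAG 516 (0x200 + 0x4) is both unmapped and vendor failed.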

As far as I know, read filtering by Illumina Real Time Analysis (RTA) happens during the run, and filtering is determined by Chastity score. Chastity Score is calculated by “the ratio of the highest of the four (base type) intensities to the sum of highest two”. Illumina described the vendor check as follows:

"To remove the least reliable data from the analysis, the raw data can be filtered to remove any clusters that have “too much” intensity corresponding to bases other than the called base. By default, the purity of the signal from each cluster is examined over the first 25 cycles and calculated as Chastity = Highest_Intensity / (Highest_Intensity + Next_Highest_Intensity) for each cycle. The new default filtering implemented at the base calling stage allows at most one cycle that is less than the Chastity threshold. The higher the value, the better. This value is very dependent on cluster density, since the major cause of an impure signal in the early cycles is the presence of another cluster within a few micrometers."

So, to my understanding, in every cycle the sequencer scans a cluster and records four signals, one for each of the four bases (am I right?), and the base with the strongest signal is the one that gets called. The larger the gap between the strongest signal and the others, the more reliable the base call. Over the first 25 cycles, Illumina allows at most one cycle whose Chastity falls below the threshold; otherwise, the read is marked as vendor failed. Is my understanding right so far?

But what is the relationship between the Phred score and the Chastity score, if there is one? Can I still use vendor-failed reads if they have high Phred scores?
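The filtering rule described above can be sketched in Python as follows. This is only an illustration of the quoted description, not Illumina's implementation: the function names are invented, the intensities are assumed to be non-negative, and the threshold of 0.6 is an assumed default that the quoted text does not state.

```python
from typing import Sequence


def chastity(intensities: Sequence[float]) -> float:
    """Highest intensity divided by the sum of the two highest."""
    top, second = sorted(intensities, reverse=True)[:2]
    denom = top + second
    if denom <= 0:
        return 0.0  # no usable signal in this cycle
    return top / denom


def passes_filter(
    cycles: Sequence[Sequence[float]],
    threshold: float = 0.6,   # assumed value, not given in the quoted text
    n_cycles: int = 25,       # "the first 25 cycles"
    max_failures: int = 1,    # "at most one cycle" below the threshold
) -> bool:
    """Apply the 'at most one low-Chastity cycle in the first 25' rule."""
    failures = sum(1 for c in cycles[:n_cycles] if chastity(c) < threshold)
    return failures <= max_failures
```

Note that the rule only looks at the early cycles, so a cluster can fail the filter even if the base calls later in the read receive high Phred scores.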

Thanks! Tao


modified 10 months ago by ablanchetcohen • 1.1k • written 10 months ago by Tao • 110

 

Curious. Why are the vendor failed reads in your dataset?

modified 10 months ago • written 10 months ago by genomax2 ♦ 26k
 

I downloaded the BAM file from GTEx (dbGaP). It contains all the reads: mapped, unmapped, and vendor-failed. For a sample with ~100M reads, ~12M are labeled as vendor failed, including both mapped and unmapped reads. Some of the vendor-failed reads have good read quality, so I'm not sure whether I should include them.

modified 10 months ago • written 10 months ago by Tao • 110
 

I second this comment. You should contact your vendor. I have never seen reads failing the filtering step indicated in the header field of a FASTQ file being given to a client. Why include these reads? They just take up storage space, and are likely to induce errors in the downstream analysis. There was either an error in the setting of the flag, or a mistake in giving you the reads.

written 10 months ago by ablanchetcohen • 1.1k
 

I checked the GTEx Project FAQ. The alignment was probably done in 2012, since TopHat v1.4.1 was used. This was the very dawn of RNA-Seq. The analyses dating back to this period are often suspicious since bioinformaticians were not yet familiar with RNA-Seq, and the software programs contained bugs more often than not. My recommendation is always to treat with suspicion any analysis results dating back to this period. Most likely, those preparing the data were not aware yet that these reads should be filtered out.

I would filter out all the "vendor failed reads", and redo the alignment using a more recent aligner, genome, and annotation. At least, that would be my recommendation based on my knowledge. To get a definitive answer, you could contact the staff at the GTEx project.

written 10 months ago by ablanchetcohen • 1.1k
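As a rough sketch of the filtering step suggested above, the following Python generator drops every SAM record with the 0x200 bit set and keeps header lines. It works on SAM text lines rather than BAM, and the function name is made up for this example.

```python
from typing import Iterable, Iterator

QC_FAIL = 0x200  # SAM flag bit: not passing filters


def drop_vendor_failed(sam_lines: Iterable[str]) -> Iterator[str]:
    """Yield SAM header lines and records whose FLAG lacks the 0x200 bit."""
    for line in sam_lines:
        if line.startswith("@"):
            yield line  # header lines are passed through unchanged
            continue
        fields = line.split("\t")
        flag = int(fields[1])  # FLAG is the second tab-separated column
        if not flag & QC_FAIL:
            yield line
```

On a BAM file, the same selection should be available directly through samtools, for example `samtools view -b -F 0x200 in.bam > out.bam`, where `-F` excludes reads that have any of the given flag bits set and the file names are placeholders.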
 

Thanks, your comments are very helpful!

written 10 months ago by Tao • 110
 

Thanks for your comments. The sample was downloaded from the public GTEx project. I'm also confused about why they deposited so many vendor-failed reads (10M for a 100M sample) on dbGaP. I didn't notice this problem at first in my study, and it is causing a big problem now. In your opinion, should such reads be removed regardless of their read quality?

written 10 months ago by Tao • 110
 
Short answer: yes.

They were "failed" by Illumina pre-processing software for a reason (e.g. mixed sequence from one cluster, phasing issues etc).

