[Data Usage] Using the Ready-Made SNPs from the 3K Rice Genome Database


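We often daydream about publishing a high-impact paper using existing data. Such fairy tales really do come true every day, but many of us don't know how to take the first step. So let's start with how to use a rice SNP database.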

http://snp-seek.irri.org/

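This is the 3K rice variation database, which hosts ready-made SNP calls. Because the data set is so large, the site maintainers store and distribute the SNP information in Plink format. That is understandable: genome-wide SNPs for 3,025 rice accessions are a lot of data by any measure.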

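The Plink binary format consists of three files: .bed holds the genotype calls in binary form, .bim lists the variants (one per line), and .fam lists the samples (one per line). The site provides them gzip-compressed: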

base_filtered_v0.7.bed.gz

base_filtered_v0.7.bim.gz

base_filtered_v0.7.fam.gz
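Before running Plink, it can help to decompress the downloads and check what you actually have. The following Python sketch assumes the three .gz files sit in one directory under a common prefix (e.g. base_filtered_v0.7); the helper names are hypothetical. It relies only on properties of the Plink 1 format: .fam has one line per sample, .bim has one line per variant, and a SNP-major .bed is 3 magic bytes followed by ceil(samples / 4) bytes per variant.

```python
import gzip
import shutil
from pathlib import Path

EXTENSIONS = (".bed", ".bim", ".fam")


def decompress_fileset(prefix: str) -> None:
    """Write <prefix>.bed/.bim/.fam next to the downloaded <prefix>.*.gz files."""
    for ext in EXTENSIONS:
        src = Path(prefix + ext + ".gz")
        dst = Path(prefix + ext)
        # Stream-decompress so the large .bed never has to fit in memory.
        with gzip.open(src, "rb") as fin, open(dst, "wb") as fout:
            shutil.copyfileobj(fin, fout)


def count_lines(path: str) -> int:
    """Count non-empty lines in a plain or gzipped text file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as fh:
        return sum(1 for line in fh if line.strip())


def summarize_fileset(prefix: str) -> dict:
    """Report sample/variant counts and the expected size of <prefix>.bed.

    .fam has one line per sample and .bim one line per variant; a SNP-major
    .bed is 3 magic bytes plus ceil(n_samples / 4) bytes per variant.
    """
    n_samples = count_lines(prefix + ".fam")
    n_variants = count_lines(prefix + ".bim")
    bytes_per_variant = (n_samples + 3) // 4  # integer ceil(n_samples / 4)
    return {
        "samples": n_samples,
        "variants": n_variants,
        "bed_bytes": 3 + n_variants * bytes_per_variant,
    }
```

For the real download you would call decompress_fileset("base_filtered_v0.7") once and then summarize_fileset("base_filtered_v0.7"); if the predicted bed_bytes matches the size of the decompressed .bed, the three files belong together.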

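Plink's --recode option converts these three files into VCF format. Plink reads the uncompressed .bed/.bim/.fam files, so gunzip the downloads first. The relevant part of the Plink help text is reproduced below: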

--recode [output format] <01 | 12> <tab | tabx | spacex | bgz | gen-gz>
         <include-alt> <omit-nonmale-y>
  Create a new text fileset with all filters applied.  The following output
  formats are supported:
  * '23': 23andMe 4-column format.  This can only be used on a single
    sample's data (--keep may be handy), and does not support multicharacter
    allele codes.
  * 'A': Sample-major additive (0/1/2) coding, suitable for loading from R.
    If you need uncounted alleles to be named in the header line, add the
    'include-alt' modifier.
  * 'AD': Sample-major additive (0/1/2) + dominant (het=1/hom=0) coding.
    Also supports 'include-alt'.
  * 'A-transpose': Variant-major 0/1/2.
  * 'beagle': Unphased per-autosome .dat and .map files, readable by early
    BEAGLE versions.
  * 'beagle-nomap': Single .beagle.dat file.
  * 'bimbam': Regular BIMBAM format.
  * 'bimbam-1chr': BIMBAM format, with a two-column .pos.txt file.  Does not
    support multiple chromosomes.
  * 'fastphase': Per-chromosome fastPHASE files, with
    .chr-[chr #].recode.phase.inp filename extensions.
  * 'fastphase-1chr': Single .recode.phase.inp file.  Does not support
    multiple chromosomes.
  * 'HV': Per-chromosome Haploview files, with .chr-[chr #][.ped + .info]
    filename extensions.
  * 'HV-1chr': Single Haploview .ped + .info file pair.  Does not support
    multiple chromosomes.
  * 'lgen': PLINK 1 long-format (.lgen + .fam + .map), loadable with --lfile.
  * 'lgen-ref': .lgen + .fam + .map + .ref, loadable with --lfile +
     --reference.
  * 'list': Single genotype-based list, up to 4 lines per variant.  To omit
    nonmale genotypes on the Y chromosome, add the 'omit-nonmale-y' modifier.
  * 'rlist': .rlist + .fam + .map fileset, where the .rlist file is a
    genotype-based list which omits the most common genotype for each
    variant.  Also supports 'omit-nonmale-y'.
  * 'oxford': Oxford-format .gen + .sample.  With the 'gen-gz' modifier, the
    .gen file is gzipped.
  * 'ped': PLINK 1 sample-major (.ped + .map), loadable with --file.
  * 'compound-genotypes': Same as 'ped', except that the space between each
    pair of same-variant allele codes is removed.
  * 'structure': Structure-format.
  * 'transpose': PLINK 1 variant-major (.tped + .tfam), loadable with
    --tfile.
  * 'vcf', 'vcf-fid', 'vcf-iid': VCFv4.2.  'vcf-fid' and 'vcf-iid' cause
    family IDs or within-family IDs respectively to be used for the sample
    IDs in the last header row, while 'vcf' merges both IDs and puts an
    underscore between them.  If the 'bgz' modifier is added, the VCF file is
    block-gzipped.
    The A2 allele is saved as the reference and normally flagged as not based
    on a real reference genome (INFO:PR).  When it is important for reference
    alleles to be correct, you'll also want to include --a2-allele and
    --real-ref-alleles in your command.
  In addition,
  * The '01' modifier causes A1 (usually minor) alleles to be coded as '1'
    and A2 alleles to be coded as '0', while '12' maps A1 -> 1 and A2 -> 2.
  * The 'tab' modifier makes the output mostly tab-delimited instead of
    mostly space-delimited.  'tabx' and 'spacex' force all tabs and all
    spaces, respectively.
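To make the A1/A2 terminology and the additive (0/1/2) coding above concrete, here is a minimal Python sketch of how genotypes are packed in a SNP-major Plink 1 .bed file and turned into A1 allele counts. It follows the documented .bed layout (2 bits per genotype, first sample in the lowest-order bits; 00 = homozygous A1, 01 = missing, 10 = heterozygous, 11 = homozygous A2, where A1 is the allele in .bim column 5). The function names are hypothetical; for real analyses Plink itself is the safer tool.

```python
# 2-bit .bed code -> number of A1 alleles (A1 = allele in .bim column 5).
CODE_TO_A1_COUNT = {0b00: 2, 0b01: None, 0b10: 1, 0b11: 0}
BED_MAGIC = bytes([0x6C, 0x1B, 0x01])  # header of a SNP-major Plink 1 .bed


def decode_bed_row(row: bytes, n_samples: int) -> list:
    """Decode one variant's bytes into A1 counts (0/1/2, None = missing)."""
    counts = []
    for i in range(n_samples):
        byte = row[i // 4]                     # 4 genotypes per byte
        code = (byte >> (2 * (i % 4))) & 0b11  # first sample = lowest 2 bits
        counts.append(CODE_TO_A1_COUNT[code])
    return counts


def read_bed_rows(path: str, n_samples: int, n_variants: int):
    """Yield decoded variants from an uncompressed SNP-major .bed file."""
    bytes_per_variant = (n_samples + 3) // 4  # each row is padded to whole bytes
    with open(path, "rb") as fh:
        if fh.read(3) != BED_MAGIC:
            raise ValueError(f"{path} is not a SNP-major Plink 1 .bed file")
        for _ in range(n_variants):
            yield decode_bed_row(fh.read(bytes_per_variant), n_samples)
```

For example, the single byte 0x78 for four samples decodes to [2, 1, 0, None]: homozygous A1, heterozygous, homozygous A2, missing. These counts are what the 'A' output format writes per sample.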

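A typical command is:

plink --bfile <prefix> --recode vcf-iid --out ./<out-prefix>

Here <prefix> is the path to the decompressed fileset without its extension (for example base_filtered_v0.7), and vcf-iid uses the within-family IDs as the VCF sample names. This converts the genotype information in the .bed file into a usable VCF (<out-prefix>.vcf).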
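Once the VCF exists, downstream scripts often only need the sample names and a per-sample genotype count. The Python sketch below (a hypothetical helper using only the GT field and no external libraries) reads VCF text lines and returns, for each variant, the number of non-REF alleles per sample. Because Plink writes A2 as REF (see the help text above), ALT is A1 here, so for biallelic SNPs the counts line up with the 0/1/2 coding of --recode A. It keeps everything in memory, so it is meant for a small extract rather than the whole 3K VCF.

```python
def read_vcf_alt_counts(lines):
    """Return (sample_ids, records) from VCF text lines.

    Each record is (chrom, pos, id, ref, alt, counts), where counts holds, per
    sample, the number of non-REF alleles in GT, or None if any allele is missing.
    """
    samples = []
    records = []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # with vcf-iid these are the IIDs
            continue
        chrom, pos, vid, ref, alt = fields[:5]
        gt_index = fields[8].split(":").index("GT")  # position of GT in FORMAT
        counts = []
        for sample_field in fields[9:]:
            gt = sample_field.split(":")[gt_index]
            alleles = gt.replace("|", "/").split("/")  # treat phased like unphased
            if "." in alleles:
                counts.append(None)
            else:
                counts.append(sum(1 for a in alleles if a != "0"))
        records.append((chrom, int(pos), vid, ref, alt, counts))
    return samples, records
```

Usage: with open("<out-prefix>.vcf") as fh: samples, records = read_vcf_alt_counts(fh).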
