Use a regular expression for filtering sequences by id from a FASTA file, e.g. just certain chromosomes from a genome. There are other tools as part of bigger packages to install (and no regex support), mostly awk-based awkward (sorry for the pun) bash solutions, and scripts using packages that one needs to install and with still no support for regular expressions. This however is a simple, straightforward little python script for a simple task. It doesn’t do anything else and doesn’t need anything but a stock python installation. Based on the FASTA reader snippet.

Download here.

Usage:

python FASTAfilter.py [-h] regex infile outfile

From a FASTA-file with multiple >entries, filter by sequence ids using a
regex.

positional arguments:
regex Regex to filter entry ids, e.g. ‘chr[1-4]’. Note that the id does not contain the initial > character.
infile A FASTA input file, usually with multiple entries.
outfile The new file with only the matching entries.

optional arguments:
-h, –help show this help message and exit

INSTALL:

cd /data/software
wget http://dm516.user.srcf.net/fastafilter/FASTAfilter.zip
unzip FASTAfilter.zip
easy_install argparse

USAGE:

python FASTAfilter.py   [1-9,10,11,12,13,14,15,16,17,18,X]  \
/dat2/INPUT.fa \
/dat2/OUTPUT.fa

Error:

Traceback (most recent call last):
  File "FASTAfilter.py", line 3, in <module>
    import argparse
ImportError: No module named argparse

Solution:

run "easy_install argparse" as root user.

http://dm516.user.srcf.net/?p=314

Filter FASTA files的更多相关文章

  1. Extract Fasta Sequences Sub Sets by position

    cut -d " " -f 1 sequences.fa | tr -s "\n" "\t"| sed -s 's/>/\n/g' & ...

  2. elfinder中通过DirectoryStream.Filter实现筛选隐藏目录(二)

    今天还是没事看了看elfinder源码,发现之前说的两个版本实现都是基于不同的jdkelfinder源码浏览-Volume文件系统操作类(1), 带前端页面的是基于1.6中File实现,另一个是基于1 ...

  3. OpenFileDialog.Filter 属性

    如果 Filter 属性为 Empty,将显示所有文件. 始终显示文件夹. Filter 由以下部分组成:筛选器说明,后跟竖线 (|) 和筛选模式. 筛选器可以指定一个或多个文件类型. 说明描述了对话 ...

  4. python 高阶函数之filter

    前文说到python高阶函数之map,相信大家对python中的高阶函数有所了解,此次继续分享python中的另一个高阶函数filter. 先看一下filter() 函数签名 >>> ...

  5. Falcon Genome Assembly Tool Kit Manual

    Falcon Falcon: a set of tools for fast aligning long reads for consensus and assembly The Falcon too ...

  6. Linux command line exercises for NGS data processing

    by Umer Zeeshan Ijaz The purpose of this tutorial is to introduce students to the frequently used to ...

  7. 构建NCBI本地BLAST数据库 (NR NT等) | blastx/diamond使用方法 | blast构建索引 | makeblastdb

    参考链接: FTP README 如何下载 NCBI NR NT数据库? 下载blast:ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+ 先了解 ...

  8. STAR manual

    来源:STARmanual.pdf 来源:Calling variants in RNAseq PART0 准备工作 #STAR 安装前的依赖的工具 #Red Hat, CentOS, Fedora. ...

  9. &lt;二代測序&gt; 下载 NCBI sra 文件

    本文近期更新地址: http://blog.csdn.net/tanzuozhev/article/details/51077222 随着測序技术的不断提高.二代測序数据成指数增长. NCBI提供了S ...

随机推荐

  1. iOS 计算时间差

    /** * 计算指定时间与当前的时间差 * @param compareDate 某一指定时间 * @return 多少(秒or分or天or月or年)+前 (比如,3天前.10分钟前) */ +(NS ...

  2. IOS 计算本周的起至日期

    unsigned units=NSMonthCalendarUnit|NSDayCalendarUnit|NSYearCalendarUnit|NSWeekdayCalendarUnit; NSCal ...

  3. [LintCode] 带最小值操作的栈

    class MinStack { public: MinStack() { // do initialization if necessary } void push(int number) { // ...

  4. windosw启动redis

    1.cmd控制台 cd C:\Program Files\Redis 2.redis-server.exe redis.windows.conf 3. ok!!

  5. mysql 获取id最大值

    数据库表中id列不为自动增加,需要程序来增加id的SQL SELECTCASE IFNULL(MAX(id),1)WHEN 1 THEN 1ELSE MAX(id) + 1END AS newmaxi ...

  6. yum -y install epel-release

    EPEL - Fedora Project Wiki https://fedoraproject.org/wiki/EPEL

  7. Struts2 框架的值栈

    1. OGNL 表达式 1.1 概述 OGNL(Object Graphic Navigation Language),即对象图导航语言; 所谓对象图,即以任意一个对象为根,通过OGNL可以访问与这个 ...

  8. 004-shiro简介

    一.什么是shiro shiro是apache的一个开源框架,是一个权限管理的框架,实现 用户认证.用户授权. spring中有spring security (原名Acegi),是一个权限框架,它和 ...

  9. Linux环境配置全局jdk和局部jdk并生效

    全局jdk配置: 1.root用户登录 2.进入opt目录,新建java文件夹 cd  /opt mkdir java  上传jdk7u79linuxx64.tar.gz包到java文件夹并解压 jd ...

  10. 关于source insight、添加.s和.S文件,显示全部路径、加入项目后闪屏幕

    1.source insight使用也有一年多时间了,今天出现建工程后添加文件“no files found” 百思不得姐: 后面发现是原工程命名时出现非法字符.重新命名就ok了. 切记切记 2.实用 ...