awk - group adjacent rows by identical columns

Liang always brings me interesting quiz questions. Here is one:

If i have a table like below:

chr1	113438	114495	1	chr1	114142	114143

chr1	113438	114495	2	chr1	114171	114172

chr1	170977	174817	1	chr1	171511	171512

chr1	170977	174817	2	chr1	171514	171515

chr1	170977	174817	2	chr1	173545	173546

and I would like to collapse the rows if the first 3 columns are identical to make the following output:

chr1	113438	114495	114142,114143,114171,114172

chr1	170977	174817	171511,171512,171514,171515,173545,173546

Is there any easy awk approach to do it?

Since I am so rusty at awk, I had to google around to find the solution:

awk -F '\t' '

$1FS$2FS$3==x{

    printf ",%s,%s", $6, $7

    next

}

{

    x=$1FS$2FS$3

    printf "\n%s\t%s,%s", x, $6, $7

}

END {

    printf "\n"

}' test.txt

Assuming the input file is test.txt. Note that the input and output are both tab-separated.

Explanation:

x=$1FS$2FS$3: variable x stores the value of columns 1, 2, and 3 separated by field separator FS.

Print the first part of an output line (columns 1, 2, 3, 6, 7).

For next line, if columns 1, 2, and 3 equal x, print columns 6 and 7.

Group and then count:

https://stackoverflow.com/questions/14916826/awk-unix-group-by

have this text file:

name, age
joe,42
jim,20
bob,15
mike,24
mike,15
mike,54
bob,21

Trying to get this (count):

joe 1
jim 1
bob 2
mike 3

awk -F, 'NR>1{arr[$1]++}END{for (a in arr) print a, arr[a]}' file.txt

References:

http://azaleasays.com/2014/10/06/awk-group-adjacent-rows-by-identical-columns/

Group rows in text file and aggregate corresponding rows to column

keeping last record among group of records with common fields (awk)

awk - group adjacent rows by identical columns的更多相关文章

openpyxl使用sheet.rows或sheet.columns报TypeError: 'generator' object is not subscriptable解决方式
解决方案: 因为新版本的openpyxl使用rows或者columns返回一个生成器所以可以使用List来解决报错问题 >>> sheet.columns[0] Traceback ...
shell awk命令
语法: awk '{command}' filename 多个命令以分号分隔. awk 'BEGIN {command1} {command2} END{command3}' 注意:BEGIN , ...
MySQL如何优化GROUP BY ：松散索引扫描 VS 紧凑索引扫描
执行GROUP BY子句的最一般的方法:先扫描整个表,然后创建一个新的临时表,表中每个组的所有行应为连续的,最后使用该临时表来找到组并应用聚集函数.在某些情况中,MySQL通过访问索引就可以得到结果 ...
Crazy Rows
Problem You are given an N x N matrix with 0 and 1 values. You can swap any two adjacent rows of the ...
2009 Round2 A Crazy Rows （模拟）
Problem You are given an N x N matrix with 0 and 1 values. You can swap any two adjacent rows of the ...
awk 输出前 N 列的最简单方法
最近遇到一种场景,需要输出一个文本信息的前 N 列. 众所周知 cut 可以指定分隔符并指定列的范围,如 cut -d' ' -f-4 就是以空格为分隔符输出前 4 列.但是 cut 的分隔符只能是一 ...
Oracle 10gR2分析函数
Oracle 10gR2分析函数汇总 (Translated By caizhuoyi 2008‐9‐19) 说明: 1. 原文中底色为黄的部分翻译存在商榷之处,请大家踊跃提意见: 2. 原文中淡 ...
[转帖]Introduction to text manipulation on UNIX-based systems
Introduction to text manipulation on UNIX-based systems https://www.ibm.com/developerworks/aix/libra ...
R2—《R in Nutshell》读书笔记(连载)
R in Nutshell 前言例子(nutshell包) 本书中的例子包括在nutshell的R包中,使用数据,需加载nutshell包 install.packages("nutshe ...

随机推荐

MySql 应用语句
[1]MySQL基础语句 -- 查询mysql版本号 SELECT VERSION(); -- 创建数据库 DROP DATABASE IF EXISTS study; -- 如果存在先删除 CREA ...
numpy高级应用
reshape重塑数组 ravel 拉平数组 concatenate 最一般化的连接,沿一条轴连接一组数组 # print(np.concatenate([arr1,arr2],axis = 0)) ...
rgferg
dfgsdfg fdvgdsafg fgdfgdfg
flask框架----flask基础
知识点回顾 1.flask依赖wsgi,实现wsgi的模块:wsgiref,werkzeug,uwsgi 2.实例化Flask对象,里面是有参数的 app = Flask(__name__,templ ...
ubuntu_virtualenv
sudo pip install virtualenv 1.安装virtualenv(需要先安装pip): $ [sudo] pip install virtualenv 2.创建虚拟环境: $ vi ...
await
单个的task await task 多个await asyncio.wait(tasks)
远程图片转化为base64
远程图片转化为base64 <?php /* * * 第一种方法 * 远程图片转化为base64,只支持http(推荐使用) * */ public static function imgUrl ...
Spring Boot(二十)：使用spring-boot-admin对spring-boot服务进行监控
Spring Boot(二十):使用spring-boot-admin对spring-boot服务进行监控 Spring Boot Actuator提供了对单个Spring Boot的监控,信息包含: ...
php路由
打开httpd.ini添加: RewriteRule (.*)$ /index\.php\?s=$1 [I] 高版本打开web.Config添加节点:<rewrite> <rules ...
Angular4.x 中的服务
Angular4.x 中的服务写下前面学习angular4.x中的服务需要借助 ToDoList 项目来做,项目代码在上篇博客中讲到. https://www.cnblogs.com/wjw101 ...

awk - group adjacent rows by identical columns

awk - group adjacent rows by identical columns的更多相关文章

随机推荐

热门专题