[Repost] Google cluster data analysis: clusterdata-2011-2

Original address:

https://www.twblogs.net/a/5c2dc304bd9eee35b21c418b/zh-cn

------------------------------------------------------------------------------------------------

This post mainly walks through the clusterdata-2011-2 dataset.

Provided by: https://github.com/google/cluster-data

Dataset documentation: https://drive.google.com/file/d/0B5g07T_gRDg9Z0lsSTEtTWtpOW8/view

Dataset description: The clusterdata-2011-2 trace represents 29 days' worth of cell information from May 2011, on a cluster of about 12.5k machines.

After importing the CSV files into MySQL, the tables look as follows (the table structures are at the end):

job events table:

rows: 1672923, 286.86 MB (300,795,688 bytes)

index: jobid (btree), 35.56 MB (37,285,888 bytes)

machine events table:

rows: 37780, 2.99 MB (3,138,540 bytes)

machine attributes table:

rows: 10748566, 1.09 GB (1,175,642,124 bytes)

task constraints table:

rows: 28485619, 2.95 GB (3,163,127,240 bytes)

task usage table:

rows: 1232799308, 182.55 GB (196,015,089,972 bytes)

index: machineid (btree), jobid (btree), 69.61 GB (74,743,799,808 bytes)

task events table (the import had some problems and is still being fixed):

rows: 144648292, 12.76 GB (13,700,652,148 bytes)

index: machineid, jobid, username, 6.90 GB (7,414,187,008 bytes)

Explanation, part 1: fields

Explanation, part 2: tables

Part 1. Fields

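A job consists of one or more tasks. Each task represents a Linux program, which may consist of multiple processes.

timestamp: in microseconds, counted from 600 s before the start of the trace (for example, an event 20 s into the trace has a timestamp of 620 s).

A timestamp of 0 marks an event that happened before the trace began, since a job may have been submitted before recording started.

A timestamp of 2^63 - 1 marks an event that happened after the trace ended.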
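The timestamp rules above can be sketched in a few lines of Python. This is only an illustration: the function and constant names are made up here, and the only facts it relies on are the 600 s offset and the two sentinel values described above.

```python
# Hypothetical helper names; only the 600 s offset and the two sentinel
# values come from the text above.
TRACE_START_OFFSET_US = 600 * 1_000_000  # the trace starts 600 s after timestamp zero
BEFORE_TRACE = 0                          # sentinel: event happened before the trace
AFTER_TRACE = 2**63 - 1                   # sentinel: event happened after the trace


def describe_timestamp(ts_us: int) -> str:
    """Turn a raw trace timestamp (microseconds) into a readable description."""
    if ts_us == BEFORE_TRACE:
        return "before trace start"
    if ts_us == AFTER_TRACE:
        return "after trace end"
    seconds_into_trace = (ts_us - TRACE_START_OFFSET_US) / 1_000_000
    return f"{seconds_into_trace:.1f} s into the trace"


print(describe_timestamp(620_000_000))  # the 620 s example from the text
print(describe_timestamp(0))
print(describe_timestamp(2**63 - 1))
```

Checking the sentinels before subtracting the offset keeps them from being reported as very large negative or positive offsets.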

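Job and machine IDs are not reused, so they can be treated as unique identifiers. (A machine ID may appear more than once because a machine was removed from the cluster and later added back; a job ID may appear more than once because a job was stopped, reconfigured, and restarted.)

User names and job names are hashed for confidentiality; the hashes can still be compared for equality.

machine event type: 0.add 1.remove 2.update

job and task event type: 0.submit 1.schedule 2.evict 3.fail 4.kill 5.finish 6.lost 7.update_pending 8.update_running

priority: 0 is the lowest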

infrastructure (11)—this is the highest (most entitled to get resources) priority in the trace and accounts for most of the recorded disk I/O, so we speculate it includes some storage services;
monitoring (10)
normal production (9)—this is the lowest (and most occupied) of the priorities labeled ‘production’. The trace providers indicate that jobs at this priority and higher which are latency-sensitive should not be “evicted due to over-allocation of machine resources” .
other (2-8) — we speculate that these priorities are dominated by batch jobs;
gratis (free) (0-1) — the trace providers indicate that resources used by tasks at these priorities are generally not charged.
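As a rough sketch, the priority groups quoted above can be turned into a lookup function. The function name is hypothetical; the numeric ranges are the ones given in the list, and treating anything outside 0-11 as an error is an assumption.

```python
def priority_class(priority: int) -> str:
    """Map a numeric priority (0-11) to the group names quoted above."""
    if not 0 <= priority <= 11:
        # Assumption: values outside 0-11 are treated as invalid input.
        raise ValueError(f"unexpected priority: {priority}")
    if priority <= 1:
        return "gratis (free)"
    if priority <= 8:
        return "other"
    if priority == 9:
        return "normal production"
    if priority == 10:
        return "monitoring"
    return "infrastructure"


for p in (0, 4, 9, 10, 11):
    print(p, priority_class(p))
```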

missing info: NULL for normal records; 0-2 for records that were synthesized because data was missing.

0. SNAPSHOT_BUT_NO_TRANSITION: we did not find a record representing the given event, but a later snapshot of the job or task state indicated that the transition must have occurred. The timestamp of the synthesized event is the timestamp of the snapshot.

1. NO_SNAPSHOT_OR_TRANSITION: we did not find a record representing the given termination event, but the job or task disappeared from later snapshots of cluster states, so it must have been terminated. The timestamp of the synthesized event is a pessimistic upper bound on its actual termination time assuming it could have legitimately been missing from one snapshot.

2. EXISTS_BUT_NO_CREATION: we did not find a record representing the creation of the given task or job. In this case, we may be missing metadata (job name, resource requests, etc.) about the job or task and we may have placed SCHEDULE or SUBMIT events later than they actually are.

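scheduling class: roughly indicates how latency-sensitive a job is. It is a single number: 3 denotes a more latency-sensitive job, and 0 denotes a non-production task (for example, non-business-critical analyses).

comparison operator: ??

I don't really understand how the comparison works...

Less than (2), greater than (3): the machine attribute is represented as an integer (or 0 if the attribute is not present) and then compared with the supplied attribute value. These comparisons are strictly less than and strictly greater than. Equal (0), not equal (1): the machine attribute is represented as a string (or the empty string if it is not present) and then compared with the supplied attribute value. (Translated from the documentation.)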
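One way to read the comparison rules is as a small function. This is a sketch under assumptions, not the scheduler's actual logic: the function name is invented, a missing machine attribute is passed as None, and for the numeric operators the supplied value is assumed to parse as an integer.

```python
from typing import Optional

# Operator codes as listed in the translated documentation.
EQUAL, NOT_EQUAL, LESS_THAN, GREATER_THAN = 0, 1, 2, 3


def constraint_satisfied(machine_value: Optional[str], op: int, wanted: str) -> bool:
    """Check one constraint; machine_value is None when the attribute is absent."""
    if op in (EQUAL, NOT_EQUAL):
        # String comparison; a missing attribute counts as the empty string.
        actual = machine_value if machine_value is not None else ""
        return (actual == wanted) == (op == EQUAL)
    if op in (LESS_THAN, GREATER_THAN):
        # Integer comparison; a missing attribute counts as 0.
        # Assumption: the supplied value is a valid integer string.
        actual_num = int(machine_value) if machine_value is not None else 0
        wanted_num = int(wanted)
        if op == LESS_THAN:
            return actual_num < wanted_num   # strictly less than
        return actual_num > wanted_num       # strictly greater than
    raise ValueError(f"unknown comparison operator: {op}")


print(constraint_satisfied(None, LESS_THAN, "5"))   # missing attribute is compared as 0
print(constraint_satisfied("5", LESS_THAN, "5"))    # strict comparison
print(constraint_satisfied(None, EQUAL, ""))        # missing attribute is compared as ""
```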

Part 2. Tables

1.Machine events
Each machine is described by one or more records in the machine event table. The majority of records describe machines that existed at the start of the trace.
1. timestamp
2. machine ID
3. event type
4. platform ID
5. capacity: CPU
6. capacity: memory
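The six fields above map naturally onto a header-less CSV row. The following is a minimal parsing sketch: the column identifiers, the helper name, and the two sample rows are invented for illustration and are not taken from the trace; the event codes 0, 1, 2 are the add, remove, update values listed in part 1.

```python
import csv
import io

# Hypothetical column names following the field order listed above.
MACHINE_EVENT_COLUMNS = [
    "timestamp", "machine_id", "event_type",
    "platform_id", "cpu_capacity", "memory_capacity",
]
MACHINE_EVENT_NAMES = {0: "ADD", 1: "REMOVE", 2: "UPDATE"}


def read_machine_events(handle):
    """Yield one dict per machine event row from a header-less CSV stream."""
    reader = csv.DictReader(handle, fieldnames=MACHINE_EVENT_COLUMNS)
    for row in reader:
        yield {
            "timestamp": int(row["timestamp"]),
            "machine_id": int(row["machine_id"]),
            "event": MACHINE_EVENT_NAMES[int(row["event_type"])],
            "platform_id": row["platform_id"],
            # Empty capacity fields are kept as None rather than guessed.
            "cpu_capacity": float(row["cpu_capacity"]) if row["cpu_capacity"] else None,
            "memory_capacity": float(row["memory_capacity"]) if row["memory_capacity"] else None,
        }


# Made-up sample rows, not real trace data.
sample = io.StringIO("0,5,0,platformA,0.5,0.25\n600000000,5,1,platformA,,\n")
events = list(read_machine_events(sample))
for event in events:
    print(event)
```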

2. Job events and task events

The two event tables describe jobs/tasks and their lifecycles. The constraints table describes task placement constraints that restrict the machines onto which tasks can schedule.

The simplest case is shown by the top path in the diagram above: a job is SUBMITted and gets put into a pending queue; soon afterwards, it is SCHEDULEd onto a machine and starts running; some time later it FINISHes successfully.

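That is: the job is first submitted (0) and enters the pending queue, then it is scheduled (1), and finally it finishes (5).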
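Using the event type codes listed in part 1, the simplest lifecycle can be checked with a short sketch. The helper names are hypothetical, and this deliberately covers only the SUBMIT, SCHEDULE, FINISH path rather than the full state machine.

```python
# Event type codes as listed in part 1.
EVENT_NAMES = {
    0: "SUBMIT", 1: "SCHEDULE", 2: "EVICT", 3: "FAIL", 4: "KILL",
    5: "FINISH", 6: "LOST", 7: "UPDATE_PENDING", 8: "UPDATE_RUNNING",
}
SIMPLEST_PATH = ["SUBMIT", "SCHEDULE", "FINISH"]


def decode_events(codes):
    """Translate a sequence of numeric event types into their names."""
    return [EVENT_NAMES[code] for code in codes]


def follows_simplest_path(codes):
    """True if the events are exactly SUBMIT, SCHEDULE, FINISH in that order."""
    return decode_events(codes) == SIMPLEST_PATH


print(decode_events([0, 1, 5]))
print(follows_simplest_path([0, 1, 5]))
print(follows_simplest_path([0, 1, 4]))  # ends in KILL, not FINISH
```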

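3. Task usage

This blog post explains it in detail: https://blog.csdn.net/yangss123/article/details/78298749

The intermediate tables generated are:

the machine IDs contained in each platform, all medium-priority tasks (priority 2-8), and all tasks that were successfully scheduled (event type 1), each with the corresponding indexes. (With the intermediate tables, query time dropped from several hours to under 1 minute.)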
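The idea of pre-filtered, indexed intermediate tables can be sketched as follows. This is an illustration only: it uses Python's built-in sqlite3 as a stand-in for MySQL, and every table name, column name, and sample row is made up rather than taken from the original schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for the MySQL database

# Hypothetical, heavily simplified source tables.
conn.executescript("""
CREATE TABLE machine_events (machine_id INTEGER, platform_id TEXT);
CREATE TABLE task_events (
    job_id INTEGER, task_index INTEGER, machine_id INTEGER,
    event_type INTEGER, priority INTEGER
);
""")

# Made-up rows, not real trace data.
conn.executemany("INSERT INTO machine_events VALUES (?, ?)",
                 [(1, "A"), (2, "A"), (3, "B"), (1, "A")])
conn.executemany("INSERT INTO task_events VALUES (?, ?, ?, ?, ?)", [
    (10, 0, None, 0, 4),  # submitted, medium priority
    (10, 0, 1, 1, 4),     # scheduled, medium priority
    (11, 0, 2, 1, 9),     # scheduled, production priority
    (12, 0, 3, 1, 0),     # scheduled, gratis priority
])

# The three intermediate tables described above, each with an index.
conn.executescript("""
CREATE TABLE platform_machines AS
    SELECT DISTINCT platform_id, machine_id FROM machine_events;
CREATE INDEX idx_platform_machines ON platform_machines (platform_id);

CREATE TABLE mid_priority_tasks AS
    SELECT * FROM task_events WHERE priority BETWEEN 2 AND 8;
CREATE INDEX idx_mid_priority_job ON mid_priority_tasks (job_id);

CREATE TABLE scheduled_tasks AS
    SELECT * FROM task_events WHERE event_type = 1;
CREATE INDEX idx_scheduled_machine ON scheduled_tasks (machine_id);
""")

machines_on_a = conn.execute(
    "SELECT COUNT(*) FROM platform_machines WHERE platform_id = 'A'"
).fetchone()[0]
print("machines on platform A:", machines_on_a)
```

Later queries then read the small filtered tables through their indexes instead of scanning the full event tables, which is presumably where the reported speed-up comes from.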
