Easy and cheap cluster building on AWS backup
https://grapeot.me/easy-and-cheap-cluster-building-on-aws.html
Thu 17 July 2014, by Yan Wang | Linux Parallel github
Why?
Machine learning and computer vision research often requires a lot of computational resources, for example to extract features from a large number of images or to train large-scale (or simply many) classifiers. People therefore use more than one machine for the task. The procedure usually goes like this: copy the executables and data files to all the machines, configure the environments, manually divide the tasks, run the commands, and collect the results. In addition to the complicated workflow, another practical problem is where to get the machines. Maintaining your own cluster is definitely an option, but an extremely expensive and time-consuming one. Renting from AWS, especially using spot instances, is a much cheaper and more practical alternative.
But a lot of factors prevent spot instances from being really useful (I assume you already know how spot instances work):
- Spot instances don't have persistent storage, which means whatever you have on the hard disk may be lost in the next minute. How to deal with this?
- This property of spot instances also makes system configuration a problem: how do you easily make a blank system usable?
- How to efficiently copy bulk data to AWS?
- Manual task division and command execution don't sound right. How to make them easier and smarter (and faster)?
Over quite a few months, I have gradually accumulated a tool chain that handles all of these problems.
What will you get?

Here is an example of a 128-core 240GB cluster. It takes ~10 minutes to build from scratch (or ~1 minute to build from an AMI image), and costs about 1 dollar per hour. Like any AWS instances, the instances themselves cost nothing when you are not using them (that is, when they are shut down). All your data will be on your hard disk, so the loss due to a spot request failure is minimized. The best thing is that task submission is fairly simple: a single line of bash will do the job, like
cat cluster.sh | parallel --sshlogin 8/m1 --sshlogin 8/m2 --sshlogin 8/m3 --sshlogin 8/m4 bash -c '{}'
It automatically distributes every line of cluster.sh across the four nodes and displays all of their stdout on your screen. Whenever a node has fewer than 8 tasks running, parallel automatically dispatches another one to it.
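The post does not show what cluster.sh looks like. As a rough sketch, assuming a hypothetical ./extract_features program and image file names without spaces, a task file with one independent command per line could be generated like this:

```shell
# Print one shell command per input file; each printed line is one task.
# ./extract_features is a made-up program name used only for illustration.
gen_tasks() {
  for f in "$@"; do
    printf './extract_features %s > %s.feat\n' "$f" "$f"
  done
}

# Example: write the task list for two images and show it.
gen_tasks img001.jpg img002.jpg > cluster.sh
cat cluster.sh
```

Because every line is self-contained, it does not matter which node ends up running which line.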
How? (TL; DR)
- Use an automated script to do fast system configuration.
- Use sshfs to do selective file transfer with compression, including training data transfer and result collection.
- Use GNU parallel to do job submission.
- An AMI can also be used to further expedite virtual machine initialization.
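A minimal sketch of the sshfs step, assuming a node reachable under the SSH alias m1. The remote directory and mount point are made-up placeholders, and the post does not say which side mounts which, so treat the direction as an assumption too. The -C flag turns on SSH compression, and only files that are actually read or written through the mount point get transferred, which is what makes the transfer selective.

```shell
# Mount a remote directory over SSH with compression enabled.
# Set DRY_RUN=1 to print the commands instead of running them.
mount_node() {
  host=$1          # SSH alias, e.g. m1 (placeholder)
  remote_dir=$2    # directory on the remote side (placeholder)
  mount_point=$3   # local directory to mount onto (placeholder)
  ${DRY_RUN:+echo} mkdir -p "$mount_point"
  ${DRY_RUN:+echo} sshfs -C "$host:$remote_dir" "$mount_point"
}

# Unmount again once the results have been collected (Linux FUSE).
unmount_node() {
  ${DRY_RUN:+echo} fusermount -u "$1"
}

# Preview what would run for one node, without touching anything:
DRY_RUN=1 mount_node m1 /home/ubuntu/results "$HOME/cluster/m1"
```

Dropping DRY_RUN=1 performs the actual mount; after that, ordinary tools such as cp work on the mount point.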
How?
- Create spot instances on AWS.
- On each machine, run curl https://grapeot.me/aws.sh | sh if that fits you. Or git clone http://github.com/grapeot/debianinit and execute setup-ubuntu.sh to initialize the system. Note that the script is personalized for me, with python and vim support. Fork it to add your own stuff.
- That's it for configuration. To submit jobs, use parallel. Let's look at this example:
cat cluster.sh | parallel --sshlogin 8/m1 --sshlogin 8/m2 --sshlogin 8/m3 --sshlogin 8/m4 bash -c '{}'
We already explained what it does; here are more details. In a switch like --sshlogin 8/m1, --sshlogin tells parallel to send tasks to a remote machine. 8/m1 tells it to use the SSH host named m1, which you can configure in ~/.ssh/config, and to keep at most 8 tasks running on that host at a time. bash -c '{}' is the actual command executed on the remote machine, with {} as the placeholder for each line from stdin. parallel is much more flexible than this, and I'll leave the exploration of more switches and usages to you. :)
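For reference, here is a sketch of how the host aliases could be set up. The addresses, user name, and key path are placeholders (the addresses come from the documentation-only 203.0.113.0/24 range), while Host, HostName, User, and IdentityFile are standard ssh_config keywords. The second helper prints the per-node slot counts in the one-sshlogin-per-line format that parallel's --sshloginfile option reads, which avoids repeating --sshlogin for every node.

```shell
# Print an ssh_config stanza for one node.
# Arguments: alias, address, user, private key path (all placeholders).
ssh_stanza() {
  printf 'Host %s\n    HostName %s\n    User %s\n    IdentityFile %s\n' \
    "$1" "$2" "$3" "$4"
}

# Print one "slots/host" sshlogin per line, e.g. 8/m1.
node_list() {
  slots=$1
  shift
  for h in "$@"; do
    printf '%s/%s\n' "$slots" "$h"
  done
}

# Review the output, then append it to ~/.ssh/config by hand.
ssh_stanza m1 203.0.113.11 ubuntu '~/.ssh/aws-key.pem'
ssh_stanza m2 203.0.113.12 ubuntu '~/.ssh/aws-key.pem'

# Same nodes and slot counts as --sshlogin 8/m1 ... --sshlogin 8/m4:
node_list 8 m1 m2 m3 m4 > nodes.txt
# cat cluster.sh | parallel --sshloginfile nodes.txt bash -c '{}'
```

Keeping the node list in a file means that adding or dropping a spot instance only changes nodes.txt, not the submission command.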