Chp10: Scalability and Memory Limits
The Step-by-Step Approach
break down a tricky problem and to solve problems using what you do know.
Step 1: Make Believe
Pretend that the data can all fit on one machine and there are no memory limitations. Provide the general outline for your solution.
Step 2: Get Real
figure out how to logically divide the data up, and how one machine would identify where to look up a different piece of data.
Step 3: Solve Problems
Dividing Up Lots of Data:
By Order of Appearance:
By Hash Value: 1)pick some sort of key relating to the data 2)hash the key 3)mod the hash value by the number of machines 4)store data on the machine with that value
there is no relationship between what the data represents and which machine stores data.
By Acutal Value: reduce system latency by using information about what the data represents.
Arbitrarily:
Good Example: Find all documents that contains a list of words.
10.1 build some sort of service that will be called by up to 1000 client applications to get simple end-of-day stock price information.
We want to start off by thinking about what the different aspects we should consider in a given proposal are:
1. Client Ease of Use: we want the service to be easy for the clients to implement and useful for them
2. Ease for Ourselves: consider in this not only the cost of implementing, but also the cost of maintenance
3. Flexibility for Future Demands:
4. Scalability and Efficiency: not to overly burden our service.
DataBase vs XML(json) P 343
10.2 good problem
10.7 LRU Cache
Chp10: Scalability and Memory Limits的更多相关文章
- is running beyond physical memory limits. Current usage: 2.0 GB of 2 GB physical memory used; 2.6 GB of 40 GB virtual memory used
昨天使用hadoop跑五一的数据,发现报错: Container [pid=,containerID=container_1453101066555_4130018_01_000067] GB phy ...
- Memory Limits for Windows and Windows Server Releases
来源:https://msdn.microsoft.com/en-us/library/windows/desktop/aa366778(v=vs.85).aspx Limits on memory ...
- [hadoop] - Container [xxxx] is running beyond physical/virtual memory limits.
当运行mapreduce的时候,有时候会出现异常信息,提示物理内存或者虚拟内存超出限制,默认情况下:虚拟内存是物理内存的2.1倍.异常信息类似如下: Container [pid=13026,cont ...
- hive: insert数据时Error during job, obtaining debugging information 以及beyond physical memory limits
insert overwrite table canal_amt1...... 2014-10-09 10:40:27,368 Stage-1 map = 100%, reduce = 32%, Cu ...
- hadoop is running beyond virtual memory limits问题解决
单机搭建了2.6.5的伪分布式集群,写了一个tf-idf计算程序,分词用的是结巴分词,使用standalone模式运行没有任何问题,切换到伪分布式模式运行一直报错: hadoop is running ...
- hadoop的job执行在yarn中内存分配调节————Container [pid=108284,containerID=container_e19_1533108188813_12125_01_000002] is running beyond virtual memory limits. Current usage: 653.1 MB of 2 GB physical memory used
实际遇到的真实问题,解决方法: 1.调整虚拟内存率yarn.nodemanager.vmem-pmem-ratio (这个hadoop默认是2.1) 2.调整map与reduce的在AM中的大小大于y ...
- [转载]Memory Limits for Windows and Windows Server Releases
Memory Limits for Windows and Windows Server Releases This topic describes the memory limits for sup ...
- Kafka:ZK+Kafka+Spark Streaming集群环境搭建(十三)kafka+spark streaming打包好的程序提交时提示虚拟内存不足(Container is running beyond virtual memory limits. Current usage: 119.5 MB of 1 GB physical memory used; 2.2 GB of 2.1 G)
异常问题:Container is running beyond virtual memory limits. Current usage: 119.5 MB of 1 GB physical mem ...
- Container [pid=6263,containerID=container_1494900155967_0001_02_000001] is running beyond virtual memory limits
以Spark-Client模式运行,Spark-Submit时出现了下面的错误: User: hadoop Name: Spark Pi Application Type: SPARK Applica ...
随机推荐
- 各个公司的来源/The etymology of company
1.List of Company Etymology 下面的维基百科词条是比较有名的一些公司的名称的来源 List of company name etymologies 2.Atmel : adv ...
- CentOS6.4下使用默认的PDF文档阅读器出现乱码的解决方案
方法一:修改/etc/fonts/conf.d/49-sansserif.conf文件,如下: 1: <?xml version="1.0"?> 2: <!DOC ...
- wage
#include<iostream> using namespace std; int main() { double wage1,wage2,time; cout<<&quo ...
- unity 多线程
对于客户端来说,好的用户体验,需要保持一个快速响应的用户界面.于是便要求:网络请求.io操作等 开销比较大的操作必须在后台线程进行,从而避免主线程的ui卡顿.(注:协程也是主线程的一部分,进行大量的i ...
- 利用rsyslog 对linux 操作进行审计
环境:客户端和服务端都需要安装rsyslog服务 rsyslog server端 cd /etc/rsyslog.d/ cat server.conf $ModLoad imtcp $InputTC ...
- tp中让头疼似懂非懂的create
项目中多次用到create() 只能它是表单验证,不过好出错,痛下心扉好好了解理解它的来龙去脉和所用的用法 一:通过create() 方法或者 赋值的方法生成数据对象,然后写入数据库 $model = ...
- [转]PHP中fopen,file_get_contents,curl的区别
1. fopen /file_get_contents 每次请求都会重新做DNS查询,并不对 DNS信息进行缓存.但是CURL会自动对DNS信息进行缓存.对同一域名下的网页或者图片的请求只 ...
- MySQL语法语句大全
一.SQL速成 结构查询语言(SQL)是用于查询关系数据库的标准语言,它包括若干关键字和一致的语法,便于数据库元件(如表.索引.字段等)的建立和操纵. 以下是一些重要的SQL快速参考,有关SQ ...
- [C/C++] 各种C/C++编译器对UTF-8源码文件的兼容性测试(VC、GCC、BCB)
在不同平台上开发C/C++程序时,为了避免源码文件乱码,得采用UTF-8编码来存储源码文件.但是很多编译器对UTF-8源码文件兼容性不佳,于是我做了一些测试,分析了最佳保存方案. 一.测试程序 为了测 ...
- PySide 简易教程<二>-------工欲善其事,必先利其器
OK , 在Linux的开发环境下,对于我们的简短的PySide程序而言,不需要使用QtCreator,使用文本编辑器.之所以,使用文本编辑器,是因为小应用代码量很少,更重要的是一行行的写可以加深我们 ...