Chp10: Scalability and Memory Limits
The Step-by-Step Approach
The goal is to break down a tricky problem and solve it using what you do know.
Step 1: Make Believe
Pretend that the data can all fit on one machine and there are no memory limitations. Provide the general outline for your solution.
Step 2: Get Real
Figure out how to logically divide the data up, and how one machine would identify where to look up a different piece of data.
Step 3: Solve Problems
Dividing Up Lots of Data:
By Order of Appearance:
By Hash Value: 1) pick some sort of key relating to the data; 2) hash the key; 3) mod the hash value by the number of machines; 4) store the data on the machine with that value.
With this scheme there is no relationship between what the data represents and which machine stores it.
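The four hash-value steps can be sketched roughly as follows. This is only an illustration: the names `machine_for`, `put`, `get`, and the in-memory `machines` list are hypothetical stand-ins for real servers, and MD5 is used just because it gives a hash that stays stable across processes (Python's built-in `hash()` for strings does not).

```python
import hashlib

def machine_for(key: str, num_machines: int) -> int:
    """Return the index of the machine that should hold `key`."""
    # Steps 1-2: take the key and hash it with a process-independent hash.
    digest = hashlib.md5(key.encode("utf-8")).digest()
    # Step 3: mod the hash value by the number of machines.
    return int.from_bytes(digest, "big") % num_machines

# Hypothetical cluster: each "machine" is simulated by a plain dict.
machines = [dict() for _ in range(4)]

def put(key: str, value) -> None:
    # Step 4: store the data on the machine with that index.
    machines[machine_for(key, len(machines))][key] = value

def get(key: str):
    # Any machine can recompute the index to know where to look.
    return machines[machine_for(key, len(machines))].get(key)
```

Note that changing the number of machines changes the result of the mod for most keys, so under this simple scheme adding a machine would mean moving most of the data.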
By Actual Value: reduce system latency by using information about what the data represents.
Arbitrarily:
Good Example: Find all documents that contain a list of words.
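For the "Make Believe" step of this example, one possible single-machine outline is a hash table from each word to the set of documents containing it, then an intersection of those sets. The sketch below assumes documents are plain strings keyed by an id and that words are separated by whitespace; `build_index` and `find_documents` are names made up for illustration.

```python
from collections import defaultdict

def build_index(documents):
    """Map each lowercase word to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index

def find_documents(index, words):
    """Return the ids of documents that contain every word in `words`."""
    if not words:
        return set()
    # A word that was never indexed contributes an empty set.
    postings = [index.get(word.lower(), set()) for word in words]
    return set.intersection(*postings)
```

In the "Get Real" step, the same table could be split across machines, for example by the word's hash value or alphabetically by the word itself, with each machine returning its partial sets for the final intersection.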
10.1 Build a service that will be called by up to 1000 client applications to get simple end-of-day stock price information.
Start by listing the aspects to consider in any given proposal:
1. Client Ease of Use: the service should be easy for clients to implement and useful to them.
2. Ease for Ourselves: consider not only the cost of implementing the service, but also the cost of maintaining it.
3. Flexibility for Future Demands:
4. Scalability and Efficiency: the solution should not overly burden our service.
Database vs. XML (JSON): see p. 343
10.2 good problem
10.7 LRU Cache
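A minimal sketch of a least-recently-used cache, assuming a fixed capacity and a single machine. It relies on `collections.OrderedDict` to keep keys in recency order instead of a hand-written linked list plus hash table; the class and method names are chosen for illustration only.

```python
from collections import OrderedDict

class LRUCache:
    """Fixed-size cache that evicts the least recently used entry."""

    def __init__(self, capacity: int):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._items = OrderedDict()  # oldest entry first, newest last

    def get(self, key):
        if key not in self._items:
            return None
        # A hit makes this key the most recently used.
        self._items.move_to_end(key)
        return self._items[key]

    def put(self, key, value) -> None:
        if key in self._items:
            self._items.move_to_end(key)
        self._items[key] = value
        if len(self._items) > self.capacity:
            # Drop the entry at the front, which is the least recently used.
            self._items.popitem(last=False)
```

Both `get` and `put` run in constant time on average, which is the property the linked list plus hash table design is normally chosen for.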