Let’s take a closer look at the term Big Data. To be honest, it’s become something of a loaded term, especially now that enterprise marketing engines have gotten hold of it. We’ll keep this discussion as grounded as possible.

What is Big Data? Several definitions are floating around, and we don’t believe that any of them explains the term clearly. Some definitions say that Big Data means the data is large enough that you have to think about it in order to gain insights from it. Others say it’s Big Data when it stops fitting on a single machine. These definitions are accurate in their own right but not necessarily complete. Big Data, in our opinion, is a fundamentally different way of thinking about data and how it’s used to drive business value. Traditionally, there were transaction recording (OLTP) and analytics (OLAP) on the recorded data. But not much was done to understand the reasons behind the transactions or what factors contributed to business taking place the way it did, or to come up with insights that could drive the customer’s behavior directly. In the context of the earlier LinkedIn example, this could translate into finding missing connections based on user attributes, second-degree connections, and browsing behavior, and then prompting users to connect with people they may know. Effectively pursuing such initiatives typically requires working with a large amount of varied data.
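
To make the "people you may know" idea concrete, here is a minimal sketch in Python that uses only second-degree connections. It is an illustration under stated assumptions, not how LinkedIn actually does it: the function name suggest_connections, the in-memory graph, and the rule of ranking candidates by number of mutual connections are all hypothetical, and a real system would also weigh user attributes and browsing behavior across far more data than fits in one process.

```python
from collections import Counter


def suggest_connections(graph, user, limit=3):
    """Suggest people `user` may know, ranked by number of mutual connections.

    `graph` maps each user name to the set of users they are directly
    connected to (a hypothetical in-memory stand-in for a real data store).
    """
    direct = graph.get(user, set())
    mutual_counts = Counter()
    for friend in direct:
        for candidate in graph.get(friend, set()):
            # Skip the user themselves and anyone already connected.
            if candidate != user and candidate not in direct:
                mutual_counts[candidate] += 1
    # Most mutual connections first; ties broken by name for stable output.
    ranked = sorted(mutual_counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [name for name, _ in ranked[:limit]]


# Toy data, invented for this example.
graph = {
    "ana": {"ben", "cho"},
    "ben": {"ana", "dev", "eli"},
    "cho": {"ana", "dev"},
    "dev": {"ben", "cho"},
    "eli": {"ben"},
}

print(suggest_connections(graph, "ana"))  # ['dev', 'eli']
```

Even in this toy form, the work grows with the number of connections of every connection, which hints at why such features quickly require processing a large amount of data.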

This new approach to data was pioneered by web companies like Google and Amazon, followed by Yahoo! and Facebook. These companies also wanted to work with different kinds of data, and it was often unstructured or semistructured (such as logs of users’ interactions with the website). This required their systems to process several orders of magnitude more data. Traditional relational databases were able to scale up to a great extent for some use cases, but doing so often meant expensive licensing and/or complex application logic. Moreover, owing to the data models they provided, these databases didn’t do a good job of working with evolving datasets that didn’t adhere to the schemas defined up front. There was a need for systems that could work with different kinds of data formats and sources without requiring strict schema definitions up front, and do it at scale. The requirements were different enough that going back to the drawing board made sense to some of the internet pioneers, and that’s what they did. This was the dawn of the world of Big Data systems and NoSQL. (Some might argue that it happened much later, but that’s not the point. This did mark the beginning of a different way of thinking about data.)
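
As a small illustration of what "no strict schema up front" means in practice, the following Python sketch reads semistructured interaction logs in which each record carries its own set of fields. Everything here is an assumption made for the example: the JSON-lines format, the field names, and the helper functions load_events and field_counts are invented, and the sketch says nothing about how any particular Big Data system stores its data.

```python
import json
from collections import Counter

# Hypothetical interaction log: one JSON object per line, and each event
# type carries different fields. No schema is declared anywhere.
RAW_EVENTS = [
    '{"user": "u1", "action": "view", "page": "/home"}',
    '{"user": "u2", "action": "search", "query": "hbase", "results": 12}',
    '{"user": "u1", "action": "click", "target": "signup", "referrer": "/home"}',
]


def load_events(lines):
    """Parse each line into a dict, keeping whatever fields it contains."""
    return [json.loads(line) for line in lines]


def field_counts(events):
    """Count how many events contain each field name."""
    counts = Counter()
    for event in events:
        counts.update(event.keys())
    return counts


events = load_events(RAW_EVENTS)
counts = field_counts(events)

# Fields are discovered from the data instead of being fixed in advance.
for name, seen in sorted(counts.items()):
    print(f"{name}: present in {seen} of {len(events)} events")
```

A new event type with new fields can be appended to the log without altering anything that was written before, whereas a relational table defined up front would need a schema change or a wide table of mostly empty columns.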

As part of this innovation in data management systems, several new technologies were built. Each solved different use cases and had a different set of design assumptions and features. They had different data models, too.

How did we get to HBase? What fueled the creation of such a system? That’s up next.
