The Process of Using Sqoop

With Sqoop, you can import data from a relational database system or a mainframe into HDFS. The input to the import process is either a database table or mainframe datasets. For databases, Sqoop reads the table row by row into HDFS. For mainframe datasets, Sqoop reads the records from each dataset into HDFS. The output of the import process is a set of files containing a copy of the imported table or datasets. Because the import runs in parallel, the output is spread across multiple files. These files may be delimited text files (for example, with commas or tabs separating each field), or binary Avro files or SequenceFiles containing serialized record data.
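As a minimal sketch of the import step described above, the following Python snippet only assembles the argument list for a `sqoop import` run; it does not execute anything. The JDBC URL, table name, and target directory are hypothetical placeholders, and the helper `build_import_command` is invented for illustration. The flags themselves (`--connect`, `--table`, `--target-dir`, `--num-mappers`, `--as-textfile`, `--as-avrodatafile`, `--as-sequencefile`) are standard Sqoop 1 import arguments.

```python
import shlex

# Maps a short format name (our own naming) to the Sqoop flag that selects it.
FORMAT_FLAGS = {
    "text": "--as-textfile",          # delimited text files
    "avro": "--as-avrodatafile",      # binary Avro files
    "sequence": "--as-sequencefile",  # binary SequenceFiles
}

def build_import_command(jdbc_url, table, target_dir, num_mappers=4, file_format="text"):
    """Return the argv list for a `sqoop import` run (nothing is executed)."""
    return [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--table", table,
        "--target-dir", target_dir,
        # Number of parallel map tasks; the import output is split across
        # multiple files because the work is done in parallel.
        "--num-mappers", str(num_mappers),
        FORMAT_FLAGS[file_format],
    ]

# Hypothetical connection details, for illustration only.
cmd = build_import_command(
    "jdbc:mysql://db.example.com/corp", "employees", "/user/demo/employees"
)
print(shlex.join(cmd))
```

Building the command as a list rather than a single string avoids quoting problems if the list is later handed to `subprocess.run`.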

A by-product of the import process is a generated Java class that can encapsulate one row of the imported table. Sqoop itself uses this class during the import. The Java source code for the class is also provided to you, for use in subsequent MapReduce processing of the data. The class can serialize and deserialize data to and from the SequenceFile format. It can also parse the delimited-text form of a record. These abilities let you quickly develop MapReduce applications that use the HDFS-stored records in your processing pipeline. You are also free to parse the delimited record data yourself, using any other tools you prefer.
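The last point, parsing the delimited records yourself, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the records are comma-delimited text, and the three-column layout (`id`, `name`, `salary`) and the `Employee` type are hypothetical. Python's `csv` module is a stand-in here; its quoting rules are not guaranteed to match whatever enclosing or escape characters a particular import was configured with.

```python
import csv
import io
from typing import NamedTuple

class Employee(NamedTuple):
    """Hypothetical row layout; a real table defines its own columns."""
    id: int
    name: str
    salary: float

def parse_records(lines, delimiter=","):
    """Parse delimited text lines into typed Employee records."""
    reader = csv.reader(lines, delimiter=delimiter)
    return [Employee(int(row[0]), row[1], float(row[2])) for row in reader]

# Stand-in for the contents of one imported text file.
sample = io.StringIO("1,Alice,5200.0\n2,Bob,4800.5\n")
records = parse_records(sample)
print(records[0].name, records[1].salary)
```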

After manipulating the imported records (for example, with MapReduce or Hive), you may have a result data set that you can then export back to the relational database. Sqoop's export process reads a set of delimited text files from HDFS in parallel, parses them into records, and inserts them as new rows in a target database table, for consumption by external applications or users.
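The export direction can be sketched the same way. The snippet below only builds the argument list for `sqoop export`; the JDBC URL, table name, and HDFS directory are hypothetical, and `build_export_command` is a helper made up for this example. The flags used (`--connect`, `--table`, `--export-dir`, `--input-fields-terminated-by`, `--num-mappers`) are standard Sqoop 1 export arguments.

```python
import shlex

def build_export_command(jdbc_url, table, export_dir, field_delimiter=",", num_mappers=4):
    """Return the argv list for a `sqoop export` run (nothing is executed)."""
    return [
        "sqoop", "export",
        "--connect", jdbc_url,
        "--table", table,            # target table that receives the new rows
        "--export-dir", export_dir,  # HDFS directory holding the delimited files
        # Tells Sqoop how to split each input line into fields.
        "--input-fields-terminated-by", field_delimiter,
        "--num-mappers", str(num_mappers),
    ]

# Hypothetical values, for illustration only.
export_cmd = build_export_command(
    "jdbc:mysql://db.example.com/corp", "employee_summary", "/user/demo/summary"
)
print(shlex.join(export_cmd))
```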

Sqoop includes some other commands that let you inspect the database you are working with. For example, you can list the available database schemas (with the sqoop-list-databases tool) and the tables within a schema (with the sqoop-list-tables tool). Sqoop also includes a primitive SQL execution shell (the sqoop-eval tool).
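For illustration, the three inspection tools named above could be invoked roughly as follows. The snippet uses the `sqoop <tool>` spelling, which is equivalent to the `sqoop-<tool>` scripts. The JDBC URL and the SQL query are hypothetical, and `inspection_commands` is a helper invented for this sketch; it only builds the argument lists.

```python
import shlex

JDBC_URL = "jdbc:mysql://db.example.com/corp"  # hypothetical connection string

def inspection_commands(jdbc_url, query):
    """Return argv lists for the three inspection tools, keyed by tool name."""
    return {
        "list-databases": ["sqoop", "list-databases", "--connect", jdbc_url],
        "list-tables": ["sqoop", "list-tables", "--connect", jdbc_url],
        # eval runs a single SQL statement and prints the result.
        "eval": ["sqoop", "eval", "--connect", jdbc_url, "--query", query],
    }

commands = inspection_commands(JDBC_URL, "SELECT * FROM employees LIMIT 10")
for tool, argv in commands.items():
    print(tool, "->", shlex.join(argv))
```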

Most aspects of the import, code generation, and export processes can be customized. For databases, you can control the specific row range or columns imported. You can specify particular delimiters and escape characters for the file-based representation of the data, as well as the file format used. You can also control the class or package names used in generated code. Subsequent sections of this document explain how to specify these and other aspects.
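A hedged sketch of how these customizations map onto command-line flags: the helper `customization_flags` below is hypothetical and only collects optional arguments to append to an import command. The flags themselves (`--columns`, `--where`, `--fields-terminated-by`, `--escaped-by`, `--class-name`) are standard Sqoop 1 arguments; the column names, condition, and class name are placeholders.

```python
def customization_flags(columns=None, where=None, field_delimiter=None,
                        escape_char=None, class_name=None):
    """Collect optional Sqoop import flags; unset options are skipped."""
    flags = []
    if columns:
        flags += ["--columns", ",".join(columns)]   # restrict imported columns
    if where:
        flags += ["--where", where]                 # restrict imported rows
    if field_delimiter:
        flags += ["--fields-terminated-by", field_delimiter]
    if escape_char:
        flags += ["--escaped-by", escape_char]
    if class_name:
        flags += ["--class-name", class_name]       # name of the generated class
    return flags

# Placeholder values, for illustration only.
extra = customization_flags(
    columns=["id", "name"],
    where="id > 1000",
    field_delimiter="\t",
    class_name="com.example.EmployeeRecord",
)
print(extra)
```

Returning a plain list keeps the helper composable: the result can be appended to whatever base command is being built.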
