Scoop是用来实现HDFS文件系统和关系型数据库如MySQL之间数据传输和转换的工具。

从MySQL导出到HDFS可以通过--table, --columns and --where等设置数据抽出的条件。但是同时也只是自由sql语句(Free-form Query )的方式抽出数据。此时我们用--query加sql语句方式自由抽取数据。

1,必须制定目标文件的位置--target-dir

2,必须使用$CONDITIONS关键字,

3,你也可以选择使用--split-by分片(分区,结果分成多个小文件,请参考mapreduce分区)

我们主要讨论$CONDITIONS关键字的作用是什么。

1如果直接输出,这里面是空的条件

2,我们在执行log中发现被替换成了1=0

sqoop import   --connect jdbc:mysql://server74:3306/Server74   --username root  --password 123456  --target-dir /sqoopout2  --m 1 --delete-target-dir 
--query 'select id,name,deg from emp where id>1202 and $CONDITIONS'
[root@server72 sqoop]# sqoop import   --connect jdbc:mysql://server74:3306/Server74   --username root  --password 123456  --target-dir /sqoopout2  
--m 1 --delete-target-dir  --query 'select id,name,deg from emp where id>1202 and $CONDITIONS'
Warning: /usr/local/sqoop/../hbase does not exist! HBase imports will fail.
Please set $HBASE_HOME to the root of your HBase installation.
Warning: /usr/local/sqoop/../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /usr/local/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
17/11/10 13:42:14 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
17/11/10 13:42:14 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
17/11/10 13:42:16 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
17/11/10 13:42:16 INFO tool.CodeGenTool: Beginning code generation
17/11/10 13:42:18 INFO manager.SqlManager: Executing SQL statement: select id,name,deg from emp where id>1202 and  (1 = 0)
17/11/10 13:42:18 INFO manager.SqlManager: Executing SQL statement: select id,name,deg from emp where id>1202 and  (1 = 0)
17/11/10 13:42:18 INFO manager.SqlManager: Executing SQL statement: select id,name,deg from emp where id>1202 and  (1 = 0)
17/11/10 13:42:18 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/local/hadoop
Note: /tmp/sqoop-root/compile/ac7745794cf5f0bf5859e7e8369a8c5f/QueryResult.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
17/11/10 13:42:31 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-root/compile/ac7745794cf5f0bf5859e7e8369a8c5f/QueryResult.jar
17/11/10 13:42:33 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/11/10 13:42:41 INFO tool.ImportTool: Destination directory /sqoopout2 deleted.
17/11/10 13:42:41 INFO mapreduce.ImportJobBase: Beginning query import.
17/11/10 13:42:41 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
17/11/10 13:42:41 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
17/11/10 13:42:43 INFO client.RMProxy: Connecting to ResourceManager at server71/192.168.32.71:8032
17/11/10 13:42:58 INFO db.DBInputFormat: Using read commited transaction isolation
17/11/10 13:42:58 INFO mapreduce.JobSubmitter: number of splits:1
17/11/10 13:43:00 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1510279795921_0011
17/11/10 13:43:03 INFO impl.YarnClientImpl: Submitted application application_1510279795921_0011
17/11/10 13:43:04 INFO mapreduce.Job: The url to track the job: http://server71:8088/proxy/application_1510279795921_0011/
17/11/10 13:43:04 INFO mapreduce.Job: Running job: job_1510279795921_0011
17/11/10 13:44:01 INFO mapreduce.Job: Job job_1510279795921_0011 running in uber mode : false
17/11/10 13:44:01 INFO mapreduce.Job:  map 0% reduce 0%
17/11/10 13:44:58 INFO mapreduce.Job:  map 100% reduce 0%
17/11/10 13:45:00 INFO mapreduce.Job: Job job_1510279795921_0011 completed successfully
17/11/10 13:45:01 INFO mapreduce.Job: Counters: 30
    File System Counters
        FILE: Number of bytes read=0
        FILE: Number of bytes written=124473
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=87
        HDFS: Number of bytes written=61
        HDFS: Number of read operations=4
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=2
    Job Counters
        Launched map tasks=1
        Other local map tasks=1
        Total time spent by all maps in occupied slots (ms)=45099
        Total time spent by all reduces in occupied slots (ms)=0
        Total time spent by all map tasks (ms)=45099
        Total vcore-milliseconds taken by all map tasks=45099
        Total megabyte-milliseconds taken by all map tasks=46181376
    Map-Reduce Framework
        Map input records=3
        Map output records=3
        Input split bytes=87
        Spilled Records=0
        Failed Shuffles=0
        Merged Map outputs=0
        GC time elapsed (ms)=370
        CPU time spent (ms)=6380
        Physical memory (bytes) snapshot=106733568
        Virtual memory (bytes) snapshot=842854400
        Total committed heap usage (bytes)=16982016
    File Input Format Counters
        Bytes Read=0
    File Output Format Counters
        Bytes Written=61
17/11/10 13:45:01 INFO mapreduce.ImportJobBase: Transferred 61 bytes in 139.3429 seconds (0.4378 bytes/sec)
17/11/10 13:45:01 INFO mapreduce.ImportJobBase: Retrieved 3 records. 输出结果查看,发现1202以上的数据被正常抽出
[root@server72 sqoop]# hdfs dfs -cat /sqoopout2/part-m-00000
17/11/10 13:48:48 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
1203,khalil,php dev
1204,prasanth,php dev
1205,kranthi,admin

通过以上过程,我们得知一点:$CONTITONS是linux系统的变量,在执行过程中被赋值为(1=0),虽然实际执行的这个sql很奇怪。

现在正式开始研究CONTITONS到底是什么,所以我们先查看官方文档。

If you want to import the results of a query in parallel, then each map task will need to execute a copy of the query, with results partitioned by bounding conditions inferred by Sqoop. Your query must include the token $CONDITIONS which each Sqoop process will replace with a unique condition expression. You must also select a splitting column with --split-by.

如果你想通过并行的方式导入结果,每个map task需要执行sql查询语句的副本,结果会根据sqoop推测的边界条件分区。query必须包含$CONDITIONS。这样每个scoop程序都会被替换为一个独立的条件。同时你必须指定--split-by.分区

For example:

$ sqoop import \
--query 'SELECT a.*, b.* FROM a JOIN b on (a.id == b.id) WHERE $CONDITIONS' \
--split-by a.id --target-dir /user/foo/joinresults

直接理解可能有点困难,我先修改一些条件,大家观察joblog的区别。

sqoop import   --connect jdbc:mysql://server74:3306/Server74   --username root  --password 123456  --target-dir /sqoopout2

      --m 2 --delete-target-dir  --query 'select id,name,deg from emp where id>1202 and $CONDITIONS'

      --split-by id

我按照要求添加了--split-by id 分区,并设置map task数量为2

[root@server72 sqoop]# sqoop import   --connect jdbc:mysql://server74:3306/Server74   --username root 
--password 123456 --target-dir /sqoopout2 --m 2 --delete-target-dir --query 'select id,name,deg from emp where id>1202 and $CONDITIONS' --split-by id
Warning: /usr/local/sqoop/../hbase does not exist! HBase imports will fail.
Please set $HBASE_HOME to the root of your HBase installation.
Warning: /usr/local/sqoop/../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /usr/local/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
// :: INFO sqoop.Sqoop: Running Sqoop version: 1.4.
// :: WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
// :: INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
// :: INFO tool.CodeGenTool: Beginning code generation
// :: INFO manager.SqlManager: Executing SQL statement: select id,name,deg from emp where id> and ( = )
// :: INFO manager.SqlManager: Executing SQL statement: select id,name,deg from emp where id> and ( = )
// :: INFO manager.SqlManager: Executing SQL statement: select id,name,deg from emp where id> and ( = )
// :: INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/local/hadoop
Note: /tmp/sqoop-root/compile/1024341fa58082466565e5bd648cb10e/QueryResult.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
// :: INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-root/compile/1024341fa58082466565e5bd648cb10e/QueryResult.jar
// :: WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
// :: INFO tool.ImportTool: Destination directory /sqoopout2 deleted.
// :: INFO mapreduce.ImportJobBase: Beginning query import.
// :: INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
// :: INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
// :: INFO client.RMProxy: Connecting to ResourceManager at server71/192.168.32.71:
// :: INFO db.DBInputFormat: Using read commited transaction isolation
// :: INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT MIN(id), MAX(id) FROM (select id,name,deg from emp where id>1202 and (1 = 1) ) AS t1
// :: INFO mapreduce.JobSubmitter: number of splits:
// :: INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1510279795921_0012
// :: INFO impl.YarnClientImpl: Submitted application application_1510279795921_0012
// :: INFO mapreduce.Job: The url to track the job: http://server71:8088/proxy/application_1510279795921_0012/
// :: INFO mapreduce.Job: Running job: job_1510279795921_0012
// :: INFO mapreduce.Job: Job job_1510279795921_0012 running in uber mode : false
// :: INFO mapreduce.Job: map % reduce %
// :: INFO mapreduce.Job: map % reduce %
// :: INFO mapreduce.Job: map % reduce %
// :: INFO mapreduce.Job: map % reduce %
// :: INFO mapreduce.Job: Job job_1510279795921_0012 completed successfully
// :: INFO mapreduce.Job: Counters:
File System Counters
FILE: Number of bytes read=
FILE: Number of bytes written=
FILE: Number of read operations=
FILE: Number of large read operations=
FILE: Number of write operations=
HDFS: Number of bytes read=
HDFS: Number of bytes written=
HDFS: Number of read operations=
HDFS: Number of large read operations=
HDFS: Number of write operations=
Job Counters
Killed map tasks=
Launched map tasks=
Other local map tasks=
Total time spent by all maps in occupied slots (ms)=
Total time spent by all reduces in occupied slots (ms)=
Total time spent by all map tasks (ms)=
Total vcore-milliseconds taken by all map tasks=
Total megabyte-milliseconds taken by all map tasks=
Map-Reduce Framework
Map input records=
Map output records=
Input split bytes=
Spilled Records=
Failed Shuffles=
Merged Map outputs=
GC time elapsed (ms)=
CPU time spent (ms)=
Physical memory (bytes) snapshot=
Virtual memory (bytes) snapshot=
Total committed heap usage (bytes)=
File Input Format Counters
Bytes Read=
File Output Format Counters
Bytes Written=
												

Sqoop--Free-form Query Imports 自由查询模式下$CONDITIONS关键字的作用的更多相关文章

  1. Query Object--查询对象模式(下)

    回顾 上一篇对模式进行了介绍,并基于ADO.NET进行了实现,虽然现在ORM框架越来越流行,但是很多中小型的公司仍然是使用ADO.NET来进行数据库操作的,随着项目的需求不断增加,业务不断变化,ADO ...

  2. Query Object--查询对象模式(上)

    回顾 上两篇文章主要讲解了我对于数据层的Unit Of Work(工作单元模式)的理解,其中包括了CUD的操作,那么今天就来谈谈R吧,文章包括以下几点: 什么是Query Object 基于SQL的实 ...

  3. Oracle ADF VO排序及VO的查询模式

    常规应用中,当需要使用Table向终端用户展示数据时,Table中数据的显示排序一致性极大程度的影响到了客户体验.通常希望诸如多次查询结果显示顺序相同.插入数据在原数据上方等的实现. ADF为开发人员 ...

  4. 重构改善既有代码设计--重构手法04:Replace Temp with Query (以查询取代临时变量)

    所谓的以查询取代临时变量:就是当你的程序以一个临时变量保存某一个表达式的运算效果.将这个表达式提炼到一个独立函数中.将这个临时变量的所有引用点替换为对新函数的调用.此后,新函数就可以被其他函数调用. ...

  5. iPhone CSS media query(媒体查询)

    iPhone5  iPhone6  iPhone6Plus iPad设备 media query(媒体查询)代码. iPhone < 5: @media screen and (device-a ...

  6. Elasticsearch(入门篇)——Query DSL与查询行为

    ES提供了丰富多彩的查询接口,可以满足各种各样的查询要求.更多内容请参考:ELK修炼之道 Query DSL结构化查询 Query DSL是一个Java开源框架用于构建类型安全的SQL查询语句.采用A ...

  7. Spring Boot 整合 Elasticsearch,实现 function score query 权重分查询

    摘要: 原创出处 www.bysocket.com 「泥瓦匠BYSocket 」欢迎转载,保留摘要,谢谢! 『 预见未来最好的方式就是亲手创造未来 – <史蒂夫·乔布斯传> 』 运行环境: ...

  8. 重构手法之Replace Temp with Query(以查询取代临时变量)

    返回总目录 6.4Replace Temp with Query(以查询取代临时变量) 概要 你的程序以一个临时变量保存某一表达式的运算结果. 将这个表达式提炼到一个独立函数中.将这个临时变量的所有引 ...

  9. GIS-010-ArcGIS JS 三种查询模式(转)

    QueryTask.FindTask.IdentifyTask都是继承自ESRI.ArcGIS.Client.Tasks: 1.QueryTask:是一个进行空间和属性查询的功能类,它可以在某个地图服 ...

随机推荐

  1. 无法处理文件 MainForm.resx,因为它位于 Internet 或受限区域中,或者文件上具有 Web 标记。要想处理这些文件,请删除 Web 标记

    无法处理文件 MainForm.resx,因为它位于 Internet 或受限区域中,或者文件上具有 Web 标记.要想处理这些文件,请删除 Web 标记 问题: 由于文件锁定,VS不能正常读取. 解 ...

  2. python中编写带参数decorator

    考察上一节的 @log 装饰器: def log(f): def fn(x): print 'call ' + f.__name__ + '()...' return f(x) return fn 发 ...

  3. transition失效问题

    关于transition,css教程中有一个很简单的例子: <!DOCTYPE html> <html> <head> <meta charset=" ...

  4. Introduction to vSphere Integrated Containers

    vSphere Integrated Containers enables IT teams to seamlessly run traditional workloads and container ...

  5. unbunto关闭触摸屏

    sudo rmmod psmouse 这个是禁用的 sudo modprobe psmouse 这个是启用的

  6. Struts2 hibernate spring 概念总结

    Hibernate工作原理及为什么要用? 原理:1.通过Configuration().configure();读取并解析hibernate.cfg.xml配置文件2.由hibernate.cfg.x ...

  7. mysql错误总结-ERROR 1067 (42000): Invalid default value for TIMESTAMP

    1. ERROR 1067 (42000): Invalid default value for 'FAILD_TIME'   (对TIMESTAMP  类型的子段如果不设置缺省值或没有标志not n ...

  8. 去除带有iframe页面中的2个滚动条[转]

    方法一:加载frame时修改高度 <div>    <iframe id="frame_content" name="frame_content&quo ...

  9. Linux 解压压缩命令

    一.概述: 1.压缩命令: 命令格式:tar  -zcvf   压缩文件名.tar.gz   被压缩文件名 可先切换到当前目录下.压缩文件名和被压缩文件名都可加入路径. 2.解压缩命令: 命令格式:t ...

  10. Myeclipse快捷键(二)

    Ctrl+1 快速修复(最经典的快捷键,就不用多说了) Ctrl+D: 删除当前行Ctrl+Alt+↓ 复制当前行到下一行(复制增加) Ctrl+Alt+↑ 复制当前行到上一行(复制增加) Alt+↓ ...