Sqoop is a tool for transferring and converting data between the HDFS file system and relational databases such as MySQL.

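When importing from MySQL into HDFS, you can set the extraction conditions with `--table`, `--columns`, and `--where`. Alternatively, you can extract data with a free-form SQL statement (Free-form Query) by passing the statement to `--query`.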

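1. You must specify the location of the output files with `--target-dir`.

2. The query must contain the `$CONDITIONS` token.

3. You may optionally use `--split-by` to split the import (the result is written as several smaller files; compare partitioning in MapReduce).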
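
Requirement 2 has a shell-level catch: the token has to reach Sqoop literally. The following is a minimal sketch, assuming a POSIX shell in which no variable named `CONDITIONS` is set, of how the quoting style changes what Sqoop would receive. The variable names `single`, `double`, and `escaped` are only illustrative.

```shell
#!/bin/sh
# Assumption: CONDITIONS is not defined in the calling shell.
unset CONDITIONS

# Single quotes: the shell passes the token through untouched.
single='select id,name,deg from emp where id>1202 and $CONDITIONS'

# Double quotes: the shell expands $CONDITIONS to an empty string.
double="select id,name,deg from emp where id>1202 and $CONDITIONS"

# Double quotes with a backslash: the token survives again.
escaped="select id,name,deg from emp where id>1202 and \$CONDITIONS"

printf 'single : %s\n' "$single"
printf 'double : %s\n' "$double"
printf 'escaped: %s\n' "$escaped"
```

Only the first and third forms still contain the token, which is why the commands in this post wrap the query in single quotes.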

The main question here is what the `$CONDITIONS` token actually does.

1. If you echo `$CONDITIONS` directly in the shell, it is an empty condition.

2. In the execution log, it has been replaced with `1=0`.

sqoop import --connect jdbc:mysql://server74:3306/Server74 --username root --password 123456 \
  --target-dir /sqoopout2 --m 1 --delete-target-dir \
  --query 'select id,name,deg from emp where id>1202 and $CONDITIONS'

[root@server72 sqoop]# sqoop import   --connect jdbc:mysql://server74:3306/Server74   --username root  --password 123456  --target-dir /sqoopout2  
--m 1 --delete-target-dir  --query 'select id,name,deg from emp where id>1202 and $CONDITIONS' 
Warning: /usr/local/sqoop/../hbase does not exist! HBase imports will fail.
Please set $HBASE_HOME to the root of your HBase installation.
Warning: /usr/local/sqoop/../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /usr/local/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
17/11/10 13:42:14 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
17/11/10 13:42:14 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
17/11/10 13:42:16 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
17/11/10 13:42:16 INFO tool.CodeGenTool: Beginning code generation
17/11/10 13:42:18 INFO manager.SqlManager: Executing SQL statement: select id,name,deg from emp where id>1202 and  (1 = 0) 
17/11/10 13:42:18 INFO manager.SqlManager: Executing SQL statement: select id,name,deg from emp where id>1202 and  (1 = 0) 
17/11/10 13:42:18 INFO manager.SqlManager: Executing SQL statement: select id,name,deg from emp where id>1202 and  (1 = 0) 
17/11/10 13:42:18 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/local/hadoop
Note: /tmp/sqoop-root/compile/ac7745794cf5f0bf5859e7e8369a8c5f/QueryResult.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
17/11/10 13:42:31 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-root/compile/ac7745794cf5f0bf5859e7e8369a8c5f/QueryResult.jar
17/11/10 13:42:33 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/11/10 13:42:41 INFO tool.ImportTool: Destination directory /sqoopout2 deleted.
17/11/10 13:42:41 INFO mapreduce.ImportJobBase: Beginning query import.
17/11/10 13:42:41 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
17/11/10 13:42:41 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
17/11/10 13:42:43 INFO client.RMProxy: Connecting to ResourceManager at server71/192.168.32.71:8032
17/11/10 13:42:58 INFO db.DBInputFormat: Using read commited transaction isolation
17/11/10 13:42:58 INFO mapreduce.JobSubmitter: number of splits:1
17/11/10 13:43:00 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1510279795921_0011
17/11/10 13:43:03 INFO impl.YarnClientImpl: Submitted application application_1510279795921_0011
17/11/10 13:43:04 INFO mapreduce.Job: The url to track the job: http://server71:8088/proxy/application_1510279795921_0011/
17/11/10 13:43:04 INFO mapreduce.Job: Running job: job_1510279795921_0011
17/11/10 13:44:01 INFO mapreduce.Job: Job job_1510279795921_0011 running in uber mode : false
17/11/10 13:44:01 INFO mapreduce.Job:  map 0% reduce 0%
17/11/10 13:44:58 INFO mapreduce.Job:  map 100% reduce 0%
17/11/10 13:45:00 INFO mapreduce.Job: Job job_1510279795921_0011 completed successfully
17/11/10 13:45:01 INFO mapreduce.Job: Counters: 30
    File System Counters
        FILE: Number of bytes read=0
        FILE: Number of bytes written=124473
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=87
        HDFS: Number of bytes written=61
        HDFS: Number of read operations=4
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=2
    Job Counters 
        Launched map tasks=1
        Other local map tasks=1
        Total time spent by all maps in occupied slots (ms)=45099
        Total time spent by all reduces in occupied slots (ms)=0
        Total time spent by all map tasks (ms)=45099
        Total vcore-milliseconds taken by all map tasks=45099
        Total megabyte-milliseconds taken by all map tasks=46181376
    Map-Reduce Framework
        Map input records=3
        Map output records=3
        Input split bytes=87
        Spilled Records=0
        Failed Shuffles=0
        Merged Map outputs=0
        GC time elapsed (ms)=370
        CPU time spent (ms)=6380
        Physical memory (bytes) snapshot=106733568
        Virtual memory (bytes) snapshot=842854400
        Total committed heap usage (bytes)=16982016
    File Input Format Counters 
        Bytes Read=0
    File Output Format Counters 
        Bytes Written=61
17/11/10 13:45:01 INFO mapreduce.ImportJobBase: Transferred 61 bytes in 139.3429 seconds (0.4378 bytes/sec)
17/11/10 13:45:01 INFO mapreduce.ImportJobBase: Retrieved 3 records.

Checking the output shows that the rows with id above 1202 were extracted correctly:
[root@server72 sqoop]# hdfs dfs -cat /sqoopout2/part-m-00000
17/11/10 13:48:48 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
1203,khalil,php dev
1204,prasanth,php dev
1205,kranthi,admin

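The run above tells us one thing: `$CONDITIONS` is not a Linux shell variable. The single quotes keep the shell from expanding it, and Sqoop itself replaces the token, here with `(1 = 0)`, even though the SQL that actually runs looks strange.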
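
The substitution seen in the log can be imitated with a few lines of shell. This is only a sketch of the text replacement, not Sqoop's code: `substitute_conditions` is a hypothetical helper, and because it relies on `sed` it assumes the condition contains no `/` or `&` characters.

```shell
#!/bin/sh
# Hypothetical helper: replace the literal token $CONDITIONS in a query
# with the given condition, the way the log suggests Sqoop does.
# Assumption: the condition contains no '/' or '&' (sed metacharacters).
substitute_conditions() {
  printf '%s\n' "$1" | sed 's/\$CONDITIONS/'"$2"'/g'
}

query='select id,name,deg from emp where id>1202 and $CONDITIONS'

# Condition seen during code generation in the log above.
metadata_sql=$(substitute_conditions "$query" '(1 = 0)')
printf '%s\n' "$metadata_sql"
```

Since `1 = 0` is never true, a statement built this way cannot return any rows; presumably Sqoop only needs the column metadata at that stage.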

Now let's look properly at what `$CONDITIONS` really is, starting with the official documentation.

If you want to import the results of a query in parallel, then each map task will need to execute a copy of the query, with results partitioned by bounding conditions inferred by Sqoop. Your query must include the token $CONDITIONS which each Sqoop process will replace with a unique condition expression. You must also select a splitting column with --split-by.

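In other words: to import the results of a query in parallel, every map task runs its own copy of the SQL query, and the results are partitioned by boundary conditions that Sqoop infers. The query must contain `$CONDITIONS`, which each Sqoop process replaces with its own unique condition expression. You must also name a splitting column with `--split-by`.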

For example:
$ sqoop import \
  --query 'SELECT a.*, b.* FROM a JOIN b on (a.id == b.id) WHERE $CONDITIONS' \
  --split-by a.id --target-dir /user/foo/joinresults

This may be hard to grasp from the text alone, so I will change a few options first; watch how the job log differs.

sqoop import --connect jdbc:mysql://server74:3306/Server74 --username root --password 123456 --target-dir /sqoopout2 \
  --m 2 --delete-target-dir --query 'select id,name,deg from emp where id>1202 and $CONDITIONS' \
  --split-by id

As required, I added `--split-by id` to split the import and set the number of map tasks to 2.

[root@server72 sqoop]# sqoop import   --connect jdbc:mysql://server74:3306/Server74   --username root 
 --password 123456  --target-dir /sqoopout2  --m 2 --delete-target-dir  --query 'select id,name,deg from emp where id>1202 and $CONDITIONS' --split-by id

Warning: /usr/local/sqoop/../hbase does not exist! HBase imports will fail.
Please set $HBASE_HOME to the root of your HBase installation.
Warning: /usr/local/sqoop/../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /usr/local/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
// :: INFO sqoop.Sqoop: Running Sqoop version: 1.4.
// :: WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
// :: INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
// :: INFO tool.CodeGenTool: Beginning code generation
// :: INFO manager.SqlManager: Executing SQL statement: select id,name,deg from emp where id>1202 and  (1 = 0)
// :: INFO manager.SqlManager: Executing SQL statement: select id,name,deg from emp where id>1202 and  (1 = 0)
// :: INFO manager.SqlManager: Executing SQL statement: select id,name,deg from emp where id>1202 and  (1 = 0)
// :: INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/local/hadoop
Note: /tmp/sqoop-root/compile/1024341fa58082466565e5bd648cb10e/QueryResult.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
// :: INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-root/compile/1024341fa58082466565e5bd648cb10e/QueryResult.jar
// :: WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
// :: INFO tool.ImportTool: Destination directory /sqoopout2 deleted.
// :: INFO mapreduce.ImportJobBase: Beginning query import.
// :: INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
// :: INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
// :: INFO client.RMProxy: Connecting to ResourceManager at server71/192.168.32.71:
// :: INFO db.DBInputFormat: Using read commited transaction isolation
// :: INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT MIN(id), MAX(id) FROM (select id,name,deg from emp where id>1202 and  (1 = 1) ) AS t1
// :: INFO mapreduce.JobSubmitter: number of splits:
// :: INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1510279795921_0012
// :: INFO impl.YarnClientImpl: Submitted application application_1510279795921_0012
// :: INFO mapreduce.Job: The url to track the job: http://server71:8088/proxy/application_1510279795921_0012/
// :: INFO mapreduce.Job: Running job: job_1510279795921_0012
// :: INFO mapreduce.Job: Job job_1510279795921_0012 running in uber mode : false
// :: INFO mapreduce.Job:  map % reduce %
// :: INFO mapreduce.Job:  map % reduce %
// :: INFO mapreduce.Job:  map % reduce %
// :: INFO mapreduce.Job:  map % reduce %
// :: INFO mapreduce.Job: Job job_1510279795921_0012 completed successfully

// :: INFO mapreduce.Job: Counters:
    File System Counters
        FILE: Number of bytes read=
        FILE: Number of bytes written=
        FILE: Number of read operations=
        FILE: Number of large read operations=
        FILE: Number of write operations=
        HDFS: Number of bytes read=
        HDFS: Number of bytes written=
        HDFS: Number of read operations=
        HDFS: Number of large read operations=
        HDFS: Number of write operations=
    Job Counters
        Killed map tasks=
        Launched map tasks=
        Other local map tasks=
        Total time spent by all maps in occupied slots (ms)=
        Total time spent by all reduces in occupied slots (ms)=
        Total time spent by all map tasks (ms)=
        Total vcore-milliseconds taken by all map tasks=
        Total megabyte-milliseconds taken by all map tasks=
    Map-Reduce Framework
        Map input records=
        Map output records=
        Input split bytes=
        Spilled Records=
        Failed Shuffles=
        Merged Map outputs=
        GC time elapsed (ms)=
        CPU time spent (ms)=
        Physical memory (bytes) snapshot=
        Virtual memory (bytes) snapshot=
        Total committed heap usage (bytes)=
    File Input Format Counters
        Bytes Read=
    File Output Format Counters
        Bytes Written=
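
The `BoundingValsQuery` line in the log shows the token replaced by `(1 = 1)` in order to find the minimum and maximum of the split column. The following is a rough sketch of how two map tasks could then receive different conditions, assuming an even split of an integer range. `split_conditions` is a hypothetical helper, the bounds 1203 and 1205 are taken from the rows shown earlier, and Sqoop's real splitter may place the boundaries differently.

```shell
#!/bin/sh
# Hypothetical helper: print one WHERE condition per map task for an
# integer split column. Simplified; not Sqoop's actual splitter.
# usage: split_conditions COLUMN MIN MAX NUM_TASKS
split_conditions() {
  col=$1 min=$2 max=$3 tasks=$4
  # size of each range, rounded up so the ranges cover min..max
  step=$(( (max - min + tasks) / tasks ))
  lo=$min
  i=1
  while [ "$i" -le "$tasks" ] && [ "$lo" -le "$max" ]; do
    hi=$(( lo + step ))
    if [ "$i" -eq "$tasks" ] || [ "$hi" -gt "$max" ]; then
      # the last range includes the upper bound
      printf '%s >= %d AND %s <= %d\n' "$col" "$lo" "$col" "$max"
      break
    fi
    printf '%s >= %d AND %s < %d\n' "$col" "$lo" "$col" "$hi"
    lo=$hi
    i=$(( i + 1 ))
  done
}

conditions=$(split_conditions id 1203 1205 2)
printf '%s\n' "$conditions"
```

Each printed line would take the place of `$CONDITIONS` in one map task's copy of the query, so the tasks read disjoint slices of the result.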
