Hadoop: Apache Sqoop where clause not working when importing with Sqoop


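Can anyone explain the output of this command? The departments table has 6 rows by default (department_id 2 to 7). I then added 2 new records (department_id 8 and 9) to the MySQL table retail_db.departments. What I am trying to do is use the --where argument to select only the newly added records and append them to the existing HDFS directory for departments. However, when I run the command below, it creates a new file, part-m-00006 (the original 6 records had been split across part-m-00000 to part-m-00005), and all records from department_id 2 to 9, including the 2 new ones, are written to that file. As you can see in the output below, the result contains duplicate records.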

I don't understand why the where clause is not being honored:

sqoop import \
--connect "jdbc:mysql://quickstart.cloudera:3306/retail_db" \
--username retail_dba \
--password cloudera \
--query "Select * from departments where \$CONDITIONS" \
--where "department_id > 7" \
--append \
-m 1 \
--target-dir /user/cloudera/sqoop_import/departments

Output:
-----------------------------------------
[cloudera@quickstart ~]$ hdfs dfs -cat /user/cloudera/sqoop_import/departments/part*
2,Fitness
3,Footwear
4,Apparel
5,Golf
6,Outdoors
7,Fan Shop
2,Fitness
3,Footwear
4,Apparel
5,Golf
6,Outdoors
7,Fan Shop
8,Sports
9,Jewellery
-----------------------------------------

Logs generated:
-----------------------------------------
Warning: /usr/lib/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
16/10/23 12:23:30 INFO sqoop.Sqoop: Running Sqoop version: 1.4.5-cdh5.4.0
16/10/23 12:23:30 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
16/10/23 12:23:31 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
16/10/23 12:23:31 INFO tool.CodeGenTool: Beginning code generation
16/10/23 12:23:31 INFO manager.SqlManager: Executing SQL statement: Select * from departments where (1 = 0)
16/10/23 12:23:31 INFO manager.SqlManager: Executing SQL statement: Select * from departments where (1 = 0)
16/10/23 12:23:31 INFO manager.SqlManager: Executing SQL statement: Select * from departments where (1 = 0)
16/10/23 12:23:31 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/lib/hadoop-mapreduce
Note: /tmp/sqoop-cloudera/compile/b704a6e6d921fb544ba25c6343b18a36/QueryResult.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
16/10/23 12:23:33 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-cloudera/compile/b704a6e6d921fb544ba25c6343b18a36/QueryResult.jar
16/10/23 12:23:33 INFO mapreduce.ImportJobBase: Beginning query import.
16/10/23 12:23:34 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
16/10/23 12:23:35 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
16/10/23 12:23:36 INFO client.RMProxy: Connecting to ResourceManager at quickstart.cloudera/127.0.0.1:8032
16/10/23 12:23:38 INFO db.DBInputFormat: Using read commited transaction isolation
16/10/23 12:23:38 INFO mapreduce.JobSubmitter: number of splits:1
16/10/23 12:23:39 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1477192024680_0012
16/10/23 12:23:40 INFO impl.YarnClientImpl: Submitted application application_1477192024680_0012
16/10/23 12:23:40 INFO mapreduce.Job: The url to track the job: http://quickstart.cloudera:8088/proxy/application_1477192024680_0012/
16/10/23 12:23:40 INFO mapreduce.Job: Running job: job_1477192024680_0012
16/10/23 12:23:56 INFO mapreduce.Job: Job job_1477192024680_0012 running in uber mode : false
16/10/23 12:23:56 INFO mapreduce.Job: map 0% reduce 0%
16/10/23 12:24:25 INFO mapreduce.Job: map 100% reduce 0%
16/10/23 12:24:26 INFO mapreduce.Job: Job job_1477192024680_0012 completed successfully
16/10/23 12:24:27 INFO mapreduce.Job: Counters: 30
You are using both --query and --where. That is why Sqoop does not honor the --where flag.

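--query is a superset of --where: a free-form query can already express any condition, so Sqoop does not merge a separate --where into it. That is why the statement in the log contains only the substituted $CONDITIONS placeholder and no trace of department_id > 7: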

INFO manager.SqlManager: Executing SQL statement: Select * from departments where (1 = 0)
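A rough way to see this: the sketch below mimics only the substitution visible in that log line, where the literal $CONDITIONS token becomes (1 = 0) and nothing else in the query changes. It is a simplified illustration under that assumption, not Sqoop's implementation, and the function name metadata_query is hypothetical.

```shell
# Simplified illustration, NOT Sqoop's code: reproduce the metadata statement
# seen in the log by replacing the literal token $CONDITIONS with (1 = 0).
# In this model nothing else is added, so a separate --where value never appears.
metadata_query() {
  # In a sed BRE, \$ matches a literal dollar sign.
  printf '%s\n' "$1" | sed 's/\$CONDITIONS/(1 = 0)/g'
}

# The query from the question; single quotes keep $CONDITIONS literal here.
metadata_query 'Select * from departments where $CONDITIONS'
```

Fed the query from the question, it reproduces the logged statement. With the filter written into the query itself, the same substitution gives Select * from departments where department_id > 7 and (1 = 0), so the condition travels with the query.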
Use one of the following:

  • --query "Select * from departments where department_id > 7 and \$CONDITIONS"

  • --where "department_id > 7" (together with --table departments in place of --query)
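A minimal sketch of both options as complete commands, assuming the connection string, credentials and target directory from the question. Each command is wrapped in a function with an illustrative name (import_new_departments_query, import_new_departments_where) so the file can be sourced without launching a job.

```shell
# Assumes the quickstart VM settings from the question
# (host, database, credentials, target directory).

# Option 1: free-form query. The filter sits inside --query next to $CONDITIONS,
# and there is no --where flag, since it would be ignored in this mode.
import_new_departments_query() {
  sqoop import \
    --connect "jdbc:mysql://quickstart.cloudera:3306/retail_db" \
    --username retail_dba \
    --password cloudera \
    --query "Select * from departments where department_id > 7 and \$CONDITIONS" \
    --append \
    -m 1 \
    --target-dir /user/cloudera/sqoop_import/departments
}

# Option 2: table import. Here --where is applied to the rows read from --table.
import_new_departments_where() {
  sqoop import \
    --connect "jdbc:mysql://quickstart.cloudera:3306/retail_db" \
    --username retail_dba \
    --password cloudera \
    --table departments \
    --where "department_id > 7" \
    --append \
    -m 1 \
    --target-dir /user/cloudera/sqoop_import/departments
}
```

After sourcing, call either function; the new part file should then hold only the rows with department_id greater than 7 (8 and 9 in the question). As the log warns, -P can replace --password so the password is not put on the command line.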

You asked a similar question: