Hadoop: a simple WHERE condition does not show the expected output in Hive
Tags: hadoop, amazon-s3, hive

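To get the hang of Hive, I uploaded census data ("income data of people from different countries working in the US") to an S3 bucket.

I am able to run other queries, but not the following simple one.

I am trying to list people from different countries whose income level is >50K.

I created the table in Hive and imported the data from the AWS S3 bucket. The income column is defined as string, and its possible values are ">50K" and "<=50K" (see the sample data below).

The following query returns an empty result set. What is wrong here? This SQL statement runs fine on a normal MySQL console, so why does it not show the expected result set in Hive?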

hive> select country, income from census_income_data where income = '>50K';
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_201312281227_0011, Tracking URL = http://ip-172-31-44-80.us-west-2.compute.internal:9100/jobdetails.jsp?jobid=job_201312281227_0011
Kill Command = /home/hadoop/bin/hadoop job  -kill job_201312281227_0011
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2013-12-28 13:21:05,086 Stage-1 map = 0%,  reduce = 0%
2013-12-28 13:21:26,279 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 7.74 sec
2013-12-28 13:21:27,289 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 7.74 sec
2013-12-28 13:21:28,299 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 7.74 sec
2013-12-28 13:21:29,310 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 7.74 sec
2013-12-28 13:21:30,321 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 7.74 sec
2013-12-28 13:21:31,334 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 7.74 sec
2013-12-28 13:21:32,369 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 7.74 sec
MapReduce Total cumulative CPU time: 7 seconds 740 msec
Ended Job = job_201312281227_0011
Counters:
MapReduce Jobs Launched: 
Job 0: Map: 1   Cumulative CPU: 7.74 sec   HDFS Read: 219 HDFS Write: 0 SUCCESS
Total MapReduce CPU Time Spent: 7 seconds 740 msec
OK
Time taken: 56.559 seconds
Below is sample data from the dataset used in the query above:

30, State-gov, 141297, Bachelors, 13, Married-civ-spouse, Prof-specialty, Husband, Asian-Pac-Islander, Male, 0, 0, 40, India, >50K
23, Private, 122272, Bachelors, 13, Never-married, Adm-clerical, Own-child, White, Female, 0, 0, 30, United-States, <=50K
32, Private, 205019, Assoc-acdm, 12, Never-married, Sales, Not-in-family, Black, Male, 0, 0, 50, United-States, <=50K
40, Private, 121772, Assoc-voc, 11, Married-civ-spouse, Craft-repair, Husband, Asian-Pac-Islander, Male, 0, 0, 40, ?, >50K
34, Private, 245487, 7th-8th, 4, Married-civ-spouse, Transport-moving, Husband, Amer-Indian-Eskimo, Male, 0, 0, 45, Mexico, <=50K
25, Self-emp-not-inc, 176756, HS-grad, 9, Never-married, Farming-fishing, Own-child, White, Male, 0, 0, 35, United-States, <=50K
32, Private, 186824, HS-grad, 9, Never-married, Machine-op-inspct, Unmarried, White, Male, 0, 0, 40, United-States, <=50K
38, Private, 28887, 11th, 7, Married-civ-spouse, Sales, Husband, White, Male, 0, 0, 50, United-States, <=50K
43, Self-emp-not-inc, 292175, Masters, 14, Divorced, Exec-managerial, Unmarried, White, Female, 0, 0, 45, United-States, >50K
40, Private, 193524, Doctorate, 16, Married-civ-spouse, Prof-specialty, Husband, White, Male, 0, 0, 60, United-States, >50K
54, Private, 302146, HS-grad, 9, Separated, Other-service, Unmarried, Black, Female, 0, 0, 20, United-States, <=50K
35, Federal-gov, 76845, 9th, 5, Married-civ-spouse, Farming-fishing, Husband, Black, Male, 0, 0, 40, United-States, <=50K
43, Private, 117037, 11th, 7, Married-civ-spouse, Transport-moving, Husband, White, Male, 0, 2042, 40, United-States, <=50K
59, Private, 109015, HS-grad, 9, Divorced, Tech-support, Unmarried, White, Female, 0, 0, 40, United-States, <=50K
56, Local-gov, 216851, Bachelors, 13, Married-civ-spouse, Tech-support, Husband, White, Male, 0, 0, 40, United-States, >50K
19, Private, 168294, HS-grad, 9, Never-married, Craft-repair, Own-child, White, Male, 0, 0, 40, United-States, <=50K
54, ?, 180211, Some-college, 10, Married-civ-spouse, ?, Husband, Asian-Pac-Islander, Male, 0, 0, 60, South, >50K
39, Private, 367260, HS-grad, 9, Divorced, Exec-managerial, Not-in-family, White, Male, 0, 0, 80, United-States, <=50K
49, Private, 193366, HS-grad, 9, Married-civ-spouse, Craft-repair, Husband, White, Male, 0, 0, 40, United-States, <=50K
23, Local-gov, 190709, Assoc-acdm, 12, Never-married, Protective-serv, Not-in-family, White, Male, 0, 0, 52, United-States, <=50K

Your SQL code

select country, income from census_income_data where income = '>50K';
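compares two strings with the "=" operator. As far as I know, that operator takes the character set, surrounding whitespace, and so on into account. You may have better luck with the "LIKE" operator: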

select country, income from census_income_data where income LIKE ">50K";
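
One way the "surrounding whitespace" case could arise with this data, as a minimal sketch: the sample rows in the question separate fields with a comma followed by a space. The table definition is not shown, so the bare-comma delimiter used below is an assumption, and Python only stands in for what Hive would do when splitting a row.

```python
# A sample row copied from the question; note the ", " (comma + space) separators.
row = (
    "30, State-gov, 141297, Bachelors, 13, Married-civ-spouse, Prof-specialty, "
    "Husband, Asian-Pac-Islander, Male, 0, 0, 40, India, >50K"
)

# Assumption: the table splits fields on a bare comma (the DDL is not in the question).
fields = row.split(",")
income = fields[-1]

print(repr(income))              # ' >50K'  (the leading space is part of the value)
print(income == ">50K")          # False: an exact comparison fails
print(income.strip() == ">50K")  # True: it succeeds once the padding is trimmed
```

If the table really is delimited this way, every column after the first would carry the same leading space, which would explain why an exact match on '>50K' finds nothing.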

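First, run
select * from census_income_data limit 20;
on your table to verify that the expected values are in the expected columns.
There may be other characters (such as whitespace) that cause the query to return 0 results.
Try the following:
select country, income from census_income_data where income like '%50%';
If that does not work, the data may have ended up in the wrong columns when the table was created.
If it works, try:
select country, income from census_income_data where income like '%>50K%';
If that works, the field probably contains extra characters. Try running:
select concat('INCOME:', income, '.') from census_income_data where income like '%>50K%';

and see whether you get exactly this string:
INCOME:>50K.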
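
The diagnostic steps above can be tried out on a small scale. The sketch below is not Hive: it uses Python's built-in sqlite3 module as a stand-in, loads two rows copied from the question, and assumes (the table definition is not shown) that fields were split on a bare comma. SQLite's `||` operator plays the role of concat() here.

```python
import sqlite3

# Rows copied from the question's sample data.
SAMPLE = [
    "30, State-gov, 141297, Bachelors, 13, Married-civ-spouse, Prof-specialty, "
    "Husband, Asian-Pac-Islander, Male, 0, 0, 40, India, >50K",
    "23, Private, 122272, Bachelors, 13, Never-married, Adm-clerical, "
    "Own-child, White, Female, 0, 0, 30, United-States, <=50K",
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE census_income_data (country TEXT, income TEXT)")
for line in SAMPLE:
    fields = line.split(",")  # assumption: fields are split on a bare comma
    conn.execute(
        "INSERT INTO census_income_data VALUES (?, ?)",
        (fields[-2], fields[-1]),
    )


def run(sql):
    """Run a query against the in-memory table and return all rows."""
    return conn.execute(sql).fetchall()


# The original query: exact match on the string.
exact = run("SELECT country, income FROM census_income_data WHERE income = '>50K'")
# The wildcard probe from the answer.
wildcard = run("SELECT country, income FROM census_income_data WHERE income LIKE '%>50K%'")
# The marker probe: wrap the value so stray characters become visible.
marked = run(
    "SELECT 'INCOME:' || income || '.' FROM census_income_data "
    "WHERE income LIKE '%>50K%'"
)
# A possible fix: trim before comparing.
trimmed = run(
    "SELECT trim(country), trim(income) FROM census_income_data "
    "WHERE trim(income) = '>50K'"
)

print(exact)     # []
print(wildcard)  # [(' India', ' >50K')]
print(marked)    # [('INCOME: >50K.',)]
print(trimmed)   # [('India', '>50K')]
```

If the marker query in Hive likewise shows a space between "INCOME:" and ">50K", comparing with trim(income) = '>50K' (Hive also provides trim()) or cleaning the delimiter in the data would be the things to try next.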

I tried the statements above, but the result is still the same: an empty result set :(