Hadoop: Simple WHERE condition does not show the expected output in Hive


hadoop amazon-s3 hive


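To get hands-on experience with Hive, I uploaded census data ("income data of people from different countries working in the US") to an S3 bucket.

I am able to run other queries, but not the following simple one.

I am trying to list the people from different countries whose income level is above 50K.

I created the table in Hive and imported the data from the AWS S3 bucket. The income column is defined as string, and its possible values are "<=50K" and ">50K".

The following query returns an empty result set. What is wrong here? This SQL statement runs fine in a normal MySQL console, so why does Hive not show the expected result set?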

hive> select country, income from census_income_data where income = '>50K';
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_201312281227_0011, Tracking URL = http://ip-172-31-44-80.us-west-2.compute.internal:9100/jobdetails.jsp?jobid=job_201312281227_0011
Kill Command = /home/hadoop/bin/hadoop job -kill job_201312281227_0011
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2013-12-28 13:21:05,086 Stage-1 map = 0%, reduce = 0%
2013-12-28 13:21:26,279 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 7.74 sec
2013-12-28 13:21:27,289 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 7.74 sec
2013-12-28 13:21:28,299 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 7.74 sec
2013-12-28 13:21:29,310 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 7.74 sec
2013-12-28 13:21:30,321 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 7.74 sec
2013-12-28 13:21:31,334 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 7.74 sec
2013-12-28 13:21:32,369 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 7.74 sec
MapReduce Total cumulative CPU time: 7 seconds 740 msec
Ended Job = job_201312281227_0011
Counters:
MapReduce Jobs Launched:
Job 0: Map: 1 Cumulative CPU: 7.74 sec HDFS Read: 219 HDFS Write: 0 SUCCESS
Total MapReduce CPU Time Spent: 7 seconds 740 msec
OK
Time taken: 56.559 seconds

Below is sample data from the dataset used in the query above:

30, State-gov, 141297, Bachelors, 13, Married-civ-spouse, Prof-specialty, Husband, Asian-Pac-Islander, Male, 0, 0, 40, India, >50K
23, Private, 122272, Bachelors, 13, Never-married, Adm-clerical, Own-child, White, Female, 0, 0, 30, United-States, <=50K
32, Private, 205019, Assoc-acdm, 12, Never-married, Sales, Not-in-family, Black, Male, 0, 0, 50, United-States, <=50K
40, Private, 121772, Assoc-voc, 11, Married-civ-spouse, Craft-repair, Husband, Asian-Pac-Islander, Male, 0, 0, 40, ?, >50K
34, Private, 245487, 7th-8th, 4, Married-civ-spouse, Transport-moving, Husband, Amer-Indian-Eskimo, Male, 0, 0, 45, Mexico, <=50K
25, Self-emp-not-inc, 176756, HS-grad, 9, Never-married, Farming-fishing, Own-child, White, Male, 0, 0, 35, United-States, <=50K
32, Private, 186824, HS-grad, 9, Never-married, Machine-op-inspct, Unmarried, White, Male, 0, 0, 40, United-States, <=50K
38, Private, 28887, 11th, 7, Married-civ-spouse, Sales, Husband, White, Male, 0, 0, 50, United-States, <=50K
43, Self-emp-not-inc, 292175, Masters, 14, Divorced, Exec-managerial, Unmarried, White, Female, 0, 0, 45, United-States, >50K
40, Private, 193524, Doctorate, 16, Married-civ-spouse, Prof-specialty, Husband, White, Male, 0, 0, 60, United-States, >50K
54, Private, 302146, HS-grad, 9, Separated, Other-service, Unmarried, Black, Female, 0, 0, 20, United-States, <=50K
35, Federal-gov, 76845, 9th, 5, Married-civ-spouse, Farming-fishing, Husband, Black, Male, 0, 0, 40, United-States, <=50K
43, Private, 117037, 11th, 7, Married-civ-spouse, Transport-moving, Husband, White, Male, 0, 2042, 40, United-States, <=50K
59, Private, 109015, HS-grad, 9, Divorced, Tech-support, Unmarried, White, Female, 0, 0, 40, United-States, <=50K
56, Local-gov, 216851, Bachelors, 13, Married-civ-spouse, Tech-support, Husband, White, Male, 0, 0, 40, United-States, >50K
19, Private, 168294, HS-grad, 9, Never-married, Craft-repair, Own-child, White, Male, 0, 0, 40, United-States, <=50K
54, ?, 180211, Some-college, 10, Married-civ-spouse, ?, Husband, Asian-Pac-Islander, Male, 0, 0, 60, South, >50K
39, Private, 367260, HS-grad, 9, Divorced, Exec-managerial, Not-in-family, White, Male, 0, 0, 80, United-States, <=50K
49, Private, 193366, HS-grad, 9, Married-civ-spouse, Craft-repair, Husband, White, Male, 0, 0, 40, United-States, <=50K
23, Local-gov, 190709, Assoc-acdm, 12, Never-married, Protective-serv, Not-in-family, White, Male, 0, 0, 52, United-States, <=50K
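
In the sample rows, each field is separated by a comma followed by a space. The table definition is not shown in the question, so the following is only a hypothesis: if the fields are split on a bare comma (for example, a table declared with FIELDS TERMINATED BY ','), every value after the first would keep its leading space, and the income column would hold " >50K" rather than ">50K". A minimal Python sketch of that splitting behavior, using two rows copied from the sample data (the helper names are made up for illustration):

```python
# Two rows copied from the sample data in the question.
SAMPLE_ROWS = [
    "30, State-gov, 141297, Bachelors, 13, Married-civ-spouse, Prof-specialty, "
    "Husband, Asian-Pac-Islander, Male, 0, 0, 40, India, >50K",
    "23, Private, 122272, Bachelors, 13, Never-married, Adm-clerical, "
    "Own-child, White, Female, 0, 0, 30, United-States, <=50K",
]


def split_on_comma(line):
    """Split on a bare comma only, keeping any whitespace that follows it.

    Assumption: this mimics a delimiter of ',' (not ', ').
    """
    return line.split(",")


def income_field(line):
    """Return the last field (income) exactly as the comma-only split yields it."""
    return split_on_comma(line)[-1]


if __name__ == "__main__":
    for row in SAMPLE_ROWS:
        raw = income_field(row)
        # repr() makes the leading space visible; compare raw vs. stripped value.
        print(repr(raw), raw == ">50K", raw.strip() == ">50K")
```

If this hypothesis holds for the real table, an exact comparison against '>50K' would match nothing, while a comparison on the trimmed value (Hive has a trim() string function) would match.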

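Your SQL code

select country, income from census_income_data where income = '>50K';

uses the "=" operator to compare two strings. As far as I know, that operator takes the character set, surrounding whitespace, and so on into account. You may have more luck with the LIKE operator:

select country, income from census_income_data where income LIKE ">50K";

First, run

select * from census_income_data limit 20;

on your table to verify that the expected values are in the expected columns. There may be other characters (such as spaces) that cause the query to return 0 results. Then try the following:

select country, income from census_income_data where income like '%50%';

If that does not work, the data may have ended up in the wrong columns when the table was created. If it works, try:

select country, income from census_income_data where income like '%>50K%';

If that works, the field probably contains extra characters. Try running:

select concat('INCOME:', income, '.') from census_income_data where income like '%>50K%';

and check whether you get exactly the string INCOME:>50K. back.

Tried the statement above, but the result is still the same: an empty result set :(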
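
The diagnostic steps suggested in the answers can be tried out on a small scale. The sketch below uses Python's built-in sqlite3 module as a stand-in for Hive (Hive itself is not involved, and SQLite's string semantics are not guaranteed to match Hive's). The inserted values carry a leading space, which is an assumption about how the real table may look, not something the question confirms. SQLite's || operator is used in place of concat().

```python
import sqlite3

# Hypothetical stand-in for the Hive table: an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE census_income_data (country TEXT, income TEXT)")
# Assumption: values keep a leading space, as a comma-only split would produce.
conn.executemany(
    "INSERT INTO census_income_data VALUES (?, ?)",
    [(" India", " >50K"), (" United-States", " <=50K"), (" ?", " >50K")],
)


def count(where_clause):
    """Count rows matching a WHERE clause (illustration only, fixed inputs)."""
    sql = "SELECT COUNT(*) FROM census_income_data WHERE " + where_clause
    return conn.execute(sql).fetchone()[0]


# Step 1: exact comparison, as in the original query.
exact = count("income = '>50K'")
# Step 2: substring match, which tolerates surrounding characters.
pattern = count("income LIKE '%>50K%'")
# Step 3: exact comparison after trimming whitespace.
trimmed = count("trim(income) = '>50K'")
# Step 4: wrap the value in markers so stray characters become visible.
marked = [
    row[0]
    for row in conn.execute(
        "SELECT 'INCOME:' || income || '.' FROM census_income_data "
        "WHERE income LIKE '%>50K%'"
    )
]

print(exact, pattern, trimmed)
print(marked)
```

With these assumed values, the exact comparison finds no rows while the substring and trimmed comparisons find both ">50K" rows, and the marked output shows the space between "INCOME:" and ">50K". Whether the real Hive table behaves this way depends on how it was created and loaded.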