Hadoop: unable to display data using Pig FOREACH
I have a txt file containing a sample dataset (format: Firstname,Lastname,age,sex).
I want to display the age and first name of employees who are older than 27. After working on this for a while and looking for pointers, I am stuck.
I am loading the dataset as follows:
tuple_record = LOAD '~/Documents/Pig_Tuple.txt' AS (details:tuple(firstname:chararray,lastname:chararray,age:int,sex:chararray));
DESCRIBE gives me this schema:
describe tuple_record
tuple_record: {details: (firstname: chararray,lastname: chararray,age: int,sex: chararray)}
Then I flatten the records using:
flatten_tuple_record = FOREACH tuple_record GENERATE FLATTEN(details);
DESCRIBE on the flattened relation gives me this:
describe flatten_tuple_record
flatten_tuple_record: {details::firstname: chararray,details::lastname: chararray,details::age: int,details::sex: chararray}
Now I want to filter it by age:
filter_by_age = FILTER flatten_tuple_record BY age > 27;
Then I group by age:
group_by_age = GROUP filter_by_age BY age;
Now I want to display the first name and age. I tried this, but it did not work:
display_details = FOREACH group_by_age GENERATE group,firstname;
Here is the error message:
2015-02-01 08:39:37,752 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1025:
<line 5, column 54> Invalid field projection. Projected field [firstname] does not exist in schema: group:int,filter_by_age:bag{:tuple(details::firstname:chararray,details::lastname:chararray,details::age:int,details::sex:chararray)}
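Please advise.
Your Pig statements look fine. The error occurs because, after GROUP, each record contains only the fields group and filter_by_age (a bag of the filtered tuples), so firstname is no longer a top-level field. Since you only need the first name and age, you can skip the GROUP and project them directly after filtering by age. Use the following statements: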
tuple_record = LOAD '/user/cloudera/Pig_Tuple.txt' AS (details:tuple(firstname:chararray,lastname:chararray,age:int,sex:chararray));
describe tuple_record;
flatten_tuple_record = FOREACH tuple_record GENERATE FLATTEN(details);
describe flatten_tuple_record;
filter_by_age = FILTER flatten_tuple_record BY age > 27;
details = FOREACH filter_by_age GENERATE firstname, age;
dump details;
Update:
Here we can even skip the FLATTEN statement and refer to the fields through the details tuple:
tuple_record = LOAD '/user/cloudera/Pig_Tuple.txt' AS (details:tuple(firstname:chararray,lastname:chararray,age:int,sex:chararray));
describe tuple_record;
filter_by_age = FILTER tuple_record BY details.age > 27;
details = FOREACH filter_by_age GENERATE details.firstname, details.age;
dump details;
In both cases, the result will be:
(Angs,28)
(Mahima,29)
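If the grouping from the question is still wanted, the first name has to be projected out of the bag that GROUP produces rather than referenced as a top-level field. The following is a minimal sketch of that idea, not a tested script. It assumes the same file path and schema as the statements above, and the relation name names_per_age is only illustrative.

```pig
-- Load, flatten and filter exactly as in the statements above.
tuple_record = LOAD '/user/cloudera/Pig_Tuple.txt' AS (details:tuple(firstname:chararray,lastname:chararray,age:int,sex:chararray));
flatten_tuple_record = FOREACH tuple_record GENERATE FLATTEN(details);
filter_by_age = FILTER flatten_tuple_record BY age > 27;

-- After GROUP, each record has two fields: group (the age) and
-- filter_by_age (a bag holding the original tuples for that age).
group_by_age = GROUP filter_by_age BY age;

-- Project firstname from inside the bag; the field keeps the
-- details:: prefix shown in the error message's schema.
-- names_per_age is a hypothetical relation name used for illustration.
names_per_age = FOREACH group_by_age GENERATE group AS age, filter_by_age.details::firstname;
DUMP names_per_age;
```

Each output record should then hold an age and a bag of the first names sharing that age. Wrapping the bag projection in FLATTEN(...) should instead give one row per name.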
It seems we do not even need to flatten the data... please advise.
You have to flatten the tuple so that age and the other fields can be accessed directly in the other Pig statements.
That is true. But without FLATTEN I can still access them through details; that is what I meant.
@user182944 Yes, in this case we can skip the FLATTEN, but then we need to use the details. prefix to refer to our attributes in the Pig statements.