Hadoop: Unable to display data using Pig FOREACH (tags: hadoop, mapreduce, apache-pig, bigdata)


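I have a sample dataset in a txt file (format: Firstname,Lastname,age,sex).

I want to display the age and firstname of the employees whose age is greater than 27. After working on this for a while and looking for pointers, I am stuck.

I am loading this dataset using: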

tuple_record = LOAD '~/Documents/Pig_Tuple.txt' AS (details:tuple(firstname:chararray,lastname:chararray,age:int,sex:chararray));
DESCRIBE gives me this schema:

describe tuple_record
tuple_record: {details: (firstname: chararray,lastname: chararray,age: int,sex: chararray)}
Then I flatten the records using:

flatten_tuple_record = FOREACH tuple_record GENERATE FLATTEN(details);
DESCRIBE on the flattened relation gives me this:

describe flatten_tuple_record
flatten_tuple_record: {details::firstname: chararray,details::lastname: chararray,details::age: int,details::sex: chararray}
Now I want to filter it by age:

filter_by_age = FILTER flatten_tuple_record BY age > 27;
Then I group it by age:

group_by_age = GROUP filter_by_age BY age;
Now, to display the firstname and age, I tried this, but it did not work:

display_details = FOREACH group_by_age GENERATE group,firstname;
Here is the error message:

2015-02-01 08:39:37,752 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1025: 
<line 5, column 54> Invalid field projection. Projected field [firstname] does not exist in schema: group:int,filter_by_age:bag{:tuple(details::firstname:chararray,details::lastname:chararray,details::age:int,details::sex:chararray)}

Please advise.

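Your Pig statements look fine, but once the data has been filtered by age you can get firstname and age directly as the result, without the GROUP. Use the following statements: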

tuple_record = LOAD '/user/cloudera/Pig_Tuple.txt' AS (details:tuple(firstname:chararray,lastname:chararray,age:int,sex:chararray));

describe tuple_record;

flatten_tuple_record = FOREACH tuple_record GENERATE FLATTEN(details);

describe flatten_tuple_record;

filter_by_age = FILTER flatten_tuple_record BY age > 27;

details = FOREACH filter_by_age GENERATE firstname, age;

dump details;
Update: here we can even skip the FLATTEN statement:

tuple_record = LOAD '/user/cloudera/Pig_Tuple.txt' AS (details:tuple(firstname:chararray,lastname:chararray,age:int,sex:chararray));

describe tuple_record;

filter_by_age = FILTER tuple_record BY details.age > 27;

details = FOREACH filter_by_age GENERATE details.firstname, details.age;

dump details;
In both cases, the result will be:

(Angs,28)
(Mahima,29)
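
For reference, the ERROR 1025 in the question comes from the shape of the grouped relation: as the schema in the error message shows, after GROUP each row holds only group and a bag named filter_by_age, so firstname has to be projected from that bag rather than from the top level. If the grouping is actually wanted, something along these lines should work. This is a minimal sketch, assuming the flattened filter_by_age relation from the first script above; the alias names_by_age is only illustrative, and the short field name is assumed to resolve the same way age does in the FILTER (otherwise details::firstname can be spelled out):

```pig
-- Assumes filter_by_age is the flattened, filtered relation built earlier.
group_by_age = GROUP filter_by_age BY age;

-- Each grouped row is (group, filter_by_age), where filter_by_age is a bag of
-- the original tuples, so firstname is projected from the bag.
-- names_by_age is a hypothetical alias used only for this example.
names_by_age = FOREACH group_by_age GENERATE group AS age, filter_by_age.firstname;

DUMP names_by_age;
```

This gives one row per age with a bag of firstnames. If a flat (firstname, age) listing is all that is needed, the ungrouped versions above are simpler.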

Comments:

It seems we do not even need to flatten the data... Please advise.

You have to flatten the tuple so that age and the other fields can be accessed in the other Pig statements.

That is true. But without using FLATTEN I can also access them using details.; that is what I meant.

@user182944 Yes, in this case we can skip the FLATTEN; we then need to use details. to refer to our attributes in the Pig statements.