Hadoop: How to sum a column (view) across multiple loaded files in Pig


I have a question. I want to sum the view column across two loaded files.

Sample data:

load data - 1
id name view
1  A    4
2  B    5
3  C    6

load data - 2
id name view
1  A    4
2  B    5
4  D    6
The output I want:

output
id name view
1  A    8
2  B    10
3  C    6
4  D    6
My Pig code:

inputdata = LOAD '/user/hdfs/tes/part-1' AS (
    id:chararray, 
    nama:chararray, 
    view:int
);


inputdata2 = LOAD '/user/hdfs/tes/part-2' AS (
    id:chararray, 
    nama:chararray, 
    view:int
);

x = UNION inputdata, inputdata2;

dump x;
How can I sum the view column across the two loaded files in the sample data?

Thanks.

Here is a working solution using GROUP BY:

inputdata = LOAD '/user/hdfs/tes/part-1' USING PigStorage(' ') AS (
    id:chararray, 
    nama:chararray, 
    view:int
);


inputdata2 = LOAD '/user/hdfs/tes/part-2' USING PigStorage(' ') AS (
    id:chararray, 
    nama:chararray, 
    view:int
);

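-- Stack the rows of both files into one relation.
A = UNION inputdata, inputdata2;
-- Group by id and name; each group carries a bag named A holding its rows.
B = GROUP A BY (id, nama);
-- Inside the FOREACH the bag is referenced as A (the grouped relation), not B.
C = FOREACH B GENERATE group.id, group.nama, SUM(A.view) AS sum_views;
DUMP C;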
There are other possibilities as well.
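One such possibility, shown only as a rough sketch, is COGROUP, which groups both relations by key without a prior UNION. It assumes the same paths, space delimiter, and schema as the answer above. The aliases G and R are illustrative names that do not appear in the original thread.

```pig
-- Same inputs as in the answer above (paths and delimiter taken from it).
inputdata = LOAD '/user/hdfs/tes/part-1' USING PigStorage(' ') AS (
    id:chararray,
    nama:chararray,
    view:int
);
inputdata2 = LOAD '/user/hdfs/tes/part-2' USING PigStorage(' ') AS (
    id:chararray,
    nama:chararray,
    view:int
);

-- COGROUP yields, per (id, nama) key, one bag for each input relation.
G = COGROUP inputdata BY (id, nama), inputdata2 BY (id, nama);

-- A key present in only one file has an empty bag for the other file,
-- so each side is guarded with IsEmpty before the two totals are added.
R = FOREACH G GENERATE
    FLATTEN(group) AS (id, nama),
    (IsEmpty(inputdata)  ? 0L : SUM(inputdata.view)) +
    (IsEmpty(inputdata2) ? 0L : SUM(inputdata2.view)) AS sum_views;

DUMP R;
```

The UNION plus GROUP version is shorter and extends to any number of files. COGROUP may be the better fit when the per-file totals also need to be kept separate.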
This link may help you:


Thank you. Your code ran successfully. Thanks for the link reference.