Hadoop Pig join using a datetime field

I have two datasets:

messages.txt
2014-06-23 08:42:34, 34569
2014-06-23 08:42:35, 23945
2014-06-23 08:42:36, 45673
... etc

CPU.txt
2014-06-23 08:42:34, 99
2014-06-23 08:42:35, 80
2014-06-23 08:42:36, 83
... etc
I want to join the two tables on the timestamp, and I want the timestamp formatted as a datetime.

This is my attempt:

-- Load both files with the timestamp as a chararray
MSG = LOAD 'messages.txt' USING PigStorage(',') AS (date_time:chararray, msg_recv:int);
CPU = LOAD 'CPU.txt' USING PigStorage(',') AS (date_time:chararray, cpu:int);
-- Convert the timestamp strings to the datetime type
MSG_FORMATED = FOREACH MSG GENERATE ToDate(date_time, 'yyyy-MM-dd HH:mm:ss') AS date_time, msg_recv;
CPU_FORMATED = FOREACH CPU GENERATE ToDate(date_time, 'yyyy-MM-dd HH:mm:ss') AS date_time, cpu;
So far so good.

I can dump MSG_FORMATED and CPU_FORMATED and see that both are in datetime format:

dump MSG_FORMATED;
2014-06-23T08:42:34.000-04:00, 34569
2014-06-23T08:42:35.000-04:00, 23945
2014-06-23T08:42:36.000-04:00, 45673

dump CPU_FORMATED;
2014-06-23T08:42:34.000-04:00, 99
2014-06-23T08:42:35.000-04:00, 80
2014-06-23T08:42:36.000-04:00, 83
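My problem comes when I try to join them (which should be fairly straightforward?).

Joining the two relations on date_time and dumping the result throws this error: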

2014-07-01 13:10:23,065 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Failed!
2014-07-01 13:10:23,065 [main] WARN  org.apache.hadoop.mapred.JobConf - The variable mapred.child.ulimit is no longer used.
2014-07-01 13:10:23,070 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2997: Unable to recreate exception from backed error: AttemptID:attempt_1400260444475_25479_r_000000_3 Info:Error: org.joda.time.DateTime.compareTo(Lorg/joda/time/ReadableInstant;)

You can join the tables on the timestamp first, and then format it as a datetime.

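So it is not possible to join on a datetime? Going forward, I will have raw timestamps in slightly different formats. ToDate is nice because it normalizes them.

Yes, I think Pig does not like joining on datetime! If you want to do that, you can convert it to a string (chararray) before joining.

Is there a solution, or is this still open?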
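The workaround suggested above can be sketched roughly as follows. This is a minimal, untested sketch, assuming a Pig release that ships the ToDate and ToString built-ins; the relation names MSG_KEYED, CPU_KEYED, JOINED and RESULT and the field name ts are hypothetical and do not appear in the original question. Each timestamp is normalized through ToDate, rendered back to a chararray for use as the join key, and converted to datetime again only after the join.

```pig
-- Load both files; the timestamp is read as a chararray.
MSG = LOAD 'messages.txt' USING PigStorage(',') AS (date_time:chararray, msg_recv:int);
CPU = LOAD 'CPU.txt' USING PigStorage(',') AS (date_time:chararray, cpu:int);

-- Normalize via ToDate, then turn the result back into a chararray join key (ts).
-- If the raw formats differ between sources, only the inner ToDate pattern needs to change.
MSG_KEYED = FOREACH MSG GENERATE
    ToString(ToDate(date_time, 'yyyy-MM-dd HH:mm:ss'), 'yyyy-MM-dd HH:mm:ss') AS ts,
    msg_recv;
CPU_KEYED = FOREACH CPU GENERATE
    ToString(ToDate(date_time, 'yyyy-MM-dd HH:mm:ss'), 'yyyy-MM-dd HH:mm:ss') AS ts,
    cpu;

-- Join on the chararray key rather than on a datetime field.
JOINED = JOIN MSG_KEYED BY ts, CPU_KEYED BY ts;

-- Convert the key back to datetime once the join is done.
RESULT = FOREACH JOINED GENERATE
    ToDate(MSG_KEYED::ts, 'yyyy-MM-dd HH:mm:ss') AS date_time,
    MSG_KEYED::msg_recv AS msg_recv,
    CPU_KEYED::cpu AS cpu;

DUMP RESULT;
```

Because the join key is a plain string, the reducer compares chararrays and never has to compare datetime values, which is the step that appears to fail in the log above. The round trip through ToDate keeps the normalization benefit mentioned in the comments.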