
Java: Why isn't my AvroKey data an AvroKey&lt;SpecificRecord&gt; when I explicitly write the data as AvroKey&lt;SpecificRecord&gt;?

Tags: java, hadoop, avro

I am piping the Avro output of one Hadoop job into another. The first job only runs mappers, with the setup shown below. In case it is of any use, my .avsc file defines a compound object like this:

[
  {
    "type": "record",
    "name": "MySubRecord",
    "namespace": "blah",
    "fields": [
      {"name": "foobar", "type": ["null", "string"], "default": null},
      {"name": "bar", "type": ["null", "string"], "default": null},
      {"name": "foo", "type": ["null", "string"], "default": null}
    ]
  },
  {
    "type": "record",
    "name": "MyRecord",
    "namespace": "blah",
    "fields": [
      {"name": "ID", "type": ["null", "string"], "default": null},
      {"name": "secondID", "type": ["null", "string"], "default": null},
      {"name": "subRecordA", "type": ["null", "blah.MySubRecord"], "default": null},
      {"name": "subRecordB", "type": ["null", "blah.MySubRecord"], "default": null},
      {"name": "subRecordC", "type": ["null", "blah.MySubRecord"], "default": null},
      {"name": "subRecordD", "type": ["null", "blah.MySubRecord"], "default": null},
      {"name": "subRecordE", "type": ["null", "blah.MySubRecord"], "default": null},
      {"name": "subRecordF", "type": ["null", "blah.MySubRecord"], "default": null},
      {"name": "subRecordG", "type": ["null", "blah.MySubRecord"], "default": null},
      {"name": "subRecordH", "type": ["null", "blah.MySubRecord"], "default": null}
    ]
  }
]
My mapper class signature looks like this:

public static class MyMapper extends Mapper<LongWritable, Text, AvroKey<MyRecord>, NullWritable> {

    private AvroKey<MyRecord> keyOut;

    @Override
    protected void setup(Context context) throws IOException, InterruptedException {
        super.setup(context);
        keyOut = new AvroKey<>();
    }

    // map(LongWritable, Text, ...) builds a MyRecord from each input line
    // and emits it via keyOut
    // ##### some other logic
}
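The question doesn't show the first job's driver, but for the mapper above to write AvroKey<MyRecord> the output schema has to be declared when the job is configured. A minimal sketch, assuming the standard avro-mapred (new MapReduce API) classes; the driver class name, job name, and paths are illustrative, not from the original post:

import org.apache.avro.mapreduce.AvroJob;
import org.apache.avro.mapreduce.AvroKeyOutputFormat;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class FirstJobDriver {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "write-avro-records");
        job.setJarByClass(FirstJobDriver.class);

        job.setMapperClass(MyMapper.class);
        job.setNumReduceTasks(0); // the first job only runs mappers

        // Declare the writer schema for the AvroKey output; without this,
        // AvroKeyOutputFormat has no schema to serialize MyRecord with.
        AvroJob.setOutputKeySchema(job, MyRecord.getClassSchema());
        job.setOutputFormatClass(AvroKeyOutputFormat.class);
        job.setOutputValueClass(NullWritable.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}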
The logic in my first job appears to be fine, because when I print its output as JSON with the command-line avro-tools jar, it looks exactly the way I expect.
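For reference, that check can be done with avro-tools' tojson subcommand, along these lines; the jar version and output file name here are placeholders:

java -jar avro-tools-1.7.7.jar tojson part-m-00000.avro | head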

My problem occurs when I run the second job. The second job's mapper has the following setup:

public static class MySecondJobMapper extends Mapper<AvroKey<MyRecord>, NullWritable, IntWritable, DoubleWritable> {

    @Override
    protected void map(AvroKey<MyRecord> key, NullWritable value, Context context)
            throws IOException, InterruptedException {
        MyRecord myRecord = key.datum();
        // ##### some other logic
    }
}
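For context, the second job's input side would typically be wired up like this so the mapper actually receives AvroKey<MyRecord>. Again a minimal sketch with illustrative names, assuming avro-mapred's new-API classes:

import org.apache.avro.mapreduce.AvroJob;
import org.apache.avro.mapreduce.AvroKeyInputFormat;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class SecondJobDriver {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "read-avro-records");
        job.setJarByClass(SecondJobDriver.class);

        // Read the Avro container files written by the first job as AvroKey<MyRecord>.
        job.setInputFormatClass(AvroKeyInputFormat.class);
        // The reader schema; a mismatched Avro version on the classpath is one
        // way keys can end up as generic records instead of MyRecord here.
        AvroJob.setInputKeySchema(job, MyRecord.getClassSchema());

        job.setMapperClass(MySecondJobMapper.class);
        job.setOutputKeyClass(IntWritable.class);
        job.setOutputValueClass(DoubleWritable.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}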

It turns out the problem stemmed from the fact that I was testing in a local, pseudo-distributed environment, where the correct Avro version specified in my pom.xml was not being picked up. Instead, an older version of Avro was being pulled in without my realizing it. Once I ran the same program on EMR it worked fine, because the correct Avro version was being used.
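If you hit the same thing, two checks help: run mvn dependency:tree to see which Avro version actually wins on the classpath (in a local pseudo-distributed setup, the Hadoop installation's own lib directory can also shadow your job's Avro jar), and pin the Avro artifacts explicitly in pom.xml. A sketch of the relevant dependency entries; the version 1.7.7 is illustrative, not something stated in the original post:

<dependency>
  <groupId>org.apache.avro</groupId>
  <artifactId>avro</artifactId>
  <version>1.7.7</version>
</dependency>
<dependency>
  <groupId>org.apache.avro</groupId>
  <artifactId>avro-mapred</artifactId>
  <version>1.7.7</version>
  <classifier>hadoop2</classifier>
</dependency>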