How to configure MultiOutputFormat from the HCatalog API in a Spring Hadoop project?


spring hadoop mapreduce


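I am developing a Hadoop application and now want to migrate it to the Spring Hadoop project. In my MapReduce job I use MultiOutputFormat from the HCatalog API, because I want to store information in multiple tables. I cannot find any example or documentation that explains how to configure it in Spring Hadoop.

Can anyone tell me how to do this, or point me to some references? Thanks a lot.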

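@Thomas Risberg Hi Thomas, thanks for your reply. I understand that in Spring Hadoop the job configuration only needs to go into the corresponding XML file. Currently, in my application, I have code like the following when setting up the job: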

// One OutputJobInfo per target HCatalog table.
ArrayList<OutputJobInfo> tableList = new ArrayList<OutputJobInfo>();
tableList.add(OutputJobInfo.create("database", "request", partitionValuesRequest));
tableList.add(OutputJobInfo.create("database", "requestdetail", partitionValues));
tableList.add(OutputJobInfo.create("database", "jobInfo", partitionValues));

// Column schema for the "request" table.
List<HCatFieldSchema> requestSchemaList = new ArrayList<HCatFieldSchema>();
requestSchemaList.add(new HCatFieldSchema("type", Type.STRING, null));
requestSchemaList.add(new HCatFieldSchema("samplesize", Type.INT, null));
requestSchemaList.add(new HCatFieldSchema("userid", Type.SMALLINT, null));

// Register the "request" output under its alias, then bind it to its table and schema.
configurer.addOutputFormat("request", HCatOutputFormat.class,
        BytesWritable.class, HCatRecord.class);
HCatOutputFormat.setOutput(configurer.getJob("request"), tableList.get(0));
HCatOutputFormat.setSchema(configurer.getJob("request"),
        new HCatSchema(requestSchemaList));
......
// Write all registered child-job settings back into the parent job.
configurer.configure();

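This is what I use to store information in the corresponding tables from within the reducer. So my question is: how do I do the equivalent configuration for MultiOutputFormat in Spring Hadoop? I checked spring-hadoop.xsd and could not find any element related to it.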

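If you use spring-data-hadoop, the actual MapReduce job doesn't change. What changes is how you submit the job using a Spring context, and the job support for that. So if you can run the job from the command line, you should be able to run the same job via Spring configuration. If you share what you have so far, we can give more details.

@ThomasRisberg Hi Thomas, thanks for your reply. I have added more details to the question; could you take another look? Thanks.

We don't have any specific HCatalog support at all. You could extend JobFactoryBean and set the outputFormat to the fully qualified class name, as a String. Then you would have to use the HCatalog JobConfigurer to configure the MultiOutputFormat. I haven't tried it, but it doesn't look impossible.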
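The suggestion above could be sketched roughly as follows. This is an untested sketch, not something shipped with Spring for Apache Hadoop: the class name HCatMultiOutputJobFactoryBean and the partitionValues property are made up for illustration, and the org.apache.hcatalog package names are an assumption (newer Hive releases use org.apache.hive.hcatalog instead). It also assumes that JobFactoryBean builds the Job in afterPropertiesSet() and returns it from getObject(), so the HCatalog configurer can be applied right after the parent class has finished.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hcatalog.data.HCatRecord;
import org.apache.hcatalog.data.schema.HCatFieldSchema;
import org.apache.hcatalog.data.schema.HCatFieldSchema.Type;
import org.apache.hcatalog.data.schema.HCatSchema;
import org.apache.hcatalog.mapreduce.HCatOutputFormat;
import org.apache.hcatalog.mapreduce.MultiOutputFormat;
import org.apache.hcatalog.mapreduce.MultiOutputFormat.JobConfigurer;
import org.apache.hcatalog.mapreduce.OutputJobInfo;
import org.springframework.data.hadoop.mapreduce.JobFactoryBean;

// Hypothetical subclass: lets Spring build the Job as usual, then applies
// the HCatalog MultiOutputFormat setup from the question on top of it.
public class HCatMultiOutputJobFactoryBean extends JobFactoryBean {

    // Hypothetical property, injected from the Spring XML; may be null
    // for an unpartitioned table.
    private Map<String, String> partitionValues;

    public void setPartitionValues(Map<String, String> partitionValues) {
        this.partitionValues = partitionValues;
    }

    @Override
    public void afterPropertiesSet() throws Exception {
        // Output format is passed as a fully qualified class name.
        setOutputFormat(MultiOutputFormat.class.getName());
        super.afterPropertiesSet();

        // Assumption: the Job already exists once the parent has initialised.
        Job job = getObject();
        JobConfigurer configurer = MultiOutputFormat.createConfigurer(job);

        // Same schema as in the question for the "request" table.
        List<HCatFieldSchema> requestSchema = new ArrayList<HCatFieldSchema>();
        requestSchema.add(new HCatFieldSchema("type", Type.STRING, null));
        requestSchema.add(new HCatFieldSchema("samplesize", Type.INT, null));
        requestSchema.add(new HCatFieldSchema("userid", Type.SMALLINT, null));

        configurer.addOutputFormat("request", HCatOutputFormat.class,
                BytesWritable.class, HCatRecord.class);
        HCatOutputFormat.setOutput(configurer.getJob("request"),
                OutputJobInfo.create("database", "request", partitionValues));
        HCatOutputFormat.setSchema(configurer.getJob("request"),
                new HCatSchema(requestSchema));

        // Further aliases ("requestdetail", "jobInfo") would be registered
        // the same way before this call.
        configurer.configure();
    }
}
```

Such a bean would be declared as a plain bean in the Spring context (there is no dedicated element for it in spring-hadoop.xsd), with the mapper, reducer, and input settings supplied through the properties it inherits from JobFactoryBean. The reducer code that writes to each alias stays as it was.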