Google Cloud Dataflow: unable to read an XML file stored in a GCS bucket

Tags: google-cloud-dataflow, apache-beam

I have tried to follow this documentation as precisely as I can:

Please see my code below:

public static void main(String[] args) {
    DataflowPipelineOptions options = PipelineOptionsFactory.as(DataflowPipelineOptions.class);
    options.setTempLocation("gs://balajee_test/stagging");
    options.setProject("test-1-130106");

    Pipeline p = Pipeline.create(options);

    PCollection<XMLFormatter> record = p.apply(XmlIO.<XMLFormatter>read()
            .from("gs://balajee_test/sample_3.xml")
            .withRootElement("book")
            .withRecordElement("author")
            .withRecordElement("title")
            .withRecordElement("genre")
            .withRecordElement("price")
            .withRecordElement("description")
            .withRecordClass(XMLFormatter.class));

    record.apply(ParDo.of(new DoFn<XMLFormatter, String>() {
        @ProcessElement
        public void processElement(ProcessContext c) {
            System.out.println(c.element().getAuthor());
        }
    }));

    p.run();
}

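The XmlIO read PTransform does not support supplying multiple record elements (author, title, genre, and so on). You must supply exactly one root element and one record element, and the XML document must consist of records that all use that same record element. See the example given at the following location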
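To make the "one root element, one record element" requirement concrete, here is a minimal sketch that uses only the JDK's StAX parser rather than Beam. It assumes a hypothetical document in which a `catalog` root wraps repeated `book` records; the name `catalog`, the sample data, and the `RecordShapeSketch` and `readField` names are all illustrative and not taken from the question's actual file. Under that assumption, the read in the question would presumably use a single `withRootElement("catalog")` and a single `withRecordElement("book")`, with `author`, `title`, and the other fields mapped by the record class instead of being passed as record elements.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class RecordShapeSketch {

    // Hypothetical sample: one root element (<catalog>) wrapping repeated
    // record elements (<book>). The name "catalog" is an assumption.
    static final String SAMPLE =
        "<catalog>"
        + "<book><author>Author A</author><title>Title A</title></book>"
        + "<book><author>Author B</author><title>Title B</title></book>"
        + "</catalog>";

    // Collects the text of each <fieldName> that occurs inside a <recordElement>.
    static List<String> readField(String xml, String recordElement, String fieldName) {
        List<String> values = new ArrayList<>();
        try {
            XMLStreamReader reader =
                XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            boolean insideRecord = false;
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    String name = reader.getLocalName();
                    if (name.equals(recordElement)) {
                        insideRecord = true;                 // a new record starts here
                    } else if (insideRecord && name.equals(fieldName)) {
                        values.add(reader.getElementText()); // field belongs to the record
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT
                        && reader.getLocalName().equals(recordElement)) {
                    insideRecord = false;                    // record finished
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Malformed XML", e);
        }
        return values;
    }

    public static void main(String[] args) {
        // One record element ("book"); fields such as "author" live inside it.
        System.out.println(readField(SAMPLE, "book", "author")); // prints [Author A, Author B]
    }
}
```

The point of the sketch is the shape: every record is delimited by the same element, and the per-record fields are children of that element, not record elements in their own right.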


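Which runner are you using? DirectRunner? DataflowRunner? Something else? Is there a job ID for the failing pipeline?

I have tried both DirectRunner and DataflowRunner. I do have a job ID for a failed pipeline (2017-08-30_02_55_24-4448720439481076797). Would you mind sharing a working code sample for reading an XML file?

@BenChambers Could you take a look at my code?

But I still cannot resolve this issue. It would be very helpful if you could share a working code snippet for the same.

Can you provide your test file? Have you tried running a DirectPipeline from a txt file instead of a GCS file? That would tell you whether the problem lies in reading from GCS or in the XML formatter. Alternatively, you could read from the GCS file and write the output line by line to verify that the GCS file can be read.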
package com.bitwise.cloud;

import javax.xml.bind.annotation.XmlElement;
import javax.xml.bind.annotation.XmlRootElement;
import javax.xml.bind.annotation.XmlType;

@XmlRootElement(name = "book")
@XmlType(propOrder = {"author", "title", "genre", "price", "description"})
public class XMLFormatter {
    private String author;
    private String title;
    private String genre;
    private String price;
    private String description;

    public XMLFormatter() { }

    public XMLFormatter(String author, String title, String genre, String price, String description) {
        this.author = author;
        this.title = title;
        this.genre = genre;
        this.price = price;
        this.description = description;
    }

    @XmlElement
    public void setAuthor(String author) {
        this.author = author;
    }

    public String getAuthor() {
        return author;
    }

    @XmlElement
    public void setTitle(String title) {
        this.title = title;
    }

    public String getTitle() {
        return title;
    }

    @XmlElement
    public void setGenre(String genre) {
        this.genre = genre;
    }

    public String getGenre() {
        return genre;
    }

    @XmlElement
    public void setPrice(String price) {
        this.price = price;
    }

    public String getPrice() {
        return price;
    }

    @XmlElement
    public void setDescription(String description) {
        this.description = description;
    }

    public String getDescription() {
        return description;
    }
}