
Problem with open_file in beam.io FileBasedSource when using Python 3

python google-cloud-dataflow


I am using CSVRecordSource to read CSV files in an Apache Beam pipeline. It calls open_file inside its read_records function.

Everything worked fine with Python 2, but after migrating to Python 3 it fails with the following error:

next(csv_reader)
_csv.Error: iterator should return strings, not bytes (did you open the file in text mode?)
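The error can be reproduced without Beam. In Python 3, csv.reader expects an iterator that yields str, while a handle opened in binary mode yields bytes. Below is a minimal sketch that uses io.BytesIO as a stand-in for the binary file handle; the sample data is made up for illustration.

```python
import csv
import io

# Stand-in for a file handle opened in binary mode (hypothetical sample data).
binary_file = io.BytesIO(b"id,name\n1,alice\n")

csv_reader = csv.reader(binary_file, delimiter=",", quotechar='"')
try:
    next(csv_reader)
    error_message = None
except csv.Error as exc:
    # Python 3's csv module refuses lines that are bytes rather than str.
    error_message = str(exc)

print(error_message)
```

The exact wording after "not bytes" varies between Python versions, but the cause is the same: the reader was handed bytes.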

By default, the open_file method opens the file in binary mode.

So I changed it to

with open(filename, "rt") as f:

But when I run the job on Dataflow in Google Cloud, it fails because it cannot find the file, with this error:

FileNotFoundError: [Errno 2] No such file or directory

Here is my code:

with self.open_file(filename) as f:
    csv_reader = csv.reader(f, delimiter=self.delimiter, quotechar=self.quote_character)
    header = next(csv_reader)

How can I use CSVRecordSource with Python 3?

Are you using the open_file method defined here:

If so, I think you can call the underlying FileSystems.open() directly, passing 'text/plain' instead of 'application/octet-stream'.

I solved this by using codecs.iterdecode, which incrementally decodes the bytes yielded by the file iterator:

csv.reader(codecs.iterdecode(f, "utf-8"), delimiter=self.delimiter, quotechar=self.quote_character)
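The decoding approach above can be tried outside a pipeline. The sketch below is an illustration only: read_csv_rows is a hypothetical helper, and io.BytesIO stands in for the binary handle that open_file would return. It assumes the file is UTF-8 encoded.

```python
import codecs
import csv
import io


def read_csv_rows(binary_file, delimiter=",", quotechar='"', encoding="utf-8"):
    """Yield CSV rows (lists of str) from a file-like object that yields bytes."""
    # iterdecode lazily turns each bytes line into str, so the handle
    # can stay in binary mode and the file is not loaded all at once.
    text_lines = codecs.iterdecode(binary_file, encoding)
    yield from csv.reader(text_lines, delimiter=delimiter, quotechar=quotechar)


# Hypothetical sample data, including a non-ASCII character.
binary_file = io.BytesIO("id,name\n1,José\n".encode("utf-8"))
rows = list(read_csv_rows(binary_file))
header, records = rows[0], rows[1:]
print(header, records)
```

Because the handle itself is still opened by Beam, this keeps working for remote paths, unlike replacing open_file with the built-in open.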

Could you tell me where you are using this? In a DoFn?

I am using it in Read(CSVRecordSource(input)) in the Beam pipeline.