NotImplementedError in Apache Beam (Python)
I am using Apache Beam to write JSON to GCS, but I run into the following error:
NotImplementedError: offset: 0, whence: 0, position: 50547, last: 50547 [while running 'Write new data to gcs/write data gcs/Write/WriteImpl/WriteBundles/WriteBundles']
I don't know why this error occurs. The code is as follows:
class WriteDataGCS(beam.PTransform):
    """
    To write data to GCS
    """

    def __init__(self, bucket):
        """
        Initiate the bucket as a class field
        :type bucket: string
        :param bucket: query to be run for data
        """
        self.bucket = bucket

    def expand(self, pcoll):
        """
        PTransform Method run when called on Class Name
        :type pcoll: PCollection
        :param pcoll: A pcollection
        """
        (pcoll | "print intermediate" >> beam.Map(print_row))
        return (pcoll | "write data gcs" >> beam.io.WriteToText(self.bucket, coder=JsonCoder(), file_name_suffix=".json"))
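The coder parameter of WriteToText expects an apache_beam.coders.Coder instance. You could try making your JsonCoder inherit from the base Coder class, but I think you can also use Map to convert the data to strings: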
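def expand(self, pcoll):
    """
    PTransform Method run when called on Class Name
    :type pcoll: PCollection
    :param pcoll: A pcollection
    """
    return (pcoll
            # print_row must return the row, otherwise later steps receive None
            | "print intermediate" >> beam.Map(print_row)
            | "to_json" >> beam.Map(lambda x: json.dumps(x, default=str))
            | "write data gcs" >> beam.io.WriteToText(self.bucket, file_name_suffix=".json"))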
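To illustrate what the "to_json" step does to each element, here is a small standalone sketch that uses only the standard library. The row contents and the helper name to_json are assumptions made for illustration; the question does not show what the rows actually contain.

```python
import json
from datetime import datetime

# Hypothetical row; the real element shape is not shown in the question.
row = {"id": 1, "created": datetime(2019, 1, 2, 3, 4, 5)}


def to_json(element):
    """Serialize one element to a JSON string, like the "to_json" Map step."""
    # default=str is called for any value json cannot serialize natively,
    # so the datetime is written as its str() representation.
    return json.dumps(element, default=str)


# Without default=str, the datetime value makes json.dumps raise TypeError.
try:
    json.dumps(row)
    plain_dumps_failed = False
except TypeError:
    plain_dumps_failed = True

line = to_json(row)
restored = json.loads(line)
```

After this step every element is already a string, so WriteToText can be used without a custom coder. Note that values converted through default=str come back as plain strings when the file is read again.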
user@beam.apache.com might be a good place to ask this question.
The JsonCoder class used in the question:

class JsonCoder:
    """
    This class represents dump and load operations performed on json
    """

    def encode(self, data):
        """
        Encodes the json data.
        :type data: string
        :param data: Data to be encoded
        """
        # logger.info("JSON DATA for encoding - {}".format(data))
        return json.dumps(data, default=str)

    def decode(self, data):
        """
        Decodes the json data.
        :type data: string
        :param data: Data to be decoded
        """
        # logger.info("JSON DATA for decoding - {}".format(data))
        return json.loads(data)
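The other option mentioned in the answer, making JsonCoder inherit from the base Coder class, could look roughly like the sketch below. It is untested against the pipeline in the question, and whether it resolves the NotImplementedError is not confirmed here. The ImportError fallback is an assumption added only so the snippet can be run without Beam installed; it is not part of Beam.

```python
import json

try:
    from apache_beam.coders import Coder
except ImportError:
    # Fallback for illustration only, so the sketch runs without Beam.
    Coder = object


class JsonCoder(Coder):
    """Sketch of a coder that writes each element as one JSON document."""

    def encode(self, value):
        # Coders are expected to produce bytes; default=str handles values
        # that json cannot serialize natively (for example datetimes).
        return json.dumps(value, default=str).encode("utf-8")

    def decode(self, encoded):
        # Reverse of encode: bytes -> text -> Python object.
        return json.loads(encoded.decode("utf-8"))


coder = JsonCoder()
encoded = coder.encode({"name": "example", "count": 3})
decoded = coder.decode(encoded)
```

An instance of this class would then be passed as coder=JsonCoder() to beam.io.WriteToText, as in the original expand method.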