Google Cloud Dataflow - error when assigning a custom timestamp
I am trying to assign a custom timestamp and check how allowed lateness works. When I run the code below with the InteractiveRunner it works fine, but when I switch to the DataflowRunner it starts throwing an error at this step:
Map(lambda x: window.TimestampedValue(x, x["timestamp"]))
The input data is the same in both cases, i.e. {'name': 'rou', 'score': 50, 'timestamp': 1618295060}. In the Dataflow UI I can see that there is an error, but not the error details. I have included logging and exception handling, and I don't understand why the error is not being logged.
At first glance, it looks like you are sending the timestamp field as a string, whereas it should be an integer/float. Also, you get a better view of the logs by going directly to Cloud Logging or by clicking on the step itself. This looks like a bug. Have you tried opening the Logs Explorer to look for error logs? Have you checked the worker logs?
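If the string type is indeed the problem, one option is to coerce the field to a number right after decoding the Pub/Sub message, before it is handed to window.TimestampedValue (which expects Unix seconds as a numeric value). A minimal stdlib-only sketch, assuming messages arrive as JSON bytes with the shape shown in the question (the function name is made up for illustration):

```python
import json

def parse_with_numeric_timestamp(message: bytes) -> dict:
    """Decode a Pub/Sub payload and coerce the timestamp field to float.

    window.TimestampedValue expects a numeric Unix-seconds timestamp, so a
    value published as the string "1618295060" must be cast before use.
    """
    record = json.loads(message)
    record["timestamp"] = float(record["timestamp"])
    return record

# The same element is produced whether timestamp arrives as int or string.
print(parse_with_numeric_timestamp(
    b'{"name": "rou", "score": 50, "timestamp": "1618295060"}'))
```

In the pipeline this would replace the bare `beam.Map(json.loads)` in the "To Dict" step, so the "with timestamp" step always receives a float.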
class BuildRecordFn(beam.DoFn):
    """Attach the end of the current window to each (name, score) pair."""

    def __init__(self):
        super(BuildRecordFn, self).__init__()

    def process(self, s, window=beam.DoFn.WindowParam):
        # window_start = window.start.to_utc_datetime()
        window_end = window.end.to_utc_datetime()
        return [dict(name=s[0], score=s[1], timestamp=str(window_end))]
windowed_words = (
    words_source
    | "read" >> beam.io.ReadFromPubSub(topic="projects/{}/topics/beambasics".format(project))
    | "To Dict" >> beam.Map(json.loads)
    | "with timestamp" >> Map(lambda x: window.TimestampedValue(x, x["timestamp"]))
    | "Map" >> Map(lambda x: (x['name'], x['score']))
    | "window" >> beam.WindowInto(window.FixedWindows(60),
                                  # trigger=Repeatedly(AfterProcessingTime(1 * 10)),
                                  # accumulation_mode=AccumulationMode.ACCUMULATING,
                                  allowed_lateness=Duration(seconds=1 * 50))
    | "Group" >> CombinePerKey(sum)
    | "convert to dict" >> ParDo(BuildRecordFn())
    | "Write To BigQuery" >> WriteToBigQuery(table=table, schema=schema,
                                             create_disposition=BigQueryDisposition.CREATE_IF_NEEDED,
                                             write_disposition=BigQueryDisposition.WRITE_APPEND)
)
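For reference, the conversion that BuildRecordFn performs with `window.end.to_utc_datetime()` can be reproduced with the standard library alone; Beam window boundaries are Unix-seconds timestamps, and `to_utc_datetime()` yields a naive UTC datetime. A stdlib-only sketch of the equivalent formatting (the function name is made up for illustration):

```python
from datetime import datetime, timezone

def window_end_to_string(end_seconds: float) -> str:
    """Format a window-end Unix timestamp (seconds) as a UTC string,
    mirroring str(window.end.to_utc_datetime()) in the DoFn above."""
    return datetime.fromtimestamp(end_seconds, tz=timezone.utc).strftime(
        "%Y-%m-%d %H:%M:%S")

print(window_end_to_string(1618295100))
```

Note that this produces a string; if the BigQuery column is of type TIMESTAMP, this format is accepted, but the schema passed to WriteToBigQuery must match.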