Python Apache Beam: date value out of range


I am building a program based on the examples, and every time I try to insert into BigQuery I get the following error:

OverflowError: date value out of range [while running 'Format']

My Beam pipeline looks like this:

Bigquery = (transformation
            | 'Format' >> beam.ParDo(FormatBigQueryoFn())
            | 'Write to BigQuery' >> beam.io.Write(beam.io.BigQuerySink(
                'XXXX',
                schema=TABLE_SCHEMA,
                create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
                write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND)))

The FormatBigQueryoFn class is where the logic for the window timestamps should go.

Code from example 1:

import time
from datetime import datetime

import apache_beam as beam


def timestamp2str(t, fmt='%Y-%m-%d %H:%M:%S.000'):
  """Converts a unix timestamp into a formatted string."""
  return datetime.fromtimestamp(t).strftime(fmt)


class TeamScoresDict(beam.DoFn):
  """Formats the data into a dictionary of BigQuery columns with their values

  Receives a (team, score) pair, extracts the window start timestamp, and
  formats everything together into a dictionary. The dictionary is in the format
  {'bigquery_column': value}
  """

  def process(self, team_score, window=beam.DoFn.WindowParam):
    team, score = team_score
    # window.start is a Beam Timestamp; int() gives whole seconds since the epoch.
    start = timestamp2str(int(window.start))
    yield {
        'team': team,
        'total_score': score,
        'window_start': start,
        'processing_time': timestamp2str(int(time.time()))
    }

Code from example 2:

class FormatDoFn(beam.DoFn):
  def process(self, element, window=beam.DoFn.WindowParam):
    ts_format = '%Y-%m-%d %H:%M:%S.%f UTC'
    window_start = window.start.to_utc_datetime().strftime(ts_format)
    window_end = window.end.to_utc_datetime().strftime(ts_format)
    return [{'word': element[0],
             'count': element[1],
             'window_start': window_start,
             'window_end': window_end}]

What could be wrong with my pipeline?

Edit:

For example, if I print window.start, I get:

Timestamp(-9223372036860)
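
The printed value is roughly 292,000 years before the Unix epoch, far outside the years 1 to 9999 that Python's datetime supports, which would explain why both formatting functions fail. The sketch below reproduces this with the standard library only. `safe_timestamp2str` and `UNWINDOWED_START` are hypothetical names introduced here for illustration; they are not part of Beam or of the examples.

```python
from datetime import datetime, timezone

# Value printed in the question for window.start, in seconds since the epoch.
UNWINDOWED_START = -9223372036860


def safe_timestamp2str(t, fmt='%Y-%m-%d %H:%M:%S'):
    """Format a Unix timestamp in UTC, or return None if datetime cannot hold it."""
    try:
        return datetime.fromtimestamp(t, tz=timezone.utc).strftime(fmt)
    except (OverflowError, ValueError, OSError):
        # The exact exception type depends on the platform and Python version.
        return None
```

Here `safe_timestamp2str(UNWINDOWED_START)` returns None, while `safe_timestamp2str(0)` returns '1970-01-01 00:00:00'. A guard like this only hides the symptom; the elements still need real timestamps and a window.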

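The problem was that I was reading the data from a file before testing with Google Pub/Sub.

When I read the data from a file, the elements have no timestamps.

The elements must carry a timestamp for the window logic to work.

Pub/Sub attaches this timestamp automatically.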

From the documentation:

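The simplest form of windowing is using fixed time windows: given a timestamped PCollection which might be continuously updating, each window might capture (for example) all elements with timestamps that fall into a five minute interval.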