Google Cloud Dataflow / Composer: Dataflow hook crashes

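I created an hourly task in Airflow to schedule a Dataflow job, but the hook provided by the Airflow library crashes most of the time, even though the Dataflow job itself actually succeeds.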

[2018-05-25 07:05:03,523] {base_task_runner.py:98} INFO - Subtask: [2018-05-25 07:05:03,439] {gcp_dataflow_hook.py:109} WARNING -   super(GcsIO, cls).__new__(cls, storage_client))
[2018-05-25 07:05:03,721] {base_task_runner.py:98} INFO - Subtask: Traceback (most recent call last):
[2018-05-25 07:05:03,725] {base_task_runner.py:98} INFO - Subtask:   File "/usr/local/bin/airflow", line 27, in <module>
[2018-05-25 07:05:03,726] {base_task_runner.py:98} INFO - Subtask:     args.func(args)
[2018-05-25 07:05:03,729] {base_task_runner.py:98} INFO - Subtask:   File "/usr/local/lib/python2.7/site-packages/airflow/bin/cli.py", line 392, in run
[2018-05-25 07:05:03,729] {base_task_runner.py:98} INFO - Subtask:     pool=args.pool,
[2018-05-25 07:05:03,731] {base_task_runner.py:98} INFO - Subtask:   File "/usr/local/lib/python2.7/site-packages/airflow/utils/db.py", line 50, in wrapper
[2018-05-25 07:05:03,732] {base_task_runner.py:98} INFO - Subtask:     result = func(*args, **kwargs)
[2018-05-25 07:05:03,734] {base_task_runner.py:98} INFO - Subtask:   File "/usr/local/lib/python2.7/site-packages/airflow/models.py", line 1492, in _run_raw_task
[2018-05-25 07:05:03,738] {base_task_runner.py:98} INFO - Subtask:     result = task_copy.execute(context=context)
[2018-05-25 07:05:03,740] {base_task_runner.py:98} INFO - Subtask:   File "/usr/local/lib/python2.7/site-packages/airflow/contrib/operators/dataflow_operator.py", line 313, in execute
[2018-05-25 07:05:03,746] {base_task_runner.py:98} INFO - Subtask:     self.py_file, self.py_options)
[2018-05-25 07:05:03,748] {base_task_runner.py:98} INFO - Subtask:   File "/usr/local/lib/python2.7/site-packages/airflow/contrib/hooks/gcp_dataflow_hook.py", line 188, in start_python_dataflow
[2018-05-25 07:05:03,751] {base_task_runner.py:98} INFO - Subtask:     label_formatter)
[2018-05-25 07:05:03,753] {base_task_runner.py:98} INFO - Subtask:   File "/usr/local/lib/python2.7/site-packages/airflow/contrib/hooks/gcp_dataflow_hook.py", line 158, in _start_dataflow
[2018-05-25 07:05:03,756] {base_task_runner.py:98} INFO - Subtask:     _Dataflow(cmd).wait_for_done()
[2018-05-25 07:05:03,757] {base_task_runner.py:98} INFO - Subtask:   File "/usr/local/lib/python2.7/site-packages/airflow/contrib/hooks/gcp_dataflow_hook.py", line 129, in wait_for_done
[2018-05-25 07:05:03,759] {base_task_runner.py:98} INFO - Subtask:     line = self._line(fd)
[2018-05-25 07:05:03,761] {base_task_runner.py:98} INFO - Subtask:   File "/usr/local/lib/python2.7/site-packages/airflow/contrib/hooks/gcp_dataflow_hook.py", line 110, in _line
[2018-05-25 07:05:03,763] {base_task_runner.py:98} INFO - Subtask:     line = lines[-1][:-1]
[2018-05-25 07:05:03,766] {base_task_runner.py:98} INFO - Subtask: IndexError: list index out of range

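I looked the file up, but the line numbers in the traceback do not match, which makes me think the Airflow instance actually running in Cloud Composer is outdated. Is there any way to update it?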

This will be fixed in 1.10 or 2.0.

Take a look at this PR.

This has been merged into master. In the meantime, you can create your own plugin using the code from this PR.
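
For illustration only, here is a minimal sketch of the kind of guard such a plugin might add. It assumes the crash happens because the read from the subprocess returned an empty list, so the expression `lines[-1][:-1]` from the traceback has nothing to index. The helper names `unguarded_last_line` and `safe_last_line` are hypothetical; they are not taken from the PR or from Airflow itself.

```python
def unguarded_last_line(lines):
    # Same expression as in the traceback: raises IndexError on an empty list.
    return lines[-1][:-1]


def safe_last_line(lines):
    """Return the last line without its trailing newline, or "" if nothing was read."""
    if not lines:
        # Nothing was read this time (assumed cause of the crash), so do not index.
        return ""
    return lines[-1].rstrip("\n")


if __name__ == "__main__":
    print(repr(safe_last_line(["first\n", "second\n"])))
    print(repr(safe_last_line([])))
    try:
        unguarded_last_line([])
    except IndexError as exc:
        print("unguarded version failed:", exc)
```

Whether an empty read should be ignored or handled some other way depends on how the hook polls the process, so treat this as a starting point rather than the actual fix.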