
Cloud Dataflow write to BigQuery Python error


I am writing a simple Beam job to copy data from a GCS bucket into BigQuery. The code looks like this:

import sys

import apache_beam as beam
from apache_beam.options.pipeline_options import GoogleCloudOptions

# PROJECT_ID, JOB_NAME and BUCKET are defined elsewhere
pipeline_options = GoogleCloudOptions(flags=sys.argv[1:])
pipeline_options.project = PROJECT_ID
pipeline_options.region = 'us-west1'
pipeline_options.job_name = JOB_NAME
pipeline_options.staging_location = BUCKET + '/binaries'
pipeline_options.temp_location = BUCKET + '/temp'

schema = 'id:INTEGER,region:STRING,population:INTEGER,sex:STRING,age:INTEGER,education:STRING,income:FLOAT,statusquo:FLOAT,vote:STRING'
p = (beam.Pipeline(options=pipeline_options)
     | 'ReadFromGCS' >> beam.io.textio.ReadFromText('Chile.csv')
     | 'WriteToBigQuery' >> beam.io.WriteToBigQuery('project:tmp.dummy', schema=schema))
where we write to the table tmp.dummy in our project. This results in the following stack trace:

Traceback (most recent call last):
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/runpy.py", line 151, in _run_module_as_main
    mod_name, loader, code, fname = _get_module_details(mod_name)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/runpy.py", line 101, in _get_module_details
    loader = get_loader(mod_name)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pkgutil.py", line 464, in get_loader
    return find_loader(fullname)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pkgutil.py", line 474, in find_loader
    for importer in iter_importers(fullname):
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pkgutil.py", line 430, in iter_importers
    __import__(pkg)
  File "WriteToBigQuery.py", line 49, in <module>
    | 'WriteToBigQuery' >> beam.io.WriteToBigQuery(str(PROJECT_ID + ':' + pipeline_options.write_file), schema = schema))
  File "/Users/mayansalama/Documents/GCP/gcloud_env/lib/python2.7/site-packages/apache_beam/io/gcp/bigquery.py", line 1337, in __init__
    self.table_reference = _parse_table_reference(table, dataset, project)
  File "/Users/mayansalama/Documents/GCP/gcloud_env/lib/python2.7/site-packages/apache_beam/io/gcp/bigquery.py", line 309, in _parse_table_reference
    if isinstance(table, bigquery.TableReference):
AttributeError: 'module' object has no attribute 'TableReference'

It looks like some import has gone wrong somewhere; could this be caused by using the GoogleCloudOptions pipeline options?
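For what it's worth, this particular AttributeError usually means the plain apache-beam package was installed without its GCP extras, in which case the internal BigQuery client module that WriteToBigQuery relies on is an empty stub. A small diagnostic sketch (has_gcp_extras is a hypothetical helper, and the exact import path can vary between Beam versions; this one matches the path shown in the traceback):

```python
def has_gcp_extras():
    # Without apache-beam[gcp], the GCP client classes used by
    # WriteToBigQuery (including TableReference) are unavailable.
    try:
        from apache_beam.io.gcp.internal.clients import bigquery
    except ImportError:
        return False
    return hasattr(bigquery, 'TableReference')

print(has_gcp_extras())
```

If this prints False, reinstalling with the [gcp] extra (as suggested in the answers below) is the likely fix.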

I ran some tests and could not reproduce your issue. Does the dataset already exist? The following snippet works for me (I posted it as an answer for better formatting):

where dummy.csv contains:

$ cat dummy.csv 
1,us-central1 
2,europe-west1 
The output in BigQuery is:
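One key difference from the question's pipeline is that WriteToBigQuery expects one dict per row, keyed by the schema's field names, while ReadFromText emits raw strings. A minimal sketch of the parsing step that would sit between them (parse_line is a hypothetical helper, with field names matching dummy.csv rather than the answerer's exact code):

```python
import csv

def parse_line(line, fieldnames=('id', 'region')):
    # Turn one CSV line into the dict shape WriteToBigQuery expects,
    # e.g. for use as beam.Map(parse_line) between the read and the write.
    values = next(csv.reader([line]))
    row = dict(zip(fieldnames, values))
    row['id'] = int(row['id'])  # coerce to match an INTEGER column
    return row

print(parse_line('1,us-central1'))  # {'id': 1, 'region': 'us-central1'}
```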

These are the relevant dependencies I used:

apache-beam==2.4.0
google-cloud-bigquery==0.25.0
google-cloud-dataflow==2.4.0

I got the same error. I realized I had installed the wrong Apache Beam package: when installing Apache Beam, you need to add [gcp] to the package name:

sudo pip install apache_beam[gcp]
There are also some optional installs to fix installation errors, and then you are good to go:

sudo pip install oauth2client==3.0.0
sudo pip install httplib2==0.9.2

I tried to install apache_beam[gcp] on a Mac and it returned no results, although the same approach worked fine on a Linux distribution. The fix was simply to put quotes around apache_beam[gcp]; it is a zsh issue.
sudo pip install oauth2client==3.0.0
sudo pip install httplib2==0.9.2
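To expand on the zsh point above: zsh treats the square brackets in apache_beam[gcp] as a glob pattern, and when no file matches it reports "no matches found" instead of running pip. Quoting the requirement specifier (a shell-level workaround, not a pip option) avoids this in both shells:

```shell
# bash usually passes apache_beam[gcp] through untouched,
# but zsh expands the brackets as a glob; quoting is safe in both:
pip install 'apache-beam[gcp]'
```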