Importing the phonenumbers module with Google Cloud Dataflow (Python)

Tags: python, import, google-cloud-platform, google-cloud-dataflow, importerror

I am trying to do a relatively simple import of the phonenumbers module in Python.

I have already tested the module in a separate Python file with no other imports, and it works perfectly fine.
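
For reference, a standalone check along these lines is enough to confirm the module imports and works locally (a minimal sketch; the phone number is a placeholder, not from the question):

import phonenumbers

# Parse an E.164-formatted number (no default region needed) and validate it.
number = phonenumbers.parse('+16502530000', None)
print(phonenumbers.is_valid_number(number))  # prints True if the module works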

Here are the packages I have installed:

from __future__ import absolute_import
from __future__ import print_function

import argparse
import csv
import logging
import os
import phonenumbers

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions
from apache_beam.options.pipeline_options import SetupOptions
This is my error message:

Traceback (most recent call last):
  File "clean.py", line 114, in <module>
    run()
  File "clean.py", line 109, in run
    | 'WriteOutputText' >> beam.io.WriteToText(known_args.output))
  File "C:\Python27\lib\site-packages\apache_beam\pipeline.py", line 389, in __exit__
    self.run().wait_until_finish()
  File "C:\Python27\lib\site-packages\apache_beam\runners\dataflow\dataflow_runner.py", line 996, in wait_until_finish
    (self.state, getattr(self._runner, 'last_error_msg', None)), self)
apache_beam.runners.dataflow.dataflow_runner.DataflowRuntimeException: Dataflow pipeline failed. State: FAILED, Error:
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/batchworker.py", line 733, in run
    self._load_main_session(self.local_staging_directory)
  File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/batchworker.py", line 472, in _load_main_session
    pickler.load_session(session_file)
  File "/usr/local/lib/python2.7/dist-packages/apache_beam/internal/pickler.py", line 247, in load_session
    return dill.load_session(file_path)
  File "/usr/local/lib/python2.7/dist-packages/dill/dill.py", line 363, in load_session
    module = unpickler.load()
  File "/usr/lib/python2.7/pickle.py", line 864, in load
    dispatch[key](self)
  File "/usr/lib/python2.7/pickle.py", line 1139, in load_reduce
    value = func(*args)
  File "/usr/local/lib/python2.7/dist-packages/dill/dill.py", line 766, in _import_module
    return __import__(import_name)
ImportError: No module named phonenumbers
EDIT: Code:

from __future__ import absolute_import
from __future__ import print_function

import argparse
import csv
import logging
import os
from collections import OrderedDict
import phonenumbers

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions
from apache_beam.options.pipeline_options import SetupOptions


class ParseCSVFn(beam.DoFn):
    """Parses the raw CSV data into a Python dictionary."""

    def process(self, elem):
        try:
            row = list(csv.reader([elem]))[0]
            month, day, year = row[2].split('/')
            birth_dict = {
                'day': day,
                'month': month,
                'year': year,
            }
            order_dict = OrderedDict(birth_dict)
            data_dict = {
                'phoneNumber': row[4],
                'firstName': row[0],
                'lastName': row[1],
                'birthDate': order_dict,
                'voterId': row[3],
            }

            order_data_dict = OrderedDict(data_dict)

            yield order_data_dict

        except Exception:
            # Skip rows that fail to parse instead of failing the whole bundle.
            pass


def run(argv=None):
    """Pipeline entry point, runs the all the necessary processes"""
    parser = argparse.ArgumentParser()
    parser.add_argument('--input',
                        type=str,
                        dest='input',
                        default='gs://wordcount_project/demo-contacts-small*.csv',
                        help='Input file to process.')
    parser.add_argument('--output',
                        dest='output',
                        # CHANGE 1/5: The Google Cloud Storage path is required
                        # for outputting the results.
                        default='gs://wordcount_project/cleaned.csv',
                        help='Output file to write results to.')

    known_args, pipeline_args = parser.parse_known_args(argv)
    pipeline_args.extend([
        # CHANGE 2/5: (OPTIONAL) Change this to DataflowRunner to
        # run your pipeline on the Google Cloud Dataflow Service.
        '--runner=DataflowRunner',
        # CHANGE 3/5: Your project ID is required in order to run your pipeline on
        # the Google Cloud Dataflow Service.
        '--project=--------',
        # CHANGE 4/5: Your Google Cloud Storage path is required for staging local
        # files.
        # '--dataset=game_dataset',
        '--staging_location=gs://wordcount_project/staging',
        # CHANGE 5/5: Your Google Cloud Storage path is required for temporary
        # files.
        '--temp_location=gs://wordcount_project/temp',
        '--job_name=cleaning-jobs',
    ])

    pipeline_options = PipelineOptions(pipeline_args)

    pipeline_options.view_as(SetupOptions).save_main_session = True

    with beam.Pipeline(options=pipeline_options) as p:
        (p
         | 'ReadInputText' >> beam.io.ReadFromText(known_args.input)
         | 'ParseDataFn' >> beam.ParDo(ParseCSVFn())
         # | 'JsonBirthDay' >> beam.ParDo(JsonBirthDay())
         # | 'MatchNumber' >> beam.ParDo(MatchNumber('phoneNumber'))
         # | 'MapData' >> beam.Map(lambda elem: (elem['phoneNumber'], elem['firstName'], elem['lastName'],
         #                                       elem['birthDate'], elem['voterId']))
         | 'WriteOutputText' >> beam.io.WriteToText(known_args.output))


if __name__ == '__main__':
    logging.getLogger().setLevel(logging.INFO)
    run()
I have also tried installing the specific packages for google-gax and requests, but that did not seem to help.

EDIT: New error:

  File "new_clean.py", line 226, in <module>
    run()
  File "new_clean.py", line 219, in run
    | 'WriteToText' >> beam.io.WriteToText(known_args.output)
  File "C:\Python27\lib\site-packages\apache_beam\pipeline.py", line 389, in __exit__
    self.run().wait_until_finish()
  File "C:\Python27\lib\site-packages\apache_beam\pipeline.py", line 369, in run
    self.to_runner_api(), self.runner, self._options).run(False)
  File "C:\Python27\lib\site-packages\apache_beam\pipeline.py", line 382, in run
    return self.runner.run_pipeline(self)
  File "C:\Python27\lib\site-packages\apache_beam\runners\dataflow\dataflow_runner.py", line 324, in run_pipeline
    self.dataflow_client.create_job(self.job), self)
  File "C:\Python27\lib\site-packages\apache_beam\utils\retry.py", line 180, in wrapper
    return fun(*args, **kwargs)
  File "C:\Python27\lib\site-packages\apache_beam\runners\dataflow\internal\apiclient.py", line 461, in create_job
    self.create_job_description(job)
  File "C:\Python27\lib\site-packages\apache_beam\runners\dataflow\internal\apiclient.py", line 491, in create_job_description
    job.options, file_copy=self._gcs_file_copy)
  File "C:\Python27\lib\site-packages\apache_beam\runners\dataflow\internal\dependency.py", line 328, in stage_job_resources
    setup_options.requirements_file, requirements_cache_path)
  File "C:\Python27\lib\site-packages\apache_beam\runners\dataflow\internal\dependency.py", line 262, in _populate_requirements_cache
    processes.check_call(cmd_args)
  File "C:\Python27\lib\site-packages\apache_beam\utils\processes.py", line 44, in check_call
    return subprocess.check_call(*args, **kwargs)
  File "C:\Python27\lib\subprocess.py", line 186, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['C:\\Python27\\python.exe', '-m', 'pip', 'download', '--dest', 'c:\\users\\james\\appdata\\local\\temp\\dataflow-requirements-cache', '-r', 'requirements.txt', '--no-binary', ':all:']' returned non-zero exit status 1
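
The CalledProcessError swallows pip's own output, so the underlying cause is hidden. As a debugging aid (this mirrors the exact command from the traceback rather than anything new), running it manually in the same shell surfaces pip's real error message:

C:\Python27\python.exe -m pip download --dest c:\users\james\appdata\local\temp\dataflow-requirements-cache -r requirements.txt --no-binary :all:

Note that --no-binary :all: forces pip to download source distributions, which fails for any listed package that only publishes wheels.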

It may be that Dataflow is not picking up the file with your pipeline's extra dependencies. To capture them, run:

pip freeze > requirements.txt

Then you will need to edit the
requirements.txt
file and keep only the packages that were installed from PyPI and are actually used in the pipeline.

When running your pipeline, pass the following command-line option:

--requirements_file requirements.txt

This is documented in the Apache Beam documentation.

Hope this helps.
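
For this particular pipeline, the trimmed requirements.txt would likely contain a single line, since phonenumbers is the only import that is neither in the standard library nor part of the Beam SDK:

phonenumbers

The job would then be launched with that extra option, for example:

python clean.py --requirements_file requirements.txt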


Hi Pablo, I made the requirements.txt file and made sure to check all the packages, keeping only the ones from PyPI that I actually use in the pipeline, although I am now getting the error shown in my question.

It worked! I ran it from the Google Cloud SDK shell with no problem; before that I had been using Git Bash... Thanks.
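
A side note on that outcome: if the pip download step keeps failing in a given shell, another commonly used option is to ship the dependency through a setup.py and the --setup_file pipeline option instead of --requirements_file. A minimal sketch, where the package name and version are placeholders:

# setup.py (minimal sketch; name and version are illustrative)
import setuptools

setuptools.setup(
    name='dataflow-clean-pipeline',
    version='0.0.1',
    install_requires=['phonenumbers'],  # installed on each Dataflow worker
    packages=setuptools.find_packages(),
)

The job is then launched with --setup_file ./setup.py, and the workers install phonenumbers at startup.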