How to parallelize metadata ingestion jobs in Amundsen?

Tags: neo4j, parallel-processing, python-multiprocessing, snowflake-cloud-data-platform, lyft-api

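I am trying to parallelize the metadata ingestion job in my project, which uses Amundsen, but I am running into some problems. Below is the code snippet. I parallelize at the account level in Snowflake and then ingest the metadata fetched from Snowflake into Neo4j.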

import multiprocessing
import time


def process_all_snowflake_accounts():
    """Function that loops through all the SF accounts"""
    snowflake_config = read_snowflake_configuration()
    start_time = time.time()
    processes = []
    # Start one process per Snowflake account
    for ac_key, ac_config in snowflake_config.items():
        process = multiprocessing.Process(target=multiprocessing_snowflake_accounts, args=(ac_key, ac_config))
        processes.append(process)
        process.start()
    # Wait for every account process to finish
    for process in processes:
        process.join()
    # cpu_count must be called; printing the bare name shows the function object
    print("CPU Unit: ", multiprocessing.cpu_count())
    print('****************************************************************')
    print('Total time taken: ', time.time() - start_time)
    print('****************************************************************')
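
On rare occasions the code above silently skips some account databases without raising any error, but most of the time it fails with the error shown below: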

"Scanning Snowflake ..."
"Process account: Account-1"
"Process account: Account-2"
"Process account: Account-3"
"Launching job for Account-1-DB-1"
"Launching job for Account-2-DB-1"
"Launching job for Account-3-DB-1"
"Launching job for Account-2-DB-2"
"Launching job for Account-1-DB-2"
ERROR:databuilder.publisher.neo4j_csv_publisher:Failed to publish. Rolling back.
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/amundsen_databuilder-2.6.4-py3.7.egg/databuilder/publisher/neo4j_csv_publisher.py", line 202, in publish_impl
    tx = self._publish_node(node_file, tx=tx)
  File "/usr/local/lib/python3.7/site-packages/amundsen_databuilder-2.6.4-py3.7.egg/databuilder/publisher/neo4j_csv_publisher.py", line 266, in _publish_node
    with open(node_file, 'r', encoding='utf8') as node_csv:
FileNotFoundError: [Errno 2] No such file or directory: '/var/tmp/amundsen/tables/nodes/Description_4.csv'
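
The traceback shows the publisher failing to open a CSV under /var/tmp/amundsen/tables/nodes, a path that carries no account or database component. One possible explanation, which the log alone does not confirm, is that the concurrently running jobs share this staging directory, so that one job's cleanup removes files another job is still publishing. The following is a minimal sketch of isolating the staging directories under that assumption. The names job_tmp_dirs, run_account and run_all are hypothetical, and the actual job launch is left as a comment because multiprocessing_snowflake_accounts is not shown in the question. The sketch also checks exitcode after join() so that a failed child process does not go unnoticed, which may help with the silently skipped databases.

```python
import multiprocessing
import os
import tempfile


def job_tmp_dirs(base_dir, account, database):
    """Return (node_dir, relationship_dir) unique to one account/database job."""
    job_root = os.path.join(base_dir, account, database)
    node_dir = os.path.join(job_root, "nodes")
    rel_dir = os.path.join(job_root, "relationships")
    os.makedirs(node_dir, exist_ok=True)
    os.makedirs(rel_dir, exist_ok=True)
    return node_dir, rel_dir


def run_account(base_dir, account, databases):
    """Hypothetical worker: one pair of staging folders per database."""
    for database in databases:
        node_dir, rel_dir = job_tmp_dirs(base_dir, account, database)
        # Assumption: node_dir / rel_dir would be passed to the job
        # configuration here instead of a path shared by every process.
        print("Launching job for", account, database, "->", node_dir)


def run_all(base_dir, accounts):
    """Start one process per account and return accounts whose process failed."""
    processes = {}
    for account, databases in accounts.items():
        proc = multiprocessing.Process(
            target=run_account, args=(base_dir, account, databases)
        )
        proc.start()
        processes[account] = proc
    failed = []
    for account, proc in processes.items():
        proc.join()
        # A non-zero exit code means the child ended abnormally
        # (for example with an uncaught exception).
        if proc.exitcode != 0:
            failed.append(account)
    return failed


if __name__ == "__main__":
    # Illustrative account/database names only.
    accounts = {"Account-1": ["DB-1", "DB-2"], "Account-2": ["DB-1"]}
    with tempfile.TemporaryDirectory() as base:
        print("Failed accounts:", run_all(base, accounts))
```

With this layout, Account-1/DB-1 and Account-2/DB-1 write to different folders, so removing one job's files cannot affect the other. Whether this resolves the error depends on how the loader and publisher paths are configured in the real job.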