Python: automatically multiprocessing a 'function apply' on a DataFrame column
Tags: python, performance, python-2.7, pandas, multiprocessing


I have a simple DataFrame with two columns:

+---------+-------+
| subject | score |
+---------+-------+
| wow     | 0     |
+---------+-------+
| cool    | 0     |
+---------+-------+
| hey     | 0     |
+---------+-------+
| there   | 0     |
+---------+-------+
| come on | 0     |
+---------+-------+
| welcome | 0     |
+---------+-------+

For each record in the 'subject' column, I call a function and store the result in the 'score' column:

df['score'] = df['subject'].apply(find_score)

Here find_score is a function that processes a string and returns a score:

def find_score (row):
    # Imports the Google Cloud client library
    from google.cloud import language

    # Instantiates a client
    language_client = language.Client()

    import re
    pre_text = re.sub('<[^>]*>', '', row)
    text = re.sub(r'[^\w]', ' ', pre_text)

    document = language_client.document_from_text(text)

    # Detects the sentiment of the text
    sentiment = document.analyze_sentiment().sentiment

    print("Sentiment score - %f " % sentiment.score) 

    return sentiment.score
This works as expected, but it is quite slow because it processes the records one at a time.

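Is there a way to parallelize this without manually splitting the DataFrame into smaller chunks? Is there a library that does this automatically?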


Cheers

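Instantiating language.Client every time the find_score function is called is likely a major bottleneck. You don't need a new client instance for each call, so try creating it once outside the function, before calling it: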

# Imports go at module level so that `language` is defined
# before the client is created
import re
from google.cloud import language

# Instantiates a client once, outside the function
language_client = language.Client()

def find_score(row):
    # Strip HTML tags, then replace non-word characters with spaces
    pre_text = re.sub('<[^>]*>', '', row)
    text = re.sub(r'[^\w]', ' ', pre_text)

    document = language_client.document_from_text(text)

    # Detects the sentiment of the text
    sentiment = document.analyze_sentiment().sentiment

    print("Sentiment score - %f " % sentiment.score)

    return sentiment.score

df['score'] = df['subject'].apply(find_score)
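
To check what text actually reaches the API, the two re.sub calls can be pulled out and run on their own. This is only a sketch: clean_text, TAG_RE and NON_WORD_RE are hypothetical names that do not appear in the original code, and nothing here touches the Google client.

```python
import re

# The same two patterns used inside find_score, compiled once.
TAG_RE = re.compile(r'<[^>]*>')     # anything that looks like an HTML tag
NON_WORD_RE = re.compile(r'[^\w]')  # any character that is not a letter, digit or underscore

def clean_text(raw):
    """Hypothetical helper: strip tags, then replace non-word characters with spaces."""
    without_tags = TAG_RE.sub('', raw)
    return NON_WORD_RE.sub(' ', without_tags)

cleaned = clean_text('<b>come on</b>!')
```

Note that punctuation is replaced by a space rather than removed, so the cleaned string keeps its original length once the tags are gone.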
If you insist, you can use multiprocessing like this:

from multiprocessing import Pool
# <Define functions and datasets here>
pool = Pool(processes = 8) # or some number of your choice
df['score'] = pool.map(find_score, df['subject'])
pool.terminate()
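
Because find_score presumably spends most of its time waiting on a network call, a thread-backed pool may be enough and avoids pickling between processes. The sketch below swaps in multiprocessing.dummy.Pool, which exposes the same map() API backed by threads; it is a different technique from the process pool above, not a restatement of it. fake_find_score and score_all are hypothetical names, and the stand-in scorer returns a dummy value instead of calling the API.

```python
from multiprocessing.dummy import Pool  # thread-backed Pool with the same map() API

def fake_find_score(text):
    """Stand-in for find_score (assumption): returns a dummy score, no network call."""
    return float(len(text))

def score_all(subjects, scorer, workers=8):
    """Apply scorer to every subject on a pool of worker threads.

    pool.map returns results in the same order as the input.
    """
    pool = Pool(workers)
    try:
        return pool.map(scorer, subjects)
    finally:
        # Shut the pool down cleanly whether or not map raised.
        pool.close()
        pool.join()

subjects = ['wow', 'cool', 'hey', 'there', 'come on', 'welcome']
scores = score_all(subjects, fake_find_score)
# With the DataFrame from the question this would look like:
#   df['score'] = score_all(df['subject'], find_score)
```

Whether threads or processes are faster here depends on the client library and on API rate limits, so it is worth timing both on a small sample first.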


Can you show the def of your find_score func?

Consider using dask.

@Allen I have added the function def to the question.