Python: speeding up spaCy and CSV export

I need suggestions for tuning the following code:

import sys
import pandas as pd
import spacy

# Split the text into tokens using spaCy
def parsetext(df):
    nlp = spacy.load("en_core_web_sm")
    parsed_tokens = []
    for index, row in df.iterrows():
        filtered_tokens=[]
        doc = nlp(str(row['Column1Text']))
        for word in doc:
            if not word.is_stop:
                filtered_tokens.append(word)
        parsed_tokens.append(filtered_tokens)

    df['Tokens'] = parsed_tokens
    df['Processed'] = 0

# Main method point of entry | Read the excel and generate tokens
def main():
    df = pd.read_excel(sys.argv[1], sheet_name='Sheet1', header=None)
    df.columns = ["Column1Text", "Column2Text", "C3", "C4", "C5", "C6", "C7"]
    # Replace NaN with empty strings
    df["Column1Text"].fillna('', inplace=True)
    df["Column2Text"].fillna('', inplace=True)
    # Parse for tokens
    parsetext(df)
    export_csv = df[['Column1Text', 'Column2Text', 'Tokens', 'Processed']].to_csv(sys.path[0] + r'\parsed_file.csv', index=None, header=True)
The parsetext(df) method takes 2 to 3 seconds to complete.


Any suggestions for speeding this up?

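It should be faster to pull the texts out of the DataFrame first, process them in batches with nlp.pipe(), and then save the results back to the DataFrame.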
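
A minimal sketch of that approach, under a few assumptions that are not in the question: the helper name tokens_without_stop_words, the batch_size value, and passing the loaded nlp object in as an argument (so the model is loaded once by the caller) are all illustrative choices. It also stores token strings rather than spaCy Token objects, which is a deliberate change from the original so the CSV column holds plain text.

```python
def tokens_without_stop_words(nlp, texts, batch_size=256):
    """Return one list of non-stop-word token strings per input text.

    `nlp` is assumed to be a loaded spaCy pipeline (any object with a
    compatible pipe() method works). `batch_size` is an illustrative value.
    """
    # nlp.pipe() streams the texts through the pipeline in batches,
    # avoiding the overhead of calling nlp(text) once per row.
    return [
        [token.text for token in doc if not token.is_stop]
        for doc in nlp.pipe(texts, batch_size=batch_size)
    ]


def parsetext(df, nlp, text_column="Column1Text"):
    """Add the 'Tokens' and 'Processed' columns to df in place."""
    # Extract the column as a plain list once instead of using iterrows().
    texts = df[text_column].astype(str).tolist()
    df["Tokens"] = tokens_without_stop_words(nlp, texts)
    df["Processed"] = 0


def main(xlsx_path):
    # Imported here so the helpers above do not depend on these packages.
    import pandas as pd
    import spacy

    # Load the model once. is_stop is a lexical attribute, so the parser
    # and NER can be disabled (assumption: nothing later needs them).
    nlp = spacy.load("en_core_web_sm", disable=["parser", "ner"])

    df = pd.read_excel(xlsx_path, sheet_name="Sheet1", header=None)
    df.columns = ["Column1Text", "Column2Text", "C3", "C4", "C5", "C6", "C7"]
    df["Column1Text"] = df["Column1Text"].fillna("")
    parsetext(df, nlp)
    return df
```

Whether this helps much depends on where the 2 to 3 seconds go: if most of it is spent in spacy.load(), loading the model once outside parsetext matters more than batching, so it is worth timing both steps separately.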