
Python 3.x: IndexError with a multiprocessing Pool (tags: python-3.x, pandas, multiprocessing)


I'm getting an IndexError while using multiprocessing to process parts of a pandas DataFrame in parallel. `vacancies` is a DataFrame of job vacancies, one column of which is the raw text body.

import pickle

import numpy as np
import pandas as pd


def addSkillRelevance(vacancies):
    skills = pickle.load(open("skills.pkl", "rb"))

    vacancies['skill'] = ''
    vacancies['skillcount'] = 0
    vacancies['all_skills_in_vacancy'] = ''
    new_vacancies = pd.DataFrame(columns=vacancies.columns)

    for vacancy_index, vacancy_row in vacancies.iterrows():

        #Create a df for which each row is a found skill (with the other attributes of the vacancy)
        per_vacancy_df = pd.DataFrame(columns=vacancies.columns)
        all_skills_in_vacancy = []
        skillcount = 0

        for skill_index, skill_row in skills.iterrows():

            #Making the search for the skill in the text body a bit smarter
            spaceafter = ' ' + skill_row['txn_skill_name'] + ' '
            newlineafter = ' ' + skill_row['txn_skill_name'] + '\n'
            tabafter = ' ' + skill_row['txn_skill_name'] + '\t'

            #Statement that returns true if we find a variation of the skill in the text body
            if((spaceafter in vacancies.at[vacancy_index,'body']) or (newlineafter in vacancies.at[vacancy_index,'body']) or (tabafter in vacancies.at[vacancy_index,'body'])):
                #Adding the skill to the list of skills found in the vacancy
                all_skills_in_vacancy.append(skill_row['txn_skill_name'])

                #Increasing the skillcount
                skillcount += 1

                #Adding the skill to the row
                vacancies.at[vacancy_index,'skill'] = skill_row['txn_skill_name']

                #Add a row to the vacancy df where 1 row, means 1 skill
                per_vacancy_df = per_vacancy_df.append(vacancies.iloc[vacancy_index])

        #Adding the list of all found skills in the vacancy to each (skill) row
        per_vacancy_df['all_skills_in_vacancy'] = str(all_skills_in_vacancy)
        per_vacancy_df['skillcount'] = skillcount

        #Adds the individual vacancy df to a new vacancy df
        new_vacancies = new_vacancies.append(per_vacancy_df)  
    return(new_vacancies)

def executeSkillScript(vacancies):
        from multiprocessing import Pool

        vacancies = vacancies.head(100298)

        num_workers = 47
        pool = Pool(num_workers)

        vacancy_splits = np.array_split(vacancies, num_workers)
        results_list = pool.map(addSkillRelevance,vacancy_splits)
        new_vacancies = pd.concat(results_list, axis=0)

        pool.close()
        pool.join()

executeSkillScript(vacancies)
The function `addSkillRelevance()` takes a pandas DataFrame and returns a pandas DataFrame (with more columns). For some reason, after all the multiprocessing completes, I get an IndexError on `results_list = pool.map(addSkillRelevance, vacancy_splits)`. I'm stuck because I don't know how to handle this error. Does anyone know why the IndexError occurs?

The error:

    IndexError                                Traceback (most recent call last)
<ipython-input-11-7cb04a51c051> in <module>()
----> 1 executeSkillScript(vacancies)

<ipython-input-9-5195d46f223f> in executeSkillScript(vacancies)
     14 
     15     vacancy_splits = np.array_split(vacancies, num_workers)
---> 16     results_list = pool.map(addSkillRelevance,vacancy_splits)
     17     new_vacancies = pd.concat(results_list, axis=0)
     18 

~/anaconda3/envs/amazonei_tensorflow_p36/lib/python3.6/multiprocessing/pool.py in map(self, func, iterable, chunksize)
    264         in a list that is returned.
    265         '''
--> 266         return self._map_async(func, iterable, mapstar, chunksize).get()
    267 
    268     def starmap(self, func, iterable, chunksize=None):

~/anaconda3/envs/amazonei_tensorflow_p36/lib/python3.6/multiprocessing/pool.py in get(self, timeout)
    642             return self._value
    643         else:
--> 644             raise self._value
    645 
    646     def _set(self, i, obj):

IndexError: single positional indexer is out-of-bounds

As suggested in the comments, the error comes from this line:

per_vacancy_df = per_vacancy_df.append(vacancies.iloc[vacancy_index])

The error occurs because `vacancy_index` is an index label from `iterrows()`, not a position: `np.array_split` preserves each chunk's original index labels, so for every chunk after the first, `vacancy_index` exceeds the chunk's length and positional `.iloc` raises IndexError. Using label-based `.loc[vacancy_index]` instead fixes it.
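The cause can be reproduced in a few lines (a toy DataFrame; the names are illustrative, not from the question): `np.array_split` keeps the original index labels on each chunk, so using a label as a position goes out of bounds, while label-based `.loc` succeeds:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"body": ["a", "b", "c", "d", "e", "f"]})

# Split into two chunks; each chunk keeps its ORIGINAL index labels.
first, second = np.array_split(df, 2)
print(second.index.tolist())  # [3, 4, 5] -- labels 3..5, but only 3 rows

# iterrows() on `second` yields the label 3 for its first row;
# .iloc treats 3 as a position, and the chunk only has positions 0..2.
try:
    second.iloc[3]
except IndexError as exc:
    print(exc)  # single positional indexer is out-of-bounds

# .loc looks the label up instead, so it succeeds.
row = second.loc[3]
```

This is exactly the situation inside `addSkillRelevance` once the pool hands it any chunk other than the first.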

Could you include the code of `addSkillRelevance`? The error probably comes from an operation you perform on the DataFrame.

Hi Brandon, I've included the `addSkillRelevance` function.
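For reference, a minimal runnable sketch of the corrected approach, with toy data and a simplified substring match (all names here are illustrative): use label-based `.loc` inside the loop, and collect rows for a single `pd.concat` at the end, since `DataFrame.append` was deprecated and then removed in pandas 2.0:

```python
import numpy as np
import pandas as pd

# Toy stand-ins for the question's data (illustrative values only).
vacancies = pd.DataFrame({"body": ["python sql ", "java ", "python "]})
skills = pd.DataFrame({"txn_skill_name": ["python", "sql"]})

def add_skill_relevance(chunk):
    """Return one output row per (vacancy, skill) match, as in the question."""
    rows = []
    for vacancy_index, vacancy_row in chunk.iterrows():
        found = [s for s in skills["txn_skill_name"]
                 if f"{s} " in vacancy_row["body"]]
        for skill in found:
            # .loc is label-based, so it is safe even though array_split
            # chunks keep their original (non-zero-based) index labels.
            row = chunk.loc[vacancy_index].copy()
            row["skill"] = skill
            row["skillcount"] = len(found)
            row["all_skills_in_vacancy"] = str(found)
            rows.append(row)
    return pd.DataFrame(rows)

# Each chunk could be handed to Pool.map; run serially here for brevity.
chunks = np.array_split(vacancies, 2)
result = pd.concat([add_skill_relevance(c) for c in chunks], axis=0)
print(result["skill"].tolist())  # ['python', 'sql', 'python']
```

The second chunk here contains only the row with label 2, so `.iloc[2]` would have raised the question's IndexError, while `.loc[2]` resolves the label correctly.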