
Python 3.x: IndexError with a multiprocessing Pool (tags: python-3.x, pandas, multiprocessing)


I'm getting an IndexError while using multiprocessing to process parts of a pandas DataFrame in parallel. `vacancies` is a DataFrame of job vacancies, one column of which is the raw text body.

import pickle

import numpy as np
import pandas as pd


def addSkillRelevance(vacancies):
    skills = pickle.load(open("skills.pkl", "rb"))

    vacancies['skill'] = ''
    vacancies['skillcount'] = 0
    vacancies['all_skills_in_vacancy'] = ''
    new_vacancies = pd.DataFrame(columns=vacancies.columns)

    for vacancy_index, vacancy_row in vacancies.iterrows():

        #Create a df for which each row is a found skill (with the other attributes of the vacancy)
        per_vacancy_df = pd.DataFrame(columns=vacancies.columns)
        all_skills_in_vacancy = []
        skillcount = 0

        for skill_index, skill_row in skills.iterrows():

            #Making the search for the skill in the text body a bit smarter
            spaceafter = ' ' + skill_row['txn_skill_name'] + ' '
            newlineafter = ' ' + skill_row['txn_skill_name'] + '\n'
            tabafter = ' ' + skill_row['txn_skill_name'] + '\t'

            #Statement that returns true if we find a variation of the skill in the text body
            if((spaceafter in vacancies.at[vacancy_index,'body']) or (newlineafter in vacancies.at[vacancy_index,'body']) or (tabafter in vacancies.at[vacancy_index,'body'])):
                #Adding the skill to the list of skills found in the vacancy
                all_skills_in_vacancy.append(skill_row['txn_skill_name'])

                #Increasing the skillcount
                skillcount += 1

                #Adding the skill to the row
                vacancies.at[vacancy_index,'skill'] = skill_row['txn_skill_name']

                #Add a row to the vacancy df where 1 row, means 1 skill
                per_vacancy_df = per_vacancy_df.append(vacancies.iloc[vacancy_index])

        #Adding the list of all found skills in the vacancy to each (skill) row
        per_vacancy_df['all_skills_in_vacancy'] = str(all_skills_in_vacancy)
        per_vacancy_df['skillcount'] = skillcount

        #Adds the individual vacancy df to a new vacancy df
        new_vacancies = new_vacancies.append(per_vacancy_df)  
    return(new_vacancies)

def executeSkillScript(vacancies):
        from multiprocessing import Pool

        vacancies = vacancies.head(100298)

        num_workers = 47
        pool = Pool(num_workers)

        vacancy_splits = np.array_split(vacancies, num_workers)
        results_list = pool.map(addSkillRelevance,vacancy_splits)
        new_vacancies = pd.concat(results_list, axis=0)

        pool.close()
        pool.join()

executeSkillScript(vacancies)
The function `addSkillRelevance()` takes a pandas DataFrame and returns a pandas DataFrame (with more columns). For some reason, after all the multiprocessing completes, I get an IndexError on `results_list = pool.map(addSkillRelevance, vacancy_splits)`. I'm stuck because I don't know how to handle this error. Does anyone know why the IndexError occurs?

The error:

    IndexError                                Traceback (most recent call last)
<ipython-input-11-7cb04a51c051> in <module>()
----> 1 executeSkillScript(vacancies)

<ipython-input-9-5195d46f223f> in executeSkillScript(vacancies)
     14 
     15     vacancy_splits = np.array_split(vacancies, num_workers)
---> 16     results_list = pool.map(addSkillRelevance,vacancy_splits)
     17     new_vacancies = pd.concat(results_list, axis=0)
     18 

~/anaconda3/envs/amazonei_tensorflow_p36/lib/python3.6/multiprocessing/pool.py in map(self, func, iterable, chunksize)
    264         in a list that is returned.
    265         '''
--> 266         return self._map_async(func, iterable, mapstar, chunksize).get()
    267 
    268     def starmap(self, func, iterable, chunksize=None):

~/anaconda3/envs/amazonei_tensorflow_p36/lib/python3.6/multiprocessing/pool.py in get(self, timeout)
    642             return self._value
    643         else:
--> 644             raise self._value
    645 
    646     def _set(self, i, obj):

IndexError: single positional indexer is out-of-bounds

As suggested in the comments, the error comes from this line:

per_vacancy_df = per_vacancy_df.append(vacancies.iloc[vacancy_index])

The error occurs because `vacancy_index` is an index label from `iterrows()`, not a position: `np.array_split` preserves each chunk's original index labels, so for every chunk after the first, `vacancy_index` exceeds the chunk's length and positional `.iloc` raises IndexError. Using label-based `.loc[vacancy_index]` instead fixes it.
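The cause can be reproduced in a few lines (a toy DataFrame; the names are illustrative, not from the question): `np.array_split` keeps the original index labels on each chunk, so using a label as a position goes out of bounds, while label-based `.loc` succeeds:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"body": ["a", "b", "c", "d", "e", "f"]})

# Split into two chunks; each chunk keeps its ORIGINAL index labels.
first, second = np.array_split(df, 2)
print(second.index.tolist())  # [3, 4, 5] -- labels 3..5, but only 3 rows

# iterrows() on `second` yields the label 3 for its first row;
# .iloc treats 3 as a position, and the chunk only has positions 0..2.
try:
    second.iloc[3]
except IndexError as exc:
    print(exc)  # single positional indexer is out-of-bounds

# .loc looks the label up instead, so it succeeds.
row = second.loc[3]
```

This is exactly the situation inside `addSkillRelevance` once the pool hands it any chunk other than the first.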

Could you include the code of `addSkillRelevance`? The error probably comes from an operation you perform on the DataFrame.

Hi Brandon, I've included the `addSkillRelevance` function.
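For reference, a minimal runnable sketch of the corrected approach, with toy data and a simplified substring match (all names here are illustrative): use label-based `.loc` inside the loop, and collect rows for a single `pd.concat` at the end, since `DataFrame.append` was deprecated and then removed in pandas 2.0:

```python
import numpy as np
import pandas as pd

# Toy stand-ins for the question's data (illustrative values only).
vacancies = pd.DataFrame({"body": ["python sql ", "java ", "python "]})
skills = pd.DataFrame({"txn_skill_name": ["python", "sql"]})

def add_skill_relevance(chunk):
    """Return one output row per (vacancy, skill) match, as in the question."""
    rows = []
    for vacancy_index, vacancy_row in chunk.iterrows():
        found = [s for s in skills["txn_skill_name"]
                 if f"{s} " in vacancy_row["body"]]
        for skill in found:
            # .loc is label-based, so it is safe even though array_split
            # chunks keep their original (non-zero-based) index labels.
            row = chunk.loc[vacancy_index].copy()
            row["skill"] = skill
            row["skillcount"] = len(found)
            row["all_skills_in_vacancy"] = str(found)
            rows.append(row)
    return pd.DataFrame(rows)

# Each chunk could be handed to Pool.map; run serially here for brevity.
chunks = np.array_split(vacancies, 2)
result = pd.concat([add_skill_relevance(c) for c in chunks], axis=0)
print(result["skill"].tolist())  # ['python', 'sql', 'python']
```

The second chunk here contains only the row with label 2, so `.iloc[2]` would have raised the question's IndexError, while `.loc[2]` resolves the label correctly.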