Python: How do I create a label column for lists of strings based on another DataFrame?

python pandas

I have the following DataFrames:

import pandas as pd
df_occurencies = pd.DataFrame({'day':[1,2,3,4,5],
                           'occ':[['frog','wasp','bee'],
                           ['frog','whale','barley','orchid'],
                           ['orchid','barley','frog'],
                           ['orchid','whale','frog'],
                           ['orchid','barley','tulip']]})

df_kingdoms = pd.DataFrame({'item':['frog','wasp','bee',
                              'whale','barley','orchid',
                              'tulip'],
                      'kingdom':['animalia','animalia','animalia',
                              'animalia','plantae','plantae',
                              'plantae']})

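I need to add another column that classifies the observations in the occ column according to the values in df_kingdoms.
The values are heterogeneous, so the expected result is as follows: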
    day                     occ        desired_result
0    1              [frog, wasp, bee]   "animals"
1    2  [frog, whale, barley, orchid]   "animals and plants"
2    3         [orchid, barley, frog]   "mostly plants"
3    4          [orchid, whale, frog]   "mostly animals"
4    5        [orchid, barley, tulip]   "plants"

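I know there are many ways to approach this. I tried a function built from a lot of .loc calls, but none of my attempts worked, and I don't think they are worth posting. I need to run this on a large dataset, so the faster the better.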
This should work:
# Map each item to its kingdom, e.g. {'frog': 'animalia', 'barley': 'plantae', ...}
dic_kd = dict(zip(df_kingdoms['item'], df_kingdoms['kingdom']))

desired_output = []
for items in df_occurencies['occ']:
    # Look up the kingdom of every item in this row's list
    kingdoms = [dic_kd[item] for item in items]
    n_animals = kingdoms.count('animalia')
    n_plants = kingdoms.count('plantae')
    if n_animals and not n_plants:
        desired_output.append('animals')
    elif n_plants and not n_animals:
        desired_output.append('plants')
    elif n_animals > n_plants:
        desired_output.append('mostly animals')
    elif n_animals < n_plants:
        desired_output.append('mostly plants')
    else:
        # Equal counts of animals and plants
        desired_output.append('animals and plants')

df_occurencies['desired output'] = desired_output
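
Since the question asks for something fast on a large dataset, here is a rough sketch of a vectorized variant of the same logic. It is not part of the original answer. It assumes pandas 0.25 or later (for Series.explode) and that every item appears exactly once in df_kingdoms. The column name desired_result is borrowed from the expected table in the question. Whether it actually beats the loop depends on the data, so it is worth timing both.

```python
import numpy as np
import pandas as pd

df_occurencies = pd.DataFrame({'day': [1, 2, 3, 4, 5],
                               'occ': [['frog', 'wasp', 'bee'],
                                       ['frog', 'whale', 'barley', 'orchid'],
                                       ['orchid', 'barley', 'frog'],
                                       ['orchid', 'whale', 'frog'],
                                       ['orchid', 'barley', 'tulip']]})
df_kingdoms = pd.DataFrame({'item': ['frog', 'wasp', 'bee', 'whale',
                                     'barley', 'orchid', 'tulip'],
                            'kingdom': ['animalia', 'animalia', 'animalia', 'animalia',
                                        'plantae', 'plantae', 'plantae']})

# One row per (original row, item); the index repeats the original row label.
exploded = df_occurencies['occ'].explode()

# Translate each item to its kingdom via a lookup Series indexed by item
# (assumes item values are unique in df_kingdoms).
kingdoms = exploded.map(df_kingdoms.set_index('item')['kingdom'])

# Count animals and plants per original row.
n_animals = (kingdoms == 'animalia').groupby(level=0).sum()
n_plants = (kingdoms == 'plantae').groupby(level=0).sum()

# Same branch order as the loop: the first matching condition wins.
conditions = [
    (n_animals > 0) & (n_plants == 0),
    (n_animals == 0) & (n_plants > 0),
    n_animals > n_plants,
    n_animals < n_plants,
]
labels = ['animals', 'plants', 'mostly animals', 'mostly plants']

# Wrap the result in a Series so the assignment aligns on the row index.
df_occurencies['desired_result'] = pd.Series(
    np.select(conditions, labels, default='animals and plants'),
    index=n_animals.index,
)
```

Unlike the dictionary lookup, map leaves NaN for items missing from df_kingdoms instead of raising a KeyError, so such items are simply not counted.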
