Python pandas: Excel-like nested COUNTIFS
Tags: python, pandas, dataframe, excel-formula, pandas-groupby
I've hit a roadblock here. I have to translate this Excel formula into pandas:
=IF(COUNTIFS(advisor!$C:$C,$A3)=0,"0 disclosed",
IF(COUNTIFS(advisor!$C:$C,$A3,advisor!$E:$E,2)>0,"Dependent",
IF(IF(COUNTIFS(advisor!$C:$C,$A3,advisor!$B:$B,"auditor")>0,1,0)+IF(COUNTIFS(advisor!$C:$C,$A3,advisor!$B:$B,"compensation")>0,1,0)=2,"Independent","1 disclosed")))
This is my Python solution so far:
df['auditor_compensation'] = np.where(df['id'].isin(df_advisor['company_id']).count() == 0,
'0 disclosed',
np.where(df_advisor['dependent'] == 2, 'dependent',
np.where((np.where(df_advisor['type']=='auditor', 1, 0)+np.where(df_advisor['type']=='compensation', 1, 0)) == 2, 'independent', '1 disclosed')))
I keep getting ValueError: Length of values does not match length of index
Sample data for df (company data):
id ticker iq_id company auditor_compensation
48299 ENXTAM:AALB IQ881736 Aalberts Industries ?
48752 ENXTAM:ABN IQ1090191 ABN AMRO Group ?
48865 ENXTAM:ACCEL IQ4492981 Accell Group ?
49226 ENXTAM:AGN IQ247906 AEGON ?
49503 ENXTAM:AD IQ373545 Koninklijke ?
Here is sample data for df_advisor:
id type company_id advisor_company_id dependent
1 auditor 4829 6091 1
17 auditor 4875 16512 1
6359 auditor 4886 7360 1
37 auditor 4922 8187 1
4415 compensation 4922 9025 1
53 auditor 4950 8187 1
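Any help is much appreciated.
Your numpy.where calls do not produce an array or series with the same length as your original dataframe. This is because you are trying to combine misaligned conditions: for example, df['id'] and df_advisor['dependent'] have different lengths.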
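The mismatch can be reproduced with a minimal sketch. The two small frames below are made-up stand-ins for df and df_advisor (the values are illustrative, not the asker's real data). In the question's code the outermost condition is a single scalar, so the result takes its length from the inner df_advisor expressions:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins: 5 companies, 6 advisor rows
df = pd.DataFrame({'id': [4829, 4875, 4886, 4922, 4950]})
df_advisor = pd.DataFrame({
    'type': ['auditor', 'auditor', 'auditor', 'auditor', 'compensation', 'auditor'],
    'company_id': [4829, 4875, 4886, 4922, 4922, 4950],
    'dependent': [1, 1, 1, 1, 1, 1],
})

# np.where returns one value per row of df_advisor, not one per row of df
labels = np.where(df_advisor['dependent'] == 2, 'dependent', '1 disclosed')
print(len(labels), len(df))

# Assigning an array of the wrong length to a column raises ValueError
try:
    df['auditor_compensation'] = labels
    error_message = None
except ValueError as exc:
    error_message = str(exc)
print(error_message)
```

Any fix therefore has to reduce df_advisor to one row per company before the result is written back to df.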
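While it is tempting to translate the Excel formula term by term into Pandas/NumPy, it will probably be more efficient and more readable to aggregate with groupby, merge the result into the main dataframe, and apply np.select.
Step 1: Group the mapping dataframe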
df_advisor_grouped = df_advisor.groupby('company_id')\
.agg({'type': '|'.join, 'dependent': 'sum'})\
.reset_index()
print(df_advisor_grouped)
company_id type dependent
0 4829 auditor 1
1 4875 auditor 1
2 4886 auditor 1
3 4922 auditor|compensation 2
4 4950 auditor 1
Step 2: Merge with the main dataframe
# merge dataframes based on key column
res = df.merge(df_advisor_grouped, left_on='id', right_on='company_id', how='left')
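The left merge keeps every row of df and leaves NaN in the advisor columns for companies that have no advisor rows, which is what the "0 disclosed" test relies on. As a rough illustration, the sketch below uses invented data (company 5000 is hypothetical) together with the indicator option of merge to make the unmatched rows visible:

```python
import pandas as pd

# Hypothetical data: company 5000 has no advisor rows
df = pd.DataFrame({'id': [4829, 4922, 5000]})
df_advisor_grouped = pd.DataFrame({
    'company_id': [4829, 4922],
    'type': ['auditor', 'auditor|compensation'],
    'dependent': [1, 2],
})

# indicator=True adds a '_merge' column describing where each row came from
res = df.merge(df_advisor_grouped, left_on='id', right_on='company_id',
               how='left', indicator=True)
print(res)

# Rows marked 'left_only' exist in df but have no match in the grouped frame
unmatched_ids = res.loc[res['_merge'] == 'left_only', 'id'].tolist()
print(unmatched_ids)
```

Because how='left' never drops rows from df, res keeps the length of df as long as company_id is unique in the grouped frame.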
Step 3: Apply the conditional logic
# define 3 conditions
conds = [res['company_id'].isnull(), res['dependent'].eq(2),
         res['type'].str.contains('auditor', na=False)
         & res['type'].str.contains('compensation', na=False)]
# define 3 choices
choices = ['0 disclosed', 'dependent', 'independent']
# apply np.select logic, including default argument if 3 conditions are not met
res['auditor_compensation'] = np.select(conds, choices, '1 disclosed')
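One caveat about the aggregation: summing dependent gives 2 for a company that has two advisor rows with dependent equal to 1 (as company 4922 does in the printed output), whereas the Excel formula asks whether any single row has a 2 in column E. If dependent is meant as a per-row flag (an assumption about the data), a variant that mirrors each COUNTIFS(...)>0 test with a per-company "any" may be closer to the spreadsheet. The following is only a sketch on invented data; the flag column names and company 5000 are hypothetical:

```python
import numpy as np
import pandas as pd

# Hypothetical data: 4950 has a row with dependent == 2, 5000 has no rows
df = pd.DataFrame({'id': [4829, 4922, 4950, 5000]})
df_advisor = pd.DataFrame({
    'type': ['auditor', 'auditor', 'compensation', 'auditor'],
    'company_id': [4829, 4922, 4922, 4950],
    'dependent': [1, 1, 1, 2],
})

# One boolean flag per COUNTIFS(...) > 0 test, evaluated per advisor row
flags = pd.DataFrame({
    'company_id': df_advisor['company_id'],
    'any_dependent': df_advisor['dependent'].eq(2),
    'has_auditor': df_advisor['type'].eq('auditor'),
    'has_compensation': df_advisor['type'].eq('compensation'),
})
# True for a company if at least one of its rows satisfies the test
per_company = flags.groupby('company_id').any()

# Align to df['id']; companies without advisor rows get False everywhere
aligned = per_company.reindex(df['id'], fill_value=False).astype(bool)
disclosed = df['id'].isin(df_advisor['company_id']).to_numpy()

# Conditions are checked in order, like the nested IFs
conds = [
    ~disclosed,
    aligned['any_dependent'].to_numpy(),
    (aligned['has_auditor'] & aligned['has_compensation']).to_numpy(),
]
choices = ['0 disclosed', 'dependent', 'independent']
df['auditor_compensation'] = np.select(conds, choices, default='1 disclosed')
print(df)
```

With this variant a company with one auditor row and one compensation row, both with dependent equal to 1, is labelled independent rather than dependent.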
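Comments:
Isn't that column missing from df_advisor? And, to be clear, this works in Excel, so it is not an Excel problem.
@Hoenie The column does exist in df_advisor; I have added it to the question.
Now check the length of the vectors returned by np.where, sum them, and consider how the data is being duplicated or dropped. That is a decent starting point.
Very easy to understand, and I managed to learn a few tricks that pointed me in the right direction. Step 2 is done with df and df_advisor_grouped, right? Thanks, man!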