Pandas: show groupby and agg values in a new column
Tags: pandas, group-by, aggregate
I can't figure out how to get the values grouped by the column 'tin' when its values are equal, in a large dataset.
import pandas as pd
df = pd.DataFrame({'company': ['ABC', 'ABC', 'XYZ', 'XYZ'],
                   'tin': ['5555', '1111', '5555', '2222']})
Desired result:
df = pd.DataFrame({'company': ['ABC', 'ABC', 'XYZ', 'XYZ'],
                   'tin': ['5555', '1111', '5555', '2222'],
                   'column': ['ABC and XYZ', None, 'ABC and XYZ', None]})
I believe you need the following.
First, use merge to test for matches against the first DataFrame, called df1, with the parameter indicator=True and a left join (how='left'):
df1 = pd.DataFrame({'tin': ['5555', '5555'],
                    'name': 'AAA,BBB'.split(',')})
print(df1)
    tin name
0  5555  AAA
1  5555  BBB
df2 = pd.DataFrame({'company': 'ABC,ABC,XYZ,XYZ,ABC,ABC,XYZ,XYZ'.split(','),
                    'tin': ['5555', '1111', '5555', '2222', '5555', '1111', '5555', '2222'],
                    'name': 'AAA,AAA,AAA,AAA,BBB,BBB,BBB,BBB'.split(',')})
print(df2)
  company   tin name
0     ABC  5555  AAA
1     ABC  1111  AAA
2     XYZ  5555  AAA
3     XYZ  2222  AAA
4     ABC  5555  BBB
5     ABC  1111  BBB
6     XYZ  5555  BBB
7     XYZ  2222  BBB
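Then keep only the rows marked both, using boolean indexing.
Finally, aggregate by both columns with join and assign the result back with join.
The code and output for these steps are listed after the comments.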
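Comments:
- "if the values are equal in a large data set": what does df look like when it is big? Thanks.
- In the big df, 'column' should contain the values of 'company'.
- How can a second column be added for the "uniques" rows? df['column'] = df['tin'].map(df[df['tin'].isin([vals, vals_2])].groupby('tin')['company'].agg(' and '.join))
- If we have two columns, 'tin' and 'name': import pandas as pd; df = pd.DataFrame({'company': [ABC, ABC, XYZ, XYZ, ABC, ABC, XYZ, XYZ], 'tin': ['5555', '1111', '5555', '2222', '5555', '1111', '5555', '2222'], 'name': [AAA, AAA, AAA, BBB, BBB, BBB]})
- I think 'column': [ABC and XYZ, NaN, NaN, ABC and XYZ, ABC and XYZ, NaN, NaN, ABC and XYZ]
- So ABC of 5555 AAA should not intersect with XYZ of 5555 BBB?
- Dear jezrael, yes. But, more precisely, it is not complicated, it is just not the goal. We should keep all rows and add a new column with the aggregated information from the 'company' column where 'tin' and 'name' match. I tried df['column'] = (df['tin','name']).map(df[df['tin','name']).isin{'tin':vals,'name':vals2}].groupby('tin','name')['company'].agg('and'.join)), which did not work.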
# Left-join df2 with df1 on both keys; _merge records where each row was found
df = df2.merge(df1, on=['tin','name'], how='left', indicator=True)
print(df)
  company   tin name     _merge
0     ABC  5555  AAA       both
1     ABC  1111  AAA  left_only
2     XYZ  5555  AAA       both
3     XYZ  2222  AAA  left_only
4     ABC  5555  BBB       both
5     ABC  1111  BBB  left_only
6     XYZ  5555  BBB       both
7     XYZ  2222  BBB  left_only
# Keep only the rows whose (tin, name) pair also occurs in df1
df = df[df['_merge'].eq('both')]
print(df)
  company   tin name _merge
0     ABC  5555  AAA   both
2     XYZ  5555  AAA   both
4     ABC  5555  BBB   both
6     XYZ  5555  BBB   both
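As an alternative to the merge-and-filter steps above, the same rows can be selected with a boolean mask built from the (tin, name) pairs. This is only a sketch, not part of the original answer; the names keys, wanted, present, mask and matched are introduced here for illustration. One possible advantage is that repeated key pairs in df1 would not duplicate rows, which a left merge would do.

```python
import pandas as pd

df1 = pd.DataFrame({'tin': ['5555', '5555'],
                    'name': ['AAA', 'BBB']})
df2 = pd.DataFrame({'company': ['ABC', 'ABC', 'XYZ', 'XYZ', 'ABC', 'ABC', 'XYZ', 'XYZ'],
                    'tin': ['5555', '1111', '5555', '2222', '5555', '1111', '5555', '2222'],
                    'name': ['AAA', 'AAA', 'AAA', 'AAA', 'BBB', 'BBB', 'BBB', 'BBB']})

keys = ['tin', 'name']  # illustrative name for the two key columns

# Represent the (tin, name) pairs of each frame as a MultiIndex
wanted = pd.MultiIndex.from_frame(df1[keys])
present = pd.MultiIndex.from_frame(df2[keys])

# True where a row's (tin, name) pair also occurs in df1
mask = present.isin(wanted)

# Same rows as filtering on _merge == 'both', without the helper column
matched = df2[mask]
print(matched)
```

The result can then be grouped and joined back exactly as in the aggregation step that follows.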
# Join the company names within each (tin, name) group
s = df.groupby(['tin','name'])['company'].agg(' and '.join).rename('new')
# Attach the aggregated text to every row of df2; unmatched rows get NaN
df = df2.join(s, on=['tin','name'])
print(df)
  company   tin name          new
0     ABC  5555  AAA  ABC and XYZ
1     ABC  1111  AAA          NaN
2     XYZ  5555  AAA  ABC and XYZ
3     XYZ  2222  AAA          NaN
4     ABC  5555  BBB  ABC and XYZ
5     ABC  1111  BBB          NaN
6     XYZ  5555  BBB  ABC and XYZ
7     XYZ  2222  BBB          NaN
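For the simpler single-key case from the original question (only 'tin', no 'name'), the map-based idea mentioned in the comments might look like the sketch below. It assumes the tin values of interest are already known; vals is a hypothetical list standing in for them, and joined is a name introduced here.

```python
import pandas as pd

df = pd.DataFrame({'company': ['ABC', 'ABC', 'XYZ', 'XYZ'],
                   'tin': ['5555', '1111', '5555', '2222']})

vals = ['5555']  # hypothetical: the tin values to aggregate

# Join the company names for each selected tin
joined = (df[df['tin'].isin(vals)]
          .groupby('tin')['company']
          .agg(' and '.join))

# Map the joined text back onto every row; other tins become NaN
df['column'] = df['tin'].map(joined)
print(df)
```

Because map looks each tin up in the index of joined, all original rows are kept and only the selected tins receive the aggregated text.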