Pandas: show groupby and agg values in a new column

I don't know how to group by the column "tin" and show the grouped values when the "tin" values are equal in a large dataset.

import pandas as pd
df = pd.DataFrame({'company': ['ABC', 'ABC', 'XYZ', 'XYZ'],
                   'tin': ['5555', '1111', '5555', '2222']})

Desired result:

df = pd.DataFrame({'company': ['ABC', 'ABC', 'XYZ', 'XYZ'],
                   'tin': ['5555', '1111', '5555', '2222'],
                   # text naming the companies that share the same tin
                   'column': ['ABC and XYZ', None, 'ABC and XYZ', None]})

I believe you need:
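
For the single-key frame in the question, one possible sketch is shown here. It rests on an assumption that is not stated in the question: a "tin" counts as "equal" when it occurs in more than one row. The names shared and labels are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'company': ['ABC', 'ABC', 'XYZ', 'XYZ'],
                   'tin': ['5555', '1111', '5555', '2222']})

# Assumption: a tin is "equal" when it appears in more than one row.
shared = df[df['tin'].duplicated(keep=False)]

# One label per shared tin, built from the companies in row order.
labels = shared.groupby('tin')['company'].agg(' and '.join)

# map() leaves NaN for every tin that has no label.
df['column'] = df['tin'].map(labels)
print(df)
```

Rows whose "tin" occurs only once keep a missing value in the new column, which corresponds to the None entries in the desired result.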
First, use merge with indicator=True and a left join (how='left') to test which rows match the first DataFrame, named df1:

df1 = pd.DataFrame({ 'tin': ['5555', '5555'], 
                   'name' : 'AAA,BBB'.split(',')})

print (df1)
    tin name
0  5555  AAA
1  5555  BBB

df2 = pd.DataFrame({'company' : 'ABC,ABC,XYZ,XYZ,ABC,ABC,XYZ,XYZ'.split(','), 
                   'tin': ['5555', '1111', '5555', '2222', '5555', '1111', '5555', '2222'], 
                   'name' : 'AAA,AAA,AAA,AAA,BBB,BBB,BBB,BBB'.split(',')})

print (df2)
  company   tin name
0     ABC  5555  AAA
1     ABC  1111  AAA
2     XYZ  5555  AAA
3     XYZ  2222  AAA
4     ABC  5555  BBB
5     ABC  1111  BBB
6     XYZ  5555  BBB
7     XYZ  2222  BBB

df = df2.merge(df1, on=['tin','name'], how='left', indicator=True)
print (df)
  company   tin name     _merge
0     ABC  5555  AAA       both
1     ABC  1111  AAA  left_only
2     XYZ  5555  AAA       both
3     XYZ  2222  AAA  left_only
4     ABC  5555  BBB       both
5     ABC  1111  BBB  left_only
6     XYZ  5555  BBB       both
7     XYZ  2222  BBB  left_only

Then filter only the rows marked both:

df = df[df['_merge'].eq('both')]
print (df)
  company   tin name _merge
0     ABC  5555  AAA   both
2     XYZ  5555  AAA   both
4     ABC  5555  BBB   both
6     XYZ  5555  BBB   both

Finally, aggregate by both columns and assign the result back to df2 with join:

s = df.groupby(['tin','name'])['company'].agg(' and '.join).rename('new')
df = df2.join(s, on=['tin','name'])
print (df)
  company   tin name          new
0     ABC  5555  AAA  ABC and XYZ
1     ABC  1111  AAA          NaN
2     XYZ  5555  AAA  ABC and XYZ
3     XYZ  2222  AAA          NaN
4     ABC  5555  BBB  ABC and XYZ
5     ABC  1111  BBB          NaN
6     XYZ  5555  BBB  ABC and XYZ
7     XYZ  2222  BBB          NaN

If the values are equal in a large dataset.
- How big is df?
- Thanks. In the "column" of the big df, write the values of "company".
- How can I add a second column for the "uniques" rows? df['column'] = (df['tin'].map(df[df['tin'].isin([vals, vals_2])].groupby('tin')['company'].agg(' and '.join))), if we have two columns 'tin' and 'name', as in df = pd.DataFrame({'company': [ABC, ABC, XYZ, XYZ, ABC, ABC, XYZ, XYZ], 'tin': ['5555', '1111', '5555', '2222', '5555', '1111', '5555', '2222'], 'name': [AAA, AAA, AAA, BBB, BBB, BBB]})
- I think 'column': [ABC and XYZ, NaN, NaN, ABC and XYZ, ABC and XYZ, NaN, NaN, ABC and XYZ]
- So ABC for 5555 AAA will not intersect with XYZ for 5555 BBB.
- Dear jezrael, yes. But, more precisely, it is not complicated, it is just not the goal. We should keep all rows and add a new column with the aggregated information from the column "company" where "tin" and "name" match. I tried df['column'] = (df['tin','name']).map(df[df['tin','name']).isin{'tin':vals,'name':vals2}].groupby('tin','name')['company'].agg(' and '.join))), which did not work.
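
The merge, filter, and aggregate steps can also be wrapped into one function, sketched here under some assumptions. The helper name add_joined_companies and its parameters keys, sep and new_col are hypothetical and do not come from the answer. The drop_duplicates call is an addition: if df1 contained the same key pair twice, the left merge would repeat rows of df2 and the joined text would list companies more than once.

```python
import pandas as pd

def add_joined_companies(df2, df1, keys=('tin', 'name'),
                         sep=' and ', new_col='new'):
    """Hypothetical helper: label rows of df2 whose keys occur in df1."""
    keys = list(keys)
    # Keep each key pair once so the left merge cannot multiply rows.
    wanted = df1[keys].drop_duplicates()
    merged = df2.merge(wanted, on=keys, how='left', indicator=True)
    # Rows found in both frames are the ones to aggregate.
    matched = merged[merged['_merge'].eq('both')]
    labels = matched.groupby(keys)['company'].agg(sep.join).rename(new_col)
    # Joining on the key columns keeps every row of df2.
    return df2.join(labels, on=keys)

df1 = pd.DataFrame({'tin': ['5555', '5555'],
                    'name': ['AAA', 'BBB']})
df2 = pd.DataFrame({'company': 'ABC,ABC,XYZ,XYZ,ABC,ABC,XYZ,XYZ'.split(','),
                    'tin': ['5555', '1111', '5555', '2222',
                            '5555', '1111', '5555', '2222'],
                    'name': 'AAA,AAA,AAA,AAA,BBB,BBB,BBB,BBB'.split(',')})

result = add_joined_companies(df2, df1)
print(result)
```

Because the result is built with join on df2, all eight rows are kept and only the rows matching a key pair from df1 receive a label.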