Python: how to sum the values of a list's elements, where the elements' values come from another DataFrame in pandas?
I have two DataFrames, called df1 and df2. I want to sum the list values in df2, where each list element's value comes from df1.
For example:
df1:
and df2:
df2 = pd.DataFrame([['a',['b','c','d']],['b',['a','c']]],columns=['name2','data2'])
df2
name2 data2
0 a [b, c, d]
1 b [a, c]
In the end, I want to get:
name2 data2
0 a 146
1 b 56
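How can I do this? Thanks a lot.
First create a dictionary from df1, then use a list comprehension that maps each value through the dict with get, substituting 0 for unmatched values, and pass the result to sum: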
d = df1.set_index('name1')['data1'].to_dict()
df2['data2'] = [sum(d.get(y, 0) for y in x) for x in df2['data2']]
print (df2)
name2 data2
0 a 146
1 b 56
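Since df1 is not shown above, here is a minimal, self-contained sketch of the same dict-lookup idea. The values in df1 below are made up for illustration (so the sums differ from 146 and 56), the column name total is arbitrary, and the name 'z' is a hypothetical unmatched entry added to show the fallback to 0.

```python
import pandas as pd

# Hypothetical df1: the original data is not shown, so these values are assumed.
df1 = pd.DataFrame({'name1': ['a', 'b', 'c', 'd'],
                    'data1': [10, 20, 30, 40]})
df2 = pd.DataFrame([['a', ['b', 'c', 'd']], ['b', ['a', 'c', 'z']]],
                   columns=['name2', 'data2'])

# Map each name to its value: {'a': 10, 'b': 20, 'c': 30, 'd': 40}
d = df1.set_index('name1')['data1'].to_dict()

# Sum the looked-up values per row; names missing from df1 count as 0.
df2['total'] = [sum(d.get(name, 0) for name in names) for names in df2['data2']]
print(df2)
```

Writing to a new column keeps the original lists in data2, which makes the result easier to check by eye.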
If you want to remove NaNs, you can use filter:
Alternatively:
d = dict(df1.values)
df2['s'] = df2.data2.transform(lambda v: pd.Series(v).map(d)).sum(1)
0 146.0
1 56.0
dtype: float64
Or:
You can use pivot on df1 to turn the names into columns, then index into the result with the lists from df2:
pivoted = df1.pivot(columns="name1").data1.sum()
df2.data2 = df2.data2.apply(lambda x: pivoted[x].sum())
name2 data2
0 a 146.0
1 b 56.0
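The indexing step can also be written against a plain Series keyed by name, which avoids the pivot. This is a sketch rather than the answerer's code: the df1 values are assumed, and reindex with fill_value=0 is swapped in so that names absent from df1 (the hypothetical 'z') contribute 0 instead of raising a KeyError.

```python
import pandas as pd

# Hypothetical df1 (assumed values; the original df1 is not shown).
df1 = pd.DataFrame({'name1': ['a', 'b', 'c', 'd'],
                    'data1': [10, 20, 30, 40]})
df2 = pd.DataFrame([['a', ['b', 'c', 'd']], ['b', ['a', 'c', 'z']]],
                   columns=['name2', 'data2'])

# A Series indexed by name acts as the lookup table.
lookup = df1.set_index('name1')['data1']

# reindex selects the listed names; unknown names are filled with 0.
df2['total'] = df2['data2'].apply(
    lambda names: lookup.reindex(names, fill_value=0).sum())
print(df2)
```

This assumes the names in df1 are unique, since reindex does not accept a duplicated index.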
You can use collections.defaultdict together with a dict:
from collections import defaultdict
d = defaultdict(int, df1.set_index('name1')['data1'].to_dict())
df2['sum'] = [sum(map(d.__getitem__, x)) for x in df2['data2']]
print(df2)
name2 data2 sum
0 a [b, c, d] 146
1 b [a, c, e] 56
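The reason this works is that defaultdict(int) returns int(), which is 0, for any key it does not have, so d.__getitem__ never raises a KeyError. A small sketch with assumed df1 values follows. One side effect worth knowing: each missing key that is looked up is also inserted into the dict.

```python
from collections import defaultdict

import pandas as pd

# Hypothetical df1 (assumed values; the original df1 is not shown).
df1 = pd.DataFrame({'name1': ['a', 'b', 'c', 'd'],
                    'data1': [10, 20, 30, 40]})
df2 = pd.DataFrame([['a', ['b', 'c', 'd']], ['b', ['a', 'c', 'e']]],
                   columns=['name2', 'data2'])

d = defaultdict(int, df1.set_index('name1')['data1'].to_dict())
size_before = len(d)

df2['sum'] = [sum(map(d.__getitem__, names)) for names in df2['data2']]
print(df2)

# Looking up the unknown name 'e' stored it with the default value 0.
print(size_before, len(d), d['e'])
```

If the dict must stay unchanged after the lookups, d.get(name, 0) on a regular dict gives the same sums without inserting keys.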
For larger DataFrames, this is more efficient than the generator expression:
from collections import defaultdict
def jpp(df1, df2):
    d = defaultdict(int, df1.set_index('name1')['data1'].to_dict())
    return [sum(map(d.__getitem__, x)) for x in df2['data2']]
def jez(df1, df2):
    d = df1.set_index('name1')['data1'].to_dict()
    return [sum(d.get(y, 0) for y in x) for x in df2['data2']]
df2 = pd.concat([df2]*10000)
%timeit jpp(df1, df2) # 32.8 ms per loop
%timeit jez(df1, df2) # 49.1 ms per loop
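%timeit is an IPython magic, so the comparison above only runs in IPython or Jupyter. A rough equivalent for a plain Python script, using the standard-library timeit module, might look like the sketch below. The df1 values are assumed, the variable big is a name chosen here, and timings vary by machine and pandas version, so none are quoted.

```python
import timeit
from collections import defaultdict

import pandas as pd

def jpp(df1, df2):
    d = defaultdict(int, df1.set_index('name1')['data1'].to_dict())
    return [sum(map(d.__getitem__, x)) for x in df2['data2']]

def jez(df1, df2):
    d = df1.set_index('name1')['data1'].to_dict()
    return [sum(d.get(y, 0) for y in x) for x in df2['data2']]

# Hypothetical df1 (assumed values; the original df1 is not shown).
df1 = pd.DataFrame({'name1': ['a', 'b', 'c', 'd'],
                    'data1': [10, 20, 30, 40]})
df2 = pd.DataFrame([['a', ['b', 'c', 'd']], ['b', ['a', 'c', 'e']]],
                   columns=['name2', 'data2'])
big = pd.concat([df2] * 10000, ignore_index=True)

# Both functions should agree before their speed is compared.
same = jpp(df1, big) == jez(df1, big)
print('same results:', same)

for func in (jpp, jez):
    seconds = timeit.timeit(lambda: func(df1, big), number=5) / 5
    print(f'{func.__name__}: {seconds * 1000:.1f} ms per call')
```

Checking that both functions return the same list first guards against comparing the speed of two computations that do not actually match.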
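Great approach. But what if one of the values for df2 turns out to be NaN? It is zero.
@runningman - I think the best approach is to drop the rows with NaNs first: d = df1.dropna(subset=['data1']).set_index('name1')['data1'].to_dict()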
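The dropna suggestion from the comment can be sketched as follows. The data is assumed: df1 here has a NaN for 'c', which is a hypothetical case chosen to show that the dropped name then falls back to 0 in the lookup.

```python
import numpy as np
import pandas as pd

# Hypothetical df1 with a missing value for 'c' (assumed data).
df1 = pd.DataFrame({'name1': ['a', 'b', 'c', 'd'],
                    'data1': [10.0, 20.0, np.nan, 40.0]})
df2 = pd.DataFrame([['a', ['b', 'c', 'd']], ['b', ['a', 'c']]],
                   columns=['name2', 'data2'])

# Without dropna, the NaN for 'c' would propagate into every sum that uses it.
d = df1.dropna(subset=['data1']).set_index('name1')['data1'].to_dict()

# 'c' is no longer a key, so d.get('c', 0) falls back to 0.
df2['total'] = [sum(d.get(y, 0) for y in x) for x in df2['data2']]
print(df2)
```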