Python: How to sum the values of a list's elements, where the elements' values come from another DataFrame in pandas?

I have two DataFrames, called df1 and df2. I want to sum the list values in df2, where the value of each list element comes from df1.

For example:

df1:

and df2:

df2 = pd.DataFrame([['a',['b','c','d']],['b',['a','c']]],columns=['name2','data2'])
df2

    name2         data2
0      a      [b, c, d]
1      b         [a, c]
In the end, I want:

    name2   data2
0      a      146
1      b       56

How can I do this? Many thanks.

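First create a dictionary from df1, then use a list comprehension with get to map each list element through the dict. Elements with no match contribute 0 to the sum: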

d = df1.set_index('name1')['data1'].to_dict()
df2['data2'] = [sum(d.get(y, 0) for y in x) for x in df2['data2']]
print (df2)

  name2  data2
0     a    146
1     b     56
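The contents of df1 are not shown above, so the following is only a minimal end-to-end sketch of the same dictionary lookup. The df1 values here are assumptions, chosen so that the sums match the expected output; the names lookup, totals and result are illustrative.

```python
import pandas as pd

# Assumed values: the original df1 is not shown, so these numbers are
# hypothetical and only chosen to reproduce the expected sums (146 and 56).
df1 = pd.DataFrame({"name1": ["a", "b", "c", "d"], "data1": [10, 45, 46, 55]})
df2 = pd.DataFrame(
    [["a", ["b", "c", "d"]], ["b", ["a", "c"]]],
    columns=["name2", "data2"],
)

# Build a name -> value mapping once, then sum the mapped values per list.
lookup = df1.set_index("name1")["data1"].to_dict()
totals = [sum(lookup.get(name, 0) for name in names) for names in df2["data2"]]

# assign returns a new DataFrame, so the original lists in df2 are kept.
result = df2.assign(data2=totals)
print(result)
```

Using assign instead of overwriting df2['data2'] keeps the original lists available, which helps when comparing several approaches on the same data.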
If you want to remove NaNs, you can use filter.
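The answer does not show the filter code, so here is one possible sketch rather than the answerer's exact method. It assumes the NaN values sit in the data1 column of df1; the df1 values and the helper sum_skipping_nan are hypothetical.

```python
import math

import pandas as pd

# Assumed values: 'c' is deliberately NaN to show the filtering.
df1 = pd.DataFrame(
    {"name1": ["a", "b", "c", "d"], "data1": [10.0, 45.0, float("nan"), 55.0]}
)
df2 = pd.DataFrame(
    [["a", ["b", "c", "d"]], ["b", ["a", "c"]]],
    columns=["name2", "data2"],
)

lookup = df1.set_index("name1")["data1"].to_dict()


def sum_skipping_nan(names):
    """Sum the mapped values, ignoring NaN entries and unmatched names."""
    values = (lookup.get(name, 0) for name in names)
    return sum(filter(lambda v: not math.isnan(v), values))


nan_safe_totals = [sum_skipping_nan(names) for names in df2["data2"]]
print(nan_safe_totals)
```

Without the filter, any list containing 'c' would sum to NaN, because NaN propagates through addition.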

Alternatively:

d = dict(df1.values)
df2['s'] = df2.data2.transform(lambda v: pd.Series(v).map(d)).sum(1) 

0    146.0
1     56.0
dtype: float64


You can use pivot on df1 to move the names into columns, then index into the result with the lists in df2:

pivoted = df1.pivot(columns="name1").data1.sum()
df2.data2 = df2.data2.apply(lambda x: pivoted[x].sum())

  name2  data2
0     a  146.0
1     b   56.0

You can use collections.defaultdict together with a dict:

from collections import defaultdict

d = defaultdict(int, df1.set_index('name1')['data1'].to_dict())

df2['sum'] = [sum(map(d.__getitem__, x)) for x in df2['data2']]

print(df2)

  name2      data2  sum
0     a  [b, c, d]  146
1     b  [a, c, e]   56
For larger DataFrames, this is more efficient than the generator expression:

from collections import defaultdict

def jpp(df1, df2):
    d = defaultdict(int, df1.set_index('name1')['data1'].to_dict())
    return [sum(map(d.__getitem__, x)) for x in df2['data2']]

def jez(df1, df2):
    d = df1.set_index('name1')['data1'].to_dict()
    return [sum(d.get(y, 0) for y in x) for x in df2['data2']]

df2 = pd.concat([df2]*10000)

%timeit jpp(df1, df2)  # 32.8 ms per loop
%timeit jez(df1, df2)  # 49.1 ms per loop
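%timeit is an IPython magic, so the comparison above will not run in a plain Python script. A rough equivalent with the standard timeit module could look like the sketch below. The df1 values are assumptions (the original df1 is not shown), the frame is smaller than in the benchmark above, and the timings will differ from machine to machine.

```python
import timeit
from collections import defaultdict

import pandas as pd

# Assumed values: the original df1 is not shown.
df1 = pd.DataFrame({"name1": ["a", "b", "c", "d"], "data1": [10, 45, 46, 55]})
df2 = pd.DataFrame(
    [["a", ["b", "c", "d"]], ["b", ["a", "c"]]],
    columns=["name2", "data2"],
)
big = pd.concat([df2] * 1000, ignore_index=True)


def jpp(df1, df2):
    # defaultdict(int) returns 0 for missing keys (and stores them).
    d = defaultdict(int, df1.set_index("name1")["data1"].to_dict())
    return [sum(map(d.__getitem__, x)) for x in df2["data2"]]


def jez(df1, df2):
    d = df1.set_index("name1")["data1"].to_dict()
    return [sum(d.get(y, 0) for y in x) for x in df2["data2"]]


runs = 5
for func in (jpp, jez):
    seconds = timeit.timeit(lambda: func(df1, big), number=runs)
    print(f"{func.__name__}: {seconds / runs * 1000:.2f} ms per call")
```

Both functions return the same totals, so the choice between them is only about speed and about whether it is acceptable for the defaultdict to grow when it sees unknown names.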

Great approach. But what if a df2 value is NaN? It is zero.
@runningman - I think the best way is to drop the rows with NaNs: d = df1.dropna(subset=['data1']).set_index('name1')['data1'].to_dict()
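As a sketch of the dropna suggestion in the comment above: rows with NaN in data1 are removed before the dictionary is built, so those names simply fall back to 0 in the lookup. The df1 values are assumptions, with 'c' set to NaN on purpose.

```python
import pandas as pd

# Assumed values: 'c' has no usable value in this hypothetical df1.
df1 = pd.DataFrame(
    {"name1": ["a", "b", "c", "d"], "data1": [10.0, 45.0, None, 55.0]}
)
df2 = pd.DataFrame(
    [["a", ["b", "c", "d"]], ["b", ["a", "c"]]],
    columns=["name2", "data2"],
)

# Drop NaN rows first, so 'c' never enters the dictionary.
clean = df1.dropna(subset=["data1"]).set_index("name1")["data1"].to_dict()
totals_clean = [sum(clean.get(y, 0) for y in x) for x in df2["data2"]]
print(totals_clean)
```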