Python: summing a large number of DataFrames
I have a large number of DataFrames with exactly the same keys and column names. Their data looks like this:
z1.ix[0]
val1 [1, 5, 3, 4]
val2 47
Name: 2017-01-01 01:00:00, dtype: object
z2.ix[0]
val1 [11, 5, 53, 5]
val2 4
Name: 2017-01-01 01:00:00, dtype: object
z3.ix[0]
val1 [1, 25, 3, 4]
val2 7
Name: 2017-01-01 01:00:00, dtype: object
I tried the following:
summedDf = z1 + z2 + z3
which gives:
summedDf.ix[0]
val1 [1, 5, 3, 4, 11, 5, 53, 5, 1, 25, 3, 4]
val2 58
Name: 2017-01-01 01:00:00, dtype: object
However, I would like to get the following:
summedDf.ix[0]
val1 [13, 35, 59, 13]
val2 58
Name: 2017-01-01 01:00:00, dtype: object
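Also, how can I extend the addition above to about 500 DataFrames?
Edit: val1 and val2 are different column names. val1 stores a list, while val2 stores a single value for each index.
This may not be the most efficient approach, but it should get you started: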
import pandas as pd
import numpy as np
# gen test data
df1 = pd.DataFrame({'val1':[[1,2,3],[4,5,6]], 'val2': [1,2]})
df1
gives
        val1  val2
0  [1, 2, 3]     1
1  [4, 5, 6]     2
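Another DataFrame:
def check(x):
    # Double every element of a list cell, or the value itself for a scalar cell.
    if isinstance(x, list):
        output = [i * 2 for i in x]
    else:
        output = x * 2
    return output

# applymap calls check on every cell (pandas 2.1+ offers the same behavior as DataFrame.map)
df2 = df1.applymap(check)
df2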
gives
           val1  val2
0     [2, 4, 6]     2
1  [8, 10, 12]     4
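Adding the DataFrames:
def add_cols(left, right, col):
    # List cells are converted to arrays so that + adds element-wise
    # instead of concatenating the lists.
    if isinstance(left[col].iloc[0], (list, np.ndarray)):
        return left[col].apply(lambda x: np.array(x)) + right[col].apply(lambda x: np.array(x))
    return left[col].add(right[col])

def add_dfs(left, right):
    # Build a new frame so that neither input is modified.
    return pd.DataFrame({c: add_cols(left, right, c) for c in left.columns})

# you can use a generator to read dataframes on the fly
# instead of loading all into a list
dfs = [df1, df2]
for e, df in enumerate(dfs):
    if e == 0:
        df_sum = df.copy()
    else:
        # accumulate into the running total, not into df1
        df_sum = add_dfs(df_sum, df)
df_sum
which gives the desired output:
           val1  val2
0     [3, 6, 9]     3
1  [12, 15, 18]     6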
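To scale the same idea to roughly 500 frames, one option is to fold a generator of frames with functools.reduce, so that only the running total and the current frame are in memory at once. The following is only a sketch: add_frames and iter_frames are hypothetical names, iter_frames fabricates test data in place of whatever actually loads your frames, and the code assumes every frame has the same index in the same order and that the lists in a column all have the same length.

```python
from functools import reduce

import numpy as np
import pandas as pd


def add_frames(left, right):
    """Return a new frame with the element-wise sum of two aligned frames."""
    out = {}
    for col in left.columns:
        if isinstance(left[col].iloc[0], list):
            # Add list cells element-wise via NumPy, then store plain lists again.
            summed = [
                (np.asarray(a) + np.asarray(b)).tolist()
                for a, b in zip(left[col], right[col])
            ]
            out[col] = pd.Series(summed, index=left.index)
        else:
            out[col] = left[col] + right[col]
    return pd.DataFrame(out)


def iter_frames(n):
    """Hypothetical stand-in for reading n frames one at a time."""
    idx = pd.to_datetime(["2017-01-01 01:00:00"])
    for i in range(n):
        yield pd.DataFrame(
            {"val1": [[i, 2 * i, 3 * i, 4 * i]], "val2": [i]}, index=idx
        )


# reduce uses the first frame as the starting total and adds the rest in turn.
total = reduce(add_frames, iter_frames(500))
print(total)
```

Because add_frames returns a new frame each time, none of the input frames is modified along the way.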
I think you could concatenate everything into a single df and then use df.sum along one axis.
Are the lists stored in a column? Or is val2 repeated for each val1 item? Please show the full frame rather than a slice.
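The concatenate-then-sum suggestion could look roughly like the sketch below. It is not code from the comment: sum_via_concat and the val1_0, val1_1, ... helper columns are names made up for illustration, and it assumes a list column called val1 whose lists all have the same length, plus frames that share index labels. The list column is spread over numeric columns first, because summing raw Python lists would concatenate them again.

```python
import pandas as pd


def sum_via_concat(frames, list_col="val1"):
    """Sum frames by stacking them and adding rows that share an index label."""
    other_cols = [c for c in frames[0].columns if c != list_col]
    widened = []
    for f in frames:
        # Spread each list cell over its own numeric columns: val1_0, val1_1, ...
        expanded = pd.DataFrame(f[list_col].tolist(), index=f.index)
        expanded.columns = [f"{list_col}_{i}" for i in range(expanded.shape[1])]
        widened.append(pd.concat([expanded, f[other_cols]], axis=1))

    # Stack all frames vertically, then sum the rows with the same index label.
    total = pd.concat(widened).groupby(level=0).sum()

    # Fold the numeric helper columns back into one list per row.
    parts = [c for c in total.columns if c not in other_cols]
    lists = pd.Series(total[parts].to_numpy().tolist(), index=total.index)
    result = total[other_cols].copy()
    result.insert(0, list_col, lists)
    return result


# The three rows shown in the question, rebuilt as one-row frames.
idx = pd.to_datetime(["2017-01-01 01:00:00"])
z1 = pd.DataFrame({"val1": [[1, 5, 3, 4]], "val2": [47]}, index=idx)
z2 = pd.DataFrame({"val1": [[11, 5, 53, 5]], "val2": [4]}, index=idx)
z3 = pd.DataFrame({"val1": [[1, 25, 3, 4]], "val2": [7]}, index=idx)

summed = sum_via_concat([z1, z2, z3])
print(summed.iloc[0])
```

This holds all frames in memory at once, so for hundreds of large frames a running total may be the lighter choice.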