Python: applying a weight formula to DataFrame rows

python pandas dataframe

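I have a DataFrame df1, shown below. I copied it to df2 to preserve df1, and I then use df3 for the calculations on df2:

df2 = df1.copy()

I want to compute a weight for each price, e.g. weight(a) = Price(a) / Sum(row_Prices), and put it back into df2 below the prices, so that for each date I get three rows of data: price, std, and weight. I also want to compute the std across each row, and I assume it would take a similar form.

I tried this to get the weights and then printed df3, but it did not work:

df3 = df2.iloc[1:,1:].div(df2.iloc[1:,1:].sum(axis=1), axis=0)

To get 2 rows for each date I tried .stack(), but I am probably doing it wrong. Help! Thanks, everyone.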

                         a      b      c       d      e
2006-04-27 00:00:00                                    
2006-04-28 00:00:00  69.62  69.62  6.518   65.09  69.62
2006-05-01 00:00:00   71.5   71.5  6.522   65.16   71.5
2006-05-02 00:00:00  72.34  72.34  6.669   66.55  72.34
2006-05-03 00:00:00  70.22  70.22  6.662   66.46  70.22
2006-05-04 00:00:00  68.32  68.32  6.758   67.48  68.32
2006-05-05 00:00:00     68     68  6.805   67.99     68
2006-05-08 00:00:00  67.88  67.88  6.768   67.56  67.88

I would like the output to look something like this:

                         a      b      c      d      e
2006-04-27 00:00:00
2006-04-28 00:00:00
price                69.62  69.62  6.518  65.09  69.62
weight
std
2006-05-01 00:00:00
price                 71.5   71.5  6.522  65.16   71.5
weight
std
2006-05-02 00:00:00
price                72.34  72.34  6.669  66.55  72.34
weight
std

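As far as I know, there is no quick and easy way to get what you want. You need to compute all the data and then combine it into a DataFrame that uses a MultiIndex: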

import pandas as pd

# Assumes df holds only the numeric price columns, indexed by date

# Making weight/std DataFrames
cols = list('ABCDE')
weight = pd.DataFrame([df[col] / df.sum(axis=1) for col in df], index=cols).T
std = pd.DataFrame([df.std(axis=1) for col in df], index=cols).T

# Making MultiIndex DataFrame
mindex = pd.MultiIndex.from_product([['price', 'weight', 'std'], df.index])
new_df = pd.DataFrame(index=mindex, columns=cols)

# Inserting data: an outer-level key selects that whole block of rows
new_df.loc['price'] = df.values
new_df.loc['weight'] = weight.values
new_df.loc['std'] = std.values

# Swapping levels so the date becomes the outer level
new_df = new_df.swaplevel(0, 1).sort_index()

The resulting new_df should look something like this:

2006-04-28 price      69.62     69.62      6.518     65.09     69.62
           std      27.7829   27.7829    27.7829   27.7829   27.7829
           weight  0.248228  0.248228  0.0232397  0.232076  0.248228
2006-05-01 price       71.5      71.5      6.522     65.16      71.5
           std      28.4828   28.4828    28.4828   28.4828   28.4828
           weight  0.249841  0.249841  0.0227897  0.227687  0.249841
2006-05-02 price      72.34     72.34      6.669     66.55     72.34
           std      28.8308   28.8308    28.8308   28.8308   28.8308
           weight  0.249243  0.249243  0.0229776  0.229294  0.249243
2006-05-03 price      70.22     70.22      6.662     66.46     70.22
           std      28.0509   28.0509    28.0509   28.0509   28.0509
           weight  0.247443  0.247443  0.0234758  0.234194  0.247443
2006-05-04 price      68.32     68.32      6.758     67.48     68.32
           std      27.4399   27.4399    27.4399   27.4399   27.4399
           weight  0.244701  0.244701   0.024205  0.241692  0.244701
2006-05-05 price         68        68      6.805     67.99        68
           std      27.3661   27.3661    27.3661   27.3661   27.3661
           weight  0.243907  0.243907  0.0244086  0.243871  0.243907
2006-05-08 price      67.88     67.88      6.768     67.56     67.88
           std      27.2947   27.2947    27.2947   27.2947   27.2947
           weight  0.244201  0.244201  0.0243481   0.24305  0.244201
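
The same layout can probably also be built without pre-allocating an empty frame. Below is a minimal sketch that stacks the three frames with pd.concat and its keys argument; the two-row sample frame and the names weight, std and row_std are only illustrative, taken from the first rows of the question's data.

```python
import pandas as pd

# Hypothetical sample: the first two dated rows from the question
df = pd.DataFrame(
    {
        "A": [69.62, 71.5],
        "B": [69.62, 71.5],
        "C": [6.518, 6.522],
        "D": [65.09, 65.16],
        "E": [69.62, 71.5],
    },
    index=pd.to_datetime(["2006-04-28", "2006-05-01"]),
)

# Row-wise weight: each price divided by the sum of its row
weight = df.div(df.sum(axis=1), axis=0)

# Row-wise sample std, repeated in every column of that row
row_std = df.std(axis=1)
std = pd.DataFrame({col: row_std for col in df.columns})

# Stack the three frames under an outer key, then make the date the outer level
new_df = (
    pd.concat([df, weight, std], keys=["price", "weight", "std"])
    .swaplevel(0, 1)
    .sort_index()
)
print(new_df)
```

Because sort_index orders the inner level alphabetically, each date ends up with its rows in the order price, std, weight.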

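As a side note, I am not sure which std you want to compute, so I simply assumed it is the row-wise std of the prices (each row will hold a single, repeated value).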
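
If a different std is wanted, the axis and ddof arguments of DataFrame.std are the knobs to turn. A small sketch on a made-up three-column sample (the frame and variable names here are assumptions for illustration only):

```python
import pandas as pd

# Hypothetical sample with three price columns and two dates
df = pd.DataFrame(
    {"A": [69.62, 71.5], "C": [6.518, 6.522], "D": [65.09, 65.16]},
    index=pd.to_datetime(["2006-04-28", "2006-05-01"]),
)

# One value per date: spread across the columns of each row (sample std, ddof=1)
row_std = df.std(axis=1)

# One value per column: spread of each price series over time
col_std = df.std(axis=0)

# Population std across each row (divides by N instead of N - 1)
row_std_pop = df.std(axis=1, ddof=0)

print(row_std, col_std, row_std_pop, sep="\n\n")
```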

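Yes, that was just a typo. I do get a copy of df1. Thanks.

Your index name: 2006-04-27 23:55:00 looks weird... Can you post the output of print(df2) and print(df2.index)?

Yes, I simply modified the index, adding a row at the top and another set of values, but I think that is unrelated to the problem. To avoid confusion, though, I have changed it in the table. Thanks.

When posting pandas-related questions, please always try to make your DataFrame easy to reconstruct. Otherwise, people who are willing to answer your question have to waste time parsing strings, which is not much fun.

@GustavoBezerra Thanks! One problem: I have a value of type string in the first row of the index, so the weight and std functions do not work because they try to compare a string with a float. Do you know how to filter only the rows that hold an int or float? For the weight formula, I get the following: TypeError: Cannot compare type 'Timestamp' with type 'str'

Make sure all index entries are of type Timestamp. Applying pd.to_datetime to the index should be enough.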
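
For the stray string row, one possible approach is to let pd.to_datetime coerce unparseable labels to NaT and then drop those rows. This is only a sketch: the frame raw and the label "not a date" are made up, and it assumes the remaining index entries parse as dates.

```python
import pandas as pd

# Hypothetical frame whose first index entry is a stray string label
raw = pd.DataFrame(
    {"A": [None, 69.62, 71.5], "B": [None, 6.518, 6.522]},
    index=["not a date", "2006-04-28", "2006-05-01"],
)

# Unparseable index entries become NaT instead of raising an error
parsed = pd.to_datetime(raw.index, errors="coerce")
is_date = parsed.notna()

# Keep only the rows whose label parsed to a real Timestamp
clean = raw[is_date].copy()
clean.index = parsed[is_date]

# Make sure the remaining values are numeric before computing weight/std
clean = clean.astype(float)
print(clean)
```

After this, the index is a DatetimeIndex, so sorting and the row-wise weight and std calculations no longer mix Timestamp and str values.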