Python: How do I keep the DataFrame column and index names after a calculation with np.where?
I have two different pd.DataFrames, dailyRtn and averageofP, which can be reproduced with the following code:
import numpy as np
import pandas as pd

dailyRtn = pd.DataFrame([["2017-12-25", 0.069392, 0.124916, 0.119108],
                         ["2017-12-26", 0.020000, 0.100000, 0.080000],
                         ["2017-12-27", 1.000000, 1.200000, 1.500000]],
                        columns=["date", "A", "B", "C"])
averageofP = pd.DataFrame([["2017-12-25", 0.059392, 0.894916, 0.419108],
                           ["2017-12-26", 0.021000, 0.100000, 0.990000],
                           ["2017-12-27", 1.500000, 1.100000, 1.800000]],
                          columns=["date", "A", "B", "C"])
I tried to do a conditional calculation using the two DataFrames above:
downsideDev = np.where(dailyRtn < averageofP, dailyRtn, "")
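However, when I look at the result, my previous columns (A, B, C, etc.) and dates (2017-12-27, etc.) are gone and replaced by plain integers, and the result is no longer a DataFrame. How can I fix this?

IIUC, you can use set_index and then pass the index and columns to the DataFrame constructor: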
dailyRtn = dailyRtn.set_index('date')
averageofP = averageofP.set_index('date')
downsideDev = np.where(dailyRtn < averageofP, dailyRtn, "")
downsideDev_df = (pd.DataFrame(downsideDev, index=dailyRtn.index, columns=dailyRtn.columns)
                  .reset_index())
print(downsideDev_df)
You can also try this without np.where, since np.where drops the label information you need. This approach is pandas-specific: a conditional selection plus fillna.
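A closely related pandas-native option is DataFrame.where, which masks values while keeping the index and column labels. The following is only a minimal sketch, assuming "date" is moved into the index first so that only the numeric columns are compared; it leaves NaN in the masked cells rather than "".

```python
import pandas as pd

dailyRtn = pd.DataFrame(
    [["2017-12-25", 0.069392, 0.124916, 0.119108],
     ["2017-12-26", 0.020000, 0.100000, 0.080000],
     ["2017-12-27", 1.000000, 1.200000, 1.500000]],
    columns=["date", "A", "B", "C"],
).set_index("date")
averageofP = pd.DataFrame(
    [["2017-12-25", 0.059392, 0.894916, 0.419108],
     ["2017-12-26", 0.021000, 0.100000, 0.990000],
     ["2017-12-27", 1.500000, 1.100000, 1.800000]],
    columns=["date", "A", "B", "C"],
).set_index("date")

# Keep the returns that are below the average; all other cells become NaN.
# The result is still a DataFrame with the original index and columns.
downsideDev = dailyRtn.where(dailyRtn < averageofP)
print(downsideDev)
```

If the blank strings are really wanted, calling .fillna("") on the result should give them, at the cost of the columns no longer being numeric.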
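What is the expected output? Note that numpy.where returns an ndarray or a tuple of ndarrays, so if you want the output to be a DataFrame, you need to create a DataFrame from the resulting array.
The expected output is a DataFrame as described above. I know the output is random. When I do downsideDev = pd.DataFrame(downsideDev), it drops the index and column names I want to keep. Keep in mind that I have more than 20 columns and many rows.
@AlexanderThomsen If this solution helped you, would you mind accepting it?
Sorry, I had to take care of something. All done! I can't upvote, because I don't have enough reputation points.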
downsideDev = np.where(dailyRtn < averageofP, dailyRtn, "")
downsideDev = pd.DataFrame(downsideDev)
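To make the problem in the snippet above concrete, here is a small sketch (the variable names result, unlabeled and labeled are illustrative only, not from the original post). It shows that np.where hands back a bare ndarray, that wrapping it without labels yields default integer labels, and that passing index and columns from the source frame restores them regardless of how many columns there are.

```python
import numpy as np
import pandas as pd

dailyRtn = pd.DataFrame(
    [["2017-12-25", 0.069392, 0.124916, 0.119108],
     ["2017-12-26", 0.020000, 0.100000, 0.080000],
     ["2017-12-27", 1.000000, 1.200000, 1.500000]],
    columns=["date", "A", "B", "C"],
).set_index("date")
averageofP = pd.DataFrame(
    [["2017-12-25", 0.059392, 0.894916, 0.419108],
     ["2017-12-26", 0.021000, 0.100000, 0.990000],
     ["2017-12-27", 1.500000, 1.100000, 1.800000]],
    columns=["date", "A", "B", "C"],
).set_index("date")

# np.where works on the underlying values and returns a plain ndarray.
result = np.where(dailyRtn < averageofP, dailyRtn, "")
print(type(result))

# Wrapping the bare array gives default integer row and column labels.
unlabeled = pd.DataFrame(result)
print(unlabeled.columns.tolist())

# Reusing the labels of the source frame restores them.
labeled = pd.DataFrame(result, index=dailyRtn.index, columns=dailyRtn.columns)
print(labeled.columns.tolist())
```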
Output of print(downsideDev_df) from the set_index answer:
         date     A                    B                    C
0  2017-12-25        0.12491600000000001  0.11910799999999999
1  2017-12-26  0.02                                      0.08
2  2017-12-27   1.0                                       1.5
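Values such as 0.12491600000000001 in the output above suggest that the numbers have been converted to strings, which is what happens when np.where has to combine floats with the "" fallback. If the result is meant for further arithmetic, one possible variation (an assumption about the goal, not part of the original answers) is to use np.nan as the fallback so the columns stay float64:

```python
import numpy as np
import pandas as pd

dailyRtn = pd.DataFrame(
    [["2017-12-25", 0.069392, 0.124916, 0.119108],
     ["2017-12-26", 0.020000, 0.100000, 0.080000],
     ["2017-12-27", 1.000000, 1.200000, 1.500000]],
    columns=["date", "A", "B", "C"],
).set_index("date")
averageofP = pd.DataFrame(
    [["2017-12-25", 0.059392, 0.894916, 0.419108],
     ["2017-12-26", 0.021000, 0.100000, 0.990000],
     ["2017-12-27", 1.500000, 1.100000, 1.800000]],
    columns=["date", "A", "B", "C"],
).set_index("date")

# np.nan is itself a float, so the resulting array keeps a numeric dtype.
values = np.where(dailyRtn < averageofP, dailyRtn, np.nan)
downsideDev = pd.DataFrame(values, index=dailyRtn.index, columns=dailyRtn.columns)
print(downsideDev.dtypes)
```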
dailyRtn = pd.DataFrame([["2017-12-25", 0.069392, 0.124916, 0.119100],
["2017-12-26", 0.020000, 0.100000, 0.080000],
["2017-12-27", 1.000000, 1.200000, 1.500000]],
columns = ["date", "A", "B", "C"])
averageofP = pd.DataFrame([["2017-12-25", 0.059392, 0.894916, 0.419108],
["2017-12-26", 0.021000, 0.100000, 0.990000],
["2017-12-27", 1.500000, 1.100000, 1.800000]],
columns = ["date", "A", "B", "C"])
# Select the values in dailyRtn that are below averageofP; all other cells become NaN
downsideDev = dailyRtn[dailyRtn < averageofP]
# Replace the NaN cells with ""
downsideDev = downsideDev.fillna("")
# Restore "date", which the comparison also blanked out
downsideDev["date"] = dailyRtn["date"]