Python pandas: aggregate grouped data only if certain criteria are met

python pandas

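I want to create a time series of the valuation of a stock portfolio by aggregating the time series valuation data of the individual holdings in that portfolio. My problem is that on certain dates there may be no valuation for a given holding, so aggregating on that date would produce an erroneous result.

The solution I have come up with is to exclude the dates for which valuation (actually price) data doesn't exist for a given holding, and then aggregate on the dates where I have complete data. The procedure I use is as follows: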

# Get the individual holding valuation data
valuation = get_valuation(portfolio = portfolio, df = True)

# The next few lines retrieve the dates for which I have complete price data for the
# assets that comprise this portfolio.
# First get a list of the assets that this portfolio contains (or has contained).
unique_assets = valuation['asset'].unique().tolist()

# Then I get the price data for these assets
ats = get_ats(assets = unique_assets, df = True )[['data_date','close_price']]

# I mark those dates for which I have a 'close_price' for each asset:
ats = (ats.groupby('data_date')['close_price'].size()
          .eq(len(unique_assets))
          .rename('data_complete')
          .reset_index())

# And extract the corresponding valid dates.
valid_dates = ats['data_date'][ats['data_complete']]

# Filter the valuation data for those dates for which I have complete data:
valuation = valuation[valuation['data_date'].apply(lambda x: x in valid_dates.values)]

# Group by date, and sum the individual holding valuations by date, to get the portfolio valuation
portfolio_valuation = valuation.groupby('data_date')['valuation'].sum().reset_index()

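My question is twofold:

1) The above approach feels quite convoluted, and I am confident that pandas has a better way of implementing my solution. Any suggestions?

2) The approach I have used isn't ideal. The best approach would be that, for the dates on which we have no valuation data for a given holding, we use the most recent valuation for that holding. Say I am calculating the valuation of the portfolio on 21 June 2012 and have valuation data for GOOG on that date, but for AAPL only on 20 June 2012. The valuation of the portfolio on 21 June 2012 should then still be the sum of these two valuations. Is there an efficient way to do this in pandas? I want to avoid having to iterate through the data.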

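It seems like some combination of resample and/or fillna will get you what you're looking for (realize this is coming a little late!).

Go grab your data just like you're doing. You get back something with a few gaps in it. Check this out: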

import pandas as pd
import numpy as np

dates = pd.date_range(start='2012-01-01', periods=10, freq='2D')
df = pd.DataFrame(np.random.randn(20).reshape(10,2),index=dates)

So now you have this data with lots of gaps in it, but you want it at daily resolution.

Just do:

df.resample('1D').asfreq()
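
To see what that upsampling step produces, here's a minimal sketch on a smaller frame. The seed and the column names `a` and `b` are my own assumptions, just to make it reproducible. On recent pandas versions `resample` returns a lazy Resampler object, so `.asfreq()` is what materialises the daily frame.

```python
import numpy as np
import pandas as pd

# Five observations spaced two days apart (illustrative data only)
rng = np.random.default_rng(0)
dates = pd.date_range(start="2012-01-01", periods=5, freq="2D")
df = pd.DataFrame(rng.standard_normal((5, 2)), index=dates, columns=["a", "b"])

# Upsample to daily frequency; days with no observation become all-NaN rows
daily = df.resample("1D").asfreq()
print(daily)
```

The result spans every calendar day between the first and last observation, with the original rows untouched and the in-between days filled with NaN.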

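This will fill in your DataFrame with a bunch of NaNs where data is missing. Then, when you aggregate, just use functions that ignore NaNs (e.g. np.nansum, np.nanmean).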
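
One caveat, sketched below with made-up numbers for two holdings (`GOOG` and `AAPL`, one column each): skipping NaNs in a sum treats a missing valuation as zero, so the portfolio total dips on that day. If what you want is the latest known valuation carried forward, as in the second part of the question, forward-filling before summing is one way to get it.

```python
import numpy as np
import pandas as pd

# Invented valuations; AAPL has no value on the last date
dates = pd.date_range("2012-06-19", periods=3, freq="D")
wide = pd.DataFrame(
    {"GOOG": [100.0, 101.0, 102.0], "AAPL": [50.0, 51.0, np.nan]},
    index=dates,
)

# NaN-skipping sum: the missing AAPL value contributes nothing on the last date
skipped = wide.sum(axis=1)

# Forward-fill first: AAPL's last known valuation (51.0) is reused
carried = wide.ffill().sum(axis=1)

# Only dates where every holding has a value (the question's original approach)
complete_only = wide.dropna().sum(axis=1)
```

Here `skipped` ends at 102.0, `carried` ends at 153.0, and `complete_only` simply drops the last date.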

I'm still a little unclear on the exact format of the data you're getting back. Hope this helps.
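
If the valuation data comes back in long format, as the question's code suggests (one row per date and asset, with columns `data_date`, `asset` and `valuation`), a possible sketch is to pivot to one column per asset, forward-fill, and sum across columns. The sample values are invented, and I'm assuming at most one row per date/asset pair, since `pivot` raises on duplicates.

```python
import pandas as pd

# Hypothetical long-format valuations: AAPL is missing on 2012-06-21
valuation = pd.DataFrame({
    "data_date": pd.to_datetime(["2012-06-20", "2012-06-20", "2012-06-21"]),
    "asset": ["GOOG", "AAPL", "GOOG"],
    "valuation": [100.0, 50.0, 102.0],
})

# One row per date, one column per asset; gaps show up as NaN
wide = valuation.pivot(index="data_date", columns="asset", values="valuation")

# Carry each asset's latest valuation forward, then sum across assets
portfolio_valuation = (
    wide.ffill()
        .sum(axis=1)
        .rename("valuation")
        .reset_index()
)
```

Two things to watch for: dates before an asset's first valuation stay NaN after the forward fill and are counted as zero in the sum, and a holding that has been sold would keep being carried forward unless it is zeroed out explicitly.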
