Pandas: creating two dictionary variables with unique names in an iteration
I have two different sets of dataframes. One is a Panel whose items are stocks.
Here is the code to get the panel (for reproducibility)
which produces the output:
Dimensions: 2 (items) x 1682 (major_axis) x 6 (minor_axis)
Items axis: AAPL to OPK
Major_axis axis: 2010-01-04 00:00:00 to 2016-09-07 00:00:00
Minor_axis axis: Open to Adj Close
An individual dataframe of the panel looks like this:
stocks['OPK']
Open High Low Close Volume Adj Close log_return \
Date
2010-01-04 1.80 1.97 1.76 1.95 234500.0 1.95 NaN
2010-01-05 1.64 1.95 1.64 1.93 135800.0 1.93 -0.010309
2010-01-06 1.90 1.92 1.77 1.79 546600.0 1.79 -0.075304
2010-01-07 1.79 1.94 1.76 1.92 138700.0 1.92 0.070110
2010-01-08 1.92 1.94 1.86 1.89 62500.0 1.89 -0.015748
I then added two custom columns with the following code:
for i in stocks:
    stocks[i]['log_return'] = np.log(stocks[i]['Close'] / stocks[i]['Close'].shift(1))
    stocks[i]['30_Avg_Vol'] = stocks[i]['Volume'].rolling(min_periods=15, window=30).mean()
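For readers who do not have the original panel, the two derived columns can be sketched on a single synthetic DataFrame. The prices and volumes below are invented for illustration, and `df` is a hypothetical stand-in for one `stocks[i]`.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for one ticker's frame (all values are invented).
dates = pd.date_range("2010-01-04", periods=40, freq="B")
df = pd.DataFrame(
    {
        "Close": np.linspace(1.0, 2.0, num=40),
        "Volume": np.full(40, 1000.0),
    },
    index=dates,
)

# Log return: ln(close_t / close_{t-1}); the first row has no predecessor, so it is NaN.
df["log_return"] = np.log(df["Close"] / df["Close"].shift(1))

# 30-row rolling mean of volume; rows with fewer than 15 observations stay NaN.
df["30_Avg_Vol"] = df["Volume"].rolling(window=30, min_periods=15).mean()
```

With `min_periods=15`, the first 14 rows of `30_Avg_Vol` are NaN, so any comparison against that column is False for those rows.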
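Then, to slice out only the rows with high volume, I created a dictionary of dataframes (each key is a stock, each value is the sliced dataframe) with the following code: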
High_volume = {}
for i in stocks.items:  # stocks is a panel, the items are the stock tickers
    print(i)
    High_volume[i] = stocks[i][stocks[i].Volume > 1.5 * stocks[i]['30_Avg_Vol']]
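`Panel` has since been removed from pandas, so on a current install the same filter could be sketched over a plain dict of DataFrames. The `frames` dict, its tickers' numbers, and the name `high_volume` are hypothetical; only the filtering condition comes from the code above.

```python
import pandas as pd

dates = pd.date_range("2010-01-04", periods=5, freq="D")

# Hypothetical per-ticker frames; the volumes and averages are invented.
frames = {
    "AAPL": pd.DataFrame(
        {"Volume": [100.0, 400.0, 100.0, 100.0, 500.0],
         "30_Avg_Vol": [200.0] * 5},
        index=dates,
    ),
    "OPK": pd.DataFrame(
        {"Volume": [50.0, 50.0, 90.0, 50.0, 50.0],
         "30_Avg_Vol": [50.0] * 5},
        index=dates,
    ),
}

# Keep only the rows whose volume exceeds 1.5x the rolling average.
high_volume = {
    ticker: df[df["Volume"] > 1.5 * df["30_Avg_Vol"]]
    for ticker, df in frames.items()
}
```

A dict keyed by ticker gives the same `high_volume['OPK']` style of access as the panel-based version.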
So I have a dictionary of sliced dataframes, and I can access each dataframe by its stock ticker:
High_volume['OPK']
High_volume['AAPL']
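Now, for every date in each of these dataframes (the index consists of datetime objects), I want to create a set of mini dataframes.
So for all the dates in High_volume['AAPL'], I want to create one mini dataframe per date, and likewise for all the dates in High_volume['OPK']. In this example, then, I want to create two dictionaries containing mini dataframes.
High_volume['OPK'] looks like this, and for each date I want to create a mini dataframe: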
Open High Low Close Volume Adj Close \
Date
2010-02-11 1.710000 2.200000 1.710000 1.940000 2212300.0 1.940000
2010-02-12 1.940000 2.100000 1.940000 2.030000 739500.0 2.030000
2010-03-19 2.030000 2.050000 1.950000 2.030000 611800.0 2.030000
2010-04-12 2.060000 2.210000 2.040000 2.160000 647100.0 2.160000
2010-04-13 2.210000 2.450000 2.160000 2.320000 823200.0 2.320000
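Each mini dataframe will hold roughly X days of information. The start date is the date of the sliced row, and the end date is roughly X days later. To get the data for the other X days, I slice the original panel (stocks), which contains all the stock data.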
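The windowing step described here can be sketched in isolation. The `prices` frame and its values are invented, and `num_of_days` plays the role of X; this is only an illustration of label-based date slicing, not the panel code itself.

```python
import datetime as dt
import pandas as pd

# Hypothetical daily price series (values are invented).
dates = pd.date_range("2010-02-08", periods=10, freq="D")
prices = pd.DataFrame({"Close": range(10)}, index=dates)

num_of_days = 3  # the "X" in the text
start_date = pd.Timestamp("2010-02-11")
end_date = start_date + dt.timedelta(days=num_of_days)

# Label-based slicing on a DatetimeIndex includes both endpoints.
mini_df = prices.loc[start_date:end_date]
```

Because both endpoints are included, a window of X calendar days holds up to X + 1 rows. With real trading data, weekends and holidays are missing from the index, so a window usually holds fewer rows.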
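However, since I will be working with many stocks, I have to create many dictionaries in a single iteration (two in this case, OPK and AAPL), so I need to name the dictionaries dynamically.
The function that does this looks like this: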
import datetime as dt

def slicing(stock, sliced_data, num_of_days):
    # stock = list of stock tickers I'm interested in exploring
    # sliced_data = the High_volume dict I created
    # num_of_days = the X days (the size of each mini-dataframe)
    time_delta = dt.timedelta(days=num_of_days)
    for i in stock:  # stock name
        vars()['mini_dfs' + i] = {}  # dynamically creating a dictionary for that stock
        print(vars()['mini_dfs' + i])  # to make sure dictionary was created
        for date in sliced_data[i].index:  # taking each date of High_volume df
            start_date = date
            end_date = date + time_delta
            # filling the empty dictionary with dataframes (dates are keys, values are dataframes)
            vars()['mini_dfs' + i][date] = stocks[i].loc[start_date:end_date]
        return vars()['mini_dfs' + i]  # returning the dictionary before creating the new dictionary
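The function seems to execute correctly, since I get a set of mini dataframes as output for both stocks. However, the result is not saved into two variables.
Everything is saved into a single variable.
Remember, in this example I am working with two stocks, so I expect two dictionaries to be created.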
x=slicing(['AAPL','OPK'], High_volume , 1) # This works
However,
x,y =slicing(['AAPL','OPK'], High_volume , 1)
ValueError: too many values to unpack (expected 2)
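In this case, how can I get the function to output two dictionaries (or one dictionary for each stock I want it to analyze)?
Thanks.

The problem is that return provides only one value: the last dictionary created. You can use yield to produce a sequence of dictionaries, like this: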
def slicing(stock, sliced_data, num_of_days):
    time_delta = dt.timedelta(days=num_of_days)
    for i in stock:  # stock name
        vars()['mini_dfs' + i] = {}
        for date in sliced_data[i].index:
            start_date = date
            end_date = date + time_delta
            vars()['mini_dfs' + i][date] = stocks[i].loc[start_date:end_date]
        yield vars()['mini_dfs' + i]
Then you can get a list of dictionaries like this:
my_list = [i for i in slicing(['AAPL','OPK'], High_volume, 1)]
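As an alternative sketch that avoids writing into vars() altogether, the function could return one dictionary keyed by ticker, with each value being that ticker's date-to-dataframe dictionary. The names `slice_windows`, `all_data`, and `sliced` are hypothetical, the data is invented, and a plain dict of DataFrames stands in for the panel.

```python
import datetime as dt
import pandas as pd


def slice_windows(tickers, sliced_data, all_data, num_of_days):
    """Return {ticker: {date: mini DataFrame}} (hypothetical helper)."""
    time_delta = dt.timedelta(days=num_of_days)
    result = {}
    for ticker in tickers:
        # One inner dict per ticker: each high-volume date maps to its window.
        result[ticker] = {
            date: all_data[ticker].loc[date:date + time_delta]
            for date in sliced_data[ticker].index
        }
    return result


# Invented data standing in for the panel and the High_volume dict.
dates = pd.date_range("2010-01-04", periods=6, freq="D")
all_data = {
    t: pd.DataFrame({"Close": range(6)}, index=dates) for t in ("AAPL", "OPK")
}
sliced = {
    "AAPL": all_data["AAPL"].iloc[[0, 3]],
    "OPK": all_data["OPK"].iloc[[2]],
}

windows = slice_windows(["AAPL", "OPK"], sliced, all_data, 1)

# Unpack into one variable per ticker when the tickers are known up front.
aapl_windows, opk_windows = (windows[t] for t in ("AAPL", "OPK"))
```

Keying by ticker scales to any number of stocks without inventing variable names. With the yield version above, `x, y = slicing(['AAPL','OPK'], High_volume, 1)` also unpacks cleanly, as long as the number of names on the left matches the number of tickers.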
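Ah!! I didn't know that about return. I'll play around with yield and see whether I run into any problems. Thanks, cheers mate.
Happy coding.
It seems to work well. Thank you very much. Now I have to go read up on yield.
You're welcome. It is definitely worth learning.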