Python 3.x: Creating a new column from data in PeriodIndex columns
I have a DataFrame with a MultiIndex of state and town names. The columns are quarterly housing data created via PeriodIndex. I want to create a ratio of the data in a new column:
housing_data_compact_df['P Ratio'] = housing_data_compact_df[pd.Period(anal_start_col_name)].div(housing_data_compact_df[pd.Period(anal_end_col_name)])
Whenever I try to create this new column, I get an error:
DateParseError: Unknown datetime string format, unable to parse: P Ratio
Full code:
import pandas as pd

# Create housing cost dataframe
zillow_file = 'City_Zhvi_AllHomes.csv'  # from https://www.zillow.com/research/data/
zillow_df = pd.read_csv(zillow_file, header=0, usecols=[1, 2, *range(51, 251)], index_col=[1, 0]).dropna(how='all')
# rename state abbreviations in level 0 multiindex to full state name
zillow_df.reset_index(inplace=True)
zillow_df['State'] = zillow_df['State'].map(states)
zillow_df.set_index(['State','RegionName'], inplace=True)
housing_data_df = zillow_df.groupby(pd.PeriodIndex(zillow_df.columns, freq="Q"), axis=1).mean()
rec_start = '2000Q1'
rec_bottom = '2001Q1'
#Reduce Size to desired data
start_col = housing_data_df.columns.get_loc(pd.Period(rec_start))-1
end_col = housing_data_df.columns.get_loc(pd.Period(rec_bottom))
housing_data_compact_df = housing_data_df[[start_col,end_col]]
#This is where the issue occurs
housing_data_compact_df['P Ratio'] = housing_data_compact_df[pd.Period(anal_start_col_name)].div(housing_data_compact_df[pd.Period(anal_end_col_name)])
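A side note on the quarterly aggregation step in the code above: in recent pandas releases the `axis=1` argument of `groupby` is deprecated. The following is only a sketch of an equivalent that groups the transposed frame instead. The `monthly` frame, its values, and the `YYYY-MM` column labels are assumptions made for illustration, not the actual Zillow data.

```python
import pandas as pd

# Hypothetical monthly frame; labels assume 'YYYY-MM' style headers.
monthly = pd.DataFrame(
    [[1.0, 2.0, 3.0, 4.0, 5.0, 6.0]],
    index=["Chicago"],
    columns=["2000-01", "2000-02", "2000-03", "2000-04", "2000-05", "2000-06"],
)

# One quarterly Period per monthly column.
quarters = pd.PeriodIndex(monthly.columns, freq="Q")

# Group the transposed rows by quarter, average them, and transpose back
# so that the quarters end up as columns again.
quarterly = monthly.T.groupby(quarters).mean().T
print(quarterly)
```

The result has one column per quarter, each holding the mean of the three monthly values that fall in it.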
Here is some additional data that may or may not be helpful:
[In]: print(housing_data_compact_df.head())
                                  2000Q1         2001Q1
State        RegionName
New York     New York      503933.333333  465833.333333
California   Los Angeles   502000.000000  413633.333333
Illinois     Chicago       237966.666667  219633.333333
Pennsylvania Philadelphia  118233.333333  116166.666667
Arizona      Phoenix       205300.000000  168200.000000
[In]: print("Indices: " + str(housing_data_compact_df.index.names))
Indices: ['State', 'RegionName']
[In]: print(housing_data_compact_df.columns)
PeriodIndex(['2000Q1', '2001Q1'], dtype='period[Q-DEC]', freq='Q-DEC')
What I have tried:
My problem seems to be related to the PeriodIndex columns. I tried converting the data with a direct cast:
[In]: housing_data_compact_df['P Ratio'] = float(housing_data_compact_df[pd.Period(start_col_name)]).div(float(housing_data_compact_df[pd.Period(end_col_name)]))
TypeError: cannot convert the series to <class 'float'>
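The TypeError above comes from the built-in `float()`, which expects a single scalar, whereas selecting a DataFrame column returns a whole Series. As a minimal sketch (the `prices` frame, its column names, and its values are made up for illustration, with plain string labels so the focus stays on the cast), an element-wise conversion would use `Series.astype` instead:

```python
import pandas as pd

# Hypothetical two-row frame; values are illustrative only.
prices = pd.DataFrame({"start": [200, 300], "end": [100, 150]})

try:
    float(prices["start"])  # a multi-row Series is not a single scalar
except TypeError as exc:
    cast_error = str(exc)
    print("float() failed:", cast_error)

# Element-wise conversion followed by element-wise division.
ratio = prices["start"].astype(float).div(prices["end"].astype(float))
print(ratio)
```

This shows why the cast fails, but it does not address the DateParseError, which is about the column labels rather than the values.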
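I also reset the keys in an attempt to get rid of the PeriodIndex and then reindexed once the operation was done. However, this did not work on every system I tested it on, and it seems like a roundabout way to fix something that I think should have a simple solution.
Question:
How can I create a new column as a ratio of the data in these PeriodIndex columns?
Thanks in advance for your help.

You need to convert the PeriodIndex columns to strings with strftime and add copy() when taking the subset. The short listing directly below shows just this fix. The full code comes after it, with only small changes to your code (nice ;)) so that it runs for me. Another possible solution, shown at the end of the full code, is to compute the ratio while the column labels are still Periods and only then convert them to strings.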
housing_data_compact_df = housing_data_df[[start_col,end_col]].copy()
print (housing_data_compact_df.head())
anal_start_col_name = '2016Q3'
anal_end_col_name = '2001Q1'
housing_data_compact_df.columns = housing_data_compact_df.columns.strftime('%YQ%q')
housing_data_compact_df['P Ratio'] = (housing_data_compact_df[anal_start_col_name]
                                      .div(housing_data_compact_df[anal_end_col_name]))
print (housing_data_compact_df.head())
                      2016Q3         2001Q1   P Ratio
State RegionName
NY    New York      599850.0            NaN       NaN
CA    Los Angeles   588750.0  233000.000000  2.526824
IL    Chicago       207600.0  156933.333333  1.322855
PA    Philadelphia  129950.0   55333.333333  2.348494
AZ    Phoenix       197800.0  119600.000000  1.653846
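For readers who do not have the Zillow file at hand, the same fix can be tried on a tiny frame. This is only a sketch: `demo_df` and its numbers are hypothetical stand-ins for `housing_data_compact_df`, not the real data.

```python
import pandas as pd

# Hypothetical stand-in for housing_data_compact_df; values are illustrative.
cols = pd.PeriodIndex(["2016Q3", "2001Q1"], freq="Q")
index = pd.MultiIndex.from_tuples(
    [("IL", "Chicago"), ("AZ", "Phoenix")], names=["State", "RegionName"]
)
demo_df = pd.DataFrame([[200.0, 100.0], [300.0, 120.0]], index=index, columns=cols)

# Turn the Period labels into plain strings such as '2016Q3'.
demo_df.columns = demo_df.columns.strftime("%YQ%q")

# With string labels, adding a string-named column is an ordinary assignment.
demo_df["P Ratio"] = demo_df["2016Q3"].div(demo_df["2001Q1"])
print(demo_df)
```

In the format string, `%Y` is the year and `%q` is the Period-specific quarter directive, so the labels keep the same `2016Q3` spelling they had as Periods.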
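Thank you very much, @jezrael, this works perfectly. I spent days going through the pandas documentation and Stack Overflow trying to work out how to solve this, without success.
Yes, I first thought the solution would be
df.columns = df.columns.astype(str)
and I was surprised that it did not work. But strftime works perfectly. By the way, there is nothing about it in the official documentation; I only found it thanks to this link, which looks very useful.
Next time I am struggling with the pandas documentation, I will check the site you recommended.
I just search with Google. I think the pandas documentation is clearly better in general; this is just an exception.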
import pandas as pd

zillow_file = 'http://files.zillowstatic.com/research/public/City/City_Zhvi_AllHomes.csv'
zillow_df = pd.read_csv(zillow_file,header=0,
usecols=[1,2] + list(range(51,251)), #changed for python 3
index_col=[1,0]).dropna(how='all')
# rename state abbreviations in level 0 multiindex to full state name
zillow_df.reset_index(inplace=True)
#no states in question, so commented
#zillow_df['State'] = zillow_df['State'].map(states)
zillow_df.set_index(['State','RegionName'], inplace=True)
housing_data_df=zillow_df.groupby(pd.PeriodIndex(zillow_df.columns, freq="Q"), axis=1).mean()
rec_start = '2000Q1'
rec_bottom = '2001Q1'
#Reduce Size to desired data
start_col = housing_data_df.columns.get_loc(pd.Period(rec_start))-1
end_col = housing_data_df.columns.get_loc(pd.Period(rec_bottom))
#add copy
#http://stackoverflow.com/q/42438987/2901002
housing_data_compact_df = housing_data_df[[start_col,end_col]].copy()
print (housing_data_compact_df.head())
                      2016Q3         2001Q1
State RegionName
NY    New York      599850.0            NaN
CA    Los Angeles   588750.0  233000.000000
IL    Chicago       207600.0  156933.333333
PA    Philadelphia  129950.0   55333.333333
AZ    Phoenix       197800.0  119600.000000
anal_start_col_name = '2016Q3'
anal_end_col_name = '2001Q1'
# compute the ratio while the column labels are still Periods
a = (housing_data_compact_df[pd.Period(anal_start_col_name)]
     .div(housing_data_compact_df[pd.Period(anal_end_col_name)]))
housing_data_compact_df.columns = housing_data_compact_df.columns.strftime('%YQ%q')
housing_data_compact_df['P Ratio'] = a
print (housing_data_compact_df.head())
                      2016Q3         2001Q1   P Ratio
State RegionName
NY    New York      599850.0            NaN       NaN
CA    Los Angeles   588750.0  233000.000000  2.526824
IL    Chicago       207600.0  156933.333333  1.322855
PA    Philadelphia  129950.0   55333.333333  2.348494
AZ    Phoenix       197800.0  119600.000000  1.653846
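The ordering used in this second variant (look the columns up by Period first, rename afterwards) can also be checked in isolation. The sketch below uses a hypothetical `period_df` with made-up numbers, not the Zillow data.

```python
import pandas as pd

# Hypothetical frame with PeriodIndex columns; values are illustrative.
cols = pd.PeriodIndex(["2016Q3", "2001Q1"], freq="Q")
period_df = pd.DataFrame([[200.0, 100.0], [300.0, 120.0]], columns=cols)

# Select by Period while the labels are still Periods.
ratio = period_df[pd.Period("2016Q3")].div(period_df[pd.Period("2001Q1")])

# Only now convert the labels to strings and attach the new column.
period_df.columns = period_df.columns.strftime("%YQ%q")
period_df["P Ratio"] = ratio
print(period_df)
```

Which ordering to prefer is mostly a matter of taste. This one keeps the Period-based lookups unchanged and only touches the labels right before the string-named column is added.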