Python 3.x: Creating a new column from data in PeriodIndex columns

python-3.x pandas

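I have a DataFrame with a MultiIndex of state and town names. The columns are quarterly housing data, created with a PeriodIndex. I want to put a ratio of this data in a new column: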

housing_data_compact_df['P Ratio'] = housing_data_compact_df[pd.Period(anal_start_col_name)].div(housing_data_compact_df[pd.Period(anal_end_col_name)])

Whenever I try to create this new column, I get an error:

DateParseError: Unknown datetime string format, unable to parse: P Ratio
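The traceback suggests that pandas tries to read the new label 'P Ratio' as a quarter string, because every existing column label is a Period. That is an interpretation, not something the thread confirms. The sketch below isolates only the parsing step: a quarter string such as '2000Q1' becomes a Period, while 'P Ratio' raises a ValueError (DateParseError is a subclass of ValueError). The helper name try_parse_quarter is hypothetical.

```python
import pandas as pd

def try_parse_quarter(label: str):
    """Hypothetical helper: return a quarterly Period for `label`, or None if it can't be parsed."""
    try:
        return pd.Period(label, freq="Q")
    except ValueError:  # DateParseError subclasses ValueError
        return None

print(try_parse_quarter("2000Q1"))   # 2000Q1
print(try_parse_quarter("P Ratio"))  # None
```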

Full code:

import pandas as pd

# Create housing cost dataframe
zillow_file = 'City_Zhvi_AllHomes.csv'    # from https://www.zillow.com/research/data/
zillow_df = pd.read_csv(zillow_file, header=0, usecols=[1, 2, *range(51, 251)], index_col=[1, 0]).dropna(how='all')

# rename state abbreviations in level 0 multiindex to full state name
zillow_df.reset_index(inplace=True)
zillow_df['State'] = zillow_df['State'].map(states)    # states: dict of abbreviation -> full state name (not shown in the question)
zillow_df.set_index(['State','RegionName'], inplace=True)
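The reset_index / map / set_index round trip can also be done directly on one index level with `DataFrame.rename(index=..., level=...)`. This is a minimal sketch, not the asker's code. The two-entry `states` dict and the sample row are hypothetical stand-ins, because the question's `states` mapping isn't shown. One behavioral difference: `Series.map` turns abbreviations that are missing from the dict into NaN, while `rename` leaves them unchanged.

```python
import pandas as pd

# Hypothetical stand-in for the question's `states` dict (abbreviation -> full name)
states = {"NY": "New York", "CA": "California"}

df = pd.DataFrame(
    {"2000-01": [500000.0, 510000.0]},
    index=pd.MultiIndex.from_tuples(
        [("NY", "New York"), ("CA", "Los Angeles")], names=["State", "RegionName"]
    ),
)

# Replace labels only in the 'State' level; 'RegionName' is left untouched
df = df.rename(index=states, level="State")
print(df.index.tolist())  # [('New York', 'New York'), ('California', 'Los Angeles')]
```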

housing_data_df = zillow_df.groupby(pd.PeriodIndex(zillow_df.columns, freq="Q"), axis=1).mean()
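The `axis` argument of `DataFrame.groupby` was deprecated in pandas 2.1. On newer versions, one way to get the same quarterly means is to transpose, group the rows, and transpose back. This is a minimal sketch on a made-up frame. It assumes the monthly column labels look like '2000-01', which is an assumption about the CSV layout.

```python
import pandas as pd

# Made-up monthly data with 'YYYY-MM' column labels (assumed layout)
monthly = pd.DataFrame(
    [[1.0, 2.0, 3.0, 10.0]],
    columns=["2000-01", "2000-02", "2000-03", "2000-04"],
)

# Map each monthly label to its calendar quarter (a PeriodIndex)
quarters = pd.to_datetime(monthly.columns).to_period("Q")

# Group the transposed rows by quarter, average them, then transpose back
quarterly = monthly.T.groupby(quarters).mean().T
print(quarterly)
```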


rec_start = '2000Q1'
rec_bottom = '2001Q1'

#Reduce Size to desired data
start_col = housing_data_df.columns.get_loc(pd.Period(rec_start))-1
end_col = housing_data_df.columns.get_loc(pd.Period(rec_bottom))

housing_data_compact_df = housing_data_df[[start_col,end_col]]
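Here `housing_data_df[[start_col, end_col]]` passes integer positions to [], and the code doesn't make it obvious whether pandas treats those integers as labels or as positions against a PeriodIndex. If the intent is to keep exactly the quarters named by rec_start and rec_bottom (an assumption), selecting by label with .loc, or by position with .iloc, is unambiguous. The sketch below uses a made-up frame and adds .copy() so that later column assignments act on an independent frame.

```python
import pandas as pd

# Made-up quarterly frame: one row, columns 1999Q4 .. 2001Q2
cols = pd.period_range("1999Q4", "2001Q2", freq="Q")
housing = pd.DataFrame([list(range(len(cols)))], columns=cols, dtype=float)

rec_start, rec_bottom = "2000Q1", "2001Q1"
wanted = [pd.Period(rec_start, freq="Q"), pd.Period(rec_bottom, freq="Q")]

# Label-based selection of exactly the two named quarters
compact = housing.loc[:, wanted].copy()

# Position-based equivalent, looking the positions up with get_loc
positions = [housing.columns.get_loc(p) for p in wanted]
compact_pos = housing.iloc[:, positions].copy()

print(compact)
```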

#This is where the issue occurs
housing_data_compact_df['P Ratio'] = housing_data_compact_df[pd.Period(anal_start_col_name)].div(housing_data_compact_df[pd.Period(anal_end_col_name)])

Here is some additional information that may or may not help:

[In]: print(housing_data_compact_df.head())

                                  2000Q1         2001Q1
State        RegionName                                
New York     New York      503933.333333  465833.333333
California   Los Angeles   502000.000000  413633.333333
Illinois     Chicago       237966.666667  219633.333333
Pennsylvania Philadelphia  118233.333333  116166.666667
Arizona      Phoenix       205300.000000  168200.000000



[In]: print("Indices: " + str(housing_data_compact_df.index.names))
Indices: ['State', 'RegionName']


[In]: print(housing_data_compact_df.columns)
PeriodIndex(['2000Q1', '2001Q1'], dtype='period[Q-DEC]', freq='Q-DEC')
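To reproduce this without the Zillow CSV, the sketch below rebuilds a small frame from the first two rows of the head() output above. The values of anal_start_col_name and anal_end_col_name are assumptions, because the question doesn't define them. Selecting columns with pd.Period keys and dividing them returns an ordinary float Series. The error message instead points at storing the result under the string label 'P Ratio'.

```python
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [("New York", "New York"), ("California", "Los Angeles")],
    names=["State", "RegionName"],
)
columns = pd.PeriodIndex(["2000Q1", "2001Q1"], freq="Q")
housing_data_compact_df = pd.DataFrame(
    [[503933.333333, 465833.333333], [502000.000000, 413633.333333]],
    index=index,
    columns=columns,
)

anal_start_col_name = "2000Q1"  # assumed values for the undefined names
anal_end_col_name = "2001Q1"

# Lookup by Period works; the result is a float Series aligned on the row MultiIndex
ratio = housing_data_compact_df[pd.Period(anal_start_col_name)].div(
    housing_data_compact_df[pd.Period(anal_end_col_name)]
)
print(ratio)
```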

What I've tried:

My problem seems to be related to the PeriodIndex columns. I tried converting the data directly:

[In]: housing_data_compact_df['P Ratio'] = float(housing_data_compact_df[pd.Period(start_col_name)]).div(float(housing_data_compact_df[pd.Period(end_col_name)]))

TypeError: cannot convert the series to <class 'float'>
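The built-in float() expects a single number, so calling it on a Series with more than one element raises this TypeError. The element-wise equivalent is Series.astype(float), and .div then divides value by value. A minimal sketch with values taken from the head() output:

```python
import pandas as pd

start = pd.Series([503933.333333, 502000.0])
end = pd.Series([465833.333333, 413633.333333])

try:
    float(start)  # several values cannot become one float
except TypeError as err:
    print("float() failed:", err)

# Element-wise conversion, then element-wise division
ratio = start.astype(float).div(end.astype(float))
print(ratio.tolist())
```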

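I also tried resetting the index to get rid of the PeriodIndex and then re-indexing after the operation. However, that didn't work on every system I tested it on, and it seems like a roundabout way to fix something that I think should have a simple solution.

Question:

How can I create a new column that holds the ratio of the data in these PeriodIndex columns?

Thanks in advance for your help.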

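You need to convert the PeriodIndex column labels to strings, for example with strftime, and then add the new column:

housing_data_compact_df.columns = housing_data_compact_df.columns.strftime('%YQ%q')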
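Here is what that line does on its own. PeriodIndex.strftime formats each period using %Y (year) and %q (quarter number). The result is plain string labels, so a new string column name such as 'P Ratio' can be added next to them. A minimal check:

```python
import pandas as pd

periods = pd.PeriodIndex(["2000Q1", "2001Q1"], freq="Q")
labels = periods.strftime("%YQ%q")

print(labels.tolist())  # ['2000Q1', '2001Q1']

# The string labels can be parsed back into Periods if needed later
print([pd.Period(s, freq="Q") for s in labels])
```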

Another possible solution:

housing_data_compact_df = housing_data_df[[start_col,end_col]].copy()
print (housing_data_compact_df.head())

anal_start_col_name = '2016Q3'
anal_end_col_name = '2001Q1'

housing_data_compact_df.columns = housing_data_compact_df.columns.strftime('%YQ%q')
housing_data_compact_df['P Ratio'] = (housing_data_compact_df[anal_start_col_name]
                                      .div(housing_data_compact_df[anal_end_col_name]))

print (housing_data_compact_df.head())
                      2016Q3         2001Q1   P Ratio
State RegionName                                     
NY    New York      599850.0            NaN       NaN
CA    Los Angeles   588750.0  233000.000000  2.526824
IL    Chicago       207600.0  156933.333333  1.322855
PA    Philadelphia  129950.0   55333.333333  2.348494
AZ    Phoenix       197800.0  119600.000000  1.653846
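Below is a self-contained version of this alternative. It uses two rows of sample values from the question's head() output and the two quarter labels shown there. Note the parentheses around the chained .div(...) call: without them, a line break before .div is a syntax error.

```python
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [("Illinois", "Chicago"), ("Arizona", "Phoenix")],
    names=["State", "RegionName"],
)
df = pd.DataFrame(
    [[237966.666667, 219633.333333], [205300.0, 168200.0]],
    index=index,
    columns=pd.PeriodIndex(["2000Q1", "2001Q1"], freq="Q"),
)

# 1. Turn the Period column labels into strings such as '2000Q1'
df.columns = df.columns.strftime("%YQ%q")

# 2. String keys now work for both the lookup and the new column
df["P Ratio"] = (df["2000Q1"]
                 .div(df["2001Q1"]))
print(df)
```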

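Thanks a lot, @jezrael, this works perfectly. I spent days going through the pandas documentation and Stack Overflow trying to figure out how to solve this, with no luck. I had always assumed the solution was

df.columns = df.columns.astype(str)

and I was surprised that it didn't work. strftime, on the other hand, worked perfectly. By the way, I couldn't find this in the official docs; I only found it thanks to this link, which looks very useful. Next time I'm stuck in the pandas docs, I'll check the site you recommended. I usually just google things; I think the pandas docs are clearly better overall, and this is just an exception.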
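The thread doesn't settle whether astype(str) should work, and the behavior may differ between pandas versions. One version-independent alternative, suggested here rather than taken from the thread, is Index.map(str). It calls str() on each Period, which for quarterly periods gives the same '2000Q1'-style labels as the strftime call. A quick check:

```python
import pandas as pd

periods = pd.PeriodIndex(["2000Q1", "2001Q1"], freq="Q")

via_map = periods.map(str)                # str() applied to each Period
via_strftime = periods.strftime("%YQ%q")  # the approach from the answer

print(via_map.tolist())
print(via_map.tolist() == via_strftime.tolist())
```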

All code (your code with only small changes so it works for me; nice code ;)):

import pandas as pd

zillow_file = 'http://files.zillowstatic.com/research/public/City/City_Zhvi_AllHomes.csv'
zillow_df = pd.read_csv(zillow_file,header=0,
                        usecols=[1,2] + list(range(51,251)), #changed for python 3
                        index_col=[1,0]).dropna(how='all')

# rename state abbreviations in level 0 multiindex to full state name
zillow_df.reset_index(inplace=True)
# the states dict is not shown in the question, so this line is commented out
#zillow_df['State'] = zillow_df['State'].map(states)
zillow_df.set_index(['State','RegionName'], inplace=True)

housing_data_df=zillow_df.groupby(pd.PeriodIndex(zillow_df.columns, freq="Q"), axis=1).mean()

rec_start = '2000Q1'
rec_bottom = '2001Q1'

#Reduce Size to desired data
start_col = housing_data_df.columns.get_loc(pd.Period(rec_start))-1
end_col = housing_data_df.columns.get_loc(pd.Period(rec_bottom))

# add .copy() so the new column is added to an independent DataFrame, not to a slice of housing_data_df
#http://stackoverflow.com/q/42438987/2901002
housing_data_compact_df = housing_data_df[[start_col,end_col]].copy()
print (housing_data_compact_df.head())
                      2016Q3         2001Q1
State RegionName                           
NY    New York      599850.0            NaN
CA    Los Angeles   588750.0  233000.000000
IL    Chicago       207600.0  156933.333333
PA    Philadelphia  129950.0   55333.333333
AZ    Phoenix       197800.0  119600.000000

anal_start_col_name = '2016Q3'
anal_end_col_name = '2001Q1'

a = (housing_data_compact_df[pd.Period(anal_start_col_name)]
     .div(housing_data_compact_df[pd.Period(anal_end_col_name)]))
housing_data_compact_df.columns = housing_data_compact_df.columns.strftime('%YQ%q')
housing_data_compact_df['P Ratio'] = a
print (housing_data_compact_df.head())
                      2016Q3         2001Q1   P Ratio
State RegionName                                     
NY    New York      599850.0            NaN       NaN
CA    Los Angeles   588750.0  233000.000000  2.526824
IL    Chicago       207600.0  156933.333333  1.322855
PA    Philadelphia  129950.0   55333.333333  2.348494
AZ    Phoenix       197800.0  119600.000000  1.653846
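This order works because the ratio a is computed while the column labels are still Period objects, and a is indexed by the row MultiIndex, not by the columns. Renaming the columns afterwards leaves the row index unchanged, so assigning a to 'P Ratio' still aligns row by row. A minimal sketch using two rows from the output above:

```python
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [("CA", "Los Angeles"), ("IL", "Chicago")], names=["State", "RegionName"]
)
df = pd.DataFrame(
    [[588750.0, 233000.0], [207600.0, 156933.333333]],
    index=index,
    columns=pd.PeriodIndex(["2016Q3", "2001Q1"], freq="Q"),
)

# Compute while the column labels are still Periods
a = df[pd.Period("2016Q3", freq="Q")].div(df[pd.Period("2001Q1", freq="Q")])

# Relabel the columns; the row index that `a` is aligned on is unchanged
df.columns = df.columns.strftime("%YQ%q")
df["P Ratio"] = a
print(df)
```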
