Python: arranging columns in a DataFrame for OLS regression


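I have a csv file with the following columns:

Date | Mkt-RF | SMB | HML | RF | C | aig-RF | ford-RF | ibm-RF | xom-RF |

I am trying to run a multiple OLS regression in Python, for example regressing 'aig-RF' on 'Mkt-RF', 'SMB' and 'HML'.

It seems I first need to assemble a DataFrame from the arrays, but I can't figure out how:

# Regression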

x = df[['Mkt-RF','SMB','HML']]
y = df['aig-RF']
df = pd.DataFrame({'x':x, 'y':y})
df['constant'] = 1
df.head()
sm.OLS(y,df[['constant','x']]).fit().summary()
The full code is:

import numpy as np
import pandas as pd
from pandas import DataFrame
from sklearn import linear_model
import statsmodels.api as sm

def ReadFF(sIn):
    """
    Purpose:
        Read the FF data

    Inputs:
        sIn     string, name of input file

    Return value:
        df      dataframe, data
    """
    df= pd.read_csv(sIn, header=3, names=["Date","Mkt-RF","SMB","HML","RF"])
    df= df.dropna(how='any')

    # Reformat the dates, as date-time, and place them as index
    vDate= pd.to_datetime(df["Date"].values,format='%Y%m%d')
    df.index= vDate

    # Add in a constant
    iN= len(vDate)
    df["C"]= np.ones(iN)

    print(df)

    return df

def JoinStock(df, sStock, sPer):
    """
    Purpose:
        Join the stock into the dataframe, as excess returns

    Inputs:
        df      dataframe, data including RF
        sStock  string, name of stock to read
        sPer    string, extension indicating period

    Return value:
        df      dataframe, enlarged
    """
    df1= pd.read_csv(sStock+"_"+sPer+".csv", index_col="Date", usecols=["Date", "Adj Close"])
    df1.columns= [sStock]

    # Add prices to original dataframe, to get correct dates
    df= df.join(df1, how="left")

    # Extract returns
    vR= 100*np.diff(np.log(df[sStock].values))
    # Add a missing, as one observation was lost differencing
    vR= np.hstack([np.nan, vR])

    # Add excess return to dataframe
    df[sStock + "-RF"]= vR - df["RF"]
    print(df)

    return df

def SaveFF(df, asStock, sOut):
    """
    Purpose:
        Save data for FF regression

    Inputs:
        df      dataframe, all data
        asStock list of strings, stocks
        sOut    string, output file name

    Output:
        file written to disk
    """
    df= df.dropna(how='any')

    asOut= ['Mkt-RF', 'SMB', 'HML', 'RF', 'C']
    for sStock in asStock:
        asOut.append(sStock+"-RF")

    print("Writing columns ", asOut, "to file ", sOut)

    df.to_csv(sOut, columns=asOut, index_label="Date", float_format="%.8g")

    print(df)
    return df

def main():

    sPer= "0018"
    sIn= "Research_Data_Factors_weekly.csv"
    sOut= "ffstocks"
    asStock= ["aig", "ford", "ibm", "xom"]

    # Initialisation
    df= ReadFF(sIn)
    for sStock in asStock:
        df= JoinStock(df, sStock, sPer)

    # Output
    SaveFF(df, asStock, sOut+"_"+sPer+".csv")
    print("Done")

    # Regression
    x = df[['Mkt-RF','SMB','HML']]
    y = df['aig-RF']
    df = pd.DataFrame({'x':x, 'y':y})
    df['constant'] = 1
    df.head()
    sm.OLS(y,df[['constant','x']]).fit().summary()

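What do I need to change in the pd.DataFrame call to get the multiple OLS regression table?

I suggest changing your first block of code to the following, which mainly swaps the order of the lines: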

# add constant column to the original dataframe
df['constant'] = 1

# define x as a subset of original dataframe
x = df[['Mkt-RF', 'SMB', 'HML', 'constant']]

# define y as a series
y = df['aig-RF']

# pass x as a dataframe, while pass y as a series
sm.OLS(y, x).fit().summary()

Hope this helps.

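What is the specific problem with your existing code? Reading through it, I notice you are trying to assign multiple columns ['Mkt-RF', 'SMB', 'HML'] to the name 'x'. You should be able to pass those columns directly into the multiple linear regressor without renaming them.

There are several ways to do a multiple OLS regression, but I am following the example at [link], so I am still confused about how to pass the columns directly. It seems everything has to be done in df = pd.DataFrame({,}), but I don't know how.

I have changed it to df = DataFrame(y, x), but the problem is that in sm.OLS(y, df[['constant','x']]).fit().summary() I get KeyError: ['x'] not in index.

I tried appending a column of 1s to the x DataFrame.

I changed the last line to results = sm.OLS(y, x).fit() and print(results.summary()), but I still get output containing nan.

@user304663 OK, but at least you are facing a new problem, which is progress. I know nothing about your data, so I don't think I can effectively help any further. Have you confirmed that all your x variables are numeric?

Thanks for the code. Either way, the problem is with my data!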
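The checks suggested in the comments can be sketched as follows. This is a guess at the cause, not a confirmed diagnosis: in the posted main, the return value of SaveFF (which drops missing rows) is not assigned back to df, and JoinStock puts a NaN in the first return, so the regression data may still contain missing values. The small DataFrame below is invented purely for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data with the two problems mentioned in the comments:
# a column read in as strings, and NaN values left in the frame
df = pd.DataFrame({
    "Mkt-RF": [0.5, -0.2, 0.1, 0.3],
    "SMB": ["0.1", "0.0", "n/a", "0.2"],   # non-numeric column
    "HML": [0.2, np.nan, 0.1, -0.1],
    "aig-RF": [np.nan, 0.4, -0.3, 0.6],    # first return lost to differencing
})
cols = ["Mkt-RF", "SMB", "HML", "aig-RF"]

# Inspect column types and count missing values per column
print(df[cols].dtypes)
print(df[cols].isna().sum())

# Coerce everything to numbers (unparseable entries become NaN),
# then keep only rows that are complete in every column
clean = df[cols].apply(pd.to_numeric, errors="coerce").dropna(how="any")
print(clean)
```

The regression would then be run on `clean` rather than on the original frame. statsmodels also accepts `missing="drop"` in `sm.OLS`, which discards rows with missing values at fit time, though it does not help with columns that were read as strings.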