Python: How to fill DataFrame columns in a for loop
I'm trying to fill pandas DataFrame columns inside a for loop. The column names are parameterized and determined by the loop value. Here is my code:
for k in range(-1, -4, -1):
    df_orj = pd.read_csv('something.csv', sep='\t')
    df_train = df_orj.head(11900)
    df_test = df_orj.tail(720)
    SHIFT = k
    df_train.trend = df_train.trend.shift(SHIFT)
    df_train = df_train.dropna()
    df_test.trend = df_test.trend.shift(SHIFT)
    df_test = df_test.dropna()
    drop_list = some_list
    df_out = df_test[['date', 'price']]
    df_out.index = np.arange(0, len(df_out))  # start index from 0
    df_out["pred-1"] = np.nan
    df_out["pred-2"] = np.nan
    df_out["pred-3"] = np.nan
    df_train.drop(drop_list, 1, inplace=True)
    df_test.drop(drop_list, 1, inplace=True)
    # some processes here
    rf = RandomForestClassifier(n_estimators=10)
    rf.fit(X_train, y_train)
    y_pred = rf.predict(X_test)
    print("accuracy score: ", rf.score(X_test, y_test))
    X_test2 = sc.transform(df_test.drop('trend', axis=1))
    y_test2 = df_test['trend'].values
    y_pred2 = rf.predict(X_test2)
    print("accuracy score: ", rf.score(X_test2, y_test2))
    name = "pred{0}".format(k)
    for i in range(0, y_test2.size):
        df_out[name][i] = y_pred2[i]
df_out.head(20)
This is my output:
     time_period_start             price_open  pred-1  pred-2  pred-3
697  2018-10-02T02:00:00.0000000Z  86.80       NaN     NaN     1.0
698  2018-10-02T03:00:00.0000000Z  86.65       NaN     NaN     1.0
699  2018-10-02T04:00:00.0000000Z  86.32       NaN     NaN     1.0
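As you can see, only pred-3 is filled. How can I fill all three predefined columns?

If I understand correctly, your problem is that only pred-3 gets filled, while the other two columns stay empty at the same positions. That happens because df_out is created inside the loop, so what you see is the result of the last iteration only. Define it outside the loop so that the values for the other two columns are not lost.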
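The point above can be sketched with made-up data. Everything here is an assumption for illustration: the tiny `base` frame stands in for `df_test[['date', 'price']]`, and `fake_predict` is a hypothetical helper standing in for the real model's `predict` call.

```python
import numpy as np
import pandas as pd


def fake_predict(k, n):
    """Hypothetical stand-in for rf.predict: n labels that depend on k."""
    return np.full(n, abs(k), dtype=float)


# Made-up data standing in for df_test[['date', 'price']]
base = pd.DataFrame({
    "date": ["2018-10-02T02:00", "2018-10-02T03:00", "2018-10-02T04:00"],
    "price": [86.80, 86.65, 86.32],
})

# Frame created inside the loop: it is rebuilt on every pass,
# so only the column from the last iteration survives.
for k in range(-1, -4, -1):
    inside = base.copy()
    inside["pred{0}".format(k)] = fake_predict(k, len(inside))

# Frame created once before the loop: each iteration adds its own column.
outside = base.copy()
for k in range(-1, -4, -1):
    outside["pred{0}".format(k)] = fake_predict(k, len(outside))

print(list(inside.columns))
print(list(outside.columns))
```

Only the second frame ends up with all three prediction columns, which is the behavior the question is after.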
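
You set these three columns to NaN on every iteration, so the values written in earlier iterations are lost. Either move those initialization lines before the loop, or initialize only the column that belongs to the current iteration. Replace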
df_out["pred-1"] = np.nan
df_out["pred-2"] = np.nan
df_out["pred-3"] = np.nan
with a single-column initialization inside the loop:
name = "pred{0}".format(k)
df_out[name] = np.nan
So the full code becomes:
df_orj = pd.read_csv('something.csv', sep='\t')

# Build the output frame once, before the loop, so that columns
# filled in earlier iterations are kept.
df_out = df_orj.tail(720)[['date', 'price']].reset_index(drop=True)  # index starts at 0

for k in range(-1, -4, -1):
    # .copy() so the slices can be modified without SettingWithCopyWarning
    df_train = df_orj.head(11900).copy()
    df_test = df_orj.tail(720).copy()
    SHIFT = k
    df_train['trend'] = df_train['trend'].shift(SHIFT)
    df_train = df_train.dropna()
    df_test['trend'] = df_test['trend'].shift(SHIFT)
    df_test = df_test.dropna()
    drop_list = some_list
    df_train.drop(drop_list, axis=1, inplace=True)
    df_test.drop(drop_list, axis=1, inplace=True)
    # some processes here
    rf = RandomForestClassifier(n_estimators=10)
    rf.fit(X_train, y_train)
    y_pred = rf.predict(X_test)
    print("accuracy score: ", rf.score(X_test, y_test))
    X_test2 = sc.transform(df_test.drop('trend', axis=1))
    y_test2 = df_test['trend'].values
    y_pred2 = rf.predict(X_test2)
    print("accuracy score: ", rf.score(X_test2, y_test2))
    # Initialize only the column for this iteration
    name = "pred{0}".format(k)
    df_out[name] = np.nan
    # Assumes the only NaNs are the trailing ones created by the shift,
    # so the predictions line up with the first len(y_pred2) rows.
    df_out.loc[:len(y_pred2) - 1, name] = y_pred2
df_out.head(20)
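
For reference, here is a small self-contained sketch of the same pattern that runs without the CSV file or the model. The data is made up, and the "prediction" simply echoes the shifted labels as a stand-in for `rf.predict(X_test2)`; both are assumptions for illustration only. It shows that each negative shift leaves `|k|` trailing rows without a label, and that the shorter prediction array can be written into the output frame in one step rather than element by element.

```python
import numpy as np
import pandas as pd

# Made-up test set; 'trend' is the label column, as in the question.
df_test_full = pd.DataFrame({
    "date": ["d0", "d1", "d2", "d3", "d4", "d5"],
    "price": [10.0, 11.0, 10.5, 10.8, 11.2, 11.1],
    "trend": [1.0, 0.0, 1.0, 1.0, 0.0, 1.0],
})

# Output frame built once, with a clean 0..n-1 index.
df_out = df_test_full[["date", "price"]].reset_index(drop=True)

for k in range(-1, -4, -1):
    df_test = df_test_full.copy()
    # A negative shift moves labels up, leaving NaN in the last |k| rows.
    df_test["trend"] = df_test["trend"].shift(k)
    df_test = df_test.dropna()

    # Stand-in for rf.predict(X_test2): echoes the shifted labels.
    y_pred2 = df_test["trend"].to_numpy()

    name = "pred{0}".format(k)
    df_out[name] = np.nan
    # One vectorized write; .loc slicing is label-based and
    # includes the end label, hence the "- 1".
    df_out.loc[: len(y_pred2) - 1, name] = y_pred2

print(df_out)
```

Rows that have no label after the shift keep NaN in the corresponding column, so the three columns have one, two, and three trailing NaN values respectively.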

You're initializing those columns to null inside the for loop. Move df_out["pred-1"] = np.nan to before the for loop.
@chitown88 Oh, silly me. Because I re-initialized the columns, I lost the data in the first two. Could you post the correct code as an answer so I can accept it?
Yes. No worries. Easy brain fart... happens all the time. I think we've all been there at one point or another. I can guarantee I'll make the same mistake again in the future.
I moved the columns out of the loop. Thanks.
Thanks, your answer is the same as @chitown88's.