Python Pandas: Appending values to a series using iterrows() and pd.Series


python pandas


My input data looks like this:

   cat  start               target
0   1   2016-09-01 00:00:00 4.370279
1   1   2016-09-01 00:00:00 1.367778
2   1   2016-09-01 00:00:00 0.385834


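I want to build a series that uses 'start' as the start date and 'target' as the series values. iterrows() is pulling the correct values for 'imp', but when they are appended to the time series, only the first value is carried through to every point in the series. Why does 'data=imp' pull row 0 every time?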

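import pandas as pd

t0 = model_input_test['start'][0]      # t0 = 2016-09-01 00:00:00
num_ts = len(model_input_test.index)   # num_ts = 1348
time_series = []
for i, row in model_input_test.iterrows():
    imp = row.loc['target']
    print(imp)
    # pd.date_range replaces pd.DatetimeIndex(start=t0, freq='H', periods=num_ts),
    # a constructor form that newer pandas versions no longer accept
    index = pd.date_range(start=t0, freq='H', periods=num_ts)
    time_series.append(pd.Series(data=imp, index=index))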
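To see what the loop above actually builds, here is a minimal sketch that rebuilds a three-row `model_input_test` from the sample data (the frame is an assumption for illustration; the real one has 1348 rows). Because `imp` is a single float, `pd.Series(data=imp, index=index)` broadcasts that one value to every timestamp, so the list ends up holding one constant Series per row:

```python
import pandas as pd

# Hypothetical frame rebuilt from the three sample rows in the question
model_input_test = pd.DataFrame({
    "cat": [1, 1, 1],
    "start": pd.to_datetime(["2016-09-01 00:00:00"] * 3),
    "target": [4.370279, 1.367778, 0.385834],
})

t0 = model_input_test["start"][0]
num_ts = len(model_input_test.index)
index = pd.date_range(start=t0, freq=pd.offsets.Hour(), periods=num_ts)

time_series = []
for _, row in model_input_test.iterrows():
    imp = row.loc["target"]  # a single float for this row
    # A scalar passed as data is broadcast to every label in index,
    # so each appended Series repeats one value num_ts times
    time_series.append(pd.Series(data=imp, index=index))

print(type(time_series))        # a plain Python list, not a Series
print(time_series[0].tolist())  # the first row's value at every timestamp
```

Looking at `time_series[0]` therefore shows 4.370279 at every timestamp, which matches the output described in the question.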

The series `time_series` should look like this:

2016-09-01 00:00:00    4.370279
2016-09-01 01:00:00    1.367778
2016-09-01 02:00:00    0.385834


But it ends up looking like this:

2016-09-01 00:00:00    4.370279
2016-09-01 01:00:00    4.370279
2016-09-01 02:00:00    4.370279


I am running this in Jupyter (conda_python3) on SageMaker.

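When working with dataframes, there is usually a better way to accomplish a task than iterating over the dataframe. For example, in your case you can create the series like this: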

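import pandas as pd
# Hourly DatetimeIndex starting at the first 'start' value, one timestamp per row
time_series = (df.set_index(pd.date_range(pd.to_datetime(df.start).iloc[0],
                                          periods=len(df), freq='H')))['target']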


>>> time_series
2016-09-01 00:00:00    4.370279
2016-09-01 01:00:00    1.367778
2016-09-01 02:00:00    0.385834
Freq: H, Name: target, dtype: float64
>>> type(time_series)
<class 'pandas.core.series.Series'>


Essentially, this says: "set the index to a date range that starts at the first date and increments hourly, then take the `target` column".
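A self-contained version of this approach, assuming a small `df` rebuilt from the question's three sample rows. It passes `pd.offsets.Hour()` instead of the `'H'` string alias, which pandas 2.2 and later flag as deprecated in favour of `'h'`; either way the result is an hourly index with one timestamp per row:

```python
import pandas as pd

# Sample frame rebuilt from the rows in the question (assumed, for illustration)
df = pd.DataFrame({
    "cat": [1, 1, 1],
    "start": ["2016-09-01 00:00:00"] * 3,
    "target": [4.370279, 1.367778, 0.385834],
})

# One hourly timestamp per row, starting at the first 'start' value
hourly = pd.date_range(pd.to_datetime(df["start"]).iloc[0],
                       periods=len(df), freq=pd.offsets.Hour())

# Replace the row labels with the hourly index, then keep only 'target'
time_series = df.set_index(hourly)["target"]
print(time_series)
```

No loop is needed: the values come straight from the `target` column, and only the index is generated.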

Given a dataframe `df` with the columns `start` and `target`, you can simply use `set_index`:

time_series = df.set_index('start')['target']
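A quick check of this one-liner against the question's sample rows (rebuilt here as an assumption) suggests it fits data where each row already carries its own timestamp. In the sample, every row has the same `start`, so the resulting index repeats a single timestamp instead of stepping hourly:

```python
import pandas as pd

# Sample rows from the question (assumed); all share one start time
df = pd.DataFrame({
    "cat": [1, 1, 1],
    "start": pd.to_datetime(["2016-09-01 00:00:00"] * 3),
    "target": [4.370279, 1.367778, 0.385834],
})

time_series = df.set_index("start")["target"]

# The existing 'start' values become the index as-is, duplicates included
print(time_series.index.is_unique)
print(time_series.index.nunique())
```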

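You are looping over the rows and creating the DatetimeIndex (`index`) inside the loop, which seems to be a problem. Note that `time_series` is not a `pd.Series` but a list of `pd.Series` instances. Edit: do you need to iterate over the rows at all? Have you considered something like `pd.Series(data=model_input_test['target'], index=index)`?

Thanks Sacul, this is a very efficient solution!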
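One caveat about the `pd.Series(data=model_input_test['target'], index=index)` suggestion, assuming the frame keeps its default 0..n-1 row labels as in the sample: when `data` is itself a Series, pandas aligns it to `index` by label rather than by position, so the integer labels match none of the timestamps and every value comes out as NaN. Passing the underlying array (here via `.to_numpy()`) matches values by position instead. A sketch with a hypothetical three-row frame:

```python
import pandas as pd

# Hypothetical frame rebuilt from the question's sample rows
model_input_test = pd.DataFrame({
    "cat": [1, 1, 1],
    "start": pd.to_datetime(["2016-09-01 00:00:00"] * 3),
    "target": [4.370279, 1.367778, 0.385834],
})
index = pd.date_range(start=model_input_test["start"][0],
                      periods=len(model_input_test), freq=pd.offsets.Hour())

# Series input is aligned by label: 0, 1, 2 never match a timestamp, so all NaN
aligned = pd.Series(data=model_input_test["target"], index=index)

# A bare NumPy array has no labels and is matched by position instead
positional = pd.Series(data=model_input_test["target"].to_numpy(), index=index)

print(aligned.isna().all())
print(positional.tolist())
```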