Python: arranging data in a DataFrame by date

Given this data in a table:

ID  Date     Highlight
1   201501   B
2   201506   C
1   201507   A
3   201508   D
2   201509   A
3   201510   B
3   201501   B
Desired output (in a DataFrame): for each ID, I need the sequence of highlights in the order in which they occurred:

ID     Highlight Sequence
1      B, A
2      C, A
3      D, B, B
Essentially, I intend to train a variable-length-input RNN that predicts the next character in the sequence for each ID.
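To reproduce the example, the sample table can be rebuilt as a DataFrame. This is a minimal sketch, assuming `Date` holds `YYYYMM` values stored as integers, as the table suggests. Integers in that format already sort chronologically, so parsing them with `pd.to_datetime` is optional; it just makes the date semantics explicit.

```python
import pandas as pd

# Sample data from the question; Date is assumed to be YYYYMM stored as int
df = pd.DataFrame({
    'ID':        [1, 2, 1, 3, 2, 3, 3],
    'Date':      [201501, 201506, 201507, 201508, 201509, 201510, 201501],
    'Highlight': ['B', 'C', 'A', 'D', 'A', 'B', 'B'],
})

# Optional: parse YYYYMM into timestamps (first day of each month)
df['Date'] = pd.to_datetime(df['Date'].astype(str), format='%Y%m')
print(df)
```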

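I think you first need to sort by the `Date` column with `sort_values`, and then use `groupby` with the parameter `sort=False`, because the default sorting of the group keys is not needed:

`list` for a column of lists: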

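df = df.sort_values('Date')                        # chronological order first
df1 = (df.groupby('ID', sort=False)['Highlight']  # keep date order within each ID
         .apply(list)                              # one list per ID
         .reset_index(name='Highlight Sequence')
         .sort_values('ID'))                       # order the result rows by ID

print(df1)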
   ID Highlight Sequence
0   1             [B, A]
2   2             [C, A]
1   3          [B, D, B]
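A self-contained version of the sort-then-group approach, rebuilding the sample data from the question, might look like the sketch below. It assumes integer `YYYYMM` dates. The `kind='mergesort'` argument is an addition not used in the answer above: it requests a stable sort, so rows that share a date keep their original relative order.

```python
import pandas as pd

df = pd.DataFrame({
    'ID':        [1, 2, 1, 3, 2, 3, 3],
    'Date':      [201501, 201506, 201507, 201508, 201509, 201510, 201501],
    'Highlight': ['B', 'C', 'A', 'D', 'A', 'B', 'B'],
})

# Stable sort by date, then collect each ID's highlights in that order.
# groupby keeps the row order inside each group; sort=False additionally
# keeps the groups in order of first appearance instead of sorting the keys.
df1 = (df.sort_values('Date', kind='mergesort')
         .groupby('ID', sort=False)['Highlight']
         .apply(list)
         .reset_index(name='Highlight Sequence')
         .sort_values('ID'))  # present the result ordered by ID
print(df1)
```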
`join` for a column of strings:

df2 = (df.groupby('ID', sort=False)['Highlight']
         .apply(','.join)                          # one comma-joined string per ID
         .reset_index(name='Highlight Sequence'))

print(df2)

   ID Highlight Sequence
0   1                B,A
1   2                C,A
2   3              B,D,B
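The question's desired output separates items with a comma and a space (`B, A`). A sketch that produces exactly that string format, assuming the sample data from the question, changes the separator to `', '` and uses `agg` in place of `apply` (both work for a function that reduces each group to one value). Here the default `sort=True` is kept, so the result comes out ordered by ID without a final `sort_values`.

```python
import pandas as pd

df = pd.DataFrame({
    'ID':        [1, 2, 1, 3, 2, 3, 3],
    'Date':      [201501, 201506, 201507, 201508, 201509, 201510, 201501],
    'Highlight': ['B', 'C', 'A', 'D', 'A', 'B', 'B'],
})

# Join each ID's highlights (in date order) with ", " to match the
# "B, A" style shown in the question's desired output
df2 = (df.sort_values('Date', kind='mergesort')
         .groupby('ID')['Highlight']   # default sort=True orders the IDs
         .agg(', '.join)
         .reset_index(name='Highlight Sequence'))
print(df2)
```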
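But if you need the order given by row position (the `Date` column is already sorted, or the date order does not matter), skip the `sort_values` step: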

df2 = (df.groupby('ID', sort=False)['Highlight']  # df in its original row order
         .apply(list)
         .reset_index(name='Highlight Sequence'))

print(df2)
   ID Highlight Sequence
0   1             [B, A]
1   2             [C, A]
2   3          [D, B, B]
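Since the goal is a variable-length RNN that predicts the next highlight for each ID, the list column can be expanded into (prefix, next item) training pairs. This is a minimal sketch that is not tied to any deep-learning library; the integer encoding, the hypothetical `make_pairs` helper and left zero-padding are illustrative choices, with 0 reserved as the padding value.

```python
import numpy as np

# Per-ID sequences, e.g. the 'Highlight Sequence' column above
sequences = [['B', 'A'], ['C', 'A'], ['D', 'B', 'B']]

# Map each symbol to an integer id; 0 is reserved for padding
symbols = sorted({c for s in sequences for c in s})
vocab = {ch: i + 1 for i, ch in enumerate(symbols)}

def make_pairs(seqs, vocab):
    """Hypothetical helper: every prefix of length >= 1 predicts the next symbol."""
    inputs, targets = [], []
    for seq in seqs:
        ids = [vocab[c] for c in seq]
        for end in range(1, len(ids)):
            inputs.append(ids[:end])
            targets.append(ids[end])
    return inputs, targets

inputs, targets = make_pairs(sequences, vocab)

# Left-pad the variable-length prefixes into one 2-D integer array
max_len = max(len(x) for x in inputs)
X = np.zeros((len(inputs), max_len), dtype=np.int64)
for row, x in enumerate(inputs):
    X[row, max_len - len(x):] = x
y = np.array(targets, dtype=np.int64)
print(X, y)
```

Left padding keeps the most recent item in the last time step; many RNN layers can instead consume unpadded or masked sequences, so the padding scheme is a design choice rather than a requirement.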