Python: select rows based on the last occurrence of a string


python pandas


I have a pandas DataFrame that looks like this:

id   desc
1    Description
1    02.09.2017 15:00 abcd
1    this is a sample description
1    which is continued here also
1    
1    Description
1    01.09.2017 12:00 absd
1    this is another sample description
1    which might be continued here
1    or here
1
2    Description
2    09.03.2017 12:00 abcd
2    another sample again
2    and again
2
2    Description
2    08.03.2017 12:00 abcd
2    another sample again
2    and again times two

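Basically, there is an id, and the rows contain information in a very unstructured format. For each id, I want to extract the description that follows the last "Description" row (the text after its date line) and store it in a single row. The resulting DataFrame should look like this: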

id  desc
1   this is another sample description which might be continued here or here
2   another sample again and again times two

I think I probably have to use groupby, but I don't know what to do after that.

Extract the position of the last Description row, then join the rows after it (skipping the date line) with str.cat:

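In [2840]: def lastjoin(x):
      ...:     # cumsum() peaks at the last 'Description'; idxmax() returns that row's label
      ...:     pos = x.desc.eq('Description').cumsum().idxmax()
      ...:     # skip the marker and the date line, then join the remaining rows
      ...:     return x.desc.loc[pos+2:].str.cat(sep=' ')
      ...: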

In [2841]: df.groupby('id').apply(lastjoin)
Out[2841]:
id
1    this is another sample description which might...
2            another sample again and again times two
dtype: object
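The key step is cumsum().idxmax(). The toy Series below (made-up values, not from the question) is a minimal sketch of why it lands on the last marker rather than the first:

```python
import pandas as pd

# Toy column with two 'Description' markers, at labels 0 and 2.
s = pd.Series(['Description', 'x', 'Description', 'y', 'z'])

# eq() flags marker rows; cumsum() counts the markers seen so far.
counts = s.eq('Description').cumsum()
print(counts.tolist())  # [1, 1, 2, 2, 2]

# The count first reaches its maximum at the last marker, and idxmax()
# returns the first label holding the maximum, i.e. the last marker.
pos = counts.idxmax()
print(pos)  # 2
```

One caveat: if a group contains no Description row at all, counts is all zeros and idxmax() returns the group's first label, so pos+2 would silently drop the first two rows instead of failing.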

To turn id back into a regular column (keeping both column headers), use reset_index:

In [3216]: df.groupby('id').apply(lastjoin).reset_index(name='desc')
Out[3216]:
   id                                               desc
0   1  this is another sample description which might...
1   2          another sample again and again times two
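For reference, here is a self-contained sketch of the same approach that rebuilds the sample frame from the question. It assumes the blank rows are stored as empty strings (the question doesn't say; if they are NaN, str.cat already skips them) and filters them out so the joined text has no trailing spaces. It also applies lastjoin to the desc column only, so the grouping column is not passed into the function; the names rows and out are illustrative.

```python
import pandas as pd

# Sample data from the question; blank rows are assumed to be ''.
rows = [
    (1, 'Description'), (1, '02.09.2017 15:00 abcd'),
    (1, 'this is a sample description'), (1, 'which is continued here also'),
    (1, ''),
    (1, 'Description'), (1, '01.09.2017 12:00 absd'),
    (1, 'this is another sample description'),
    (1, 'which might be continued here'), (1, 'or here'), (1, ''),
    (2, 'Description'), (2, '09.03.2017 12:00 abcd'),
    (2, 'another sample again'), (2, 'and again'), (2, ''),
    (2, 'Description'), (2, '08.03.2017 12:00 abcd'),
    (2, 'another sample again'), (2, 'and again times two'),
]
df = pd.DataFrame(rows, columns=['id', 'desc'])


def lastjoin(desc: pd.Series) -> str:
    # Label of the last 'Description' row within this id's group.
    pos = desc.eq('Description').cumsum().idxmax()
    # Skip the marker row and the date line that follows it.
    tail = desc.loc[pos + 2:]
    # Drop blank rows so they don't add stray spaces to the result.
    tail = tail[tail.str.strip() != '']
    return tail.str.cat(sep=' ')


# One joined string per id; reset_index turns id back into a column.
out = df.groupby('id')['desc'].apply(lastjoin).reset_index(name='desc')
print(out)
```

Selecting ['desc'] before apply means each call receives a plain Series, which also avoids the warning recent pandas versions emit when DataFrameGroupBy.apply operates on the grouping column.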

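We can't work out the logic for joining the text from the output alone. Please specify how the text should be joined.

@Zero Fixed it. I had made a mistake in the output rows.

@Bharathshetty I want to extract all the text after the last Description marker for a given id. In my case the text is badly formatted and split across several rows, so I need to put it into a single row. I hope this answers your question.

Thanks! This is what I was looking for, but there is one small problem: I'd like to keep both column headers, but after doing this they are gone.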