Python: select rows based on the last occurrence of a string
I have a pandas DataFrame that looks like this:
id desc
1 Description
1 02.09.2017 15:00 abcd
1 this is a sample description
1 which is continued here also
1
1 Description
1 01.09.2017 12:00 absd
1 this is another sample description
1 which might be continued here
1 or here
1
2 Description
2 09.03.2017 12:00 abcd
2 another sample again
2 and again
2
2 Description
2 08.03.2017 12:00 abcd
2 another sample again
2 and again times two
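Basically, there is an id, and the rows hold information in a very unstructured format. For each id, I want to extract the description text that follows the last "Description" row and store it in a single row. The resulting DataFrame would look like this: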
id desc
1 this is another sample description which might be continued here or here
2 another sample again and again times two
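My guess is that I need to use groupby, but I don't know what to do after that.

Find the position of the last Description row within each group, then use str.cat to join the rows that follow it: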
In [2840]: def lastjoin(x):
      ...:     pos = x.desc.eq('Description').cumsum().idxmax()
      ...:     return x.desc.loc[pos+2:].str.cat(sep=' ')
      ...:
In [2841]: df.groupby('id').apply(lastjoin)
Out[2841]:
id
1 this is another sample description which might...
2 another sample again and again times two
dtype: object
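To make the `cumsum().idxmax()` step easier to follow, here is a minimal sketch on a single toy Series. The sample values are made up for illustration, and the Series is assumed to have the default integer index, which is also what `pos+2` in the answer relies on.

```python
import pandas as pd

# Toy data (hypothetical): two "Description" blocks, each followed by a date row.
s = pd.Series([
    "Description", "date a", "text a",
    "Description", "date b", "text b", "text c",
])

# Running count of marker rows: 1, 1, 1, 2, 2, 2, 2
running = s.eq("Description").cumsum()

# idxmax returns the FIRST label where the maximum is reached,
# which is the label of the last "Description" row.
pos = running.idxmax()

# Skip the marker row and the date row, then join what remains.
joined = s.loc[pos + 2:].str.cat(sep=" ")

print(pos)
print(joined)
```

Because `.loc` slices by label, this only lines up with row positions when the index is a plain 0, 1, 2, ... range.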
To turn the index back into a column, use reset_index:
In [3216]: df.groupby('id').apply(lastjoin).reset_index(name='desc')
Out[3216]:
id desc
0 1 this is another sample description which might...
1 2 another sample again and again times two
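For anyone who wants to reproduce this end to end, the following is a self-contained sketch rather than the answer's exact code. It rebuilds the sample data, assumes the blank rows are empty strings (the question does not say whether they are empty strings or NaN), and uses a hypothetical helper `last_description` that works by position so it does not depend on the index labels. Both column headers, `id` and `desc`, are kept in the result.

```python
import pandas as pd

# Sample data from the question; blank rows are ASSUMED to be empty strings.
rows = [
    (1, "Description"), (1, "02.09.2017 15:00 abcd"),
    (1, "this is a sample description"), (1, "which is continued here also"),
    (1, ""),
    (1, "Description"), (1, "01.09.2017 12:00 absd"),
    (1, "this is another sample description"),
    (1, "which might be continued here"), (1, "or here"),
    (1, ""),
    (2, "Description"), (2, "09.03.2017 12:00 abcd"),
    (2, "another sample again"), (2, "and again"),
    (2, ""),
    (2, "Description"), (2, "08.03.2017 12:00 abcd"),
    (2, "another sample again"), (2, "and again times two"),
]
df = pd.DataFrame(rows, columns=["id", "desc"])


def last_description(desc: pd.Series) -> str:
    """Join the text rows that follow the last 'Description' marker (hypothetical helper)."""
    values = desc.tolist()
    markers = [i for i, v in enumerate(values) if v == "Description"]
    if not markers:
        return ""  # no marker in this group
    # Skip the marker row and the date row right after it; drop blank or missing rows.
    tail = values[markers[-1] + 2:]
    return " ".join(v.strip() for v in tail if isinstance(v, str) and v.strip())


# Selecting the 'desc' column keeps its name, so reset_index() yields columns id and desc.
result = df.groupby("id")["desc"].apply(last_description).reset_index()
print(result)
```

Filtering out blank rows before joining avoids stray spaces in the output when a block ends with an empty row.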
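We can't work out the logic for joining the text from the output alone. Please specify the logic for the join.
@Zero Fixed it. I had made a mistake in the output rows. @Bharathshetty I want to extract all the text after the last Description marker for a particular id. In my case the text is badly formatted and split across different rows, so I need to put it on one line. I hope this answers your question.
Thanks! This is what I was looking for, but there is one small problem. I want to keep both column headers, but they are gone after doing this.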