Python Pandas: split and strip a string


python regex pandas


I have a pandas column containing strings like this:

(15:38) Hello, how are you? (15:39) I am fine. (15:40) That's good.

I want to split the string on the timestamps, so I used the regular expression:

r'\(\d{1,2}:\d{1,2}\)'

I only want to keep everything from the third timestamp to the end, so the desired output looks like this:

(15:40) That's good.

If there are fewer than three timestamps, the row should be left empty.

You can use

(?:(?:\(\d+:\d+\))[^\(]+){2,}(\(\d+:\d+\).*$)

together with str.extract to pull out the last match of the pattern.


This will not work if the dialogue itself contains parentheses.
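To make the pattern above easier to read, here is a minimal sketch that spells it out piece by piece with re.VERBOSE and runs it with the plain re module. The helper name last_chunk and the four-timestamp string are made up for illustration; the other two strings come from the thread.

```python
import re

# Annotated copy of the answer's pattern; re.VERBOSE ignores the layout and comments.
PATTERN = re.compile(r"""
    (?:                  # one timestamp-plus-dialogue chunk ...
        (?:\(\d+:\d+\))  #   a timestamp such as (15:38)
        [^\(]+           #   dialogue: everything up to the next opening parenthesis
    ){2,}                # ... repeated at least twice, greedily
    (\(\d+:\d+\).*$)     # group 1: one more timestamp and the rest of the string
""", re.VERBOSE)


def last_chunk(text):
    """Return group 1 of the pattern, or None when the pattern does not match."""
    match = PATTERN.search(text)
    return match.group(1) if match else None


three = "(15:38) Hello, how are you? (15:39) I am fine. (15:40) That's good."
two = "(4:30) Foo (5:18) Bar"
four = "(1:00) a (1:01) b (1:02) c (1:03) d"  # hypothetical extra case

print(last_chunk(three))  # (15:40) That's good.
print(last_chunk(two))    # None
print(last_chunk(four))   # (1:03) d
```

Because {2,} is greedy, a string with four timestamps yields only the last one, which is why the answer describes this as extracting the last match.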

Sample DataFrame

                                                text
0  (15:38) Hello, how are you? (15:39) I am fine....

Extract

df.text.str.extract(r'(?:(?:\(\d+:\d+\))[^\(]+){2,}(\(\d+:\d+\).*$)', expand=False)

0    (15:40) That's good.
Name: text, dtype: object

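Currently, if there are fewer than three separate pieces of dialogue, the result is filled with NaN, but you can use fillna to replace it with an empty string if you prefer.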
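As a self-contained illustration of the NaN and fillna behaviour, the sketch below builds a small DataFrame from strings that appear in full in this thread. The variable names are assumptions. The explicit expand=False is also an assumption: it is there so that the single capture group comes back as a Series on pandas versions where str.extract returns a DataFrame by default.

```python
import pandas as pd

# Sample rows reused from the thread; only the first has three timestamps.
df = pd.DataFrame({
    "text": [
        "(15:38) Hello, how are you? (15:39) I am fine. (15:40) That's good.",
        "(3:30) Test 2 (5:45) Last text",
        "(4:30) Foo (5:18) Bar",
    ]
})

pattern = r"(?:(?:\(\d+:\d+\))[^\(]+){2,}(\(\d+:\d+\).*$)"

# Rows where the pattern does not match come back as missing values.
extracted = df["text"].str.extract(pattern, expand=False)

# Swap the missing values for empty strings.
cleaned = extracted.fillna("")

print(extracted.isna().tolist())  # [False, True, True]
print(cleaned.tolist())           # ["(15:40) That's good.", '', '']
```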

Example with fillna

                                                text
0  (5:40) Hello there (3:20) Goodbye (3:30) This ...
1                     (3:30) Test 2 (5:45) Last text
2                              (4:30) Foo (5:18) Bar

df.text.str.extract(r'(?:(?:\(\d+:\d+\))[^\(]+){2,}(\(\d+:\d+\).*$)', expand=False).fillna('')

0    (3:30) This has 3
1
2

I changed the regex to (?:(?:\(\d+:\d+\))[^\(]+){1,2}(\(\d+:\d+\).*$) because what I want is not the last pattern, but everything from the third pattern onward. But what if the dialogue contains parentheses?

If the dialogue contains parentheses, they will be mistakenly treated as the start of a match.
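One possible way to address both follow-up points is sketched below. It is not from the thread. It skips exactly two timestamp-plus-dialogue chunks (a {1,2} quantifier would also let a row with only two timestamps match) and it uses a negative lookahead so that only a real timestamp ends a piece of dialogue, leaving ordinary parentheses alone. The names TIMESTAMP, DIALOGUE, FROM_THIRD and from_third_timestamp are hypothetical, the example string with parentheses is made up, and the sketch assumes single-line strings.

```python
import re

# The timestamp pattern from the question.
TIMESTAMP = r"\(\d{1,2}:\d{1,2}\)"

# Dialogue: any run of characters that does not start a new timestamp,
# so plain parentheses inside the text are allowed.
DIALOGUE = rf"(?:(?!{TIMESTAMP}).)*"

# Skip exactly two timestamp+dialogue chunks, then capture from the
# third timestamp to the end of the string.
FROM_THIRD = re.compile(rf"(?:{TIMESTAMP}{DIALOGUE}){{2}}({TIMESTAMP}.*$)")


def from_third_timestamp(text):
    """Return the text from the third timestamp onward, or '' if there are fewer than three."""
    match = FROM_THIRD.search(text)
    return match.group(1) if match else ""


with_parens = "(15:38) Hi (you there?) (15:39) Yes (I am). (15:40) Good (I think)."
print(from_third_timestamp(with_parens))              # (15:40) Good (I think).
print(from_third_timestamp("(4:30) Foo (5:18) Bar"))  # prints an empty line
```

Since the pattern has a single capture group, the same string (FROM_THIRD.pattern) should also be usable with df.text.str.extract(..., expand=False).fillna(''), though that combination is untested here.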