Python Pandas Regex: separate a name from a string that begins with a word or the start of the string and ends with certain words

python regex pandas

I have a pandas Series containing rows of share names and other details:

Netflix DIVIDEND
Apple Inc (All Sessions) COMM
Intel Corporation CONS
Correction Netflix Section 31 Fee

I am trying to retrieve the share name with a regex, which I have done using the following lookahead:

transactions_df["Share Name"] = transactions_df["MarketName"].str.extract(r"(^.*?(?=DIVIDEND|\(All|CONS|COMM|Section))")

The only row I am having trouble with is

Correction Netflix Section 31 Fee

where my regex returns the share name

Correction Netflix

I don't want the word "Correction".
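The behaviour described above can be reproduced on the four sample rows. This is only a minimal sketch: the DataFrame construction, the variable names `lookahead_pattern` and `attempt`, and the `expand=False` argument are assumptions added for illustration and are not part of the original post.

```python
import pandas as pd

# Sample rows from the question; building the DataFrame this way is an assumption.
transactions_df = pd.DataFrame({"MarketName": [
    "Netflix DIVIDEND",
    "Apple Inc (All Sessions) COMM",
    "Intel Corporation CONS",
    "Correction Netflix Section 31 Fee",
]})

# The lookahead pattern from the question, unchanged.
lookahead_pattern = r"(^.*?(?=DIVIDEND|\(All|CONS|COMM|Section))"

# expand=False returns a Series because the pattern has a single capture group.
attempt = transactions_df["MarketName"].str.extract(lookahead_pattern, expand=False)
print(attempt.tolist())
```

Because the lookahead does not consume the space before the keyword, each captured name keeps a trailing space, and the last row still starts with "Correction".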

I need the regex to check for either the start of the string or the word "Correction".

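I tried a few things, such as an OR with the start of the string. I also tried a lookbehind for the start of the string or "Correction", but the error says that lookbehind patterns must have a constant length.

r"((^|Correction ).*?(?=DIVIDEND|\(All|CONS|COMM|Section))"

gives the error `ValueError: Wrong number of items passed 2, placement implies 1`. I'm new to regex, so I don't really know what this means.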
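Both errors can be inspected in isolation. The sketch below is illustrative only: the lookbehind pattern is a guess at what was tried (the original expression is not shown), and the names `names`, `lookbehind_error` and `two_groups` are hypothetical. It shows that Python's `re` rejects a lookbehind whose alternatives differ in width, and that a pattern with two capture groups makes `str.extract` return two columns, which cannot be assigned to a single DataFrame column.

```python
import re
import pandas as pd

names = pd.Series([
    "Netflix DIVIDEND",
    "Apple Inc (All Sessions) COMM",
    "Intel Corporation CONS",
    "Correction Netflix Section 31 Fee",
])

# Assumed reconstruction of the lookbehind attempt: "^" has width 0 and
# "Correction " has width 11, so the lookbehind is not fixed-width.
try:
    re.compile(r"(?<=^|Correction ).*?(?=DIVIDEND|\(All|CONS|COMM|Section)")
    lookbehind_error = None
except re.error as exc:
    lookbehind_error = str(exc)
print(lookbehind_error)

# The pattern from the question has two capture groups: the outer one and
# (^|Correction ). str.extract returns one column per group.
two_groups = names.str.extract(r"((^|Correction ).*?(?=DIVIDEND|\(All|CONS|COMM|Section))")
print(two_groups.shape)
```

Even without the column mismatch, the `^` alternative matches first at position 0, so the outer group would still begin with "Correction" for the last row.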

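You can make the "Correction" prefix an optional part and, instead of a lookahead, use a capturing group followed by a plain match: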

^(?:Correction\s*)?(\S.*?)\s*(?:\([^()]*\)|DIVIDEND|All|CONS|COMM|Section)

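```
^
```
Start of the string

```
(?:Correction\s*)?
```
Optionally match the word Correction followed by 0+ whitespace characters

```
(\S.*?)\s*
```
Capture in group 1: match a non-whitespace character followed by as few characters as possible, then match (without capturing) 0+ whitespace characters

```
(?:
```
Non-capturing group for the alternatives

```
\([^()]*\)
```
Match from an opening parenthesis to the closing parenthesis

```
|
```
Or

```
DIVIDEND|All|CONS|COMM|Section
```
Match any one of the words

```
)
```
Close the group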
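Applied to the sample rows, the pattern above could be used roughly as follows. This is a sketch under assumptions: the DataFrame construction, the variable name `pattern`, and the `expand=False` argument are added for illustration and do not come from the original answer.

```python
import pandas as pd

# Sample rows from the question; building the DataFrame this way is an assumption.
transactions_df = pd.DataFrame({"MarketName": [
    "Netflix DIVIDEND",
    "Apple Inc (All Sessions) COMM",
    "Intel Corporation CONS",
    "Correction Netflix Section 31 Fee",
]})

# Only (\S.*?) is a capture group; the (?:...) groups are non-capturing,
# so str.extract yields exactly one value per row.
pattern = r"^(?:Correction\s*)?(\S.*?)\s*(?:\([^()]*\)|DIVIDEND|All|CONS|COMM|Section)"

# expand=False returns a Series that can be assigned to a single column.
transactions_df["Share Name"] = transactions_df["MarketName"].str.extract(pattern, expand=False)
print(transactions_df)
```

Since the whitespace before the keyword is matched outside the capture group, the extracted names carry no trailing space.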

Output

0                   Netflix DIVIDEND            Netflix
1      Apple Inc (All Sessions) COMM          Apple Inc
2             Intel Corporation CONS  Intel Corporation
3  Correction Netflix Section 31 Fee            Netflix

Thanks, this works! Regex is confusing; I need to learn more about capturing groups.
