Python 3.x pandas: get the substring between a start and an end character

python-3.x regex pandas

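I am trying to get the substring that sits between different start and end characters. I have tried several regex patterns and I am close to the output I need, but it is not quite right. What can I do to fix this?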

Data (CSV):

ID,TEST
abc,1#London4#Harry Potter#5Rowling##
cde,6#Harry Potter1#England#5Rowling
efg,4#Harry Potter#5Rowling##1#USA
ghi,
jkm,4#Harry Potter5#Rowling
xyz,4#Harry Potter1#China#5Rowling
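
For reference, the sample above can be loaded into a DataFrame roughly as follows. This is only a sketch: the loading code is not shown in the thread, so the use of `io.StringIO` and the name `CSV_TEXT` are assumptions made for illustration.

```python
import io

import pandas as pd

# Sample rows copied from the question; CSV_TEXT is an illustrative name.
CSV_TEXT = """ID,TEST
abc,1#London4#Harry Potter#5Rowling##
cde,6#Harry Potter1#England#5Rowling
efg,4#Harry Potter#5Rowling##1#USA
ghi,
jkm,4#Harry Potter5#Rowling
xyz,4#Harry Potter1#China#5Rowling
"""

# read_csv parses the empty TEST field on the "ghi" row as NaN.
df = pd.read_csv(io.StringIO(CSV_TEXT))
print(df)
```

The empty `TEST` value on the `ghi` row is what ends up as `NaN` in the desired output.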

Code:

Attempt:

Output from the code above: it does not pick up the "1#USA" at the end of the line.

Desired output:

1#London
1#England
1#USA
NaN
NaN
1#China

You can try:

# capture all characters that are neither `#` nor digits
# following 1#
df['TEST'].str.extract(r'(1#[^#\d]+)', expand=False)

Output:

0     1#London
1    1#England
2        1#USA
3          NaN
4          NaN
5      1#China
Name: TEST, dtype: object

You can do this:

>>> df.TEST.str.extract("(1#[a-zA-Z]*)")
           0
0   1#London
1  1#England
2      1#USA
3        NaN
4        NaN
5    1#China
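
The two patterns above agree on the sample data, but they are not equivalent. The sketch below uses the standard `re` module to show where they diverge; the two input strings are hypothetical and do not come from the question. Since `str.extract` applies Python regular expressions, the same behaviour should carry over to pandas.

```python
import re

# The negated class accepts anything except '#' and digits (at least one character).
NEGATED = re.compile(r"(1#[^#\d]+)")
# The letter class accepts ASCII letters only, and may match none at all.
LETTERS = re.compile(r"(1#[a-zA-Z]*)")


def first_group(pattern, text):
    """Return the first captured group, or None when nothing matches."""
    match = pattern.search(text)
    return match.group(1) if match else None


# Hypothetical values: a name containing a space, and "1#" followed by a digit.
samples = ["4#Harry Potter1#New York#5Rowling", "1#5Rowling"]

for text in samples:
    print(repr(text), first_group(NEGATED, text), first_group(LETTERS, text))
```

With a space in the value, the letter-only pattern stops at the space, while the negated class keeps going until the next `#` or digit. When `1#` is followed directly by a digit, the `*` quantifier still returns a bare `1#`, whereas the `+` version returns no match.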

How about:

# lazily match after 1# up to the next `#`, a digit, or the end of the string
df['TEST'].astype(str).str.extract(r'(1#.*?(?=#|$|\d))')
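
The lookahead idea can be checked against the sample rows with the standard `re` module. This is a sketch rather than the original poster's exact code: it assumes a lazy quantifier (`.*?`) and compares it with a greedy one to show why the difference matters. The literal `"nan"` stands in for the missing value after `astype(str)`.

```python
import re

# Lazy: stop at the first position followed by '#', a digit, or end of string.
LAZY = re.compile(r"(1#.*?(?=#|$|\d))")
# Greedy: runs to the end of the string, where the `$` alternative succeeds.
GREEDY = re.compile(r"(1#.*(?=#|$|\d))")

rows = [
    "1#London4#Harry Potter#5Rowling##",
    "6#Harry Potter1#England#5Rowling",
    "4#Harry Potter#5Rowling##1#USA",
    "nan",  # what the missing value looks like after astype(str)
    "4#Harry Potter5#Rowling",
    "4#Harry Potter1#China#5Rowling",
]


def extract(pattern, text):
    """Return the first captured group, or None when nothing matches."""
    match = pattern.search(text)
    return match.group(1) if match else None


lazy_results = [extract(LAZY, row) for row in rows]
greedy_first = extract(GREEDY, rows[0])

print(lazy_results)
print(greedy_first)
```

The `$` alternative in the lookahead is what lets the pattern pick up `1#USA` at the end of the third row, and the lazy quantifier keeps the first row from being captured in full.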
