
Python: splitting strings in Pandas while ignoring case

python regex pandas


What I need to do is:

df[col].str.split(my_regexp, re.IGNORECASE, expand=True)

However, the pandas Series.str.split method does not accept regex flags.

Since I need the result expanded into columns, I can't do something like

df.apply(lambda x: re.split(my_regexp, x[col], flags=re.IGNORECASE), axis=1, result_type='expand')

because the lists have different lengths.
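On the unequal-length problem: instead of trying to make re.split pad its own output, one option is to collect the split lists and hand them to pd.DataFrame, which pads the shorter rows with missing values. A minimal sketch, assuming every value in the series is a string (a NaN would make pattern.split raise a TypeError); the sample strings are modeled on the question's data.

```python
import re

import pandas as pd

series = pd.Series([
    "First paRt foo second part foo third part",
    "test1 FoO test2",
    "This is a Test",
])
pattern = re.compile(r"foo|bar", flags=re.IGNORECASE)

# re.split gives one list per row, of varying length; building a DataFrame
# from the list of lists pads the shorter rows with missing values.
# Assumes every value is a str (no NaN).
parts = pd.DataFrame(
    [pattern.split(text) for text in series],
    index=series.index,
)
print(parts)
```

The pieces keep the spaces next to each delimiter, for example "First paRt " in the first cell.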

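What I need is either a way to make re.split return lists that all have the same length, or a way to pass re.IGNORECASE to the Series.str.split method. Or a better approach altogether.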
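One way to pass the flag without going through str.split's signature, sketched here as an illustration rather than taken from the thread: embed it in the pattern itself as the inline flag (?i). Python's re module applies an inline flag placed at the start of the pattern to the whole expression, and str.split treats a pattern longer than one character as a regular expression by default. The sample strings are modeled on the question's data.

```python
import pandas as pd

# object dtype keeps the values as plain Python str objects
series = pd.Series(["test1 FoO test2", "hi1 bar HI2", "final"], dtype=object)

# (?i) at the very start switches on IGNORECASE for the whole pattern;
# a multi-character pat is treated as a regex by str.split.
parts = series.str.split(r"(?i)foo|bar", expand=True)
print(parts)
```

The original case of the pieces is kept ("HI2" stays uppercase), and rows with fewer pieces are padded with missing values.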

Thanks, everyone.

Edit: here is some data to explain it better:

import pandas as pd

series = pd.Series([
    "First paRt foo second part foo third part",
    "test1 FoO test2",
    "hi1 bar HI2",
    "This is a Test",
    "first bar second bar third",
    "final"
])

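With the regexp r'foo|bar', it should return:

                0              1            2
0     First paRt    second part    third part
1          test1           test2         None
2            hi1             HI2         None
3  This is a Test           None         None
4          first         second         third
5           final           None         None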

Method 1, if you need to keep the original upper/lower case. Output:

                0              1            2
0     First paRt    second part    third part
1          test1           test2         None
2            hi1             HI2         None
3  This is a Test           None         None
4          first         second         third
5           final           None         None
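A sketch of one way to produce case-preserving output like the table above (not necessarily the code this answer used), assuming pandas 1.4 or later, where Series.str.split accepts a compiled regular expression: compile the pattern with re.IGNORECASE and pass the compiled object as pat. The sample strings are modeled on the question's data.

```python
import re

import pandas as pd

# object dtype keeps the values as plain Python str objects
series = pd.Series(
    [
        "First paRt foo second part foo third part",
        "test1 FoO test2",
        "This is a Test",
    ],
    dtype=object,
)

# Assumes pandas >= 1.4: a compiled pattern is accepted as pat,
# so the IGNORECASE flag travels with the pattern object.
pattern = re.compile(r"foo|bar", flags=re.IGNORECASE)
parts = series.str.split(pattern, expand=True)
print(parts)
```

On older pandas versions the inline form r'(?i)foo|bar' achieves the same thing without a compiled pattern.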


Method 2, if upper/lower case is not a problem: as mentioned in the comments, use str.lower() to convert your series to lowercase, then use str.split:

series.str.lower().str.split(r'foo|bar', expand=True)

Output:

                0              1            2
0     first part    second part    third part
1          test1           test2         None
2            hi1             hi2         None
3  this is a test           None         None
4          first         second         third
5           final           None         None


Method 3, removing the unnecessary whitespace. Output:

                0            1           2
0      first part  second part  third part
1           test1        test2        None
2             hi1          hi2        None
3  this is a test         None        None
4           first       second       third
5           final         None        None
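One possible way to get this stripped output, sketched as an assumption rather than the answer's exact code: let the regular expression absorb the whitespace around each delimiter, so the pieces come out without leading or trailing spaces. The sample strings are modeled on the question's data.

```python
import pandas as pd

# object dtype keeps the values as plain Python str objects
series = pd.Series(
    [
        "First paRt foo second part foo third part",
        "test1 FoO test2",
        "This is a Test",
    ],
    dtype=object,
)

# \s* on both sides makes the surrounding spaces part of the delimiter,
# so no piece keeps leading/trailing whitespace; lowercasing mirrors Method 2.
parts = series.str.lower().str.split(r"\s*(?:foo|bar)\s*", expand=True)
print(parts)
```

Dropping .str.lower() and writing the pattern as r'(?i)\s*(?:foo|bar)\s*' should give the same stripped pieces while keeping the original case.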

Could you add some sample data so we can try it ourselves and give you an answer? Would it be an option to write your my_regexp in lowercase and then use:

df[col].str.lower().str.split(my_regexp, expand=True)

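I've added some examples. That would be a good idea, but unfortunately I really need the strings to stay the same in my output. FRAN's solution above also lowercases the result.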