Python: remove all text in a Series starting from a specific string

python regex pandas string

I have a df called "places".

Every entry in places["place_name"] contains the same substring, [modifier | modifier le code], and I want to remove it.

I tried the following two approaches:

places["place_name"] = places["place_name"].apply(lambda x: re.sub("\\[modifier \\| modifier le code\\]", "", x))

places["places_name"] = places["place_name"].str.replace("[modifier | modifier le code]", "", regex=False)

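Neither of these worked. I think the problem is that the substring I am trying to remove is stuck to the preceding text (note that there is no space in front of it), so I suspect the code does not recognize it as a separate string. I have also tried splitting the string with split(), but I run into the same problem because there is no space in front of the string I want to remove.

The final output should be: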

                   place_name
0                 "Palais et bâtiments officiels"
1                 "Lieux de culte renommés"
2                 "Vestiges gallo-romains"

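I looked for other solutions but could not find any. I know there are many questions about strings, but I could not find one that covers this specific case.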

You should use Series.str.split.

Basically, split the string on '[modifier' and then select the first value ([0]).
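The split-and-take-the-first-part idea can be sketched as follows. This is a minimal illustration, not necessarily the exact code from the answer: the sample rows are rebuilt from the expected output in the question, and the names parts and cleaned are only illustrative.

```python
import pandas as pd

# Sample data rebuilt from the question's expected output (an assumption).
places = pd.DataFrame({
    "place_name": [
        "Palais et bâtiments officiels[modifier | modifier le code]",
        "Lieux de culte renommés[modifier | modifier le code]",
        "Vestiges gallo-romains[modifier | modifier le code]",
    ]
})

# By default str.split treats a multi-character pattern as a regex,
# so the literal "[" has to be escaped.
parts = places["place_name"].str.split(r"\[modifier", n=1)

# Keep only the text before the marker.
places["place_name"] = parts.str[0]

cleaned = places["place_name"].tolist()
print(cleaned)
```

No space is needed before the marker: the split happens wherever the marker text starts.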

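Remove everything from zero or more whitespace characters followed by [modifier up to the end of the string. Recent pandas versions default to regex=False in str.replace, so regex=True is passed explicitly:

places["place_name"].str.replace(r'\s*\[modifier.*', '', regex=True)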

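Here, \s* matches 0+ whitespace characters, \[ matches a literal [, and modifier.* matches modifier followed by any 0+ characters other than line break characters, as many as possible.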
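To see what each part of that pattern does, here is a small sketch using the standard re module on plain strings rather than a Series. The second and third sample strings are made up for illustration: one has a space before the bracket and one has no marker at all.

```python
import re

# Same pattern as above: optional whitespace, a literal "[", "modifier",
# then everything up to the end of the line.
pattern = re.compile(r"\s*\[modifier.*")

samples = [
    "Palais et bâtiments officiels[modifier | modifier le code]",  # no space before "["
    "Lieux de culte renommés [modifier | modifier le code]",       # one space before "["
    "Vestiges gallo-romains",                                      # no marker present
]

stripped = [pattern.sub("", s) for s in samples]
for s in stripped:
    print(repr(s))
```

Because \s* also matches zero characters, the pattern works whether or not the marker is glued to the preceding text, and strings without the marker are left unchanged.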
Alternatively, extract all text from the start of the string up to the first [ (or ]):

places["place_name"] = places["place_name"].str.extract(r'^([^][]+)', expand=False)

Details:

^ - start of the string
([^][]+) - capturing group 1 (Series.str.extract requires a capturing group to return any value): one or more characters other than ] and [
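The behaviour of the [^][] character class can be checked on plain strings with the re module. This is only a sketch: leading_text is a hypothetical helper, and the last two inputs are made-up edge cases.

```python
import re

# "]" right after "^" is a literal inside the class, so [^][] means
# "any character except ] and [".
pattern = re.compile(r"^([^][]+)")

def leading_text(s):
    """Return the text before the first bracket, or None if there is none."""
    m = pattern.match(s)
    return m.group(1) if m else None

with_marker = leading_text("Vestiges gallo-romains[modifier | modifier le code]")
no_brackets = leading_text("Vestiges gallo-romains")
starts_with_bracket = leading_text("[modifier | modifier le code]")

print(with_marker, no_brackets, starts_with_bracket)
```

When a string starts with a bracket, the group cannot match at all. In pandas, Series.str.extract yields NaN for such rows, which is worth keeping in mind before assigning the result back to the column.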

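Pandas test:
>>> import pandas as pd
>>> places = pd.DataFrame({'place_name': ["Palais et bâtiments officiels[modifier | modifier le code]", "Lieux de culte renommés[modifier | modifier le code]", "Vestiges gallo-romains[modifier | modifier le code]"]})
>>> places["place_name"] = places["place_name"].str.extract(r'^([^][]+)', expand=False)
>>> places
                      place_name
0  Palais et bâtiments officiels
1        Lieux de culte renommés
2         Vestiges gallo-romains
>>> places["place_name"].str.replace(r'\s*\[modifier.*', '', regex=True)
0    Palais et bâtiments officiels
1          Lieux de culte renommés
2           Vestiges gallo-romains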

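If you prefer split, you can use Series.str.rsplit, which works with a literal string rather than a regex:
>>> places["place_name"].str.rsplit('[modifier').str[0]
0    Palais et bâtiments officiels
1          Lieux de culte renommés
2           Vestiges gallo-romains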
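One thing to watch with the split-based variants is what happens when the marker occurs more than once. The sketch below uses Python's built-in str.split and str.rsplit, which also take a literal separator. The string twice is a made-up example, not data from the question.

```python
marker = "[modifier"

once = "Vestiges gallo-romains[modifier | modifier le code]"
twice = "A[modifier]B[modifier | modifier le code]"  # hypothetical input

# With a single occurrence, both directions give the same first piece.
once_split = once.split(marker)[0]
once_rsplit = once.rsplit(marker)[0]

# With two occurrences and maxsplit=1, the direction matters.
twice_split = twice.split(marker, 1)[0]    # cuts at the first marker
twice_rsplit = twice.rsplit(marker, 1)[0]  # cuts at the last marker

print(once_split, once_rsplit, twice_split, twice_rsplit)
```

For the data in the question each string contains the marker only once, so either direction should give the same result.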
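@aramis You can use rsplit with '[modifier', since it does not use a regex and your strings contain only one [modifier. See my answer, which offers more solutions.

Thank you very much for the extensive answer. It was very useful for this task and also expanded my knowledge of regex.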
The str.split call from the first answer (split on '[modifier' and keep the first part):

places["place_name"] = places["place_name"].str.split('\\[modifier').str[0]