Python: extracting the string that follows a substring in a DataFrame

python regex string pandas text

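I have a pandas DataFrame, and I want to extract the names that always come right after a certain keyword: \nname=. So I would like to get 'stet' and 'bos' and put them into an array.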

Assuming the string you provided is a quoted string like this:

'(ep1270399)\nname=stet, johannes cornelis p/a ballast nedam infra b.v., p.o. box 1526 , city=3430 bm  nieuwegein , country=nl \n\nname=bos, wilhelmus johannes p/a ballast nedam infra b.v., p.o. box 1526 , city=3430 bm  nieuwegein , country=nl \n'

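The code below extracts every value that immediately follows name=. If your data is stored in a different way, please show that in the question so that I can tailor the answer to it.

That said, you should be able to adapt the regular expression to any format.

Could you post a sample DataFrame?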
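As a rough sketch of what such an adaptation could look like, a capture group with re.findall can pull out the values directly instead of splitting first. This assumes each name ends at the first comma or whitespace character; the sample string is a shortened, hypothetical version of the one in the question.

```python
import re

# Shortened, hypothetical sample in the same format as the question's string.
sample = (
    '(ep1270399)\nname=stet, johannes cornelis , country=nl \n'
    '\nname=bos, wilhelmus johannes , country=nl \n'
)

# Capture the run of characters after "name=" up to the next comma or whitespace.
# Assumption: a name never contains a comma or a space.
names = re.findall(r'name=([^,\s]+)', sample)

print(names)  # ['stet', 'bos']
```

Note that this pattern would also match keys that merely end in name= (for example a hypothetical surname= field), so the pattern may need an anchor such as a preceding newline if the real data contains such keys.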

import re

string = '(ep1270399)\nname=stet, johannes cornelis p/a ballast nedam infra b.v., p.o. box 1526 , city=3430 bm nieuwegein , country=nl \n\nname=bos, wilhelmus johannes p/a ballast nedam infra b.v., p.o. box 1526 , city=3430 bm nieuwegein , country=nl \n'

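# Split on spaces, '=', ',' and newlines so that 'name' and its value become adjacent tokens.
split = re.split(' |=|,|\n', string)
# Keep the token that immediately follows each 'name' token.
result = [split[idx + 1] for idx, value in enumerate(split) if value == 'name']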

result

['stet', 'bos']
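Since the question is about a pandas DataFrame rather than a single string, the same idea can be sketched per row with Series.str.findall. The DataFrame, the column name applicant, and the second row are made up for illustration, because the question does not show the real data layout.

```python
import pandas as pd

# Hypothetical DataFrame: the column name "applicant" and the second row
# are assumptions, not taken from the question.
df = pd.DataFrame({
    'applicant': [
        '(ep1270399)\nname=stet, johannes cornelis , country=nl \n'
        '\nname=bos, wilhelmus johannes , country=nl \n',
        '(ep0000000)\nname=smith, john , country=gb \n',
    ]
})

# For each row, collect every value that follows "name=" up to the next
# comma or whitespace. The result is one list of names per row.
df['names'] = df['applicant'].str.findall(r'name=([^,\s]+)')

# Flatten the per-row lists into a single list across the whole column.
all_names = df['names'].explode().tolist()

print(df['names'].tolist())
print(all_names)
```

Keeping the per-row lists in a column preserves which names belong to which record; the flattened list is only useful if that association does not matter.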