Extract multiple patterns from a text file and save them to a pandas dataframe [python]
My text file looks like this:
Description: Text 1 follows <br/> blah blah blah Cause: Cause Text 1
follows here <br/>Description: Text 2 follows <br/> blah blah
blah Cause: Cause Text 2 follows here<br/>Description: Text 3 follows <br/>
blah blah blah Description: Text 4 follows <br/> blah blah
blah Cause: Cause Text 4 follows<br/>
What I have done so far:
re.findall(r'Description:(.*?)<br/>',textfile)
re.findall(r'Cause:(.*?)<br/>',textfile)
re.findall(r'Description:(.*)\n',textfile)
re.findall(r'Cause:(.*)\n',textfile)
But when I try to build a larger dataframe, this doesn't let me match each description with its cause.
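To see why two separate `findall` calls cannot be paired up, here is a minimal sketch that runs the first two patterns on the sample text. It assumes the line breaks in the post are only display wrapping, so the sample is joined into one string. Because Description 3 has no Cause, the two lists have different lengths, and pairing them by position with `zip` attaches Cause 4 to Description 3 and drops Description 4.

```python
import re

# Sample input from the question, joined into one string (assumption:
# the line breaks in the post are only display wrapping).
textfile = (
    "Description: Text 1 follows <br/> blah blah blah Cause: Cause Text 1 "
    "follows here <br/>Description: Text 2 follows <br/> blah blah "
    "blah Cause: Cause Text 2 follows here<br/>Description: Text 3 follows <br/>"
    "blah blah blah Description: Text 4 follows <br/> blah blah "
    "blah Cause: Cause Text 4 follows<br/>"
)

# Each pattern is searched independently, so nothing links a cause
# to the description it belongs to.
descriptions = [d.strip() for d in re.findall(r"Description:(.*?)<br/>", textfile)]
causes = [c.strip() for c in re.findall(r"Cause:(.*?)<br/>", textfile)]

print(len(descriptions), len(causes))  # 4 3

# zip pairs purely by position: Text 3 gets Text 4's cause,
# and Text 4 is silently dropped.
for pair in zip(descriptions, causes):
    print(pair)
```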
Thanks for any input or guidance. I'm very new to Python.

This is what I came up with:
r"Description:(.*?)<br/>(?:(?!Cause)(?!Description).)*(?:Cause:(.*?)<br/>)?"
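The same pattern can be written with `re.VERBOSE` so each piece can be commented. This is a sketch for reading the pattern, not a change to it, except for `re.DOTALL`, which is an addition here so that `.` also crosses line breaks. The middle part is a "tempered" token: it consumes filler one character at a time but refuses to step onto the next `Cause` or `Description`, so a description without a cause ends before the next description instead of borrowing that description's cause. The `sample` string is a made-up input.

```python
import re

# The answer's pattern rewritten with re.VERBOSE (whitespace and # comments
# inside the pattern are ignored) so each piece can be annotated.
# re.DOTALL is an addition: it lets "." also match line breaks.
PATTERN = re.compile(
    r"""
    Description:(.*?)<br/>           # group 1: shortest text up to the next <br/>
    (?:(?!Cause)(?!Description).)*   # skip filler, one character at a time, but
                                     # never step onto "Cause" or "Description"
    (?:Cause:(.*?)<br/>)?            # group 2: the cause, if one follows
    """,
    re.VERBOSE | re.DOTALL,
)

# Tiny made-up input: the first description has no cause.
sample = "Description: A <br/> x Description: B <br/> y Cause: C <br/>"

# findall returns '' for group 2 when the optional cause part did not match.
print(PATTERN.findall(sample))  # [(' A ', ''), (' B ', ' C ')]
```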
Try:
import re
import pandas
data = re.findall(r"Description:(.*?)<br/>(?:(?!Cause)(?!Description).)*(?:Cause:(.*?)<br/>)?", textfile, re.DOTALL)  # DOTALL lets . cross line breaks
df = pandas.DataFrame(data, columns=["Description", "Cause"])
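Below is a self-contained version of the answer, run on the sample text from the question joined into a single string. The joined string is an assumption; in practice `textfile` would come from something like `open(path).read()`. It also strips the spaces that the capture groups pick up around each field, which the code above leaves in place.

```python
import re

import pandas as pd

# Sample input from the question, joined into one string.
textfile = (
    "Description: Text 1 follows <br/> blah blah blah Cause: Cause Text 1 "
    "follows here <br/>Description: Text 2 follows <br/> blah blah "
    "blah Cause: Cause Text 2 follows here<br/>Description: Text 3 follows <br/>"
    "blah blah blah Description: Text 4 follows <br/> blah blah "
    "blah Cause: Cause Text 4 follows<br/>"
)

PATTERN = r"Description:(.*?)<br/>(?:(?!Cause)(?!Description).)*(?:Cause:(.*?)<br/>)?"

# One (description, cause) tuple per match; cause is '' when a
# description has no cause. Strip the surrounding spaces from both.
rows = [
    (desc.strip(), cause.strip())
    for desc, cause in re.findall(PATTERN, textfile, flags=re.DOTALL)
]

df = pd.DataFrame(rows, columns=["Description", "Cause"])
print(df)
```

The third row (`Text 3 follows`) gets an empty-string Cause, so every description stays on the same row as its own cause.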