Detecting and replacing XML in a string in Python


I have a file that contains text and some XML content. It looks like this:

The authentication details : <id>70016683</id><password>password@123</password>
The next step is to send the request.
The request : <request><id>90016133</id><password>password@3212</password></request>
Additional info includes <Address><line1>House no. 341</line1><line2>A B Street</line2><City>Sample city</City></Address>

I want to replace each XML fragment with "xml_obj", so that the file looks like this:

The authentication details : xml_obj
The next step is to send the request.
The request : xml_obj
Additional info includes xml_obj
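
At the same time, I also want to extract the XML text that was replaced and store it in a list. If a line has no XML object, the list should contain none for that line.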

I have tried to do this with a regex:

xml_tag = re.search(r"", line)
if xml_tag:
    start_position = xml_tag.start()
    xml_word = xml_tag.group()[:1] + '/' + xml_tag.group()[1:]
    xml_pattern = r'{}'.format(xml_word)
    stop_position = re.search(xml_pattern, line).stop()

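But this code only retrieves the start and stop positions of a single XML tag: its content for the first line, and the entire element for the last line (of the input file). I want to capture all of the XML content, regardless of its structure, and replace it with "xml_obj".

Any suggestions would be helpful. Thanks in advance.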
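
One way to get "all XML content regardless of structure" on the first sample is a regex with a backreference, so that each closing tag must match the opening tag captured just before it. The sketch below is only an illustration, not code from the question or the answer: the names `XML_RUN` and `replace_xml` are made up here, and it assumes tags have no attributes, that an element never contains a child with the same name, and that each fragment sits on one line.

```python
import re

# One or more adjacent elements, e.g. <id>..</id><password>..</password>.
# \1 requires each closing tag to repeat the name captured by the opening tag.
XML_RUN = re.compile(r"(?:<(\w+)>.*?</\1>)+")


def replace_xml(line, placeholder="xml_obj"):
    """Return (new_line, fragments) for a single line of text."""
    fragments = []

    def _swap(match):
        # Remember the matched XML, then substitute the placeholder.
        fragments.append(match.group(0))
        return placeholder

    new_line = XML_RUN.sub(_swap, line)
    return new_line, fragments


sample = """The authentication details : <id>70016683</id><password>password@123</password>
The next step is to send the request.
The request : <request><id>90016133</id><password>password@3212</password></request>
Additional info includes <Address><line1>House no. 341</line1><line2>A B Street</line2><City>Sample city</City></Address>"""

results = [replace_xml(line) for line in sample.splitlines()]
for new_line, fragments in results:
    print(new_line, fragments)
```

A line without XML yields an empty list; mapping that to `None` or `'none'` is a one-line change if that form is preferred.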

Edit:

I also want to apply the same logic to files that look like this:

The authentication details : ID <id>70016683</id> Password <password>password@123</password> Authentication details complete
The next step is to send the request.
The request : <request><id>90016133</id><password>password@3212</password></request> Request successful
Additional info includes <Address><line1>House no. 341</line1><line2>A B Street</line2><City>Sample city</City></Address>

The file above may have more than one XML object on a single line.

There may also be some plain text after the XML part.
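
For lines like these, with several XML objects and trailing plain text, one possible approach is to scan the tags and track nesting with a stack instead of relying on a single pattern. The following is a hedged sketch, not part of the original post: `TAG` and `split_xml` are hypothetical names, tags are assumed to carry no attributes, and a line containing an unclosed tag is left as it is.

```python
import re

TAG = re.compile(r"<(/?)(\w+)>")  # opening or closing tag, no attributes


def split_xml(line, placeholder="xml_obj"):
    """Replace each balanced top-level XML run; return (new_line, fragments)."""
    spans = []    # (start, end) offsets of balanced top-level runs
    stack = []    # names of the tags that are currently open
    start = None
    for m in TAG.finditer(line):
        closing, name = m.group(1) == "/", m.group(2)
        if not closing:
            if not stack:
                start = m.start()      # a new top-level element begins
            stack.append(name)
        elif stack and stack[-1] == name:
            stack.pop()
            if not stack:
                # Merge with the previous span when nothing separates them.
                if spans and spans[-1][1] == start:
                    spans[-1] = (spans[-1][0], m.end())
                else:
                    spans.append((start, m.end()))
        else:
            stack.clear()              # mismatched closing tag: stop tracking
    fragments = [line[s:e] for s, e in spans]
    out, pos = [], 0
    for s, e in spans:
        out.append(line[pos:s])        # plain text before the fragment
        out.append(placeholder)
        pos = e
    out.append(line[pos:])             # plain text after the last fragment
    return "".join(out), fragments


line = ("The authentication details : ID <id>70016683</id> Password "
        "<password>password@123</password> Authentication details complete")
print(split_xml(line))
```

Because nesting is tracked explicitly, an element that contains a child of the same name is still captured as one fragment, which a backreference-based regex would not handle.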

The following is a bit convoluted, but assuming the samples in your question correctly represent your actual text, try this:

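txt = """[your sample text above]"""
lines = txt.splitlines()
entries = []   # extracted XML per line, or 'none'
new_txt = ''

for line in lines:
    # Mark the first ' <' with a sentinel, then split the line there.
    entry = line.replace(' <', ' xxx<', 1).split('xxx')
    if len(entry) == 2:
        entries.append(entry[1])   # everything from the first '<' onward
        entry[1] = "xml_obj"
        line = ''.join(entry)
    else:
        entries.append('none')     # no XML found on this line
    new_txt += line + '\n'

for entry in entries:
    print(entry)
print('---')
print(new_txt)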
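
The snippet above relies on the sentinel string 'xxx', so a line that already contains 'xxx' would split into more than two pieces and be treated as having no XML. A minimal sketch of the same idea using `str.partition`, which needs no sentinel, is given here; `replace_first_xml` is a name invented for this illustration, and it returns `None` rather than the string 'none'. Like the original, it treats everything after the first ' <' as XML, so trailing plain text is swallowed.

```python
def replace_first_xml(line, placeholder="xml_obj"):
    """Split at the first ' <' and treat everything after it as XML."""
    head, sep, tail = line.partition(" <")
    if not sep:
        return line, None                      # no ' <' on this line
    return head + " " + placeholder, "<" + tail


sample_lines = [
    "The request : <request><id>90016133</id><password>password@3212</password></request>",
    "The next step is to send the request.",
]
for text in sample_lines:
    print(replace_first_xml(text))
```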
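
Does the XML fragment on each line always follow a ?
@Jack - Not always.
Then please edit your question and add some more representative text samples.
@JackFleeting - Sure.
@Jack - I tried it on the other kinds of files I have. I have edited the question to include them. Could you help me with that as well? Thanks in advance.
@sunehaks Please post it as a separate question (per SO policy), and I'll be happy to take a look.
@Jack Fleeting - Sure.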