Detect and replace XML in a string in Python

python xml

I have a file that contains text along with some XML content. It looks like this:

The authentication details : <id>70016683</id><password>password@123</password>
The next step is to send the request.
The request : <request><id>90016133</id><password>password@3212</password></request>
Additional info includes <Address><line1>House no. 341</line1><line2>A B Street</line2><City>Sample city</City></Address>

I want to replace the XML content on each line with xml_obj, so that the result looks like this:

The authentication details : xml_obj
The next step is to send the request.
The request : xml_obj
Additional info includes xml_obj

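At the same time, I also want to extract the XML text that was replaced and store it in a list. If a line has no XML object, the list should contain nothing for that line.

I have tried to do this with a regex: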

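import re

xml_tag = re.search(r"", line)
if xml_tag:
    start_position = xml_tag.start()
    # insert '/' after the first character of the match,
    # which turns an opening tag into its closing tag
    xml_word = xml_tag.group()[:1] + '/' + xml_tag.group()[1:]
    xml_pattern = r'{}'.format(xml_word)
    stop_position = re.search(xml_pattern, line).end()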

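But this code only retrieves the start and stop positions of a single XML tag: the tag and its content on the first line, and the entire structure on the last line (of the input file). I want to capture all of the XML content, regardless of its structure, and replace it with "xml_obj".

Any suggestions would be helpful. Thanks in advance.

Edit:

I would also like to apply the same logic to a file that looks like this: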

The authentication details : ID <id>70016683</id> Password <password>password@123</password> Authentication details complete
The next step is to send the request.
The request : <request><id>90016133</id><password>password@3212</password></request> Request successful
Additional info includes <Address><line1>House no. 341</line1><line2>A B Street</line2><City>Sample city</City></Address>

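A line in the file above may contain more than one XML object.

There may also be some plain text after the XML part.

The following is a bit convoluted, but assuming the sample in your question correctly represents your actual text, try this: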

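txt = """[your sample text above]"""
lines = txt.splitlines()
entries = []   # XML fragment found on each line, or 'none'
new_txt = ''

for line in lines:
    # Mark the first ' <' on the line, then split there: text before, XML after
    entry = line.replace(' <', ' xxx<', 1).split('xxx')
    if len(entry) == 2:
        entries.append(entry[1])   # keep the extracted XML
        entry[1] = "xml_obj"       # and substitute the placeholder
        line = ''.join(entry)
    else:
        entries.append('none')     # no XML on this line
    new_txt += line + '\n'

for entry in entries:
    print(entry)
print('---')
print(new_txt)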
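
For the edited case in the question (several XML objects on one line, possibly followed by plain text), a regex with a backreference may be an easier starting point. What follows is only a minimal sketch, not the method used above: the names `XML_RUN` and `replace_xml` are made up for illustration, and it assumes that tags carry no attributes, that there are no self-closing tags, and that no element is nested inside another element with the same name.

```python
import re

# One or more directly adjacent elements. Each element is an opening tag,
# lazily matched content, and a closing tag with the same name (\1).
# Assumption: no attributes, no self-closing tags, no same-name nesting.
XML_RUN = re.compile(r"(?:<(\w+)>.*?</\1>)+")


def replace_xml(line, placeholder="xml_obj"):
    """Return (line with XML replaced, list of the XML fragments removed)."""
    fragments = [m.group(0) for m in XML_RUN.finditer(line)]
    return XML_RUN.sub(placeholder, line), fragments


if __name__ == "__main__":
    sample = [
        "The authentication details : ID <id>70016683</id> Password "
        "<password>password@123</password> Authentication details complete",
        "The next step is to send the request.",
        "The request : <request><id>90016133</id>"
        "<password>password@3212</password></request> Request successful",
    ]
    for text in sample:
        new_line, found = replace_xml(text)
        print(new_line)
        print(found)  # empty list when the line has no XML
```

Adjacent elements with nothing between them are treated as one fragment, which matches the expected output for the first sample file. For input that is not this regular, a real XML parser is likely a safer choice than a regex.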

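Does the XML fragment in each line always follow a ":"?

@Jack - Not always.

Then please edit your question and add some more representative text samples.

@JackFleeting - Sure.

@Jack - I tried it on the other type of file I have. I have edited the question to include it. Could you help me with that one too? Thanks in advance.

@sunehaks Please post it as a separate question (per SO policy), and I will be happy to take a look.

@Jack Fleeting - Sure.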