Replace all hyphens with underscores in XML tags using BeautifulSoup in Python


python regex python-2.7


Suppose I have a string containing XML output, as shown below:

<dept-details>
     <dept-domain-id>1</dept-domain-id>
     <dept-req-status>no-vacancies-present</dept-req-status>
      .
      .
</dept-details>



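I want to replace the hyphens (-) in all tag names with underscores (_), because I have seen that Beautiful Soup does not let you access a tag whose name contains a hyphen directly as an attribute. As another post points out, you have to go through find() instead.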

So my goal is to convert every - in the tag names to _, so that the string looks like this:

<dept_details>
     <dept_domain_id>1</dept_domain_id>
     <dept_req_status>no-vacancies-present</dept_req_status>
      .
      .
</dept_details>



I would like to know how to do this with Python's re module or, even better, directly with BeautifulSoup.

Thanks in advance.

Edit: See Burhan's answer; it is much better.

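import re

string = '<dept-details><dept-domain-id>1</dept-domain-id><dept-req-status>no-vacancies-present</dept-req-status></dept-details>'

# Find every tag that contains a hyphen; [^<>] keeps each match inside a single tag.
tags = re.finditer('<[^<>]*-[^<>]*>', string)

for x in tags:
    # '-' and '_' have the same length, so the spans found above stay valid.
    string = string[:x.span()[0]] + x.group(0).replace('-', '_') + string[x.span()[1]:]

print(string)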


Here string is the actual XML string. There must be a better way, though.

If a regular expression is required here, try the following solution:

>>> s
'<dept-details><dept-domain-id>1</dept-domain-id><dept-req-status>no-vacancies</dept-req-status></dept-details>'
>>> re.sub('<(.*?)>', lambda x: x.group(0).replace('-','_'), s)
'<dept_details><dept_domain_id>1</dept_domain_id><dept_req_status>no-vacancies</dept_req_status></dept_details>'


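The regular expression has some problems; for example, it will also rewrite any attribute that contains a hyphen, but at least this should get you going in the right direction.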
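One way to sidestep the attribute problem is to match only the element name right after `<` or `</`, instead of the whole tag. The following is a minimal sketch, not a full XML tokenizer: the names `TAG_NAME` and `underscore_tag_names` are made up for illustration, and it assumes the input has no CDATA sections or attribute values containing a literal `<`.

```python
import re

# Matches "<" or "</" followed by an element name only.
# Attributes that follow the name are not part of the match.
TAG_NAME = re.compile(r'(</?)([A-Za-z_][\w.\-]*)')


def underscore_tag_names(xml_text):
    """Replace hyphens with underscores in element names, leaving
    attributes and text content untouched (hypothetical helper)."""
    return TAG_NAME.sub(
        lambda m: m.group(1) + m.group(2).replace('-', '_'),
        xml_text,
    )


sample = ('<dept-details kind="top-level">'
          '<dept-req-status>no-vacancies</dept-req-status>'
          '</dept-details>')
result = underscore_tag_names(sample)
print(result)
```

With this pattern the `kind="top-level"` attribute and the `no-vacancies` text keep their hyphens, while the three element names are rewritten.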

BeautifulSoup is good for HTML, but not so good for XML. A dedicated XML parser is better suited to XML parsing.
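As an illustration of the parser-based route, here is a rough sketch using the standard library's `xml.etree.ElementTree`. That choice is an assumption made for this example; the answer does not say which parser it had in mind. The helper name `underscore_tags` is hypothetical, and the sketch assumes the document uses no XML namespaces (a namespaced tag is stored as `{uri}name`, so a hyphen in the URI would be rewritten too).

```python
import xml.etree.ElementTree as ET


def underscore_tags(xml_text):
    """Parse the XML, rename every element whose tag contains a hyphen,
    and serialize it back to text (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        # Only the element name changes; text and attributes are untouched.
        elem.tag = elem.tag.replace('-', '_')
    # tostring() returns ASCII bytes by default; decode to get text.
    return ET.tostring(root).decode('ascii')


xml_in = ('<dept-details><dept-domain-id>1</dept-domain-id>'
          '<dept-req-status>no-vacancies-present</dept-req-status>'
          '</dept-details>')
xml_out = underscore_tags(xml_in)
print(xml_out)
```

Because the parser separates element names from text and attributes, the value `no-vacancies-present` is left alone without any special handling in the code.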