How do I make BeautifulSoup's "replace_with" attribute work with a "unicode" object in Python?

Here is my HTML:

In my I18N_index.html, all 3 strings are displayed correctly with the "I18N_" prefix.

However, my p tags contain child tags, and for those p.string returns "None". So the concatenation no longer works:

    for p in soup.find_all('p'):
        i18n_string = "I18N_"+p.string
        p.string.replace_with(i18n_string)
        print(p.string)

    f.write(str(soup))

###Output:##################################################
# $ python ./test.py
# I18N_Pizza
# I18N_Eggplant Parmesan
# I18N_Italian Ice Cream
# I18N_This is some random paragraph without child tags.
# Traceback (most recent call last):
  # File "./test.py", line 15, in <module>
    # i18n_string = "I18N_"+p.string
# TypeError: cannot concatenate 'str' and 'NoneType' objects
############################################################
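A minimal sketch of why the concatenation fails, assuming a small made-up HTML snippet (the question's real markup isn't shown) and Python's built-in "html.parser" instead of lxml: `.string` is only set when a tag has a single child, while `.get_text()` always returns a str.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup; the question's actual HTML is not shown
html = (
    "<h2>Pizza</h2>"
    "<p>This is some random paragraph without child tags.</p>"
    "<p>Delicious homebaked pizza.<span>$8.99 pp</span></p>"
)

# html.parser ships with Python, so lxml is not required for this sketch
soup = BeautifulSoup(html, "html.parser")

for p in soup.find_all("p"):
    # .string is set only when the tag has exactly one child;
    # with a text node plus a <span> it is None
    print("string:  ", p.string)
    # .get_text() concatenates every descendant string and is always a str
    print("get_text:", p.get_text())
```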

In that thread, another solution based on isinstance was mentioned, but I couldn't get it to work.

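If I understand correctly, the join function concatenates the strings but returns a "unicode" object rather than a string object, which is why the "replace_with" attribute doesn't work. How can I fix this? Any help is greatly appreciated.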

Using a simplified version of your code, i.e. tackling only the p tag problem, it seems you have to replace p.string with p.text:

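    from bs4 import BeautifulSoup

    soup = BeautifulSoup(your_html, 'lxml')  # your_html: the HTML from the question

    for p in soup.find_all('p'):
        # p.text joins all descendant strings, so it is never None
        print('before: ', p.text)
        i18n_string = "I18N_" + p.text
        print('after ', i18n_string)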

Output:

before:  This is some random paragraph without child tags.
after  I18N_This is some random paragraph without child tags.
before:  Delicious homebaked pizza.$8.99 pp
after  I18N_Delicious homebaked pizza.$8.99 pp
before:  Try the authentic Italian flavor of baked aubergine.$6.99 pp
after  I18N_Try the authentic Italian flavor of baked aubergine.$6.99 pp
before:  Our dessert specialty.$3.99 pp
after  I18N_Our dessert specialty.$3.99 pp
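Building on this answer, a sketch (hypothetical sample markup, "html.parser" instead of lxml) of writing the prefixed text back while keeping the p element: assigning to `p.string` replaces all of the tag's children with a single string, so no replace_with call on a plain str is needed. Note that this discards the child tags, so the price is no longer wrapped in its own element.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for the question's HTML
html = "<p>Delicious homebaked pizza.<span>$8.99 pp</span></p>"
soup = BeautifulSoup(html, "html.parser")

for p in soup.find_all("p"):
    i18n_string = "I18N_" + p.text
    # Assigning to .string clears the tag and inserts one string,
    # so the <p> element survives but its <span> child is discarded
    p.string = i18n_string

print(soup)
```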

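replace_with doesn't work, not because joined is a unicode object, but because replace_with is a method that only bs4 objects have. See this:

By the way, the join method returns str, see:

Now for a solution: I simply replace the whole p tag with the joined string:

Output:

I18N_Pizza
I18N_Eggplant Parmesan
I18N_Italian Ice Cream
I18N_This is some random paragraph without child tags.
I18N_Delicious homebaked pizza.$8.99 pp
I18N_Try the authentic Italian flavor of baked aubergine.$6.99 pp
I18N_Our dessert specialty.$3.99 pp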
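To illustrate the point about join, a small sketch with a made-up snippet: `''.join(p.strings)` produces a built-in str, which has no replace_with method, whereas the Tag object does, so the call has to go on the tag.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

html = "<p>Our dessert specialty.<span>$3.99 pp</span></p>"  # hypothetical sample
soup = BeautifulSoup(html, "html.parser")
p = soup.p

joined = "".join(p.strings)
# ''.join() returns a plain built-in str, not a bs4 object...
print(type(joined))                                    # <class 'str'>
# ...and plain strings have no replace_with method
print(hasattr(joined, "replace_with"))                 # False
# replace_with belongs to bs4 elements such as Tag
print(isinstance(p, Tag), hasattr(p, "replace_with"))  # True True

# So call replace_with on the tag, passing the new text as the replacement
p.replace_with("I18N_" + joined)
print(soup)  # I18N_Our dessert specialty.$3.99 pp
```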

This solution works. Thanks a lot for the additional information.

Thanks for your reply. I had tried "text" before, but it doesn't solve my problem of not being able to use "replace_with".

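    from bs4 import BeautifulSoup

    # Parse the source document
    with open("index.html", "r") as f:
        soup = BeautifulSoup(f, "lxml")

    # <h2> tags hold a single string, so .string works here
    for h2 in soup.find_all('h2'):
        i18n_string = "I18N_" + h2.string
        h2.string.replace_with(i18n_string)
        print(h2.string)

    # <p> tags have child tags, so .string is None; join all descendant strings instead
    for p in soup.find_all('p'):
        joined = ''.join(p.strings)
        i18n_string = "I18N_" + joined
        # replace_with is called on the Tag, replacing the whole <p> (and its children) with the string
        p.replace_with(i18n_string)
        print(i18n_string)

    # Write the result; the with block closes (and flushes) the output file
    with open("I18N_index.html", "w") as out:
        out.write(str(soup))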
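The script above removes the p elements entirely, leaving bare text in the output file. If you would rather keep the markup, one option is to prefix only the first text node inside each p. This is a sketch of an isinstance-based approach (not necessarily the one from the thread mentioned in the question), using hypothetical sample HTML and "html.parser":

```python
from bs4 import BeautifulSoup
from bs4.element import NavigableString

html = "<p>Delicious homebaked pizza.<span>$8.99 pp</span></p>"  # hypothetical sample
soup = BeautifulSoup(html, "html.parser")

for p in soup.find_all("p"):
    # Find the first text node anywhere inside the <p>
    first_text = next(
        (node for node in p.descendants if isinstance(node, NavigableString)),
        None,
    )
    if first_text is not None:
        # NavigableString.replace_with swaps just this text node,
        # leaving the <p> and its <span> child in place
        first_text.replace_with("I18N_" + first_text)

print(soup)
```

Because replace_with is called on a NavigableString (a bs4 object) rather than on a plain str, it works without touching the surrounding tags.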