BeautifulSoup: remove child tags but keep their contents

javascript python html web-scraping

I am building a web scraper, and I am having trouble with pages that are most likely generated, like the following:

    <html>
    <body>
    <div >
    <code>
    <my-component v-bind:prop1="parentValue"></my-component>
    <my-component :prop1="parentValue"></my-component>
    </code>
    </div>
    <div>
    <code>
    <my-component v-on:myEvent="parentHandler"></my-component>
    <my-component @myEvent="parentHandler"></my-component>
    </code>
    </div>
    </body>
    </html>

What I want to end up with is something like this:

    <html>
    <body>
    <div >
    <code>
    text text and more text
    </code>
    </div>
    </body>
    </html>

My attempt so far:

    from bs4 import BeautifulSoup

    bs = BeautifulSoup(payload, 'lxml')
    with open('/tmp/out.html', 'w+') as f:
        for t in bs.find_all():
            for q in t.find_all('code'):
                # print(t.text, t.next_sibling)
                f.write(q.text)

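But this does not give good results. As far as I know, the main purpose of bs is to extract elements, which is why I tried to recreate the DOM in a separate file.

Thanks

You can try the following: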

    from bs4 import BeautifulSoup

    payload = '''
    <html>
    <body>
    <div >
    <code>
    <my-component v-bind:prop1="parentValue"></my-component>
    <my-component :prop1="parentValue"></my-component>
    </code>
    </div>
    <div>
    <code>
    <my-component v-on:myEvent="parentHandler"></my-component>
    <my-component @myEvent="parentHandler"></my-component>
    </code>
    </div>
    </body>
    </html>
    '''

    soup = BeautifulSoup(payload, 'lxml')

    # Replace each <code> tag with a fresh <code> tag holding only its text
    for match in soup.find_all('code'):
        new_t = soup.new_tag('code')
        new_t.string = match.text
        match.replace_with(new_t)

    # Write the modified document back out
    with open(r'prove.html', "w") as file:
        file.write(str(soup))
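
A possible variant of the same idea, sketched with BeautifulSoup's `Tag.unwrap()`, which removes a tag but leaves its children where they were. The function name `flatten_code_tags`, the `sample` string, and the use of the built-in `html.parser` (instead of `lxml`, so that no `<html>`/`<body>` wrapper is added) are assumptions made for illustration and are not part of the answer above.

```python
from bs4 import BeautifulSoup


def flatten_code_tags(html: str, parser: str = "html.parser") -> str:
    """Return html with every tag nested inside a <code> removed, text kept."""
    soup = BeautifulSoup(html, parser)
    for code in soup.find_all("code"):
        # find_all() returns a list, so it is safe to modify the tree
        # while looping over the result.
        for child in code.find_all(True):
            # unwrap() deletes the tag itself but keeps its contents in place.
            child.unwrap()
    return str(soup)


# Hypothetical input: a <code> tag whose text is wrapped in nested tags.
sample = '<div id="a"><code class="x">let <span>v</span> = <b><i>1</i></b>;</code></div>'
result = flatten_code_tags(sample)
print(result)
```

Because the `code` tag is modified in place rather than rebuilt, it keeps its own attributes (here `class="x"`), and everything outside `code` is left untouched.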

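But do you want a single <code> tag that holds everything from the other <code> tags, or do you want to extract each one separately? And by "all the content" you mean the text of all the <p> tags, right?

@MrNobody33 Well, every tag that is not <code> should be left unmodified and saved as-is. The child tags inside a <code> tag should be removed, but their text kept. As for the second question: yes, in my experience the tags inside code are most likely <…> or <…> tags.

    for child in bs.find_all('code'):
        print(child.text, child.next_sibling)

This seems to work, but idk how to get its counterpart.

OK, I just posted an answer @saventive! Hope it works for you!
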
    <html>
    <body>
    <div>
    <code>
    <my-component v-bind:prop1="parentValue"></my-component>
    <my-component :prop1="parentValue"></my-component>
    </code>
    </div>
    <div>
    <code>
    <my-component v-on:myEvent="parentHandler"></my-component>
    <my-component @myEvent="parentHandler"></my-component>
    </code>
    </div>
    </body>
    </html>