Python 3.x: How can I combine several similar "for match in soup.find_all" loops?

I have the code below, which contains several similar loops of the form "for match in soup.find_all". Is it possible to merge them to get cleaner code?

import requests
from bs4 import BeautifulSoup

url = 'https://www.collinsdictionary.com/dictionary/french-english/aimanter'
headers = {'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:78.0) Gecko/20100101 Firefox/78.0'}
soup = BeautifulSoup(requests.get(url, headers = headers).content, 'html.parser')

entry_name = soup.h2.text

for script in soup.select('script, .hcdcrt, #ad_contentslot_1, #ad_contentslot_2'):
    script.extract()

for match in soup.find_all('div', {'class' : 'copyright'}):  
    match.extract()
    
for match in soup.find_all('div', {'class' : 'example-info'}):  
    match.extract()

for match in soup.find_all('div', {'class' : 'share-overlay'}):  
    match.extract()
    
for match in soup.find_all('div', {'class' : 'popup-overlay'}):  
    match.extract()    
    

content1 = ''.join(map(str, soup.select_one('.cB.cB-def.dictionary.biling').contents))
content2 = ''.join(map(str, soup.select_one('.cB.cB-e.dcCorpEx').contents))

format = open('aimer.html', 'w+', encoding = 'utf8')
format.write(entry_name + '\n' + str(content1) + str(content2) + '\n</>\n' )
format.close()

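You can fold the separate .find_all loops into the first .select call: a comma-separated CSS selector list matches every element that satisfies any one of its selectors.

For example: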

import requests
from bs4 import BeautifulSoup


url = 'https://www.collinsdictionary.com/dictionary/french-english/aimanter'
headers = {'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:78.0) Gecko/20100101 Firefox/78.0'}
soup = BeautifulSoup(requests.get(url, headers = headers).content, 'html.parser')

entry_name = soup.h2.text

for tag in soup.select('''
        script,
        .hcdcrt,
        #ad_contentslot_1,
        #ad_contentslot_2,
        div.copyright,
        div.example-info,
        div.share-overlay,
        div.popup-overlay'''):
    tag.extract()

content1 = ''.join(map(str, soup.select_one('.cB.cB-def.dictionary.biling').contents))
content2 = ''.join(map(str, soup.select_one('.cB.cB-e.dcCorpEx').contents))

# Use a with-block so the file is closed even if write() raises,
# and avoid shadowing the built-in format().
with open('aimer.html', 'w', encoding='utf8') as out_file:
    out_file.write(entry_name + '\n' + content1 + content2 + '\n</>\n')
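
If you would rather stay with find_all, as the question title asks, a rough equivalent is to pass a list of class names to class_, which matches a tag carrying any of them. The sketch below is only an illustration: SAMPLE_HTML, UNWANTED_CLASSES and the two helper functions are made-up names, and the markup is a small stand-in for the real page, so it runs without a network request. It uses decompose() rather than extract(), since the removed tags are not needed afterwards.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real dictionary page.
SAMPLE_HTML = """
<div id="main">
  <h2>aimanter</h2>
  <div class="copyright">Copyright notice</div>
  <div class="example-info">Example source</div>
  <div class="share-overlay">Share</div>
  <div class="popup-overlay">Popup</div>
  <div class="definition">to magnetize</div>
  <script>var x = 1;</script>
</div>
"""

# Illustrative list of the div classes the question removes one by one.
UNWANTED_CLASSES = ["copyright", "example-info", "share-overlay", "popup-overlay"]


def strip_with_find_all(html):
    """Remove unwanted tags using find_all with a list of class names."""
    soup = BeautifulSoup(html, "html.parser")
    # A list passed to class_ matches divs that carry any of the listed classes.
    for tag in soup.find_all("div", class_=UNWANTED_CLASSES):
        tag.decompose()
    for tag in soup.find_all("script"):
        tag.decompose()
    return soup


def strip_with_select(html):
    """Remove the same tags using one comma-separated CSS selector list."""
    soup = BeautifulSoup(html, "html.parser")
    selector = ", ".join(["script"] + [f"div.{name}" for name in UNWANTED_CLASSES])
    for tag in soup.select(selector):
        tag.decompose()
    return soup


if __name__ == "__main__":
    cleaned = strip_with_find_all(SAMPLE_HTML)
    # Show the text of the divs that still have a class attribute.
    print([div.get_text() for div in cleaned.find_all("div", class_=True)])
    # Both approaches should leave the same document behind.
    print(str(cleaned) == str(strip_with_select(SAMPLE_HTML)))
```

Which form to prefer is mostly a matter of taste: the selector list handles ids, tag names and classes in a single call, while the class_ list keeps everything in find_all terms.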

Your code works very well for me. Can you recommend some BeautifulSoup tutorials for beginners?

@LAD [...] is of course the first thing to read. But scraping is a huge topic: CSS selectors and/or XPath, regular expressions, JSON, HTTP... are all things to learn, and none of them is specific to Python.

@AndrejKesely beat me to it, but I had an almost identical answer. To find good tutorials you can also search YouTube, and don't rely on just one video. Mine would have been a duplicate of @AndrejKesely's answer, so I left it at that.

Thank you very much, @UWTDTV! You helped me solve several problems today.