How to find and count a specific word across multiple web pages or URLs using Python
Below is my code. Please check and correct it.
import requests
from bs4 import BeautifulSoup
url = ["https://www.tensorflow.org/","https://www.tomordonez.com/"]
the_word = input()
r = requests.get(url, allow_redirects=False)
soup = BeautifulSoup(r.content, 'lxml')
words = soup.find(text=lambda text: text and the_word in text)
print(words)
count = len(words)
print('\nUrl: {}\ncontains {} of word: {}'.format(url, count, the_word))
How can I change the code to parse multiple URLs and count how many times a specific word occurs?
import requests
from bs4 import BeautifulSoup

url_list = ["https://www.tensorflow.org/","https://www.tomordonez.com/"]
#the_word = input()
the_word = 'Python'
total_words = []

# Request each URL separately; requests.get() takes one URL string, not a list.
for url in url_list:
    r = requests.get(url, allow_redirects=False)
    # Lowercase the page so the match is case-insensitive.
    soup = BeautifulSoup(r.content.lower(), 'lxml')
    # find_all() returns every text node that contains the word.
    words = soup.find_all(text=lambda text: text and the_word.lower() in text)
    count = len(words)
    words_list = [ ele.strip() for ele in words ]
    for word in words:
        total_words.append(word.strip())
    print('\nUrl: {}\ncontains {} of word: {}'.format(url, count, the_word))
    print(words_list)

#print(total_words)
# Number of matching text nodes across all URLs.
total_count = len(total_words)
Output:
Url: https://www.tensorflow.org/
contains 0 of word: Python
[]
Url: https://www.tomordonez.com/
contains 8 of word: Python
['web scraping with python', 'this is a tutorial on web scraping with python. learn to scrape websites with python and beautifulsoup.', 'python unit testing tutorial', 'this is a tutorial about unit testing in python.', 'pip install ssl module in python is not available', 'troubleshooting ssl module in python is not available', 'python context manager', 'a short tutorial about python context manager: "with" statement.']
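Note that the count above is the number of text nodes that contain the word, not the number of times the word appears: the second node in the output contains "python" twice but is counted once. If you want every occurrence, one option is to count inside each node's text. This is only a minimal sketch; `count_occurrences` is a hypothetical helper name, and the sample strings are copied from the output above rather than fetched live.

```python
def count_occurrences(text, word):
    """Count case-insensitive substring occurrences of `word` in `text`."""
    return text.lower().count(word.lower())


# Two of the text nodes printed in the output above.
nodes = [
    "web scraping with python",
    "this is a tutorial on web scraping with python. "
    "learn to scrape websites with python and beautifulsoup.",
]

# Nodes that contain the word at least once (what len(words) reports).
matching_nodes = sum(1 for node in nodes if "python" in node)
# Every occurrence of the word across those nodes.
occurrences = sum(count_occurrences(node, "Python") for node in nodes)

print(matching_nodes, occurrences)  # 2 3
```

Like the `in` test in the answer, this matches substrings, so "pythonic" would also be counted.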
You can use the re module to find specific text:
import requests
import re
from bs4 import BeautifulSoup

urls = ["https://www.tensorflow.org/","https://www.tomordonez.com/"]
the_word = 'Tableau'

for url in urls:
    print(url)
    r = requests.get(url, allow_redirects=False)
    soup = BeautifulSoup(r.text, 'html.parser')
    words = soup.find_all(text=re.compile(the_word))
    print(len(words))
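Two things to be aware of with `re.compile(the_word)`: it is case-sensitive, and it treats the word as a regular expression, so characters like `+` or `.` act as operators. It also prints a count per URL but no overall total. A possible way to handle all three is sketched below. The helper names `build_word_pattern` and `count_by_url` are made up for illustration, and the sample dictionary stands in for page text you would normally get from something like `soup.get_text()`.

```python
import re


def build_word_pattern(word, ignore_case=True):
    """Compile a pattern that matches `word` literally, as a whole word."""
    flags = re.IGNORECASE if ignore_case else 0
    # re.escape keeps characters such as '+' or '.' from acting as operators.
    # The lookarounds reject matches glued to other word characters.
    return re.compile(r"(?<!\w)" + re.escape(word) + r"(?!\w)", flags)


def count_by_url(text_by_url, word):
    """Return ({url: count}, total) for already-downloaded page text."""
    pattern = build_word_pattern(word)
    counts = {url: len(pattern.findall(text)) for url, text in text_by_url.items()}
    return counts, sum(counts.values())


# Hypothetical page text; the URLs and sentences are placeholders.
sample = {
    "https://example.com/a": "Tableau tips. Learn tableau today.",
    "https://example.com/b": "Nothing relevant here.",
}

counts, total = count_by_url(sample, "Tableau")
print(counts, total)
```

The compiled pattern can also be passed to `find_all` in place of `re.compile(the_word)` if you only need the matching nodes.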
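What is your question? Why doesn't your code work? What is the expected output? Please edit your question to add details.
I want the count of a specific word across multiple URLs that I pass in. How do I do that?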