How do I find a specific word in an HTML page with Beautiful Soup in Python?

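I want to use BeautifulSoup to find out how many times a specific word occurs in the HTML text of a web page. I tried the findAll function, but it only looks for the word inside a specific tag such as soup.body: findAll finds the word within the body tag, whereas I want to search for it in every tag of the HTML text. Also, once I have found the word, I need to build a list of the words immediately before and after it. Can anyone help me with this? Thanks.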

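You can use the recursive keyword to search for text in the whole tree (for find_all it already defaults to True). This gives you the matching strings, which you can then process and split into individual words.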

Here is a complete example:

import bs4
import re

data = '''
<html>
<body>
<div>today is a sunny day</div>
<div>I love when it's sunny outside</div>
Call me sunny
<div>sunny is a cool word sunny</div>
</body>
</html>
'''

searched_word = 'sunny'

soup = bs4.BeautifulSoup(data, 'html.parser')
results = soup.body.find_all(string=re.compile('.*{0}.*'.format(searched_word)), recursive=True)

# print() with a single argument behaves the same on Python 2.7 and Python 3
print('Found the word "{0}" {1} times\n'.format(searched_word, len(results)))

for content in results:
    words = content.split()
    for index, word in enumerate(words):
        # If the content contains the search word twice or more, this fires for each occurrence
        if word == searched_word:
            print('Whole content: "{0}"'.format(content))
            before = None
            after = None
            # Only look back if this is not the first word
            if index != 0:
                before = words[index-1]
            # Only look ahead if this is not the last word
            if index != len(words)-1:
                after = words[index+1]
            print('\tWord before: "{0}", word after: "{1}"'.format(before, after))
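Note that the example above reports the number of matching text nodes, not the number of occurrences, and that it compares whitespace-split tokens, so a word followed by punctuation ("sunny,") would be missed. A minimal sketch of one alternative, assuming it is acceptable to treat the page as a single stream of text: the helper name word_contexts, the token pattern, and the case-insensitive comparison are illustrative choices, not part of the original answer.

```python
import re

import bs4

data = '''
<html>
<body>
<div>today is a sunny day</div>
<div>I love when it's sunny outside</div>
Call me sunny
<div>sunny is a cool word sunny</div>
</body>
</html>
'''


def word_contexts(html, word):
    """Return a (before, after) pair for every occurrence of word in the page text."""
    soup = bs4.BeautifulSoup(html, 'html.parser')
    # get_text() collects the text of every tag; the separator keeps words
    # from neighbouring tags from running together.
    text = soup.get_text(separator=' ')
    # Tokens are runs of word characters and apostrophes, so punctuation is dropped.
    tokens = re.findall(r"[\w']+", text)
    contexts = []
    for index, token in enumerate(tokens):
        # Assumption: matching should ignore case.
        if token.lower() == word.lower():
            before = tokens[index - 1] if index > 0 else None
            after = tokens[index + 1] if index < len(tokens) - 1 else None
            contexts.append((before, after))
    return contexts


contexts = word_contexts(data, 'sunny')
print('Found the word "sunny" {0} times'.format(len(contexts)))
for before, after in contexts:
    print('\tWord before: "{0}", word after: "{1}"'.format(before, after))
```

For the sample data this finds 5 occurrences. Because all text is joined first, the neighbouring word can come from an adjacent tag, which differs from the per-node behaviour of the example above.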

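Comments:

Possible duplicate.

No, it is not a duplicate, I checked.

results = soup.body.find_all(string=searched_word, recursive=true) gives NameError: name 'true' is not defined. I have downloaded version 4.3.

I updated the answer with a complete working example, please check again.

I get: Found the word "sunny" 0 times. Are you using Python 2.7.3? I just copy-pasted the example code.

It seems the string keyword was added in version 4.4, so either use that version or change soup.body.find_all(string=…) to soup.body.find_all(text=…) (the keyword had a different name in 4.3 and earlier).
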
Output of the example:

Found the word "sunny" 4 times

Whole content: "today is a sunny day"
    Word before: "a", word after: "day"
Whole content: "I love when it's sunny outside"
    Word before: "it's", word after: "outside"
Whole content: "
Call me sunny
"
    Word before: "me", word after: "None"
Whole content: "sunny is a cool word sunny"
    Word before: "None", word after: "is"
Whole content: "sunny is a cool word sunny"
    Word before: "word", word after: "None"
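
The comment about the string keyword suggests choosing between string= and text= depending on the installed Beautiful Soup version. A minimal sketch of doing that at run time, assuming bs4.__version__ begins with "major.minor" (for example "4.3.2"); the helper name text_keyword and the tiny sample document are hypothetical and only for illustration.

```python
import re

import bs4


def text_keyword():
    """Return the find_all() keyword that matches text nodes in the installed bs4."""
    # Assumption: the version string starts with "major.minor".
    major, minor = (int(part) for part in bs4.__version__.split('.')[:2])
    # string= was added in 4.4; earlier releases only accept text=.
    return 'string' if (major, minor) >= (4, 4) else 'text'


soup = bs4.BeautifulSoup('<body><div>a sunny day</div>sunny</body>', 'html.parser')
pattern = re.compile(re.escape('sunny'))
# Pass the keyword by name so the same call works on both sides of 4.4.
results = soup.find_all(**{text_keyword(): pattern})
print(len(results))
```

Calling find_all on soup itself rather than soup.body also searches text outside the body tag, which is what the question asks for.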