Python Beautiful Soup: 'NoneType' object has no attribute 'lower'

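So the error I'm getting is:

'NoneType' object has no attribute 'lower'

The problem is that it worked before I created the second method, but now it's being temperamental. I just started using PyCharm, so I'm very new to this.

Here is my code: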

import requests
import sys
from bs4 import BeautifulSoup
import operator

def start(url):
    word_list = []
    source_code = requests.get(url).text
    soup = BeautifulSoup(source_code, 'html.parser')
    for post_text in soup.find_all('p'):
        content = post_text.string
        words = content.lower().split()
        for word in words:
            word_list.append(word)
    clean_up_list(word_list)

def clean_up_list(word_list):
    clean_word_list = []
    for word in word_list:
        accepted = "abcdefghijklmnopqrstuvwxyz\'"
        for c in list(word):
            if c not in list(accepted):
                word = word.replace(c, "")
        if len(word) > 0:
            print(word)
            clean_word_list.append(word)  # append to the local list, not to clean_up_list()


start('http://www.nameofwebsite.com/')

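This is because post_text.string returned None. One of the <p> tags does not contain exactly one piece of text: it is either empty or holds more than one child (for example several nested tags). In that case .string is None.

So when you do

words = content.lower().split()

you are actually calling .lower() on None, which has no lower attribute.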
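To see when .string comes back as None, here is a small sketch run over a few hand-written <p> tags. The HTML snippets and labels are made up for illustration. It also reproduces the exact error from the question by calling .lower() on the None returned for an empty tag.

```python
from bs4 import BeautifulSoup

# Hand-written sample tags, for illustration only
samples = {
    'single string': '<p>Hello</p>',
    'single nested tag': '<p><b>Hello</b></p>',
    'empty tag': '<p></p>',
    'several children': '<p><b>Hello</b> world</p>',
}

results = {}
for label, html in samples.items():
    p = BeautifulSoup(html, 'html.parser').p
    # .string is None unless the tag boils down to exactly one string
    # (possibly wrapped in a single nested tag)
    results[label] = p.string
    print(label, '->', repr(p.string))

# Calling .lower() on the None from an empty <p> raises the question's error
empty = BeautifulSoup('<p></p>', 'html.parser').p
try:
    empty.string.lower()
except AttributeError as exc:
    print(exc)  # 'NoneType' object has no attribute 'lower'
```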
What you can do is add an if statement that skips those tags:

for post_text in soup.find_all('p'):
    content = post_text.string
    if content is None:  # this <p> has no single string, so skip it
        continue
    words = content.lower().split()
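Here is a minimal sketch of the question's two functions with the None check applied. It assumes the page HTML is passed in as a string instead of being fetched with requests.get, so it runs offline. It also has clean_up_list build and return clean_word_list. The return values are an illustrative change, not part of the original code.

```python
from bs4 import BeautifulSoup

# Characters clean_up_list keeps (lowercase letters and apostrophe), as in the question
ACCEPTED = set("abcdefghijklmnopqrstuvwxyz'")


def start(html):
    """Collect lower-cased words from every <p> that holds a single string.

    Assumption: takes HTML text rather than a URL, so no network access is needed.
    """
    word_list = []
    soup = BeautifulSoup(html, 'html.parser')
    for post_text in soup.find_all('p'):
        content = post_text.string
        if content is None:  # empty <p> or several children: nothing to split
            continue
        word_list.extend(content.lower().split())
    return clean_up_list(word_list)


def clean_up_list(word_list):
    """Remove every character not in ACCEPTED and drop words that end up empty."""
    clean_word_list = []
    for word in word_list:
        word = ''.join(c for c in word if c in ACCEPTED)
        if word:
            clean_word_list.append(word)  # collect into the local list
    return clean_word_list


print(start('<p>Hello, World!</p><p></p><p><b>a</b> b</p>'))  # ['hello', 'world']
```

The second and third <p> tags are skipped because their .string is None. Only the first paragraph contributes words.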

Here is an example where .string returns None, which is what triggers the error:
from bs4 import BeautifulSoup

soup = BeautifulSoup(
    '<p><div>hello</div><div>world</div></p>',
    'html.parser'
)

for p in soup.find_all('p'):
    print(repr(p.string))

--output:--
None

Alternatively, you can iterate over .strings:
from bs4 import BeautifulSoup

soup = BeautifulSoup(
    '<p><div>hello</div><div>world</div></p>',
    'html.parser'
)

for p in soup.find_all('p'):
    for string in p.strings:
        print(string)

--output:--
hello
world
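With the single-line HTML above, every string contains text. On indented HTML, .strings also yields the newlines and indentation between the tags as separate whitespace-only strings. This sketch uses the same markup spread over several lines (the sample HTML is illustrative) and filters those strings out by hand.

```python
from bs4 import BeautifulSoup

html = '''
<p>
  <div>hello</div>
  <div>world</div>
</p>
'''
p = BeautifulSoup(html, 'html.parser').p

# .strings yields every text node, including the whitespace between the tags
all_strings = list(p.strings)
print(all_strings)

# Keep only strings that contain something besides whitespace
text_only = [s for s in all_strings if s.strip()]
print(text_only)
```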

To skip the whitespace-only strings, you can use .stripped_strings:
from bs4 import BeautifulSoup

soup = BeautifulSoup(
'''
<p>
  <div>hello</div>
  <div>world</div>
</p>
''',
    'html.parser'
)

for p in soup.find_all('p'):
    for string in p.stripped_strings:
        print(string)

--output:--
hello
world
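Applied to the question's word-collecting loop, .stripped_strings also sidesteps the None problem. An empty <p> simply yields nothing, and a <p> with nested tags contributes the text of all of them instead of being skipped. This is a minimal sketch: collect_words is a hypothetical helper name, and the sample HTML is made up.

```python
from bs4 import BeautifulSoup


def collect_words(html):
    """Hypothetical helper: lower-cased words from every <p>, nested tags included."""
    soup = BeautifulSoup(html, 'html.parser')
    word_list = []
    for p in soup.find_all('p'):
        # .stripped_strings never yields None; an empty <p> just yields nothing,
        # so no None check is needed here
        for string in p.stripped_strings:
            word_list.extend(string.lower().split())
    return word_list


html = '<p><div>Hello</div><div>World</div></p><p></p><p>Bye now</p>'
print(collect_words(html))  # ['hello', 'world', 'bye', 'now']
```

Compared with the continue check, this keeps the words from paragraphs whose .string is None rather than dropping them.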

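@user3667111 continue will skip that particular p tag, or more precisely, that particular iteration of the loop. The loop then carries on with the next tag.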
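Here is a small sketch of that behaviour. The sample HTML is made up. Only the middle <p>, whose .string is None, is skipped, and the loop still reaches the third tag.

```python
from bs4 import BeautifulSoup

html = '<p>first</p><p><b>x</b> y</p><p>third</p>'
soup = BeautifulSoup(html, 'html.parser')

processed = []
for p in soup.find_all('p'):
    content = p.string
    if content is None:
        continue  # skips only this <p>; the loop moves on to the next one
    processed.append(str(content))

print(processed)  # ['first', 'third']
```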