Why can't I get the string when scraping with Python?

python web-scraping

This is my code. I want to scrape a list of words from a website, but it fails when I call .string:

import requests
from bs4 import BeautifulSoup

url = "https://www.merriam-webster.com/browse/thesaurus/a"
source_code = requests.get(url)
plain_text = source_code.text
soup = BeautifulSoup(plain_text, "html.parser")
entry_view = soup.find_all('div', {'class': 'entries'})
view = entry_view[0]
list = view.ul

for m in list:
    for x in m:
        title = x.string
        print(title)

What I expect is a printed list of the text from the website, but instead I get an error:

Traceback (most recent call last):
  File "/home/vidu/PycharmProjects/untitled/hello.py", line 14, in <module>
    title = x.string
AttributeError: 'str' object has no attribute 'string'
Error in sys.excepthook:
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/apport_python_hook.py", line 63, in apport_excepthook
    from apport.fileutils import likely_packaged, get_recent_crashes
  File "/usr/lib/python3/dist-packages/apport/__init__.py", line 5, in <module>
    from apport.report import Report
  File "/usr/lib/python3/dist-packages/apport/report.py", line 30, in <module>
    import apport.fileutils
  File "/usr/lib/python3/dist-packages/apport/fileutils.py", line 23, in <module>
    from apport.packaging_impl import impl as packaging
  File "/usr/lib/python3/dist-packages/apport/packaging_impl.py", line 23, in <module>
    import apt
  File "/usr/lib/python3/dist-packages/apt/__init__.py", line 23, in <module>
    import apt_pkg
ModuleNotFoundError: No module named 'apt_pkg'

Original exception was:
Traceback (most recent call last):
  File "/home/vidu/PycharmProjects/untitled/hello.py", line 14, in <module>
    title = x.string
AttributeError: 'str' object has no attribute 'string'

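AttributeError: 'str' object has no attribute 'string'

This means the object is already a string. Try removing .string, and it should work.

Note also that Python's built-in string type is called str, not string.

Another thing to note is that you can convert a value with

title = str(x)

but since x is already a string in this case, the conversion is redundant.

To quote:

Python has a built-in string class named "str" with many handy features (there is an older module named "string" which you should not use).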
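As a quick check of the points above, here is a minimal sketch in plain Python, with no scraping involved. The helper name attribute_error_message is made up for illustration; it only reproduces the failed attribute lookup on an ordinary str.

```python
def attribute_error_message(value):
    """Try to read value.string and return the error text if that fails."""
    try:
        value.string
    except AttributeError as exc:
        return str(exc)
    return None


word = "aback"

# A plain str has no attribute named "string", so the lookup raises.
print(attribute_error_message(word))

# Calling str() on something that is already a str changes nothing.
print(str(word) == word)

# Iterating over a str yields its characters, each one a str itself.
print(list(word))
```

So if the inner loop runs over a piece of text, every x is a single character, and there is nothing for .string to return.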

You can achieve what you want with the code below.

Code:

import requests
from bs4 import BeautifulSoup

url = "https://www.merriam-webster.com/browse/thesaurus/a"
html_source = requests.get(url).text
soup = BeautifulSoup(html_source, "html.parser")

entry_view = soup.find_all('div', {'class': 'entries'})

entries = []
for elem in entry_view:
    for e in elem.find_all('a'):
        entries.append(e.text)

# Show the first five entries, the last five entries, and the total count
print(entries[:5])
print(entries[-5:])
print(len(entries))

Output:

['A1', 'aback', 'abaft', 'abandon', 'abandoned']
['absorbing', 'absorption', 'abstainer', 'abstain from', 'abstemious']
100

In your code:

print(type(list))
<class 'bs4.element.Tag'>

print(type(m))
<class 'bs4.element.NavigableString'>

print(type(x))
<class 'str'>

As you can see, the variable x is already a string, so calling .string on it does not work.
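To see where the plain str comes from, here is a small sketch that inspects the same kind of structure on an inline HTML snippet instead of the live page. The markup and the names SAMPLE_HTML, child_type_names and link_texts are made up for illustration; the real page structure may differ.

```python
from bs4 import BeautifulSoup

# Made-up markup that mimics the assumed shape of the entries block.
SAMPLE_HTML = """<div class="entries"><ul>
<li><a href="/thesaurus/aback">aback</a></li>
<li><a href="/thesaurus/abaft">abaft</a></li>
</ul></div>"""


def child_type_names(markup):
    """Return the type name of each direct child of the first <ul>."""
    soup = BeautifulSoup(markup, "html.parser")
    ul = soup.find("div", class_="entries").ul
    return [type(child).__name__ for child in ul]


def link_texts(markup):
    """Collect the text of every <a> inside each div with class "entries"."""
    soup = BeautifulSoup(markup, "html.parser")
    return [
        a.get_text(strip=True)
        for div in soup.find_all("div", class_="entries")
        for a in div.find_all("a")
    ]


if __name__ == "__main__":
    # The newlines between the <li> tags are NavigableString children of <ul>.
    print(child_type_names(SAMPLE_HTML))
    # Iterating over a NavigableString walks its characters as plain str.
    text_node = BeautifulSoup("<p>ab</p>", "html.parser").p.string
    print([type(ch).__name__ for ch in text_node])
    print(link_texts(SAMPLE_HTML))
```

The whitespace between tags counts as a child too, which is why looping over the ul and then over each child ends up iterating over the characters of a text node. Asking for the a tags directly with find_all avoids that.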

P.S.: You shouldn't use a variable name like list. It is not a reserved keyword, but it shadows Python's built-in list type.
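A short sketch of what goes wrong with that name, using only the standard library. The function shadowing_demo is a made-up helper; it rebinds list locally so that the built-in stays intact everywhere else.

```python
import builtins
import keyword


def shadowing_demo():
    """Rebind the name list locally, then try to use it as the built-in type."""
    list = "some scraped value"  # hides the built-in inside this function
    try:
        list("abc")
    except TypeError as exc:
        return str(exc)
    return None


# list is an ordinary built-in name, not a keyword, so assigning to it is legal.
print(keyword.iskeyword("list"))
print(hasattr(builtins, "list"))

# After the rebinding, calling list(...) fails because it now refers to a str.
print(shadowing_demo())
```

Python will not stop you from reusing the name, so a name such as items or word_list is the safer choice.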

Have you tried writing title = x, without the .string?

Yes, but that gives the full output, whereas what I want is just "abstain".

But it should give us a link rather than a string.

In that case, you should edit your question to state your request properly: you want to extract a word from a larger string. For that, I think a regular expression is the best tool.

I can do that, but I'm curious why, when I try this without the loop, it gives the correct result, though only for the first list.

string is an old module that you shouldn't use, because str is far better. Explaining why is a whole different question.