Python: How to collect text data from a URL link with or without ".html" in the link?


I am trying to collect some text data from a URL such as the one used in my code below.

I want to get the following text data from the HTML:

 1.1. Linear Models¶
 The following are a set of methods intended for regression in which the target value is 
 expected to be a linear combination of the features. In mathematical notation, if 
 is the predicted value.
My code:

import urllib.request
from bs4 import BeautifulSoup
link = "https://scikit-learn.org/stable/modules/linear_model.html"
f = urllib.request.urlopen(link)
html = f.read()
soup = BeautifulSoup(html)
print(soup.prettify()) 
How do I navigate into the body of the HTML to get the text data shown above?

In addition, I need to do something similar for links that do not end in ".html". I use the same code, but no text data is returned from those links.

When I print the result with the following, I do not see any text data:

 print(soup.prettify())
The returned status is

  200
What could be the reason?


Thanks

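When creating the BeautifulSoup object, you should specify which parser to use; if you leave it out, bs4 picks one itself and emits a warning. Besides that, I would also recommend using requests instead of urllib, but that is entirely up to you. Here is how to extract the text you want: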

div = soup.find('div', class_="section")  # Finds the first div with class "section"

print(div.h1.text)  # Prints the text of the first h1 tag within the div

print(div.p.text)  # Prints the text of the first p tag within the div
Output:

1.1. Linear Models¶
The following are a set of methods intended for regression in which
the target value is expected to be a linear combination of the features.
In mathematical notation, if \(\hat{y}\) is the predicted
value.
Here is the complete code:

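import urllib.request  # "import urllib" alone does not load the request submodule
from bs4 import BeautifulSoup

link = "https://scikit-learn.org/stable/modules/linear_model.html"
f = urllib.request.urlopen(link)
html = f.read()
# The html5lib parser must be installed separately (pip install html5lib)
soup = BeautifulSoup(html, 'html5lib')

# Find the first div with class "section"
div = soup.find('div', class_="section")

print(div.h1.text)  # Text of the first h1 tag within the div

print(div.p.text)  # Text of the first p tag within the div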
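
Since the answer suggests requests as an alternative to urllib, here is a minimal sketch of the same extraction with requests. The helper names `fetch_html` and `first_heading_and_paragraph` are made up for illustration, and the built-in "html.parser" is swapped in for html5lib only to avoid an extra install. The sketch also assumes the page still wraps its content in a div with class "section"; if the layout differs, the function returns None instead of raising.

```python
import requests
from bs4 import BeautifulSoup


def first_heading_and_paragraph(html):
    """Return (h1 text, first p text) from the first div.section, or None."""
    # "html.parser" ships with Python; the answer above uses html5lib instead.
    soup = BeautifulSoup(html, "html.parser")
    div = soup.find("div", class_="section")
    if div is None or div.h1 is None or div.p is None:
        return None  # the page layout differs from what is assumed here
    return div.h1.get_text(strip=True), div.p.get_text(strip=True)


def fetch_html(url):
    """Download a page and return its HTML text, raising on HTTP errors."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    return response.text


if __name__ == "__main__":
    page = fetch_html("https://scikit-learn.org/stable/modules/linear_model.html")
    print(first_heading_and_paragraph(page))
```

Keeping the download separate from the parsing makes it possible to try the parsing step on a saved or hand-written HTML string without touching the network.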

Can you share the URL of that link? The data may be loaded via JavaScript, in which case BeautifulSoup will not see it.
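
One rough way to check the JavaScript hypothesis from the comment above is to measure how much visible text the downloaded HTML actually contains. Whether a URL ends in ".html" does not by itself change how the page is parsed; what matters is what the server sends back. The sketch below is only a heuristic: the function names and the `min_chars` threshold of 200 are arbitrary choices for illustration, not part of any library.

```python
from bs4 import BeautifulSoup


def visible_text(html):
    """Return the page text with script, style and noscript content removed."""
    soup = BeautifulSoup(html, "html.parser")
    for tag in soup(["script", "style", "noscript"]):
        tag.decompose()  # drop the tag and everything inside it
    return soup.get_text(separator=" ", strip=True)


def looks_script_rendered(html, min_chars=200):
    """Heuristic: very little visible text hints at client-side rendering."""
    return len(visible_text(html)) < min_chars
```

If `looks_script_rendered` returns True for a page that clearly shows text in a browser, the content is probably filled in by JavaScript after loading, and a status of 200 with an almost empty body would be consistent with that.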