Python: Why can't my code scrape anything from an HTML class?

python html web-scraping scrapy

In the browser's inspect view I can see that the content sits inside the class article-wrap, as highlighted in the screenshot:

But when I try to scrape the text content inside it, I get nothing:

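Why is that? Did I specify the class incorrectly? If so, which class do I need to specify? What is the easiest way to find out which class (or tag, or div, etc.) to specify?
The code is as follows: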
import requests
from bs4 import BeautifulSoup

links = open("article links.txt", "r")

for a in links:
    page = requests.get(a)
    soup = BeautifulSoup(page.text, 'lxml')

    html = soup.find(class_="article-wrap")

    print(html)

It is much easier to use BeautifulSoup's .select() method, which takes CSS-style selectors, like this:
import requests
from bs4 import BeautifulSoup
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.86 Safari/537.36'}

links = open("article links.txt", "r")

for a in links:
    r = requests.get(a, headers=headers, timeout=10)

    soup = BeautifulSoup(r.text, 'html.parser')

    results = soup.select(".article-wrap")

    print(results)

I tested it on my computer and it works fine.
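As a small illustration of the difference between find() and select(), here is a sketch that runs against an inline HTML string instead of a downloaded page. The markup in SAMPLE_HTML and the class name no-such-class are made up for the example; only article-wrap comes from the question.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for a downloaded article page.
SAMPLE_HTML = """
<html><body>
  <div class="article-wrap"><p>First paragraph.</p><p>Second paragraph.</p></div>
  <div class="sidebar">Unrelated</div>
</body></html>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# find() returns the first matching element, or None when nothing matches.
first_match = soup.find(class_="article-wrap")

# select() takes a CSS selector and always returns a list (possibly empty).
matches = soup.select(".article-wrap")
missing = soup.select(".no-such-class")

# Join the text of the nested tags with single spaces.
article_text = matches[0].get_text(" ", strip=True)
print(article_text)
```

An empty list from select(), or None from find(), means the class is not present in the HTML that was actually downloaded, which can differ from what the browser's inspect view shows.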
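In the end it turned out to be two things.
First, in article links.txt every link is on its own line (i.e. it ends with \n), so I had to run a = a.rstrip() before page = requests.get(a). Without the strip, the request fails and the page I get back looks like this: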
>>> page = requests.get('http://www3.asiainsurancereview.com//Mock-News-Article/id/42945/Type/eDaily/New-Zealand-Govt-starts-public-consultation-phase-of-review-of-insurance-law\n')
>>> page
<Response [400]>
>>> page.text
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN""http://www.w3.org/TR/html4/strict.dtd">
<HTML><HEAD><TITLE>Bad Request</TITLE>
<META HTTP-EQUIV="Content-Type" Content="text/html; charset=us-ascii"></HEAD>
<BODY><h2>Bad Request - Invalid URL</h2>
<hr><p>HTTP Error 400. The request URL is invalid.</p>
</BODY></HTML>

Now it works fine.
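The newline problem can be handled once, when the file is read. The sketch below is one way to do it, not the only one: read_links is a hypothetical helper, io.StringIO stands in for open("article links.txt"), and the example.com URLs are placeholders.

```python
import io


def read_links(lines):
    """Yield URLs from an iterable of lines, dropping newlines and blank lines."""
    for line in lines:
        url = line.strip()  # removes the trailing "\n" and stray spaces
        if url:             # skip empty lines, e.g. a blank last line
            yield url


# io.StringIO stands in for the real links file; the URLs are placeholders.
fake_file = io.StringIO("http://example.com/a\n\nhttp://example.com/b\n")
links = list(read_links(fake_file))
print(links)
```

In the real loop, each cleaned URL would go to requests.get(url). Checking page.status_code (or calling page.raise_for_status()) before parsing makes a response like the 400 above visible immediately.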
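I tested your code and it works fine. Could you print out the links inside the loop to check that they are valid and actually point to pages with an article-wrap div?
I only tested with a link to a single article. Can you give me a URL?
@KristaDA63 For example, in the page you gave me the class is "mag-article-wrap".
@KristaDA63 Super, consider writing your own answer and accepting it :)
@KristaDA63 Glad to hear it works for you: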
html = soup.find(class_='article-wrap')
if html is None:
    # Some pages use a different wrapper class.
    html = soup.find(class_='mag-article-wrap')

text = html.get_text()
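The same fallback can be written as a loop over candidate class names, which also covers the case where neither class is present. This is only a sketch: extract_article_text and its class_names parameter are hypothetical, and the HTML strings in the usage lines are made up.

```python
from bs4 import BeautifulSoup


def extract_article_text(html, class_names=("article-wrap", "mag-article-wrap")):
    """Return the text of the first element matching a candidate class, else None."""
    soup = BeautifulSoup(html, "html.parser")
    for name in class_names:
        node = soup.find(class_=name)  # matches the class name exactly
        if node is not None:
            return node.get_text(" ", strip=True)
    return None  # none of the candidate classes is present


# Made-up pages: one uses the magazine wrapper, one has neither class.
magazine_text = extract_article_text(
    '<div class="mag-article-wrap"><p>Magazine text</p></div>'
)
no_match = extract_article_text('<div class="other">x</div>')
print(magazine_text, no_match)
```

Returning None instead of raising lets the calling loop log the URL and move on when a page uses yet another wrapper class.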