Python: Why am I getting no output and no error when scraping a web page?

python web-scraping google-colaboratory

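I am working on a web-scraping assignment on Google Colab using BeautifulSoup and requests. Here I am just scraping the headlines from Google News. The code is as follows: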

import requests
from bs4 import BeautifulSoup

def beautiful_soup(url):
    '''DEFINING THE FUNCTION HERE THAT SENDS A REQUEST AND PRETTIFIES THE TEXT 
       INTO SOMETHING THAT IS EASY TO READ'''

    request = requests.get(url)
    soup = BeautifulSoup(request.text, "lxml")
    print(soup.prettify())

beautiful_soup('https://news.google.com/?hl=en-IN&gl=IN&ceid=IN:en')

for headlines in soup.find_all('a', {'class': 'VDXfz'}):
   print(headlines.text)

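The problem is that when I run the cell, it shows neither the output (the list of headlines) nor an error. Please help, this has been bothering me for two days.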

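You probably need to display the text of the next span element. Your function also needs to return soup so that it can be used outside the function. This can be done as follows: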

import requests
from bs4 import BeautifulSoup

def beautiful_soup(url):
    '''DEFINING THE FUNCTION HERE THAT SENDS A REQUEST AND PRETTIFIES THE TEXT 
       INTO SOMETHING THAT IS EASY TO READ'''

    request = requests.get(url)
    soup = BeautifulSoup(request.text, "lxml")
    #print(soup.prettify())
    return soup

soup = beautiful_soup('https://news.google.com/?hl=en-IN&gl=IN&ceid=IN:en')

for headlines in soup.find_all('a', {'class': 'VDXfz'}):
    print(headlines.find_next('span').text)
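Why the original loop printed nothing can be illustrated offline. The following is a minimal sketch, not the real page: SAMPLE_HTML is hypothetical markup that assumes each matched link is an empty a element followed by the headline inside a span, which is what the find_next('span') fix suggests. It uses the built-in "html.parser" so that lxml is not required.

```python
from bs4 import BeautifulSoup

# Hypothetical markup (an assumption, not copied from Google News): an empty
# <a class="VDXfz"> link followed by the headline inside <h3><a><span>.
SAMPLE_HTML = """
<article>
  <a class="VDXfz" href="./articles/1"></a>
  <h3><a href="./articles/1"><span>First headline</span></a></h3>
</article>
<article>
  <a class="VDXfz" href="./articles/2"></a>
  <h3><a href="./articles/2"><span>Second headline</span></a></h3>
</article>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
links = soup.find_all("a", {"class": "VDXfz"})

# The matched links contain no text of their own, so .text is an empty string.
link_texts = [link.text for link in links]

# find_next('span') moves forward in document order to the next <span>.
headlines = [link.find_next("span").text for link in links]

print(link_texts)   # ['', '']
print(headlines)    # ['First headline', 'Second headline']
```

Printing an empty string produces a blank line rather than an error, which would match the "no output and no error" symptom if the real markup is shaped like this.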

This will give you output such as:

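"I take back my comment," says Ram Madhav after Omar Abdullah dares him to prove Pakistan charge
Ram Madhav backtracks on "instructions from Pakistan" after Omar Abdullah's dare
Omar Abdullah: National Conference backed PDP to rescue J&K from uncertainty
Omar Abdullah's scathing reply to Ram Madhav's "instructions" barb
Make reports of horse-trading in J-K government formation public: Omar Abdullah to Guv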

You can write the headlines to a CSV file as follows:

import requests
from bs4 import BeautifulSoup
import csv

def beautiful_soup(url):
    '''DEFINING THE FUNCTION HERE THAT SENDS A REQUEST AND PRETTIFIES THE TEXT 
       INTO SOMETHING THAT IS EASY TO READ'''

    request = requests.get(url)
    soup = BeautifulSoup(request.text, "lxml")
    return soup

soup = beautiful_soup('https://news.google.com/?hl=en-IN&gl=IN&ceid=IN:en')

with open('output.csv', 'w', newline='', encoding='utf-8') as f_output:
    csv_output = csv.writer(f_output)
    csv_output.writerow(['Headline'])

    for headlines in soup.find_all('a', {'class': 'VDXfz'}):
        headline = headlines.find_next('span').text
        print(headline)
        csv_output.writerow([headline])

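Currently, this produces only a single column named Headline.

Run the following script and you should get the desired result. The script is cleaner if you use selectors, although it can also be written with .find_all(). To do the same with .select(), make this change in the script: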

headlines = [item.text for item in soup.select("h3 > a > span")]
return headlines
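As a fuller sketch of the .select() variant, the function below parses an HTML string and returns the list of headlines. The name extract_headlines and the sample markup are hypothetical; unlike the script in the answer, this version takes HTML text instead of a URL so that it can be tried without a network request, and it uses "html.parser" instead of lxml.

```python
from bs4 import BeautifulSoup


def extract_headlines(html):
    """Return the text of every <span> that is a direct child of an <a>
    which is itself a direct child of an <h3>."""
    soup = BeautifulSoup(html, "html.parser")
    return [item.text for item in soup.select("h3 > a > span")]


# Hypothetical markup for illustration only.
sample = """
<h3><a href="./articles/1"><span>First headline</span></a></h3>
<div><span>Not a headline</span></div>
<h3><a href="./articles/2"><span>Second headline</span></a></h3>
"""

headlines = extract_headlines(sample)
print(headlines)  # ['First headline', 'Second headline']
```

With a live page, the same function could be called as extract_headlines(requests.get(url).text), assuming the page still uses an h3 > a > span structure for its headlines.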

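How do I convert this list to CSV?

Which columns do you have? Currently this is just one column. This was tested on my local PC, so the file is saved in the current folder. I can't say where Google Colab would save it. I think you need to take a look at files.download().

I wrote files.download('output.csv'). It downloaded the output CSV as many times as there are headlines, and the files contain no data. Every file is 0 KB.

I tested it on my local PC and got the output!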
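The symptom described in the comments (one empty download per headline) would be consistent with files.download() being called inside the loop, before the file has been closed, although the thread does not confirm where the call was placed. A minimal sketch of one way to avoid that is to finish writing first and trigger the download once afterwards. The helper names save_headlines and download_in_colab are hypothetical; files.download comes from the google.colab module, which is only importable inside Colab.

```python
import csv


def save_headlines(headlines, path="output.csv"):
    """Write one headline per row under a single 'Headline' column."""
    with open(path, "w", newline="", encoding="utf-8") as f_output:
        csv_output = csv.writer(f_output)
        csv_output.writerow(["Headline"])
        for headline in headlines:
            csv_output.writerow([headline])
    # Leaving the with-block closes the file, so all rows are on disk here.
    return path


def download_in_colab(path):
    """Trigger a single browser download when running inside Google Colab."""
    try:
        from google.colab import files  # only importable inside Colab
    except ImportError:
        print(f"Not running in Colab; file saved at {path}")
        return False
    files.download(path)
    return True


if __name__ == "__main__":
    # Placeholder headlines; in the script above these come from the soup.
    saved = save_headlines(["Example headline one", "Example headline two"])
    download_in_colab(saved)
```

Keeping the download outside the with-block means the file is complete before it is sent to the browser, and the call runs exactly once.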
