Python: reading information from reddit with urllib


I have the following code:

import urllib
import re

def worldnews():
    count = 0
    html = urllib.urlopen("https://www.reddit.com/r/worldnews/").readlines()

    lines = html
    for line in lines:
        if "Paris" or "Putin" in line:
            count = count + 1
            print line       

    print "Totaal gevonden: ", count
    print "----------------------"

worldnews()

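How can I find all reddit posts on that page that have Paris or Putin in the title? Is there a way to print the titles of those posts to the console? When I run it now, I get a lot of HTML code.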

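The best way to work with HTML in Python is BeautifulSoup. You will need to download it and look through its documentation to see exactly how to do what you are asking. However, here is something to get you started: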

import urllib
from bs4 import BeautifulSoup

def worldnews():
    count = 0
    html = urllib.urlopen("https://www.reddit.com/r/worldnews/")
    soup = BeautifulSoup(html,"lxml")
    titles = soup.find_all('p',{'class':'title'})
    for i in titles:
        print(i.text)

worldnews()

Running this gives output like the following:

Paris attacks ringleader dead - French officials (bbc.com)
Company which raised price of AIDS drug by 5500% reports $14m quarterly losses. (pinknews.co.uk)
Syria/IraqSyrian man kills judge at ISIS Sharia Court for beheading his brother (en.abna24.com)
Putin Puts $50 Million Bounty on Heads of Metrojet Bombers (fortune.com)

And so on for all the titles on the page. From here, you should be able to work out how to write the rest of the code fairly easily. :-)
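As a rough sketch of what "the rest of the code" might look like, the titles can be collected into a list and filtered by keyword. The names filter_titles and sample_titles are made up for this illustration, and the sample list is simply the four titles from the output above, so the example runs without a network connection. In the real script the list would presumably be built from the soup.find_all result instead.

```python
def filter_titles(titles, keywords):
    """Return the titles that contain at least one of the keywords.

    Hypothetical helper for illustration; the match is a case-sensitive
    substring test, so "Paris" does not match "paris".
    """
    return [title for title in titles
            if any(word in title for word in keywords)]


# Sample data copied from the output shown earlier. In the real script this
# list would be built from the parsed page, for example:
#     titles = [i.text for i in soup.find_all('p', {'class': 'title'})]
sample_titles = [
    "Paris attacks ringleader dead - French officials (bbc.com)",
    "Company which raised price of AIDS drug by 5500% reports $14m quarterly losses. (pinknews.co.uk)",
    "Syria/IraqSyrian man kills judge at ISIS Sharia Court for beheading his brother (en.abna24.com)",
    "Putin Puts $50 Million Bounty on Heads of Metrojet Bombers (fortune.com)",
]

matches = filter_titles(sample_titles, ("Paris", "Putin"))
for title in matches:
    print(title)
print("Total found: %d" % len(matches))
```

Because the filter works on plain strings, it does not matter how the titles were obtained, and more keywords can be added to the tuple without touching the function.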

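Note that the line

        if "Paris" or "Putin" in line:

will always evaluate as true, which is why you get so much HTML code. Python parses it as "Paris" or ("Putin" in line), and a non-empty string such as "Paris" is always truthy. As mentioned above, use BeautifulSoup or another HTML parsing library.

Thanks a lot! This will help me out.

No problem. If you need help, let me know!

What is the best way to search through the results returned by your script? When I search on title, I can't find anything. I think I have to search for "href", but how? :( (Sorry, I'm new to Python.)

You could try adding them all to a list and then searching the list for "Paris" or "Putin".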
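The always-true condition from the question can be checked in isolation. The snippet below is only a small demonstration: the variable names (line, buggy, fixed, fixed_any, keywords) are invented for the example, and the test string is deliberately one that contains neither keyword.

```python
# A line that mentions neither keyword.
line = "<div>nothing relevant here</div>"

# The original condition is parsed as: "Paris" or ("Putin" in line).
# "Paris" is a non-empty string and therefore truthy, so the expression
# is truthy no matter what the line contains.
buggy = bool("Paris" or "Putin" in line)

# Testing each keyword against the line separately gives the intended result.
fixed = "Paris" in line or "Putin" in line

# The same check written so that it scales to a longer keyword list.
keywords = ("Paris", "Putin")
fixed_any = any(word in line for word in keywords)

print("buggy condition: %s" % buggy)
print("fixed condition: %s" % fixed)
print("any() version:   %s" % fixed_any)
```

Either corrected form can replace the condition in the original loop, although matching against raw HTML lines will still print markup rather than clean titles, which is why parsing the page first is the better route.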