Python: detecting duplicate titles with feedparser

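I am using the same RSS feed twice for testing. Each time I add a new headline, I check whether its title already appears in headlines. However, duplicate titles never seem to be detected: "Duplicate title found" is never printed. What am I doing wrong?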

import feedparser

def parseRSS(rss_url):
    parsed_feed = feedparser.parse(rss_url)
    return parsed_feed

def getHeadlines(rss_url,key):
    headlines = []
    feed = parseRSS(rss_url)
    for newsitem in feed['items']:
        if newsitem['title'] not in headlines:
            headlines.append([newsitem,key])
        else:
            print("-----------------------Duplicate title found----------------------")
    return headlines

def get_rss():
    allheadlines = []
    newsurls = {
    ('key1','source1'): 'https://news.google.com/news/rss/?hl=en&ned=us&gl=US',
    ('key2','source2'): 'https://news.google.com/news/rss/?hl=en&ned=us&gl=US',
    }
    for key,url in newsurls.items():
        allheadlines.extend(getHeadlines(url,key))

    return allheadlines

allheadlines = get_rss()

for hl in allheadlines:
    key = hl[1][0]       # first element of the ('key1', 'source1') tuple
    source = hl[1][1]    # second element of the tuple
    title = hl[0]['title']
    link = hl[0]['link']
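
A minimal sketch of why the membership test in getHeadlines above never reaches the else branch, even within a single feed. It uses plain dicts as stand-ins for feedparser entries (an assumption made so the snippet runs without network access); the titles 'A' and 'B' are made up for illustration.

```python
# Stand-in entries; real feedparser entries are dict-like and also expose a 'title' key.
entries = [{'title': 'A'}, {'title': 'B'}, {'title': 'A'}]
key = ('key1', 'source1')

headlines = []
duplicates_seen = 0
for newsitem in entries:
    # headlines holds [newsitem, key] lists, so a bare title string
    # never compares equal to any element of the list.
    if newsitem['title'] not in headlines:
        headlines.append([newsitem, key])
    else:
        duplicates_seen += 1

print(duplicates_seen)   # 0: the else branch is never reached
print(len(headlines))    # 3: the repeated title 'A' was appended anyway
```

The check compares a string against two-element lists, so it is always true. Resetting headlines on every call only adds a second reason the duplicates across feeds go unnoticed.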

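You could try printing feed first; there may not be any duplicate titles to begin with.

In this example I use the same RSS feed twice to make sure there are duplicate titles.

OK, but you never pass allheadlines in, so headlines = [] is reset every time the loop runs. Pass allheadlines into the getHeadlines function; creating an empty list on every call makes the comparison meaningless. If it doesn't work, come back.

It still doesn't seem to detect duplicate titles, and it changes how my original code works. I need to extend everything into allheadlines, not just the title. I need access to the title, link, parsed updated date, key, and source. In this new function, allheadlines only has the title appended.

Try this:
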
import feedparser

def parseRSS(rss_url):
    parsed_feed = feedparser.parse(rss_url)
    return parsed_feed

def getHeadlines(rss_url,key,allheadlines,allitems):
    feed = parseRSS(rss_url)
    for newsitem in feed['items']:
        if newsitem['title'] not in allheadlines:
            allheadlines.append(newsitem['title'])
            allitems.append([newsitem,key])
        else:
            print("-----------------------Duplicate title found----------------------")
    return allheadlines,allitems

def get_rss():
    allheadlines = []
    allitems = []
    newsurls = {
    ('key1','source1'): 'https://news.google.com/news/rss/?hl=en&ned=us&gl=US',
    ('key2','source2'): 'https://news.google.com/news/rss/?hl=en&ned=us&gl=US',
    }
    for key,url in newsurls.items():
        allheadlines,allitems=(getHeadlines(url,key,allheadlines,allitems))

    return allitems

allheadlines = get_rss()
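
As a possible variation on the code above, the list of seen titles can be swapped for a set, which keeps the membership test cheap as the number of entries grows. This is only a sketch: dedupe_by_title, seen_titles, and sample_feeds are hypothetical names, and the entries are plain dicts standing in for what feedparser.parse(url)['items'] would return, so the snippet runs offline.

```python
def dedupe_by_title(entries, key, seen_titles, allitems):
    """Append [entry, key] to allitems for every title not seen before."""
    for newsitem in entries:
        title = newsitem['title']
        if title in seen_titles:
            print("Duplicate title found:", title)
            continue
        seen_titles.add(title)
        allitems.append([newsitem, key])
    return allitems

# Hypothetical sample data in place of two fetched feeds.
sample_feeds = {
    ('key1', 'source1'): [{'title': 'A', 'link': 'http://example.com/a'},
                          {'title': 'B', 'link': 'http://example.com/b'}],
    ('key2', 'source2'): [{'title': 'A', 'link': 'http://example.com/a'},
                          {'title': 'C', 'link': 'http://example.com/c'}],
}

seen_titles = set()   # shared across all feeds, so duplicates are caught between sources
allitems = []
for key, entries in sample_feeds.items():
    dedupe_by_title(entries, key, seen_titles, allitems)

# Each item still carries the full entry plus its (key, source) tuple.
for newsitem, (key_name, source) in allitems:
    print(key_name, source, newsitem['title'], newsitem['link'])
```

The full entry is still stored in allitems, so title, link, key, and source all remain accessible; only the bookkeeping of seen titles changes.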