
Python feed parser and multithreading


I have a list of RSS/ATOM feed URLs (nearly 500) to parse and fetch links from.

I am using the Python feedparser library to parse the URLs. To parse the URL list in parallel, I thought of using Python's threading library.

My code looks like this:

import threading
import feedparser

class PullFeeds:
    def __init__(self):
        self.data = open('urls.txt', 'r')

    def pullfeed(self):
        threads = []
        for url in self.data:
            t = RssParser(url.strip())  # strip the trailing newline
            threads.append(t)
        for thread in threads:
            thread.start()
        for thread in threads:
            thread.join()

class RssParser(threading.Thread):
    def __init__(self, url):
        threading.Thread.__init__(self)
        self.url = url

    def run(self):
        print("Starting:", self.name)
        rss_data = feedparser.parse(self.url)
        for entry in rss_data.get('entries', []):
            print(entry.get('link'))
        print("Exiting:", self.name)


pf = PullFeeds()
pf.pullfeed()
The problem is that when I run this script, feedparser returns an empty list. But without threading, feedparser prints the list of links parsed from the supplied URL.


How can I fix this?

To see whether the problem is related to multithreading, you could try using multiple processes instead:

#!/usr/bin/env python
#from multiprocessing.dummy import Pool # use threads
from multiprocessing import Pool # use processes
from multiprocessing import freeze_support
import feedparser

def fetch_rss(url):
    try:
        data = feedparser.parse(url)
    except Exception as e:
        return url, None, str(e)
    else:
        e = data.get('bozo_exception')
        return url, data['entries'], str(e) if e else None

if __name__=="__main__":
    freeze_support()
    with open('urls.txt') as file:
        urls = (line.strip() for line in file if line.strip())
        pool = Pool(20) # no more than 20 concurrent downloads
        for url, items, error in pool.imap_unordered(fetch_rss, urls):
            if error is None:
                print(url, len(items))
            else:
                print(url, error)
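If processes turn out to be unnecessary, the same fan-out/fan-in pattern is also available with threads via `concurrent.futures`. A minimal sketch; the `parse` function here is a hypothetical stand-in for `feedparser.parse`, so the concurrency pattern runs on its own without network access (swap in the real call for actual feeds):

```python
from concurrent.futures import ThreadPoolExecutor

def parse(url):
    # Hypothetical stand-in for feedparser.parse(url): fakes one entry.
    return {'entries': [{'link': url + '/item1'}]}

def fetch_links(urls, max_workers=20):
    # Map parse() over the URLs with a bounded thread pool,
    # then flatten the entry links into one list.
    links = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for data in pool.map(parse, urls):
            links.extend(entry['link'] for entry in data['entries'])
    return links

print(fetch_links(['http://a.example', 'http://b.example']))
# → ['http://a.example/item1', 'http://b.example/item1']
```

`ThreadPoolExecutor` handles starting and joining the threads for you, which avoids the manual start/join bookkeeping in the question's code.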

The problem was with Vagrant. I was running the script inside one of my Vagrant machines. The same script runs fine outside the Vagrant box.


This needs to be reported. I am just not sure where to report this bug yet: whether it is a problem with Vagrant, with Python threading, or with the feedparser library.

I will get back with more details soon. Thanks for the help.