Python: Why does feedparser return fewer results than there actually are?
Tags: python, django, web-scraping, rss, feedparser

I am using feedparser and BeautifulSoup. The YouTube embed code has no explicit key of its own. Instead, it sits inside the HTML under the 'content' key, like this:

'content': [{'value': '<h4><strong>Video:</strong> cangel Ft De La  – ose (Official Video)<span id="more-125869"></span></h4>\n<p>&nbsp;</p>\n<div class="lyte-wrapper"></div>\n<p><span id="more-2331"></span><iframe src="https://www.youtube.com/embed/HFiDh_TcvNE" width="560" height="315" frameborder="0" allowfullscreen="allowfullscreen"></iframe></p>',}],
def pan_task():
    url = 'http://example.net/feed/'
    name = 'elrealsonidodelakalle'
    live_leaks = [i for i in feedparser.parse(url).entries][:3]
    the_count = len(live_leaks)
    ky = feedparser.parse(url).keys()
    oky = [i.keys() for i in feedparser.parse(url).entries][1] # shows what I can pull

    def embed_image(html_doc):
        soup = BeautifulSoup(html_doc, "html5lib")
        embed = soup.iframe.get('src')
        remove = 'https://www.youtube.com/embed/'
        remaining_pic_code = embed.replace(remove, '')
        the_img = 'http://i1.ytimg.com/vi/' + remaining_pic_code + '/hqdefault.jpg'
        results = {'src': the_img, 'embed': embed}
        return results

    results = [{
                'name': name,
                'text': i.title,
                'url': i.id,
                'comments': i.title,
                'src': embed_image(i.content[0]['value'])['src'],
                'embed': embed_image(i.content[0]['value'])['embed'],
                'author': None,
                'video': True,
                'status': 'published'
               } for i in live_leaks]

    for entry in results:
        post = Post()  #
        post.title = entry['text'] #
        title = post.title  #
        if not Post.objects.filter(title=title):
            post.title = entry['text']
            post.name = entry['name']
            post.url = entry['url']
            post.body = entry['comments']
            post.image_url = entry['src']
            post.video_path = entry['embed']
            post.author = entry['author']
            post.video = entry['video']
            post.status = entry['status']
            post.save()
            post.tags.add("video")

    return print(results)
But it only works when I do this:

live_leaks = [i for i in feedparser.parse(url).entries][:3]
If I remove the "[:3]", I get this error:

Task blog.tasks.pan_task[79707dd3-70ae-40e9-a97a-e5c36dee4004] raised unexpected: AttributeError("'NoneType' object has no attribute 'get'",)
Traceback (most recent call last):
  File "/Users/ray/Desktop/myheroku/practice/lib/python3.5/site-packages/celery/app/trace.py", line 240, in trace_task
    R = retval = fun(*args, **kwargs)
  File "/Users/ray/Desktop/myheroku/practice/lib/python3.5/site-packages/celery/app/trace.py", line 438, in __protected_call__
    return self.run(*args, **kwargs)
  File "/Users/ray/Desktop/myheroku/practice/src/blog/tasks.py", line 126, in pan_task
    } for i in live_leaks]
  File "/Users/ray/Desktop/myheroku/practice/src/blog/tasks.py", line 126, in <listcomp>
    } for i in live_leaks]
  File "/Users/ray/Desktop/myheroku/practice/src/blog/tasks.py", line 109, in embed_image
    embed = soup.iframe.get('src')
AttributeError: 'NoneType' object has no attribute 'get'
[2016-10-15 17:24:43,560: ERROR/MainProcess] Task blog.tasks.pan_task[339ccc72-c87a-4323-948b-4db7afb4f619] raised unexpected: AttributeError("'NoneType' object has no attribute 'get'",)
Traceback (most recent call last):
  File "/Users/ray/Desktop/myheroku/practice/lib/python3.5/site-packages/celery/app/trace.py", line 240, in trace_task
    R = retval = fun(*args, **kwargs)
  File "/Users/ray/Desktop/myheroku/practice/lib/python3.5/site-packages/celery/app/trace.py", line 438, in __protected_call__
    return self.run(*args, **kwargs)
  File "/Users/ray/Desktop/myheroku/practice/src/blog/tasks.py", line 126, in pan_task
    } for i in live_leaks]
  File "/Users/ray/Desktop/myheroku/practice/src/blog/tasks.py", line 126, in <listcomp>
    } for i in live_leaks]
  File "/Users/ray/Desktop/myheroku/practice/src/blog/tasks.py", line 109, in embed_image
    embed = soup.iframe.get('src')
AttributeError: 'NoneType' object has no attribute 'get'

If I go to the feed page, I can count eight items. Any help with this would be great. I am new to programming and all the code is my own, so if it seems sloppy or unprofessional, that's why.
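
The traceback shows that soup.iframe was None when .get('src') was called, which suggests that at least one entry beyond the first three has no iframe in its content. With BeautifulSoup, the guard would be to check whether soup.iframe is None before calling .get. Below is a minimal sketch of the same idea, not a definitive fix. It makes several assumptions: it swaps BeautifulSoup for the standard library's HTMLParser so that it runs without extra packages, the helper names FirstIframeSrc and entries_with_video are hypothetical, and entries without a YouTube iframe are simply skipped.

```python
from html.parser import HTMLParser

YOUTUBE_EMBED_PREFIX = "https://www.youtube.com/embed/"


class FirstIframeSrc(HTMLParser):
    """Remember the src of the first <iframe> that has one."""

    def __init__(self):
        super().__init__()
        self.src = None

    def handle_starttag(self, tag, attrs):
        if tag == "iframe" and self.src is None:
            self.src = dict(attrs).get("src")


def embed_image(html_doc):
    """Return {'src', 'embed'} for a YouTube iframe, or None if there is none."""
    parser = FirstIframeSrc()
    parser.feed(html_doc)
    embed = parser.src
    if not embed or not embed.startswith(YOUTUBE_EMBED_PREFIX):
        return None  # no iframe at all, or not a YouTube embed
    # Drop any query string so only the video id remains.
    video_id = embed[len(YOUTUBE_EMBED_PREFIX):].split("?")[0]
    return {
        "src": "http://i1.ytimg.com/vi/" + video_id + "/hqdefault.jpg",
        "embed": embed,
    }


def entries_with_video(html_docs):
    """Keep only the documents that yield an embed, skipping the rest."""
    found = (embed_image(doc) for doc in html_docs)
    return [item for item in found if item is not None]
```

In pan_task, the list comprehension would then call embed_image once per entry and skip any entry for which it returns None, so the "[:3]" slice is no longer needed to avoid the error.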