Python: using feedparser to distinguish itunes:keywords from itunes:category?
I use feedparser to parse RSS feeds, but I can't reliably identify the itunes:category values.

Judging from the documentation, it seems that both itunes:keywords and itunes:category values are put into the feed['tags'] list.

From feedparser's test case for itunes:category:
```xml
<!--
Description: iTunes channel category
Expect: not bozo and feed['tags'][0]['term'] == 'Technology'
-->
<rss xmlns:itunes="http://www.itunes.com/DTDs/Podcast-1.0.dtd">
<channel>
<itunes:category text="Technology"></itunes:category>
</channel>
</rss>
```
In the feed I'm parsing, the keywords entry is:

```xml
<itunes:keywords>Hurley, Liss, feelings</itunes:keywords>
```
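Is there any way to uniquely identify the values that come from the itunes:category tag?

I couldn't find a way to do this with feedparser alone, so I also used beautifulSoup4 to parse the feed a second time; that code is further down.

Within feedparser itself, implementation-specific itunes:x elements are available as itunes_x keys. itunes:category is provided as category, and itunes:keywords are indeed renamed to tags, with each keyword stored in the term field. Use scheme as a filter: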
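```python
import feedparser

url = "https://example.com/podcast.rss"  # hypothetical URL: use your own feed
feedp = feedparser.parse(url)

# all channel-level tags (feedp["feed"] is the channel; per-episode tags live in feedp["entries"])
keywords = [k["term"] for k in feedp["feed"].get("tags", [])]

# only tags whose scheme is the iTunes one; itunes:category terms get this
# same scheme, so they are included here too
keyword = [t["term"] for t in feedp["feed"].get("tags", []) if t["scheme"] == 'http://www.itunes.com/']
```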
This may drop other tags (if any are present), but when itunes:keywords and ordinary tags coexist, they are duplicates anyway.
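As a quick check of what the scheme filter returns, the sketch below applies it to the feed['tags'] list that feedparser produced for the example feed (the output shown further down, copied here as plain dicts so no network access or feedparser install is needed). Because feedparser gives the itunes:category terms the same http://www.itunes.com/ scheme as the keywords, the filter keeps all five terms rather than only the three keywords.

```python
# feed['tags'] as feedparser produced it for the example feed (copied from this thread)
tags = [
    {'label': None, 'scheme': 'http://www.itunes.com/', 'term': 'Hurley'},
    {'label': None, 'scheme': 'http://www.itunes.com/', 'term': 'Liss'},
    {'label': None, 'scheme': 'http://www.itunes.com/', 'term': 'feelings'},
    {'label': None, 'scheme': 'http://www.itunes.com/', 'term': 'Society & Culture'},
    {'label': None, 'scheme': 'http://www.itunes.com/', 'term': 'Technology'},
]

ITUNES_SCHEME = 'http://www.itunes.com/'

# The scheme filter from the answer: keywords and categories share the scheme,
# so both kinds of term pass the test.
by_scheme = [t["term"] for t in tags if t["scheme"] == ITUNES_SCHEME]
print(by_scheme)  # ['Hurley', 'Liss', 'feelings', 'Society & Culture', 'Technology']
```

So the filter is useful for separating iTunes tags from other tags, but not keywords from categories.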
Other iTunes elements follow the same itunes_x naming; for example, itunes:duration is available as itunes_duration.
In that example feed, the categories are:

```xml
<itunes:category text="Society &amp; Culture"/>
<itunes:category text="Technology"/>
```

feedparser puts them into feed['tags'] together with the keywords, all with the same scheme, so they cannot be told apart there:

```python
[{'label': None, 'scheme': 'http://www.itunes.com/', 'term': 'Hurley'},
 {'label': None, 'scheme': 'http://www.itunes.com/', 'term': 'Liss'},
 {'label': None, 'scheme': 'http://www.itunes.com/', 'term': 'feelings'},
 {'label': None, 'scheme': 'http://www.itunes.com/', 'term': 'Society & Culture'},
 {'label': None, 'scheme': 'http://www.itunes.com/', 'term': 'Technology'}]
```
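Parsing the raw feed again with beautifulSoup4 recovers only the category values:

```python
from urllib.request import urlopen
import bs4

raw_data = urlopen("https://example.com/podcast.rss").read()  # hypothetical URL: use your own feed
soup = bs4.BeautifulSoup(raw_data, "lxml")

def is_itunes_category(tag):
    return tag.name == 'itunes:category'

categories = [tag.attrs['text'] for tag in soup.find_all(is_itunes_category)]
```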
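The same second pass can be done with only the standard library. This is a sketch, not part of the original answer: it assumes an RSS 2.0 feed that is well-formed XML, uses xml.etree.ElementTree, and compares the iTunes namespace URI in lower case because feeds (including feedparser's test case above) capitalise it differently. It reads only channel-level categories, and since itunes:category elements can be nested to express subcategories, it walks into those as well. The function names are illustrative.

```python
import xml.etree.ElementTree as ET

# Apple's documented iTunes namespace URI; compared in lower case because some
# feeds write it as ".../DTDs/Podcast-1.0.dtd".
ITUNES_NS = "http://www.itunes.com/dtds/podcast-1.0.dtd"


def is_itunes(elem, local_name):
    """True if elem is the iTunes element `local_name`, whatever prefix the feed uses."""
    if not isinstance(elem.tag, str) or not elem.tag.startswith("{"):
        return False
    uri, _, name = elem.tag[1:].partition("}")
    return uri.lower() == ITUNES_NS and name == local_name


def collect_categories(parent, out):
    """Append the text attribute of each itunes:category child, recursing into subcategories."""
    for child in parent:
        if is_itunes(child, "category"):
            out.append(child.get("text"))
            collect_categories(child, out)


def itunes_categories(raw_xml):
    """Channel-level itunes:category values in document order; itunes:keywords is ignored."""
    root = ET.fromstring(raw_xml)
    channel = root.find("channel")
    out = []
    if channel is not None:
        collect_categories(channel, out)
    return out


sample = b"""<rss xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd">
<channel>
  <itunes:keywords>Hurley, Liss, feelings</itunes:keywords>
  <itunes:category text="Society &amp; Culture"/>
  <itunes:category text="Technology"/>
</channel>
</rss>"""
print(itunes_categories(sample))  # ['Society & Culture', 'Technology']
```

Passing it the raw_data bytes fetched for the beautifulSoup4 version should give the same list for a well-formed feed.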
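Reading itunes:category through feedparser's category key:

```python
import feedparser

feedp = feedparser.parse("https://example.com/podcast.rss")  # hypothetical URL: use your own feed
category = feedp.feed.category
```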
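A caveat about category, worth checking against the feedparser version you have installed: in the feedparser source I am familiar with, FeedParserDict does not store a separate itunes:category value behind this key; looking up category returns the term of the first entry in feed['tags'], whichever element produced it. For the example feed, whose keywords come first, that would be a keyword. The sketch below only mimics that lookup on plain dicts so it runs without feedparser; first_tag_term is a hypothetical name, not a feedparser function.

```python
# feed['tags'] for the example feed, as shown in this thread
tags = [
    {'label': None, 'scheme': 'http://www.itunes.com/', 'term': 'Hurley'},
    {'label': None, 'scheme': 'http://www.itunes.com/', 'term': 'Liss'},
    {'label': None, 'scheme': 'http://www.itunes.com/', 'term': 'feelings'},
    {'label': None, 'scheme': 'http://www.itunes.com/', 'term': 'Society & Culture'},
    {'label': None, 'scheme': 'http://www.itunes.com/', 'term': 'Technology'},
]


def first_tag_term(tags):
    """Hypothetical helper mimicking how feedparser resolves `category`:
    the term of the first tag, regardless of which element produced it."""
    if not tags:
        raise KeyError("category")
    return tags[0]["term"]


print(first_tag_term(tags))  # Hurley (a keyword, not a category)
```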
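itunes:duration is an episode-level element, so it is read from each entry rather than from the top-level result:

```python
import feedparser

feedp = feedparser.parse("https://example.com/podcast.rss")  # hypothetical URL: use your own feed
durations = [e.get("itunes_duration") for e in feedp.entries]
```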
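Since these elements follow the itunes_x pattern, one way to see which iTunes fields feedparser extracted for the channel or for an episode is to filter the keys by prefix. A minimal sketch: feedparser's result objects are dict subclasses, so the comprehension should apply to feedp.feed or to an entry in feedp.entries; here it runs on a plain dict whose values are made up for illustration, and itunes_fields is a hypothetical helper name.

```python
def itunes_fields(d):
    """Return only the itunes_* keys of a feedparser-style dict (prefix kept)."""
    return {k: v for k, v in d.items() if k.startswith("itunes_")}


# Hypothetical episode entry: keys mimic feedparser's naming, values are made up
entry = {"title": "Episode 1", "itunes_duration": "00:42:10", "itunes_explicit": False, "tags": []}
print(itunes_fields(entry))  # {'itunes_duration': '00:42:10', 'itunes_explicit': False}
```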
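feedparser's categories key returns (scheme, term) pairs; for a feed with Syndic8 and dmoz categories it looks like this:

```python
>>> import feedparser
>>> feedp = feedparser.parse(url)
>>> categories = feedp.feed.categories
>>> print(categories)
[(u'Syndic8', u'1024'), (u'dmoz', 'Top/Society/People/Personal_Homepages/P/')]
```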