Python: scraping an RSS feed with BeautifulSoup

python web-scraping

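I'm having a problem with my script. I can get the titles and links, but I can't seem to open the articles and scrape them. Can anyone help?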

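# Python 3 with the bs4 package (Python 2 used urllib.urlopen and BeautifulSoup 3)
from urllib.request import urlopen
from bs4 import BeautifulSoup
import re

source = urlopen('http://www.marketingmag.com.au/feed/').read().decode('utf-8')

# Non-greedy groups so a match stops at the first closing tag/quote
title = re.compile('<title>(.*?)</title>')
link = re.compile('<a href="(.*?)">')

find_title = re.findall(title, source)
find_link = re.findall(link, source)

for i in range(1, 10):
    print(find_title[i])
    print(find_link[i])

    # Open each article inside the loop; at the outer indentation level
    # only the last link would ever be fetched.
    articlePage = urlopen(find_link[i]).read().decode('utf-8')

    divBegin = articlePage.find('<div class="entry-content">')
    if divBegin == -1:
        continue  # page does not use the expected layout

    article = articlePage[divBegin:(divBegin + 1000)]

    soup = BeautifulSoup(article, 'html.parser')

    paragList = soup.find_all('p')

    for p in paragList:
        print(p)
        print("\n")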

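Don't parse HTML with regular expressions. Just use Beautiful Soup, which can pull the links out for you; then open each URL with urllib2.urlopen (urllib.request.urlopen in Python 3) and read its content.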
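A minimal sketch of that suggestion, assuming Python 3 with the bs4 package installed. The helper names links_in, article_paragraphs and scrape are hypothetical, and the entry-content class is taken from the question's code, not from any guarantee about the site's markup. Beautiful Soup does the link extraction and locates the article body, so no regular expressions or fixed-length string slices are needed.

```python
from urllib.request import urlopen  # urllib2.urlopen in Python 2

from bs4 import BeautifulSoup


def links_in(html):
    """Return the href of every <a> tag that actually has an href."""
    soup = BeautifulSoup(html, "html.parser")
    return [a["href"] for a in soup.find_all("a", href=True)]


def article_paragraphs(html):
    """Return the text of each <p> inside <div class="entry-content">.

    The class name is an assumption copied from the question; other
    sites (or a redesign of this one) will need a different selector.
    """
    soup = BeautifulSoup(html, "html.parser")
    content = soup.find("div", class_="entry-content")
    if content is None:  # page does not use the assumed layout
        return []
    # " " as separator keeps words from nested tags (e.g. <a>) apart
    return [p.get_text(" ", strip=True) for p in content.find_all("p")]


def scrape(urls):
    """Fetch each URL and print its paragraphs (needs network access)."""
    for url in urls:
        html = urlopen(url).read()  # bytes; Beautiful Soup detects the encoding
        print(url)
        for text in article_paragraphs(html):
            print(text)
        print()
```

Parsing the whole page, rather than the first 1000 characters after the div, avoids cutting paragraphs off mid-tag.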

Your code strongly reminds me of:

Why use BeautifulSoup for XML parsing at all? It is built for HTML pages, and Python itself ships with a very good XML parser. Example:
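A minimal sketch of the standard-library route using xml.etree.ElementTree, assuming an ordinary RSS 2.0 feed whose item elements carry unnamespaced title and link children (the usual RSS 2.0 layout; the question's feed has not been checked here). The names feed_items and fetch_feed are hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.request import urlopen


def feed_items(feed_xml):
    """Return (title, link) pairs for each <item> in an RSS 2.0 document.

    feed_xml may be str or bytes; ElementTree accepts both.
    """
    root = ET.fromstring(feed_xml)
    items = []
    for item in root.iter("item"):
        # findtext returns the default when the child element is missing
        title = item.findtext("title", default="")
        link = item.findtext("link", default="")
        items.append((title.strip(), link.strip()))
    return items


def fetch_feed(url):
    """Download a feed and parse it (requires network access)."""
    with urlopen(url) as response:
        return feed_items(response.read())
```

Because the loop only visits item elements, the channel's own title is never mixed in, so there is no need to start counting at index 1, and each title stays paired with its own link. The links can then be passed to whatever article scraper you use.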

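Try indenting each line of code by four spaces, or select all of the code and click the "code sample" button (the one with the {} braces) to make it more readable. Also, it would help if you could show us a few sample lines of your current output, and ideally the output you want :)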