Python BeautifulSoup: extracting between href and class?
Tags: python, python-2.7, web-scraping, beautifulsoup
I want to store the dates from the following block of text:
newsoup = '''<html><body><a href="/president/washington/speeches/speech-3460">Proclamation
of Pardons in Western Pennsylvania (July 10, 1795)</a>, <a class="transcript" href="/president/washington/speeches/speech-3460">Transcript</a>,
<a href="/president/washington/speeches/speech-3939">Seventh Annual Message to Congress (December 8, 1795)</a></body></html>'''
I should note that newsoup is already a soup object.
Assuming newsoup is a soup object (if it is not, run newsoup = BeautifulSoup(newsoup) first), this should work for you:
a = newsoup.findAll('a')[0].contents[0]
Here newsoup is a BeautifulSoup object, and the expression returns the text of the first link.
Otherwise, create the object first:
newsoup = BeautifulSoup(newsoup)
You can put it in a loop:
a = newsoup.findAll('a')
for x in a:
    # contents[0] is the first child of the <a> tag, here its text
    print(x.contents[0])
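The loop prints each link's full text, so the dates still have to be pulled out of the parentheses. Here is a minimal sketch, assuming every date appears as "(Month D, YYYY)" inside the link text. The names extract_date and DATE_IN_PARENS are hypothetical, not part of BeautifulSoup, and the link texts are hard-coded here instead of being read from x.contents[0]:

```python
import re

# Hypothetical pattern: a date such as "(July 10, 1795)" inside parentheses.
DATE_IN_PARENS = re.compile(r"\(([A-Z][a-z]+ \d{1,2}, \d{4})\)")


def extract_date(link_text):
    """Return the date found in parentheses, or None if there is none."""
    match = DATE_IN_PARENS.search(link_text)
    return match.group(1) if match else None


# Stand-ins for the strings the loop above would print.
link_texts = [
    "Proclamation of Pardons in Western Pennsylvania (July 10, 1795)",
    "Transcript",
    "Seventh Annual Message to Congress (December 8, 1795)",
]

# Links without a date (such as "Transcript") are skipped.
dates = [d for d in map(extract_date, link_texts) if d is not None]
print(dates)
```

Because the "Transcript" link has no parenthesised date, it drops out without any extra filtering on the class attribute.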
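Hmm, the documentation I found calls the method get_text(), but I came to the same conclusion. The "href" keyword argument is not needed either, since it is meant to filter links by a specific href (and none was even given).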
newsoup = BeautifulSoup(newsoup)
a = newsoup.findAll('a')
for x in a:
    print(x.contents[0])
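The get_text() approach mentioned in the comment can be combined with a check on the class attribute to leave out the "Transcript" links. This is only a sketch, assuming Beautiful Soup 4 is installed as bs4 and using the built-in "html.parser". The variable names html, soup and titles are illustrative:

```python
from bs4 import BeautifulSoup

html = '''<html><body><a href="/president/washington/speeches/speech-3460">Proclamation
of Pardons in Western Pennsylvania (July 10, 1795)</a>, <a class="transcript" href="/president/washington/speeches/speech-3460">Transcript</a>,
<a href="/president/washington/speeches/speech-3939">Seventh Annual Message to Congress (December 8, 1795)</a></body></html>'''

soup = BeautifulSoup(html, "html.parser")

titles = []
for link in soup.find_all("a"):
    # link.get("class") is None when the tag has no class attribute,
    # and a list such as ["transcript"] when it does.
    if "transcript" in (link.get("class") or []):
        continue
    # get_text() returns the tag's text; split/join collapses the line break.
    titles.append(" ".join(link.get_text().split()))

print(titles)
```

Each remaining title still ends with its date in parentheses, so the date can then be cut out with ordinary string handling.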