Python BeautifulSoup: extracting the text between href and class?

python python-2.7 web-scraping


I want to store the dates from the following block of text:

newsoup = '''<html><body><a href="/president/washington/speeches/speech-3460">Proclamation 
of Pardons in Western Pennsylvania (July 10, 1795)</a>, <a class="transcript" href="/president/washington/speeches/speech-3460">Transcript</a>, 
<a href="/president/washington/speeches/speech-3939">Seventh Annual Message to Congress (December 8, 1795)</a></body></html>'''

I should note that newsoup is already a soup object.

Assuming newsoup is a soup object, I think this should work:

(If it is not, you can first run

newsoup = BeautifulSoup(newsoup)

)

This will work for you:

a = newsoup.findAll('a')[0].contents[0]

where newsoup is a BeautifulSoup object.

Otherwise, first do:

newsoup = BeautifulSoup(newsoup)

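You can put this in a loop to print the text of every link:

a = newsoup.findAll('a')
for x in a:
    # contents[0] is the first child of the tag, here the link text
    print(x.contents[0])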
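The loop prints each link's full text, which still includes the speech title. To keep only the dates, one option is a regular expression over those strings. This is a minimal sketch, assuming every date sits in parentheses at the end of the link text in the "Month D, YYYY" form shown in the question, and that month names are parsed under an English locale. `DATE_RE` and `extract_date` are hypothetical names chosen for illustration, not part of BeautifulSoup.

```python
import re
from datetime import datetime

# Matches a trailing "(Month D, YYYY)" group, e.g. "(July 10, 1795)".
DATE_RE = re.compile(r"\(([A-Z][a-z]+ \d{1,2}, \d{4})\)\s*$")


def extract_date(link_text):
    """Return the parenthesised date as a datetime.date, or None if absent."""
    match = DATE_RE.search(link_text)
    if match is None:
        return None
    # %B parses the full month name (locale-dependent; English assumed here).
    return datetime.strptime(match.group(1), "%B %d, %Y").date()


# Link texts as they would come out of the loop over the <a> tags.
link_texts = [
    "Proclamation of Pardons in Western Pennsylvania (July 10, 1795)",
    "Transcript",
    "Seventh Annual Message to Congress (December 8, 1795)",
]

# Links without a date (such as "Transcript") are dropped.
dates = [d for d in map(extract_date, link_texts) if d is not None]
```

Returning `None` for links without a date keeps the helper usable on every `a` tag, so the "Transcript" links do not need to be filtered out beforehand.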

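Hmm, the documentation I found calls it the get_text() method, but I came to the same conclusion. The "href" keyword argument is also unnecessary, since its purpose is to filter links by a specific href (and none was even supplied).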
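As a sketch of the get_text() variant, assuming the `bs4` package (BeautifulSoup 4) is installed and that links with class "transcript" should be skipped because they carry no date. The name `html_doc` and the choice of the built-in "html.parser" backend are assumptions made for illustration.

```python
from bs4 import BeautifulSoup

# The markup from the question, kept as a plain string.
html_doc = (
    '<html><body>'
    '<a href="/president/washington/speeches/speech-3460">'
    'Proclamation of Pardons in Western Pennsylvania (July 10, 1795)</a>, '
    '<a class="transcript" href="/president/washington/speeches/speech-3460">'
    'Transcript</a>, '
    '<a href="/president/washington/speeches/speech-3939">'
    'Seventh Annual Message to Congress (December 8, 1795)</a>'
    '</body></html>'
)

newsoup = BeautifulSoup(html_doc, "html.parser")

# get_text() returns the text inside each tag; "class" is a multi-valued
# attribute in bs4, so link.get("class") is a list (or None when absent).
titles = [
    link.get_text()
    for link in newsoup.find_all("a")
    if "transcript" not in (link.get("class") or [])
]
```

No href argument is passed to find_all() here, since every link is wanted and the filtering is done on the class attribute instead.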

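from bs4 import BeautifulSoup  # assumes BeautifulSoup 4 is installed

# Parse the markup first if newsoup is still a plain string.
newsoup = BeautifulSoup(newsoup)

# Print the text of every link in the document.
a = newsoup.findAll('a')
for x in a:
    print(x.contents[0])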