Python: getting the content (full text) from paragraphs

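I want to extract the paragraph content (full text) from news web pages. I have a set of URLs, and only the paragraph content should be extracted from each of them. When I use the code below, it gives me the entire HTML page.
Here is my code: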

import urllib2
import urllib
from cookielib import CookieJar
from bs4 import BeautifulSoup
cj = CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
p = opener.open("http://www.nytimes.com/2014/09/09/world/europe/turkey-is-courted-by-us-to-help-fight-isis.html?module=Search&mabReward=relbias%3Aw%2C%7B%222%22%3A%22RI%3A18%22%7D&_r=0")
print p.read()
soup = BeautifulSoup(p)
content = soup.find('p', attrs= {'class' : 'story-body-text story-content'})
print content

That is because your code contains this line, which prints the entire HTML page:
print p.read()
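
A related side effect is worth noting: reading a file-like response consumes it, so a parser that is handed the same object afterwards should find nothing left to read. The following is a minimal sketch of that behaviour, using io.BytesIO as an assumed stand-in for the HTTP response object (the markup in it is made up for illustration):

```python
import io

# Hypothetical stand-in for the HTTP response: a file-like object holding bytes.
response = io.BytesIO(b"<html><body><p>Hello</p></body></html>")

first = response.read()   # consumes the whole stream
second = response.read()  # the stream is already at its end

print(repr(first))
print(repr(second))
```

Reading the body once into a variable and passing that variable to BeautifulSoup avoids the problem.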

To get the article text, find the article element by its id, then find all the paragraphs inside that article.

Example usage:
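
A minimal sketch of the two steps described above, assuming the page keeps an article element with id story and paragraphs carrying the story-content class. The html string is a hypothetical stand-in for the downloaded page, and the variable names are illustrative:

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the downloaded page; the real markup may differ.
html = """
<html><body>
<p class="story-content">Teaser outside the article.</p>
<article id="story">
  <p class="story-body-text story-content">First paragraph of the story.</p>
  <p class="story-body-text story-content">Second paragraph of the story.</p>
  <p class="caption">Photo caption.</p>
</article>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")

# Step 1: locate the article container by its id.
article = soup.find("article", id="story")

# Step 2: collect only the story paragraphs inside that container.
paragraphs = [p.get_text() for p in article.find_all("p", class_="story-content")]

for text in paragraphs:
    print(text)
```

With the html string replaced by the body of the real response, the same two steps apply to the article from the question.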

Prints:

ANKARA, Turkey —  The Obama administration on Monday began the work of trying to determine
...
FYI, the selector article#story p.story-content matches every p tag that has the story-content class and sits inside the article element whose id is story.
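
To illustrate how that selector narrows the match, here is a small sketch using BeautifulSoup's select method. The markup is made up for the example and only mimics the structure described above:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: only two paragraphs satisfy both selector conditions.
html = """
<p class="story-content">Outside the article.</p>
<article id="story">
  <p class="story-body-text story-content">Inside, first.</p>
  <p>Inside, but without the class.</p>
  <p class="story-body-text story-content">Inside, second.</p>
</article>
<article id="other"><p class="story-content">Different article.</p></article>
"""

soup = BeautifulSoup(html, "html.parser")

# article#story       -> the article element with id="story"
# p.story-content     -> p descendants whose class list includes story-content
matches = soup.select("article#story p.story-content")
texts = [p.get_text() for p in matches]

for text in texts:
    print(text)
```

Paragraphs outside the article, inside a different article, or lacking the class are skipped.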

Use p.text.encode('utf-8') if the text does not print correctly in your IDE.
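
As a small illustration of what that call does, the sketch below encodes a sample string containing a non-ASCII dash. The sample text is borrowed from the output above, with the dash written as an escape so the snippet stays ASCII-only:

```python
# Sample text with a non-ASCII dash (U+2014), written as an escape sequence.
text = u"ANKARA, Turkey \u2014 The Obama administration"

# Unicode text -> UTF-8 byte string; the dash becomes three bytes.
encoded = text.encode("utf-8")

# Decoding the bytes restores the original text.
roundtrip = encoded.decode("utf-8")

print(repr(encoded))
```

On Python 2, which the question's code targets, printing the encoded byte string sidesteps the implicit console encoding step. On Python 3, printing a bytes object shows a b'...' literal instead.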