Python: web scraping with urllib (and possibly BeautifulSoup)


python python-2.7 web-scraping


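I am scraping information from the following website (its URL is the one passed to urlopen in the code below).

The tags I want to parse are: start <p id="p-1">, end </p>.

My code: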

from urllib import urlopen
from bs4 import BeautifulSoup
import re

html = urlopen('http://mansci.journal.informs.org/gca?gca=mansci%3B6%2F2%2F141&gca=mansci%3B6%2F2%2F149&gca=mansci%3B6%2F2%2F165&gca=mansci%3B6%2F2%2F172&gca=mansci%3B6%2F2%2F187&gca=mansci%3B6%2F2%2F191&gca=mansci%3B6%2F2%2F197&gca=mansci%3B6%2F2%2F205&gca=mansci%3B6%2F2%2F215&submit=Get+All+Checked+Abstracts').read()

a = re.compile('<p id="p-1">(.*)</p>')
b = re.findall(a,html)


The problem I am running into is that my code seems to match line by line, and I don't know how to parse the whole paragraph.

Use BeautifulSoup, then do the following:

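from urllib2 import urlopen
from bs4 import BeautifulSoup

# your_url is a placeholder for the page address used in the question
soup = BeautifulSoup(urlopen(your_url).read(), 'html.parser')
# find() returns the first <p> whose id is "p-1"; .text is its full text
print soup.find('p', {'id': 'p-1'}).text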

which gives:

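The possibility of measurement does not necessarily lead to the provision of relevant information for decision-making in business. This can be demonstrated by reference to accounting methods, and in particular to profit calculation. Accounting procedures have been formalized to the point where they misrepresent financial results and financial position; the likelihood that resources will be used efficiently, and that equity among interested parties will be maintained, is reduced by a lack of care in defining important concepts and by the simultaneous acceptance of rules having directly opposite justifications and consequences. As the speed of information processing increases and refinements in calculation are developed, a corresponding effort is needed to redefine, in operationally relevant terms, or to strengthen the definitions of, such key concepts as profit, funds, and cost. The history of the development of accounting and auxiliary calculation illustrates the consequences of permitting a system of measurement and communication to become institutionalized. Some suggestions for improving the relevance of accounting and providing similar information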
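For readers on Python 3, where urllib2 no longer exists and print is a function, the same idea can be sketched roughly as follows. This is only an illustration under assumptions: SAMPLE_HTML is a made-up stand-in for the downloaded page, paragraph_text is a hypothetical helper name, and the parser is set to the standard library's 'html.parser' rather than whatever the answer's setup picked by default.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the downloaded page: a paragraph that spans
# several lines, which is the case the regex in the question trips over.
SAMPLE_HTML = """
<html><body>
<p id="p-1">First line of the abstract.
Second line of the abstract.</p>
<p id="p-2">Another paragraph.</p>
</body></html>
"""


def paragraph_text(html, element_id="p-1"):
    """Return the text of the <p> with the given id, or None if absent."""
    soup = BeautifulSoup(html, "html.parser")
    tag = soup.find("p", {"id": element_id})
    if tag is None:
        return None
    # Collapse the line breaks inside the paragraph into single spaces.
    return " ".join(tag.get_text().split())


if __name__ == "__main__":
    print(paragraph_text(SAMPLE_HTML))
```

To run it against the live page, the html argument could come from urllib.request.urlopen(url).read() instead of the sample string. The None check is there because find() returns None when no matching tag exists, in which case calling .text directly would raise an AttributeError.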

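I have fixed the formatting and removed the comment about spacing. Thanks! First time on this site. Just learning how to code! Very useful. I'll see what I can do from here.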
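As a side note on why the original regex appeared to work line by line: in Python's re module, "." does not match a newline unless the re.DOTALL flag is given, so a paragraph that wraps across lines never matches. A minimal sketch, assuming a made-up SAMPLE_HTML string in place of the real page (an HTML parser remains the more robust choice for real pages):

```python
import re

# Hypothetical two-line paragraph standing in for the real page content.
SAMPLE_HTML = '<p id="p-1">line one\nline two</p>\n<p id="p-2">other</p>'

# Without re.DOTALL, "." stops at "\n", so the multi-line paragraph
# is never matched and the result is an empty list.
line_by_line = re.findall(r'<p id="p-1">(.*)</p>', SAMPLE_HTML)

# With re.DOTALL, "." also matches "\n". The non-greedy ".*?" stops at
# the first closing </p> instead of running on to the last one.
whole_paragraph = re.findall(r'<p id="p-1">(.*?)</p>', SAMPLE_HTML, re.DOTALL)

if __name__ == "__main__":
    print(line_by_line)
    print(whole_paragraph)
```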