Python BeautifulSoup HTML scraping


I am trying to scrape text from a website, and so far I have written the following code:

import urllib, urllib2, cookielib, re, io, sys
from bs4 import BeautifulSoup

cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))

resp = opener.open('http://www.bancuri.net/formular_mail.aspx?ID=3181').read()
soup = BeautifulSoup(resp)
for tr in soup.find_all('p'):
    tds = tr.find_all('justify')
    for x in tds:
        print x

The data I need to collect is:

Categoria: Bărbaţi şi femei
Bancul: O femeie către un bărbat la o petrecere:
- Dumneata tare semeni cu al treilea soţ al meu.
- Dar de cîte ori aţi fost căsătorită?
- De două ori pînă acum.

But it does not work as expected. This is the output I get:

C:\Users\admin\Desktop>bancuri.py
C:\Users\admin\Desktop>

Any ideas about what might be going wrong?

I'm not very familiar with cookielib or what it is used for, but I retrieve the page using only the urllib2 module, which your code already imports.

First, fetch the page:

resp = urllib2.urlopen('http://www.bancuri.net/formular_mail.aspx?ID=3181').read()

Then retrieve the content you need:

>>> soup = BeautifulSoup(resp)
>>> text = soup.find('p').get_text()
>>> print text

Categoria: Bărbaţi şi femei
Bancul:

O femeie către un bărbat la o petrecere:

- Dumneata tare semeni cu al treilea soţ al meu.

- Dar de cîte ori aţi fost căsătorită?

- De două ori pînă acum.
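
A likely reason the loop in the question prints nothing: find_all('justify') searches for tags named justify, and HTML has no such tag. It seems more plausible that justify is the value of an align attribute on the p tag. The sketch below assumes that markup (the page's real HTML is not shown in this thread), uses an inline sample string instead of the downloaded page, and is written for Python 3 with bs4 installed.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the downloaded page; the real
# page's HTML is an assumption here.
html = """
<html><body>
<p align="justify">Categoria: Bărbaţi şi femei<br>
Bancul: O femeie către un bărbat la o petrecere:</p>
<p>unrelated paragraph</p>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")

# Searches for <justify> tags, which do not exist, so nothing matches.
no_matches = soup.find_all("justify")

# Filters <p> tags on the attribute value instead.
justified = soup.find_all("p", attrs={"align": "justify"})

for p in justified:
    # Join the text fragments with newlines and trim surrounding whitespace.
    print(p.get_text("\n", strip=True))
```

If the real page marks the paragraph differently (a class or a style attribute, for example), the attrs filter would need to change accordingly.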

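I looked at that URL. You need a username and password to access it.

No, you don't need a username or password to access it!

Edit your post and include what you get as output, so that it is easier for others to help you.

How did you get that output? Please provide more of your code.

@aIKid I edited the post and left a place for the OP to add what is being retrieved as output, to make it easier to help...

Yes, but now I don't know how the OP got that as output ^^

Traceback (most recent call last):
  File "", line 1
    text = soup.find('p').get_text()
AttributeError: 'NoneType' object has no attribute 'get_text'

Are you sure you are scraping the same page?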

What does print soup output?
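
The AttributeError quoted in the comments is what happens when find() matches nothing: it returns None, and None has no get_text method. A minimal sketch of guarding against that case follows. The helper name first_paragraph_text and the sample HTML strings are made up for illustration, and the code assumes Python 3 with bs4 installed.

```python
from bs4 import BeautifulSoup


def first_paragraph_text(html):
    """Return the text of the first <p>, or None if there is no <p>.

    Hypothetical helper, not part of BeautifulSoup.
    """
    soup = BeautifulSoup(html, "html.parser")
    p = soup.find("p")
    if p is None:
        # find() found no <p>; report that instead of raising.
        return None
    return p.get_text()


# A page containing a paragraph, and one without (for example an
# error or login page returned in place of the expected content).
print(first_paragraph_text("<p>Bancul: ...</p>"))
print(first_paragraph_text("<div>no paragraph here</div>"))
```

When the helper returns None, printing the parsed soup, as the last comment asks, shows what the server actually sent back.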