Python: __getattr__ in a parent class causes a recursion error in the subclass __init__

Following the advice in an answer, I tried to use class composition instead of subclassing BeautifulSoup.
The basic Scraper class itself works fine (at least in my limited testing).
The Scraper class:
from BeautifulSoup import BeautifulSoup
import urllib2

class Scrape():
    """base class to be subclassed
    basically a wrapper that provides basic url fetching with urllib2
    and basic html parsing with BeautifulSoup.
    some useful methods are provided through class composition with BeautifulSoup.
    for direct access to the soup class you can use the _soup property."""
    def __init__(self, file):
        self._file = file
        #very basic input validation
        try:
            self._page = urllib2.urlopen(self._file) #fetching the page
        except (urllib2.URLError):
            print ('please enter a valid url starting with http/https/ftp/file')
        self._soup = BeautifulSoup(self._page) #calling the html parser
        # the next part is the class composition part - we forward attribute
        # and method calls to the BeautifulSoup instance
        #search functions:
        self.find = self._soup.find
        self.findAll = self._soup.findAll
        self.__iter__ = self._soup.__iter__ #enables iterating/looping over the object
        self.__len__ = self._soup.__len__
        self.__contains__ = self._soup.__contains__
        #attribute fetching and setting - __getattr__ implemented by the Scrape class
        self.__setattr__ = self._soup.__setattr__
        self.__getattribute__ = self._soup.__getattribute__
        #called to implement evaluation of self[key]
        self.__getitem__ = self._soup.__getitem__
        self.__setitem__ = self._soup.__setitem__
        self.__delitem__ = self._soup.__delitem__
        self.__call__ = self._soup.__call__ #called when the instance is "called" as a function
        self._getAttrMap = self._soup._getAttrMap
        self.has_key = self._soup.has_key
        #walking the html document methods
        self.contents = self._soup.contents
        self.text = self._soup.text
        self.extract = self._soup.extract
        self.next = self._soup.next
        self.parent = self._soup.parent
        self.fetch = self._soup.fetch
        self.fetchText = self._soup.fetchText
        self.findAllNext = self._soup.findAllNext
        self.findChild = self._soup.findChild
        self.findChildren = self._soup.findChildren
        self.findNext = self._soup.findNext
        self.findNextSibling = self._soup.findNextSibling
        self.first = self._soup.first
        self.name = self._soup.name
        self.get = self._soup.get
        self.getString = self._soup.getString
        #comparison operators or similar boolean checks
        self.__eq__ = self._soup.__eq__
        self.__ne__ = self._soup.__ne__
        self.__hash__ = self._soup.__hash__
        self.__nonzero__ = self._soup.__nonzero__ #not sure
        #the class representation magic methods:
        self.__str__ = self._soup.__str__
        self.__repr__ = self._soup.__repr__

    def __getattr__(self, method):
        """this 'magic' method forwards lookups of unknown attributes to the soup
        and enables traversing the html document with the . notation.
        for example - instancename.div will return the first div.
        explanation: python calls __getattr__ only when it did not find any method
        or attribute corresponding to the name.
        I'm not sure this is a good or the right use for the method"""
        return self._soup.find(method)

    def clean(self, work=False, element=False):
        """basic cleaning of head, scripts etc.
        input 'work': soup object to clean of unnecessary parts: scripts, head, style.
        the optional 'element' argument can take a tuple of elements
        that overrides which elements to clean"""
        self._work = work or self._soup
        self._cleanelements = element or ("head", "style", "script")
        for elem in self.findAll(self._cleanelements):
            elem.extract()
But when I subclass it, I get some kind of recursion loop that I can barely follow.
Here is the subclass (relevant part):
And here is the error message:
File "C:\Python27\learn\traffic.py", line 117, in __init__
    Scrape.__init__(self, self._file)
File "C:\Python27\learn\traffic.py", line 26, in __init__
    self._soup = BeautifulSoup(self._page) #calling the html parser
File "C:\Python27\learn\traffic.py", line 92, in __getattr__
    return self._soup.find(method)
File "C:\Python27\learn\traffic.py", line 92, in __getattr__
    return self._soup.find(method)
File "C:\Python27\learn\traffic.py", line 92, in __getattr__
    return self._soup.find(method)
RuntimeError: maximum recursion depth exceeded
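This failure mode can be reproduced in isolation. The sketch below (hypothetical names, no BeautifulSoup or network involved) shows how a __getattr__ that touches a not-yet-set attribute re-enters itself until the interpreter gives up; on Python 3 the error surfaces as RecursionError, a subclass of the RuntimeError shown in the traceback above.

```python
class Broken(object):
    def __init__(self, fail=True):
        if fail:
            return             # simulate __init__ bailing out early
        self._soup = object()  # normally set here

    def __getattr__(self, name):
        # self._soup is missing, so this lookup triggers
        # __getattr__("_soup"), which performs the same lookup again...
        return getattr(self._soup, name)

try:
    Broken().anything
except RecursionError:         # RuntimeError on Python 2
    print("maximum recursion depth exceeded")
```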
I suspect the problem is that I am misusing __getattr__, but I don't know what I should change.

Part 1

Your code cannot work, because __getattr__() accesses self._soup before it has been initialized. This happens because of four innocent-looking lines:
try:
    self._page = urllib2.urlopen(self._file)
except (urllib2.URLError):
    print ('please enter a valid url starting with http/https/ftp/file')
Why catch an exception without actually handling it? The next line accesses self._page, which was never set if urlopen() raised an exception:
self._soup = BeautifulSoup(self._page)
Since _page was never set, accessing it invokes __getattr__(), whose body touches self._soup; _soup has not been set yet either, so that lookup re-enters __getattr__(), and so on until the recursion limit is hit.
The simplest "fix" is to special-case _soup to break the infinite recursion. Moreover, it seems to make more sense for __getattr__ to just do a regular attribute lookup on the soup:
def __getattr__(self, attr):
    if attr == "_soup":
        raise AttributeError()
    return getattr(self._soup, attr)
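A runnable sketch of this guard, using a plain stand-in object instead of BeautifulSoup (the names FakeSoup and ok are illustrative, not from the original code): a missing _soup now surfaces as a clean AttributeError instead of infinite recursion, while every other unknown attribute is forwarded to the soup.

```python
class FakeSoup(object):           # stand-in for BeautifulSoup
    def find(self, name):
        return "first " + name

class Scrape(object):
    def __init__(self, ok=True):
        if ok:
            self._soup = FakeSoup()
        # if an exception skipped the assignment, _soup simply never exists

    def __getattr__(self, attr):
        if attr == "_soup":       # guard: never delegate _soup itself
            raise AttributeError(attr)
        return getattr(self._soup, attr)

s = Scrape()
print(s.find("div"))              # forwarded to the soup: first div

try:
    Scrape(ok=False).find("div")
except AttributeError:
    print("AttributeError, not a recursion crash")
```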
Part 2

Copying over all those methods is unlikely to work well, and it seems to miss the point of class composition entirely.

Comments:

Thanks tc - a few questions: 1. Where is __getattr__ called before the soup exists? That would make sense if there were an execution path without the exception - so I really don't see what I should do to avoid this. 2. I didn't copy all the methods - only the ones I use; what should I do differently? This is my first try at this. Would subclassing BeautifulSoup be the better idea after all? 3. Why do you suggest checking attr == "_soup"? I think I understand the return line, but not the if before it.

I think I understand the point of the new __getattr__ - it lets me drop most of the method copying from BeautifulSoup, since unknown attribute lookups are forwarded to the soup. Unfortunately, it doesn't solve the recursion problem!

OK - the problem was the URLError; you were right. Thanks for your help.
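One reason the bulk copying misses the point: with __getattr__ delegation, regular methods such as find or findAll need no copying at all. Only special methods differ, because Python looks them up on the type rather than the instance, so len(), iteration and friends must be forwarded explicitly. A sketch with an illustrative stand-in soup (FakeSoup is not a real BeautifulSoup API):

```python
class FakeSoup(object):                 # stand-in for BeautifulSoup
    def __init__(self, tags):
        self.tags = tags
    def find(self, name):
        return name if name in self.tags else None
    def __len__(self):
        return len(self.tags)
    def __iter__(self):
        return iter(self.tags)

class Scrape(object):
    def __init__(self, tags):
        self._soup = FakeSoup(tags)

    def __getattr__(self, attr):        # covers find, findAll, text, ...
        if attr == "_soup":
            raise AttributeError(attr)
        return getattr(self._soup, attr)

    # special methods bypass __getattr__, so forward the needed ones by hand
    def __len__(self):
        return len(self._soup)
    def __iter__(self):
        return iter(self._soup)

s = Scrape(["div", "span"])
print(s.find("div"))                    # div   (via __getattr__)
print(len(s))                           # 2     (via the explicit __len__)
print(list(s))                          # ['div', 'span']
```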