Error when using BeautifulSoup in Python: ValueError: invalid literal for int() with base 10: 'xBB'


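The code below runs fine on my machine, but it throws an error at the line

soup = BeautifulSoup(html)
when it is run on another machine. It parses the list of active NBA players on Yahoo Sports and stores their names and positions in a text file.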

from bs4 import BeautifulSoup
import urllib2

'''
scraping the labeled data from yahoo sports
'''
def scrape(filename):
    base_url = "http://sports.yahoo.com/nba/players?type=position&c=NBA&pos="
    positions = ['G', 'F', 'C']
    players = 0

    with open(filename, 'w') as names:
        for p in positions:
            html = urllib2.urlopen(base_url + p).read()
            soup = BeautifulSoup(html) #throws the error!
            table = soup.find_all('table')[9]
            cells = table.find_all('td')

            for i in xrange(4, len(cells) - 1, 3):
                names.write(cells[i].find('a').string + '\t' + p + '\n')
                players += 1

    print "...success! %r players downloaded." % players
The error it throws is:

Traceback (most recent call last):
  File "run_me.py", line 9, in <module>
    scrapenames.scrape('namelist.txt')
  File "/Users/brapse/Downloads/bball/scrapenames.py", line 15, in scrape
    soup = BeautifulSoup(html)
  File "/usr/local/Cellar/python/2.6.5/lib/python2.6/site-packages/bs4/__init__.py", line 100, in __init__
    self._feed()
  File "/usr/local/Cellar/python/2.6.5/lib/python2.6/site-packages/bs4/__init__.py", line 113, in _feed
    self.builder.feed(self.markup)
  File "/usr/local/Cellar/python/2.6.5/lib/python2.6/site-packages/bs4/builder/_htmlparser.py", line 46, in feed
    super(HTMLParserTreeBuilder, self).feed(markup)
  File "/usr/local/Cellar/python/2.6.5/lib/python2.6/HTMLParser.py", line 108, in feed
    self.goahead(0)
  File "/usr/local/Cellar/python/2.6.5/lib/python2.6/HTMLParser.py", line 171, in goahead
    self.handle_charref(name)
  File "/usr/local/Cellar/python/2.6.5/lib/python2.6/site-packages/bs4/builder/_htmlparser.py", line 58, in handle_charref
    self.handle_data(unichr(int(name)))
ValueError: invalid literal for int() with base 10: 'xBB'

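I believe this is a bug in the BS4 HTMLParser code: it crashes on the &#xBB; entity (which represents »), because it assumes the reference is in decimal. I suggest you update BeautifulSoup on that machine.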
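
To illustrate what the traceback shows: handle_charref passes the body of the reference straight to int(name), which works for the decimal form (187) but raises ValueError for the hexadecimal form (xBB). Below is a minimal sketch of decoding both forms. decode_charref is a hypothetical helper written for this example; it is not part of bs4 and is not necessarily how newer releases fixed the problem.

```python
# Hypothetical helper for illustration only (not part of bs4).
# It decodes the body of a numeric character reference, i.e. the
# text between "&#" and ";", for example "187" or "xBB".

try:
    unichr  # Python 2, as used in the question
except NameError:
    unichr = chr  # Python 3 has no unichr; chr handles all code points


def decode_charref(name):
    """Return the character for a decimal or hexadecimal reference body."""
    if name[:1] in ('x', 'X'):
        # Hexadecimal form, e.g. &#xBB; -> parse the digits in base 16.
        return unichr(int(name[1:], 16))
    # Decimal form, e.g. &#187; -> parse the digits in base 10.
    return unichr(int(name))
```

Calling decode_charref('xBB') and decode_charref('187') both return the » character, whereas calling int('xBB') directly raises the ValueError seen in the traceback.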
