Error when parsing table data with BeautifulSoup in Python

I need to populate entries from a table that I can locate in the HTML page with soup.findAll('table', {'id': 'taxHistoryTable'}). Now I need to build a loop over this soup, something like the following, to pull the values out of HTML like this:

<table id="taxHistoryTable" class="view-history responsive-table yui3-toggle-content-minimized ceilingless"><thead>
<tr><th class="year">Year</th>
<th class="numeric property-taxes">Property taxes</th>
<th class="numeric">Change</th><th class="numeric tax-assessment">Tax assessment</th>
<th class="numeric">Change</th></tr></thead><tfoot>
<tr><td colspan="5"><span class="yui3-toggle-content-link-block"><a href="#" class="yui3-toggle-content-link">
<span class="maximize">More</span><span class="minimize">Fewer</span></a></span></td>             </tr></tfoot><tbody>
<tr class="alt"><td>2011</td><td class="numeric">$489</td><td class="numeric"><span class="delta-value"><span class="inc">-81.8%</span></span></td>
<td class="numeric">$34,730</td>
<td class="numeric"><span class="delta-value"><span class="inc">-6.9%</span></span>   </td></tr><tr>
<td>2010</td><td class="numeric">$2,683</td><td class="numeric"><span class="delta-value"><span class="dec">177%</span></span></td><td class="numeric">$37,300</td><td class="numeric"><span class="delta-value"><span class="dec">98.7%</span></span></td></tr><tr class="alt"><td>2009</td><td class="numeric">$969</td><td class="numeric"><span class="delta-value">--</span></td><td class="numeric">$18,770</td><td class="numeric"><span class="delta-value">--</span></td></tr><tr class="minimize"><td>2008</td><td class="numeric">$0</td><td class="numeric"><span class="delta-value">--</span></td><td class="numeric">$18,770</td><td class="numeric"><span class="delta-value">--</span></td></tr></tbody></table>
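(For illustration only, here is a minimal sketch of such a loop over the static markup above; the string name html_snippet is an assumption, and it presumes BeautifulSoup 4, which the stripped_strings calls later in the code imply.)

from bs4 import BeautifulSoup

# html_snippet is assumed to hold the <table id="taxHistoryTable"> markup shown above
soup = BeautifulSoup(html_snippet)

table = soup.find('table', {'id': 'taxHistoryTable'})
for row in table.find('tbody').findAll('tr'):
    cells = row.findAll('td')   # year, property taxes, change, tax assessment, change
    year = cells[0].get_text(strip=True)
    taxes = cells[1].get_text(strip=True)
    assessment = cells[3].get_text(strip=True)
    print year, taxes, assessment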
The actual code I am working with is quite long, so I have only included the snippet (at the bottom of this post) that is specific to the error I am getting: "UnboundLocalError: local variable 'tax1' referenced before assignment".


Can someone help me understand how to assign these variables so that the values in them are still available after the loop completes?
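For reference, one way to keep the names defined even when findAll returns nothing is to bind them before the loop; a minimal sketch (the None defaults and the findAll-based indexing are assumptions, not the original code):

# Bind the names up front so they still exist if the loop body never runs
tax1 = percent1 = asses1 = percent2 = None
for form1 in soup.findAll('table', {'id': 'taxHistoryTable'}):
    for form2 in form1.findAll('tr', {'class': 'alt'}):
        cells = form2.findAll('td', {'class': 'numeric'})
        deltas = form2.findAll('span', {'class': 'inc'})
        if len(cells) >= 2:
            tax1, asses1 = cells[0], cells[1]
        if len(deltas) >= 2:
            percent1, percent2 = deltas[0], deltas[1]

With defaults like these, the later stripped_strings calls hit the AttributeError branch (and fall back to "") instead of raising UnboundLocalError when the table is missing.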

The elements you are trying to locate after zestimate, such as the tax table, are empty tags in the urllib2 response. In short, loop1 = soup.findAll('table', {'id': 'taxHistoryTable'}) will not find anything, because when the request is made with urllib2 or mechanize its parent is an empty div tag. So the code after that point is certainly not going to work.
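A quick way to confirm that is to check what the lookup actually returns for the urllib2 response; a minimal diagnostic sketch (houselink is the same URL variable as in the question):

import urllib2
from bs4 import BeautifulSoup

page = urllib2.urlopen(houselink).read()
soup = BeautifulSoup(page)

tables = soup.findAll('table', {'id': 'taxHistoryTable'})
print len(tables)   # 0 means the outer loop never runs, so tax1 is never assigned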

To get the full HTML source you see in your browser, you need a tool that can handle JavaScript and behave like a real browser; for that you can use ghost.py, PhantomJS, etc.
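For example, one common combination at the time was PhantomJS driven through Selenium (Selenium itself is not mentioned above; this is just a sketch of one way to get the rendered source, assuming the phantomjs binary is installed):

from selenium import webdriver
from bs4 import BeautifulSoup

driver = webdriver.PhantomJS()            # needs phantomjs on the PATH
driver.get(houselink)                     # same URL variable as in the question
rendered = driver.page_source             # HTML after JavaScript has run
driver.quit()

soup = BeautifulSoup(rendered)
print len(soup.findAll('table', {'id': 'taxHistoryTable'}))   # expected to be non-zero now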

By the way, since you are scraping Zillow, you would be better off checking out their API before unleashing your bot. Good luck.

Does 'tax1' appear earlier in your code? If so, could you post those lines? Also, when you get this error message, are you sure your code actually reaches the line where 'tax1' is defined?
        page = urllib2.urlopen(houselink).read() #opening link
        soup = BeautifulSoup(page) #parsing link
        address = soup.find('h1',{'class':'prop-addr'}) #finding html address of house address
        price = soup.find('h2',{'class':'prop-value-price'}) #finding html address of price info, find used to find only instance of price
        price1 = price.find('span',{'class':'value'}) #Had to do this as price address was not unique at granular level, used upper level to identify it
        #Price address was not unique because of presence of Zestimate price also on page
        bedroom = soup.findAll('span',{'class':'prop-facts-value'})[0]
        bathroom = soup.findAll('span',{'class':'prop-facts-value'})[1]
        #zestimate
        zestimate = soup.findAll('td',{'class':'zestimate'})[1]
        #tax
        loop1 = soup.findAll('table',{'id':'taxHistoryTable'})
        for form1 in loop1:
            loop2=form1.findAll('tr',{'class':'alt'})
            for form2 in loop2:
                #year1=form2.find('td')[0]
                tax1=form2.find('td',{'class':'numeric'})[0]
                percent1=form2.find('span',{'class':'inc'})[0]
                asses1=form2.find('td',{'class':'numeric'})[1]
                precent2=form2.find('span',{'class':'inc'})[1]
        try:
            q_cleaned = unicode(u' '.join(zestimate.stripped_strings)).encode('utf8').strip()
        except AttributeError:
            q_cleaned = ""
        try:
            r_cleaned = unicode(u' '.join(tax1.stripped_strings)).encode('utf8').strip()
        except AttributeError:
            r_cleaned = ""
        try:
            s_cleaned = unicode(u' '.join(percent1.stripped_strings)).encode('utf8').strip()
        except AttributeError:
            s_cleaned = ""
        try:
            t_cleaned = unicode(u' '.join(asses1.stripped_strings)).encode('utf8').strip()
        except AttributeError:
            t_cleaned = ""
        try:
            u_cleaned = unicode(u' '.join(percent2.stripped_strings)).encode('utf8').strip()
        except AttributeError:
            u_cleaned = ""

        spamwriter.writerow([a_cleaned,b_cleaned,d_cleaned,e_cleaned,f_cleaned,g_cleaned,h_cleaned,i_cleaned,j_cleaned,k_cleaned,l_cleaned,m_cleaned,n_cleaned,o_cleaned,p_cleaned,coordinates,q_cleaned,r_cleaned,s_cleaned,t_cleaned,u_cleaned]) #writing row for that address price combination