Python: BeautifulSoup and scraping tables

Tags: python, beautifulsoup

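I have a page that contains several tables. I am trying to get the table named "TabBox", but my code seems to grab a table named "TabBox2" instead. Any ideas?

A "TabBox2" table wraps the two tables. The search seems to return the first instance matching "TabBox", regardless of whether the name is "TabBox2" or "TabBox".

Edit: rows2 should be = table2.find

Thanks, Brainiac.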

       <br />
       <table cellspacing="0" cellpadding="4" border="1" class="GroupBox1">
          <tbody><tr>
            <th><h3>Completion Information</h3></th>
          </tr>
          <tr>
            <td><table width="578" cellspacing="0" cellpadding="4" border="1" class="GroupBox3">
              <tbody><tr>
                <th width="31%">Well Status Code</th>
                <th width="17%" nowrap="nowrap"><div align="center"><strong>Spud Date</strong></div></th>
                <th width="28%" nowrap="nowrap"><div align="center">Drilling Completed</div></th>
                <th width="24%" nowrap="nowrap"><div align="center">Surface Casing Date</div></th>
              </tr>
              <tr>
                <td nowrap="nowrap">W - Final Completion</td>
                <td><div align="center">12/08/2011</div></td>
                <td><div align="center">02/14/2012</div></td>
                <td><div align="center">12/09/2011</div></td>
              </tr>
            </tbody></table></td>
          </tr>

          <tr>
            <td><table cellspacing="0" cellpadding="4" border="1" class="TabBox">
              <tbody><tr>
                <th width="155" nowrap="nowrap">Field Name</th>
                <th width="142" nowrap="nowrap">Completed Well Type</th>
                <th width="108" nowrap="nowrap"><div align="center">Completed Date</div></th>
                <th width="133" nowrap="nowrap"><div align="center">Validated Date</div></th>
              </tr>

               <tr>
                <td nowrap="nowrap">
                   WOLFBONE (TREND AREA)
                </td>
                <td nowrap="nowrap"><div align="center">Oil</div>
                </td>
                <td nowrap="nowrap"><div align="center">02/14/2012</div>
                </td>
                <td nowrap="nowrap"><div align="center">06/04/2013</div>
                </td>
               </tr>

            </tbody></table>
           </td>
          </tr>

        </tbody></table>
       <br />

Try the following:

from bs4 import BeautifulSoup

html = r"""
      <br />
       <table cellspacing="0" cellpadding="4" border="1" class="GroupBox1">
          <tbody><tr>
            <th><h3>Completion Information</h3></th>
          </tr>
          <tr>
            <td><table width="578" cellspacing="0" cellpadding="4" border="1" class="GroupBox3">
              <tbody><tr>
                <th width="31%">Well Status Code</th>
                <th width="17%" nowrap="nowrap"><div align="center"><strong>Spud Date</strong></div></th>
                <th width="28%" nowrap="nowrap"><div align="center">Drilling Completed</div></th>
                <th width="24%" nowrap="nowrap"><div align="center">Surface Casing Date</div></th>
              </tr>
              <tr>
                <td nowrap="nowrap">W - Final Completion</td>
                <td><div align="center">12/08/2011</div></td>
                <td><div align="center">02/14/2012</div></td>
                <td><div align="center">12/09/2011</div></td>
              </tr>
            </tbody></table></td>
          </tr>

          <tr>
            <td><table cellspacing="0" cellpadding="4" border="1" class="TabBox">
              <tbody><tr>
                <th width="155" nowrap="nowrap">Field Name</th>
                <th width="142" nowrap="nowrap">Completed Well Type</th>
                <th width="108" nowrap="nowrap"><div align="center">Completed Date</div></th>
                <th width="133" nowrap="nowrap"><div align="center">Validated Date</div></th>
              </tr>

               <tr>
                <td nowrap="nowrap">
                   WOLFBONE (TREND AREA)
                </td>
                <td nowrap="nowrap"><div align="center">Oil</div>
                </td>
                <td nowrap="nowrap"><div align="center">02/14/2012</div>
                </td>
                <td nowrap="nowrap"><div align="center">06/04/2013</div>
                </td>
               </tr>

            </tbody></table>
           </td>
          </tr>

        </tbody></table>
       <br />
"""
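# Name the parser explicitly so the result does not depend on which parsers are installed.
soup = BeautifulSoup(html, "html.parser")
# Collect every <table> whose class is exactly "TabBox".
tab_box = soup.find_all('table', {'class': 'TabBox'})

for var in tab_box:
    print(var)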
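Printing the tag shows the raw markup. If the goal is the cell values, one possible next step is to walk the rows of the matched table and collect the text of each cell. The sketch below is only an illustration: `SAMPLE_HTML` is a trimmed-down stand-in for the page in the question, and `table_rows` is a hypothetical helper name, not part of BeautifulSoup.

```python
from bs4 import BeautifulSoup

# Trimmed-down stand-in for the question's page (assumption: same class names).
SAMPLE_HTML = """
<table class="GroupBox1">
  <tr><td><table class="GroupBox3">
    <tr><th>Well Status Code</th><th>Spud Date</th></tr>
    <tr><td>W - Final Completion</td><td>12/08/2011</td></tr>
  </table></td></tr>
  <tr><td><table class="TabBox">
    <tr><th>Field Name</th><th>Completed Well Type</th></tr>
    <tr><td> WOLFBONE (TREND AREA) </td><td><div align="center">Oil</div></td></tr>
  </table></td></tr>
</table>
"""


def table_rows(table):
    """Hypothetical helper: return each <tr> of `table` as a list of stripped cell strings."""
    return [
        [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
        for tr in table.find_all("tr")
    ]


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
tab_box = soup.find("table", class_="TabBox")  # first table whose class is "TabBox"
rows = table_rows(tab_box)
print(rows)
```

Note that `find_all("tr")` searches recursively, so calling the helper on an outer wrapper table would also return the rows of the tables nested inside it. Calling it on the innermost table, as here, avoids that.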
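Comments:

Is TabBox the table's id or its name? Also, where are your sample HTML and code?

@Games Brainiac: This is what I have. Again, it is "wrapped" in a "TabBox2" header. table2 = soup.find("table", {"class": "TabBox"}) rows2 = table.find_all("tr")

I'm a bit lost, but I posted something. See if it works for you.

Thanks. I was looking up the rows from the original table instead of the second table. Sometimes I just can't see the most obvious things. You're the man!

Thanks again. I was using table = soup.find("table", {"class": "GroupBox3"}) rows = table.find_all("tr") table2 = soup.find("table", {"class": "TabBox"}) rows2 = table.find_all("tr") instead of table = soup.find("table", {"class": "GroupBox3"}) rows = table.find_all("tr") table2 = soup.find("table", {"class": "TabBox"}) rows2 = table2.find_all("tr"). I don't know how to add "\n" line breaks in comments.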
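The fix described in the comments can be sketched as follows. The markup here is a hypothetical, shortened page with a "TabBox2" wrapper, since the full page from the question is not shown. It also illustrates that a class search for "TabBox" does not match a table whose class is "TabBox2", so the lookup itself was not the problem. The rows were simply taken from the wrong variable.

```python
from bs4 import BeautifulSoup

# Hypothetical page: a "TabBox2" wrapper around "GroupBox3" and "TabBox" tables.
SAMPLE_HTML = """
<table class="TabBox2">
  <tr><td><table class="GroupBox3">
    <tr><th>Spud Date</th></tr>
    <tr><td>12/08/2011</td></tr>
  </table></td></tr>
  <tr><td><table class="TabBox">
    <tr><th>Field Name</th></tr>
    <tr><td>WOLFBONE (TREND AREA)</td></tr>
  </table></td></tr>
</table>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

table = soup.find("table", {"class": "GroupBox3"})
rows = table.find_all("tr")

table2 = soup.find("table", {"class": "TabBox"})
rows2 = table2.find_all("tr")  # the buggy version called table.find_all("tr") here

# First cell of every row, to show which table each row list came from.
first_cells = [tr.find(["th", "td"]).get_text(strip=True) for tr in rows]
second_cells = [tr.find(["th", "td"]).get_text(strip=True) for tr in rows2]
matched_classes = table2["class"]  # class list of the table that was actually matched

print(first_cells)
print(second_cells)
print(matched_classes)
```

With `rows2` taken from `table2`, the second list holds the "TabBox" rows rather than a repeat of the "GroupBox3" rows.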