
Difficulty scraping a table (Python, BeautifulSoup)

python python-3.x


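I am struggling to scrape the table from this website: http://www.espn.com/mlb/lines

Specifically, I am trying to scrape the "Run Line" column of the "Westgate" row for every game listed in the table.

I'm not sure what I'm doing wrong. I'm just trying to drill down to the text in the table, which, from my limited understanding of web scraping, would be the second table inside the "oddrow" rows I selected.

I have tried searching for my problem, but I had trouble applying the suggested solutions to my specific scenario.

Thanks in advance for your help.

Here is my code so far: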

import time

from bs4 import BeautifulSoup
from selenium import webdriver

url='http://www.espn.com/mlb/lines'
driver = webdriver.Chrome()
driver.get(url)
time.sleep(5)  # give the page time to render before reading the HTML
content=driver.page_source

soup=BeautifulSoup(content,'lxml')

driver.quit()

table=soup.find('table',{'class':'tablehead'})
table_row=table.find_all('tr',{'class':'oddrow'})
# trying to scrape only the second table within this row, i.e. the Westgate and Run Line table
table_data=table_row.find_all('table',{'class':'tablehead'})[1]

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-397-fea09cb40cb2> in <module>()
----> 1 table_data=table_row.find_all('table',{'class':'tablehead'})

~\Anaconda3\lib\site-packages\bs4\element.py in __getattr__(self, key)
   1805     def __getattr__(self, key):
   1806         raise AttributeError(
-> 1807             "ResultSet object has no attribute '%s'. You're probably treating a list of items like a single item. Did you call find_all() when you meant to call find()?" % key
   1808         )

AttributeError: ResultSet object has no attribute 'find_all'. You're probably treating a list of items like a single item. Did you call find_all() when you meant to call find()?
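The error message is literal: `find_all()` returns a `ResultSet`, which behaves like a list of rows, so `find_all()` has to be called on each row separately rather than on the whole result. A minimal sketch, run against a small hypothetical HTML snippet (not the real ESPN markup), assuming each `oddrow` contains nested `tablehead` tables as the question describes:

```python
from bs4 import BeautifulSoup

# Hypothetical, heavily simplified markup: each 'oddrow' holds two nested
# 'tablehead' tables. The real ESPN page structure may differ.
html = """
<table class="tablehead">
  <tr class="oddrow"><td>
    <table class="tablehead"><tr><td>Game 1 info</td></tr></table>
    <table class="tablehead"><tr><td>Westgate</td><td>-1.5</td></tr></table>
  </td></tr>
  <tr class="oddrow"><td>
    <table class="tablehead"><tr><td>Game 2 info</td></tr></table>
    <table class="tablehead"><tr><td>Westgate</td><td>+1.5</td></tr></table>
  </td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
table = soup.find("table", {"class": "tablehead"})  # first match is the outer table

second_tables = []
for row in table.find_all("tr", {"class": "oddrow"}):
    # 'row' is a single Tag, so find_all() is available on it
    inner = row.find_all("table", {"class": "tablehead"})
    if len(inner) > 1:  # skip rows that lack a second nested table
        second_tables.append(inner[1])

for t in second_tables:
    print(t.get_text(" ", strip=True))
```

The `len(inner) > 1` guard avoids an `IndexError` on rows that do not follow the assumed layout.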

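I believe the code below gives the output you want. There may be a better way, but I used a nested loop: the inner loop increments i until it equals 3, because the cell you want sits at that same position in every row, and then I increment oddrowindex, so each pass of the outer loop prints the Run Line column from the Westgate row: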

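from bs4 import BeautifulSoup
from selenium import webdriver

url='http://www.espn.com/mlb/lines'
driver = webdriver.Chrome()
driver.get(url)
content=driver.page_source
driver.quit()  # the HTML has been captured, so the browser can close now

soup=BeautifulSoup(content,'lxml')
rows=soup.find_all('tr',{'class':'oddrow'})  # find the rows once, not on every pass

oddrowindex = 0
# stop after 70 rows as before, or earlier if the page has fewer rows
while oddrowindex < min(70, len(rows)):
    i = 0
    table_row=rows[oddrowindex]
    for td in table_row:
        if i == 3:
            print(td.text)  # the Run Line cell
        i = i + 1
    oddrowindex = oddrowindex + 1  # advance once per row, not once per cell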
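On how `i` lines up with the cells: `for td in table_row` walks every child of the row, and those children can include the whitespace strings between tags, not only `<td>` elements. The position that `i == 3` lands on therefore depends on how the HTML is formatted and parsed. A small sketch with a hypothetical row (made-up values, parsed with `html.parser`) shows the difference between raw children and `find_all('td', recursive=False)`:

```python
from bs4 import BeautifulSoup

# Hypothetical row with whitespace between cells, as real pages usually have
html = """
<table>
  <tr class="oddrow">
    <td>Westgate</td>
    <td>-150</td>
    <td>8.5</td>
    <td>-1.5</td>
  </tr>
</table>
"""

row = BeautifulSoup(html, "html.parser").find("tr", {"class": "oddrow"})

# Iterating the Tag yields every child, including whitespace-only strings
children = list(row)
# Only direct <td> children: the index now counts cells, not formatting
cells = row.find_all("td", recursive=False)

print(len(children), len(cells))
print(children[3].get_text(strip=True), cells[3].get_text(strip=True))
```

Here index 3 of the raw children is the second cell, while index 3 of `cells` is the fourth, so counting only `<td>` elements makes the position independent of whitespace.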

Sample output:

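Hi, I'm digging into your solution but having trouble understanding it. Two questions for you: 1) How can you skip the oddrows we don't want, such as William Hill and CG Technology? 2) I only see two HTML table classes under each oddrow; in this case, is the table the one with "text-align:center"? Also, how does the i variable know that you are referencing exactly these tables?

The i variable is incremented for every td it reads inside the tr with class oddrow, and when it reaches the Run Line position (i == 3) it prints that cell.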
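For question 1 (skipping sportsbooks such as William Hill and CG Technology), one option is to check the first cell of each row and keep only the rows you want. This is a sketch under loudly stated assumptions: the markup is hypothetical, each `oddrow` is assumed to be one sportsbook whose first cell holds its name, the last cell is assumed to be the Run Line, and `run_lines_for` is a hypothetical helper name. Check the real page structure before relying on it.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: one 'oddrow' per sportsbook, name in the first cell,
# Run Line assumed to be the last cell. Not the real ESPN layout.
html = """
<table class="tablehead">
  <tr class="oddrow"><td>Westgate</td><td>-150</td><td>-1.5</td></tr>
  <tr class="oddrow"><td>William Hill</td><td>-145</td><td>-1.5</td></tr>
  <tr class="oddrow"><td>CG Technology</td><td>-148</td><td>-1.5</td></tr>
  <tr class="oddrow"><td>Westgate</td><td>+120</td><td>+1.5</td></tr>
</table>
"""


def run_lines_for(soup, book="Westgate", column=-1):
    """Hypothetical helper: return the text of cell `column` for every
    oddrow whose first cell equals `book`."""
    results = []
    for row in soup.find_all("tr", {"class": "oddrow"}):
        cells = row.find_all("td", recursive=False)
        if cells and cells[0].get_text(strip=True) == book:
            results.append(cells[column].get_text(strip=True))
    return results


soup = BeautifulSoup(html, "html.parser")
print(run_lines_for(soup))
```

Filtering on the cell text rather than on a fixed row index keeps working if the page adds or reorders sportsbooks.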