Web scraping with Python: can't access td elements

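I am trying to scrape the following page: https://www.pro-football-reference.com/boxscores/

It is a page of American football box scores. I want the date, the winner, and the loser of each game. I have no trouble getting the date, but I can't figure out how to isolate and extract the names of the winning and losing teams. Here is what I have so far: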

from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup


#assigning url
my_url = 'https://www.pro-football-reference.com/boxscores/'

# opening up connection, grabbing the page
uClient = uReq(my_url)
page_html = uClient.read()
uClient.close()

# html parsing
page_soup = soup(page_html,"html.parser")

games = page_soup.findAll("div",{"class":"game_summary expanded nohover"})


for game in games:
    date_block = game.findAll("tr",{"class":"date"})
    date_val = date_block[0].text
    winner_block = game.findAll("tr",{"class":"winner"})
    #here I need a line that returns the game winner, e.g. "Philadelphia Eagles"
    loser = game.findAll("tr",{"class":"loser"})
Here is the relevant HTML:

<div class="game_summary expanded nohover">
<table class="teams">
    <tbody>
        <tr class="date">
            <td colspan="3">Sep 6, 2018</td>
        </tr>
        <tr class="loser">
            <td><a href="/teams/atl/2018.htm">Atlanta Falcons</a></td>
            <td class="right">12</td>
            <td class="right gamelink">
                <a href="/boxscores/201809060phi.htm">Final</a>
            </td>
        </tr>
        <tr class="winner">
            <td><a href="/teams/phi/2018.htm">Philadelphia Eagles</a></td>
            <td class="right">18</td>
            <td class="right">
            </td>
        </tr>
    </tbody>
</table>
<table class="stats">
    <tbody>
        <tr>
            <td><strong>PassYds</strong></td>
            <td><a href="/players/R/RyanMa00.htm" title="Matt Ryan">Ryan</a>-ATL</td>
            <td class="right">251</td>
        </tr>
        <tr>
            <td><strong>RushYds</strong></td>
            <td><a href="/players/A/AjayJa00.htm" title="Jay Ajayi">Ajayi</a>-PHI</td>
            <td class="right">62</td>
        </tr>
        <tr>
            <td><strong>RecYds</strong></td>
            <td><a href="/players/J/JoneJu02.htm" title="Julio Jones">Jones</a>-ATL</td>
            <td class="right">169</td>
        </tr>
    </tbody>
</table>


I get an error saying that the ResultSet object has no attribute 'td'. Any help would be appreciated.

Your code looks pretty much correct:

import bs4

html = ''' ... '''  # the HTML snippet from the question
soup = bs4.BeautifulSoup(html, 'lxml')  # 'html.parser' works as well
print([elem.text for elem in soup.find_all('tr', {'class': 'loser'})])
# ['\nAtlanta Falcons\n12\n\nFinal\n\n']

What exactly is going wrong?
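For what it's worth, the error message in the question is most likely explained by the fact that findAll (find_all) returns a ResultSet, i.e. a list of matching tags, rather than a single tag, so attribute access such as .td has to happen on one element of that list. The line that raised the error is not shown in the question, so this is a guess. A minimal sketch on a trimmed copy of the HTML above; the variable names are my own:

```python
from bs4 import BeautifulSoup

# Trimmed copy of the markup from the question.
html = """
<table class="teams"><tbody>
<tr class="date"><td colspan="3">Sep 6, 2018</td></tr>
<tr class="loser"><td><a href="/teams/atl/2018.htm">Atlanta Falcons</a></td><td class="right">12</td></tr>
<tr class="winner"><td><a href="/teams/phi/2018.htm">Philadelphia Eagles</a></td><td class="right">18</td></tr>
</tbody></table>
"""
page = BeautifulSoup(html, "html.parser")

# find_all returns a ResultSet: a list of Tag objects, not a Tag.
rows = page.find_all("tr", {"class": "winner"})
try:
    rows.td  # attribute access on the list itself
    failed = False
except AttributeError:
    failed = True

# find returns a single Tag (or None when nothing matches).
winner_row = page.find("tr", {"class": "winner"})
winner_name = winner_row.td.a.text  # first <td>, then the <a> inside it
print(failed, winner_name)
```

Indexing the ResultSet (rows[0].td) works just as well as find here; find only saves the index and returns None instead of an empty list when there is no match.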

You can anchor your search at the game_summaries div:

import requests, json
from bs4 import BeautifulSoup as soup
d = soup(requests.get('https://www.pro-football-reference.com/boxscores/').text, 'html.parser')
def get_data(_soup_obj, _headers):
  _d = [(lambda x:[c.text for c in x.find_all('td')] if x is not None else [])(_soup_obj.find(a, {'class':b})) for a, b in _headers]
  if all(_d):
    [date], [t1, val, _], [t2, val2, _] = _d
    return {'date':date, 'winner':{'team':t1, 'score':int(val)}, 'loser':{'team':t2, 'score':int(val2)}}
  return {}

headers = [['tr', 'date'], ['tr', 'winner'], ['tr', 'loser']]
games = [get_data(i, headers) for i in d.find('div', {'class':'game_summaries'}).find_all('div', {'class':'game_summary'})]
print(json.dumps(games, indent=4))
Output:

[
  {
    "date": "Sep 6, 2018",
    "winner": {
        "team": "Philadelphia Eagles",
        "score": 18
    },
    "loser": {
        "team": "Atlanta Falcons",
        "score": 12
    }
 },
  {
    "date": "Sep 9, 2018",
    "winner": {
        "team": "New England Patriots",
        "score": 27
    },
    "loser": {
        "team": "Houston Texans",
        "score": 20
    }
 },
 {
    "date": "Sep 9, 2018",
    "winner": {
        "team": "Tampa Bay Buccaneers",
        "score": 48
    },
    "loser": {
        "team": "New Orleans Saints",
        "score": 40
    }
 },
 {
    "date": "Sep 9, 2018",
    "winner": {
        "team": "Minnesota Vikings",
        "score": 24
    },
    "loser": {
        "team": "San Francisco 49ers",
        "score": 16
    }
 },
 {
    "date": "Sep 9, 2018",
    "winner": {
        "team": "Miami Dolphins",
        "score": 27
    },
    "loser": {
        "team": "Tennessee Titans",
        "score": 20
    }
},
{
    "date": "Sep 9, 2018",
    "winner": {
        "team": "Cincinnati Bengals",
        "score": 34
    },
    "loser": {
        "team": "Indianapolis Colts",
        "score": 23
    }
},
{},
{
    "date": "Sep 9, 2018",
    "winner": {
        "team": "Baltimore Ravens",
        "score": 47
    },
    "loser": {
        "team": "Buffalo Bills",
        "score": 3
    }
},
{
    "date": "Sep 9, 2018",
    "winner": {
        "team": "Jacksonville Jaguars",
        "score": 20
    },
    "loser": {
        "team": "New York Giants",
        "score": 15
    }
},
{
    "date": "Sep 9, 2018",
    "winner": {
        "team": "Kansas City Chiefs",
        "score": 38
    },
    "loser": {
        "team": "Los Angeles Chargers",
        "score": 28
    }
},
{
    "date": "Sep 9, 2018",
    "winner": {
        "team": "Denver Broncos",
        "score": 27
    },
    "loser": {
        "team": "Seattle Seahawks",
        "score": 24
    }
},
{
    "date": "Sep 9, 2018",
    "winner": {
        "team": "Washington Redskins",
        "score": 24
    },
    "loser": {
        "team": "Arizona Cardinals",
        "score": 6
    }
},
{
    "date": "Sep 9, 2018",
    "winner": {
        "team": "Carolina Panthers",
        "score": 16
    },
    "loser": {
        "team": "Dallas Cowboys",
        "score": 8
    }
},
{
    "date": "Sep 9, 2018",
    "winner": {
        "team": "Green Bay Packers",
        "score": 24
    },
    "loser": {
        "team": "Chicago Bears",
        "score": 23
    }
},
{
    "date": "Sep 10, 2018",
    "winner": {
        "team": "New York Jets",
        "score": 48
    },
    "loser": {
        "team": "Detroit Lions",
        "score": 17
    }
},
{
    "date": "Sep 10, 2018",
    "winner": {
        "team": "Los Angeles Rams",
        "score": 33
    },
    "loser": {
        "team": "Oakland Raiders",
        "score": 13
     }
  }
]

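Watch out for tied games. I think that is what causes your error: in a tie there is no winner, so you won't find a row with the winner class. The code below prints the date and the winner of each game: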

for game in games:
    date_block = game.find('tr',{'class':'date'})
    date_val = date_block.text
    winner_block = game.find('tr',{'class':'winner'})
    if winner_block:
        winner = winner_block.find('a').text
        print(date_val)
        print(winner)
    loser = game.findAll('tr',{'class':'loser'})
Output:

Sep 6, 2018
Philadelphia Eagles
Sep 9, 2018
New England Patriots
Sep 9, 2018
Tampa Bay Buccaneers
Sep 9, 2018
Minnesota Vikings
Sep 9, 2018
Miami Dolphins
Sep 9, 2018
Cincinnati Bengals
Sep 9, 2018
Baltimore Ravens
Sep 9, 2018
Jacksonville Jaguars
Sep 9, 2018
Kansas City Chiefs
Sep 9, 2018
Denver Broncos
Sep 9, 2018
Washington Redskins
Sep 9, 2018
Carolina Panthers
Sep 9, 2018
Green Bay Packers
Sep 10, 2018
New York Jets
Sep 10, 2018
Los Angeles Rams

You are probably running into this week's tie: the Pittsburgh-Cleveland game has no winner. Running this should print every game, including ties:

for game in games:
    date_block = game.findAll("tr", {"class": "date"})
    date_val = date_block[0].text
    print("Game Date: %s" % date_val)
    # Test if a winner is defined
    if game.find("tr", {"class": "winner"}) is not None:
        winner_block = game.findAll("tr", {"class": "winner"})
        # Get the winner from the first TD and print text only
        winner = winner_block[0].findAll("td")
        print("Winner: %s" % winner[0].get_text())

        loser_block = game.findAll("tr", {"class": "loser"})
        # Get the loser from the first TD and print text only
        loser = loser_block[0].findAll("td")
        print("Loser: %s" % loser[0].get_text())
    else:
        # If no winner is listed, it must be a tie. Get both teams and print them.
        print("It's a tie!")
        draw_block = game.findAll("tr", {"class": "draw"})
        for team in draw_block:
            print("Draw : %s" % team.findAll("td")[0].get_text())
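The same idea can be tried without hitting the site. The sketch below is only an illustration: the SAMPLE markup is hand-written (modelled on the snippet in the question, with a tie block that assumes the draw class used above), and summarize_game is a made-up helper name, not part of BeautifulSoup.

```python
from bs4 import BeautifulSoup

# Hand-written sample markup (an assumption, modelled on the question's snippet).
SAMPLE = """
<div class="game_summaries">
  <div class="game_summary expanded nohover">
    <table class="teams"><tbody>
      <tr class="date"><td colspan="3">Sep 6, 2018</td></tr>
      <tr class="loser"><td><a href="/teams/atl/2018.htm">Atlanta Falcons</a></td><td class="right">12</td></tr>
      <tr class="winner"><td><a href="/teams/phi/2018.htm">Philadelphia Eagles</a></td><td class="right">18</td></tr>
    </tbody></table>
  </div>
  <div class="game_summary expanded nohover">
    <table class="teams"><tbody>
      <tr class="date"><td colspan="3">Sep 9, 2018</td></tr>
      <tr class="draw"><td><a href="#">Pittsburgh Steelers</a></td></tr>
      <tr class="draw"><td><a href="#">Cleveland Browns</a></td></tr>
    </tbody></table>
  </div>
</div>
"""


def summarize_game(game):
    """Return (date, outcome, teams) for one game_summary block.

    For a decided game, teams is [winner, loser]; for a tie, both teams.
    """
    date = game.find("tr", {"class": "date"}).get_text(strip=True)
    winner = game.find("tr", {"class": "winner"})
    if winner is not None:
        loser = game.find("tr", {"class": "loser"})
        teams = [winner.td.get_text(strip=True), loser.td.get_text(strip=True)]
        return date, "decided", teams
    # No winner row: treat the game as a tie and collect every draw row.
    draws = game.find_all("tr", {"class": "draw"})
    return date, "tie", [row.td.get_text(strip=True) for row in draws]


page = BeautifulSoup(SAMPLE, "html.parser")
results = [summarize_game(g) for g in page.find_all("div", {"class": "game_summary"})]
for date, outcome, teams in results:
    print(date, outcome, teams)
```

Matching on the single class game_summary, rather than the full string "game_summary expanded nohover", keeps the search working if the extra classes on the div change.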

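It turned out a tied game was messing things up. Of course there were no "winner" and "loser" rows for it. Thank you very much, that really was the problem.

Haha, glad I could help.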