Python web scraping: how to handle NA values in the returned DataFrame?
I am scraping a web page and I get an "object has no attribute" error.
Some matches have no score, so no value is returned for them. I know that this is what triggers the error.
My code works until it reaches a match with no score available, and then it raises the error.
import pandas as pd
from selenium import webdriver
from bs4 import BeautifulSoup as bs

browser = webdriver.Chrome()


class GameData:
    def __init__(self):
        self.date = []
        self.time = []
        self.country = []
        self.league = []
        self.game = []
        self.home_odds = []
        self.draw_odds = []
        self.away_odds = []

    def append(self, score):
        pass


def get_urls(browser, landing_page):
    browser.get(landing_page)
    urls = [i.get_attribute('href') for i in
            browser.find_elements_by_css_selector(
                '.next-games-date > a:nth-child(1), .next-games-date > a:nth-child(n+3)')]
    return urls


def parse_data(html):
    df = pd.read_html(html, header=0)[0]
    html = browser.page_source
    soup = bs(html, "lxml")
    cont = soup.find('div', {'id': 'wrap'})
    content = cont.find('div', {'id': 'col-content'})
    content = content.find('table', {'class': 'table-main'}, {'id': 'table-matches'})
    main = content.find('th', {'class': 'first2 tl'})
    if main is None:
        return None
    count = main.findAll('a')
    country = count[0].text
    game_data = GameData()
    for row in df.itertuples():
        if not isinstance(row[1], str):
            continue
        elif ':' not in row[1]:
            country = row[1].split('»')[0]
            continue
        game_time = row[1]
        game_date = row[1].split('-')[0]
        score = row[3]  # The error happens here. How do I construct 'if NA then NaN?'
        game_data.date.append(game_date)
        game_data.time.append(game_time)
        game_data.country.append(country)
        game_data.league.append(count[1].text)
        game_data.game.append(row[2])
        game_data.score.append(score)  # This should be score if available else NaN
        game_data.home_odds.append(row[4])
        game_data.draw_odds.append(row[5])
        game_data.away_odds.append(row[6])
    return game_data


if __name__ == '__main__':
    start_url = "https://www.oddsportal.com/matches/soccer/"
    urls = []
    browser = webdriver.Chrome()
    results = None
    urls = get_urls(browser, start_url)
    urls.insert(0, start_url)
    for number, url in enumerate(urls):
        if number > 0:
            browser.get(url)
        html = browser.page_source
        game_data = parse_data(html)
        if game_data is None:
            continue
        result = pd.DataFrame(game_data.__dict__)
        if results is None:
            results = result
        else:
            results = results.append(result, ignore_index=True)
Error:
Traceback (most recent call last):
  File "C:/Users/harsh/AppData/Roaming/JetBrains/PyCharmCE2021.1/scratches/scratch_16.py", line 98, in <module>
    game_data = parse_data(html)
  File "C:/Users/harsh/AppData/Roaming/JetBrains/PyCharmCE2021.1/scratches/scratch_16.py", line 75, in parse_data
    game_data.score.append(score)
AttributeError: 'GameData' object has no attribute 'score'
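How do I add an "if NA then NaN, else use the score" condition here?

You can use the hasattr function to check whether an object has a given attribute. It takes two arguments: the first is the object itself, and the second is the name of the attribute to look for, given as a string. In your case it would look like this: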
if hasattr(game_data, 'score'):
    game_data.score.append(score)
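To make the behavior of hasattr concrete, here is a minimal sketch. The classes WithoutScore and WithScore and the helper append_score_if_possible are hypothetical stand-ins introduced only for illustration; they are not part of the question's code. Note that the guard only avoids the exception: on an object without a score list, the value is silently skipped rather than stored.

```python
class WithoutScore:
    """Hypothetical stand-in that, like the question's GameData, has no 'score' list."""

    def __init__(self):
        self.game = []


class WithScore:
    """Hypothetical stand-in that does define a 'score' list."""

    def __init__(self):
        self.game = []
        self.score = []


def append_score_if_possible(obj, score):
    """Append score only when obj has a 'score' attribute; report whether it was stored."""
    if hasattr(obj, 'score'):
        obj.score.append(score)
        return True
    return False


print(append_score_if_possible(WithoutScore(), '2:1'))  # False: nothing was stored
print(append_score_if_possible(WithScore(), '2:1'))     # True: the score was appended
```

If the score column is actually wanted in the final DataFrame, the guard alone is probably not enough, because the skipped values never reach the result.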
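Problem
Your code has several problems. The one causing this error is that the score attribute is never defined in the __init__ method of the GameData class, so game_data.score does not exist when parse_data tries to append to it.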
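One possible way to address this is sketched below: define the score list in __init__ and normalize missing values to NaN before appending. The helper score_or_nan is a hypothetical name introduced here, and the sketch assumes that empty score cells arrive either as None or as a float NaN (which is how pandas usually represents empty table cells); it uses only the standard library so it can be run on its own.

```python
import math


class GameData:
    """Same columns as in the question, plus the missing 'score' list."""

    def __init__(self):
        self.date = []
        self.time = []
        self.country = []
        self.league = []
        self.game = []
        self.score = []  # the attribute the original class never defined
        self.home_odds = []
        self.draw_odds = []
        self.away_odds = []


def score_or_nan(value):
    """Return value unchanged, or NaN when the cell is missing (None or NaN)."""
    if value is None:
        return math.nan
    if isinstance(value, float) and math.isnan(value):
        return math.nan
    return value


# Simulated cell values: one real score and two kinds of missing value.
game_data = GameData()
for raw_score in ['2:1', float('nan'), None]:
    game_data.score.append(score_or_nan(raw_score))

print(game_data.score)  # ['2:1', nan, nan]
```

In parse_data this would correspond to game_data.score.append(score_or_nan(row[3])). Appending NaN instead of skipping the row keeps every list the same length, which pd.DataFrame(game_data.__dict__) relies on.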