Python web scraping with Beautiful Soup (sports data)


python web-scraping


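When I try to run this code, I get two errors.

1: I can't scrape the data for name_text correctly.

2: I get an indentation error on team = name_text.div.text. I know this is probably easy to fix, but I've tried different indentations and nothing seems to work.

From this site I want to get the team names and the odds: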

<div class="size14_f7opyze Endeavour_fhudrb0 medium_f1wf24vo participantText_fivg86r" data-automation-id="participant-one">Orlando Magic</div>
<div class="priceText_f71sibe"><span class="size14_f7opyze medium_f1wf24vo priceTextSize_frw9zm9" data-automation-id="price-text">5.85</span></div>

Any help would be great. Cheers.

Your for loop is not indented correctly. The correct indentation is:

for price_text in price_texts:
    team = name_text.div.text
    odds = price_text.span.text
    team = name_text.div.text
    odds = price_text.span.text
    print(odds)
    print(team + odds)
    f.write(team + "," + odds + "\n")
f.close()

There are 4 spaces before team and odds. Please read it carefully.
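One way to see what the 4 spaces change, without running the scraper, is to compile two versions of a loop with Python's built-in compile(). This is a minimal sketch; the source strings are illustrative and only reuse the variable names from the answer, and nothing in them is executed.

```python
# Two versions of the same loop, held as source text so they can be compiled
bad_src = (
    "for price_text in price_texts:\n"
    "odds = price_text.text\n"        # loop body NOT indented
)
good_src = (
    "for price_text in price_texts:\n"
    "    odds = price_text.text\n"    # loop body indented by 4 spaces
)

def check(src):
    """Compile src without running it; return 'ok' or the error class name."""
    try:
        compile(src, "<snippet>", "exec")
        return "ok"
    except IndentationError as exc:
        return type(exc).__name__

print(check(bad_src))   # IndentationError
print(check(good_src))  # ok
```

Because compile() only parses the code, undefined names such as price_texts do not matter here; only the indentation is checked.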

Also, there is no price_texts variable. You need to assign it when you call findAll, in case you forgot the "s":

price_texts = soup.findAll("div", {"class": "priceText_f71sibe"})
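The naming pattern behind this fix (a plural name for the list that findAll returns, a singular name for each item) combines naturally with zip() when team names and prices have to be walked together, which also keeps name_text defined inside the loop. A minimal sketch with plain strings: only "Orlando Magic" and "5.85" come from the question's HTML; "Team B" and "1.14" are hypothetical, and pairing by position assumes both lists appear in the same order on the page.

```python
# Hypothetical stand-ins for the .text of each scraped element
name_texts = ["Orlando Magic", "Team B"]   # plural: the whole list of names
price_texts = ["5.85", "1.14"]             # plural: the whole list of prices

rows = []
# zip() yields one (name, price) pair per position, so both singular
# loop variables are defined on every iteration
for name_text, price_text in zip(name_texts, price_texts):
    team = name_text.strip()
    odds = price_text.strip()
    rows.append(team + "," + odds)

print(rows)  # ['Orlando Magic,5.85', 'Team B,1.14']
```

Note that zip() stops at the shorter list, so if one selector matches fewer elements than the other, the extra items are silently dropped.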

One last thing: consider using a with statement instead of open() and close() when writing the file.
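A minimal sketch of the with-statement version, assuming a throwaway file in a temporary directory (a hypothetical path, not the real odds.csv). It shows that the file is closed automatically when the block ends, with no explicit close() call.

```python
import os
import tempfile

# Hypothetical location for the example file
path = os.path.join(tempfile.mkdtemp(), "odds.csv")

# The with statement closes the file when the block ends, even if an exception is raised
with open(path, "w", newline="") as f:
    f.write("Team,odds_team\n")
    f.write("Orlando Magic,5.85\n")

print(f.closed)  # True: no f.close() was needed

# Read the file back to confirm what was written
with open(path) as g:
    contents = g.read()
print(contents)
```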

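I think what you can do is iterate over the results, store them in lists, and then write those to the file. Unfortunately I can't access the site from work, so I can't test the code, but I believe this should give you the output you want: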

from bs4 import BeautifulSoup
from urllib.request import urlopen as uReq
import csv
from itertools import zip_longest

my_url = 'https://www.sportsbet.com.au/betting/basketball-us'
uClient = uReq(my_url)
page_html = uClient.read()
uClient.close()

soup = BeautifulSoup(page_html, "html.parser")

# Select elements by their data-automation-id attribute instead of the generated class names
price_text = soup.findAll("span", {"data-automation-id": "price-text"})
name_text = soup.findAll("div", {"data-automation-id": "participant-one"})

# Keep only each element's text, with surrounding whitespace removed
team_list = [name.text.strip() for name in name_text]
odds_list = [price.text.strip() for price in price_text]

# Pair teams with odds; zip_longest pads the shorter list with '' instead of truncating
d = [team_list, odds_list]
export_data = zip_longest(*d, fillvalue='')
with open('odds.csv', 'w', encoding="ISO-8859-1", newline='') as myfile:
    wr = csv.writer(myfile)
    wr.writerow(("Team", "odds_team"))
    wr.writerows(export_data)
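To see what zip_longest does when the two lists differ in length (for example, when the team selector matches nothing but the price selector matches two elements), here is a minimal offline sketch. The lists are hypothetical, and the CSV goes to an in-memory io.StringIO buffer instead of odds.csv so nothing touches the disk.

```python
import csv
import io
from itertools import zip_longest

# Hypothetical scrape result: no team names matched, two prices matched
team_list = []
odds_list = ["5.85", "1.14"]

# zip_longest pads the shorter list with fillvalue instead of stopping early
rows = list(zip_longest(team_list, odds_list, fillvalue=""))
print(rows)  # [('', '5.85'), ('', '1.14')]

# Same csv.writer calls as the answer, but writing to memory
buf = io.StringIO()
wr = csv.writer(buf)
wr.writerow(("Team", "odds_team"))
wr.writerows(rows)
print(buf.getvalue())
```

The result is an empty Team column next to a filled odds_team column, which is what a non-matching name selector looks like in the output file.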

Could you try this:

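from bs4 import BeautifulSoup
from urllib.request import urlopen as uReq

my_url = 'https://www.sportsbet.com.au/betting/basketball-us'
uClient = uReq(my_url)
page_html = uClient.read()
uClient.close()

soup = BeautifulSoup(page_html, "html.parser")

price_texts = soup.findAll("div", {"class": "priceText_f71sibe"})
# A multi-class string must match the class attribute exactly, including the space before participantText_fivg86r
name_texts = soup.findAll("div", {"class": "size14_f7opyze Endeavour_fhudrb0 medium_f1wf24vo participantText_fivg86r"})
print(len(name_texts), len(price_texts))  # quick check that both selectors matched something

filename = "odds.csv"
headers = "Team,odds_team\n"

# Initialise before the loop so both names exist even if nothing matched
odds = ''
team = ''
with open(filename, "w") as f:
    f.write(headers)
    # Pair each team with the price at the same position (assumes both lists are in page order)
    for name_text, price_text in zip(name_texts, price_texts):
        team = name_text.text.strip()
        odds = price_text.text.strip()
        print(team + " " + odds)
        f.write(team + "," + odds + "\n")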

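This is way beyond my ability, but it runs without errors. However, it doesn't produce the list of team names: there are no names under the first header, although the odds do appear under the second header.

OK, I'll tweak it when I get to work. As I said, you have access to the site and I don't, so make sure the findAll for the teams is actually grabbing the team names. Another possibility is whitespace. I'll edit it for now, but as I said, I'll look into it more deeply later. Looking at the OP, I would probably use data-automation-id rather than the class. I'll rework the code a bit and then explain each part so you understand what it's doing.

@mclo If you like, email me at jason.schvach@gmail.com. It sounds like you're learning this for some kind of assignment. I can help you go a bit deeper and explain line by line how to reach your goal, rather than just posting a solution (which is beyond the scope of Stack Overflow). This code may look well beyond basic ability, but with a little understanding and practice it becomes very basic.

@mclo I've edited the code. Try it and see what you get. The loop makes more sense now.

Is there a reason you repeat team=name_text.div.text and odds=price_text.span.text? When I reach the line team=name_text.div.text I get an error saying name 'name_text' is not defined, but I don't understand why, because I did something similar for the odds and that works fine.

This almost works. It gets the error NameError: name 'team' is not defined, the same error as in the answer above. I'm stumped. This project is too advanced for someone learning basic Python.

My bad. You have to define team and odds before the for loop. I've edited it.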
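The NameError in this thread has a simple cause that is easy to reproduce in isolation: if findAll() returns an empty list (for example because a class string does not match), the body of a for loop over it never runs, so a name assigned only inside that body is never created. A minimal sketch, assuming it runs at module level, with a hypothetical empty list standing in for the scraped results:

```python
# Hypothetical: a findAll() whose selector matched nothing returns an empty list
name_texts = []

# 1) team is assigned only inside the loop body
try:
    for name_text in name_texts:
        team = name_text
    result = team              # the body never ran, so team was never bound
except NameError as exc:
    result = "error: " + str(exc)
print(result)  # error: name 'team' is not defined

# 2) team is initialised before the loop, as suggested above
team = ""
for name_text in name_texts:
    team = name_text
print(repr(team))  # ''
```

Initialising the variable hides the error but not its cause: an empty Team column still means the selector needs fixing.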
