Python: is there a way to extract elements alternately with selenium or beautifulsoup4?
Interesting problem. I'm scraping a betting site with selenium and then processing it with bs4. The problem is that the site loads the odds information differently from the way it loads the team names. For example:
London v Tokyo 2/1 4/1
Amsterdam v Helsinki 5/1 3/1
New York v California 7/1 10/1
When I pull this out and iterate over it, the result looks like this:
Names = [London, Tokyo, Amsterdam, Helsinki]
Odds = [2/1, 5/1, 4/1, 3/1, 7/1, 10/1]
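The odds are loaded top to bottom, then left to right, in blocks of different lengths. This means that when I try to stitch the names and odds together, they don't match up.
My question is: how can I get around this? Ultimately I want the information laid out so that each team name is immediately followed by its odds: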
Games = [London, 2/1, Tokyo, 4/1, Amsterdam, 5/1, Helsinki, 3/1, New York, 7/1, California, 10/1]
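Because the odds arrive column by column inside each block, they can be put back into row order in plain Python, provided you know how many matches each block contains. This is a minimal sketch, assuming exactly two odds per match (one per team) and a hypothetical `block_sizes` list with the number of matches per block; reading those sizes from the page is not covered. `reorder_odds` and `interleave` are illustrative names, not part of selenium or bs4.

```python
from itertools import chain


def reorder_odds(odds, block_sizes):
    """Turn column-major odds (left column, then right column, per block)
    into row order: [team1, team2, team1, team2, ...].

    block_sizes is a hypothetical input: the number of matches in each block.
    """
    if len(odds) != 2 * sum(block_sizes):
        raise ValueError("expected exactly two odds per match")
    ordered = []
    pos = 0
    for n in block_sizes:
        left = odds[pos:pos + n]           # first n values: left-hand team
        right = odds[pos + n:pos + 2 * n]  # next n values: right-hand team
        for pair in zip(left, right):
            ordered.extend(pair)
        pos += 2 * n
    return ordered


def interleave(names, odds):
    """Pair each name with the odds value at the same position."""
    if len(names) != len(odds):
        raise ValueError("names and odds must have the same length")
    return list(chain.from_iterable(zip(names, odds)))


names = ["London", "Tokyo", "Amsterdam", "Helsinki", "New York", "California"]
odds = ["2/1", "5/1", "4/1", "3/1", "7/1", "10/1"]
# Assumed from the Odds list above: the first block holds two matches, the second one.
games = interleave(names, reorder_odds(odds, [2, 1]))
print(games)
# ['London', '2/1', 'Tokyo', '4/1', 'Amsterdam', '5/1', 'Helsinki', '3/1', 'New York', '7/1', 'California', '10/1']
```

The same function also undoes the Division 1/2/3 layout described later in the question: `reorder_odds([1, 3, 2, 4, 5, 6, 7, 9, 11, 8, 10, 12], [2, 1, 3])` returns the values 1 through 12 in order.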
**Update**
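The site is:
If you get a landing page, just click through to Browse. Then click "Esports" in the left-hand panel, then "All Matches" in the middle of the page.
Code:
The teams come in as blocks, e.g. "London v Tokyo".
So to separate the team names, I iterate over them and split them: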
new_teams = []
for name in teams_text:
    # each entry looks like "London v Tokyo"
    first, second = name.split(" v ")
    new_teams.append(first)
    new_teams.append(second)
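The split-and-append loop can also be written as a single nested comprehension. A minimal sketch, assuming every entry contains the " v " separator; `teams_text` here is stand-in sample data, not scraped text.

```python
# Stand-in for the scraped "Team A v Team B" strings (sample data, not from the site).
teams_text = ["London v Tokyo", "Amsterdam v Helsinki", "New York v California"]

# Split each fixture on " v " and flatten the resulting pairs into one list.
new_teams = [team for name in teams_text for team in name.split(" v ")]
print(new_teams)
# ['London', 'Tokyo', 'Amsterdam', 'Helsinki', 'New York', 'California']
```

Unlike `first, second = name.split(" v ")`, the comprehension does not raise when an entry lacks the separator, so a malformed entry would silently shift every later name; keep the unpacking form if you want that error.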
Then I convert the odds I receive to decimal:
odds = []
new_odds = []
for odd in odds_raw:
    odds.append(odd.text)
for odd in odds:
    # fractional odds "a/b" -> decimal odds a/b + 1
    first, second = odd.split("/")
    new_odd = (int(first) / int(second)) + 1
    new_odds.append(round(new_odd, 2))
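The conversion above is decimal = numerator / denominator + 1. A sketch of the same rule using the standard library's `fractions.Fraction`, which parses strings such as "4/6" directly and keeps the arithmetic exact until the final rounding; `to_decimal` is an illustrative helper name.

```python
from fractions import Fraction


def to_decimal(fractional: str, ndigits: int = 2) -> float:
    """Convert fractional odds such as "4/6" to decimal odds (a/b + 1)."""
    # Fraction("4/6") parses the string and reduces it to 2/3
    exact = Fraction(fractional.strip()) + 1
    # round() on a Fraction stays exact; convert to float only at the end
    return float(round(exact, ndigits))


print([to_decimal(o) for o in ["2/1", "4/6", "10/1"]])
# [3.0, 1.67, 11.0]
```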
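Now I have a list of all the team names and a list of decimal odds values. This is where my problem lies: bet365 renders the odds for the matches in vertical blocks of different lengths, one block per game division.
So if the odds look like this: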
Division 1
London v Tokyo 1 2
Amsterdam v Helsinki 3 4
Division 2
New York v California 5 6
Division 3
Sydney v Brisbane 7 8
Bali v Singapore 9 10
Berlin v Paris 11 12
Then when I pull them out, the odds come out like this:
[1, 3, 2, 4, 5, 6, 7, 9, 11, 8, 10, 12]
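When the divisions have different lengths, I'm struggling to work out how to handle it.
You can use a regular expression to capture the elements: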
import re
s = '''London v Tokyo 2/1 4/1 Amsterdam v Helsinki 5/1 3/1 New York v California 7/1 10/1'''
print(re.findall(r'(\w+)\s+v\s+(\w+)\s+(\d+/\d+)\s+(\d+/\d+)', s))
Output:
[('London', 'Tokyo', '2/1', '4/1'),
('Amsterdam', 'Helsinki', '5/1', '3/1'),
('York', 'California', '7/1', '10/1')]
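`\w+` matches a single word, which is why "New York" comes back as "York". A sketch of a variant that lets names contain spaces and dots (an assumption about what team names look like, e.g. "Gen.G" or "Team Envy"), followed by flattening into the name, odds, name, odds order asked for. This only helps when the text is already in row order, which, per the question's update, is not how the site delivers it.

```python
import re

s = "London v Tokyo 2/1 4/1 Amsterdam v Helsinki 5/1 3/1 New York v California 7/1 10/1"

# Names start with a word character or dot, then lazily take word characters,
# dots and spaces until " v " (first name) or the first odds value (second name).
pattern = re.compile(r"([\w.][\w. ]*?)\s+v\s+([\w.][\w. ]*?)\s+(\d+/\d+)\s+(\d+/\d+)")
matches = pattern.findall(s)
print(matches)
# [('London', 'Tokyo', '2/1', '4/1'), ('Amsterdam', 'Helsinki', '5/1', '3/1'), ('New York', 'California', '7/1', '10/1')]

# Reorder each match as name, odds, name, odds.
games = [x for home, away, h_odds, a_odds in matches for x in (home, h_odds, away, a_odds)]
print(games)
# ['London', '2/1', 'Tokyo', '4/1', 'Amsterdam', '5/1', 'Helsinki', '3/1', 'New York', '7/1', 'California', '10/1']
```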
You can use a `for` loop to achieve the desired output, as follows:
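Names = ["London", "Tokyo", "Amsterdam", "Helsinki", "New York", "California"]
# Python evaluates 2/1, 5/1, ... as divisions, giving the floats 2.0, 5.0, ...
Odds = [2/1, 5/1, 4/1, 3/1, 7/1, 10/1]
start_nmb = 1
for odd in Odds:
    # insert each odds value right after the name it is assumed to belong to
    Names.insert(start_nmb, odd)
    start_nmb += 2
print(Names)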
Output:
['London', 2.0, 'Tokyo', 5.0, 'Amsterdam', 4.0, 'Helsinki', 3.0, 'New York', 7.0, 'California', 10.0]
Hope this helps.
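Here's a long-winded way of doing it. The odds from the odd-numbered rows (as determined by the loop) go into team1 (the left-hand team of "team1 v team2"). The even-numbered rows go into team2. The lists of lists are flattened. Then the lists are merged as alternating members, as shown in @user942640's answer. Note: this relies on equal-length lists to return accurate results.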
import itertools
from bs4 import BeautifulSoup as bs

# your existing code to get to the page and wait for all elements to be present (defines `driver`)
soup = bs(driver.page_source, 'lxml')

# each name container holds "Team A v Team B"; split it into [Team A, Team B]
teams = [item.text.split(' v ') for item in soup.select('.sl-CouponParticipantWithBookCloses_NameContainer')]

team1 = []
team2 = []
# odds columns alternate: even-indexed columns belong to team 1, odd-indexed to team 2
for i, item in enumerate(soup.select('.sl-MarketCouponValuesExplicit2')):
    column = [cell.text for cell in item.select('div:not(.gl-MarketColumnHeader)')]
    if i % 2 == 0:
        team1.append(column)
    else:
        team2.append(column)

# flatten the lists of lists
team1 = [item for sublist in team1 for item in sublist]
team2 = [item for sublist in team2 for item in sublist]
teams = [item for sublist in teams for item in sublist]

# interleave team1/team2 odds, then interleave names with odds; `if x` drops None padding
team_odds = [x for x in itertools.chain.from_iterable(itertools.zip_longest(team1, team2)) if x]
final = [x for x in itertools.chain.from_iterable(itertools.zip_longest(teams, team_odds)) if x]
print(final)
For example (note that the odds update constantly): [screenshot of the printed output]
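The `if x` filters in the code above drop every falsy value, so an empty odds cell (for example a suspended market) would be silently removed and shift every later pairing. A sketch of the same column-alternation idea on plain lists, assuming `columns` holds the per-column odds texts in page order (left column, right column, per division) and `fixtures` the "Team A v Team B" strings; it checks lengths instead of filtering. `pair_names_with_odds` is an illustrative name, and the sample values reuse the Division 1/2/3 example from the question.

```python
from itertools import chain


def pair_names_with_odds(fixtures, columns):
    """fixtures: ["Team A v Team B", ...]; columns: per-column odds lists,
    alternating left-hand team / right-hand team within each division."""
    left = list(chain.from_iterable(columns[0::2]))   # even columns: team 1
    right = list(chain.from_iterable(columns[1::2]))  # odd columns: team 2
    if not (len(fixtures) == len(left) == len(right)):
        raise ValueError("fixtures and odds columns do not line up")
    result = []
    for fixture, odd1, odd2 in zip(fixtures, left, right):
        team1, team2 = fixture.split(" v ")
        result.extend([team1, odd1, team2, odd2])
    return result


fixtures = ["London v Tokyo", "Amsterdam v Helsinki", "New York v California",
            "Sydney v Brisbane", "Bali v Singapore", "Berlin v Paris"]
# Division 1 (2 matches), Division 2 (1 match), Division 3 (3 matches)
columns = [["1", "3"], ["2", "4"], ["5"], ["6"], ["7", "9", "11"], ["8", "10", "12"]]
print(pair_names_with_odds(fixtures, columns))
```

With the real page, `columns` would correspond to the per-column lists the answer builds from `.sl-MarketCouponValuesExplicit2`, before flattening.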
Updated the link and code.

Yep, after converting to decimal I'd like it stored as: [Gen.G, 1.44, Team Envy, 2.62], but it doesn't come out that way.

Tried this; unfortunately it doesn't seem to work! I've updated the question with more information if you're willing to read it. Thanks for the suggestion.

Wow. This is really great. There are also some nice, elegant solutions here for shortening my code, which I'll take note of. Brilliant! I learned a lot from this, thanks very much :)