Python: getting td text with select
I am trying to get the odds, but I get an error. Do you know what I am doing wrong? Thanks in advance.
import requests
from bs4 import BeautifulSoup as bs
url = 'https://www.oddsportal.com/soccer/spain/laliga'
r = requests.get(url, headers = {'User-Agent' : 'Mozilla/5.0'})
soup = bs(r.content, 'lxml')
##print([a.text for a in soup.select('#tournamentTable tr[xeid] [href*=soccer]')])
print([b.text for b in soup.select('#tournamentTable td[xodd]')])
I expected to get 10 rows of 3 columns, one row per set of odds.
However, I get the following error:
Traceback (most recent call last):
File "/Users/.py", line 14, in <module>
print([b.text for b in soup.select('#tournamentTable td[xodd]')])
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/bs4/element.py", line 1376, in select
return soupsieve.select(selector, self, namespaces, limit, **kwargs)
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/soupsieve/__init__.py", line 114, in select
return compile(select, namespaces, flags, **kwargs).select(tag, limit)
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/soupsieve/__init__.py", line 63, in compile
return cp._cached_css_compile(pattern, namespaces, custom, flags)
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/soupsieve/css_parser.py", line 214, in _cached_css_compile
CSSParser(pattern, custom=custom_selectors, flags=flags).process_selectors(),
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/soupsieve/css_parser.py", line 1113, in process_selectors
return self.parse_selectors(self.selector_iter(self.pattern), index, flags)
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/soupsieve/css_parser.py", line 946, in parse_selectors
key, m = next(iselector)
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/soupsieve/css_parser.py", line 1100, in selector_iter
raise SelectorSyntaxError(msg, self.pattern, index)
File "<string>", line None
soupsieve.util.SelectorSyntaxError: Invalid character '\x1b' position 17
line 1:
#tournamentTable td[xodd]
^
...
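The `\x1b` mentioned in the error is an invisible ESC control character that can get copied in along with the selector text. A quick way to reveal and strip such characters (a minimal sketch; the selector value here just reproduces the error case):

```python
# A selector with a hidden ESC character (\x1b) between the two parts
selector = '#tournamentTable\x1btd[xodd]'

# repr() makes invisible control characters visible
print(repr(selector))

# Replace every non-printable character with a plain space
cleaned = ''.join(ch if ch.isprintable() else ' ' for ch in selector)
print(cleaned)
```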
It looks like you have a bad character between tournamentTable and td[xodd]. It may look like a space, but it has the code \x1b. Try deleting this character and typing a space again.

I can run your code without this error, but this page uses JavaScript to load the data, and BeautifulSoup cannot run JavaScript. You may need Selenium to control a web browser, which can run JavaScript and get the HTML with the data in it.

Alternatively, you can use DevTools in Chrome/Firefox to check whether JavaScript reads the data from some url, and then read the data from the same url yourself.

I found the url
https://fb.oddsportal.com/ajax-sport-country-tournament/1/YLO7JZEA/X0/1/?_=1558215347943
The last part is the current date as a timestamp multiplied by 1000 (i.e. in milliseconds):
import datetime
print(datetime.datetime.fromtimestamp(1558215347943/1000))
# 2019-05-18 23:35:47.943000
dt = datetime.datetime.now()
print(int(dt.timestamp()*1000))
# 1558216525573
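So a fresh request url can be assembled with the current millisecond timestamp (a sketch; the path segment `YLO7JZEA` identifies this tournament and is taken from the url above):

```python
import time

# Base url found in DevTools; the trailing ?_= parameter is the
# current time in milliseconds (a cache-buster)
base = 'https://fb.oddsportal.com/ajax-sport-country-tournament/1/YLO7JZEA/X0/1/'
url = f'{base}?_={int(time.time() * 1000)}'
print(url)
```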
Using requests.Session and better headers, I can read from this url. It serves the data as JavaScript code, but after cutting off some parts I get data in JSON format, which can be converted to a Python dictionary:
import requests
from bs4 import BeautifulSoup as bs
import json

s = requests.Session()

headers = {
    'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Firefox/68.0'
}

url = 'https://www.oddsportal.com/soccer/spain/laliga'
r = s.get(url, headers=headers)
soup = bs(r.content, 'lxml')

print(r.text.find('xodd'))
print([b.text for b in soup.select('#tournamentTable td[xodd]')])

headers = {
    'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Firefox/68.0',
    'Referer': 'https://www.oddsportal.com/soccer/spain/laliga/',
}

r = s.get('https://fb.oddsportal.com/ajax-sport-country-tournament/1/YLO7JZEA/X0/1/?_=1558215347943', headers=headers)

# Cut off the JavaScript callback wrapper to leave plain JSON
text = r.text[len("globals.jsonpCallback('/ajax-sport-country-tournament/1/YLO7JZEA/X0/1/', "):-2]
data = json.loads(text)

for key, val in data['d']['oddsData'].items():
    print('xeid:', key)
    print('xoid:', val['odds'][0]['oid'], 'avg:', val['odds'][0]['avg'])
    print('xoid:', val['odds'][0]['oid'], 'avg:', val['odds'][1]['avg'])
    print('xoid:', val['odds'][0]['oid'], 'avg:', val['odds'][2]['avg'])
    print('---')
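The hard-coded slice above breaks if the callback prefix ever changes. A slightly more defensive (hypothetical) helper strips any JSONP wrapper of this shape with a regex instead:

```python
import json
import re

def unwrap_jsonp(text):
    # Match callback(...payload...) where the payload is a JSON object,
    # optionally preceded by one quoted string argument.
    m = re.search(r"\(\s*(?:'[^']*'\s*,\s*)?(\{.*\})\s*\)\s*;?\s*$", text, re.S)
    if m is None:
        raise ValueError('no JSONP payload found')
    return json.loads(m.group(1))

# Same wrapper shape as the site's response
sample = "globals.jsonpCallback('/ajax-sport-country-tournament/1/YLO7JZEA/X0/1/', {\"d\": {\"oddsData\": {}}});"
print(unwrap_jsonp(sample))
```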
Result:
xeid: ltB92yKu
xoid: 35vjqxv464x0x7qrck avg: 2.16
xoid: 35vjqxv464x0x7qrck avg: 3.5
xoid: 35vjqxv464x0x7qrck avg: 3.44
---
xeid: SW9D1eZo
xoid: 35vjrxv464x0x7qrcm avg: 1.33
xoid: 35vjrxv464x0x7qrcm avg: 5.71
xoid: 35vjrxv464x0x7qrcm avg: 8.83
---
xeid: Mg9H0Flh
xoid: 35vjsxv464x0x7qrco avg: 1.99
xoid: 35vjsxv464x0x7qrco avg: 3.79
xoid: 35vjsxv464x0x7qrco avg: 3.68
---
xeid: zcDLaZ3b
xoid: 35vjtxv464x0x7qrcq avg: 1.57
xoid: 35vjtxv464x0x7qrcq avg: 4.38
xoid: 35vjtxv464x0x7qrcq avg: 5.95
---
EDIT: Using Selenium
import selenium.webdriver
from selenium.webdriver.common.by import By

url = 'https://www.oddsportal.com/soccer/spain/laliga'
driver = selenium.webdriver.Firefox()
driver.get(url)

# find_elements_by_css_selector() was removed in Selenium 4;
# find_elements(By.CSS_SELECTOR, ...) works in Selenium 3 and 4
items = driver.find_elements(By.CSS_SELECTOR, "#tournamentTable td[xodd]")
print([x.text for x in items])
Result:
['4.26', '4.07', '1.80', '1.99', '3.79', '3.68', '1.57', '4.38', '5.95', '2.13', '3.19', '3.94', '7.82', '5.00', '1.41', '2.16', '3.50', '3.44', '1.33', '5.71', '8.83', '2.58', '3.52', '2.73', '1.49', '5.31', '5.66', '4.03', '4.21', '1.82']
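The question asked for 10 rows of 3 columns; the flat list Selenium returns can be regrouped into one row per match (a simple sketch, assuming the odds always come back in groups of three: home / draw / away):

```python
# First nine values from the output above, regrouped three per match
odds = ['4.26', '4.07', '1.80', '1.99', '3.79', '3.68', '1.57', '4.38', '5.95']

rows = [odds[i:i + 3] for i in range(0, len(odds), 3)]
for row in rows:
    print(row)
```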
Done......

How can I do this with Selenium? Is there any documentation on how to find the url in the dev tools?? Thanks.

I opened DevTools in Firefox, went to the Network tab, loaded the page, and saw all the requests sent from the browser to the server. There are options to show only XHR/AJAX and/or JS requests. I selected both and started checking the responses of all the requests - this url seemed the most interesting, so I checked whether it contained the values from the table, i.e. 4.26.