Python: How do I get football match results from live scores?


python web-scraping


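I have a project that uses Python 3.4. I want to get football match scores (results) from livescore.com, for example all of the scores for the current day (England 2-2 Norway, France 2-1 Italy, etc.). I am building it with Python 3.4 on 64-bit Windows 10.

I tried two approaches. Here is the code for the first one: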

import bs4 as bs
import urllib.request

sauce = urllib.request.urlopen('http://www.livescore.com/').read()
soup = bs.BeautifulSoup(sauce,'lxml')

for div in soup.find_all('div', class_='container'):
    print(div.text)

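When I run this code, a box pops up saying:

IDLE's subprocess didn't make connection. Either IDLE can't start a subprocess or personal firewall software is blocking the connection.

So I decided to write another one. Here is the code: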

# Import Modules
import urllib.request
import re

# Downloading Live Score XML Code From Website and reading also
xml_data = urllib.request.urlopen('http://static.cricinfo.com/rss/livescores.xml').read()

# Pattern For Searching Score and link
pattern = "<item>(.*?)</item>"

# Finding Matches
for i in re.findall(pattern, xml_data, re.DOTALL):
    result = re.split('<.+?>',i)
    print (result[1], result[3]) # Print Score


I got this error:

Traceback (most recent call last):
  File "C:\Users\Bright\Desktop\live_score.py", line 12, in <module>
   for i in re.findall(pattern, xml_data, re.DOTALL):
  File "C:\Python34\lib\re.py", line 206, in findall
    return _compile(pattern, flags).findall(string)
TypeError: can't use a string pattern on a bytes-like object


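In your first example, the site loads its content through a lot of JavaScript, so urllib only receives the page before the scores are filled in. I suggest using selenium to fetch the page source instead.

Your code should look like this: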

import bs4 as bs
from selenium import webdriver

url = 'http://www.livescore.com/'
browser = webdriver.Chrome()
browser.get(url)
sauce = browser.page_source
browser.quit()
soup = bs.BeautifulSoup(sauce,'lxml')

for div in soup.find('div', attrs={'data-type': 'container'}).find_all('div'):
    print(div.text)
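Because the scores are rendered by JavaScript, the page source may still be incomplete immediately after `browser.get()` returns. The following is a minimal sketch, not part of the original answer, that waits until a matching element exists before reading the source. The helper name `fetch_rendered_html`, its parameters, and the 10-second default timeout are assumptions made for illustration; the selector you pass in has to match the site's current markup.

```python
def fetch_rendered_html(url, css_selector, timeout=10):
    """Return the page source once an element matching css_selector exists.

    Hypothetical helper: the name and parameters are illustrative only.
    """
    # Imported lazily so this module can be loaded without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    browser = webdriver.Chrome()
    try:
        browser.get(url)
        # Block until the element is present in the DOM; raises
        # TimeoutException if it does not appear within `timeout` seconds.
        WebDriverWait(browser, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, css_selector))
        )
        return browser.page_source
    finally:
        # Always close the browser, even if the wait times out.
        browser.quit()
```

Calling `fetch_rendered_html('http://www.livescore.com/', "div[data-type='container']")` would return HTML that can be passed to `bs.BeautifulSoup(..., 'lxml')` as in the code above, assuming that selector still matches the page.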

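For the second example, the regular expression engine raises the error because read() on the response returns bytes, while re with a str pattern only accepts str. So you have to convert xml_data to str before matching. Decoding is the cleaner way to do that; str(xml_data) also avoids the TypeError, but it yields the b'...' representation with the newlines escaped.

Here is the modified code:

# Decode the downloaded bytes (UTF-8 is assumed here) so the str pattern can match
for i in re.findall(pattern, xml_data.decode('utf-8'), re.DOTALL):
    result = re.split('<.+?>',i)
    print (result[1], result[3]) # Print Score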
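To make the bytes versus str point concrete, here is a small self-contained sketch. `SAMPLE_FEED` is a made-up stand-in for the bytes returned by `urlopen(...).read()`, and the two helper functions are hypothetical names introduced for illustration. The second helper uses the standard library's `xml.etree.ElementTree` instead of regular expressions; that is a swapped-in alternative, not what the question or answer uses.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical sample data standing in for urlopen(...).read(), which returns bytes.
SAMPLE_FEED = (
    b"<?xml version='1.0' encoding='utf-8'?>\n"
    b"<rss><channel>\n"
    b"<item>\n<title>Team A 120/3 v Team B</title>\n"
    b"<link>http://example.com/match/1</link>\n</item>\n"
    b"<item>\n<title>Team C 2 v Team D 1</title>\n"
    b"<link>http://example.com/match/2</link>\n</item>\n"
    b"</channel></rss>"
)

# A str pattern applied to bytes reproduces the TypeError from the question.
try:
    re.findall("<item>(.*?)</item>", SAMPLE_FEED, re.DOTALL)
except TypeError as exc:
    print("TypeError:", exc)


def items_with_regex(xml_bytes, encoding="utf-8"):
    """Decode the bytes first, then apply the str pattern from the question."""
    text = xml_bytes.decode(encoding)
    pairs = []
    for item in re.findall("<item>(.*?)</item>", text, re.DOTALL):
        parts = re.split("<.+?>", item)
        pairs.append((parts[1], parts[3]))  # (title, link)
    return pairs


def items_with_elementtree(xml_bytes):
    """Alternative: let an XML parser handle the encoding and the tags."""
    root = ET.fromstring(xml_bytes)
    return [(item.findtext("title"), item.findtext("link"))
            for item in root.iter("item")]


for title, link in items_with_regex(SAMPLE_FEED):
    print(title, link)
```

The parser-based version does not depend on the order of the child tags or on where line breaks fall inside each item, which is why it tends to be more robust than indexing into the result of `re.split`.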


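Hi @Chad, I uploaded a screenshot of the page I want to scrape, taken while inspecting the elements in Chrome. I color-coded the div tags and the site layout according to which part of the page each div targets. Please help me look at the code (the first one). Thanks!

I edited my answer, please take a look. Let me know if you need clarification. If this solves your problem, please mark it as the answer. Thanks.

Hi @Chad, I uploaded a picture of the error I get when running the code you added (the first code).

1. pip install chromedriver
2. Update the code: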

browser = webdriver.Chrome(executable_path=r'C:\Python34\chromedriver Windows')
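Note that the `executable_path` keyword belongs to older Selenium releases. Assuming a Selenium 4 installation, where the driver location is passed through a `Service` object instead, the same idea can be sketched as follows. The helper name `make_chrome` is hypothetical, and the driver path is whatever location chromedriver was actually installed to.

```python
def make_chrome(driver_path):
    """Start Chrome with an explicit chromedriver path (Selenium 4 style).

    Hypothetical helper: driver_path must point at the chromedriver executable.
    """
    # Imported lazily so this module can be loaded without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service

    return webdriver.Chrome(service=Service(executable_path=driver_path))
```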

With selenium, I need to open the page first before it will work.