Python: Problem reading HTML files from a folder

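I have two HTML files that I want to read as if they were websites, but I get an error on the line that begins with date_part, which makes me think I am not reading the files correctly. This is the code I used to save the pages to files: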

game_links = [
    'https://rugby.statbunker.com/competitions/MatchDetails/Gallagher-Premiership-19/20/Harlequins-VS-Bristol-Bears?comp_id=609&match_id=39862&date=26-Oct-2019',
    'https://rugby.statbunker.com/competitions/MatchDetails/World-Cup-2007/France-VS-Argentina?comp_id=239&match_id=15479&date=07-Sep-2007'
]
for link in game_links:
    response = requests.get(link)
    html_loop = response.content
    soup_loop = BeautifulSoup(html_loop, 'html.parser')
    print(soup_loop)
Each output was saved as its own HTML file. This is the code I run to extract data from those files:

import requests
from bs4 import BeautifulSoup
import pandas as pd
from datetime import datetime
import time
import uuid

game_links = [open('test1.html', 'r', encoding='utf-8'), open('test2.html', 'r', encoding='utf-8')]

game = {}

for link in game_links:
    soup_loop = link.read()

    game['uuid'] = uuid.uuid1()
    date_part = soup_loop.find('img', {'src': '/images/date.png'}).text
    time_part = soup_loop.find('img', {'src': '/images/kickoff.png'}).text
    if time_part == '':
        game['datetime'] = datetime.strptime(date_part, '%d %b %Y')
    else:
        game['datetime'] = datetime.combine(datetime.strptime(date_part, '%d %b %Y'), datetime.strptime(time_part, '%H:%M').time())
    print(game)

After reading a file, you still have to parse its contents with BeautifulSoup, because link.read() only returns a plain string:

for link in game_links:
    text = link.read()
    soup_loop = BeautifulSoup(text, 'html.parser')
    game['uuid'] = uuid.uuid1()
    date_part = soup_loop.find('img', {'src': '/images/date.png'}).text
    time_part = soup_loop.find('img', {'src': '/images/kickoff.png'}).text
    if time_part == '':
        game['datetime'] = datetime.strptime(date_part, '%d %b %Y')
    else:
        game['datetime'] = datetime.combine(datetime.strptime(date_part, '%d %b %Y'), datetime.strptime(time_part, '%H:%M').time())
    print(game)
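
To see why the question's version fails at the date_part line, here is a minimal sketch that uses only the standard library. The HTML string is an invented stand-in for the contents of a saved file, not the real page. It shows that the value returned by read() is a str, so find is the string method str.find, which takes a substring and an integer start position rather than a tag name and an attribute dictionary.

```python
# Invented stand-in for the text returned by link.read(); not the real page.
html_text = '<p><img src="/images/date.png"> 26 Oct 2019</p>'

# read() on a text-mode file returns a plain str, not a BeautifulSoup object.
print(type(html_text).__name__)

# str.find(sub) returns the index of the substring (or -1), not a tag.
index = html_text.find('img')
print(index)

# The BeautifulSoup-style call passes a dict where str.find expects an
# integer start position, so Python raises a TypeError.
try:
    html_text.find('img', {'src': '/images/date.png'})
except TypeError as exc:
    error_message = str(exc)
    print('TypeError:', error_message)
```

Wrapping the string in BeautifulSoup(text, 'html.parser'), as the answer does, gives an object whose find method accepts a tag name and an attribute dictionary.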

You need to create the soup object first. In most cases you can do this:
soup = BeautifulSoup(soup_loop)
and then the rest of your code.

First of all, opening files and leaving them hanging before you use them is very bad practice. Unless there is a specific reason not to, game_links should contain the file names, and you should open, process and close the files one at a time inside the loop. As for the question itself, please add the error message you are getting.
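
The advice in the last comment, to keep file names in the list and to open, process and close each file inside the loop, could look roughly like the sketch below. The helper name read_html_files is hypothetical and is not part of the original thread. The demo loop skips files that do not exist, so it runs even without test1.html and test2.html.

```python
from pathlib import Path


def read_html_files(file_names):
    """Yield (file_name, text) pairs, opening only one file at a time.

    Hypothetical helper for illustration; not from the original thread.
    """
    for file_name in file_names:
        # The with block closes the file as soon as it has been read.
        with open(file_name, 'r', encoding='utf-8') as handle:
            text = handle.read()
        yield file_name, text


# The list holds file names instead of already-open file objects.
game_files = ['test1.html', 'test2.html']

# Skip missing files so the sketch also runs outside the asker's folder.
existing_files = [name for name in game_files if Path(name).exists()]

for file_name, text in read_html_files(existing_files):
    print(file_name, len(text))
```

Each text value can then be passed to BeautifulSoup(text, 'html.parser') as in the answer, and no file object stays open between iterations.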