Python 'NoneType' error caused by changes in the HTML: how do you handle a changing data format?


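The HTML below gives me an error (there is an empty cell with missing data between two of the fields). I only want to scrape the data after the strong tags: GOOD, 1:56:5 and 1:56:5.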

<td><strong>Track Rating:</strong> GOOD</td>
<td></td>
<td><strong>Gross Time:</strong> 1:56:5</td>
<td><strong>Mile Rate:</strong> 1:56:5</td>
Which prints ['Track Rating:GOOD', 'Gross Time:2:05:1', 'Mile Rate:1:56:4', 'Lead Time:8.1'] ['First Quarter:29.4', 'Second Quarter:32', 'Third Quarter:28.4', 'Fourth Quarter:27.2'] ['Margin:HFHD x HFNK']
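Since the output above is already a list of "Label:value" strings, one way to pick values by heading is to turn it into a dictionary. This is a minimal sketch using only the standard library; `to_fields` is a hypothetical helper, and the input rows are copied from the printed output. `str.partition` splits on the first colon only, so times such as 2:05:1 stay intact.

```python
# Rows copied from the printed output above; the [''] row mimics an empty <td></td>.
printed_rows = [
    ['Track Rating:GOOD', 'Gross Time:2:05:1', 'Mile Rate:1:56:4', 'Lead Time:8.1'],
    ['Margin:HFHD x HFNK'],
    [''],
]


def to_fields(rows):
    """Turn 'Label:value' strings into {label: value}, skipping unlabeled cells."""
    fields = {}
    for row in rows:
        for cell in row:
            # partition() splits on the FIRST colon only, so '2:05:1' is kept whole.
            label, sep, value = cell.partition(":")
            if sep and label.strip():
                fields[label.strip()] = value.strip()
    return fields


fields = to_fields(printed_rows)
print(fields.get("Gross Time"), fields.get("Lead Time"), fields.get("Margin"))
```

Looking a heading up with `fields.get(...)` returns `None` instead of raising when that heading is missing for a race.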


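I want those values, but only when each one matches its heading. Most of the if statements I've tried give errors: "'list' object has no attribute 'string'", or something similar when I try to access the text inside nested lists. Any ideas?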
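The "'list' object has no attribute 'string'" error usually means `.string` was called on the result of `find_all()`, which is a list of tags, rather than on a single tag from `find()`. A minimal sketch with BeautifulSoup, using a trimmed copy of the question's HTML:

```python
from bs4 import BeautifulSoup

html = """<table class="raceTimes"><tr>
<td><strong>Track Rating:</strong> GOOD</td>
<td></td>
<td><strong>Gross Time:</strong> 1:56:5</td>
</tr></table>"""
soup = BeautifulSoup(html, "html.parser")

strongs = soup.find_all("strong")  # a ResultSet: a list of Tags, not a single Tag
first = soup.find("strong")        # a single Tag (or None if nothing matches)

# The list itself has no .string, so read it from each element instead.
has_string_attr = hasattr(strongs, "string")
labels = [tag.string for tag in strongs]
print(has_string_attr, first.string, labels)
```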

You can add some safety with a few nested ifs, but it gets very messy if you have to add an if for every lookup that might return None. Try the following:
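One way to avoid an `if` per lookup is to wrap each whole lookup chain in a single try/except. This is a sketch, not the answerer's original code: `safe` is a hypothetical helper, not part of BeautifulSoup, and because it catches broad exceptions it is best kept to short chains so real bugs are not hidden.

```python
from bs4 import BeautifulSoup


def safe(lookup, default=None):
    """Call a zero-argument lookup chain; return `default` if a step finds nothing."""
    try:
        return lookup()
    except (AttributeError, IndexError, TypeError):
        # AttributeError: a find() returned None and the chain kept going.
        # IndexError: e.g. contents[1] on an empty <td>.
        # TypeError: e.g. indexing None with ['href'].
        return default


html = """<table class="raceTimes"><tr>
<td><strong>Track Rating:</strong> GOOD</td>
<td></td>
<td><strong>Gross Time:</strong> 1:56:5</td>
</tr></table>"""
table = BeautifulSoup(html, "html.parser").find("table", class_="raceTimes")

rating = safe(lambda: table.find("strong", string="Track Rating:").next_sibling.strip())
lead = safe(lambda: table.find("strong", string="Lead Time:").next_sibling.strip())
print(rating, lead)
```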

Another solution:

from simplified_scrapy import SimplifiedDoc,req,utils
html = '''
<table class="meetingListFull">
<td><strong>Track Rating:</strong> GOOD</td>
<td><strong>Gross Time:</strong> 2:29:6</td>
<td><strong>Mile Rate:</strong> 1:58:6</td>
<td><strong>Lead Time:</strong> 30.3</td>
</table>
'''
doc = SimplifiedDoc(html)
table1 = doc.select('table.meetingListFull')
strongs = table1.selects('strong')
print([(s.text,s.nextText()) for s in strongs])
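For comparison, the same label/value pairing can be done with BeautifulSoup alone: the text right after `</strong>` is the strong tag's `next_sibling`. This sketch is an equivalent of the idea, not of simplified_scrapy's API; the Lead Time value has been removed from the sample HTML to mimic a missing field, which then comes back as `None`.

```python
from bs4 import BeautifulSoup

html = '''
<table class="meetingListFull">
<td><strong>Track Rating:</strong> GOOD</td>
<td><strong>Gross Time:</strong> 2:29:6</td>
<td><strong>Mile Rate:</strong> 1:58:6</td>
<td><strong>Lead Time:</strong></td>
</table>
'''
soup = BeautifulSoup(html, "html.parser")
table = soup.find("table", class_="meetingListFull")

pairs = []
for strong in table.find_all("strong"):
    nxt = strong.next_sibling  # the text node after </strong>, or None
    # NavigableString subclasses str; anything else (None, a Tag) counts as empty.
    text = nxt.strip() if isinstance(nxt, str) else ""
    pairs.append((strong.get_text(strip=True), text or None))
print(pairs)
```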

There are more examples here.

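Yes, so even if I did nested ifs, the link you sent assumes the headings line up with the data, which doesn't look consistent in my HTML.

Can you share a larger portion of the HTML file you're working with so I can reproduce this?

Can I send it to you? Sorry, unlike a Stack Overflow post, it's much bigger than the character limit.

I think an approach using `find_all("tr")` and `find_all("td")` handles the empty `td` tags fine.

That doesn't help with the missing data; won't `s.nextText()` return the wrong thing in this case?

@BradLangtry I'm using `strong`, not `td`. You should run the code to see the result.

When I run this I get `TypeError: unhashable type: 'slice'`: `doc = SimplifiedDoc(tableoftimes); table123 = doc.select('table.meetingListFull'); strongs = table123.selects('strong'); print([(s.text, s.nextText()) for s in strongs])` @dabingsou

@BradLangtry It's probably caused by the argument `tableoftimes`. `SimplifiedDoc` expects a string.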
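The last comment says `SimplifiedDoc` expects a string, while `tableoftimes` comes from `race.find(...)` and is a BeautifulSoup `Tag`. A sketch of converting it first with `str()`, which serializes a Tag back to HTML markup. The simplified_scrapy lines are commented out and not run here; they assume `select`/`selects` behave as in the answer above. Note also that `tableoftimes` is the `raceTimes` table, so selecting `table.meetingListFull` inside it would presumably find nothing.

```python
from bs4 import BeautifulSoup

page = """<div class="forPrint"><table class="raceTimes"><tr>
<td><strong>Track Rating:</strong> GOOD</td>
</tr></table></div>"""
race = BeautifulSoup(page, "html.parser").find("div", class_="forPrint")
tableoftimes = race.find("table", class_="raceTimes")  # a bs4 Tag, not a str

# Serialize the Tag back to markup so it can be passed on as a plain string.
html_string = str(tableoftimes)
print(type(tableoftimes).__name__, type(html_string).__name__)

# Assumed continuation with simplified_scrapy (not executed here):
# from simplified_scrapy import SimplifiedDoc
# doc = SimplifiedDoc(html_string)
# strongs = doc.select('table.raceTimes').selects('strong')
```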
from datetime import datetime, date, timedelta
import requests
import re
import csv
import os
import numpy
import pandas as pd
from bs4 import BeautifulSoup as bs

base_url = "http://www.harness.org.au/racing/results/?firstDate="
base1_url = "http://www.harness.org.au"

webpage_response = requests.get('http://www.harness.org.au/racing/results/?firstDate=')

soup = bs(webpage_response.content, "html.parser")

format = "%d-%m-%y"
delta = timedelta(days=1)
yesterday = date.today() - timedelta(days=1)

enddate = date(2019, 1, 1)

while enddate <= yesterday:
    enddate1 = enddate.strftime("%d-%m-%y")
    new_url = base_url + enddate1
    # Advance before any `continue` so the loop always terminates;
    # dates run from 2019-01-01 to yesterday inclusive.
    enddate += timedelta(days=1)
    soup12 = requests.get(new_url)
    soup1 = bs(soup12.content, "html.parser")
    table1 = soup1.find('table', class_='meetingListFull')
    if table1 is None:  # no meeting list for this date; find() returned None
        continue

    tr = table1.find_all('tr', {'class': ['odd', 'even']})

    for tr1 in tr:
        tr2 = tr1.find('a').get_text()
        tr3 = tr1.find('a')['href']
        newurl = base1_url + tr3
        with requests.Session() as s:
            webpage_response = s.get(newurl)
            soup = bs(webpage_response.content, "html.parser")
            #soup1 = soup.select('.content')
            results = soup.find_all('div', {'class':'forPrint'})
....
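The date bookkeeping can also be pulled out of the scraping loop. This sketch uses only the standard library; `results_urls` is a hypothetical helper, and it assumes the site keeps the `?firstDate=dd-mm-yy` format used in the question. It builds each URL before advancing the day, so both endpoints are included.

```python
from datetime import date, timedelta

BASE_URL = "http://www.harness.org.au/racing/results/?firstDate="


def results_urls(start, end):
    """Yield (day, url) for every day from start to end, both inclusive."""
    day = start
    while day <= end:
        yield day, BASE_URL + day.strftime("%d-%m-%y")  # dd-mm-yy, as in the question
        day += timedelta(days=1)  # advance only after the URL has been built


urls = [url for _, url in results_urls(date(2019, 1, 1), date(2019, 1, 3))]
print(urls)
```

In the scraper this would be used as `for day, url in results_urls(date(2019, 1, 1), date.today() - timedelta(days=1)):`.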
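for race in results:
    tableoftimes = race.find('table', class_='raceTimes')
    # Apart from trackrating, each value is read from the <td> AFTER the previous
    # label's cell. An extra empty <td></td> makes the next lookup land on a cell
    # with no contents[1] (IndexError), and a missing label makes find() return
    # None, which raises the "'NoneType' object has no attribute" error.
    trackrating = tableoftimes.find(text="Track Rating:").findPrevious('td').contents[1]
    grosstime = tableoftimes.find(text="Track Rating:").find_next('td').contents[1]
    milerate = tableoftimes.find(text="Gross Time:").findNext('td').contents[1]
    leadtime = tableoftimes.find(text="Mile Rate:").findNext('td').contents[1]
    firstquarter = tableoftimes.find(text="Lead Time:").findNext('td').contents[1]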
....
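A more robust variant of the lookups above reads each value from the label's own cell instead of the cell after the previous label, and returns `None` when a label is absent, so an empty `<td></td>` no longer shifts anything. This is a sketch assuming every field has the `<td><strong>Label:</strong> value</td>` shape shown in the question; `read_field` and `LABELS` are illustrative names.

```python
from bs4 import BeautifulSoup

LABELS = ["Track Rating:", "Gross Time:", "Mile Rate:", "Lead Time:"]


def read_field(table, label):
    """Return the text after <strong>label</strong> in its own <td>, or None."""
    label_string = table.find(string=label)
    if label_string is None:  # label not present in this race's table
        return None
    cell = label_string.find_parent("td")
    if cell is None:
        return None
    # Direct text children of the cell only, i.e. everything except the <strong> label.
    value = "".join(cell.find_all(string=True, recursive=False)).strip()
    return value or None


html = """<table class="raceTimes"><tr>
<td><strong>Track Rating:</strong> GOOD</td>
<td></td>
<td><strong>Gross Time:</strong> 1:56:5</td>
<td><strong>Mile Rate:</strong> 1:56:5</td>
</tr></table>"""

table = BeautifulSoup(html, "html.parser").find("table", class_="raceTimes")
times = {label.rstrip(":"): read_field(table, label) for label in LABELS}
print(times)
```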
tableoftimes = race.find('table', class_='raceTimes')
for row in tableoftimes.find_all('tr'):
    string23 = [td.get_text() for td in row.find_all('td')]
for row in table.find_all("tr")[1:]:
    datarow = [td.get_text() for td in row.find_all("td")]
Output of the simplified_scrapy example:
[('Track Rating:', 'GOOD'), ('Gross Time:', '2:29:6'), ('Mile Rate:', '1:58:6'), ('Lead Time:', '30.3')]