Python scraping error

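I am using the code below to extract some data from a website, but I am having trouble with the duration on this line:

duration = tr.select('td.duration')[0].contents[0].strip()

It raises the exception shown below. Please tell me how to fix this line so that I can extract the duration data. I have searched online for similar questions, but they did not fully answer mine.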

# import needed libraries
from mechanize import Browser
from bs4 import BeautifulSoup
import csv

br = Browser()

# Ignore robots.txt
br.set_handle_robots(False)
br.addheaders = [('User-agent', 'Chrome')]

# Retrieve the home page
br.open('http://fahrplan.sbb.ch/bin/query.exe/en')
br.select_form(nr=6)

br.form["REQ0JourneyStopsS0G"] = 'Eisenstadt'  # Origin train station (From)
br.form["REQ0JourneyStopsZ0G"] = 'sarajevo'  # Destination train station (To)
br.form["REQ0JourneyTime"] = '5:30'  # Search Time
br.form["date"] = '18.01.17'  # Search Date

# Get the search results
br.submit()

# get the response from mechanize Browser
soup = BeautifulSoup(br.response().read(), 'lxml', from_encoding="utf-8")
trs = soup.select('table.hfs_overview tr')

# scrape the contents of the table to csv (This is not complete as I cannot write the duration column to the csv)
with open('out.csv', 'w') as f:
    for tr in trs:
        locations = tr.select('td.location')
        if len(locations) > 0:
            location = locations[0].contents[0].strip()
            prefix = tr.select('td.prefix')[0].contents[0].strip()
            time = tr.select('td.time')[0].contents[0].strip()
            duration = tr.select('td.duration')[0].contents[0].strip()
            f.write("{},{},{},{}\n".format(location.encode('utf-8'), prefix, time, duration))

Traceback (most recent call last):
  File "C:/…/tester.py", line 204, in <module>
    duration = tr.select('td.duration')[0].contents[0].strip()
IndexError: list index out of range

Process finished with exit code 1

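Either tr.select('td.duration') is a list of length zero, or tr.select('td.duration')[0].contents is a list of length zero. You need to guard against both possibilities somehow. One way is to use conditionals: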

durations = tr.select('td.duration')
if len(durations) == 0:
    print("oops! There aren't any durations.")
else:
    contents = durations[0].contents
    if len(contents) == 0:
        print("oops! There aren't any contents.")
    else:
        duration = contents[0].strip()
        #rest of code goes here

Alternatively, you may simply want to skip any tr that does not fit the expected structure, in which case a try/except may be enough:

try:
    duration = tr.select('td.duration')[0].contents[0].strip()
except IndexError:
    print("Oops! tr didn't have expected tds and/or contents.")
    continue
#rest of code goes here
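
If every row should still end up in the CSV, the two guards above can be folded into one small helper. The following is only a sketch under stated assumptions: cell_text and rows_to_csv are hypothetical names, SAMPLE_HTML is made-up markup standing in for the real page, and get_text(strip=True) is swapped in for contents[0].strip() so that a cell whose first child is a tag does not break the call.

```python
import csv
import io

from bs4 import BeautifulSoup

# Made-up sample markup (an assumption, not the real page): the second row
# has no td.duration cell and the third row has an empty one.
SAMPLE_HTML = """
<table class="hfs_overview">
  <tr><td class="location">Eisenstadt</td><td class="duration">12:05</td></tr>
  <tr><td class="location">Wien Hbf</td></tr>
  <tr><td class="location">Sarajevo</td><td class="duration"></td></tr>
</table>
"""


def cell_text(row, selector, default=""):
    """Return the stripped text of the first matching cell, or `default`."""
    cells = row.select(selector)
    if not cells:
        # No matching cell: this is the case that raised IndexError.
        return default
    # get_text() returns "" for an empty cell, so fall back to the default.
    return cells[0].get_text(strip=True) or default


def rows_to_csv(html):
    """Build CSV text with one line per row that has a location cell."""
    soup = BeautifulSoup(html, "html.parser")
    out = io.StringIO()
    writer = csv.writer(out)
    for tr in soup.select("table.hfs_overview tr"):
        location = cell_text(tr, "td.location")
        if not location:
            continue  # skip header or spacer rows
        writer.writerow([location, cell_text(tr, "td.duration")])
    return out.getvalue()


print(rows_to_csv(SAMPLE_HTML))
```

Rows with a missing or empty duration are written with an empty field instead of being dropped. Whether that or skipping the row is preferable depends on what the CSV is used for.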

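Do you understand what IndexError: list index out of range means? The error is self-explanatory. From the look of it, the row either has no td elements matching the selector, or the first matching td has no contents. Debug to find out which one it is.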
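
One way to find out which of the two cases applies is to classify each row before indexing into it. This is a rough sketch: diagnose_cell is a hypothetical helper, and SAMPLE_ROWS is made-up markup, so the selectors would need to be run against the real response instead.

```python
from bs4 import BeautifulSoup

# Made-up sample markup (an assumption) covering all three situations.
SAMPLE_ROWS = """
<table class="hfs_overview">
  <tr><td class="duration">12:05</td></tr>
  <tr><td class="location">Wien Hbf</td></tr>
  <tr><td class="duration"></td></tr>
</table>
"""


def diagnose_cell(row, selector):
    """Report why row.select(selector)[0].contents[0] would or would not work."""
    cells = row.select(selector)
    if not cells:
        return "no matching cell"
    if not cells[0].contents:
        return "cell is empty"
    return "ok"


soup = BeautifulSoup(SAMPLE_ROWS, "html.parser")
results = [
    diagnose_cell(tr, "td.duration")
    for tr in soup.select("table.hfs_overview tr")
]
for index, result in enumerate(results):
    print(index, result)
```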
