Python list index out of range error: scraping with BeautifulSoup

I'm trying to scrape the live departures board with Beautiful Soup.

I tried the following:

caremar_live_departures_table = list(soup.select('.table-booking-history tr'))
caremar_live_departures_data = []
for tr in caremar_live_departures_table:
    td = tr.select('td')
    caremar_live_departures_data.append({
    'DEPARTURE PORT': td[1].select('span span').text,
    'ARRIVAL PORT': td[2].select('span span').text, 
    'DEPARTURE TIME': td[4].select('span').text, 
    'ARRIVAL TIME': td[6].select('span').text,     
    'FEERY TYPE':  td[3].select('span span').text,   
    'STATUS': td[3].select('span span').text   
    })

I get the following error:

 'DEPARTURE PORT': td[1].select('span span').text,
IndexError: list index out of range

td should be a list, so why does indexing it fail?

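I looked at the page source, and not every tr in the table contains the data you are looking for. If you only look at the rows with class r1 or r2, you get the data you need. Some of the other rows have only a single td, so only td[0] is available. That is why you get the IndexError.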

I also think you may have the list indices wrong. I have fixed them as best I could.

import requests
from bs4 import BeautifulSoup

r = requests.get('https://shop.caremar.it/it/prossime-partenze/')
soup = BeautifulSoup(r.text, 'html.parser')
# Keep only rows whose class attribute contains "r" (the r1/r2 data rows)
caremar_live_departures_table = list(soup.select('.table-booking-history tr[class*="r"]'))
caremar_live_departures_data = []
for tr in caremar_live_departures_table:
    td = tr.select('td')
    # Indices are zero-based: td[0] is the first cell of the row
    caremar_live_departures_data.append({
        'DEPARTURE PORT': td[0].text.strip(),
        'ARRIVAL PORT': td[1].text.strip(),
        'DEPARTURE TIME': td[3].text.strip(),
        'ARRIVAL TIME': td[5].text.strip(),
        'FEERY TYPE': td[2].text.strip(),
        'STATUS': td[6].text.strip()
    })
print(caremar_live_departures_data)

Output:

[{'DEPARTURE PORT': 'Procida', 'ARRIVAL PORT': 'Ischia', 'DEPARTURE TIME': '23:00', 'ARRIVAL TIME': '23:30', 'FEERY TYPE': 'Traghetto', 'STATUS': 'Chiuso'}, {'DEPARTURE PORT': 'Ischia', 'ARRIVAL PORT': 'Procida', 'DEPARTURE TIME': '02:30', 'ARRIVAL TIME': '02:45', 'FEERY TYPE': 'Traghetto', 'STATUS': ''}, {'DEPARTURE PORT': 'Ischia', 'ARRIVAL PORT': 'Pozzuoli', 'DEPARTURE TIME': '02:30', 'ARRIVAL TIME': '03:30', 'FEERY TYPE': 'Traghetto', 'STATUS': ''}, {'DEPARTURE PORT': 'Procida', 'ARRIVAL PORT': 'Pozzuoli', 'DEPARTURE TIME': '03:10', 'ARRIVAL TIME': '03:30', 'FEERY TYPE': 'Traghetto', 'STATUS': ''}, {'DEPARTURE PORT': 'Pozzuoli', 'ARRIVAL PORT': 'Procida', 'DEPARTURE TIME': '04:10', 'ARRIVAL TIME': '05:10', 'FEERY TYPE': 'Traghetto', 'STATUS': ''}, {'DEPARTURE PORT': 'Pozzuoli', 'ARRIVAL PORT': 'Ischia', 'DEPARTURE TIME': '04:10', 'ARRIVAL TIME': '05:40', 'FEERY TYPE': 'Traghetto', 'STATUS': ''}, {'DEPARTURE PORT': 'Procida', 'ARRIVAL PORT': 'Ischia', 'DEPARTURE TIME': '04:40', 'ARRIVAL TIME': '05:40', 'FEERY TYPE': 'Traghetto', 'STATUS': ''}, {'DEPARTURE PORT': 'Napoli (Porta di Massa)', 'ARRIVAL PORT': 'Capri', 'DEPARTURE TIME': '05:35', 'ARRIVAL TIME': '06:25', 'FEERY TYPE': 'TMV', 'STATUS': ''}, {'DEPARTURE PORT': 'Napoli (Porta di Massa)', 'ARRIVAL PORT': 'Procida', 'DEPARTURE TIME': '06:15', 'ARRIVAL TIME': '07:15', 'FEERY TYPE': 'Traghetto', 'STATUS': ''}, {'DEPARTURE PORT': 'Napoli (Porta di Massa)', 'ARRIVAL PORT': 'Ischia', 'DEPARTURE TIME': '06:15', 'ARRIVAL TIME': '07:55', 'FEERY TYPE': 'Traghetto', 'STATUS': ''}, {'DEPARTURE PORT': 'Procida', 'ARRIVAL PORT': 'Napoli (Molo Beverello)', 'DEPARTURE TIME': '06:35', 'ARRIVAL TIME': '07:05', 'FEERY TYPE': 'Aliscafo', 'STATUS': ''}, {'DEPARTURE PORT': 'Capri', 'ARRIVAL PORT': 'Napoli (Porta di Massa)', 'DEPARTURE TIME': '06:40', 'ARRIVAL TIME': '08:00', 'FEERY TYPE': 'Traghetto', 'STATUS': ''}, {'DEPARTURE PORT': 'Ischia', 'ARRIVAL PORT': 'Procida', 'DEPARTURE TIME': '06:45', 'ARRIVAL TIME': '07:00', 
'FEERY TYPE': 'Aliscafo', 'STATUS': ''}, {'DEPARTURE PORT': 'Ischia', 'ARRIVAL PORT': 'Napoli (Molo Beverello)', 'DEPARTURE TIME': '06:45', 'ARRIVAL TIME': '07:50', 'FEERY TYPE': 'Aliscafo', 'STATUS': ''}, {'DEPARTURE PORT': 'Capri', 'ARRIVAL PORT': 'Sorrento', 'DEPARTURE TIME': '07:00', 'ARRIVAL TIME': '07:25', 'FEERY TYPE': 'TMV', 'STATUS': ''}, {'DEPARTURE PORT': 'Procida', 'ARRIVAL PORT': 'Napoli (Molo Beverello)', 'DEPARTURE TIME': '07:10', 'ARRIVAL TIME': '07:50', 'FEERY TYPE': 'Aliscafo', 'STATUS': ''}, {'DEPARTURE PORT': 'Ischia', 'ARRIVAL PORT': 'Procida', 'DEPARTURE TIME': '07:20', 'ARRIVAL TIME': '07:50', 'FEERY TYPE': 'Traghetto', 'STATUS': ''}, {'DEPARTURE PORT': 'Ischia', 'ARRIVAL PORT': 'Pozzuoli', 'DEPARTURE TIME': '07:20', 'ARRIVAL TIME': '08:30', 'FEERY TYPE': 'Traghetto', 'STATUS': ''}, {'DEPARTURE PORT': 'Procida', 'ARRIVAL PORT': 'Ischia', 'DEPARTURE TIME': '07:25', 'ARRIVAL TIME': '07:55', 'FEERY TYPE': 'Traghetto', 'STATUS': ''}, {'DEPARTURE PORT': 'Napoli (Molo Beverello)', 'ARRIVAL PORT': 'Procida', 'DEPARTURE TIME': '07:30', 'ARRIVAL TIME': '08:05', 'FEERY TYPE': 'Aliscafo', 'STATUS': ''}]
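
As an alternative to filtering rows by class, the short rows described above can also be skipped by checking how many cells each row has before indexing into it. The following is only a minimal sketch: the HTML snippet is made up for illustration (the real page's markup may differ), and `extract_rows` and `min_cells` are hypothetical names, not part of Beautiful Soup.

```python
from bs4 import BeautifulSoup

# Made-up HTML for illustration only; the real page's markup may differ.
SAMPLE_HTML = """
<table class="table-booking-history">
  <tr><th>Porto di Partenza</th><th>Porto di Arrivo</th><th>Orario</th></tr>
  <tr class="r1"><td>Procida</td><td>Ischia</td><td>23:00</td></tr>
  <tr><td colspan="3">separator</td></tr>
  <tr class="r2"><td>Ischia</td><td>Procida</td><td>02:30</td></tr>
</table>
"""


def extract_rows(html, min_cells=3):
    """Return one dict per row that has at least `min_cells` <td> cells."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for tr in soup.select(".table-booking-history tr"):
        td = tr.select("td")
        if len(td) < min_cells:
            # Header rows (only <th>) and single-cell rows end up here,
            # so td[1], td[2], ... are never touched for them.
            continue
        rows.append({
            "DEPARTURE PORT": td[0].get_text(strip=True),
            "ARRIVAL PORT": td[1].get_text(strip=True),
            "DEPARTURE TIME": td[2].get_text(strip=True),
        })
    return rows


departures = extract_rows(SAMPLE_HTML)
print(departures)
```

Checking the length keeps working if the site renames its row classes, at the cost of having to know how many cells a real data row contains.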

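Wouldn't it be more readable to use pandas and specify the columns of interest, and their order, by the names they have in the HTML table?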

import pandas as pd

results = pd.read_html('https://shop.caremar.it/it/prossime-partenze/')
df = results[0].dropna(how='all').fillna('')[['Porto di Partenza','Porto di Arrivo','Orario', 'Arrivo', 'Mezzo', 'Stato']]
print(df)

You can make it more explicit by also renaming the column headers:

import pandas as pd

results = pd.read_html('https://shop.caremar.it/it/prossime-partenze/')
columnOrder = ['Porto di Partenza','Porto di Arrivo','Orario', 'Arrivo', 'Mezzo', 'Stato']
headers = ['DEPARTURE PORT','ARRIVAL PORT', 'DEPARTURE TIME', 'ARRIVAL TIME', 'FERRY TYPE', 'STATUS']
df = results[0].dropna(how='all').fillna('')[columnOrder]
df.columns = headers
print(df)
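
A possible variation, sketched here without any network access: instead of two parallel lists, a single mapping from the page's headers to the new ones can drive both the column selection and the renaming. The two rows are copied from the sample output earlier in this thread; the order of the raw columns and the names `raw`, `column_map` and `records` are assumptions made for illustration.

```python
import pandas as pd

# Stand-in for results[0]; the raw column order here is an assumption.
raw = pd.DataFrame({
    "Porto di Partenza": ["Procida", "Ischia"],
    "Porto di Arrivo": ["Ischia", "Procida"],
    "Mezzo": ["Traghetto", "Traghetto"],
    "Orario": ["23:00", "02:30"],
    "Arrivo": ["23:30", "02:45"],
    "Stato": ["Chiuso", ""],
})

# Original header -> new header; the dict's order sets the output column order.
column_map = {
    "Porto di Partenza": "DEPARTURE PORT",
    "Porto di Arrivo": "ARRIVAL PORT",
    "Orario": "DEPARTURE TIME",
    "Arrivo": "ARRIVAL TIME",
    "Mezzo": "FERRY TYPE",
    "Stato": "STATUS",
}

# Select the columns in the mapping's order, then rename them in one step.
df = raw[list(column_map)].rename(columns=column_map)
records = df.to_dict("records")
print(records)
```

Keeping each original header next to its replacement means that reordering one list cannot silently mislabel a column, and `to_dict("records")` gives the same list-of-dicts shape as the Beautiful Soup version.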

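Are you aware that Python indexes from 0? If not, it looks like you are using index n to access the element at the nth position in td, whereas you should use index (n-1) to access the nth element.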
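
The off-by-one point can be checked with a plain list. This is a small sketch only: `cells` holds hypothetical values standing in for the text of one row's td elements, and `nth` is an illustrative helper, not a built-in.

```python
# Hypothetical cell values standing in for the text of one row's <td> elements.
cells = ["Procida", "Ischia", "Traghetto"]

first = cells[0]   # the 1st element lives at index 0
third = cells[2]   # the 3rd element lives at index 3 - 1 = 2
last = cells[-1]   # negative indices count from the end


def nth(items, n):
    """Return the n-th element counting from 1, or None if it does not exist."""
    if 1 <= n <= len(items):
        return items[n - 1]
    return None


print(first, third, last)
print(nth(cells, 3))
print(nth(cells, 4))  # there is no 4th element, so the helper returns None
```

Indexing `cells[3]` directly would raise the same `IndexError: list index out of range` shown in the question, because a three-element list only has indices 0, 1 and 2.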