Extracting reviews from links given in a CSV file with Python?


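I'm new to Python. I want to extract all the review details for each hotel from the links given in a CSV file named hotel_FortWorth.csv, which has 3 columns: order, name, link. Sample of hotel_FortWorth.csv: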

     name            link

1   Crockett Hotel            https://www.tripadvisor.com.au/Hotel_Review-g60956-d553469-Reviews-Crockett_Hotel-San_Antonio_Texas.html
2   La Cantera Resort & Spa   https://www.tripadvisor.com.au/Hotel_Review-g60956-d108571-Reviews-La_Cantera_Resort_Spa-San_Antonio_Texas.html
3   .....
4....
I get an error at thepage = urllib.request.urlopen(url). Could someone please help me solve this? I would greatly appreciate it.

import urllib.request

import pandas as pd
from bs4 import BeautifulSoup

data = pd.read_csv('hotel_FortWorth.csv', header=None)
df = data[2]  # third column: the review-page links

for url in df:
    print(url)
    thepage = urllib.request.urlopen(url)
    soup = BeautifulSoup(thepage, "html.parser")
    while True:
        a = b = 0
        overallRatingarray = seeAllReviewsarray = rankarray = hotelarray = ""

        for profile in soup.findAll(attrs={"class": "overview_card"}):
            image = profile.text.replace("\n", "|||||").strip()
            if image.find("rating") > 0:
                # This is the line reported in the traceback
                counter = image.split("rating", 1)[0].split("|", 1)[1][-4].replace("|", "").strip()
                if len(overallRatingarray) == 0:
                    overallRatingarray = [counter]
                else:
                    overallRatingarray.append(counter)
The error is:

 Traceback (most recent call last):
 File "E:/LA TROBE SUBJECTS/Python/testing.py", line 33, in <module>
counter = image.split("rating", 1)[0].split("|", 1)[1][-4].replace("|", "").strip()
IndexError: list index out of range

Process finished with exit code 1

Here is an example using requests:

import requests
import pandas as pd
from bs4 import BeautifulSoup

def main():
    data = pd.read_csv("hotel_FortWorth.csv", header=None)
    df = data[2]

    for url in df:
        print(url)
        thepage = requests.get(url).text
        soup = BeautifulSoup(thepage, "html.parser")
        print(soup)
        ...

if __name__ == '__main__':
    main()
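
A note on the IndexError itself, which is separate from how the page is fetched. In the failing line, image.split("rating", 1)[0].split("|", 1) returns a one-element list whenever there is no "|" before the word "rating", so indexing it with [1] raises "list index out of range". Below is a minimal sketch of a more defensive version, using str.partition so that a missing separator yields None instead of an exception. The helper name text_before_rating is hypothetical, and the shape of the card text on the live page is an assumption. The original's [-4] character pick is left out because its intent is not clear from the question.

```python
def text_before_rating(card_text):
    """Return the text between the first '|' and the word 'rating', or None.

    Hypothetical helper: mirrors the chained split() calls from the question,
    but returns None where the original would raise IndexError.
    """
    # Same flattening step as in the question's code.
    flattened = card_text.replace("\n", "|||||").strip()

    # partition() always returns a 3-tuple, even when the separator is absent.
    before, found, _ = flattened.partition("rating")
    if not found:
        return None  # the card never mentions "rating"

    _, sep, tail = before.partition("|")
    if not sep:
        # No "|" before "rating": this is the case where
        # split("|", 1)[1] raises "list index out of range".
        return None

    return tail.replace("|", "").strip()


if __name__ == "__main__":
    # Made-up sample strings, not real page content.
    print(text_before_rating("Hotel\n4.5 rating"))
    print(text_before_rating("4.5 rating"))
```

Inside the loop, the caller would then skip any card for which the helper returns None rather than letting the whole script stop on the first card that does not match the expected layout.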

By the way, could you help me extract the review content? Whenever I try it, I keep getting errors. Thanks.