Python 3.x: How to scrape text from a table whose cells all share the same class attribute?


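I am trying to scrape customer reviews of flights from airlinequality.com, but I have run into a problem: in the flight information table, every value cell has the same class, "review-value". I tried to work around this as shown below. Normally every review has the same four flight details, except for the aircraft, which appears only in some reviews. So I wrote an if that is meant to detect when there is no aircraft and then just append the next "review-value"; however, it does not work. Can you help me? The code is as follows: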

from bs4 import BeautifulSoup
import requests
import pandas as pd
from datetime import datetime

myProxy = {
            "http"  : "http://10.120.118.49:8080",
            "https"  : "https://10.120.118.49:8080"
            }

headers={'User-agent' : 'Mozilla/5.0'}

title_of_review=[]
details_about=[]
review_text=[]
type_of_traveller=[]
seat_type=[]
route_flown=[]
date_flown=[]
aircraft_flown=[]


url='https://www.airlinequality.com/airline-reviews/air-france/page/4/?sortby=post_date%3ADesc&pagesize=100'

page1 = requests.get(url, proxies=myProxy, headers=headers)
soup1 = BeautifulSoup(page1.text, 'lxml')
page1.close()


for review in soup1.findAll('div', attrs={'class': "body"}):
    title=review.find('h2', attrs={'class': "text_header"})
    if title is not None:
        title_of_review.append(title.text)
    else:
        title_of_review.append('')

    details=review.find('h3', attrs={'class': "text_sub_header userStatusWrapper"})
    if details is not None:
        details_about.append(details.text)
    else:
        details_about.append('')

    texto=review.find('div', attrs={'class': "text_content"})
    if texto is not None:
        review_text.append(texto.text.strip('✅ Verified Review |').strip('\r\n\r\n'))
    else:
        review_text.append('')

    aircrafts=review.findAll('td', attrs={'class': "review-rating-header aircraft"})
    #print(aircrafts)
    all_reviews=review.findAll('td', attrs={'class': "review-value"})
    aircraft=all_reviews[0]
    if aircrafts is None:
        aircraft_flown.append('')        
    else:    
        aircraft_flown.append(aircraft.text)
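
A note on why the `if` in the code above never fires: `findAll` returns a list (empty when nothing matches), never `None`, so `aircrafts is None` is always false and `all_reviews[0]` is appended for every review. A minimal sketch of one way around this is to pair each header cell with the value cell in the same row. The sample HTML and the helper name `flight_details` are hypothetical, modelled only on the class names used in the question, so the real page may differ:

```python
from bs4 import BeautifulSoup

# Hypothetical markup modelled on the class names used in the question.
SAMPLE_HTML = """
<div class="body">
  <table>
    <tr><td class="review-rating-header aircraft">Aircraft</td>
        <td class="review-value">A320</td></tr>
    <tr><td class="review-rating-header type_of_traveller">Type Of Traveller</td>
        <td class="review-value">Business</td></tr>
  </table>
</div>
<div class="body">
  <table>
    <tr><td class="review-rating-header type_of_traveller">Type Of Traveller</td>
        <td class="review-value">Solo Leisure</td></tr>
  </table>
</div>
"""


def flight_details(review):
    """Map each header label to the text of the value cell in the same row."""
    details = {}
    for header in review.find_all('td', class_='review-rating-header'):
        # Look only at later <td> siblings in the same row.
        value = header.find_next_sibling('td', class_='review-value')
        if value is not None:
            details[header.get_text(strip=True)] = value.get_text(strip=True)
    return details


soup = BeautifulSoup(SAMPLE_HTML, 'html.parser')
rows = [flight_details(r) for r in soup.find_all('div', class_='body')]

# A review without an aircraft row falls back to an empty string.
aircraft_flown = [row.get('Aircraft', '') for row in rows]
print(aircraft_flown)
```

Because every value is looked up by its own header label, a missing Aircraft row no longer shifts the other fields.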

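You can use pandas to parse the table. The tricky part is the rating elements, which carry multiple tags and attributes. But it is not hard to remove the elements with the class "fill" and work out the given rating. The following code produces an output table: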

from bs4 import BeautifulSoup
import requests
import pandas as pd
import numpy as np
from datetime import datetime

myProxy = {
            "http"  : "http://10.120.118.49:8080",
            "https"  : "https://10.120.118.49:8080"
            }

headers={'User-agent' : 'Mozilla/5.0'}


results_df = pd.DataFrame()
colsList = ['Aircraft', 'Type Of Traveller', 'Seat Type', 'Route', 'Date Flown', 
            'Seat Comfort', 'Cabin Staff Service', 'Food & Beverages', 
            'Inflight Entertainment', 'Ground Service', 'Value For Money', 'Recommended']
url='https://www.airlinequality.com/airline-reviews/air-france/page/4/?sortby=post_date%3ADesc&pagesize=100'

page1 = requests.get(url, headers=headers)
soup1 = BeautifulSoup(page1.text, 'html.parser')
page1.close()

reviews = soup1.findAll('div', attrs={'class': "body"})
for review in reviews:
    for span in review.find_all("span", {'class':'star fill'}): 
        span.decompose()

    title=review.find('h2', attrs={'class': "text_header"})
    if title is not None:
        title_of_review = title.text.strip()
    else:
        title_of_review = ''

    details=review.find('h3', attrs={'class': "text_sub_header userStatusWrapper"})
    if details is not None:
        details_about = details.text.strip()
    else:
        details_about = ''

    texto=review.find('div', attrs={'class': "text_content"})
    if texto is not None:
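        # Drop only the literal badge prefix; str.strip(chars) would treat its
        # argument as a set of characters. str.removeprefix needs Python 3.9+.
        review_text = texto.text.strip().removeprefix('✅ Verified Review |').strip()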
    else:
        review_text = ''

    temp_df_alpha = pd.DataFrame([[title_of_review, details_about, review_text]], columns=['Title','Details','Text'])


    temp_df_beta = pd.read_html(str(review))[0].T
    new_header = list(temp_df_beta.iloc[0])
    temp_df_beta = temp_df_beta[1:]
    temp_df_beta.columns = new_header
    cols = [ col for col in new_header if col in colsList]
    temp_df_beta = temp_df_beta[cols]




    temp_df = temp_df_alpha.merge(temp_df_beta.reset_index(drop=True), how='left', left_index=True, right_index=True)
    rateCols = ['Seat Comfort', 'Cabin Staff Service', 'Food & Beverages', 
            'Inflight Entertainment', 'Ground Service', 'Value For Money']

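    for col in rateCols:
        if col in temp_df.columns:
            try:
                # With the filled stars removed, the first remaining star
                # number is one more than the rating.
                temp_df[col] = int(temp_df.iloc[0][col][0]) - 1
            except (TypeError, ValueError, IndexError):
                # No unfilled star is left, so all five were filled.
                temp_df[col] = 5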


    # DataFrame.append was removed in pandas 2.0; pd.concat is the replacement.
    results_df = pd.concat([results_df, temp_df], sort=False).reset_index(drop=True)
Output:

print (results_df.to_string())
                                              Title                                            Details                                               Text Type Of Traveller        Seat Type                                     Route      Date Flown  Seat Comfort  Cabin Staff Service  Food & Beverages  Inflight Entertainment  Ground Service Recommended                 Aircraft
0                          "I would fly them again"  1 reviews\n\n\n\nPeter-John de Kock (South Afr...  Trip Verified |  Cape Town to Los Angeles via ...          Business    Economy Class    Cape Town to Los Angeles via Paris CDG   December 2017           2.0                  3.0               3.0                     2.0             4.0         yes                      NaN
1               "service on board was just average"             Wenyu Zhao (France) 14th December 2017  Trip Verified | Flew Air France from Sydney to...      Solo Leisure    Economy Class          Sydney to Paris CDG via Shanghai     August 2017           3.0                  2.0               4.0                     3.0             2.0          no                      NaN
2                  "Enjoyed flying with Air France"      G Lamille (United Kingdom) 14th December 2017  Trip Verified |  Enjoyed flying with Air Franc...      Solo Leisure    Economy Class                        Paris to Vancouver    October 2017           5.0                  5.0               5.0                     5.0             4.0         yes         Boeing 777-200ER
3             "AF spreading this busline of theirs"       M Stanton (United States) 14th December 2017  Trip Verified |  Barcelona to Paris CDG. Ticke...          Business    Economy Class                    Barcelona to Paris CDG   December 2017           3.0                  3.0               1.0                     1.0             4.0          no                     A320
4                        "a good flying experience"  8 reviews\n\n\n\nS Lacey (United Kingdom) 12th...  Trip Verified | I had a pleasant flight with A...      Solo Leisure    Economy Class                      Hamburg to Paris CDG      March 2017           3.0                  4.0               4.0                     NaN             5.0         yes                A319/A320
5                   "horrible start to the holiday"    Luke Johnson (United Kingdom) 9th December 2017  Trip Verified | Flew Air France from London He...    Family Leisure    Economy Class    London Heathrow to Mauritius via Paris   December 2017           3.0                  3.0               2.0                     3.0             2.0          no                      NaN
6                      "it was Air France at fault"      Philip Jelovsek (Australia) 3rd December 2017  Trip Verified |  Paris to Abu Dhabi. My flight...      Solo Leisure    Economy Class                        Paris to Abu Dhabi   November 2017           NaN                  NaN               NaN                     NaN             1.0          no                      NaN
7         "not sure I will travel Air France again"  Luciana Queiroz (United States) 3rd December 2017  Trip Verified |  Nice to Paris. One person in ...    Family Leisure    Economy Class                             Nice to Paris   December 2017           1.0                  5.0               NaN                     NaN             1.0          no                      NaN
8                          "such a poor experience"       Francesco Lucchese (Italy) 2nd December 2017  Trip Verified | Bogota to Rome via Paris. It's...      Solo Leisure    Economy Class                  Bogota to Rome via Paris   November 2017           3.0                  1.0               3.0                     1.0             1.0          no                      NaN
9                        "Ridiculous and insulting"  25 reviews\n\n\n\nS Robinson (Austria) 1st Dec...  Trip Verified | Vienna - Paris CDG - London He...    Couple Leisure    Economy Class   Vienna to London Heathrow via Paris CDG   December 2016           4.0                  3.0               3.0                     NaN             1.0          no                      NaN
10             "experience very poor and stressful"          T Cody (United States) 25th November 2017  Trip Verified |  Air France accidentally cance...    Family Leisure    Economy Class            Barcelona to Atlanta via Paris    October 2017           1.0                  1.0               NaN                     NaN             1.0          no                      NaN
11                   "cabin staff were much better"  2 reviews\n\n\n\nSalal Bajwa (United Arab Emir...  Trip Verified |  London to Paris. Check in was...    Family Leisure    Economy Class                           London to Paris       June 2017           4.0                  5.0               4.0                     5.0             1.0         yes             Boeing 787-9
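
The rating step can also be checked in isolation. The code above removes the filled stars and reads the first remaining number; the sketch below swaps in a simpler variant that counts the filled stars directly. It assumes each rating cell holds five `span` elements, with the filled ones carrying the classes `star fill`. The sample markup and the helper name `decode_rating` are hypothetical:

```python
from bs4 import BeautifulSoup

# Hypothetical rating cell: three filled stars out of five.
SAMPLE_CELL = """
<td class="review-rating-stars stars">
  <span class="star fill">1</span><span class="star fill">2</span>
  <span class="star fill">3</span><span class="star">4</span>
  <span class="star">5</span>
</td>
"""

# Hypothetical cell with no filled stars at all.
EMPTY_CELL = """
<td class="review-rating-stars stars">
  <span class="star">1</span><span class="star">2</span>
  <span class="star">3</span><span class="star">4</span>
  <span class="star">5</span>
</td>
"""


def decode_rating(cell):
    """Return the rating as the number of spans that carry the class 'fill'."""
    return len(cell.find_all('span', class_='fill'))


three_stars = decode_rating(BeautifulSoup(SAMPLE_CELL, 'html.parser').td)
no_stars = decode_rating(BeautifulSoup(EMPTY_CELL, 'html.parser').td)
print(three_stars, no_stars)
```

Counting the filled stars avoids the special case for a full five-star rating, at the cost of handling the rating cells separately from `pd.read_html`.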

I have a solution for you, but what do you want as the output? Is your goal to get it into a CSV file or a table?
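
If the goal is a CSV file, a DataFrame such as `results_df` can be written out with `DataFrame.to_csv`. A minimal sketch, using a small stand-in frame; the rows and the file name `reviews.csv` are placeholders, not scraped data:

```python
import os
import tempfile

import pandas as pd

# Placeholder rows standing in for the scraped results_df.
results_df = pd.DataFrame({
    'Title': ['"example title one"', '"example title two"'],
    'Seat Type': ['Economy Class', 'Economy Class'],
    'Aircraft': ['A320', None],  # a missing aircraft stays empty in the CSV
})

# Hypothetical output location; any writable path works.
out_path = os.path.join(tempfile.gettempdir(), 'reviews.csv')

# index=False leaves the row index out of the file.
results_df.to_csv(out_path, index=False, encoding='utf-8')

# Read the file back to confirm the round trip.
round_trip = pd.read_csv(out_path)
print(round_trip)
```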