Python: Trouble scraping table data with Beautiful Soup

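I want to extract the table data from the page requested in the code below. I tried this code, but for some reason BS4 does not seem to pick up the table data: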

import bs4 as bs
import urllib.request

sauce = urllib.request.urlopen('https://drafty.cs.brown.edu/csprofessors').read()
soup = bs.BeautifulSoup(sauce, 'lxml')
table = soup.find('table',  attrs={"id": "table"})
table_rows = table.find_all('tr')

for tr in table_rows:
    td = tr.find_all('td')
    row = [i.text for i in td]
    print(row)

Any help is much appreciated :)

You are using the wrong tag and id name to locate the table. The following should work:

import bs4 as bs
import urllib.request

sauce = urllib.request.urlopen('https://drafty.cs.brown.edu/csprofessors').read()
soup = bs.BeautifulSoup(sauce, 'lxml')
table = soup.find('template', attrs={"id":"table-data"})
for tr in table.find_all('tr'):
    td = tr.find_all('td')
    row = [i.text for i in td]
    print(row)

You can simply combine selenium with pandas to scrape the table. Here is how:

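from io import StringIO
import time

import pandas as pd
from selenium import webdriver
from selenium.webdriver.common.by import By

url = 'https://drafty.cs.brown.edu/csprofessors'

driver = webdriver.Chrome()
driver.get(url)
time.sleep(2)  # give the page time to load

# Dismiss the welcome screen (find_element_by_xpath was removed in Selenium 4)
driver.find_element(By.XPATH, '//*[@id="welcome-screen"]/div/div/div[1]/button').click()
time.sleep(1)  # wait for the table to render

page = driver.page_source
driver.quit()

# Wrap the HTML in StringIO; recent pandas versions deprecate passing raw HTML strings
df = pd.read_html(StringIO(page))[0]
print(df)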

Thank you, Susie, but this does not cover everything I need.

import requests
from bs4 import BeautifulSoup as bs4


url = ('https://drafty.cs.brown.edu/csprofessors')
response = requests.get(url)
if response.ok:
    data = list()
    soup = bs4(response.text, 'html.parser')
    fullnames = soup.select('td:nth-child(1)')
    university = soup.select('td:nth-child(2)')
    join_year = soup.select('td:nth-child(3)')
    sub_field = soup.select('td:nth-child(4)')
    bachelors = soup.select('td:nth-child(5)')
    doctorate = soup.select('td:nth-child(6)')
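    # Build one record per row by indexing into each column list
    for item in range(len(fullnames)):
        data.append(
            [
                {
                'fullnames': fullnames[item].get_text(strip=True),
                'university': university[item].get_text(strip=True),
                'join_year': join_year[item].get_text(strip=True),
                'sub_field': sub_field[item].get_text(strip=True),
                'bachelors': bachelors[item].get_text(strip=True),
                'doctorate': doctorate[item].get_text(strip=True)
                }
            ]
        )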
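
Selecting each column separately keeps rows aligned only if every row has all six cells. A possible alternative is to walk the rows and pair each row's cells with the column names. The sketch below does this on a made-up HTML string. The sample data and the `rows_to_records` helper are hypothetical; the column keys are taken from the code above.

```python
from bs4 import BeautifulSoup

# Column keys reused from the code above.
COLUMNS = ["fullnames", "university", "join_year", "sub_field", "bachelors", "doctorate"]

# Made-up rows for illustration; not data from the real page.
SAMPLE_HTML = """
<table>
  <tr><th>Name</th><th>University</th></tr>
  <tr><td>Ada Lovelace</td><td>Example University</td><td>2001</td>
      <td>Theory</td><td>College A</td><td>College B</td></tr>
  <tr><td>Alan Turing</td><td>Sample College</td><td>1999</td>
      <td>Systems</td><td>College C</td><td>College D</td></tr>
</table>
"""


def rows_to_records(html, columns):
    """Return one dict per table row that has exactly len(columns) <td> cells."""
    soup = BeautifulSoup(html, "html.parser")
    records = []
    for tr in soup.find_all("tr"):
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        if len(cells) != len(columns):
            # Skip header rows and rows with missing or extra cells.
            continue
        records.append(dict(zip(columns, cells)))
    return records


for record in rows_to_records(SAMPLE_HTML, COLUMNS):
    print(record)
```

Skipping rows of the wrong length is a design choice for this sketch; a real scraper might log them instead.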