Python: Unable to scrape a table from a website with BeautifulSoup


I'm trying to scrape this table:

Here is my code:

import requests
from bs4 import BeautifulSoup
import csv

root_url = "https://www.coingecko.com/en/coins/recently_added"
html = requests.get(root_url)
soup = BeautifulSoup(html.text, 'html.parser')

paging = soup.find("div",{"class":"row no-gutters tw-flex flex-column flex-lg-row tw-justify-end mt-2"}).find("ul",{"class":"pagination"}).find_all("a")
start_page = paging[1].text
last_page = paging[len(paging)-2].text

#
# outfile = open('gymlookup.csv','w', newline='')
# writer = csv.writer(outfile)
# writer.writerow(["Name", "Address", "Phone"])


pages = list(range(1,int(last_page)+1))
for page in pages:
    url = 'https://www.coingecko.com/en/coins/recently_added?page=%s' %(page)
    html = requests.get(url)
    soup = BeautifulSoup(html.text, 'html.parser')

    #print(soup.prettify())
    print ('Processing page: %s' %(page))

    coins = soup.findAll("div",{"class":"coingecko-table"})
    for element in coins:
        coin = element.find(class_='coin-name text-left tablesorter-header tablesorter-headerUnSorted')
        price = element.find(class_='price text-right sorter-numeric tablesorter-header tablesorter-headerUnSorted')
        print(coin,price)
        # hr = element.find('change1h').text
        # last_added = element.find('last_added').text

#         writer.writerow([coin, price, hr,last_added])
#
# outfile.close()
print('Done')

print(coin, price) doesn't print anything. I'm not sure why; any help is welcome :)

You can use Selenium:

from selenium import webdriver
import time
from bs4 import BeautifulSoup

options = webdriver.ChromeOptions()
options.add_argument('--headless')
options.add_argument('--no-sandbox')
options.add_argument('--disable-dev-shm-usage')

url='https://www.coingecko.com/en/coins/recently_added?page=1'
# Recent Selenium 4 releases no longer accept the driver path as a positional argument
wd = webdriver.Chrome(options=options)
wd.get(url)
time.sleep(2) # sleep for a few seconds to allow loading the data

soup = BeautifulSoup(wd.page_source, 'html.parser')
wd.quit()

# These class names belong to the table's header cells (<th>)
coin = soup.find(class_='coin-name text-left tablesorter-header tablesorter-headerUnSorted')
price = soup.find(class_='price text-right sorter-numeric tablesorter-header tablesorter-headerUnSorted')
print(coin, price)

This outputs:

<th aria-disabled="false" aria-label="Coin: No sort applied, activate to apply a descending sort" aria-sort="none" class="coin-name text-left tablesorter-header tablesorter-headerUnSorted" data-column="2" role="columnheader" scope="col" style="user-select: none;" tabindex="0" unselectable="on">
Coin
</th> <th aria-disabled="false" aria-label="Price: No sort applied, activate to apply a descending sort" aria-sort="none" class="price text-right sorter-numeric tablesorter-header tablesorter-headerUnSorted" data-column="3" role="columnheader" scope="col" style="user-select: none;" tabindex="0" unselectable="on">
Price
</th>



Just use pandas to get the table data.

Here's how:

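from io import StringIO

import pandas as pd
import requests

url = "https://www.coingecko.com/en/coins/recently_added?page=1"
html = requests.get(url).text
# Newer pandas versions expect a file-like object rather than a raw HTML string
tables = pd.read_html(StringIO(html), flavor="bs4")
df = pd.concat(tables).drop(["Unnamed: 0", "Unnamed: 1"], axis=1)
df.to_csv("your_table.csv", index=False)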


Here is code that gets the data from a single page. You've already handled the pagination, so you only need one more loop.

import requests
from bs4 import BeautifulSoup
import csv

root_url = "https://www.coingecko.com/en/coins/recently_added?page=1"
html = requests.get(root_url)
soup = BeautifulSoup(html.text, 'html.parser')

coin_lst = []
price_lst = []
# The table sits inside a div with class "coingecko-table"
coins = soup.find_all("div", {"class": "coingecko-table"})
for element in coins:
    # Name cells carry the coin name in their data-text attribute
    coin = element.find_all("td", attrs={"class": "py-0 coin-name"})
    price = element.find_all("td", attrs={"class": "td-price price text-right"})
    for c, p in zip(coin, price):
        coin_lst.append(c["data-text"])
        price_lst.append(p.text.strip("\n"))

Output:

Coin_list
['Revolt',
 'StarShip',
 'Panda Finance',....]

Price_list
['$0.00003005',
 '$0.188834',
 '$0.00000071',...]
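
The "one more loop" mentioned above could look roughly like the sketch below. It wraps the single-page logic in a function and calls it once per page. The names parse_coins, scrape_pages, PAGE_URL and the fetch parameter are made up for illustration, and the selectors are the ones from the answer, so they only work as long as the site's markup matches them. It uses strip() rather than strip("\n") so stray spaces are removed as well.

```python
import requests
from bs4 import BeautifulSoup

# Page URL pattern taken from the question
PAGE_URL = "https://www.coingecko.com/en/coins/recently_added?page=%s"


def parse_coins(html):
    """Return (name, price) pairs found in one page of HTML."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for table in soup.find_all("div", {"class": "coingecko-table"}):
        names = table.find_all("td", attrs={"class": "py-0 coin-name"})
        prices = table.find_all("td", attrs={"class": "td-price price text-right"})
        for name, price in zip(names, prices):
            rows.append((name["data-text"], price.text.strip()))
    return rows


def scrape_pages(last_page, fetch=lambda url: requests.get(url).text):
    """Parse pages 1..last_page. `fetch` is a hypothetical hook that
    returns the HTML for a URL, so the loop can be exercised offline."""
    rows = []
    for page in range(1, last_page + 1):
        rows.extend(parse_coins(fetch(PAGE_URL % page)))
    return rows
```

Calling scrape_pages(int(last_page)) with the last_page value from the question's pagination code would then return all rows as one list, ready to pass to csv.writer.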

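Why not use the API?

They don't have an endpoint for retrieving recently added coins.

I guess the data is added dynamically, so beautifulsoup can't see it.

The data doesn't appear to be dynamic. I just want to take a snapshot of the table.

This is great! How do I handle the pagination?

Just loop over the pages, keep appending the DataFrames to a list, and then concatenate them all into one.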
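
The loop described in the last comment might be sketched like this, building on the pandas answer. The names read_page_tables, collect_pages and the read_tables parameter are hypothetical, and it assumes every page contains a table that pd.read_html can parse.

```python
from io import StringIO

import pandas as pd
import requests

# Page URL pattern taken from the question
BASE_URL = "https://www.coingecko.com/en/coins/recently_added?page={page}"


def read_page_tables(page):
    """Download one page and return the list of tables pandas finds on it."""
    response = requests.get(BASE_URL.format(page=page), timeout=30)
    response.raise_for_status()
    return pd.read_html(StringIO(response.text), flavor="bs4")


def collect_pages(last_page, read_tables=read_page_tables):
    """Loop over pages 1..last_page, gather every DataFrame in a list,
    then concatenate them into a single DataFrame."""
    frames = []
    for page in range(1, last_page + 1):
        frames.extend(read_tables(page))
    return pd.concat(frames, ignore_index=True)
```

For example, collect_pages(3).to_csv("your_table.csv", index=False) would write the first three pages to one file. The page count of 3 is only a placeholder; the question reads the real value from the pagination links.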