Python scraping of Coinmarketcap data only returns the first 10 results per page, why aren't the other 90 returned?


I don't have a problem with the scraping itself, and I can even scrape however many pages I define, but it only shows the first 10 results per page.

import requests
from bs4 import BeautifulSoup


def scrape_pages(page_num):
    for page in range(1, page_num + 1):
        headers = {'User-Agent':
                   'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.106 Safari/537.36'}

        url = "https://coinmarketcap.com/?page={}".format(page)
        page_tree = requests.get(url, headers=headers)
        pageSoup = BeautifulSoup(page_tree.content, 'html.parser')

        print("Page {} Parsed successfully!".format(url))

That's because only the first ten results are present in the HTML you get back. The rest are added to the page dynamically with JavaScript, so BeautifulSoup never sees them; as far as the parser is concerned, they simply aren't there.
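You can confirm this yourself by counting the table rows BeautifulSoup actually receives. Here's a minimal sketch (the tbody tr selector is an assumption about CoinMarketCap's current markup):

import requests
from bs4 import BeautifulSoup

headers = {'User-Agent':
           'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.106 Safari/537.36'}

html = requests.get("https://coinmarketcap.com/?page=1", headers=headers).content
soup = BeautifulSoup(html, 'html.parser')

# Count the rows present in the raw HTML, before any JavaScript has run.
rows = soup.select("tbody tr")
print(len(rows))  # roughly 10, not 100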

However, there is an API you can use to get the table data (for all the pages as well, if you like).

Here's how:

from urllib.parse import urlencode

import requests
from tabulate import tabulate

# Query parameters for the listing endpoint: fetch 100 results starting from rank 1
query_string = [
    ('start', '1'),
    ('limit', '100'),
    ('sortBy', 'market_cap'),
    ('sortType', 'desc'),
    ('convert', 'USD'),
    ('cryptoType', 'all'),
    ('tagType', 'all'),
]

base = "https://api.coinmarketcap.com/data-api/v3/cryptocurrency/listing?"
# The endpoint returns the table data as JSON
response = requests.get(f"{base}{urlencode(query_string)}").json()

results = [
    [
        currency["name"],
        round(currency["quotes"][0]["price"], 4),
    ]
    for currency in response["data"]["cryptoCurrencyList"]
]

print(tabulate(results, headers=["Currency", "Price"], tablefmt="pretty"))
Output:

+-----------------------+------------+
|       Currency        |   Price    |
+-----------------------+------------+
|        Bitcoin        | 46204.9211 |
|       Ethereum        | 1488.0481  |
|        Tether         |   0.9995   |
|     Binance Coin      |  212.8729  |
|        Cardano        |    0.93    |
|       Polkadot        |  31.1603   |
|          XRP          |   0.4464   |
|       Litecoin        |  167.2676  |
|       Chainlink       |  25.1752   |
|     Bitcoin Cash      |  488.9875  |
|        Stellar        |   0.3724   |
|       USD Coin        |   0.9998   |
|                       |            |
|     and many more     |   values   |
+-----------------------+------------+
Edit: to loop over the pages, try this:

from urllib.parse import urlencode

import requests

query_string = [
    ('start', '1'),
    ('limit', '100'),
    ('sortBy', 'market_cap'),
    ('sortType', 'desc'),
    ('convert', 'USD'),
    ('cryptoType', 'all'),
    ('tagType', 'all'),
]

base = "https://api.coinmarketcap.com/data-api/v3/cryptocurrency/listing?"
response = requests.get(f"{base}{urlencode(query_string)}").json()
# Compute the number of pages (100 results per page) and the 'start' offset for each page
last_page = (int(response["data"]["totalCount"]) // 100) + 1
all_pages = [1 if i == 1 else (i * 100) + 1 for i in range(1, last_page)]

for page in all_pages[:2]:
    query_string = [
        ('start', str(page)),
        ('limit', '100'),
        ('sortBy', 'market_cap'),
        ('sortType', 'desc'),
        ('convert', 'USD'),
        ('cryptoType', 'all'),
        ('tagType', 'all'),
    ]
    response = requests.get(f"{base}{urlencode(query_string)}").json()
    results = [
        [
            currency["name"],
            round(currency["quotes"][0]["price"], 4),
        ]
        for currency in response["data"]["cryptoCurrencyList"]
    ]
    print(results)
Note: I limited this example by adding [:2] to the for loop, but if you want to go through all the pages, just remove the [:2] so the loop looks like this:

for page in all_pages:
    #  the rest of the body ...

This is a nice solution, but it doesn't show the results for page 2; it shows page 3 instead. Also, does this API have any limits? I think this line needs to change from: all_pages = [1 if i == 1 else (i * 100) + 1 for i in range(1, last_page)] to: all_pages = [1 if i == 0 else (i * 100) + 1 for i in range(0, last_page)]
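The commenter's fix is easy to sanity-check. The snippet below uses an assumed totalCount of 450 (so last_page = 5) and prints the start offsets produced by the original comprehension and by the corrected one:

last_page = 5  # assumed value, e.g. totalCount = 450 -> (450 // 100) + 1

# Original comprehension: the second page (start=101) is skipped
print([1 if i == 1 else (i * 100) + 1 for i in range(1, last_page)])
# [1, 201, 301, 401]

# Corrected comprehension: every page boundary is covered
print([1 if i == 0 else (i * 100) + 1 for i in range(0, last_page)])
# [1, 101, 201, 301, 401]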