Python scraping CoinMarketCap data only returns the first 10 results per page. Why aren't the other 90 returned?

I have no problem scraping, and I can even scrape any number of pages I define, but it only shows the first 10 results of each page:
import requests
from bs4 import BeautifulSoup

def scrape_pages(page_num):
    for page in range(1, page_num + 1):
        headers = {'User-Agent':
            'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.106 Safari/537.36'}
        url = "https://coinmarketcap.com/?page={}".format(page)
        page_tree = requests.get(url, headers=headers)
        pageSoup = BeautifulSoup(page_tree.content, 'html.parser')
        print("Page {} parsed successfully!".format(url))
That's because only the first ten results are present in the HTML you get back. The rest of the rows are added dynamically with JavaScript, so BeautifulSoup never sees them: they simply aren't in the document it parses.
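You can see the same effect with a tiny static example (the HTML below is made up for illustration): BeautifulSoup only parses the markup the server sent, so rows that a script would insert at runtime never show up:

```python
from bs4 import BeautifulSoup

# Server-rendered HTML: one real row, plus a script that would add more
# rows in the browser. BeautifulSoup does not execute JavaScript.
html = """
<table id="prices">
  <tr class="row"><td>Bitcoin</td></tr>
  <script>
    // In a browser this would append 99 more <tr class="row"> elements.
  </script>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
rows = soup.select("tr.row")
print(len(rows))  # only the server-rendered row is visible
```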
However, there is an API you can use to get the table data (for all pages, if you like). Here's how:
from urllib.parse import urlencode
import requests
from tabulate import tabulate
query_string = [
    ('start', '1'),
    ('limit', '100'),
    ('sortBy', 'market_cap'),
    ('sortType', 'desc'),
    ('convert', 'USD'),
    ('cryptoType', 'all'),
    ('tagType', 'all'),
]

base = "https://api.coinmarketcap.com/data-api/v3/cryptocurrency/listing?"
response = requests.get(f"{base}{urlencode(query_string)}").json()

results = [
    [
        currency["name"],
        round(currency["quotes"][0]["price"], 4),
    ]
    for currency in response["data"]["cryptoCurrencyList"]
]

print(tabulate(results, headers=["Currency", "Price"], tablefmt="pretty"))
Output:
+-----------------------+------------+
| Currency | Price |
+-----------------------+------------+
| Bitcoin | 46204.9211 |
| Ethereum | 1488.0481 |
| Tether | 0.9995 |
| Binance Coin | 212.8729 |
| Cardano | 0.93 |
| Polkadot | 31.1603 |
| XRP | 0.4464 |
| Litecoin | 167.2676 |
| Chainlink | 25.1752 |
| Bitcoin Cash | 488.9875 |
| Stellar | 0.3724 |
| USD Coin | 0.9998 |
| | |
| and many more | values |
+-----------------------+------------+
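Each entry in cryptoCurrencyList carries more than the name and price used above; the same comprehension pattern extends to other fields. As a rough sketch (the extra "symbol" field below is an assumption about this endpoint's payload, not something confirmed by the output above), with a mocked response so the example runs offline:

```python
# Mocked response mimicking the structure used in the answer above.
# The "symbol" field is an assumed part of the API's payload.
response = {
    "data": {
        "cryptoCurrencyList": [
            {"name": "Bitcoin", "symbol": "BTC",
             "quotes": [{"price": 46204.9211}]},
            {"name": "Ethereum", "symbol": "ETH",
             "quotes": [{"price": 1488.0481}]},
        ]
    }
}

results = [
    [c["name"], c.get("symbol", "?"), round(c["quotes"][0]["price"], 4)]
    for c in response["data"]["cryptoCurrencyList"]
]
print(results)
```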
Edit: to loop over the pages, try something like this:
from urllib.parse import urlencode
import requests
query_string = [
    ('start', '1'),
    ('limit', '100'),
    ('sortBy', 'market_cap'),
    ('sortType', 'desc'),
    ('convert', 'USD'),
    ('cryptoType', 'all'),
    ('tagType', 'all'),
]
base = "https://api.coinmarketcap.com/data-api/v3/cryptocurrency/listing?"
response = requests.get(f"{base}{urlencode(query_string)}").json()
last_page = (int(response["data"]["totalCount"]) // 100) + 1
all_pages = [1 if i == 1 else (i * 100) + 1 for i in range(1, last_page)]
for page in all_pages[:2]:
    query_string = [
        ('start', str(page)),
        ('limit', '100'),
        ('sortBy', 'market_cap'),
        ('sortType', 'desc'),
        ('convert', 'USD'),
        ('cryptoType', 'all'),
        ('tagType', 'all'),
    ]
    response = requests.get(f"{base}{urlencode(query_string)}").json()
    results = [
        [
            currency["name"],
            round(currency["quotes"][0]["price"], 4),
        ]
        for currency in response["data"]["cryptoCurrencyList"]
    ]
    print(results)
Note: I limited this example by adding [:2] to the for loop. If you want to go over all the pages, just remove the [:2], so the loop looks like this:
for page in all_pages:
# the rest of the body ...
This is a great solution, but it doesn't show the results of page 2; it shows page 3 instead. Also, does this API have a rate limit? I think this line needs to change from: all_pages = [1 if i == 1 else (i * 100) + 1 for i in range(1, last_page)] to: all_pages = [1 if i == 0 else (i * 100) + 1 for i in range(0, last_page)]
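The off-by-one the comment describes can be checked without touching the network: with 100 results per page, the start offsets should be 1, 101, 201, and so on, but the answer's comprehension never produces the offset 101, so page 2 is skipped. A quick sketch, assuming a totalCount of 250 for illustration:

```python
total_count = 250                       # assumed for illustration
last_page = (total_count // 100) + 1    # 3 pages of up to 100 results

# Comprehension from the answer: never yields offset 101 (page 2).
buggy = [1 if i == 1 else (i * 100) + 1 for i in range(1, last_page)]

# Corrected version from the comment: offsets 1, 101, 201.
fixed = [1 if i == 0 else (i * 100) + 1 for i in range(0, last_page)]

print(buggy)  # [1, 201]
print(fixed)  # [1, 101, 201]
```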