Python web scraping a multi-page table into a CSV file and a DataFrame for analysis

python pandas web-scraping

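When I try to page through the site, only the table from page 10 ends up in the CSV file, but I want the results from every page written to that file. I know I'm probably making an easy mistake here. Can someone show me the correct way to do this? Thanks, I appreciate your input.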

import pandas as pd
import requests
from bs4 import BeautifulSoup
from tabulate import tabulate

# Transactions over the last 17 hrs
# Looping through page numbers using URL manipulation
#for i in range(1,100,1):

dfs = []

url = "https://etherscan.io/txs?p="
for index in range(1, 10, 1):
    res = requests.get(url+str(index))
    soup = BeautifulSoup(res.content,'lxml')
    table = soup.find_all('table')[0] 
    df = pd.read_html(str(table))

    dfs.append(df)
    #df[0].to_csv('Desktop/scrape.csv')

final_df[0] = pd.concat(dfs)
final_df[0].to_csv('Desktop/scrape.csv')
print( tabulate(df[0], headers='keys', tablefmt='psql'))

I get the following TypeError:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
in <module>()
     20     #df[0].to_csv('Desktop/scrape.csv')
     21 
---> 22 final_df[0] = pd.concat(dfs)
     23 final_df[0].to_csv('Desktop/scrape.csv')
     24 print( tabulate(df[0], headers='keys', tablefmt='psql'))

~/anaconda3/lib/python3.6/site-packages/pandas/core/reshape/concat.py in concat(objs, axis, join, join_axes, ignore_index, keys, levels, names, verify_integrity, copy)
    204                        keys=keys, levels=levels, names=names,
    205                        verify_integrity=verify_integrity,
--> 206                        copy=copy)
    207     return op.get_result()
    208 

~/anaconda3/lib/python3.6/site-packages/pandas/core/reshape/concat.py in __init__(self, objs, axis, join, join_axes, keys, levels, names, ignore_index, verify_integrity, copy)
    261         for obj in objs:
    262             if not isinstance(obj, NDFrame):
--> 263                 raise TypeError('cannot concatenate a non-NDFrame object')
    264 
    265             # consolidate

TypeError: cannot concatenate a non-NDFrame object

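Your code is missing only one line. pd.read_html returns a list of DataFrames, not a single DataFrame, so concat that list before appending it to dfs. Otherwise dfs becomes a list of lists, which is exactly what pd.concat rejects with the TypeError above.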

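from io import StringIO

import pandas as pd
import requests
from bs4 import BeautifulSoup

proxyDict = None  # your own proxy mapping, if you need one; None means no proxy

dfs = []

url = "https://etherscan.io/txs?p="
for index in range(1, 10):
    res = requests.get(url + str(index), proxies=proxyDict)
    soup = BeautifulSoup(res.content, 'lxml')
    table = soup.find_all('table')[0]
    # read_html returns a list of DataFrames, not a single DataFrame
    # (StringIO avoids the literal-string deprecation in newer pandas)
    df_list = pd.read_html(StringIO(str(table)))
    df = pd.concat(df_list)  # this line is what you're missing
    dfs.append(df)

# Concatenate all pages once, after the loop, and write a single CSV
final_df = pd.concat(dfs)
final_df.to_csv('Desktop/scrape.csv')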
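To see why the extra concat matters without hitting the network, here is a minimal sketch using a hypothetical fake_read_html helper that, like pd.read_html, returns a list of DataFrames (its columns and values are made up). It reproduces the TypeError from the question, then shows the fix above and an equivalent alternative that uses list.extend instead of a per-page concat. The ignore_index=True argument is an addition not in the answer: it renumbers the combined rows 0..n-1 instead of repeating each page's own index.

```python
import pandas as pd


def fake_read_html(page):
    """Hypothetical stand-in for pd.read_html: it returns a *list* of
    DataFrames, just like the real function. Columns and values are made up."""
    return [pd.DataFrame({"page": [page, page], "row": [1, 2]})]


pages = range(1, 4)  # three fake pages

# The bug: appending each list as-is builds a list of lists,
# and pd.concat rejects the inner lists with a TypeError
# (the exact message depends on the pandas version).
nested = [fake_read_html(p) for p in pages]
try:
    pd.concat(nested)
except TypeError as exc:
    print("TypeError:", exc)

# The answer's fix: concat each page's list into one DataFrame first.
dfs = []
for p in pages:
    dfs.append(pd.concat(fake_read_html(p)))
final_df = pd.concat(dfs, ignore_index=True)

# Equivalent alternative: keep one flat list of DataFrames.
flat = []
for p in pages:
    flat.extend(fake_read_html(p))
final_df2 = pd.concat(flat, ignore_index=True)

print(final_df.shape)  # (6, 2)
```

Both approaches produce the same frame; extend simply skips the intermediate per-page concat.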

Not an answer to your question, but range(1,10,1) is redundant because the step defaults to 1. Also, if there are 10 pages you need range(1,11), since range stops before its stop value.
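A quick sketch of that range arithmetic: base_url is the page URL from the question, and n_pages = 10 is an assumed page count, not something read from the site.

```python
base_url = "https://etherscan.io/txs?p="

# The step already defaults to 1, so these two ranges are identical.
assert list(range(1, 10, 1)) == list(range(1, 10))

# range(start, stop) excludes stop.
pages_wrong = list(range(1, 10))  # pages 1..9, page 10 is skipped
pages_right = list(range(1, 11))  # pages 1..10
print(len(pages_wrong), len(pages_right))  # 9 10

# One URL per page, assuming there are n_pages pages in total.
n_pages = 10  # assumption: the page count you want to fetch
urls = [base_url + str(page) for page in range(1, n_pages + 1)]
print(urls[0])   # https://etherscan.io/txs?p=1
print(urls[-1])  # https://etherscan.io/txs?p=10
```

Writing the upper bound as n_pages + 1 keeps the off-by-one in one place instead of hard-coding 11.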