
Web scraping with Python: how can I make my scraping code faster?


I want to scrape two tables from two links. My code is:

import pandas as pd
import xlwings as xw
from datetime import datetime

def last_row(symbol, name):

    # Function that decides whether the last row of the df should be deleted,
    # based on the two requirements below.

    # Return True if the last row should be deleted;
    # the deletion itself happens in the next function.
    requirements = [symbol.lower() == "total", name.isdigit()]
    return all(requirements)

def get_foreigncompanies_info():
    df_list = []
    links = ["https://stockmarketmba.com/nonuscompaniesonusexchanges.php",
             "https://stockmarketmba.com/listofadrs.php"]
    for i in links:

        # Read the table with pandas read_html and keep only the necessary columns.

        df = pd.read_html(i)[0][['Symbol', 'Name', 'GICS Sector']] 
        if last_row(df.iloc[-1]['Symbol'], df.iloc[-1]['Name']):

            # Delete the last row

            df_list.append(df.iloc[:-1])
        else:

            # Keep last row

            df_list.append(df)
    return pd.concat(df_list).reset_index(drop=True).rename(columns={'Name': 'Security'})

def open_in_excel(dataframe):  # Code to view my df in excel.
    xw.view(dataframe)
    
if __name__ == "__main__":
    start = datetime.now()
    df = get_foreigncompanies_info()
    print(datetime.now() - start)
    open_in_excel(df)  # Reuse df instead of scraping everything a second time.
It took … seconds to execute the code.

I'd like to make the code run faster (in a way that doesn't produce too many unnecessary requests). My idea is to download the tables as CSV, since the website has a "Download CSV" button.
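Independently of the CSV idea, the two pages can be fetched concurrently instead of one after the other, which roughly halves the wall-clock time for network-bound work. A minimal sketch of the pattern; `fetch_one` is a placeholder callable (with the code above it could be `lambda url: pd.read_html(url)[0]`):

```python
from concurrent.futures import ThreadPoolExecutor

def fetch_all(urls, fetch_one):
    """Apply fetch_one to every URL concurrently; results keep the input order."""
    with ThreadPoolExecutor(max_workers=len(urls)) as pool:
        return list(pool.map(fetch_one, urls))
```

Because `pool.map` preserves the order of `urls`, the rest of the pipeline (the `last_row` check and `pd.concat`) works unchanged on the returned list of DataFrames.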

How can I download the CSV with Python?
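If the export endpoint can be found, downloading is the easy part. A sketch using only the stdlib plus pandas; note that the URL in `__main__` is a made-up placeholder, since the question is precisely that the real one is unknown:

```python
import io
import urllib.request

import pandas as pd

def csv_text_to_df(text):
    """Parse CSV text already in memory into a DataFrame."""
    return pd.read_csv(io.StringIO(text))

def download_csv(url):
    """Download a CSV file and return it as a DataFrame."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return csv_text_to_df(resp.read().decode("utf-8"))

if __name__ == "__main__":
    # Hypothetical endpoint -- replace with the real export URL once found.
    df = download_csv("https://stockmarketmba.com/some-export-endpoint.csv")
    print(df.head())
```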

I inspected the button but couldn't find its URL. (If you can find it, please describe how you found it, perhaps with an "Inspect" screenshot.)
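When the button's target isn't obvious in the browser's Inspect view, one option is to dump every form action and button on the page programmatically and look for an export endpoint among them. A stdlib-only sketch (whether this site exposes the CSV through a form at all is an assumption):

```python
from html.parser import HTMLParser

class FormFinder(HTMLParser):
    """Collect form actions and button labels from an HTML document."""

    def __init__(self):
        super().__init__()
        self.actions = []
        self.buttons = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self.actions.append(attrs.get("action", ""))
        elif tag == "button" or (tag == "input"
                                 and attrs.get("type") in ("submit", "button")):
            self.buttons.append(attrs.get("value") or attrs.get("name") or "")

def find_forms(html):
    """Return (form actions, button labels) found in the HTML text."""
    finder = FormFinder()
    finder.feed(html)
    return finder.actions, finder.buttons
```

Feeding it the page source (e.g. from `urllib.request.urlopen`) lists candidate endpoints; a "Download CSV" button is often an `<input type="submit">` posting back to the same page.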

Or is there another, faster way to download the tables?


Thanks for any pointers :-)

You can use it to click the button automatically. It isn't hard, but it's a lot of effort for something so trivial. I don't like scraping, but sometimes scraping is all we have, right?

You could use threading. If you only want to download the csv, this answer might help.

It's not that simple, because I can't find a link to download the csv. I've inspected the website, but without success.

Thanks for your reply. I tried using selenium, but after "inspecting" the page I still couldn't find the csv's URL. I hope you can give me more hints...