
How to split a large Excel file into multiple worksheets based on a given IP address using Python

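I'm new to pandas and Python, so I'm having some trouble. I have a large Excel file that I need to split into multiple worksheets with a Python script, dividing the rows according to the IP address given in the data. I don't know how to do this and would appreciate some help and guidance. I haven't used Python or any of its libraries before. This is what I did, but it created a separate sheet for every row: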

import pandas as pd
# only the non-default arguments are needed; several of the ones listed originally
# (parse_cols, squeeze, convert_float, mangle_dupe_cols) no longer exist in current pandas
df = pd.read_excel("D:/Users/Zakir/Desktop/MyNotebooks/Legacy.xls", sheet_name="Total", header=0)

writer = pd.ExcelWriter('D:/Users/Zakir/Desktop/MyNotebooks/pandas_simple.xlsx', engine='xlsxwriter')
# each iteration writes a single row (df.iloc[[index]]) to the sheet named after that row's IP
for index, row in df.iterrows():
    df1 = df.iloc[[index]]
    df1.to_excel(writer, sheet_name=str(row['IPAddress']))
writer.close()
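The loop above writes `df.iloc[[index]]`, a single row, on every iteration, so each sheet ends up holding one row instead of all the rows for that IP. A minimal sketch of the usual fix, assuming the column is named `IPAddress` as in the code above: group the rows first, then write each group to its own sheet. The helper names `split_by_ip` and `write_sheets` are hypothetical.

```python
import pandas as pd


def split_by_ip(df: pd.DataFrame, column: str = "IPAddress") -> dict:
    """Map each distinct IP address to a DataFrame holding all of its rows."""
    # groupby gathers every row with the same IP into one sub-frame,
    # rather than handling rows one at a time as the iterrows() loop does
    return {str(ip): group for ip, group in df.groupby(column)}


def write_sheets(parts: dict, path: str) -> None:
    """Write each sub-frame to its own sheet of a single workbook."""
    # the with-block closes the writer, which saves the workbook to disk
    with pd.ExcelWriter(path) as writer:
        for sheet_name, part in parts.items():
            part.to_excel(writer, sheet_name=sheet_name, index=False)


# Example with the paths from the question (reading .xls needs the xlrd package):
# df = pd.read_excel("D:/Users/Zakir/Desktop/MyNotebooks/Legacy.xls", sheet_name="Total")
# write_sheets(split_by_ip(df), "D:/Users/Zakir/Desktop/MyNotebooks/pandas_simple.xlsx")
```

Separating the grouping from the writing keeps the grouping step easy to check on a small DataFrame before touching any files.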


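import pandas as pd
from pandas import ExcelWriter
df = pd.read_excel('file', sheet_name="Total", header=0)  # add other read options as needed
writer = ExcelWriter('E:/output.xlsx', engine='xlsxwriter')
def writesheet(g):
    # all rows in a group share one IP address, so use the first one as the sheet name
    a = g['IPAddress'].tolist()[0]
    g.to_excel(writer, sheet_name=str(a), index=False)  # index=True if you want to keep the index
# call writesheet once per distinct IPAddress, then save the workbook
df.groupby('IPAddress').apply(writesheet)
writer.close()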
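`str(a)` is used directly as the sheet name. Excel limits sheet names to 31 characters and does not allow `\ / ? * [ ] :`, and XlsxWriter rejects such names, so a value like an IPv6 address (which contains `:`) would fail. A small sketch of a hypothetical `safe_sheet_name` helper that could be applied to `a` first; it does not cover every Excel naming rule (for instance, it does not make truncated names unique).

```python
import re

# Characters Excel does not allow in worksheet names: \ / ? * [ ] :
_INVALID_SHEET_CHARS = re.compile(r"[\\/?*\[\]:]")
_MAX_SHEET_NAME_LEN = 31


def safe_sheet_name(value, replacement: str = "_") -> str:
    """Turn a cell value (e.g. an IP address) into a usable Excel sheet name."""
    # replace each forbidden character, then trim surrounding whitespace
    name = _INVALID_SHEET_CHARS.sub(replacement, str(value)).strip()
    # Excel also rejects empty names, so fall back to a placeholder
    if not name:
        name = "blank"
    # enforce the 31-character limit
    return name[:_MAX_SHEET_NAME_LEN]
```

In `writesheet`, this would mean passing `sheet_name=safe_sheet_name(a)` instead of `sheet_name=str(a)`. Plain IPv4 addresses pass through unchanged.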



import os
import shutil
from os import listdir

import pandas as pd

#this method retrieves all the xlsx filenames from a folder
def find_excel_filenames( path_to_dir, suffix=".xlsx" ):
    filenames = listdir(path_to_dir)
    return [ filename for filename in filenames if filename.endswith( suffix ) ]

#this folder contains .xlsx files
# raw string: in a normal string "\f" is a form feed and a trailing \" escapes the closing quote
filePath = r"D:\files\sample"

#there is a subfolder in my solution to move the processed files to
#and another subfolder to move the splitted output files
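archivePath = os.path.join(filePath, "archive")
outPath = os.path.join(filePath, "output")
# create the subfolders if they do not exist yet
os.makedirs(archivePath, exist_ok=True)
os.makedirs(outPath, exist_ok=True)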

#get a list of filenames
fnames = find_excel_filenames(filePath)

#loop through each file
for fl in fnames:
    vFile = os.path.join(filePath, fl)
    # load the content of the file into a data frame.
    # The file is opened twice: first to get the number of columns and
    # build the converter, then again with a str converter for every column,
    # which helps preserve leading zeros
    df = pd.read_excel(vFile, header=None)

    column_list = []
    for i in df:
        column_list.append(i)

    converter = {col: str for col in column_list}

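    df1 = pd.read_excel(vFile, converters=converter)

    # column to split on; 'IPAddress' is assumed here from the question's data
    vColName = 'IPAddress'
    # distinct values in that column (blank cells skipped)
    colValues = df1[vColName].dropna().unique()

    for v in colValues:
        filteredDF = df1.loc[df1[vColName] == v]
        vOutFile = os.path.join(outPath, fl + '_' + str(v).replace("/", " ") + '.xlsx')
        writer = pd.ExcelWriter(vOutFile, engine='xlsxwriter')
        # Convert the dataframe to an XlsxWriter Excel object.
        filteredDF.to_excel(writer, sheet_name='Sheet1')
        # Close the Pandas Excel writer and output the Excel file.
        writer.close()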

    #move the processed file to an archive folder
    dst_file = os.path.join(archivePath, fl)
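    if os.path.exists(dst_file):
        # remove an older copy first; shutil.move fails if the destination file already exists
        os.remove(dst_file)
    shutil.move(vFile, archivePath)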
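The script above reads each file twice only to build a `str` converter for every column. A shorter sketch of the same per-file step, assuming the split column is `IPAddress`: `pd.read_excel(..., dtype=str)` reads every column as text in a single pass, which serves the same purpose as the converters. `output_path` and `split_file` are hypothetical names, and output files are named `<source stem>_<value>.xlsx`.

```python
import os

import pandas as pd


def output_path(out_dir: str, source_name: str, value) -> str:
    """Build '<out_dir>/<source stem>_<value>.xlsx' for one split value."""
    stem, _ext = os.path.splitext(source_name)
    # '/' cannot appear in a file name, so replace it as the original script does
    safe_value = str(value).replace("/", " ")
    return os.path.join(out_dir, f"{stem}_{safe_value}.xlsx")


def split_file(src: str, out_dir: str, column: str = "IPAddress") -> list:
    """Write one .xlsx file per distinct value of `column`; return the paths written."""
    # dtype=str reads every column as text in one pass (no second read needed)
    df = pd.read_excel(src, dtype=str)
    os.makedirs(out_dir, exist_ok=True)
    written = []
    # groupby skips rows whose key is missing (NaN) by default
    for value, part in df.groupby(column):
        path = output_path(out_dir, os.path.basename(src), value)
        part.to_excel(path, sheet_name="Sheet1", index=False)
        written.append(path)
    return written
```

Inside the original loop, `split_file(vFile, outPath)` would take the place of the two `read_excel` calls and the inner `for v in colValues` loop; archiving the processed file stays the same.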

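This is easier to do with the openpyxl library. Also, you should show what you have tried and the structure of the Excel file, so that we aren't guessing here. Otherwise I could simply say: for i in df['ip_address']: wb.create_sheet(i)

@ycx Thank you for your reply. Yes, sorry, I have edited the question, but I'm not sure whether it is any better. This is my first time using Stack Overflow, so apologies while I get used to it. Thank you very much for your help. I'm still not sure I have explained the situation well.

The manual method in Excel is to sort first, then copy and paste... or copy the file and sort it, then delete the rows you don't need, then repeat...

@SolarMike Thank you for your input! Yes, I have done this task manually with KutoOS, but I need to write a script to do it :)

So consider VBA... you know, a way to make Excel VBA repeat it...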
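The first comment suggests openpyxl with `for i in df['ip_address']: wb.create_sheet(i)`, which would create sheets but not copy any rows into them. A fuller sketch using openpyxl alone, which scripts the "sort, then copy and paste" manual method: group the rows by IP, then append each group to its own sheet. Assumptions: the header is in the first row, the IP column header is `IPAddress`, the source sheet is `Total`, and the function names are hypothetical. openpyxl reads `.xlsx`/`.xlsm` but not legacy `.xls`, so a file like `Legacy.xls` would need converting first.

```python
from openpyxl import Workbook, load_workbook


def split_rows_by_key(header, rows, key_column):
    """Group data rows by the value in key_column, keeping first-seen order."""
    key_index = list(header).index(key_column)
    groups = {}
    for row in rows:
        groups.setdefault(row[key_index], []).append(row)
    return groups


def build_workbook(header, groups) -> Workbook:
    """Create a workbook with one sheet per key, each starting with the header row."""
    wb = Workbook()
    wb.remove(wb.active)  # drop the default empty sheet
    for key, key_rows in groups.items():
        ws = wb.create_sheet(title=str(key))
        ws.append(list(header))
        for row in key_rows:
            ws.append(list(row))
    return wb


def split_workbook(src, dst, sheet="Total", key_column="IPAddress"):
    """Read src, split its rows by key_column, and save the result to dst."""
    # read_only mode streams rows instead of loading the whole sheet at once
    src_wb = load_workbook(src, read_only=True)
    rows = src_wb[sheet].iter_rows(values_only=True)
    header = next(rows)
    # all rows are consumed here, before the source workbook is closed
    groups = split_rows_by_key(header, rows, key_column)
    src_wb.close()
    build_workbook(header, groups).save(dst)
```

Keeping the grouping in plain Python makes it easy to test without any files; only `split_workbook` touches the disk.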