Python: split a dataframe row if it exceeds 64KB


I have a dataframe with two columns, id and name. id is an integer and name is a list.

I want to check whether the UTF-8 length of a row exceeds 64KB. If it does, I want to split that row into N rows so that each new row stays under the 64KB limit.
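
For example, the size check I have in mind for a single row, serialized as JSON, would look something like this (the example row is made up):

import json

row = {"id": 1, "name": ["alice", "bob", "carol"]}
# UTF-8 byte size of the row once serialized to JSON
row_bytes = len(json.dumps(row, ensure_ascii=False).encode("utf-8"))
print(row_bytes > 64000)  # True means the row needs to be split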

This is what I have done so far:

import json
import math
import pandas as pd

def split_data_frame_list(df, target_column):
    """
    Splits a column with lists into rows
    
    Keyword arguments:
        df -- dataframe
        target_column -- name of column that contains lists        
    """
    # create a new dataframe with each item in a separate column, dropping rows with missing values
    col_df = pd.DataFrame(df[target_column].dropna().tolist(),index=df[target_column].dropna().index)

    # create a series with columns stacked as rows         
    stacked = col_df.stack()

    # rename the last index level to 'idx'
    index = stacked.index.rename(names="idx", level=-1)
    new_df = pd.DataFrame(stacked, index=index, columns=[target_column])
    return new_df


df = pd.read_csv(csv_file_name)

df_new=df.groupby(['id']).agg(lambda x: tuple(x)).applymap(list).reset_index()

lStr = int(df['name'].str.encode(encoding='utf-8').str.len().max())
maxlen=64000
mStr = json.dumps(df_new['name'].T.to_dict(), ensure_ascii=False, sort_keys=True).encode('utf-8')
if lStr > maxlen:
    n = int(math.ceil(float(lStr)/maxlen))
    eId=df_new['id'].to_string(index=False)
    print("Splitting row with id=%s of len=%d into %d pieces of upto %d" % (eId, lStr, n, maxlen))
    split_df=split_data_frame_list(df_new, 'name')


The split_data_frame_list function creates one row for each element in my name column.
I'm stuck on how to change the function so that it only splits in a way that each new/split row does not exceed the 64KB limit.
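
To make the goal concrete, here is a minimal sketch of the kind of byte-size-aware splitting I have in mind (chunk_list_by_bytes is a name I made up, and the greedy grouping is just one possible strategy):

import json

def chunk_list_by_bytes(items, max_bytes=64000):
    # Greedily grow a chunk until adding one more item would push its
    # JSON-serialized UTF-8 size past max_bytes, then start a new chunk.
    # Note: a single item that is itself larger than max_bytes would still
    # end up alone in an oversized chunk.
    chunks, current = [], []
    for item in items:
        candidate = current + [item]
        size = len(json.dumps(candidate, ensure_ascii=False).encode("utf-8"))
        if size > max_bytes and current:
            chunks.append(current)
            current = [item]
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks

# Hypothetical usage: expand one (id, name_list) row into several rows,
# each carrying a chunk that serializes to under 64KB.
split_rows = [{"id": 1, "name": chunk}
              for chunk in chunk_list_by_bytes(["a" * 30000, "b" * 30000, "c" * 30000])]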

Any input would be of great help.

Thank you