Faster nested list generation in Python

Tags: python, performance, vectorization

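I have a long pandas DataFrame with two columns. The first column is the prescription number (keep in mind that these are not unique, since multiple rows can share the same prescription number). The second column is one item in that transaction. I want to create a list of items for each transaction number (with duplicates removed) and put each list into a larger nested list whose length equals the number of unique transaction numbers.

I have managed to do this, but it takes a while to run, and I would like to know a better (i.e. faster) way. My code is below: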

import datetime
from sys import stdout

import pandas as pd

# get the unique values for prescription (df is the two-column DataFrame described above)
list_prescription = list(df['prescription'].value_counts().index)

# make a list of product_name for each prescription (this will be time consuming)
time_start = datetime.datetime.now()
counter = 1
list_list_product_name = []
for prescription in list_prescription:
    # subset to just that prescription
    df_subset = df[df['prescription'] == prescription]
    # put product_name into a list
    list_product_name = list(df_subset['product_name'])
    # remove any duplicates, keeping first-seen order
    list_product_name = list(dict.fromkeys(list_product_name))
    # append list_product_name to list_list_product_name
    list_list_product_name.append(list_product_name)
    # get current time
    time_current = datetime.datetime.now()
    # get minutes elapsed from time_start
    time_elapsed = (time_current - time_start).total_seconds()/60
    # print a message to the console for status
    stdout.write('\r{0}/{1}; {2:0.4f}% complete; elapsed time: {3:0.2f} min.'.format(counter, len(list_prescription), (counter/len(list_prescription))*100, time_elapsed))
    stdout.flush()
    # increase counter by 1
    counter += 1
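
To make the goal concrete, here is a minimal sketch of the same loop on a tiny DataFrame. The sample values and the helper name `nested_unique_lists` are made up for illustration; only the column names `prescription` and `product_name` come from the question.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    "prescription": [101, 101, 102, 101, 103, 102],
    "product_name": ["aspirin", "ibuprofen", "aspirin", "aspirin", "zinc", "aspirin"],
})


def nested_unique_lists(frame):
    """Loop-based version: one de-duplicated product list per prescription."""
    result = []
    # value_counts() orders prescriptions from most to least frequent
    for prescription in frame["prescription"].value_counts().index:
        # a full boolean scan of the frame for every prescription (the slow part)
        products = frame.loc[frame["prescription"] == prescription, "product_name"]
        # dict.fromkeys drops duplicates while keeping first-seen order
        result.append(list(dict.fromkeys(products)))
    return result


nested = nested_unique_lists(df)
print(nested)  # [['aspirin', 'ibuprofen'], ['aspirin'], ['zinc']]
```

Each pass through the loop compares the whole `prescription` column against one value, so the total work grows with the number of rows times the number of unique prescriptions. That is likely why the run time becomes noticeable on a long DataFrame.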

You can replace this part

# put product_name into a list
list_product_name = list(df_subset['product_name'])    
# remove any duplicates
list_product_name = list(dict.fromkeys(list_product_name))
# append list_product_name to list_list_product_name
list_list_product_name.append(list_product_name)

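with the single line shown at the bottom of this page, which calls .unique().tolist() directly on the column.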

Also, you might want to check

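I can't run the code right now, but I think

new_df = df.groupby('transaction').agg(lambda x: list(x)).reset_index()

will give you a new DataFrame with one row per transaction and a list of prescriptions in the second column.

@Aryerez Thank you so much. In the end I used your code with a small modification based on @francesols's suggestion:

new_df = df.groupby('prescription').agg(lambda x: x.unique().tolist()).reset_index()

It works very well! Thank you.

I used part of your code together with the suggested groupby function, new_df = df.groupby('prescription').agg(lambda x: x.unique().tolist()).reset_index()

Replacement line for the loop body in the answer above:

list_list_product_name.append(df_subset['product_name'].unique().tolist())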
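
The groupby suggestion from the comments can also produce the nested list directly, without the outer loop. The following is a minimal sketch under assumptions: the sample data is made up, and `sort=False` is a choice made here to keep prescriptions in order of first appearance. It is not something the original posters specified.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    "prescription": [101, 101, 102, 101, 103, 102],
    "product_name": ["aspirin", "ibuprofen", "aspirin", "aspirin", "zinc", "aspirin"],
})

# Group once, then de-duplicate within each group.
# sort=False keeps prescriptions in order of first appearance (an assumption here).
unique_per_rx = df.groupby("prescription", sort=False)["product_name"].unique()

# Each value is a NumPy array; convert to plain lists for the nested result.
list_list_product_name = [products.tolist() for products in unique_per_rx]
print(list_list_product_name)  # [['aspirin', 'ibuprofen'], ['aspirin'], ['zinc']]
```

Note that the order of the outer list can differ from the original loop, which iterates in `value_counts()` order (most frequent prescription first), while a default `groupby` sorts by key. If the outer order matters, the grouped result could be reindexed accordingly.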