Python: create a new column of lists in a DataFrame using the unique values from another column

python pandas dataframe

I have a DataFrame with a column of lists:

    full_list_to_check
 0          NaN 
 1          NaN 
 2    [1, 2, 3, 4, 5] 
 3        [6, 6] 
 4        [11, 11]

I need to create a new column that holds, for each row, the list of distinct values if the list contains duplicates, and the same list otherwise:

  full_list_to_check            new_col
 0          NaN                   NaN
 1          NaN                   NaN
 2    [1, 2, 3, 4, 5]           [1, 2, 3, 4, 5]
 3        [6, 6]                  [6]
 4        [11, 11]                [11]

I tried this:

df['new_col'] = df['full_list_to_check'].apply(lambda x: list(set(x)))

But I got this error:

TypeError: 'float' object is not iterable

You have to check for NaN:

df['full_list_to_check'].apply(lambda x: list(set(x)) if not np.any(pd.isna(x)) else np.nan)

Update:

df['full_list_to_check'].apply(lambda x: list(set(x)) if x is not np.nan else np.nan)
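
For reference, here is a minimal, self-contained sketch of the first variant applied to the sample data from the question. The helper name dedupe_or_nan is made up for illustration and is not part of the original answer.

```python
import numpy as np
import pandas as pd

# Sample data from the question: NaN rows mixed with list rows.
df = pd.DataFrame(
    {"full_list_to_check": [np.nan, np.nan, [1, 2, 3, 4, 5], [6, 6], [11, 11]]}
)

def dedupe_or_nan(x):
    # Hypothetical helper name. pd.isna gives an element-wise result for a
    # list, so np.any collapses it to one boolean; for a scalar NaN it is True.
    if np.any(pd.isna(x)):
        return np.nan
    return list(set(x))

df["new_col"] = df["full_list_to_check"].apply(dedupe_or_nan)
print(df)
```

Two caveats worth keeping in mind. The identity test in the update (x is not np.nan) only recognises the np.nan object itself, so a missing value created some other way, such as float('nan'), would slip through. Also, set does not guarantee element order; if the original order matters, list(dict.fromkeys(x)) is an order-preserving alternative.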

You can try the following:

df['new_col'] = df.loc[~df['full_list_to_check'].isna(), 'full_list_to_check'].apply(lambda x: list(set(x)))
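
This works because pandas aligns on the index when a Series is assigned to a column: the filtered result only has the labels of the non-NaN rows, and the remaining rows are filled with NaN. A small sketch of that behaviour, using the question's sample data (the variable names mask and deduped are only for illustration):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"full_list_to_check": [np.nan, np.nan, [1, 2, 3, 4, 5], [6, 6], [11, 11]]}
)

# Keep only the rows that actually hold a list.
mask = df["full_list_to_check"].notna()
deduped = df.loc[mask, "full_list_to_check"].apply(lambda x: list(set(x)))

# deduped carries only the index labels 2, 3 and 4; on assignment pandas
# aligns by label, so rows 0 and 1 end up as NaN in the new column.
df["new_col"] = deduped
print(df)
```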

You can use:

df['new_col'] = df['full_list_to_check'].apply(lambda x: list(set(x)) if isinstance(x,list) else x)

The other answers only work if the data contains no values other than lists and NaN.
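
To illustrate the difference, here is a sketch with a column that also holds a string and an integer. These extra values are assumptions added for the demonstration and do not appear in the question's data.

```python
import numpy as np
import pandas as pd

# Assumed mixed data: NaN, a string, an integer, and two lists.
df = pd.DataFrame({"full_list_to_check": [np.nan, "abc", 7, [6, 6], [11, 11]]})

# Only real lists are de-duplicated; everything else is passed through as-is.
df["new_col"] = df["full_list_to_check"].apply(
    lambda x: list(set(x)) if isinstance(x, list) else x
)
print(df)
```

With a NaN-only check, the string would be split into a list of characters and the integer would raise a TypeError, since int is not iterable.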

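Replace the NaN values with an empty string: dataframe.fillna("", inplace=True).

Why if not np.any(pd.isna(x)) rather than if x is not np.nan?

@deadvoid pd.isna(x) returns a list of booleans when x is a list.

@Mikhail Genuinely curious: if the purpose is to filter out NaN, why is [True, True, False, ...] needed? I mean, in this case.

If the goal is to make it more general, so that it works for either NaN or a list... I guess I just wanted to cover it.

@DimaFirst Ah, OK. I thought there might be some reason I didn't know about, like performance or something :) But it's all good, I was just honestly curious why.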
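
The point about pd.isna in this exchange can be checked directly. The sketch below shows what it returns for a scalar and for a list, and why the element-wise result cannot be used in an if statement without np.any; the variable names are only for illustration.

```python
import numpy as np
import pandas as pd

scalar_result = pd.isna(np.nan)         # a single boolean for a scalar
list_result = pd.isna([1, np.nan, 3])   # element-wise boolean array for a list

# A multi-element boolean array has no single truth value, so using it
# directly in a condition raises ValueError.
try:
    bool(list_result)
    ambiguous = False
except ValueError:
    ambiguous = True

# np.any reduces the array to one boolean that is safe to use in a condition.
has_missing = bool(np.any(list_result))
print(scalar_result, list_result, ambiguous, has_missing)
```

One side effect of the np.any form: a list that merely contains a NaN is treated as missing and replaced by NaN as a whole.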
