Get unique values from a list stored as a value in Python (pandas DataFrame)

python pandas dataframe

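I have a DataFrame with many more rows and columns, but a sample looks like this:

id    values
1     [v1, v2, v1]

How can I get the unique values from the list in each cell of the column? The desired output in the second column is v1, v2. I tried df['values'].unique(), but it does not work.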
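For context on why the attempt in the question fails: Series.unique() has to hash every cell, and Python lists are not hashable. A minimal sketch, assuming the one-row sample frame from the question:

```python
import pandas as pd

# Sample frame from the question: each cell of 'values' holds a list
df = pd.DataFrame({'id': [1], 'values': [['v1', 'v2', 'v1']]})

try:
    df['values'].unique()  # tries to hash each cell
    error = None
except TypeError as exc:
    error = exc  # lists are unhashable, so hashing the cells fails

print(type(error).__name__)
```

So the deduplication has to happen inside each cell, row by row, rather than across the column.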

Try:

df['values'] = df['values'].apply(lambda x: list(set(x)))


    id  values
0   1   [v2, v1]
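Note that the output above comes back as [v2, v1], because a set does not keep the original order. If order matters, one possible variant (not from the answer itself) is dict.fromkeys, which keeps the first occurrence of each item:

```python
import pandas as pd

df = pd.DataFrame({'id': [1], 'values': [['v1', 'v2', 'v1']]})

# dict keys are unique and keep insertion order (Python 3.7+)
ordered = df['values'].apply(lambda x: list(dict.fromkeys(x)))

print(ordered.iloc[0])  # ['v1', 'v2']
```

Like set, this only works when the list items are hashable.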

Note: values is a pandas attribute (DataFrame.values), so it is best to avoid using it as a column name.
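A small sketch of what that clash looks like in practice. The replacement column name vals is only an illustrative choice:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'id': [1], 'values': [['v1', 'v2', 'v1']]})

attr = df.values    # the DataFrame.values attribute: the whole frame as an ndarray
col = df['values']  # bracket access still returns the column

print(type(attr).__name__, attr.shape)
print(type(col).__name__)

# With a non-clashing (hypothetical) name, attribute access reaches the column
renamed = df.rename(columns={'values': 'vals'})
print(renamed.vals.iloc[0])
```

Bracket access always works, so the column name is not an error, only a source of confusion with dot access.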

Timing comparison:

import numpy as np
import pandas as pd

df = pd.DataFrame({'id': [1] * 1000, 'values': [['v1', 'v2', 'v1']] * 1000})
%timeit df['values'].agg(np.unique)

34.7 ms ± 2.01 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)


%timeit df['values'].apply(lambda x: list(set(x)))

1.98 ms ± 259 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
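The %timeit lines only run in IPython or Jupyter. Below is a rough way to repeat the comparison in a plain script with the standard timeit module. It uses apply(np.unique) instead of agg(np.unique), on the assumption that both do the same per-row work here. The helper names are made up for the sketch, and the numbers will differ by machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

df = pd.DataFrame({'id': [1] * 1000, 'values': [['v1', 'v2', 'v1']] * 1000})


def with_np_unique():
    # np.unique is called once per row and returns a sorted array
    return df['values'].apply(np.unique)


def with_set():
    # plain Python set per row, converted back to a list
    return df['values'].apply(lambda x: list(set(x)))


for fn in (with_np_unique, with_set):
    # best of 3 repeats, 10 calls each, reported per call
    best = min(timeit.repeat(fn, number=10, repeat=3)) / 10
    print(f"{fn.__name__}: {best * 1e3:.2f} ms per call")
```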

A simple solution is to agg with pd.unique, i.e.

df = pd.DataFrame({'x' : [['v','w','x','v','x']]})

df['x'].agg(pd.unique) # Also np.unique

0    [v, w, x]
Name: x, dtype: object
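One difference between the two functions that the sample above hides, because its first occurrences are already in alphabetical order: pd.unique is documented to return values in order of appearance, while np.unique returns them sorted. A small sketch of the np.unique side, with the input reordered for illustration and tolist() added to get plain Python lists back:

```python
import numpy as np
import pandas as pd

# Same items as the sample above, but starting with 'x'
df = pd.DataFrame({'x': [['x', 'v', 'w', 'v', 'x']]})

# np.unique sorts; tolist() converts the array back to a Python list
sorted_unique = df['x'].apply(lambda row: np.unique(row).tolist())

print(sorted_unique.iloc[0])  # ['v', 'w', 'x']
```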

Or map set over the underlying values, as timed below with list(map(set, df['values'].values)).

Timings again:

%timeit df['values'].agg(np.unique)
The slowest run took 6.78 times longer than the fastest. This could mean that an intermediate result is being cached.
100 loops, best of 3: 6.99 ms per loop
%timeit list(map(set,df['values'].values))
The slowest run took 55.36 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 228 µs per loop
%timeit df['values'].apply(lambda x: list(set(x)))
1000 loops, best of 3: 743 µs per loop
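The map version in the timings builds a plain Python list and is never assigned back to the frame. A minimal sketch of how that could be done, assuming a two-row frame and a hypothetical column name unique_values:

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 2],
                   'values': [['v1', 'v2', 'v1'], ['v3', 'v3']]})

# One set per row, built outside pandas
unique_sets = list(map(set, df['values'].values))

# Align on the original index when storing the result
df['unique_values'] = pd.Series(unique_sets, index=df.index)

print(df['unique_values'].iloc[0] == {'v1', 'v2'})  # True
```

The cells then hold sets rather than lists. Use map(lambda x: list(set(x)), ...) if lists are needed, at a small extra cost.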

Try timing tests :)

apply is faster than agg, no argument there, and pd.unique is much slower than that map.

map is nice, very Pythonic. I always forget to use it in the context of DataFrames. Maybe next time :)