Python pandas: filter for values with more than one unique type and join the unique types (Python, Pandas) - Fatal编程技术网


python pandas


I have a DataFrame:

import pandas as pd

df2 = pd.DataFrame({'c':[1,1,1,2,2,2,2,3],
                    'type':['m','n','o','m','m','n','n', 'p']})

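I want to find which values of c have more than one unique type and, for those values, return the c value, the number of unique types, and all of the unique types concatenated into one string.

So far I have put together an approach based on two other questions.

It works, but I cannot get unique values in the joined string (for example, in the second row I want only one m and only one n). My questions are:

Can I skip the intermediate step of creating "Unique counts", or at least make it something temporary?

How can I keep only the unique values in the second step?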

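One solution removes the single-type groups first and then aggregates: create a helper Series s holding the number of unique types in each row's group, and build the joined string from a set of the values. A set is unordered, so the order inside the joined string can vary: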

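# Number of unique types in each row's group, aligned to df2's index
s = df2.groupby('c')['type'].transform('nunique').rename('Unique counts')
# Keep rows from groups with more than one type, then join each group's unique types
a = df2[s > 1].groupby(['c', s]).agg(lambda x: '-'.join(set(x)))
print(a)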

                  type
c Unique counts       
1 3              o-m-n
2 2                m-n

Another idea is to remove the duplicate rows first, giving df3, and then aggregate, using size for the count and join for the string:

a = df3.groupby('c')['type'].agg([('Unique Counts', 'size'), ('Type', '-'.join)])
print (a)
   Unique Counts   Type
c                      
1              3  m-n-o
2              2    m-n
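
The code that builds df3 is not shown above. As a rough sketch, and purely as an assumption about what that step could look like, dropping duplicate (c, type) pairs and then discarding the c values left with a single row gives the same table. The aggregation is written here with named aggregation via dictionary unpacking rather than the tuple list, which should be equivalent:

```python
import pandas as pd

df2 = pd.DataFrame({'c': [1, 1, 1, 2, 2, 2, 2, 3],
                    'type': ['m', 'n', 'o', 'm', 'm', 'n', 'n', 'p']})

# Assumed reconstruction of df3: one row per distinct (c, type) pair
deduped = df2.drop_duplicates(['c', 'type'])
# Keep only the c values that still have more than one row (more than one type)
df3 = deduped[deduped['c'].duplicated(keep=False)]

# After de-duplication, the group size equals the number of unique types
a = df3.groupby('c')['type'].agg(**{'Unique Counts': 'size', 'Type': '-'.join})
print(a)
```

Because the duplicates are removed before grouping, join sees each type once and in order of first appearance, so no set is needed.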

Or, if you need to aggregate all values first:

df4 = df2.groupby('c')['type'].agg([('Unique Counts', 'nunique'), 
                                  ('Type', lambda x: '-'.join(set(x)))])
print (df4)
   Unique Counts   Type
c                      
1              3  o-m-n
2              2    m-n
3              1      p

Finally, remove the groups that have only one unique type by filtering on Unique Counts.
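
The filtering code itself is not shown. A minimal sketch, assuming a plain boolean mask on the Unique Counts column is what is intended; sorted is swapped in for the bare set only so that the string order is reproducible, and the name multi is illustrative:

```python
import pandas as pd

df2 = pd.DataFrame({'c': [1, 1, 1, 2, 2, 2, 2, 3],
                    'type': ['m', 'n', 'o', 'm', 'm', 'n', 'n', 'p']})

# Aggregate every group first: unique count plus joined unique types
df4 = df2.groupby('c')['type'].agg(
    **{'Unique Counts': 'nunique',
       'Type': lambda x: '-'.join(sorted(set(x)))})

# Boolean mask: keep only the groups with more than one unique type
multi = df4[df4['Unique Counts'] > 1]
print(multi)
```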


Aggregate by passing (column name, function) tuples.
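Use groupby with agg, then filter on the Unique counts column as needed:
import numpy as np

# np.unique returns each group's unique values in sorted order
df2 = (df2.groupby('c', as_index=False)
          .agg({'type': ['nunique', lambda x: '-'.join(np.unique(x))]}))
df2.columns = ['c','Unique counts','type']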

print(df2)
   c  Unique counts   type
0  1              3  m-n-o
1  2              2    m-n
2  3              1      p

Filter on Unique counts:
df2 = df2.loc[df2['Unique counts']>1,:]

print(df2)
   c  Unique counts   type
0  1              3  m-n-o
1  2              2    m-n
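
Since the snippet above overwrites df2, the same result can also be written as one chain that leaves the input untouched and needs no intermediate variable. A sketch, assuming a pandas version that supports named aggregation (0.25 or later); the output column names n_types and types are hypothetical:

```python
import pandas as pd

df2 = pd.DataFrame({'c': [1, 1, 1, 2, 2, 2, 2, 3],
                    'type': ['m', 'n', 'o', 'm', 'm', 'n', 'n', 'p']})

result = (
    df2.groupby('c')['type']
       # One output column per keyword: unique count and joined unique types
       .agg(n_types='nunique', types=lambda x: '-'.join(sorted(set(x))))
       # A callable in .loc filters the intermediate frame without naming it
       .loc[lambda d: d['n_types'] > 1]
       .reset_index()
)
print(result)
```

Passing a callable to .loc is what removes the need for a temporary name: it receives the aggregated frame and returns the boolean mask.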

Does the order in the output matter?

Not really. I just want the unique values for each group.

Thank you very much, this works! I am new to Python, coming from R, so I have two questions: is it possible to skip the intermediate assignment to s, and how does set work?

@User2321 - I think so, if you switch to another solution.

@User2321 - a set is an unordered collection of items.
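
To make the point about set concrete: a set drops duplicates but promises no order, which is why a joined string can come out as o-m-n rather than m-n-o. A small sketch in plain Python of two order-stable alternatives (the variable names are illustrative):

```python
values = ['m', 'm', 'n', 'n']

# Duplicates removed; iteration order is not guaranteed
unique_unordered = set(values)

# Alphabetical order, independent of input order
unique_sorted = sorted(set(values))

# Order of first appearance (dict keys preserve insertion order in Python 3.7+)
unique_first_seen = list(dict.fromkeys(['o', 'm', 'o', 'n']))

print('-'.join(unique_sorted))      # m-n
print('-'.join(unique_first_seen))  # o-m-n
```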
