Python 3.x: Finding unique values within a DataFrame cell

Tags: python-3.x, pandas, unique

Sample DF

import pandas as pd

data = {'name': ['Jason , Jason', 'Molly', 'Tina', 'Jake', 'Amy'], 
        'year': ['2012 , 2012 , 2016 , 2016', 2012, 2013, 2014, 2014], 
        'reports': ['4 , 4 , 5 , 6 , 6 , 7', 24, 31, 2, 3]}
df1 = pd.DataFrame(data, index = ['Cochice', 'Pima', 'Santa Cruz', 'Maricopa', 'Yuma'])

It looks like this:

                     name            ...                                   year
Cochice     Jason , Jason            ...              2012 , 2012 , 2016 , 2016
Pima                Molly            ...                                   2012
Santa Cruz           Tina            ...                                   2013
Maricopa             Jake            ...                                   2014
Yuma                  Amy            ...                                   2014

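I want every cell in the row indexed Cochice to contain only unique values. I tried drop_duplicates and nunique, but neither of them worked.

In my original df, there can be more than 3 columns.

Output DF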

             name  reports       year
Cochice     Jason  4,5,6,7  2012,2016
Pima        Molly       24       2012
Santa Cruz   Tina       31       2013
Maricopa     Jake        2       2014
Yuma          Amy        3       2014

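I'm not aware of any built-in pandas function that does this, so I came up with a solution that uses applymap with a custom function: it splits each cell on commas, strips the whitespace, and joins the unique elements back into a single string. It isn't pretty and it isn't very efficient, but it should work: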

In [15]: df1.applymap(lambda x: x if ',' not in str(x) else ','.join(sorted(set(y.strip() for y in x.split(',')))))
Out[15]: 
             name  reports       year
Cochice     Jason  4,5,6,7  2012,2016
Pima        Molly       24       2012
Santa Cruz   Tina       31       2013
Maricopa     Jake        2       2014
Yuma          Amy        3       2014
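
The one-liner above can also be written as a named function, which may be easier to read and test. This is only a sketch under a few assumptions: the helper name dedupe_cell and its sep parameter are made up for illustration, it keeps the first-seen order of the tokens instead of sorting them (sorted on strings is lexicographic, so '10' would sort before '4'), and it uses DataFrame.map when available because applymap is deprecated as of pandas 2.1.

```python
import pandas as pd


def dedupe_cell(value, sep=","):
    """Hypothetical helper: drop repeated sep-separated tokens in one cell."""
    # Leave numbers and single-valued strings untouched.
    if not isinstance(value, str) or sep not in value:
        return value
    tokens = [token.strip() for token in value.split(sep)]
    # dict.fromkeys removes duplicates while keeping first-seen order.
    return sep.join(dict.fromkeys(tokens))


data = {'name': ['Jason , Jason', 'Molly', 'Tina', 'Jake', 'Amy'],
        'year': ['2012 , 2012 , 2016 , 2016', 2012, 2013, 2014, 2014],
        'reports': ['4 , 4 , 5 , 6 , 6 , 7', 24, 31, 2, 3]}
df1 = pd.DataFrame(data, index=['Cochice', 'Pima', 'Santa Cruz', 'Maricopa', 'Yuma'])

# DataFrame.map is the newer name for element-wise application;
# fall back to applymap on older pandas versions.
elementwise = df1.map if hasattr(pd.DataFrame, "map") else df1.applymap
result = elementwise(dedupe_cell)
print(result)
```

Because the function checks isinstance(value, str) rather than calling str(x), integer cells such as 24 pass through unchanged and are never split.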

Edit, to show applying it to one specific index only instead of all rows:

In [24]: df1.loc[['Cochice']].applymap(lambda x: x if ',' not in str(x) else ','.join(sorted(set(y.strip() for y in x.split(',')))))
Out[24]: 
          name  reports       year
Cochice  Jason  4,5,6,7  2012,2016
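
The call above returns a new one-row frame and leaves df1 itself unchanged. If the goal is to clean that row while keeping the rest of the frame, one possible approach is to map over the row as a Series and assign it back. This is a sketch, not part of the original answer: the helper name unique_tokens is hypothetical, and it keeps first-seen order rather than sorting.

```python
import pandas as pd


def unique_tokens(value):
    """Hypothetical helper: remove repeated comma-separated tokens from a string."""
    if not isinstance(value, str):
        return value  # numeric cells are returned as they are
    parts = (part.strip() for part in value.split(","))
    return ",".join(dict.fromkeys(parts))


data = {'name': ['Jason , Jason', 'Molly', 'Tina', 'Jake', 'Amy'],
        'year': ['2012 , 2012 , 2016 , 2016', 2012, 2013, 2014, 2014],
        'reports': ['4 , 4 , 5 , 6 , 6 , 7', 24, 31, 2, 3]}
df1 = pd.DataFrame(data, index=['Cochice', 'Pima', 'Santa Cruz', 'Maricopa', 'Yuma'])

# Work on a copy so the original frame stays available for comparison.
cleaned = df1.copy()
# df.loc[label] gives the row as a Series indexed by column name;
# Series.map applies the helper to every cell in that row.
cleaned.loc['Cochice'] = cleaned.loc['Cochice'].map(unique_tokens)
print(cleaned)
```

Only the Cochice row is rewritten; every other row keeps its original values.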


Comments:

Does your real data have spaces before the commas (as in the 'Jason , Jason' entry), or is that just a typo?

Actually, I have spaces in all of the values. Let me update the question.

This works. Just one more query: can this be applied to one specific index value rather than the whole df? In the case above, Cochice.

Thanks!