First five non-numeric, non-null, distinct values of a pandas column


How can I get the first five non-numeric, non-null, distinct values from a column?

For example, given the following table:

col1 
=====
n1 
1        
2        
n2
n3
n3
n4
n5
n5
n6
None

I would like to get:

 col1 
=====
n1       
n2
n3
n4
n5

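You can use `pd.to_numeric` with `errors='coerce'` to turn every non-numeric value into `NaN`, then invert that mask and select the first 5 unique values: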

In [9]:
df.loc[df.index.difference(pd.to_numeric(df['col1'], errors='coerce').dropna().index),'col1'].unique()[:5]

Out[9]:
array(['n1', 'n2', 'n3', 'n4', 'n5'], dtype=object)
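The same idea can also be written with a boolean mask instead of `index.difference`. This is only a minimal sketch: the sample frame below is an assumption (digits stored as strings and a real `None` in the last row), and the names `numeric`, `mask` and `first_five` are illustrative.

```python
import pandas as pd

# Assumed sample data: digits stored as strings, last row is a real None.
df = pd.DataFrame({'col1': ['n1', '1', '2', 'n2', 'n3', 'n3',
                            'n4', 'n5', 'n5', 'n6', None]})

# Values that cannot be parsed as numbers become NaN.
numeric = pd.to_numeric(df['col1'], errors='coerce')

# Keep rows that are not numeric and are not missing in the original column.
mask = numeric.isna() & df['col1'].notna()

# unique() preserves order of first appearance, so slicing gives the first five.
first_five = df.loc[mask, 'col1'].unique()[:5]
print(first_five)
```

Filtering on `df['col1'].notna()` explicitly drops missing values rather than relying on them appearing after the fifth distinct string.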


You can proceed step by step, starting from this sample data:

df = pd.DataFrame({'col1':['n1', '1', '2', 'n2', 'n3', 'n3', 'n4', 'n5', 'n5', 'n6','None']})

Replace the strings `NaN` and `None` using `replace`.
Remove the numeric values.
Remove the duplicates.
Take the first 5 values.
If necessary, reset the index so that it is monotonically increasing.
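The steps above could be sketched roughly as follows. Apart from `replace`, the method names are not given in the text, so `pd.to_numeric`, `dropna`, `drop_duplicates`, `head` and `reset_index` are assumptions about one reasonable way to carry out each step, and `s` and `result` are illustrative names.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': ['n1', '1', '2', 'n2', 'n3', 'n3',
                            'n4', 'n5', 'n5', 'n6', 'None']})

# Step 1: turn the placeholder strings 'NaN' and 'None' into real missing values.
s = df['col1'].replace(['NaN', 'None'], np.nan)

# Step 2: keep only rows that cannot be parsed as numbers, then drop missing values.
s = s[pd.to_numeric(s, errors='coerce').isna()].dropna()

# Steps 3-5: drop duplicates, take the first five, and rebuild a 0..4 index.
result = s.drop_duplicates().head(5).reset_index(drop=True)
print(result)
```

Unlike `unique()`, which returns an array, `drop_duplicates()` keeps the result as a Series, which is why the final `reset_index(drop=True)` is needed for a clean index.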

Another possible solution:

df = pd.Series(df.loc[pd.to_numeric(df.col1
                        .replace({'None':1, 'NaN':1}), errors='coerce').isnull(), 'col1']
      .unique()[:5])
print (df)
0    n1
1    n2
2    n3
3    n4
4    n5
dtype: object

But if the column holds mixed values, with numbers stored alongside strings:

df = pd.DataFrame({'col1':['n1', 1, 1, 'n2', 'n3', 'n3', 'n4', 'n5', 'n5', 'n6', None]})

df = pd.Series(df.loc[df.col1.apply(lambda x: isinstance(x, str)), 'col1']
       .unique()[:5])

print (df)
0    n1
1    n2
2    n3
3    n4
4    n5
dtype: object


Loop and use a regular expression?
Both answers are correct and good, but I find this one much easier to get my head around.