Python: finding the positions of string elements in a DataFrame

python pandas

I have a pandas DataFrame that I suspect contains some strings.

>>> d2
   1     2     3     4     5     6     7     8     9     10    ...   1771  \
0     0     0     0     0     0     0     0     0     0     0  ...      0   
1     0     0     0     0     0     0     0     0     0     0  ...      0   
2     0     0     0     0     0     0     0     0     0     0  ...      0   
3     0     0     0     0     0     0     0     0     0     0  ...      0   
4     0     0     0     0     0     0     0     0     0     0  ...      0   
5     0     0     0     0     0     0     0     0     0     0  ...      0   
6     0     0     0     0     0     0     0     0     0     0  ...      0   
7     0     0     0     0     0     0     0     0     0     0  ...      0   
8     0     0     0     0     0     0     0     0     0     0  ...      0   
9     0     0     0     0     0     0     0     0     0     0  ...      0   

   1772  1773  1774  1775  1776  1777  1778  1779  1780  
0     0     0     0     0     0     0     1   398     2  
1     0     0     0     0     0     0     1   398     2  
2     0     0     0     0     0     0     1   398     2  
3     0     0     0     0     0     0     1   398     2  
4     0     0     0     0     0     0     1   398     2  
5     0     0     0     0     0     0     1   398     2  
6     0     0     0     0     0     0     1   398     2  
7     0     0     0     0     0     0     1   398     2  
8     0     0     0     0     0     0     1   398     2  
9     0     0     0     0     0     0     1   398     2  

[10 rows x 1780 columns]
>>> any(d2.applymap(lambda x: type(x) == str))
True
>>>

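I want to find out which elements are strings, so that I can drop the columns that contain them.

How can I do this?

I am getting a strange result. Every column appears to have dtype int or float, yet at the same time some elements appear to be strings. How is that possible?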

>>> d2.dtypes.drop_duplicates()
1         int64
1755    float64
dtype: object
>>> any(d2.applymap(lambda x: type(x) == str))
True

Use a list comprehension to check the dtype of each column, then exclude the object columns:

df[[col for col in df if df[col].dtype != 'O']]  # 'O' is letter O (not zero)

I am not sure I understand your comment below, so I will explain further with a simple example:

d2 = pd.DataFrame({'a': [1, 2], 'b': ['a', 1], 'c': [2, 3]})

>>> d2
   a  b  c
0  1  a  2
1  2  1  3

>>> d2.applymap(lambda x: type(x))
                      a             b                     c
0  <type 'numpy.int64'>  <type 'str'>  <type 'numpy.int64'>
1  <type 'numpy.int64'>  <type 'int'>  <type 'numpy.int64'>

>>> d2.applymap(lambda x: type(x) == str)
       a      b      c
0  False   True  False
1  False  False  False

Test the dtype of each column:

>>> [d2[col].dtype for col in d2]
[dtype('int64'), dtype('O'), dtype('int64')]

The solution clearly works:

>>> d2[[col for col in d2 if d2[col].dtype != 'O']]
   a  c
0  1  2
1  2  3

List all columns of type "object":

>>> [col for col in d2 if d2[col].dtype == 'O']
['b']

My point is that you are getting a false positive because of the method you used.

Here is what I would do:

To select all columns that may contain text, you can use:

df.select_dtypes(include=['object']).columns

Or:

df.select_dtypes(exclude=['number']).columns
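As a rough illustration of the two select_dtypes calls above, here is a small sketch that reuses the three-column example frame from the earlier answer. The variable names object_cols, non_numeric_cols and numeric_only are made up for this example. Note that an object dtype only means the column may hold text; it can also hold other Python objects.

```python
import pandas as pd

# Example frame from the earlier answer: column 'b' mixes a string and an int,
# so pandas stores it with the generic object dtype.
d2 = pd.DataFrame({'a': [1, 2], 'b': ['a', 1], 'c': [2, 3]})

# Columns stored as object (candidates for containing text).
object_cols = list(d2.select_dtypes(include=['object']).columns)

# Columns that are not numeric; here this is the same set.
non_numeric_cols = list(d2.select_dtypes(exclude=['number']).columns)

# Keep only the numeric columns, which drops the suspect column.
numeric_only = d2.select_dtypes(include=['number'])

print(object_cols)
print(non_numeric_cols)
print(list(numeric_only.columns))
```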

To check whether any cell in the DataFrame is text, use:

df.applymap(lambda x: isinstance(x, str)).any().any()

Or drop the final .any() to see, for each column, whether it contains any text:

df.applymap(lambda x: isinstance(x, str)).any()
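Since the question asks where the strings are, the boolean mask can also be turned into explicit (row, column) positions. The following is only a sketch under a few assumptions: it reuses the example frame from the earlier answer, the names mask, locations and cleaned are hypothetical, and it builds the mask with apply plus Series.map instead of applymap so that it does not depend on the pandas version.

```python
import numpy as np
import pandas as pd

d2 = pd.DataFrame({'a': [1, 2], 'b': ['a', 1], 'c': [2, 3]})

# Cell-wise mask: True wherever the stored Python object is a str.
mask = d2.apply(lambda col: col.map(lambda x: isinstance(x, str)))

# Positional indices of the True cells, translated back to labels.
row_pos, col_pos = np.where(mask.to_numpy(dtype=bool))
locations = [(mask.index[r], mask.columns[c]) for r, c in zip(row_pos, col_pos)]

# Drop every column that contains at least one string.
cleaned = d2.loc[:, ~mask.any()]

print(locations)
print(list(cleaned.columns))
```

Here locations lists the label pair of each string cell, and cleaned keeps only the columns without any string.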

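Calling any(your_dataframe), with the DataFrame itself as the argument, gives you a false positive: Python's built-in any() iterates over the column labels, not over the cell values.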
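To make the false positive concrete, here is a minimal sketch. The two-column frame df and the names is_str, builtin_result and cellwise_result are invented for illustration. apply with Series.map stands in for applymap, which newer pandas releases deprecate in favour of DataFrame.map.

```python
import pandas as pd

# A frame with integer column labels and no strings at all.
df = pd.DataFrame({1: [0, 0], 2: [0, 0]})

# Cell-wise check: every entry is False because no cell is a str.
is_str = df.apply(lambda col: col.map(lambda x: isinstance(x, str)))

# Iterating over a DataFrame yields its column labels, here 1 and 2.
labels_seen = list(is_str)

# Built-in any() therefore tests the labels; non-zero labels are truthy.
builtin_result = any(is_str)

# The DataFrame method reduces over the actual cell values instead.
cellwise_result = bool(is_str.any().any())

print(labels_seen, builtin_result, cellwise_result)
```

With non-zero integer labels such as 1 to 1780 in the question, the built-in call reports True regardless of what the cells contain.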

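>>> [col for col in d2 if d2[col].dtype == 'O']
[]

With this I get all the columns back, since none of them has the object dtype, whereas I believe there are some strings in the data. (I have updated the question.)

Apparently none of the cells is a string. Incidentally, it is odd that the method I used gave a different result.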