How to check the data type of a column in Python (pandas)

I need to use different functions to treat numeric columns and string columns. What I am doing now is really dumb:

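import numpy as np

# np.int was only an alias for the built-in int and was removed in NumPy 1.24; np.int64 is used instead
allc = list((agg.loc[:, (agg.dtypes==np.float64)|(agg.dtypes==np.int64)]).columns)
for y in allc:
    treat_numeric(agg[y])

allc = list((agg.loc[:, (agg.dtypes!=np.float64)&(agg.dtypes!=np.int64)]).columns)
for y in allc:
    treat_str(agg[y])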

Is there a more elegant way? For example:

# pseudocode: dtype() here is a wished-for helper, not a real function
for y in agg.columns:
    if(dtype(agg[y]) == 'string'):
        treat_str(agg[y])
    elif(dtype(agg[y]) != 'string'):
        treat_numeric(agg[y])

You can access the data type of a column as follows:
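A minimal sketch of comparing a column's `.dtype` directly against NumPy types, in the spirit of the accepted answer. `treat_numeric` and `treat_str` are hypothetical stand-ins for the asker's functions, and the `agg` frame is made-up example data:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the asker's functions.
def treat_numeric(col):
    return ("numeric", col.name)

def treat_str(col):
    return ("str", col.name)

# Made-up example data.
agg = pd.DataFrame({"x": [1, 2], "y": [0.5, 1.5], "z": ["a", "b"]})

results = []
for y in agg.columns:
    # Compare the column's dtype against specific NumPy scalar types.
    if agg[y].dtype == np.float64 or agg[y].dtype == np.int64:
        results.append(treat_numeric(agg[y]))
    else:
        results.append(treat_str(agg[y]))

print(results)  # [('numeric', 'x'), ('numeric', 'y'), ('str', 'z')]
```

This only matches exactly `float64` and `int64`; `int32`, `float32` or unsigned columns would fall through to `treat_str`.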

I know this is a bit of an old thread, but with pandas 0.19.2 you can do:

df.select_dtypes(include=['float64']).apply(your_function)
df.select_dtypes(exclude=['string','object']).apply(your_other_function)
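A sketch of the variant suggested in the comments: select every numeric column with `np.number` and everything else with `exclude`. The column names and dtypes below are assumed example data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "i32": np.array([1, 2], dtype=np.int32),
    "f64": [0.5, 1.5],
    "u8": np.array([3, 4], dtype=np.uint8),
    "s": ["a", "b"],
})

# np.number covers signed/unsigned integers and floats of any width.
numeric = df.select_dtypes(include=[np.number])
# Everything that is not numeric.
other = df.select_dtypes(exclude=[np.number])

print(list(numeric.columns))  # ['i32', 'f64', 'u8']
print(list(other.columns))    # ['s']
```

Each sub-frame can then be passed to `.apply(...)` as in the snippet above.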

In pandas 0.20.2 you can do:

from pandas.api.types import is_string_dtype
from pandas.api.types import is_numeric_dtype

is_string_dtype(df['A'])
>>>> True

is_numeric_dtype(df['B'])
>>>> True

So your code becomes:

for y in agg.columns:
    if is_string_dtype(agg[y]):
        treat_str(agg[y])
    elif is_numeric_dtype(agg[y]):
        treat_numeric(agg[y])
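A self-contained, runnable version of that loop. The two handlers and the sample `agg` frame are hypothetical, chosen only so the dispatch has something visible to do:

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype, is_string_dtype

# Hypothetical handlers standing in for the asker's functions.
def treat_numeric(col):
    return col.sum()

def treat_str(col):
    return col.str.upper().tolist()

agg = pd.DataFrame({"n": [1, 2, 3], "f": [0.5, 1.0, 1.5], "s": ["a", "b", "c"]})

out = {}
for y in agg.columns:
    # Dispatch on the column's dtype family rather than an exact dtype.
    if is_string_dtype(agg[y]):
        out[y] = treat_str(agg[y])
    elif is_numeric_dtype(agg[y]):
        out[y] = treat_numeric(agg[y])

print(out)
```

Note that in pandas versions before 2.0, `is_string_dtype` returned True for any `object` column, even one holding non-string values.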

If you want the type of a DataFrame column as a short string code, you can do:

df['A'].dtype.kind

For example:

In [8]: df = pd.DataFrame([[1, 'a', 1.2], [2, 'b', 2.3]])
In [9]: df[0].dtype.kind, df[1].dtype.kind, df[2].dtype.kind
Out[9]: ('i', 'O', 'f')

Applied to your code:

for y in agg.columns:
    if agg[y].dtype.kind == 'f' or agg[y].dtype.kind == 'i':
        treat_numeric(agg[y])
    else:
        treat_str(agg[y])

Note: `uint` and `UInt` dtypes are of kind `u`, not kind `i`.
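Following that note, a sketch that treats signed, unsigned and float kinds as numeric. The data is assumed, and leaving out booleans (`'b'`) and complex numbers (`'c'`) is a deliberate choice for this example:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "i": [1, 2],
    "u": np.array([1, 2], dtype=np.uint16),
    "f": [1.0, 2.0],
    "s": ["a", "b"],
})

# 'i' = signed integer, 'u' = unsigned integer, 'f' = floating point
numeric_kinds = set("iuf")
numeric_cols = [c for c in df.columns if df[c].dtype.kind in numeric_kinds]

print(numeric_cols)  # ['i', 'u', 'f']
```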
Consider utility functions such as

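The question title is general, but the author's use case, stated in the question body, is specific, so any of the other answers can be used.

However, to fully answer the title question, it should be clarified that all approaches seem to fail in some cases and need some rework. I reviewed all of them (and a few extra ones) in decreasing order of reliability (in my opinion):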

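1. Comparing types directly via `==` (the accepted answer). Although this is the accepted answer and has the most upvotes, I think this method should not be used at all, because, as has been mentioned many times before, comparing types this way is discouraged in Python. If you still want to use it, be aware of some pandas-specific dtypes such as `pd.CategoricalDtype`, `pd.PeriodDtype` or `pd.IntervalDtype`. To identify these correctly, an extra `type()` call is needed: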

s = pd.Series([pd.Period('2002-03', 'D'), pd.Period('2012-02-01', 'D')])
s
s.dtype == pd.PeriodDtype  # not working
type(s.dtype) == pd.PeriodDtype  # working
>>> 0    2002-03-01
>>> 1    2012-02-01
>>> dtype: period[D]
>>> False
>>> True
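A related option, which is not taken from the answer above: call `isinstance()` on the dtype object itself rather than comparing `type(...)`. Comparing against the dtype's string name also works. A sketch reusing the same Series:

```python
import pandas as pd

s = pd.Series([pd.Period('2002-03', 'D'), pd.Period('2012-02-01', 'D')])

# isinstance() on the dtype object (not on the values) identifies the dtype class.
print(isinstance(s.dtype, pd.PeriodDtype))       # True
print(isinstance(s.dtype, pd.CategoricalDtype))  # False

# A dtype also compares equal to its string name.
print(s.dtype == "period[D]")                    # True
```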

Another caveat here is that the type has to be specified exactly:

s = pd.Series([1, 2])
s
s.dtype == np.int64  # working
s.dtype == np.int32  # not working
>>> 0    1
>>> 1    2
>>> dtype: int64
>>> True
>>> False
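One way around the exact-width problem is to test membership in a dtype family rather than equality. The sketch below shows two options: `np.issubdtype` against NumPy's abstract `np.integer`, and `pandas.api.types.is_integer_dtype`. The example Series are assumed:

```python
import numpy as np
import pandas as pd
from pandas.api.types import is_integer_dtype

s64 = pd.Series([1, 2], dtype=np.int64)
s32 = pd.Series([1, 2], dtype=np.int32)

for s in (s64, s32):
    # np.issubdtype walks NumPy's type hierarchy, so any integer width matches.
    print(s.dtype, np.issubdtype(s.dtype, np.integer), is_integer_dtype(s))
```

`np.issubdtype` only understands NumPy dtypes. For pandas extension dtypes such as `category`, the `pandas.api.types` helpers are the safer choice.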

2. The `isinstance()` approach. This method has not been mentioned in the answers so far.

So, if comparing types directly is not a good idea, let's try the built-in Python function for this purpose, namely `isinstance()`.

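It fails right at the start, because it assumes that we have some objects, but a `pd.Series` or `pd.DataFrame` can be an empty container with a predefined `dtype` and no objects in it: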

s = pd.Series([], dtype=bool)
s
>>> Series([], dtype: bool)
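A short sketch of why dtype-level checks survive this case. An empty Series has no elements to pass to `isinstance()`, but its dtype is still known and can be queried:

```python
import pandas as pd
from pandas.api.types import is_bool_dtype

s = pd.Series([], dtype=bool)

# No elements to inspect...
print(len(s))            # 0
# ...but the declared dtype is still available.
print(is_bool_dtype(s))  # True
```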

But suppose one somehow overcomes this issue and wants to access each object, for example in the first row, and check its type like this:

df = pd.DataFrame({'int': [12, 2], 'dt': [pd.Timestamp('2013-01-02'), pd.Timestamp('2016-10-20')]},
                  index=['A', 'B'])
for col in df.columns:
    print((df[col].dtype, 'is_int64 = %s' % isinstance(df.loc['A', col], np.int64)))
>>> (dtype('int64'), 'is_int64 = True')
>>> (dtype('…

To pretty print the column data types, for example to check them after importing from a file:
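def print_column_info(df):
    template = "%-8s %-30s %s"
    print(template % ("Type", "Column Name", "Example Value"))
    print("-" * 53)
    for c in df.columns:
        # example value is taken from the second row (iloc[1])
        print(template % (df[c].dtype, c, df[c].iloc[1]))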

Illustrative output:

Type     Column Name                    Example Value
-----------------------------------------------------
int64    Age                            49
object   Attrition                      No
object   BusinessTravel                 Travel_Frequently
float64  DailyRate                      279.0
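For a quick summary without a custom helper, `DataFrame.dtypes` already returns a Series that maps column names to dtypes. The sketch below pairs it with one example value per column. The data is assumed, and taking the first row is an arbitrary choice:

```python
import pandas as pd

# Assumed example data.
df = pd.DataFrame({
    "Age": [30, 49],
    "DailyRate": [1000.0, 279.0],
    "Attrition": ["Yes", "No"],
})

# DataFrame.dtypes is a Series indexed by column name.
print(df.dtypes)

# One row per column: its dtype name and a sample value from the first row.
summary = pd.DataFrame({"dtype": df.dtypes.astype(str), "example": df.iloc[0]})
print(summary)
```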
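'string' is not a dtype.

Hi David, can you explain why you included `==np.float64`? Aren't we trying to convert to floats? Thanks.

@Ryanche Because the OP in this question never said he was converting to floats; he just needed to know whether to use his (unspecified) `treat_numeric` function. Since he included `agg.dtypes==np.float64` as an option, I did too. There are more numeric types in NumPy than these two, namely everything under `number`. The general solution is `is_numeric_dtype(agg[y])`.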
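Great answer, though I would probably use `include=[np.number]` for the first line (to also include ints and 32-bit floats) and `exclude=[object]` for the second. Strings are objects as far as dtypes are concerned. In fact, including 'string' together with 'object' gives me an error.

It seems 'string' is no longer supported and 'object' must be used instead. But definitely the right answer :)

It should also be noted that the 'period' dtype currently raises NotImplementedError (pandas 0.24.2), so you may need some hand-made post-processing.

Is there an alternative for older pandas versions? I get the error: No module named api.types.

`pandas.core.common.is_numeric_dtype` has existed since pandas 0.13 and does the same thing, but it was deprecated in favor of `pandas.api.types.is_numeric_dtype` in 0.19, I think.

It is the most natural answer, but we should be aware of some issues. … The problem with `dtype.kind` is that it gives 'O' for both period and string/object dtypes. It is better to use the `pd.api.types.is_...` variants.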
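A sketch of the ambiguity described in that last comment. A period column and an object column both report kind `'O'`, but checking the dtype class tells them apart. The two example Series are assumed:

```python
import pandas as pd
from pandas.api.types import is_object_dtype

period = pd.Series(pd.period_range("2020-01", periods=2, freq="M"))
obj = pd.Series(["a", 1], dtype=object)

# dtype.kind cannot distinguish these two...
print(period.dtype.kind, obj.dtype.kind)  # O O

# ...but dtype-class checks can.
print(isinstance(period.dtype, pd.PeriodDtype), isinstance(obj.dtype, pd.PeriodDtype))  # True False
print(is_object_dtype(period), is_object_dtype(obj))  # False True
```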