Python: drop rows that violate the column data types




I have the following DataFrame, and I need to drop the rows that violate the column data types:

            ts       a         b         c      d
0   1555338562    9.01      True   1648.37   1.01
1   1555338563    9.01     1.022   1648.37   1.01
2   1555338564    9.01     1.022   AVC       1.01
3   1555338565    9.01     1.022   1648.37   1.01
4   1555338566    9.01     1.022   1648.33   1.01
5   1555338567    test     1.022   1648.33   1.01

Column data types:

data_types = { "ts": "int64", 
               "a": "float64", 
               "b": "float64", 
               "c": "float64", 
               "d": "float64"
             }

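In the example above, rows 0, 2 and 5 should be dropped because they violate the data types of columns b, c and a, respectively. The expected output is: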

            ts       a         b         c      d
0   1555338563    9.01     1.022   1648.37   1.01
1   1555338565    9.01     1.022   1648.37   1.01
2   1555338566    9.01     1.022   1648.33   1.01

Any suggestions on how I can achieve this with pandas?

EDIT: In the future we may also have string or boolean columns. For example:

            ts       a         b         c      d    e       f
0   1555338562    9.01      True   1648.37   1.01 True  Test_1
1   1555338563    9.01     1.022   1648.37   1.01 True  Test_2
2   1555338564    9.01     1.022   AVC       1.01 True  Test_2
3   1555338565    9.01     1.022   1648.37   1.01 True  Test_2
4   1555338566    9.01     1.022   1648.33   1.01 True  Test_2
5   1555338567    test     1.022   1648.33   1.01 False Test_2

If you are using pandas > 1.0, try using …, and then use ….

Since you want to convert all the columns to int or float, something like this might work:

In [1118]: for i in df.columns:
      ...:     df[i] = pd.to_numeric(df[i], errors='coerce')
      ...:

In [1122]: df = df.dropna()

In [1123]: df
Out[1123]: 
           ts     a      b        c     d
1  1555338563  9.01  1.022  1648.37  1.01
3  1555338565  9.01  1.022  1648.37  1.01
4  1555338566  9.01  1.022  1648.33  1.01
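A more compact variant of the same loop, offered as a sketch: it assumes the raw values arrive as strings (for example, read from a CSV file as text), which is why `True` and `AVC` fail to parse. `DataFrame.apply` forwards `errors="coerce"` to `pd.to_numeric`, and `reset_index(drop=True)` renumbers the rows to match the expected output. The names `raw` and `clean` are only illustrative.

```python
import pandas as pd

# Assumption: the raw table was read as text, so every cell is a string.
raw = pd.DataFrame({
    "ts": ["1555338562", "1555338563", "1555338564",
           "1555338565", "1555338566", "1555338567"],
    "a": ["9.01", "9.01", "9.01", "9.01", "9.01", "test"],
    "b": ["True", "1.022", "1.022", "1.022", "1.022", "1.022"],
    "c": ["1648.37", "1648.37", "AVC", "1648.37", "1648.33", "1648.33"],
    "d": ["1.01"] * 6,
})

data_types = {"ts": "int64", "a": "float64", "b": "float64",
              "c": "float64", "d": "float64"}

# Cells that cannot be parsed as numbers become NaN; any row containing
# a NaN is then dropped, the columns are cast, and the index is renumbered.
clean = (
    raw.apply(pd.to_numeric, errors="coerce")
       .dropna()
       .astype(data_types)
       .reset_index(drop=True)
)
print(clean)
```

One caveat: `dropna()` also removes rows that were genuinely empty, not only rows with a wrong type, and this approach only covers numeric columns.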

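You can build a custom function for this that returns True wherever a value has the correct type and False otherwise. You can then use the result as a boolean mask and select the rows with ….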
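A minimal sketch of that custom-function idea, assuming the cells are Python objects (the bad values sit in `object` columns next to floats, as in the question). The helper `matches_type` and the `expected` mapping are hypothetical names for illustration. Each column is mapped to True/False, and `mask.all(axis=1)` keeps only the rows in which every cell has the expected type.

```python
import pandas as pd

df = pd.DataFrame({
    "ts": [1555338562, 1555338563, 1555338564, 1555338565, 1555338566, 1555338567],
    "a": [9.01, 9.01, 9.01, 9.01, 9.01, "test"],
    "b": [True, 1.022, 1.022, 1.022, 1.022, 1.022],
    "c": [1648.37, 1648.37, "AVC", 1648.37, 1648.33, 1648.33],
    "d": [1.01] * 6,
})

# Hypothetical mapping from column name to the Python type its cells should have.
expected = {"ts": int, "a": float, "b": float, "c": float, "d": float}

def matches_type(col):
    """Return a boolean Series: True where the cell has the expected type."""
    # Series.map hands each cell to the lambda as a plain Python object.
    return col.map(lambda v: isinstance(v, expected[col.name]))

mask = df.apply(matches_type)  # boolean DataFrame with the same shape as df
result = df[mask.all(axis=1)].astype({"a": "float64", "b": "float64", "c": "float64"})
print(result)
```

Selecting with `mask.all(axis=1)` avoids the detour through NaN, so a row is dropped only because of a wrong type, never because of a value that was already missing.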

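@Ch3steR's answer works well for int, float and string data types, but fails for booleans, because Python treats booleans as integers (`bool` is a subclass of `int`, and `True` converts to `1`). The following approach checks each value against its column's declared type instead: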

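data_types = {"ts": "int64",
              "a": "float64",
              "b": "float64",
              "c": "float64",
              "d": "float64"}

# Map each dtype name used in data_types to the Python type its cells should have
cast_mapping = {'int64': int,
                'float64': float,
                'bool': bool,
                'str': str}

def change_type(s):
    # s is one column; look up its declared dtype by column name
    cast_to = cast_mapping[data_types[s.name]]
    def cast_type(val):
        return isinstance(val, cast_to)
    return s.map(cast_type)

# df[mask] sets every cell with the wrong type to NaN; dropna() then removes
# those rows and astype() casts the remaining columns to their declared dtypes
df[df.apply(change_type)].dropna().astype(data_types)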

           ts     a      b        c     d
1  1555338563  9.01  1.022  1648.37  1.01
3  1555338565  9.01  1.022  1648.37  1.01
4  1555338566  9.01  1.022  1648.33  1.01
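The `isinstance` check above still has one gap: because `bool` is a subclass of `int`, `isinstance(True, int)` is True, so a stray `True` in an int64 column would be accepted. Below is a sketch that closes that gap and also covers the `e` (bool) and `f` (str) columns from the EDIT, assuming the cell values are Python or NumPy scalars. `VALIDATORS`, `drop_type_violations` and the rule that whole numbers are acceptable in a float64 column are my own assumptions, not part of the answer.

```python
import numpy as np
import pandas as pd

def is_int(v):
    # bool is a subclass of int, so exclude it explicitly
    return isinstance(v, (int, np.integer)) and not isinstance(v, (bool, np.bool_))

def is_float(v):
    # Assumption: whole numbers are acceptable in a float64 column, bools are not
    return isinstance(v, (float, np.floating)) or is_int(v)

def is_bool(v):
    return isinstance(v, (bool, np.bool_))

def is_str(v):
    return isinstance(v, str)

# Hypothetical mapping from dtype name (as used in data_types) to a validator.
VALIDATORS = {"int64": is_int, "float64": is_float, "bool": is_bool, "str": is_str}

def drop_type_violations(df, data_types):
    """Keep only rows whose cells all pass the validator for their column's dtype."""
    mask = pd.DataFrame(
        {col: df[col].map(VALIDATORS[dtype]) for col, dtype in data_types.items()}
    )
    keep = mask.all(axis=1)
    return df.loc[keep].astype(data_types).reset_index(drop=True)

df = pd.DataFrame({
    "ts": [1555338562, 1555338563, 1555338564, 1555338565, 1555338566, 1555338567],
    "a": [9.01, 9.01, 9.01, 9.01, 9.01, "test"],
    "b": [True, 1.022, 1.022, 1.022, 1.022, 1.022],
    "c": [1648.37, 1648.37, "AVC", 1648.37, 1648.33, 1648.33],
    "d": [1.01] * 6,
    "e": [True, True, True, True, True, False],
    "f": ["Test_1", "Test_2", "Test_2", "Test_2", "Test_2", "Test_2"],
})
data_types = {"ts": "int64", "a": "float64", "b": "float64", "c": "float64",
              "d": "float64", "e": "bool", "f": "str"}

result = drop_type_violations(df, data_types)
print(result)
```

Adding a new column later only means adding it to `data_types`; a new kind of type needs one more entry in `VALIDATORS`.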

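Correct me if I'm wrong, but pandas has already assigned these columns a more general data type (text, in this case), so all the values in columns a and c are treated as text, right?
@Ch3steR - In this example, yes, but in the OP's example columns a and c have the "object" data type, so he can't tell which values caused it.
@Ch3steR The columns can have any data type. For example, a new column "e" with a str or bool data type could be added in the future.
@Ch3steR It's about 20,000 rows at most.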
