Simple way to change a DataFrame column's type while using a default value for errors? (Python, pandas, type conversion)

python pandas

Suppose I have the following column:

>>> import pandas
>>> a = pandas.Series(['0', '1', '5', '1', None, '3', 'Cat', '2'])

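I would like to convert all the data in the column to type int, with any element that cannot be converted replaced by 0.

My current solution is to use to_numeric with the 'coerce' option, fill any NaN with 0, and then convert to int (since the presence of NaN makes the column float instead of int).

Is there a way to do this in one step, without going through two intermediate states? I am looking for something like the following imaginary option for astype: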
>>> a.astype(int, value_on_error=0)
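
For reference, the two intermediate states described in the question can be made visible by splitting the chain into named steps. This is only a sketch of the approach as described in prose; the variable names `coerced`, `filled` and `result` are introduced here for illustration.

```python
import pandas as pd

a = pd.Series(['0', '1', '5', '1', None, '3', 'Cat', '2'])

# Intermediate state 1: entries that cannot be parsed (None, 'Cat') become
# missing values, so the column cannot be an integer column yet.
coerced = pd.to_numeric(a, errors='coerce')

# Intermediate state 2: the missing values are replaced by the default,
# but the dtype is still a floating-point one.
filled = coerced.fillna(0)

# Final state: with no missing values left, the cast to int succeeds.
result = filled.astype(int)

for name, step in [('coerced', coerced), ('filled', filled), ('result', result)]:
    print(name, step.dtype, step.tolist())
```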

Option 1
import pandas as pd

pd.to_numeric(a, errors='coerce').fillna(0).astype(int)


Option 2
b = pd.to_numeric(a, errors='coerce')
b.mask(b.isnull(), 0).astype(int)


Option 3
def try_int(x):
    try:
        return int(x)
    except (TypeError, ValueError):  # e.g. int(None) or int('Cat')
        return 0

a.apply(try_int)


Option 4
import numpy as np

b = np.empty(a.shape, dtype=int)

# True where the string form of the element consists only of digits
i = np.char.isdigit(a.values.astype(str))

b[i] = a[i].astype(int)
b[~i] = 0

pd.Series(b, a.index)


All produce
0    0
1    1
2    5
3    1
4    0
5    3
6    0
7    2
dtype: int64
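
The options agree on the sample column, but they are not necessarily interchangeable on other inputs. The following is a minimal sketch, assuming inputs such as '-1' and '2.5' that do not appear in the question; the function names `option_1`, `option_3` and `option_4` are labels introduced here, and `np.char.isdigit` is used as the public spelling of the NumPy digit check. Option 2 is left out because it differs from option 1 only in how the missing values are filled.

```python
import numpy as np
import pandas as pd


def option_1(s):
    # Parse as numbers, default the failures, then truncate to int.
    return pd.to_numeric(s, errors='coerce').fillna(0).astype(int)


def try_int(x):
    try:
        return int(x)
    except (TypeError, ValueError):
        return 0


def option_3(s):
    # Element-wise int(); anything int() rejects becomes 0.
    return s.apply(try_int)


def option_4(s):
    # Only strings made up purely of digits are converted; the rest become 0.
    out = np.empty(s.shape, dtype=int)
    digits = np.char.isdigit(s.to_numpy().astype(str))
    out[digits] = s[digits].astype(int)
    out[~digits] = 0
    return pd.Series(out, s.index)


# Hypothetical edge cases: a negative number and a decimal string.
edge = pd.Series(['-1', '2.5', '7', 'Cat', None])

print(option_1(edge).tolist())  # [-1, 2, 7, 0, 0]
print(option_3(edge).tolist())  # [-1, 0, 7, 0, 0]
print(option_4(edge).tolist())  # [0, 0, 7, 0, 0]
```

If the column can contain signs or decimals, the choice between the options therefore changes the result, not just the speed.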


Timing

Code below

Output:
0    0
1    1
2    5
3    1
4    0
5    3
6    0
7    2
dtype: int64
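
The timing code and its results are not preserved in this copy of the answer. As a rough sketch of how such a comparison could be run, assuming the standard-library `timeit` module and a larger test column built by repeating the sample 1000 times (both are choices made here, not taken from the original answer):

```python
import timeit

import numpy as np
import pandas as pd


def option_1(s):
    return pd.to_numeric(s, errors='coerce').fillna(0).astype(int)


def option_2(s):
    b = pd.to_numeric(s, errors='coerce')
    return b.mask(b.isnull(), 0).astype(int)


def _try_int(x):
    try:
        return int(x)
    except (TypeError, ValueError):
        return 0


def option_3(s):
    return s.apply(_try_int)


def option_4(s):
    out = np.empty(s.shape, dtype=int)
    digits = np.char.isdigit(s.to_numpy().astype(str))
    out[digits] = s[digits].astype(int)
    out[~digits] = 0
    return pd.Series(out, s.index)


base = pd.Series(['0', '1', '5', '1', None, '3', 'Cat', '2'])
# Assumed test size: 8000 rows. Results will vary with size and versions.
big = pd.concat([base] * 1000, ignore_index=True)

timings = {}
for func in (option_1, option_2, option_3, option_4):
    # Best of three batches of 10 calls, reported as seconds per call.
    best = min(timeit.repeat(lambda: func(big), number=10, repeat=3)) / 10
    timings[func.__name__] = best
    print(f"{func.__name__}: {best * 1e3:.3f} ms per call")
```

The ranking depends on the column length and on the installed pandas and NumPy versions, so it is worth rerunning on data that resembles the real column.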

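@PirSquare Thanks. pd.to_numeric(a.where(a.str.isnumeric(), 0)) also works. While this isn't one step, it is probably as good as it's going to get for what I'm looking for. It at least has only one type conversion instead of two.

Should I interpret the fact that option 1 is the same as my current solution to mean that my original solution was probably OK?

Yes! Absolutely. But what is your goal: simpler code or quicker execution?

That's a good question. My main goal was simplification of the code. To me, the solution I proposed requires a moment of thought, because while the behaviour is self-documenting, the intent isn't. I feel that a method that replaces with an explicit value while converting is more explicit. I've also never liked the fact that to_numeric isn't a method of Series but a module-level function, so I tend to look for solutions that don't use it.

It just goes to show... don't make any assumptions about performance without profiling! Great answer!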
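
One way to address both concerns raised in the comments (an explicit replacement value, and method-style chaining instead of a module-level call) is to wrap the conversion in a small function and apply it with `Series.pipe`. This is only a sketch: `to_int_or_default` is a hypothetical helper introduced here, not part of pandas, and it simply wraps the same steps as option 1.

```python
import pandas as pd


def to_int_or_default(series, default=0):
    """Hypothetical helper (not a pandas API): convert to int,
    substituting `default` wherever an element cannot be parsed."""
    numeric = pd.to_numeric(series, errors='coerce')
    return numeric.fillna(default).astype(int)


a = pd.Series(['0', '1', '5', '1', None, '3', 'Cat', '2'])

# Reads as a single step at the call site, with the default spelled out.
result = a.pipe(to_int_or_default, default=0)
print(result.tolist())  # [0, 1, 5, 1, 0, 3, 0, 2]
```

The intermediate float state still exists inside the helper; the gain is only that the intent is named at the point of use.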