Python: automatically change the data types of a DataFrame after applying some operations

python pandas


Here is a DataFrame:

import pandas as pd
from io import StringIO

txt = '''A B C 
1Â 2Â abcÂ
2Â 5Â defÂ'''

df = pd.read_table(StringIO(txt), sep=r'\s{1,}', engine='python')

Now df gives:

    A   B   C
0   1Â  2Â  abcÂ
1   2Â  5Â  defÂ

After removing the special characters, df.dtypes gives:

A    object
B    object
C    object
dtype: object

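I want to change the dtype of each column appropriately. I have tried df.infer_objects().dtypes, but it still gives the object dtype. I have also tried pd.api.types.infer_dtype(df), but it gives string, whereas I want the dtype of every column.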
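This happens because 1 and 2 are actually the strings "1" and "2". pandas infers that you have string columns simply because the values are strings. They are strings because 1Â is a string, and when you strip the Â you are just left with the str "1".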
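
The point above can be checked with a small sketch. It assumes a single column holding the same kind of values as in the question; the variable names are made up for the illustration. Stripping the character leaves str values, so the column is still not numeric until it is converted explicitly, here with pd.to_numeric.

```python
import pandas as pd

# A column whose values carry a stray trailing character, as in the question.
s = pd.Series(["1Â", "2Â"])

# Stripping removes the character, but every value is still a Python str.
stripped = s.str.rstrip("Â")
print(type(stripped.iloc[0]))
print(pd.api.types.is_numeric_dtype(stripped))

# infer_objects() does not parse strings, so the column stays non-numeric.
print(pd.api.types.is_numeric_dtype(stripped.infer_objects()))

# An explicit conversion is what produces an integer column.
numeric = pd.to_numeric(stripped)
print(pd.api.types.is_integer_dtype(numeric))
```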

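What you can do instead is parse the data beforehand and then create the data frame from the cleaned data.
For example:

df = df.applymap(lambda x: x.strip('Â'))

Now

pd.read_table(StringIO(clean(txt)), delim_whitespace=True).dtypes

yields

A     int64
B     int64
C    object
dtype: object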
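
The clean helper called in the read_table line above is not defined in the text. A minimal sketch, assuming it does nothing more than remove the Â character from the raw text, could look like the following; the junk parameter is hypothetical, and read_csv with sep=r"\s+" is used here as a stand-in for read_table with delim_whitespace=True, which newer pandas releases deprecate.

```python
from io import StringIO

import pandas as pd

txt = '''A B C
1Â 2Â abcÂ
2Â 5Â defÂ'''


def clean(text, junk="Â"):
    """Hypothetical helper: remove the unwanted character from the raw text."""
    return text.replace(junk, "")


# Parse the cleaned text, so the parser sees plain numbers in columns A and B.
df = pd.read_csv(StringIO(clean(txt)), sep=r"\s+")
print(df.dtypes)
```

Because the text is cleaned before parsing, the parser infers the numeric columns itself and no later conversion is needed.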

You can use the DataFrame.astype() method to change dtypes. Use a dict to target specific columns and their expected types:

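import numpy as np

# Create the data frame and clean the data...

# Map each column to its target dtype; the built-in str replaces np.str,
# which was removed in NumPy 1.24.
types = {'A': np.int64, 'B': np.int64, 'C': str}
df = df.astype(types)

df.dtypes

A     int64
B     int64
C    object
dtype: object

Edit: If I understand correctly, you want to look at the dtypes of the data frame after cleaning it. In that case, you can do the following: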

# Implicitly convert numeric types; see the 'convert_objects'
# documentation for other supported types.
# Note: convert_objects was deprecated in pandas 0.21 and removed in 1.0.
df = df.convert_objects(convert_numeric=True)

df.apply(pd.api.types.infer_dtype)

A    integer
B    integer
C     string
dtype: object

Or, as a dict:

dict(df.apply(pd.api.types.infer_dtype))

{'A': 'integer', 'B': 'integer', 'C': 'string'}

Note: I am using pandas 0.23.3.

Edit 2: As requested, here is the full code I am using. I have also simplified it so that no inference is needed:

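import pandas as pd
from io import StringIO

txt = '''A B C 
1Â 2Â abcÂ
2Â 5Â defÂ'''

df = pd.read_table(StringIO(txt), sep=r'\s{1,}', engine='python')
# Strip the stray character from every cell; the values are still strings.
df = df.applymap(lambda x: x.strip('Â'))

# Convert the columns that hold only numbers (pandas < 1.0 only).
df = df.convert_objects(convert_numeric=True)

df.dtypes

A     int64
B     int64
C    object
dtype: object

Even if this works, I believe the OP does not want to set the types manually.
I just want to get the types without hard-coding a dictionary, because there are 100 columns. @RafaelC any help?
@krishna please see the revised answer.
@T.Ray in my case it gives string for all columns. I am using pandas 0.22.0.
@krishna it looks like you need to convert some string types first. See the convert_objects docs for other supported types.
It works for this example. But in the general case I want to change the types after reading the data frame and cleaning it. Is there a way to do this?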
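
For the many-columns case raised in the comments, and for pandas versions where convert_objects no longer exists (it was removed in pandas 1.0), one possible sketch is to try pd.to_numeric on each column and keep the column as text when the conversion fails. The helper name strip_and_convert and its junk parameter are made up for this illustration, and read_csv with sep=r"\s+" stands in for the read_table call used in the thread.

```python
from io import StringIO

import pandas as pd

txt = '''A B C
1Â 2Â abcÂ
2Â 5Â defÂ'''

df = pd.read_csv(StringIO(txt), sep=r"\s+")


def strip_and_convert(column, junk="Â"):
    """Hypothetical helper: strip `junk`, then convert to numbers if every value parses."""
    if pd.api.types.is_numeric_dtype(column):
        return column  # already numeric, nothing to clean
    cleaned = column.str.strip(junk)
    try:
        return pd.to_numeric(cleaned)
    except (ValueError, TypeError):
        return cleaned  # at least one value is not a number: keep the text


# Apply the helper column by column, so no per-column dict has to be written.
df = df.apply(strip_and_convert)
print(df.dtypes)
print(dict(df.apply(pd.api.types.infer_dtype)))
```

This avoids hard-coding a type per column, at the cost of treating any column with a single unparsable value as text.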