Python: how to apply a function to a Series with mixed numbers and integers


python pandas types


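I have ICD9 codes, which are a mix of integers and strings. I need to categorize the codes. Codes that are strings go into one group, called "abc", and the purely numeric codes are then grouped according to the range of values the code falls in. I have tried a number of approaches without any luck; here are a few of them: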

a=pd.Series(['v2',2,7,22,'v4'])
print (a.dtype)
a=a.apply(lambda x: 'abc' if x[0]=='v' else x)
a=a.apply(lambda x: 'def' if x>=1 and x<10 else x)
a=a.apply(lambda x: 'ghi' if x>=10 and x<30 else x)

I also tried:

a=pd.Series(['v2',2,7,22,'v4'])
print (a.dtype)
a=a.apply(lambda x: 'abc' if x.astype(str).str[0]=='v' else x)
a=a.apply(lambda x: 'def' if x.astype(int)>=1 and x.astype(int)<10 else x)
a=a.apply(lambda x: 'ghi' if x.astype(int)>=10 and x.astype(int)<30 else x)

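Thanks for your help. I need to use pandas because this is part of a larger DataFrame. To complicate things further, I have some codes that start with "e" and some that start with "v", and they need to go into different categories. On top of that, when I run the numeric conversion on the DataFrame, it does not convert the numeric elements in the column to a numeric data type. (The code below refers to my actual data, where diag_1 etc. are column names and diabetic_data is the DataFrame.)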

    list_diag = ['diag_1', 'diag_2', 'diag_3']
    for i in list_diag:
        pd.to_numeric(diabetic_data[i], errors='coerce').fillna(-1)
        print(diabetic_data[i].dtype)

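Any idea why the data type isn't being converted? At the moment every element in the column is treated as a string, because when I try isinstance(x, str), every column effectively gets converted to "abc".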

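I would use the following approach:

In [17]: pd.to_numeric(a, errors='coerce').fillna(-1)
Out[17]:
0    -1.0
1     2.0
2     7.0
3    22.0
4    -1.0
dtype: float64

Now we can use pd.cut() to categorize the values into bins:

In [18]: pd.cut(pd.to_numeric(a, errors='coerce').fillna(-1),
    ...:        bins=[-np.inf, -1, 9, np.inf],
    ...:        labels=['abc','def','ghi']
    ...: )
    ...:
Out[18]:
0    abc
1    def
2    def
3    ghi
4    abc
dtype: category
Categories (3, object): [abc < def < ghi]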
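
One thing to watch with the bins above: pd.cut() uses right-closed intervals by default, so 0 would be labeled 'def' and anything above 9 (including 30 and up) would be labeled 'ghi'. If the ranges need to match the question exactly (1 <= x < 10 and 10 <= x < 30), something like the following might work. This is only a sketch: leaving out-of-range numbers as NaN is my own assumption, not something the question specifies.

```python
import pandas as pd

a = pd.Series(['v2', 2, 7, 22, 'v4'])

# Non-numeric codes become NaN; numeric codes become floats.
nums = pd.to_numeric(a, errors='coerce')

# right=False makes the intervals left-closed: [1, 10) and [10, 30).
# Numbers outside these ranges are left as NaN (an assumption).
cats = pd.cut(nums, bins=[1, 10, 30], right=False, labels=['def', 'ghi'])

# Keep the numeric categories and label every non-numeric code 'abc'.
result = cats.astype(object).where(nums.notna(), 'abc')
print(result)
```

The conversion to object dtype before where() is there so that 'abc' can be filled in without first registering it as a category.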


UPDATE: here is a more general solution (thanks @Boud for the hint!), which also works for negative numbers.
Source data:
In [34]: x
Out[34]:
   val
0   v2
1  -10
2   -1
3    0
4   v5
5    9
6   10
7   13
8   22
9   v4

In [35]: x.assign(
    ...:    cat=pd.cut(pd.to_numeric(x['val'], errors='coerce').fillna(-np.inf),
    ...:        bins=[-np.inf, np.iinfo(np.int64).min, -1, np.inf],
    ...:        labels=['NaN','<0','>=0'],
    ...:        include_lowest=True))
    ...:
Out[35]:
   val  cat
0   v2  NaN
1  -10   <0
2   -1   <0
3    0  >=0
4   v5  NaN
5    9  >=0
6   10  >=0
7   13  >=0
8   22  >=0
9   v4  NaN

Without using pandas:
Use list comprehensions with isinstance() to select elements by type.
groupby() needs a key function; I just made one up.
a = ['v2',2,7,22,'v4', 77, 'fred']

a_strs = [e for e in a if isinstance(e, str)]

print('strings: ', a_strs)

a_ints = [e for e in a if isinstance(e, int)]

print('ints: ', a_ints)


from itertools import groupby

groups = [list(g) for k,g in groupby(a_ints, key=lambda x: x//10)]

print('group by decade ', groups)

strings:  ['v2', 'v4', 'fred']
ints:  [2, 7, 22, 77]
group by decade  [[2, 7], [22], [77]]
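
A caveat on the groupby() call above: itertools.groupby() only groups consecutive items that share a key, so it gives the expected result here because a_ints happens to be in ascending order already. For unsorted input, sorting by the same key first should help. A small sketch (the sample list and the decade helper are made up for illustration):

```python
from itertools import groupby

a_ints = [22, 2, 77, 7, 25]  # deliberately unsorted sample data

def decade(x):
    """Key function: integer division by 10."""
    return x // 10

# Without sorting, equal keys that are not adjacent end up in separate groups.
unsorted_groups = [list(g) for _, g in groupby(a_ints, key=decade)]

# Sorting by the same key first brings equal keys together.
sorted_groups = [list(g) for _, g in groupby(sorted(a_ints, key=decade), key=decade)]

print('unsorted:', unsorted_groups)
print('sorted:  ', sorted_groups)
```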

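You are trying to test for type, but you are using the wrong (nonexistent) function. Here is how to implement that test while keeping the style of your algorithm: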
a.apply(lambda x: 'abc' if isinstance(x, str) else
                  'def' if x>=1 and x<10 else
                  'ghi' if x>=10 and x<30 else x)
Out[31]: 
0    abc
1    def
2    def
3    ghi
4    abc
dtype: object
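
The question also mentions that codes starting with "e" and codes starting with "v" need to go into different categories. The same isinstance() idea could be extended with a named function instead of a chained lambda. This is a rough sketch: the labels 'v_codes', 'e_codes' and 'other_str' are placeholders I made up, and leaving out-of-range numbers unchanged simply mirrors the lambda above.

```python
import pandas as pd

def categorize(code):
    """Map one ICD9-style code to a category label (labels are placeholders)."""
    if isinstance(code, str):
        first = code[:1].lower()
        if first == 'v':
            return 'v_codes'
        if first == 'e':
            return 'e_codes'
        return 'other_str'
    if 1 <= code < 10:
        return 'def'
    if 10 <= code < 30:
        return 'ghi'
    return code  # out of range: left unchanged, as in the lambda above

a = pd.Series(['v2', 2, 7, 22, 'e4', 45])
result = a.apply(categorize)
print(result)
```

Note that this only works if the numeric codes are actual numbers in the Series. If every element is a string, as described in the question, the values would need to be converted first.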

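If you are dealing with unknown positive and negative values, you can use -np.inf in the coerce + fillna step and keep the left boundary of the cut as a bucket of its own.

@Boud, good point, thanks! I have updated the answer accordingly.

Thanks, this was really helpful. I have added some details to my question, because it still isn't working for me.

@ags, can you provide a reproducible sample data set (similar to what you have) and the desired data set?

pd.to_numeric(diabetic_data[i], errors='coerce').fillna(-1) does not change the data in place. It returns the converted Series, so you should do this: diabetic_data[i] = pd.to_numeric(diabetic_data[i], errors='coerce').fillna(-1)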
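
To make the point about assigning the result back concrete, here is a minimal sketch of the corrected loop. The small DataFrame is a made-up stand-in for the real diabetic_data, and its values are invented for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the real diabetic_data DataFrame (values invented).
diabetic_data = pd.DataFrame({
    'diag_1': ['250', 'V27', '8'],
    'diag_2': ['E909', '401', '250.01'],
    'diag_3': ['V45', '428', '276'],
}, dtype=object)

list_diag = ['diag_1', 'diag_2', 'diag_3']
for col in list_diag:
    # to_numeric returns a new Series, so the result has to be assigned back.
    diabetic_data[col] = pd.to_numeric(diabetic_data[col], errors='coerce').fillna(-1)

print(diabetic_data.dtypes)
```

After the loop the three columns should be numeric, with -1 standing in for every code that could not be parsed as a number.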