Python: How do I remove all characters from a string and keep only the numbers in a dataframe?


python dataframe


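I have two columns in a dataframe that contain both numeric values and strings.
I want to remove all the characters and keep only the numbers.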

Admit_DX_Description            Primary_DX_Description
510.9 - EMPYEMA W/O FISTULA     510.9 - EMPYEMA W/O FISTULA
681.10 - CELLULITIS, TOE NOS    681.10 - CELLULITIS, TOE NOS
780.2 - SYNCOPE AND COLLAPSE    427.89 - CARDIAC DYSRHYTHMIAS NEC
729.5 - PAIN IN LIMB            998.30 - DISRUPTION OF WOUND, UNSPEC

to:

Code:

for col in strip_col:
    # Encoding only categorical variables
    if df[col].dtypes == 'object':
        df[col] = df[col].map(lambda x: x.rstrip(r'[a-zA-Z]'))

print(df.head())

Error:

Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/core/series.py", line 2175, in map
    new_values = map_f(values, arg)
  File "pandas/src/inference.pyx", line 1217, in pandas.lib.map_infer (pandas/lib.c:63307)

AttributeError: 'int' object has no attribute 'rstrip'
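A minimal sketch of why the code above cannot work, using one value from the table and a plain int (standing in for a numeric cell inside an object column): str.rstrip treats its argument as a set of characters to strip from the right end, not as a regular expression, and int values have no rstrip method at all.

```python
value = "510.9 - EMPYEMA W/O FISTULA"

# rstrip sees r'[a-zA-Z]' as the character set {'[', 'a', '-', 'z', 'A', 'Z', ']'},
# so it removes the trailing 'A' and stops at 'L', which is not in that set
stripped = value.rstrip(r'[a-zA-Z]')
print(stripped)  # 510.9 - EMPYEMA W/O FISTUL

# An int cell in an object column has no rstrip method: this is the AttributeError above
int_error = None
try:
    (427).rstrip(r'[a-zA-Z]')
except AttributeError as exc:
    int_error = exc
    print("AttributeError:", exc)
```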

You should look into … and apply it to the columns you want to remove the text from. [Edited] Alternatively:

import pandas as pd

test = [{'c1': 10, 'c2': 100}, {'c1': 11, 'c2': 110}, {'c1': 12, 'c2': 120}]
fun = lambda x: x + 10          # the function to apply to every value
df = pd.DataFrame(test)
df['c1'] = df['c1'].apply(fun)  # apply it to column c1 only
print(df)
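Applied to the question's columns, the same apply pattern might look like the sketch below. keep_number is a hypothetical helper, and the sample frame (including the int 427 standing in for a numeric cell) is made up for illustration; calling str() on each value first is one way to avoid the AttributeError on non-string cells.

```python
import re
import pandas as pd

def keep_number(value):
    """Hypothetical helper: return the first number found in value, or '' if none."""
    # str() first, so int/float cells do not raise the way x.rstrip did
    match = re.search(r"\d*\.?\d+", str(value))
    return match.group() if match else ""

# Made-up sample: the int 427 stands in for a numeric cell in an object column
df = pd.DataFrame({
    "Admit_DX_Description": ["510.9 - EMPYEMA W/O FISTULA", "780.2 - SYNCOPE AND COLLAPSE", 427],
    "Primary_DX_Description": ["510.9 - EMPYEMA W/O FISTULA", "427.89 - CARDIAC DYSRHYTHMIAS NEC", "729.5 - PAIN IN LIMB"],
})

for col in ["Admit_DX_Description", "Primary_DX_Description"]:
    df[col] = df[col].apply(keep_number)

print(df)
```

re.search returns only the first match, which suits values like "681.10 - CELLULITIS, TOE NOS" where the code comes first; the results are strings, so convert with pd.to_numeric if you need numbers.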

You can use the following example.

I chose the re module to extract only the floating-point numbers:

import re
import pandas

df = pandas.DataFrame({'A': ['Hello 199.9', '19.99 Hello'], 'B': ['700.52 Test', 'Test 7.7']})
print(df)
#              A            B
# 0  Hello 199.9  700.52 Test
# 1  19.99 Hello     Test 7.7

# Keep only the decimal numbers found in each cell of every column
for col in df:
    df[col] = [''.join(re.findall(r"\d+\.\d+", item)) for item in df[col]]

print(df)
#        A       B
# 0  199.9  700.52
# 1  19.99     7.7
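The loop above can also be written with pandas' vectorized string methods, Series.str.findall and Series.str.join. This is a swapped-in alternative, not the answer's own code, and it assumes the columns hold strings (in an object column, non-string cells come back as NaN rather than raising).

```python
import pandas as pd

df = pd.DataFrame({'A': ['Hello 199.9', '19.99 Hello'], 'B': ['700.52 Test', 'Test 7.7']})

# Same extraction as the list comprehension, using the .str accessor:
# findall gives a list of matches per cell, join concatenates each list
for col in df:
    df[col] = df[col].str.findall(r"\d+\.\d+").str.join('')

print(df)
```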

If you also have integers, change the re pattern to:

\d*\.?\d+
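A quick comparison of the two patterns on one made-up string: the first only matches numbers that contain a decimal point, the second also matches plain integers.

```python
import re

FLOAT_ONLY = r"\d+\.\d+"     # decimal point required
FLOAT_OR_INT = r"\d*\.?\d+"  # decimal point optional

text = "Code 42 and code 780.2"

float_matches = re.findall(FLOAT_ONLY, text)
all_matches = re.findall(FLOAT_OR_INT, text)

print(float_matches)  # ['780.2']
print(all_matches)    # ['42', '780.2']
```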

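Edited

For the TypeError, I suggest using try/except. In this example I create a list, errs, which collects the values of any column that raises a TypeError. You can print(errs) to see those values, and also check df.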

...
...
errs = []
for col in df:
    try:
        df[col] = [''.join(re.findall(r"\d+\.\d+", item)) for item in df[col]]
    except TypeError:
        # a non-string cell was found: keep every value of this column for inspection
        errs.extend([item for item in df[col]])
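The loop above records whole columns in errs and leaves those columns unconverted. A variation, sketched here under the assumption that you want to keep going past bad cells, catches the TypeError per cell, so only the offending values (with their column and row label) end up in errs while the string cells are still converted. The sample data, including the float 19.99 and the NaN, is hypothetical, as is the helper name extract.

```python
import re
import pandas as pd

# Hypothetical sample: one float and one missing value mixed in with strings
df = pd.DataFrame({'A': ['250.82 - DIABETES, TYPE II', 19.99],
                   'B': ['700.52 Test', float('nan')]})

errs = []  # (column, row label, value) for each cell re.findall rejected

def extract(col, idx, item):
    """Hypothetical helper: extract decimals from one cell, logging non-strings."""
    try:
        return ''.join(re.findall(r"\d+\.\d+", item))
    except TypeError:
        errs.append((col, idx, item))
        return item  # leave the non-string cell unchanged

for col in df:
    df[col] = [extract(col, idx, item) for idx, item in df[col].items()]

print(errs)
```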

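I tried it, but I got this error: AttributeError: 'Series' object has no attribute 'applymap'

Hey, this is a good answer, but I get this error: TypeError: expected string or buffer. I found that some strings have values like "250.82 - DIABETES, TYPE II".

I ran this new dataframe: df = pandas.DataFrame({'A': ['250.82 - DIABETES, TYPE II', '19.99 Hello'], 'B': ['700.52 Test', 'Test 7.7']})
and did not get any TypeError. Maybe there is another string that differs from "250.82 - DIABETES, TYPE II". I don't know, but it could be something like "V22.0 - SUPERVISION OF NORMAL FIRST PREGNANCY".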
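One way to test that guess: a differently formatted string such as the V22.0 example does not raise, it only changes what the pattern extracts, whereas a non-string cell (for example the float NaN that pandas uses for missing values) makes re.findall raise the TypeError. A minimal sketch:

```python
import re

pattern = r"\d+\.\d+"

# A string in a different format does not raise; the leading "V" is simply dropped
v_code = re.findall(pattern, "V22.0 - SUPERVISION OF NORMAL FIRST PREGNANCY")
print(v_code)  # ['22.0']

# A non-string cell, such as a NaN float from a missing value, does raise
caught = None
try:
    re.findall(pattern, float("nan"))
except TypeError as exc:
    caught = exc
    print("TypeError:", exc)
```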