Python: TypeError: unsupported operand type(s) for -: 'str' and 'str' when performing simple data normalization


I am trying to normalize a pandas DataFrame with the following function:

def normalize(df):
    result = df.copy()
    for feature_name in df.columns:
        max_value = df[feature_name].max()
        min_value = df[feature_name].min()
        result[feature_name] = (df[feature_name] - min_value) / (max_value - min_value)
    return result


df_normalized = normalize(df)

where:

filename = 'data.csv'
data = pd.read_csv(filename)
df = pd.DataFrame(data)

But I keep getting this error, which has been bothering me for hours:

TypeError: unsupported operand type(s) for -: 'str' and 'str'

Does anyone know why?

Here is my data:

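The error is telling you that you are trying to subtract strings, which is not a meaningful operation.

In effect, you are trying to do something like

'foo' - 'bar'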
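A minimal reproduction of that failure, assuming a column that was loaded as Python strings (dtype object). The helper `subtraction_error` and the sample values are hypothetical, made up only to show that the column case fails exactly like the plain-string case:

```python
import pandas as pd


def subtraction_error(left, right):
    """Return the TypeError message raised by left - right, or None if it works."""
    try:
        left - right
    except TypeError as exc:
        return str(exc)
    return None


# Two plain Python strings cannot be subtracted
print(subtraction_error("foo", "bar"))

# A text column (Python strings, dtype object) fails the same way, element by element,
# because min() of strings is just the lexicographically smallest string
column = pd.Series(["3", "1", "2"], dtype=object)
print(subtraction_error(column, column.min()))

# The same values as real numbers subtract without error
print(subtraction_error(column.astype(float), 1.0))
```

Only the last call returns `None`; converting the column to a numeric dtype is what makes the subtraction work.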

Try converting every operand of the subtraction to float. For your code:

def normalize(df):
    result = df.copy()
    for feature_name in df.columns:
        # Convert the whole column; calling float() on a multi-row Series raises TypeError
        column = df[feature_name].astype(float)
        max_value = column.max()
        min_value = column.min()
        result[feature_name] = (column - min_value) / (max_value - min_value)
    return result
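If some columns are genuinely text (a class label, for example), `astype(float)` will still fail on them. A minimal sketch, assuming you only want to scale the numeric columns and leave the rest unchanged, picks them out with `select_dtypes`. The function name `normalize_numeric` and the sample frame are hypothetical:

```python
import pandas as pd


def normalize_numeric(df: pd.DataFrame) -> pd.DataFrame:
    """Min-max scale every numeric column to [0, 1]; leave other columns untouched."""
    result = df.copy()
    numeric = df.select_dtypes(include="number")
    col_min = numeric.min()
    col_max = numeric.max()
    # Column-wise broadcasting: each column uses its own min and max
    result[numeric.columns] = (numeric - col_min) / (col_max - col_min)
    return result


df = pd.DataFrame({"label": ["M", "B", "M"], "x": [1.0, 3.0, 5.0], "y": [10, 20, 30]})
print(normalize_numeric(df))
```

Here `label` passes through unchanged while `x` and `y` both become 0.0, 0.5 and 1.0. A constant column would divide by zero and produce NaN, so you may want to guard against that case.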


Reading from a file does not guarantee that pandas will infer a numeric type for every column, so you have to convert them explicitly, for example with df.apply(pd.to_numeric), or column by column inside the function:

def normalize(df):
    result = df.copy()
    for feature_name in df.columns:
        # Parse the column as numbers; values that cannot be parsed become NaN
        column = pd.to_numeric(df[feature_name], errors='coerce')
        max_value = column.max()
        min_value = column.min()
        result[feature_name] = (column - min_value) / (max_value - min_value)
    return result


df_normalized = normalize(df)
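To see what `pd.to_numeric` does to a text-typed frame before normalizing it, here is a small sketch with made-up sample values (the column names are hypothetical). With `errors="coerce"`, numeric-looking strings are parsed and anything unparseable becomes NaN, so a column that ends up entirely NaN is most likely a label rather than a feature:

```python
import pandas as pd

# Hypothetical frame where everything was read as text (dtype object)
raw = pd.DataFrame(
    {
        "feature": ["1001.0", "1326.0", "1203.0"],  # numbers stored as strings
        "label": ["M", "M", "B"],                   # genuinely non-numeric
    },
    dtype=object,
)

# DataFrame.apply forwards extra keyword arguments to pd.to_numeric
converted = raw.apply(pd.to_numeric, errors="coerce")
print(converted.dtypes)

# Columns where no value could be parsed are candidates to drop or move to the index
all_nan = converted.columns[converted.isna().all()]
print(list(all_nan))  # ['label']
```

Checking for all-NaN columns this way avoids silently normalizing a label column into NaN values.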


You can first check the dtypes of the resulting df:
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/breast-cancer-wisconsin/wdbc.data'
df = pd.read_csv(url, header=None)
print (df.dtypes)
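An offline sketch of the same check, assuming you cannot reach the URL: it reads the first three rows shown in the output below, truncated to four columns, from an in-memory string, then lists the columns pandas did not parse as numbers:

```python
import io

import pandas as pd

# First three rows of wdbc.data, truncated to columns 0-3 (id, diagnosis, two features)
sample = io.StringIO(
    "842302,M,17.99,10.38\n"
    "842517,M,20.57,17.77\n"
    "84300903,M,19.69,21.25\n"
)
df = pd.read_csv(sample, header=None)  # no header row, so columns are labeled 0, 1, 2, ...
print(df.dtypes)

# Columns that are not numeric are the ones that break the subtraction
text_columns = df.select_dtypes(exclude="number").columns
print(list(text_columns))  # [1]
```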

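All columns are numeric except the second one (column 1), which is object, obviously strings, so one possible solution is to move all string columns into the index: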
df = df.set_index(1)
print (df.head())
         0      2      3       4       5        6        7       8        9   \
1                                                                              
M    842302  17.99  10.38  122.80  1001.0  0.11840  0.27760  0.3001  0.14710   
M    842517  20.57  17.77  132.90  1326.0  0.08474  0.07864  0.0869  0.07017   
M  84300903  19.69  21.25  130.00  1203.0  0.10960  0.15990  0.1974  0.12790   
M  84348301  11.42  20.38   77.58   386.1  0.14250  0.28390  0.2414  0.10520   
M  84358402  20.29  14.34  135.10  1297.0  0.10030  0.13280  0.1980  0.10430   

       10   ...        22     23      24      25      26      27      28  \
1           ...                                                            
M  0.2419   ...     25.38  17.33  184.60  2019.0  0.1622  0.6656  0.7119   
M  0.1812   ...     24.99  23.41  158.80  1956.0  0.1238  0.1866  0.2416   
M  0.2069   ...     23.57  25.53  152.50  1709.0  0.1444  0.4245  0.4504   
M  0.2597   ...     14.91  26.50   98.87   567.7  0.2098  0.8663  0.6869   
M  0.1809   ...     22.54  16.67  152.20  1575.0  0.1374  0.2050  0.4000   

       29      30       31  
1                           
M  0.2654  0.4601  0.11890  
M  0.1860  0.2750  0.08902  
M  0.2430  0.3613  0.08758  
M  0.2575  0.6638  0.17300  
M  0.1625  0.2364  0.07678  

[5 rows x 31 columns]
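Once the only text column is in the index, every remaining column is numeric and min-max normalization can be done for the whole frame at once, without a loop. A minimal sketch, reusing the first three rows shown above truncated to four columns; note that column 0 is an ID, which you may prefer to drop before normalizing:

```python
import io

import pandas as pd

# Same layout as the output above: column 1 holds the M/B diagnosis
sample = io.StringIO(
    "842302,M,17.99,10.38\n"
    "842517,M,20.57,17.77\n"
    "84300903,M,19.69,21.25\n"
)
df = pd.read_csv(sample, header=None).set_index(1)

# df.min() and df.max() return one value per column; arithmetic broadcasts column-wise
df_normalized = (df - df.min()) / (df.max() - df.min())
print(df_normalized)
```

Every column of `df_normalized` now spans exactly 0 to 1, and the diagnosis labels remain available as the index.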

Then everything works. Finally, add: