Pandas: an efficient way to remove strings/characters from three columns and convert them to float

pandas for-loop

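I have a pandas DataFrame with three columns that contain mixed alphanumeric values. I want to:

1. Efficiently remove the characters/strings from the alphanumeric values in the columns Price, Miles, and Weight
2. Convert the resulting values to floats

See the example below.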

import pandas as pd

cars_info = {'Brand': ['Honda Civic','Toyota Corolla','Ford Focus','Audi A4'],
            'Price': ['22000$','25000$','27000$','35000$'],
            'Miles': ['1200 miles', '10045 miles', '22103 miles', '1110 miles'],
            'Weight': ['2500 lbs','2335 lbs','2110 lbs','2655 lbs']}

df = pd.DataFrame(cars_info, columns = ['Brand', 'Price','Miles','Weight'])

df.dtypes # returns `object` data type for columns Price, Miles and Weight

Expected result

Brand, Price($), Miles(in miles), Weight(lbs)
Honda Civic, 22000, 1200, 2500
Toyota Corolla, 25000, 10045, 2335
Ford Focus, 27000, 22103, 2110
Audi A4, 35000, 1110, 2655

My attempt

for col in df:
    df[col] = df[col].str.replace(r'\D', '').astype(float)

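There are many ways to approach this. You can .str.replace the labels you don't care about, or .str.split if you know the number is always the first thing before a space.

In this case, you can extract anything that looks like a number ([\d.]+) and then use pd.to_numeric to convert it to a numeric type:

for col in ['Price', 'Miles', 'Weight']:
    df[col] = pd.to_numeric(df[col].str.extract(r'([\d.]+)', expand=False))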
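
The two alternatives mentioned at the start of this answer, .str.replace and .str.split, might look roughly like the sketch below. It assumes the sample data from the question, that Price always ends in a literal `$`, and that Miles and Weight always start with the number followed by a space. The variable names (`price`, `miles`, `weight`, `result`) are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    'Price': ['22000$', '25000$', '27000$', '35000$'],
    'Miles': ['1200 miles', '10045 miles', '22103 miles', '1110 miles'],
    'Weight': ['2500 lbs', '2335 lbs', '2110 lbs', '2655 lbs'],
})

# Option 1: remove a known label as a literal string, then convert.
price = pd.to_numeric(df['Price'].str.replace('$', '', regex=False))

# Option 2: keep only the first whitespace-separated token, then convert.
miles = pd.to_numeric(df['Miles'].str.split().str[0])
weight = pd.to_numeric(df['Weight'].str.split().str[0])

# pd.to_numeric yields integers for this data; cast explicitly because
# the question asks for floats.
result = pd.DataFrame(
    {'Price': price, 'Miles': miles, 'Weight': weight}
).astype(float)
```

Both options depend on knowing the label or the position of the number in advance, which is why the extract-based loop is the more general choice when the surrounding text varies.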


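Step 1: Set the Brand column aside so that the integer conversion applies only to the relevant columns:
brands = df['Brand']
df = df.drop(columns=['Brand'])

Step 2: Keep only the digits and convert to int:
# regex=True is required in pandas 2.0+, where str.replace matches literally by default
df = df.apply(lambda x: x.str.replace(r'\D', '', regex=True)).astype(int)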


    Price   Miles   Weight
0   22000   1200    2500
1   25000   10045   2335
2   27000   22103   2110
3   35000   1110    2655

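Step 3: Merge (as in concat):
pd.concat([brands, df], axis=1)


    Brand           Price   Miles   Weight
0   Honda Civic     22000   1200    2500
1   Toyota Corolla  25000   10045   2335
2   Ford Focus      27000   22103   2110
3   Audi A4         35000   1110    2655


df.dtypes
Price     int32
Miles     int32
Weight    int32
dtype: object

Did you try this? Your approach returns repeated values (the first number) in each of the 'Price', 'Miles', and 'Weight' columns.

It returns the first number-like value in each cell, assuming you expect only one "number" in each cell.

For example, I expect the output in the 'Price' column to be 22000, 25000, 27000, and 35000, but what I get from your code is 22000, 22000, 22000, and 22000.

I mean, I clearly don't get that result, so either you copied something incorrectly or your sample data doesn't match your actual problem.

You are right, it works with this sample data but not with my original data. I guess this is just one of those weird Python things. Thanks for trying.

Thanks for trying. Is there a more efficient way? Suppose I have 5000 columns: I'm looking for a more robust way to iterate over the columns and strip the non-numeric part from each column that holds alphanumeric values. Thanks.

Can you define the ones that can be ignored? (I mean, 5K columns is a lot.) Or you could look for the words that should be removed, such as $, miles, etc., and then remove those. Would that work?

I like your approach, but I don't think it's feasible to set aside 3000 columns just because we want to strip strings from 3 columns. In short, do you have an alternative to the first step? I was thinking of something with a for loop and if-else statements.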
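
For the many-columns case raised in the comments, one possible alternative to dropping and re-attaching columns is to loop over the columns and convert only those where every cell looks like "number followed by a unit". The following is a minimal sketch under that assumption, not a definitive solution. The helper name `strip_units` and the pattern are hypothetical, and the pattern assumes each value starts with a single number and contains no further digits.

```python
import pandas as pd
from pandas.api.types import is_object_dtype, is_string_dtype

# Hypothetical pattern: a leading number, then only non-digit characters.
NUMBER_THEN_UNIT = r'^\s*(\d+(?:\.\d+)?)\s*\D*$'

def strip_units(df, pattern=NUMBER_THEN_UNIT):
    """Return a copy in which text columns made up entirely of
    'number + unit' values are converted to float."""
    out = df.copy()
    for col in out.columns:
        s = out[col]
        if not (is_object_dtype(s) or is_string_dtype(s)):
            continue  # not a text column (e.g. already numeric): leave it alone
        extracted = s.astype(str).str.extract(pattern, expand=False)
        if extracted.notna().all():
            out[col] = extracted.astype(float)
        # Otherwise at least one cell did not match (e.g. 'Honda Civic'),
        # so the column is kept as text.
    return out

cars = pd.DataFrame({
    'Brand': ['Honda Civic', 'Toyota Corolla', 'Ford Focus', 'Audi A4'],
    'Price': ['22000$', '25000$', '27000$', '35000$'],
    'Miles': ['1200 miles', '10045 miles', '22103 miles', '1110 miles'],
    'Weight': ['2500 lbs', '2335 lbs', '2110 lbs', '2655 lbs'],
})
converted = strip_units(cars)
```

A column with missing values, or with a single cell that does not match, is left untouched by this sketch, so it errs on the side of not converting. Real data with prefixes such as `$22000` or thousands separators would need a different pattern.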