Python pandas returns NaN
I have a pandas DataFrame, as in this sample:
mydf.head()
         Date Merchant/Description Debit/Credit
0  10/05/2018   FAKE TRANSACTION 1       -£7.50
1  09/05/2018   FAKE TRANSACTION 2       -£5.79
2  09/05/2018   FAKE TRANSACTION 3      -£28.50
3  08/05/2018   FAKE TRANSACTION 4       -£3.99
4  08/05/2018   FAKE TRANSACTION 5      -£17.99
The dtype of the column ['Debit/Credit'] is 'object'; it is a mix of strings and NaN.
I want to convert the strings to numbers. I tried to do this with pandas.to_numeric:
cols = ['Debit/Credit']
hsbcraw[cols] = hsbcraw[cols].apply(pd.to_numeric, errors='coerce')
This converts every item in the column ['Debit/Credit'] to NaN:
mydf.head()
         Date Merchant/Description  Debit/Credit
0  10/05/2018   FAKE TRANSACTION 1           NaN
1  09/05/2018   FAKE TRANSACTION 2           NaN
2  09/05/2018   FAKE TRANSACTION 3           NaN
3  08/05/2018   FAKE TRANSACTION 4           NaN
4  08/05/2018   FAKE TRANSACTION 5           NaN
What is wrong with my code or approach?

Before converting to numeric, you need to replace the £ sign with an empty string:
hsbcraw[cols]=hsbcraw[cols].replace('£','', regex=True).apply(pd.to_numeric, errors='coerce')
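To see why the original attempt produced only NaN, here is a minimal sketch on made-up data modelled on the question (the `None` entry stands in for the missing values the question mentions; the names `coerced` and `cleaned` are just for illustration). A string such as "-£7.50" is not a parseable number, so `errors='coerce'` silently turns it into NaN; once the "£" is removed, "-7.50" parses fine.

```python
import pandas as pd

# Made-up sample modelled on the question; None stands in for a missing value.
df = pd.DataFrame({"Debit/Credit": ["-£7.50", "-£5.79", None, "-£3.99"]})

# "-£7.50" cannot be parsed as a number, so errors="coerce" yields NaN everywhere.
coerced = pd.to_numeric(df["Debit/Credit"], errors="coerce")

# Removing the literal "£" first leaves "-7.50", which parses as a negative float.
cleaned = pd.to_numeric(
    df["Debit/Credit"].str.replace("£", "", regex=False), errors="coerce"
)

print(coerced.isna().all())
print(cleaned)
```

Only the genuinely missing entry stays NaN in `cleaned`; the other rows become negative floats.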
You can also use a regex.
Ex:
import pandas as pd
df = pd.DataFrame({"Debit/Credit": ["-£7.50", "-£5.79", "-£28.50", "-£3.99", "-£17.99"]})
df["Debit/Credit"] = df["Debit/Credit"].str.extract(r"(\d*\.\d+)", expand=True).apply(pd.to_numeric)
print(df)
Output:
   Debit/Credit
0          7.50
1          5.79
2         28.50
3          3.99
4         17.99
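Note that the pattern in this answer skips the leading minus sign, so debits come out as positive numbers. If the sign matters, one possible variant (a sketch, not the answerer's code; the two-group pattern and the `parts` name are my own assumptions) captures an optional minus separately and joins it back before converting:

```python
import pandas as pd

# Made-up sample: two debits and one credit.
df = pd.DataFrame({"Debit/Credit": ["-£7.50", "-£5.79", "£28.50"]})

# Group 0: optional minus sign; group 1: the digits. The "£" itself is not captured.
parts = df["Debit/Credit"].str.extract(r"(-?)£?(\d*\.\d+)")

# Re-join sign and digits ("-" + "7.50" -> "-7.50"), then convert to float.
df["Debit/Credit"] = pd.to_numeric(parts[0] + parts[1])
print(df)
```

With two capture groups, `str.extract` returns a DataFrame with columns `0` and `1`, which is why the sign and the digits can be concatenated column-wise.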
I usually convert it to a float like this:
df['Debit/Credit'] = df['Debit/Credit'].replace('£', '', regex = True).astype('float')
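One difference from the `pd.to_numeric(..., errors='coerce')` approach is worth keeping in mind: `astype('float')` raises on any value it cannot parse, rather than turning it into NaN. A small sketch on made-up data (the "n/a" entry is a hypothetical bad value, not something from the question):

```python
import pandas as pd

# Made-up data: one valid amount, one hypothetical unparseable entry, one missing value.
s = pd.Series(["-£7.50", "n/a", None])
stripped = s.replace("£", "", regex=True)

# astype("float") is strict: "n/a" cannot be converted, so it raises ValueError.
try:
    stripped.astype("float")
    raised = False
except ValueError:
    raised = True

# to_numeric with errors="coerce" is lenient: unparseable entries become NaN.
coerced = pd.to_numeric(stripped, errors="coerce")

print(raised)
print(coerced)
```

Which behaviour is preferable depends on whether unexpected values should stop the script or be flagged as missing.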
What do you expect? What type of number should it be?