Python: converting DataFrame columns to float when they contain NaN values


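I am using pandas and Python 3.4 to process data, and I have a problem with one particular CSV file. Normally pandas reads columns as float even when they contain nan values, but here it reads them as strings, and I don't know why. Here is what my CSV file looks like: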

Date        RR  TN  TX
08/10/2015  0   10.5    19.5
09/10/2015  0   5.5 20
10/10/2015  0   5   24
11/10/2015  0.5 7   24.5
12/10/2015  3   12  23
...
27/04/2017           
28/04/2017           
29/04/2017           
30/04/2017           
01/05/2017           
02/05/2017           
03/05/2017           
04/05/2017

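The problem is that I cannot convert these columns to float because of the nan values at the end. I need them as float because I am trying to compute TN + TX. This is what I have tried so far: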

When reading the file:

dfs[code] = pd.read_csv(path, sep = ';', index_col = 0, parse_dates = True, encoding = 'ISO-8859-1', dtype = float)

I also tried:

dtype = {
    'TN': np.float,
    'TX': np.float
}
dfs[code] = pd.read_csv(path, sep = ';', index_col = 0, parse_dates = True, encoding = 'ISO-8859-1', dtype = dtype)

Otherwise, when performing the addition, I also tried:

tn = dfs[code]['TN'].astype(float)
tx = dfs[code]['TX'].astype(float)
formatted_dfs[code] = tn + tx

But I always get the same error:

ValueError: could not convert string to float.

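I know I could go through the rows one by one and test whether each value is nan, but I am sure there is an easier way. Do you know how to do it, or do I have to go row by row? Thanks.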

If you let pandas detect the data types by itself, you avoid the ValueError and can see the underlying problem:

In [4]: df = pd.read_csv(path, sep=';', index_col=0, parse_dates=True, low_memory=False)
In [5]: df
Out[5]:
Empty DataFrame
Columns: []
Index: [08/10/2015  0   10.5    19.5, 09/10/2015  0   5.5 20, 10/10/2015  0   5   24, 11/10/2015  0.5 7   24.5, 12/10/2015  3   12  23, 27/04/2017           , 28/04/2017           , 29/04/2017           , 30/04/2017           , 01/05/2017           , 02/05/2017           , 03/05/2017           , 04/05/2017   ]

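It looks like you specified ';' as the separator by accident, since your file is whitespace-delimited. Because there are no semicolons, each entire row is read into the index.

First, try reading the file with a suitable delimiter: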

# sep=r'\s+' splits on runs of whitespace (delim_whitespace=True is deprecated in recent pandas)
df = pd.read_csv(path, sep=r'\s+', index_col=0, parse_dates=True, low_memory=False)
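To see the effect of the separator on a small scale, here is a minimal sketch using an in-memory sample. The sample text and the names SAMPLE, wrong and right are made up for illustration; only the column layout is taken from the question.

```python
import io

import pandas as pd

# Hypothetical sample mimicking the whitespace-separated layout from the question,
# including trailing rows that only contain a date.
SAMPLE = (
    "Date RR TN TX\n"
    "08/10/2015 0 10.5 19.5\n"
    "09/10/2015 0 5.5 20\n"
    "27/04/2017\n"
    "28/04/2017\n"
)

# Wrong separator: the text contains no ';', so every line is a single field
# and that single field becomes the index. No data columns are left.
wrong = pd.read_csv(io.StringIO(SAMPLE), sep=";", index_col=0)

# Matching separator: one or more whitespace characters.
# Rows with missing fields are padded with NaN.
right = pd.read_csv(io.StringIO(SAMPLE), sep=r"\s+", index_col=0)

print(wrong.shape)
print(right.dtypes)
```

With the matching separator the short rows are filled with NaN, so the numeric columns should come out as float without any extra conversion step.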

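Now, some rows have incomplete data. Conceptually, a simple solution is to try converting each value to float and to replace it with np.nan if that fails:

import numpy as np

def f(x):
    # Convert one value to float; anything that cannot be parsed becomes NaN.
    # (np.float was only an alias of the built-in float and is gone from recent NumPy.)
    try:
        return float(x)
    except (TypeError, ValueError):
        return np.nan

# Apply the conversion element by element to both columns.
df["TN"] = df["TN"].apply(f)
df["TX"] = df["TX"].apply(f)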

print(df.dtypes)

This gives the desired dtypes:

RR     object
TN    float64
TX    float64
dtype: object
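As an alternative to a hand-written conversion function, pandas provides pd.to_numeric, which with errors="coerce" turns every value it cannot parse into NaN. The following is only a minimal sketch, assuming the columns were read as strings; the sample frame is invented for illustration.

```python
import pandas as pd

# Hypothetical frame: numeric columns that were read as strings,
# with blank entries where the data is missing.
df = pd.DataFrame(
    {
        "TN": ["10.5", "5.5", " ", ""],
        "TX": ["19.5", "20", "", " "],
    }
)

# Vectorized conversion: unparseable entries (here the blanks) become NaN.
for col in ["TN", "TX"]:
    df[col] = pd.to_numeric(df[col], errors="coerce")

# NaN propagates through the addition, so incomplete rows stay NaN.
total = df["TN"] + df["TX"]
print(df.dtypes)
print(total)
```

This avoids calling a Python function once per element, which may matter for large files.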

Add the converters parameter to the read call: converters={'TN': float, 'TX': float}

dfs[code] = pd.read_csv(path, sep = ';',converters={'TN':float,'TX':float}, index_col = 0, parse_dates = True, encoding = 'ISO-8859-1', dtype = float)
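One thing to watch with converters: the function receives each field as a string, and the built-in float raises ValueError on an empty or blank string. A converter that falls back to NaN sidesteps this. The sketch below assumes a ';'-separated file with blank fields in the trailing rows; the helper name to_float_or_nan and the sample text are hypothetical.

```python
import io

import numpy as np
import pandas as pd


def to_float_or_nan(text):
    """Hypothetical helper: parse a CSV field as float, or return NaN if it is blank or invalid."""
    try:
        return float(text)
    except (TypeError, ValueError):
        return np.nan


# Hypothetical ';'-separated sample with blank fields in the last row.
SAMPLE = (
    "Date;RR;TN;TX\n"
    "08/10/2015;0;10.5;19.5\n"
    "27/04/2017; ; ; \n"
)

df = pd.read_csv(
    io.StringIO(SAMPLE),
    sep=";",
    index_col=0,
    converters={"TN": to_float_or_nan, "TX": to_float_or_nan},
)

print(df.dtypes)
print(df["TN"] + df["TX"])
```

No dtype argument is passed here, since the converters already decide how TN and TX are parsed.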

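Why are you using sep=';' if your file is whitespace-delimited?

@Taylor It is separated by ';'. I only wrote it with whitespace in the example to make it more readable.

Thanks! This works. I completely forgot about the apply() method.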