How to read a CSV with scientific notation using a capital E in Python?


python pandas csv


I have a whitespace-separated CSV file that looks like this:

5.64E-4   0.1259   3.556E-4   300
2.98E-4   4.7E-3   5.322E-4   270

I read it like this:

df1 = pandas.read_csv(filepath[0], header=None, delim_whitespace=True, lineterminator='\r')

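But I noticed that pandas stores the dataframe as strings, because it doesn't know what the E means.

Is there a way to import the CSV file and convert the values to numbers so that I can print them?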

Use the following to force the values to be parsed as floats when reading:

import pandas
import numpy as np

pandas.read_csv(filepath[0], header=None,
                delim_whitespace=True, lineterminator='\r',
                dtype=np.float64)

This works with a capital "E".

Example:

import numpy as np
import pandas as pd

pd.DataFrame({'a':['5.64E-4', '0.1259', '3.556E-4'],
              'b':['a', 'b', 'c']}, dtype=np.float64)

Output:

          a  b
0  0.000564  a
1  0.125900  b
2  0.000356  c
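A minimal sketch that reproduces the answer's read on the two sample rows from the question. It assumes the data can be fed through io.StringIO instead of filepath[0], drops lineterminator='\r' because the sample string uses '\n' line endings, and uses sep=r"\s+", the regex spelling of delim_whitespace=True (which newer pandas versions deprecate).

```python
import io

import numpy as np
import pandas as pd

# The two sample rows from the question, whitespace-separated,
# with capital-E scientific notation.
raw = """5.64E-4   0.1259   3.556E-4   300
2.98E-4   4.7E-3   5.322E-4   270
"""

# io.StringIO stands in for the real file; sep=r"\s+" splits on runs of
# whitespace, and dtype forces every column to float64.
df1 = pd.read_csv(io.StringIO(raw), header=None, sep=r"\s+", dtype=np.float64)

print(df1.dtypes)
print(df1)
```

All four columns come back as float64, so the capital E itself is parsed. If a cell holds something that is not a number, forcing dtype makes read_csv raise an error instead of returning strings.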

In my opinion, the problem is probably some non-numeric values.

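A possible solution is to use errors='coerce' to parse non-numeric values as NaN, applying it column by column with apply, because pd.to_numeric only works on one column (Series) at a time.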
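A minimal sketch of that idea. The token x1 is invented for illustration and stands in for whatever non-numeric value is in the real file; DataFrame.apply calls pd.to_numeric on each column and forwards the errors keyword to it.

```python
import io

import pandas as pd

# Hypothetical data: "x1" is a made-up non-numeric token that forces
# column 1 to be read as object (string) dtype.
raw = """5.64E-4 0.1259 3.556E-4 300
2.98E-4 x1 5.322E-4 270
"""
df = pd.read_csv(io.StringIO(raw), header=None, sep=r"\s+")

# pd.to_numeric accepts a single column (Series), so apply it to every
# column; errors="coerce" turns unparseable values into NaN.
df = df.apply(pd.to_numeric, errors="coerce")

print(df.dtypes)
print(df)
```

Columns that were already numeric pass through unchanged (column 3 stays int64), while the object column becomes float64 with NaN where x1 was.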

Since the other approaches didn't work for me without parsing everything as NaN, I'll post another way to read this variant of scientific notation:

import pandas as pd

# values in this notation may be read in as strings (object dtype)
data = pd.read_csv(file_path)
# normalize the notation across the whole dataframe: "E" -> "e", decimal comma -> decimal point
data = data.replace('E', 'e', regex=True).replace(',', '.', regex=True)
# convert each column to numbers; anything that still can't be parsed becomes NaN
data = data.apply(pd.to_numeric, errors='coerce')

This may not be a very Pythonic approach, but it works for me.
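If the real obstacle is the decimal comma rather than the capital E (Python's float() already accepts strings like 5.64E-4), read_csv's decimal parameter can handle it at parse time instead of with string replacement. A sketch, assuming a hypothetical semicolon-separated file, since the answer does not show its file layout:

```python
import io

import pandas as pd

# Hypothetical file contents: semicolon-separated, with decimal commas and
# capital-E exponents (the variant the replace() workaround targets).
raw = """5,64E-4;0,1259;3,556E-4;300
2,98E-4;4,7E-3;5,322E-4;270
"""

# decimal="," tells the parser which character is the decimal point,
# so the values are converted to numbers directly.
data = pd.read_csv(io.StringIO(raw), sep=";", header=None, decimal=",")

print(data.dtypes)
print(data)
```

This only works when the field separator differs from the decimal character; with a comma-separated file that also uses decimal commas, the replacement approach above may still be needed.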

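Does this help? – No, I've already looked at that. Pandas recognizes scientific notation with a lowercase e, but with a capital E I can't convert the values to float.

For me, your solution works fine with your sample data in pandas 0.23.0.

Maybe my problem is somewhere else, but I get the error "ValueError: could not convert string to float:". Maybe I misread the error and I'm not actually getting a string dataframe. I get the following error: pandas._libs.parsers.TextReader._convert_tokens TypeError: Cannot cast array from dtype('O') to dtype('float64') according to the rule 'safe'.

What version of pandas are you using? I suggest running pip install --upgrade pandas to upgrade to the latest version.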

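I have pandas 0.23.0. I'm now scanning my file for other possible errors. Across these 30,000 lines I see three cases: some entries are ordinary floats like 3.1542, some are the integer 0, and some are in scientific notation like 2.34E-4 (some as small as E-17). Could that be a problem?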
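Mixing those three forms should not be a problem in itself, since each of them parses as a float. A minimal sketch for locating the cells that do fail: read everything as strings first, convert with errors="coerce", then report cells that had text but came out as NaN. The sample rows and the bad token 3.1.4 are invented for illustration.

```python
import io

import pandas as pd

# Hypothetical sample mixing plain floats, integer zeros and capital-E
# scientific notation (down to E-17), plus one made-up bad token "3.1.4".
raw = """3.1542 0 2.34E-4
0 1.5E-17 3.1542
2.34E-4 3.1.4 0
"""

# Read everything as text so nothing is converted or rejected yet.
df = pd.read_csv(io.StringIO(raw), header=None, sep=r"\s+", dtype=str)

# Convert column by column; unparseable cells become NaN.
numeric = df.apply(pd.to_numeric, errors="coerce")

# A cell is a culprit if it had text but failed to convert.
bad = numeric.isna() & df.notna()
rows, cols = bad.to_numpy().nonzero()
for r, c in zip(rows, cols):
    print(f"row {df.index[r]}, column {df.columns[c]}: {df.iat[r, c]!r}")
```

Here only the invented 3.1.4 is reported; the zeros, plain floats and E-17 values all convert to float64.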
