Python: using a regular expression for na_values in pandas.read_csv
I want to use pandas.read_csv to read a file like the following:
1891, 91920, 7, 628,249, 59,51.0, 0.026, 0.028, NaN, NaN, NaN, NaN, NaN, 0.156, 0.071, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, 21,500, 21,43.8, 0.005, 0.619, NaN,45.6, 0.048, 0.053, NaN, NaN, NaN, NaN, NaN, -0.180, 0.088, 20, 0.012, 1.107, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN
1891, 91920, 16, 628,135, 22,41.2, 0.093, 0.087, NaN, NaN, NaN, NaN, NaN, 0.416, 0.212, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, 21,500, 20,23.3, 0.021, 2.023, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN
1891, 91920, 3, 628, 28, 39,47.0, 0.041, 0.044, NaN, NaN, NaN, NaN, NaN, -0.006, 0.064, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, 21,500, 21,37.5, 0.009, 0.964, NaN,45.3, 0.054, 0.055, NaN, NaN, NaN, NaN, NaN, -0.838, 0.228, 20, 0.013, 1.193, NaN,51.8, 0.025, 0.026, NaN, NaN, NaN, NaN, NaN, -0.021, 0.054, 21, 0.005, 0.540, NaN, NaN, NaN, NaN
1891, 91920, 6, 628,276, 20,40.0, 0.118, 0.101, NaN, NaN, NaN, NaN, NaN, -0.767, 0.558, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, 21,500, 20,26.7, 0.032, 2.982, NaN,41.0, 0.088, 0.089, NaN, NaN, NaN, NaN, NaN, -0.141, 0.233, 20, 0.024, 2.074, NaN,46.2, 0.053, 0.049, NaN, NaN, NaN, NaN, NaN, 0.080, 0.034, 21, 0.012, 1.187, NaN, NaN, NaN, NaN
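I am having trouble reading it because of the NaN values. If the file were a plain comma-separated CSV I would have no problem, but the values are also padded with spaces. When I read it with: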
df = pd.read_csv(file,index_col=None, header=None)
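the columns that contain NaN are read as strings because of the spaces. If the number of spaces were always the same, the problem would be easy. I could use: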
df = pd.read_csv(file,index_col=None, header=None, na_values = " NaN")
That would solve the problem, but the columns are padded with different numbers of spaces. Some have 4 spaces before NaN, others have 6, and so on.
So my question is: is there a way to use a regular expression when specifying na_values, something like na_values="\s+NaN"?
Try the following:
df = pd.read_csv(file, engine='python', index_col=None, sep=r',\s*', header=None)
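The parsing engine is set to python to avoid the warning that is raised when a regular expression is used as the separator.
Why not use a regex separator, such as sep=",\s+"? Or you could use the delim_whitespace=True or skipinitialspace=True parameter.
@BrenBam skipinitialspace=True works fine, thanks. But sep=",\s+" does not work.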
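The two suggestions reported to work in this thread can be compared in a small, self-contained sketch. The sample text below is made up for illustration (it only imitates the layout of the file in the question), and io.StringIO stands in for the real file. The last two lines show one likely reason sep=",\s+" fails on this data: some fields, such as 628,249, have no space after the comma, so a pattern that requires at least one whitespace character does not split them.

```python
import io
import re

import pandas as pd

# Made-up sample imitating the question's layout: a comma followed by zero or
# more spaces between fields, with NaN padded to varying widths.
SAMPLE = (
    "1891, 91920,  7, 628,249,   NaN, 0.026,     NaN\n"
    "1891, 91920, 16, 628,135,  41.2,   NaN,   0.212\n"
)

# Option 1: keep the default C engine and skip the spaces after each comma,
# so every padded field is reduced to a plain "NaN" before NA detection.
df_skip = pd.read_csv(io.StringIO(SAMPLE), header=None, skipinitialspace=True)

# Option 2: treat "comma plus optional whitespace" as the separator.
# Regex separators are handled by the python engine.
df_regex = pd.read_csv(
    io.StringIO(SAMPLE), header=None, sep=r",\s*", engine="python"
)

print(df_skip.dtypes)              # no object (string) columns expected
print(df_skip.isna().sum().sum())  # number of cells recognised as NaN

# Why ",\s+" is not enough here: "628,249" contains no space after the comma.
print(re.split(r",\s+", "628,249, 59,51.0"))  # ['628,249', '59,51.0']
print(re.split(r",\s*", "628,249, 59,51.0"))  # ['628', '249', '59', '51.0']
```

Of the two, skipinitialspace=True is probably the simpler choice, since it does not require switching to the python engine.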