Python CSV reader and DictReader convert numeric fields to strings


The first row of the csv contains the headers. Here is a sample row from my csv:

2013-07-31 00:00:00,,1.0,2013.0,7.0,Q3,21160742,32HHBS1307170203,KL0602130731,AIRFRANCE 
KLM,KLM,KLM,KLM,KL,KLM ROYAL DUTCH AIRLINES,,0602,,KL0602,KL,KLM ROYAL DUTCH
 AIRLINES,,,,KL,0602,,,LAX,AMS,,31-7-2013 0:00:00,2013-07-31,2013-07-31,2013-07-31,2013-07-31,
13:55:00,14:39:00,20:55:00,21:39:00,2013-08-01,2013-08-01,2013-08-01,2013-08-01,
09:05:00,09:45:00,07:05:00,07:45:00,2.0,,2,,,LAX,LOS ANGELES INTERNATIONAL AIRPORT,
LAX,LAX,5.0,LAX,LOS ANGELES,US,UNITED STATES OF AMERICA,US,USA,NA8,NORTHERN AMERICA,
AMERICAS,,,,AMS,SCHIPHOL I,F,OFFLINE,I,INDIRECT OFFLINE,14.0,3.0,FRONT,Business,2.0,nan,
PLANNED,3.0,,2.0,2.0,34.0,4.0,400254887nan,1.0,2.0,2.0,2.0,1.0,2.0,6.0,3.0,1.0,3.0,1.0,1.0,
nan,nan,nan,nan,nan,nan,nan,3.0,3.0,3.0,nan,nan,nan,nan,nan,nan,nan,nan,nan,nan,nan,nan,nan,nan,
nan,2.0,2.0,2.0,2.0,2.0,7.0,nan,2.0,3.0,3.0,3.0,3.0,nan,nan,nan,nan,nan,nan,nan,nan,nan,
nan,nan,nan,nan,6.0,1.0,nan,nan,nan,nan,nan,2.0,nan,nan,nan,nan,nan,nan,nan,nan,nan,2.0,2.0,
nan,2.0,nan,3.0,nan,,nan,nan,nan,nan,nan,nan,nan,nan,nan,nan,nan,nan,nan,nan,nan,13.7885862654653,
0.2, 34273499844164,nan,37.0,Booked,35.0,10.0,2.0,2.0,6.0,35.0,10.0,42.0,nan,nan,LAX,LAX,N
If I use either

input_file = csv.DictReader(open("file.csv"))
input_file = csv.reader(open("file.csv"))

all my objects become strings.

One row as printed by Python:

'2013-08-31 00:00:00', '', '1.0', '2013.0', '8.0', 'Q3','C', '03J', '', '',
 '', '', 'nan', 'nan', '', 'NON-AIRPORT', 'SELF-SERVICE', 'ICI', '', '19.0', '20130819', 
'1.0', '19.0', '9.0', '20130901', '2.0', '1.0', '1.0', '1.0', '10.0', '5.0', '5.0', '3.0',
 '4.0', '4.0', '2.0', '2.0', '', 'nan', '2.0', '', '24854524', 'nan', 'nan', 'nan', 'nan', 
'1.0', 'nan', '5.0', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 
'nan', '4.0', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 
'nan', 'nan', 'nan', '2.0', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 
'nan', '3.0', '5.0', '5.0'

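As you can see, all the dates, strings, floats and integers have been converted to strings. How can I import them correctly? Assume we have 400 columns of data, so I can't define the type of each column by hand.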

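They weren't converted to strings; they were strings to begin with. You can, however, try converting them to floats as you read them.

Assuming the variable row contains one row of data, you can do the following: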

newrow = []
for item in row:
    try:
        newrow.append(float(item))  # '1.0', '2013.0' and 'nan' become floats
    except ValueError:
        newrow.append(item)         # anything else (dates, 'Q3', '') stays a string
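Since the question also asks about DictReader, here is a minimal sketch of the same try/float idea applied to every field of a csv.DictReader row. The helper names maybe_float and convert_dict_row, and the in-memory sample with its column names, are hypothetical and only for illustration.

```python
import csv
import io


def maybe_float(value):
    """Return float(value) if the text parses as a number, else the text unchanged."""
    try:
        return float(value)
    except ValueError:
        return value


def convert_dict_row(row):
    """Apply maybe_float to every field of one csv.DictReader row."""
    return {key: maybe_float(value) for key, value in row.items()}


# Hypothetical sample with a header row; the column names are made up.
SAMPLE = "date,quarter,count,score\n2013-07-31 00:00:00,Q3,2.0,nan\n"

for raw in csv.DictReader(io.StringIO(SAMPLE)):
    print(convert_dict_row(raw))
# {'date': '2013-07-31 00:00:00', 'quarter': 'Q3', 'count': 2.0, 'score': nan}
```

One trade-off of converting blindly with float(): an identifier-like column such as the flight number 0602 in the question's data would become 602.0 and lose its leading zero.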

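You're looking at this backwards. They aren't converted into strings; they are strings, in the sense that CSV is not a format that preserves type information. You haven't done anything to convert them into anything else, and Python won't guess. Is "nan" a float, or the affectionate name of someone's grandfather? Is "3.0" a float, or the name of an avant-garde nerdcore blues band?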
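A small sketch of that point, using made-up in-memory samples (PLAIN, QUOTED, read_default and read_nonnumeric are hypothetical names): by default csv.reader hands back every field as a str. The standard library's csv.QUOTE_NONNUMERIC option does convert unquoted fields to float, but it relies on every non-numeric field in the file being quoted; on a file like the one in the question, an unquoted value such as Q3 would make it raise ValueError.

```python
import csv
import io

# Hypothetical one-line samples held in memory instead of a real file.
PLAIN = "1.0,nan,Q3\n"
QUOTED = '1.0,nan,"Q3"\n'


def read_default(text):
    """Default dialect: every field comes back as a str."""
    return next(csv.reader(io.StringIO(text)))


def read_nonnumeric(text):
    """QUOTE_NONNUMERIC: unquoted fields are converted to float."""
    return next(csv.reader(io.StringIO(text), quoting=csv.QUOTE_NONNUMERIC))


print(read_default(PLAIN))      # ['1.0', 'nan', 'Q3']
print(read_nonnumeric(QUOTED))  # [1.0, nan, 'Q3']

try:
    read_nonnumeric(PLAIN)      # Q3 is unquoted, so the float conversion fails
except ValueError as exc:
    print("ValueError:", exc)
```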

If you can come up with an algorithm for guessing the types, you can of course apply it:

import ast
import csv
import datetime

def guess_type(x):
    # Try each parser in turn and return the first result that succeeds.
    attempt_fns = [ast.literal_eval,  # Python literals such as 21160742 or 1.0
                   float,             # also accepts 'nan' and 'inf'
                   lambda s: datetime.datetime.strptime(s, "%Y-%m-%d %H:%M:%S"),
                   ]
    for fn in attempt_fns:
        try:
            return fn(x)
        except (ValueError, SyntaxError):
            pass
    return x  # nothing matched: keep the original string

# Text mode with newline="" is what the csv docs recommend in Python 3.
with open("untyped.csv", newline="") as fp:
    reader = csv.reader(fp)
    for row in reader:
        row = [guess_type(x) for x in row]
        print(row)
        print(list(map(type, row)))

With a file untyped.csv containing

2013-07-31 00:00:00,,1.0,2013.0,7.0,Q3,21160742,32HHBS1307170203,nan

the code above produces

[datetime.datetime(2013, 7, 31, 0, 0), '', 1.0, 2013.0, 7.0, 'Q3', 21160742, '32HHBS1307170203', nan]
[<class 'datetime.datetime'>, <class 'str'>, <class 'float'>, <class 'float'>, <class 'float'>, <class 'str'>, <class 'int'>, <class 'str'>, <class 'float'>]
That isn't too bad.
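The guess_type function decides the type of each cell independently, so with 400 columns a single column can end up with mixed types. One alternative, sketched here under several assumptions, is to pick one converter per column: the first candidate (tried in the assumed order int, float, then a datetime with that single "%Y-%m-%d %H:%M:%S" format) that accepts every non-empty cell in the column, falling back to str. Empty cells are treated as missing and become None; rows are assumed rectangular, and the whole file is held in memory. All names (parse_datetime, CANDIDATES, accepts, infer_column_types, convert_rows) and the sample data are hypothetical.

```python
import csv
import datetime
import io


def parse_datetime(s):
    """Assumed single timestamp format, as in guess_type."""
    return datetime.datetime.strptime(s, "%Y-%m-%d %H:%M:%S")


# Candidate converters, from most to least specific (an assumed order).
CANDIDATES = [int, float, parse_datetime]


def accepts(conv, cells):
    """True if conv parses every cell without raising ValueError."""
    try:
        for c in cells:
            conv(c)
    except ValueError:
        return False
    return True


def infer_column_types(rows):
    """Pick one converter per column; all-empty columns fall back to str."""
    types = []
    for cells in zip(*rows):
        filled = [c for c in cells if c != ""]
        conv = next((f for f in CANDIDATES if filled and accepts(f, filled)), str)
        types.append(conv)
    return types


def convert_rows(rows, types):
    """Apply the chosen converter to each cell; empty cells become None."""
    return [[None if c == "" else t(c) for c, t in zip(row, types)] for row in rows]


SAMPLE = """when,quarter,count,score
2013-07-31 00:00:00,Q3,21160742,1.0
2013-08-31 00:00:00,Q3,24854524,nan
"""

reader = csv.reader(io.StringIO(SAMPLE))
header = next(reader)
rows = list(reader)
types = infer_column_types(rows)
typed = convert_rows(rows, types)
print([t.__name__ for t in types])  # ['parse_datetime', 'str', 'int', 'float']
```

Note that int("0602") returns 602, so a zero-padded code column would still lose its padding; such columns would need to be listed as str explicitly.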

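PS: If you are going to work seriously with CSV files in Python, I strongly recommend having a look at …; otherwise you'll waste time reimplementing parts of its functionality.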