The dtype parameter in Python numpy genfromtxt
I am trying to create an Mx2 numpy matrix or array from the following file contents:
shell: head WORLD#America.csv
"2013-04-17 12","3","WORLD","#America"
"2013-04-17 13","9","WORLD","#America"
"2013-04-17 14","4","WORLD","#America"
"2013-04-17 15","3","WORLD","#America"
"2013-04-17 16","7","WORLD","#America"
"2013-04-17 17","8","WORLD","#America"
"2013-04-17 18","6","WORLD","#America"
"2013-04-17 19","6","WORLD","#America"
"2013-04-17 20","6","WORLD","#America"
"2013-04-17 21","2","WORLD","#America"
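I came across the genfromtxt() function, but I have not managed to extract the data with it. With the file named f, I tried ts = genfromtxt(f, delimiter=",") and got an array filled entirely with nan. That was only a first attempt, so I read the documentation on the dtype parameter, which specifies the data type of the array. It seems that to get an Mx2 matrix with entries of the form (datetime, int), I need dtype=[('f1', datetime64), ('f2', uint)]. When I do this, the following is assigned to the variable ts: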
(datetime.datetime(1969, 12, 31, 23, 59, 59, 999999), 18446744073709551615L),
(datetime.datetime(1969, 12, 31, 23, 59, 59, 999999), 18446744073709551615L),
(datetime.datetime(1969, 12, 31, 23, 59, 59, 999999), 18446744073709551615L),
(datetime.datetime(1969, 12, 31, 23, 59, 59, 999999), 18446744073709551615L),
(datetime.datetime(1969, 12, 31, 23, 59, 59, 999999), 18446744073709551615L),
(datetime.datetime(1969, 12, 31, 23, 59, 59, 999999), 18446744073709551615L),
(datetime.datetime(1969, 12, 31, 23, 59, 59, 999999), 18446744073709551615L),
(datetime.datetime(1969, 12, 31, 23, 59, 59, 999999), 18446744073709551615L),
(datetime.datetime(1969, 12, 31, 23, 59, 59, 999999), 18446744073709551615L),
(datetime.datetime(1969, 12, 31, 23, 59, 59, 999999), 18446744073709551615L),
(datetime.datetime(1969, 12, 31, 23, 59, 59, 999999), 18446744073709551615L),
(datetime.datetime(1969, 12, 31, 23, 59, 59, 999999), 18446744073709551615L),
(datetime.datetime(1969, 12, 31, 23, 59, 59, 999999), 18446744073709551615L),
(datetime.datetime(1969, 12, 31, 23, 59, 59, 999999), 18446744073709551615L),
(datetime.datetime(1969, 12, 31, 23, 59, 59, 999999), 18446744073709551615L),
(datetime.datetime(1969, 12, 31, 23, 59, 59, 999999), 18446744073709551615L),
(datetime.datetime(1969, 12, 31, 23, 59, 59, 999999), 18446744073709551615L),
(datetime.datetime(1969, 12, 31, 23, 59, 59, 999999), 18446744073709551615L),
(datetime.datetime(1969, 12, 31, 23, 59, 59, 999999), 18446744073709551615L)],
dtype=[('f1', ('<M8[us]', {})), ('f2', '<u8')])
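The repeated values in this output are what -1 looks like in each of the two field types, which suggests (this is an assumption, not something the output itself states) that every field failed to convert and was filled with a default of -1. A small check of that arithmetic:

```python
import numpy as np

# Assumption: both fields were filled with -1 before being stored.
# -1 stored in an unsigned 64-bit integer wraps around to 2**64 - 1.
as_uint = np.array([-1]).astype(np.uint64)[0]

# -1 microseconds relative to the Unix epoch is the last microsecond of 1969.
as_time = np.datetime64(-1, 'us')

print(int(as_uint))
print(as_time)
```

If that reading is right, the dtype itself was accepted and the problem lies in parsing the quoted fields.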
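As pointed out in the comments, one difficulty in reading this file with genfromtxt is the presence of quote characters. It is probably best to simply strip the quotes (programmatically), but the problem can also be worked around by specifying the quote character as the delimiter: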
np.genfromtxt(filename, delimiter='"', dtype=str, comments=None)[0]
# array(['', '2013-04-17 12', ',', '3', ',', 'WORLD', ',', '#America', ''],
# dtype='|S13')
The file is now interpreted as having 9 columns, of which the second and fourth contain the data of interest.
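To see where the 9 columns come from, the same split can be reproduced on one line of the file with plain str.split. This only illustrates the column layout; it is not how genfromtxt is implemented.

```python
# One line copied from the file shown in the question.
line = '"2013-04-17 12","3","WORLD","#America"'

# Splitting on the quote character leaves the commas as columns of their own.
parts = line.split('"')

print(len(parts))
print(parts[1], parts[3])
```

Indices 1 and 3 are the timestamp and the count, which is why they are the ones to select with usecols.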
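Another issue is specifying the data type of the datetime column. In newer (?) versions of NumPy, the time/date unit must be specified, otherwise genfromtxt raises an error. In this case M8[h] is evidently needed as the data type, to specify a unit of hours.
All in all, I was able to load the file with the following: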
ts = np.genfromtxt(filename,
                   delimiter='"',
                   dtype='M8[h], uint',
                   usecols=[1,3])
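The other route mentioned above, stripping the quotes programmatically, might look like the sketch below. It uses two rows from the question held in memory instead of the real file, reads the columns as text, and converts them in a separate step; the variable names are made up for the illustration.

```python
import io
import numpy as np

# Two sample rows copied from the file shown in the question.
sample = io.StringIO(
    '"2013-04-17 12","3","WORLD","#America"\n'
    '"2013-04-17 13","9","WORLD","#America"\n'
)

# Remove every quote character before genfromtxt sees the lines.
unquoted = [line.replace('"', '') for line in sample]

# Read the first two columns as text. comments=None stops '#America'
# from being treated as the start of a comment.
raw = np.genfromtxt(unquoted, delimiter=',', dtype=str,
                    comments=None, usecols=(0, 1))

# Convert each column explicitly: hour-resolution datetimes and unsigned ints.
hours = raw[:, 0].astype('M8[h]')
counts = raw[:, 1].astype(np.uint64)

print(hours)
print(counts)
```

Converting after loading keeps the parsing and the type handling apart, at the cost of holding the text columns in memory first.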
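Alternatively, you could also look at or try. See the following answer:
I suspect the quotes may be causing the problem, and you will need to write the converters by hand.
The first item in the first column is "2013-04-17 12". What does the 12 in that field mean? Is it the hour of the day, or a separate data field?
The hour of the day, correct!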
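A hand-written conversion along the lines suggested in the comments could be sketched with the standard csv module, which understands quoted fields. This is a swapped-in technique rather than anything proposed in the thread, and it again uses two in-memory rows in place of the real file.

```python
import csv
import io
import numpy as np

# Two sample rows copied from the file shown in the question.
sample = io.StringIO(
    '"2013-04-17 12","3","WORLD","#America"\n'
    '"2013-04-17 13","9","WORLD","#America"\n'
)

# csv.reader removes the quotes; keep the timestamp text and the integer count.
rows = [(fields[0], int(fields[1])) for fields in csv.reader(sample)]

# Build a structured array with an hour-resolution datetime and an unsigned int.
ts = np.array(rows, dtype=[('f1', 'M8[h]'), ('f2', 'u8')])

print(ts['f1'])
print(ts['f2'])
```

The field names f1 and f2 follow the dtype used in the question; the 12 in "2013-04-17 12" is parsed as the hour of the day, matching the clarification above.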