Getting column names from numpy genfromtxt in Python
Using numpy genfromtxt in Python, I would like to get the column headers as keys for the given data. I tried the following, but could not get the column names for the corresponding data:
import numpy as np

column = np.genfromtxt(pathToFile,dtype=str,delimiter=',',usecols=(0))
columnData = np.genfromtxt(pathToFile,dtype=str,delimiter=',')
data = dict(zip(column,columnData.tolist()))
Here is the data file:
header0,header1,header2
mydate,3.4,2.0
nextdate,4,6
afterthat,7,8
Currently, it displays the data as:
{
"mydate": [
"mydate",
"3.4",
"2.0"
],
"nextdate": [
"nextdate",
"4",
"6"
],
"afterthat": [
"afterthat",
"7",
"8"
]
}
I would like it in this format:
{
"mydate": {
"header1":"3.4",
"header2":"2.0"
},
"nextdate": {
"header1":"4",
"header2":"6"
},
"afterthat": {
"header1":"7",
"header2": "8"
}
}
Any suggestions?
Using the pandas module:
In [94]: fn = r'D:\temp\.data\z.csv'
Read the CSV into a DataFrame:
In [95]: df = pd.read_csv(fn)
In [96]: df
Out[96]:
header0 header1 header2
0 mydate 3.4 2.0
1 nextdate 4.0 6.0
2 afterthat 7.0 8.0
Get the desired dict:
In [97]: df.set_index('header0').to_dict('index')
Out[97]:
{'afterthat': {'header1': 7.0, 'header2': 8.0},
'mydate': {'header1': 3.3999999999999999, 'header2': 2.0},
'nextdate': {'header1': 4.0, 'header2': 6.0}}
Or as a JSON string:
In [107]: df.set_index('header0').to_json(orient='index')
Out[107]: '{"mydate":{"header1":3.4,"header2":2.0},"nextdate":{"header1":4.0,"header2":6.0},"afterthat":{"header1":7.0,"header2":8.0}}'
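If pandas is not an option, the standard-library csv module can build roughly the same dict. This technique is not taken from the answers. The sketch below assumes a few things: the file is supplied through io.StringIO, a stand-in for opening pathToFile with newline=''; the key column is header0; and all values stay strings. The function name csv_to_nested_dict is hypothetical.

```python
import csv
import io

# Hypothetical in-memory copy of the sample file; with a real file use
# open(pathToFile, newline='') instead of io.StringIO(SAMPLE).
SAMPLE = """header0,header1,header2
mydate,3.4,2.0
nextdate,4,6
afterthat,7,8
"""

def csv_to_nested_dict(f, key_column="header0"):
    """Map each row's key_column value to a dict of the remaining columns (values stay strings)."""
    reader = csv.DictReader(f)  # the first row supplies the field names
    result = {}
    for row in reader:
        row = dict(row)              # plain dict regardless of Python version
        key = row.pop(key_column)    # remove the key column from the inner dict
        result[key] = row
    return result

data = csv_to_nested_dict(io.StringIO(SAMPLE))
print(data)
# {'mydate': {'header1': '3.4', 'header2': '2.0'}, 'nextdate': {'header1': '4', 'header2': '6'}, 'afterthat': {'header1': '7', 'header2': '8'}}
```

Because DictReader takes the column names from the header row, nothing about header1/header2 is hard-coded except the key column.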
With your sample file and genfromtxt calls, I get two arrays:
In [89]: column
Out[89]:
array(['header0', 'mydate', 'nextdate', 'afterthat'],
dtype='<U9')
In [90]: columnData
Out[90]:
array([['header0', 'header1', 'header2'],
['mydate', '3.4', '2.0'],
['nextdate', '4', '6'],
['afterthat', '7', '8']],
dtype='<U9')
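Now build a dictionary of dictionaries (I don't need the separate column array). Here headers = columnData[0] is the header row, and the first (key) column is skipped in both headers and rows: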
In [95]: {row[0]: {h:v for h,v in zip(headers[1:], row[1:])} for row in columnData[1:]}
Out[95]:
{'afterthat': {'header1': '7', 'header2': '8'},
'mydate': {'header1': '3.4', 'header2': '2.0'},
'nextdate': {'header1': '4', 'header2': '6'}}
I like dictionary comprehensions.
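Here is a self-contained version of that comprehension. It assumes the sample file is supplied through io.StringIO as a stand-in for pathToFile, and that headers is the first row of columnData. It also passes encoding='utf-8' explicitly and uses .tolist(), so keys and values come out as plain Python strings rather than NumPy string scalars.

```python
import io
import numpy as np

# In-memory copy of the sample file (stand-in for pathToFile).
SAMPLE = """header0,header1,header2
mydate,3.4,2.0
nextdate,4,6
afterthat,7,8
"""

# Same call as in the question: every cell, header row included, read as a string.
columnData = np.genfromtxt(io.StringIO(SAMPLE), dtype=str, delimiter=',',
                           encoding='utf-8')

headers = columnData[0].tolist()   # assumed: the header row is row 0
data = {row[0]: {h: v for h, v in zip(headers[1:], row[1:])}
        for row in columnData[1:].tolist()}
print(data)
# {'mydate': {'header1': '3.4', 'header2': '2.0'}, 'nextdate': {'header1': '4', 'header2': '6'}, 'afterthat': {'header1': '7', 'header2': '8'}}
```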
Your dict-of-lists version:
In [100]: {row[0]:row[1:] for row in columnData[1:].tolist()}
Out[100]: {'afterthat': ['7', '8'], 'mydate': ['3.4', '2.0'], 'nextdate': ['4', '6']}
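Another numpy-only option, not used in the answers, is to let genfromtxt read the header itself. With names=True the first line becomes the field names of a structured array, available as arr.dtype.names. With dtype=None each column's type is guessed. The sketch assumes three things: the first field is the key column, every other column is numeric (hence float()), and the file is supplied through io.StringIO. genfromtxt may sanitize header names (for example, replacing spaces with underscores), so check arr.dtype.names on real files.

```python
import io
import numpy as np

SAMPLE = """header0,header1,header2
mydate,3.4,2.0
nextdate,4,6
afterthat,7,8
"""

# names=True: the first line supplies the field names.
# dtype=None: per-column type guessing (header0 -> str, the others -> float).
arr = np.genfromtxt(io.StringIO(SAMPLE), delimiter=',', names=True,
                    dtype=None, encoding='utf-8')

names = arr.dtype.names                 # ('header0', 'header1', 'header2')
key, value_names = names[0], names[1:]  # assumed: first column is the key
data = {str(row[key]): {n: float(row[n]) for n in value_names} for row in arr}
print(data)
# {'mydate': {'header1': 3.4, 'header2': 2.0}, 'nextdate': {'header1': 4.0, 'header2': 6.0}, 'afterthat': {'header1': 7.0, 'header2': 8.0}}
```

Unlike the dtype=str approach, this gives numbers rather than strings, which is closer to what the pandas answer returns.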
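Have you considered this?
Why is it 3.39999999999999???
@RAVI, that is just how python/pandas represents that float ;)
Why only 3.4 -> 3.399999999999 and not any other value, like 2.0 -> 1.9999999999999?? Also, the to_json output shows 3.4 correctly.
@RAVI, I don't know when python/pandas decides to represent floats differently, but you can see the effect with print(0.1+0.2). BTW: df.to_dict('index') displays the floats correctly…
This works very well with numpy. Thank you very much. I can't use pandas because in my environment new installations are approved only with a proper justification.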
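On the 3.3999999999999999 question: 3.4 has no exact binary floating-point representation, but 2.0 does. Printing with 17 significant digits therefore exposes the error for 3.4 only. Python's repr shows the shortest string that round-trips, so it prints 3.4. The Out[97] output is likely from an older NumPy/pandas that printed float64 values with 17 digits; that part is an assumption. A small illustration:

```python
# Shortest round-tripping repr vs. 17 significant digits.
shortest = repr(3.4)        # '3.4'
long_34 = '%.17g' % 3.4     # '3.3999999999999999': 3.4 is not exactly representable
long_20 = '%.17g' % 2.0     # '2': 2.0 is exactly representable, so nothing to expose
print(shortest, long_34, long_20)
print(0.1 + 0.2)            # 0.30000000000000004
```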