Python: How to read a CSV file into numpy ndarrays
I am looking for a way to read my CSV file into ndarrays.
myfile.csv
user,latitude,longitude
500,39.984608,116.317761
500,39.984563,116.317517
500,39.984539,116.317294
605,26.16167,119.943128
605,26.161566,119.942352
605,26.161558,119.942401
745,22.814336,108.332281
745,22.81429,108.3322566
745,22.81432,108.3322583
My code:
import numpy as np
my_data = np.genfromtxt('myfile.csv', delimiter=',', skip_header=1)
type(my_data)
numpy.ndarray
print(my_data)
[[500. 39.984608 116.317761 ]
[500. 39.984563 116.317517 ]
[500. 39.984539 116.317294 ]
[605. 26.16167 119.943128 ]
[605. 26.161566 119.942352 ]
[605. 26.161558 119.942401 ]
[745. 22.814336 108.332281 ]
[745. 22.81429 108.3322566]
[745. 22.81432 108.3322583]]
However, my expected output is an array of arrays, one per user, so the output would be:
[
[[500. 39.984608 116.317761 ]
[500. 39.984563 116.317517 ]
[500. 39.984539 116.317294 ]]
[[605. 26.16167 119.943128 ]
[605. 26.161566 119.942352 ]
[605. 26.161558 119.942401 ]]
[[745. 22.814336 108.332281 ]
[745. 22.81429 108.3322566]
[745. 22.81432 108.3322583]]
]
How can I rewrite my code to achieve this?
Try the following:
def getArraysofArray(my_data):
    """Split rows into consecutive groups that share the same user id (column 0)."""
    groups = []
    current = []
    for i, row in enumerate(my_data):
        # A change of user id closes the current group
        if i > 0 and row[0] != my_data[i - 1][0]:
            groups.append(current)
            current = []
        current.append(row)
    # Add the final group once the loop ends
    if current:
        groups.append(current)
    return groups
# Test the function on the sample rows
my_data = [[500, 39.984608, 116.317761],
           [500, 39.984563, 116.317517],
           [500, 39.984539, 116.317294],
           [605, 26.16167, 119.943128],
           [605, 26.161566, 119.942352],
           [605, 26.161558, 119.942401],
           [745, 22.814336, 108.332281],
           [745, 22.81429, 108.3322566],
           [745, 22.81432, 108.3322583]]
grouped = getArraysofArray(my_data)  # avoid shadowing the built-in list
print(grouped)
# Output
[[[500, 39.984608, 116.317761], [500, 39.984563, 116.317517], [500, 39.984539, 116.317294]],
[[605, 26.16167, 119.943128], [605, 26.161566, 119.942352], [605, 26.161558, 119.942401]],
[[745, 22.814336, 108.332281], [745, 22.81429, 108.3322566], [745, 22.81432, 108.3322583]]]
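The loop above only works when all rows for a user are contiguous. A vectorized alternative (a swapped-in technique, not part of the original answer) is sketched below under the assumption that grouping by the first column is all that is needed: stable-sort the rows by user id, use np.unique with return_index to find where each user's rows begin, and cut at those positions with np.split.

```python
import numpy as np

# The nine rows of myfile.csv (user, latitude, longitude)
my_data = np.array([
    [500, 39.984608, 116.317761],
    [500, 39.984563, 116.317517],
    [500, 39.984539, 116.317294],
    [605, 26.16167, 119.943128],
    [605, 26.161566, 119.942352],
    [605, 26.161558, 119.942401],
    [745, 22.814336, 108.332281],
    [745, 22.81429, 108.3322566],
    [745, 22.81432, 108.3322583],
])

# Stable sort by user id so each user's rows are contiguous (keeps original row order)
order = np.argsort(my_data[:, 0], kind="stable")
sorted_data = my_data[order]

# Index of the first row of each distinct user id in the sorted data
_, first_rows = np.unique(sorted_data[:, 0], return_index=True)

# Split at every group start except the first; each piece holds one user's rows
groups = np.split(sorted_data, first_rows[1:])

for g in groups:
    print(int(g[0, 0]), g.shape)
```

np.split returns a plain Python list of 2-D arrays, so users with different numbers of rows are handled without any special casing.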
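This solution gives you a numpy.ndarray partitioned by the first column of my_data. If order matters, you can sort the partition values in the comprehension, or sort the grouped values.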
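import numpy as np
my_data = np.genfromtxt('myfile.csv', delimiter=',', skip_header=1)
# Distinct user ids from the first column (a set, so iteration order is arbitrary)
partition_values = {row[0] for row in my_data}
# One sub-array per user id, selected with a boolean mask on column 0
grouped_data = np.array([my_data[my_data[:, 0] == pvalue, :]
                         for pvalue in partition_values])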
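When users have different numbers of rows, np.array(...) cannot build a regular array from the groups. NumPy 1.24 and later raise a ValueError for such ragged input unless dtype=object is passed. The sketch below sidesteps this by keeping one 2-D array per user in a dict. It assumes the CSV layout shown above, reads it from an in-memory string via io.StringIO instead of myfile.csv, and adds a hypothetical fourth row for user 500 (a copy of an existing row) to make the groups ragged. Because np.unique returns the ids sorted, the group order is deterministic.

```python
import io
import numpy as np

# Hypothetical CSV content: myfile.csv plus a fourth row for user 500
# (a copy of an existing row), so the groups have lengths 4, 3 and 3
csv_text = """user,latitude,longitude
500,39.984608,116.317761
500,39.984563,116.317517
500,39.984539,116.317294
500,39.984539,116.317294
605,26.16167,119.943128
605,26.161566,119.942352
605,26.161558,119.942401
745,22.814336,108.332281
745,22.81429,108.3322566
745,22.81432,108.3322583
"""
my_data = np.genfromtxt(io.StringIO(csv_text), delimiter=",", skip_header=1)

# One 2-D array per user id; np.unique yields the ids in ascending order
grouped = {int(uid): my_data[my_data[:, 0] == uid]
           for uid in np.unique(my_data[:, 0])}

for uid, rows in grouped.items():
    print(uid, rows.shape)
```

A dict keyed by user id also makes it easy to look up one user's rows later, which a positional array of arrays does not.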
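@艾瑞文: Yes, of course I know. But none of the answers about reading a CSV into numpy ndarrays nest the data the way my sample output does: each inner array should hold one user's records.
Does each user always have exactly 3 records? If so, reshape to (-1, 3, 3).
@Michael Ruth Sorry to come back to this answer. It seems I am losing the shape of the nested arrays, which I need for later processing:
grouped_data.shape gives (3,)
The inner arrays have two columns, for latitude and longitude.
@super_ask, when I run the code above, I get grouped_data.shape of (3, 3). Under what conditions are you running this code? Mine: macOS 10.15.3, Python 3.7.6, numpy 1.16.4.
@Michael Ruth This happened after I added an entry for id 500, because my inner arrays can have different lengths depending on the trip duration.
@super_ask, unfortunately that is a limitation of array shapes: no shape tuple can represent the array you describe. The only candidates are (3, 3, 3) and (3, 4, 3), but the first is only correct if user 500's extra row is dropped, and the second only if users 605 and 745 each have four rows as well. For multidimensional data with non-uniform shapes, you are better off iterating over the component arrays and getting their shapes, e.g. [group.shape for group in grouped_data].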
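The two suggestions from the comments can be sketched side by side, assuming the rows are sorted by user as in myfile.csv. With exactly three rows per user, reshape(-1, 3, 3) yields a regular (users, rows, columns) array. Once user 500 gets an extra row (hypothetical here: a copy of its last row), only a list of 2-D arrays fits, and the per-group shapes have to be inspected individually.

```python
import numpy as np

# The nine rows of myfile.csv: users 500, 605 and 745, three rows each, sorted by user
my_data = np.array([
    [500, 39.984608, 116.317761],
    [500, 39.984563, 116.317517],
    [500, 39.984539, 116.317294],
    [605, 26.16167, 119.943128],
    [605, 26.161566, 119.942352],
    [605, 26.161558, 119.942401],
    [745, 22.814336, 108.332281],
    [745, 22.81429, 108.3322566],
    [745, 22.81432, 108.3322583],
])

# Equal group sizes: reshape into (users, rows per user, columns).
# Only valid when rows are sorted by user and every user has exactly 3 rows.
cube = my_data.reshape(-1, 3, 3)
print(cube.shape)  # (3, 3, 3)

# Ragged case: give user 500 a fourth row (hypothetical copy of its last row),
# then keep one 2-D array per user in a plain list
with_extra = np.vstack([my_data[:3], my_data[2:3], my_data[3:]])
grouped_data = [with_extra[with_extra[:, 0] == uid]
                for uid in np.unique(with_extra[:, 0])]
print([group.shape for group in grouped_data])  # [(4, 3), (3, 3), (3, 3)]
```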