Python: how to convert selected data to the same length (shape)
Tags: python, arrays, pandas, dataframe
I read several .csv files into pandas DataFrames that all have the same shape. For some index values, some of the entries are zero. For each index I want to take the corresponding row from every DataFrame, set a position to zero in all of them whenever any one of them is zero there, and then delete the zeros so that the resulting arrays have the same shape:
import numpy as np
import pandas as pd

a = pd.read_csv("path_a", index_col=0)
b = pd.read_csv("path_b", index_col=0)
c = pd.read_csv("path_c", index_col=0)
print(a, "\n", b, "\n", c)

L = np.array(a.shape)
X = L[0]  # number of rows (index values)
d = a.index.values
a = np.array(a)
b = np.array(b)
c = np.array(c)

for i in range(0, X):
    xdata = a[i]
    xdata1 = b[i]
    xdata2 = c[i]
    # Propagate zeros: a zero in any row zeroes that position in all three
    xdata = np.where(xdata2 == 0, 0, xdata)
    xdata1 = np.where(xdata2 == 0, 0, xdata1)
    xdata1 = np.where(xdata == 0, 0, xdata1)
    xdata2 = np.where(xdata == 0, 0, xdata2)
    xdata = np.where(xdata1 == 0, 0, xdata)
    xdata2 = np.where(xdata1 == 0, 0, xdata2)
    # Locate the zeros and delete them
    indexX = np.argwhere(xdata == 0)
    index1X = np.argwhere(xdata1 == 0)
    index2X = np.argwhere(xdata2 == 0)
    xdata = np.delete(xdata, indexX)
    xdata1 = np.delete(xdata1, index1X)
    xdata2 = np.delete(xdata2, index2X)
    print(d[i], "\n", xdata, "\n", xdata1, "\n", xdata2)
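This code works, but it is a trial-and-error approach and becomes inefficient when the data is large. Can you suggest a more efficient way, and how to select data for each index based on the minimum length?
Answer: One idea is to multiply all 3 arrays together and test whether the product is non-zero; the product is zero wherever at least one of the arrays is zero. You can then loop over the 3 arrays stored in a list L1. The selection logic also changes: instead of np.argwhere and np.delete, use the boolean mask to index each row directly, which keeps only the positions where no array is zero: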
L = np.array(a.shape)
X = L[0]
d = a.index.values
a = np.array(a)
b = np.array(b)
c = np.array(c)
# True where all three arrays are non-zero
m = (a * b * c) != 0
L1 = [a, b, c]
for i in range(0, X):
    for arr in L1:
        xdata = arr[i][m[i]]
        print(xdata)
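To see the mask at work without the CSV files, here is a minimal, self-contained sketch. The three small frames, their values, and the helper `select_row` are made up for illustration. The mask is built by comparing each array with zero and combining the results rather than by multiplying; that is an assumption of this sketch, not part of the answer. For ordinary finite values it gives the same mask as the product test, while avoiding overflow or underflow in the multiplication.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for the three CSV files.
idx = ["r1", "r2"]
a = pd.DataFrame([[1.0, 0.0, 3.0], [4.0, 5.0, 6.0]], index=idx)
b = pd.DataFrame([[7.0, 8.0, 0.0], [1.5, 0.0, 2.5]], index=idx)
c = pd.DataFrame([[9.0, 1.0, 2.0], [3.0, 4.0, 5.0]], index=idx)

arrays = [df.to_numpy() for df in (a, b, c)]

# True only where all three arrays are non-zero at that position.
mask = np.logical_and.reduce([arr != 0 for arr in arrays])

# The product test from the answer, computed for comparison.
product_mask = (arrays[0] * arrays[1] * arrays[2]) != 0


def select_row(i):
    """Return the kept values of row i, one array per source frame."""
    return [arr[i][mask[i]] for arr in arrays]


for i, label in enumerate(a.index):
    print(label, select_row(i))
```

Because every array is indexed with the same `mask[i]`, the three results for a given row always have equal length.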
If you use pandas 0.24+, a better way to convert to NumPy arrays is to_numpy():
L = np.array(a.shape)
X = L[0]
d = a.index.to_numpy()
a = a.to_numpy()
b = b.to_numpy()
c = c.to_numpy()
m = (a * b * c) != 0
L1 = [a, b, c]
for i in range(0, X):
    for arr in L1:
        xdata = arr[i][m[i]]
        print(xdata)
Comments on the answer:
Thanks, but if I want to use the extracted data as input data, how can I tell them apart?
@water77 - One possible solution is to create a NumPy array for each index value; please check the edited answer.
@water77 - Thanks, glad to help. If it works for you, don't forget to accept the answer! :)
Edit: collect the three selections for each index value in a list and stack them into one 2-D array per index:
L = np.array(a.shape)
X = L[0]
d = a.index.to_numpy()
a = a.to_numpy()
b = b.to_numpy()
c = c.to_numpy()
m = (a * b * c) != 0
L1 = [a, b, c]
for i in range(0, X):
    out = []
    for arr in L1:
        xdata = arr[i][m[i]]
        out.append(xdata)
    # One row per source array; empty when every position was masked out
    data = np.vstack(out)
    print(data)
Output:
[]
[[ 3. 4. ]
[15.8 16.4]
[ 4.7 5.8]]
[[0.2 0.5 0.2 1.3 1.6 2.7]
[1.5 1.6 1.6 1.6 1.6 1.7]
[0.2 0.5 0.2 1.3 1.6 2.7]]
[]
[]
[[ 3.1 6.7 5.3 15.1 17.2 18.2 18.7]
[ 1.8 1.8 1.7 1.8 1.8 1.9 1.9]
[ 3.1 6.7 5.3 15.1 17.2 18.2 18.7]]
[[0.4 0.5 0.5 0.4 1.2 1.3]
[1.8 1.8 1.7 1.8 1.9 1.5]
[0.4 0.5 0.5 0.4 1.2 1.3]]
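One way to keep track of which stacked array belongs to which index value is to store the results in a dictionary keyed by the index label. The following is only a sketch under assumptions: the sample frames and the names `stacked`, `result` and `non_empty` are hypothetical and do not come from the original answer.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; in the question these come from pd.read_csv.
idx = ["r1", "r2", "r3"]
a = pd.DataFrame([[1.0, 0.0, 3.0], [4.0, 5.0, 6.0], [0.0, 0.0, 0.0]], index=idx)
b = pd.DataFrame([[7.0, 8.0, 0.0], [1.5, 0.0, 2.5], [1.0, 2.0, 3.0]], index=idx)
c = pd.DataFrame([[9.0, 1.0, 2.0], [3.0, 4.0, 5.0], [1.0, 1.0, 1.0]], index=idx)

# Shape (3, rows, cols): one layer per source frame.
stacked = np.stack([a.to_numpy(), b.to_numpy(), c.to_numpy()])

# Shape (rows, cols): True where all three layers are non-zero.
mask = (stacked != 0).all(axis=0)

# Map each index label to a (3, n_kept) array, one row per source frame.
result = {
    label: stacked[:, i, :][:, mask[i]]
    for i, label in enumerate(a.index)
}

# Optionally drop the labels for which nothing survives the mask.
non_empty = {label: arr for label, arr in result.items() if arr.size > 0}

for label, arr in non_empty.items():
    print(label, arr.shape)
```

Looking up `result["r2"]` then returns the three equal-length rows for that index, and labels whose rows were entirely masked out (the empty `[]` entries in the output above) can be skipped through `non_empty`.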