Python: "Wrong number of items passed" error when adding NumPy array contents to a dataframe column
I have the following sioma_df dataframe. These are its shape and column index; it has 13807 rows and 37 columns:
sioma_df.shape
(13807, 37)
sioma_df.columns
Index(['Luz (lux)', 'Precipitación (ml)', 'Temperatura (°C)',
'Velocidad del Viento (km/h)', 'E', 'N', 'NE', 'NO', 'O', 'S', 'SE',
'SO', 'PORVL2N1', 'PORVL2N2', 'PORVL4N1', 'PORVL5N1', 'PORVL6N1',
'PORVL7N1', 'PORVL8N1', 'PORVL9N1', 'PORVL10N1', 'PORVL13N1',
'PORVL14N1', 'PORVL15N1', 'PORVL16N1', 'PORVL16N2', 'PORVL18N1',
'PORVL18N2', 'PORVL18N3', 'PORVL18N4', 'PORVL21N1', 'PORVL21N2',
'PORVL21N3', 'PORVL21N4', 'PORVL21N5', 'PORVL24N1', 'PORVL24N2'],
dtype='object')
I want to apply the k-means algorithm, and I have decided to use k=9 centroids in the random initialization stage:
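import numpy as np

# Turn the dataframe into a NumPy array
# (.values is used because DataFrame.get_values() was removed in pandas 1.0)
sioma_numpy = sioma_df.values
k = 9
# Create a dictionary with the centroid coordinates:
# each centroid is an [x, y] pair of random integers
centroids = {
    i + 1: [np.random.randint(0, np.max(sioma_numpy)), np.random.randint(0, np.max(sioma_numpy))]
    for i in range(k)
}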
I plot the data before applying the clustering:
# I get each column individually into an array
c1 = sioma_df['Luz (lux)'].values
c2 = sioma_df['Precipitación (ml)'].values
c3 = sioma_df['Temperatura (°C)'].values
c4 = sioma_df['Velocidad del Viento (km/h)'].values
c5 = sioma_df['PORVL2N1'].values
c6 = sioma_df['PORVL2N2'].values
c7 = sioma_df['PORVL4N1'].values
c8 = sioma_df['PORVL5N1'].values
c9 = sioma_df['PORVL6N1'].values
c10 = sioma_df['PORVL7N1'].values
c11 = sioma_df['PORVL8N1'].values
c12 = sioma_df['PORVL9N1'].values
c13 = sioma_df['PORVL10N1'].values
c14 = sioma_df['PORVL13N1'].values
c15 = sioma_df['PORVL14N1'].values
c16 = sioma_df['PORVL15N1'].values
c17 = sioma_df['PORVL16N1'].values
c18 = sioma_df['PORVL16N2'].values
c19 = sioma_df['PORVL18N1'].values
c20 = sioma_df['PORVL18N2'].values
c21 = sioma_df['PORVL18N3'].values
c22 = sioma_df['PORVL18N4'].values
c23 = sioma_df['PORVL18N4'].values
c24 = sioma_df['PORVL21N1'].values
c25 = sioma_df['PORVL21N2'].values
c26 = sioma_df['PORVL21N3'].values
c27 = sioma_df['PORVL21N4'].values
c28 = sioma_df['PORVL21N5'].values
c29 = sioma_df['PORVL24N1'].values
c30 = sioma_df['E'].values
c31 = sioma_df['N'].values
c32 = sioma_df['NE'].values
c33 = sioma_df['NO'].values
c34 = sioma_df['O'].values
c35 = sioma_df['S'].values
c36 = sioma_df['SE'].values
c37 = sioma_df['SO'].values
"""Build the X and Y arrays from the c1 to c36 variables above.
zip pairs the ci arrays up row by row: X holds c1 to c18
and Y holds c19 to c36.
"""
X = np.array(list(zip(c1,c2,c3,c4,c5,c6,c7,c8,c9,c10,c11,c12,c13,c14,c15,c16,c17,c18)))
print( " ARRAY X" +'\n', X, '\n' )
Y = np.array(list(zip(c19,c20,c21,c22,c23,c24,c25,c26,c27,c28,c29,c30,c31,c32,c33,c34,c35,c36,)))
print( " ARRAY Y" +'\n', Y, '\n' )
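Then I generated the x, y coordinate pairs for the centroids.
I want to start with the assignment stage, in which each data point is assigned to its nearest centroid. I have the following: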
def assignment(df, centroids):
    # Iterate over the keys of the k=9 centroids
    for i in centroids.keys():
        # sqrt((x1 - x2)^2 + (y1 - y2)^2)
        # I want to create a new column in the sioma_df dataframe named
        # distance_from_i
        sioma_df['distance_from_{}'.format(i)] = (
            # We calculate the distance between each data point and
            # each one of the 9 centroids.
            # The distance_from_i column will hold the distance from
            # each data point to centroid i (there are 9 in total)
            np.sqrt(
                (X - centroids[i][0]) ** 2
                + (Y - centroids[i][1]) ** 2
            )
        )
    # We collect the distance columns of all centroids so that the
    # distances of each data point can be compared to find the
    # closest centroid
    centroid_distance_cols = ['distance_from_{}'.format(i) for i in centroids.keys()]
    # We create the closest column in the sioma_df dataframe by
    # selecting the column with the minimum value along axis=1:
    sioma_df['closest'] = sioma_df.loc[:, centroid_distance_cols].idxmin(axis=1)
    sioma_df['closest'] = sioma_df['closest'].map(lambda x: int(x.lstrip('distance_from_')))
    sioma_df['color'] = sioma_df['closest'].map(lambda x: colmap[x])
    return df

# We execute the assignment function, which computes the closest centroid for each data point
df = assignment(sioma_df, centroids)
print(df.head)
However, when I run the code, I get the following error:
KeyError: 'distance_from_1'
During handling of the above exception, another exception occurred:
ValueError Traceback (most recent call last)
<ipython-input-160-b96e0351c13d> in <module>()
24
25 #
---> 26 df = assignment(sioma_df, centroids)
27 print(df.head)
<ipython-input-160-b96e0351c13d> in assignment(df, centroids)
11 np.sqrt(
12 (X - centroids[i][0]) ** 2
---> 13 + (Y - centroids[i][1]) ** 2
14 )
15 )
ValueError: Wrong number of items passed 18, placement implies 1
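I really don't understand how to fix this so that the assignment is correct, which makes it hard for me to troubleshoot.
Any support that points me in the right direction would be highly appreciated.

My problem was that the np.sqrt(…) expression does not return a one-dimensional array.
Each row of the column expects a single value, but because of the shape of the X and Y numpy arrays, it receives an array 18 elements long.
Operations on numpy arrays are element-wise, so they do not change the shape of the arrays being operated on: the result has 18 values per row, just like X and Y.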
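To illustrate the shape issue described above, here is a minimal sketch with small made-up arrays. X_demo, Y_demo, cx and cy are hypothetical stand-ins, not the original data. The second computation (summing along axis=1 before taking the square root) is only one possible way to obtain a single value per row, and is not necessarily what the original code intended.

```python
import numpy as np

# Hypothetical stand-ins for X and Y: 5 rows with 18 values each
rng = np.random.default_rng(0)
X_demo = rng.random((5, 18))
Y_demo = rng.random((5, 18))
cx, cy = 3, 7  # hypothetical centroid coordinates

# Element-wise arithmetic keeps the 2-D shape of the operands
elementwise = np.sqrt((X_demo - cx) ** 2 + (Y_demo - cy) ** 2)
print(elementwise.shape)  # (5, 18)

# Reducing along axis=1 collapses each row to a single value,
# which is the 1-D shape a dataframe column expects
per_row = np.sqrt(((X_demo - cx) ** 2 + (Y_demo - cy) ** 2).sum(axis=1))
print(per_row.shape)  # (5,)
```

A one-dimensional array with one entry per row can be assigned to a single dataframe column; an array with 18 entries per row cannot.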
Then, when I create the new distance_from_i column, I do the following:
sioma_df['distance_from_{}'.format(i)] = (
    np.sqrt(
        (X - centroids[i][0]) ** 2
        + (Y - centroids[i][1]) ** 2
    )
)
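Here I am assigning to the distance_from_i column something that is not the one-dimensional array it must receive. Instead, each row of the column receives an array 18 elements long, and this is the cause of the error:
ValueError: Wrong number of items passed 18, placement implies 1
I then initialized the new distance_from_i column with NaN values and afterwards assigned the result of the np.sqrt(…) expression to it, and it worked. My assignment function now runs correctly and ended up like this: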
def assignment(df, centroids):
    # Iterate over the keys of the k=9 centroids
    for i in centroids.keys():
        # sqrt((x1 - x2)^2 + (y1 - y2)^2)
        # We calculate the distance between each data point and
        # each one of the 9 centroids.
        # The distance_from_i column will hold the distance from
        # each data point to centroid i (there are 9 in total)
        n = np.sqrt(
            (X - centroids[i][0]) ** 2
            + (Y - centroids[i][1]) ** 2
        )
        # Create a new column in the sioma_df dataframe named
        # distance_from_i, initialized with NaN before the assignment
        sioma_df['distance_from_{}'.format(i)] = np.nan
        sioma_df['distance_from_{}'.format(i)] = n
    # We collect the distance columns of all centroids so that the
    # distances of each data point can be compared to find the
    # closest centroid
    centroid_distance_cols = ['distance_from_{}'.format(i) for i in centroids.keys()]
    # We create the closest column in the sioma_df dataframe by
    # selecting the column with the minimum value along axis=1
    sioma_df['closest'] = sioma_df.loc[:, centroid_distance_cols].idxmin(axis=1)
    sioma_df['closest'] = sioma_df['closest'].map(lambda x: int(x.lstrip('distance_from_')))
    sioma_df['color'] = sioma_df['closest'].map(lambda x: colmap[x])
    return df

# We execute the assignment function, which computes the closest centroid for each data point
df = assignment(sioma_df, centroids)
print(df.head())
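For comparison, here is a hedged sketch of the assignment step written so that every distance column is one-dimensional by construction. It is not the code from this post: it assumes each centroid has one coordinate per feature column (a (k, d) array instead of the [x, y] pairs used above), and the names assign_closest, feature_cols, centroid_array and the demo data are all hypothetical.

```python
import numpy as np
import pandas as pd

def assign_closest(df, feature_cols, centroid_array):
    """Return a copy of df with distance_from_<i> and closest columns.

    centroid_array is assumed to have shape (k, d), where d is the
    number of feature columns.
    """
    points = df[feature_cols].to_numpy(dtype=float)           # (n, d)
    # Broadcasting gives the difference of every point to every centroid
    diffs = points[:, None, :] - centroid_array[None, :, :]   # (n, k, d)
    # Summing over the feature axis yields one Euclidean distance
    # per (point, centroid) pair
    distances = np.sqrt((diffs ** 2).sum(axis=2))             # (n, k)
    out = df.copy()
    for j in range(centroid_array.shape[0]):
        # Each slice distances[:, j] is 1-D, so it fits a single column
        out['distance_from_{}'.format(j + 1)] = distances[:, j]
    # Centroids are numbered from 1, as in the dictionary keys above
    out['closest'] = distances.argmin(axis=1) + 1
    return out

# Tiny made-up example: 3 points, 2 features, 2 centroids
demo = pd.DataFrame({'a': [0.0, 0.0, 10.0], 'b': [0.0, 1.0, 10.0]})
demo_centroids = np.array([[0.0, 0.0], [10.0, 10.0]])
result = assign_closest(demo, ['a', 'b'], demo_centroids)
print(result['closest'].tolist())  # [1, 1, 2]
```

Taking the index with argmin avoids parsing the centroid number back out of the column name. This matters because str.lstrip removes a set of characters rather than a prefix, so the lstrip('distance_from_') call only behaves as intended while the remaining part starts with a character outside that set, such as a digit.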
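You really should use something like a dictionary to store all of those arrays. It's much cleaner.
@user3483203 That's true, but I use these ci arrays to feed the zip function and generate the X and Y coordinates, which I use later in the scatter function. With the X and Y elements in a dictionary, how would I use them in the scatter function? Or are you suggesting that I compute the distances with a dictionary?
Where exactly does the error come from?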
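As a possible way to act on the suggestion in the first comment, the sketch below selects groups of columns by name instead of keeping one variable per column. The dataframe values and the x_cols / y_cols groupings are made up for illustration; only the column names come from the question.

```python
import numpy as np
import pandas as pd

# Made-up values; only the column names are taken from the question
demo = pd.DataFrame({
    'Luz (lux)': [1.0, 2.0, 3.0],
    'Precipitación (ml)': [0.0, 0.5, 0.1],
    'Temperatura (°C)': [20.0, 21.0, 19.5],
    'PORVL2N1': [5.0, 6.0, 7.0],
})

# Hypothetical grouping of columns into the X and Y arrays
x_cols = ['Luz (lux)', 'Precipitación (ml)']
y_cols = ['Temperatura (°C)', 'PORVL2N1']

# Selecting a list of columns returns them as one 2-D array
X_demo = demo[x_cols].to_numpy()
Y_demo = demo[y_cols].to_numpy()

# The same X built the way the question does it, with zip
c1 = demo['Luz (lux)'].values
c2 = demo['Precipitación (ml)'].values
X_zip = np.array(list(zip(c1, c2)))

print(np.array_equal(X_demo, X_zip))  # True
```

Individual columns remain available by name for plotting, for example demo['Luz (lux)'] and demo['Temperatura (°C)'] as the two arguments of a scatter call.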