Python TypeError: object of type 'map' has no len() in Python 3

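I'm trying to implement the KMeans algorithm with PySpark, and it gives the above error at the last line of the while loop. It works fine outside the loop, but after I put it in a loop it gives me this error. How can I fix this?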

#  Find K Means of Loudacre device status locations
#
# Input data: file(s) with device status data (delimited by '|')
# including latitude (13th field) and longitude (14th field) of device locations
# (lat,lon of 0,0 indicates unknown location)
# NOTE: Copy to pyspark using %paste

# for a point p and an array of points, return the index in the array of the point closest to p
def closestPoint(p, points):
    bestIndex = 0
    closest = float("+inf")
    # for each point in the array, calculate the distance to the test point, then return
    # the index of the array point with the smallest distance
    for i in range(len(points)):
        dist = distanceSquared(p,points[i])
        if dist < closest:
            closest = dist
            bestIndex = i
    return bestIndex

# The squared distances between two points
def distanceSquared(p1,p2):
    return (p1[0] - p2[0]) ** 2 + (p1[1] - p2[1]) ** 2

# The sum of two points
def addPoints(p1,p2):
    return [p1[0] + p2[0], p1[1] + p2[1]]

# The files with device status data
filename = "/loudacre/devicestatus_etl/*"

# K is the number of means (center points of clusters) to find
K = 5

# ConvergeDist -- the threshold "distance" between iterations at which we decide we are done
convergeDist=.1

# Parse device status records into [latitude,longitude]
rdd2=rdd1.map(lambda line:(float((line.split(",")[3])),float((line.split(",")[4]))))
# Filter out records where lat/long is unavailable -- ie: 0/0 points
# TODO
filterd=rdd2.filter(lambda x:x!=(0,0))
# start with K randomly selected points from the dataset
# TODO
sample=filterd.takeSample(False,K,42)
# loop until the total distance between one iteration's points and the next is less than the convergence distance specified
tempDist =float("+inf")
while tempDist > convergeDist:
    # for each point, find the index of the closest kpoint.  map to (index, (point,1))
    # TODO
    indexed = filterd.map(lambda p: (closestPoint(p, sample), (p, 1)))

    # For each key (k-point index), reduce by adding the coordinates and number of points

    reduced=indexed.reduceByKey(lambda x,y: ((x[0][0]+y[0][0],x[0][1]+y[0][1]),x[1]+y[1]))
    # For each key (k-point index), find a new point by calculating the average of each closest point
    # TODO
    newCenters=reduced.mapValues(lambda x1: [x1[0][0]/x1[1], x1[0][1]/x1[1]]).sortByKey()
    # calculate the total of the distance between the current points and new points
    newSample=newCenters.collect() #new centers as a list
    samples=zip(newSample,sample) #sample=> old centers
    samples1=sc.parallelize(samples)
    totalDistance=samples1.map(lambda x:distanceSquared(x[0][1],x[1]))
    # Copy the new points to the kPoints array for the next iteration
    tempDist=totalDistance.sum()
    sample=map(lambda x:x[1],samples) #new sample for next iteration as list
sample

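You are getting this error because you are trying to take the `len` of a `map` object (of generator type), which does not support `len`. For example: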

>>> x = [[1, 'a'], [2, 'b'], [3, 'c']]

# `map` returns an object of type `map`
>>> map(lambda a: a[0], x)
<map object at 0x101b75ba8>

# calling `len` on it raises an error
>>> len(map(lambda a: a[0], x))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: object of type 'map' has no len()
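This is also why the question's code works once and then fails inside the loop. Before the loop, `sample` comes from `takeSample`, which returns a list, so the first pass succeeds. The last line of the loop rebinds `sample` to a `map` object, and on the next pass `closestPoint` calls `len(points)` on it. A minimal local reproduction without Spark, reusing the question's `closestPoint` and `distanceSquared`; the coordinates and the `new_centers` list are hypothetical values chosen only for illustration:

```python
# The question's helper functions, unchanged in behavior
def distanceSquared(p1, p2):
    return (p1[0] - p2[0]) ** 2 + (p1[1] - p2[1]) ** 2

def closestPoint(p, points):
    bestIndex = 0
    closest = float("+inf")
    for i in range(len(points)):  # len() fails when points is a map object
        dist = distanceSquared(p, points[i])
        if dist < closest:
            closest = dist
            bestIndex = i
    return bestIndex

# Hypothetical centers; takeSample() returns a plain list like this one
sample = [(0.0, 0.0), (10.0, 10.0)]
first_pass = closestPoint((9.0, 9.5), sample)
print(first_pass)  # 1, the index of (10.0, 10.0)

# Hypothetical data shaped like newCenters.collect(): (index, point) pairs
new_centers = [(0, (1.0, 1.0)), (1, (9.0, 9.0))]
# What the loop's last line does in Python 3: sample becomes a lazy map object
sample = map(lambda x: x[1], zip(new_centers, sample))

error_message = None
try:
    closestPoint((9.0, 9.5), sample)
except TypeError as exc:
    error_message = str(exc)
print(error_message)  # object of type 'map' has no len()
```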
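To fix this, you can convert it to a list first:

>>> len(list(map(lambda a: a[0], x)))
3

Or, better, create a list directly with a list comprehension (without using `map`):

>>> my_list = [a[0] for a in x]

# since it is a `list`, you can take its length
>>> len(my_list)
3

The error message seems pretty clear to me: in Python 3, `map` returns a generator, not a list as it did in Python 2.

Please post the stack trace. You posted 100 lines of code but didn't say which line has the problem.

Related (possible duplicate?):

Or with a list comprehension: sample=[x[1] for x in samples]

Not every iterator is a generator; `map` in particular is not.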
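Reading the question's loop (not running it), two more details look worth checking once the `map` is replaced. First, `zip` is also a one-shot iterator in Python 3, so after `sc.parallelize(samples)` has read it, a later `map(lambda x: x[1], samples)` or `[x[1] for x in samples]` over the same object finds nothing left; building it as `list(zip(newSample, sample))` avoids that. Second, in `zip(newSample, sample)` each element is `((index, new_center), old_center)`, so `x[1]` is the old center, while the comment asks for the new one, `x[0][1]`. The sketch below replays the end of the loop body in plain Python under those assumptions: the values are hypothetical, and a local `sum(...)` stands in for `sc.parallelize(samples).map(...).sum()`.

```python
# Hypothetical values shaped like the question's variables
sample = [(0.0, 0.0), (10.0, 10.0)]             # old centers (a list)
newSample = [(0, [1.0, 1.0]), (1, [9.0, 9.0])]  # like newCenters.collect(): (index, [lat, lon])
convergeDist = .1

def distanceSquared(p1, p2):
    return (p1[0] - p2[0]) ** 2 + (p1[1] - p2[1]) ** 2

# Why list() matters: a zip object can be read only once
pairs = zip(newSample, sample)
consumed = list(pairs)                       # first reader, as sc.parallelize(pairs) would be
leftover = list(map(lambda x: x[1], pairs))  # second reader gets nothing
print(len(consumed), leftover)               # 2 []

# Materialize the pairs so they can be reused
samples = list(zip(newSample, sample))

# Local stand-in for the distributed distance total
tempDist = sum(distanceSquared(x[0][1], x[1]) for x in samples)

# Next iteration's centers: the NEW points, kept as a list so len() works
sample = [x[0][1] for x in samples]

print(tempDist)                 # 4.0
print(sample)                   # [[1.0, 1.0], [9.0, 9.0]]
print(tempDist > convergeDist)  # True, so the loop would run again
```

In the Spark version, the same two changes would be `samples=list(zip(newSample,sample))` and `sample=[x[0][1] for x in samples]`, leaving `sc.parallelize(samples)` as it is.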