Simplest way to calculate an average in Python


I have data in a 3D dictionary, like this:

 movieid, date,customer_id,views
 0, (2011,12,22), 0, 22
 0, (2011,12,22), 1, 2
 0, (2011,12,22), 2, 12
 .....
 0, (2011,12,22), 7, 2
 0, (2011,12,23), 0, 123
 ..
So basically, the data records how many times each customer (there are only 8 customers) viewed a movie on each day.
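For reference, here is a minimal sketch of how the flat rows above could be loaded into a nested structure of the form data[movie_id][date][customer_id] = views. It assumes the rows arrive as tuples and that dates are stored as datetime.date objects. The loading step itself is not shown in the question.

```python
import datetime
from collections import defaultdict

# Sample rows in (movie_id, (year, month, day), customer_id, views) form, taken from the table above
rows = [
    (0, (2011, 12, 22), 0, 22),
    (0, (2011, 12, 22), 1, 2),
    (0, (2011, 12, 22), 2, 12),
    (0, (2011, 12, 22), 7, 2),
    (0, (2011, 12, 23), 0, 123),
]

# data[movie_id][date][customer_id] = views
data = defaultdict(lambda: defaultdict(dict))
for movie_id, ymd, customer_id, views in rows:
    data[movie_id][datetime.date(*ymd)][customer_id] = views

print(data[0][datetime.date(2011, 12, 22)])  # {0: 22, 1: 2, 2: 12, 7: 2}
```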

Now I want to calculate, for each customer, the average number of times they viewed each movie per day.

So basically:

    movie_id,customer_id, avg_views
     0, 0, 33.2
     0, 1 , 22.3

  and so on
What is the Pythonic way to solve this?

Thanks

Edit:

 from collections import defaultdict
 import datetime
 data = defaultdict(lambda : defaultdict(dict))
 date = datetime.datetime(2011,1,22)
 data[0][date][0] = 22
 print data
defaultdict(<function <lambda> at 0x00000000022F7CF8>, 
 {0: defaultdict(<type 'dict'>, 
 {datetime.datetime(2011, 1, 22, 0, 0): {0: 22}}))
Note: customer 1 did not watch the movie with id 0 on January 23, so the expected answer is:

 movie_id,customer_id,avg_views
  0   , 0 ,    (22+44)/2
  0,    1,      (23)/1
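Put differently, the note means the average for movie m and customer c is taken only over the set D_{m,c} of dates on which customer c actually watched movie m. Here v_{m,d,c} is the view count on date d. Both symbols are notation introduced for illustration:

```latex
\bar{v}_{m,c} = \frac{1}{|D_{m,c}|} \sum_{d \in D_{m,c}} v_{m,d,c},
\qquad
\bar{v}_{0,0} = \frac{22 + 44}{2} = 33,
\qquad
\bar{v}_{0,1} = \frac{23}{1} = 23
```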

sum makes this easy. In my original version I used dict.keys() a lot, but iterating over a dictionary gives you its keys by default.
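A small illustration of both points, using a hypothetical per-date dictionary for one movie and one customer. Iterating over the dict yields its keys, and sum accepts any iterable of numbers:

```python
import datetime

# Hypothetical views of one movie by one customer, keyed by date
views_by_date = {
    datetime.date(2011, 1, 22): 22,
    datetime.date(2011, 1, 23): 44,
}

# Iterating over a dict yields its keys, so .keys() is not needed
dates = [d for d in views_by_date]
total = sum(views_by_date[d] for d in views_by_date)
average = total / len(views_by_date)  # true division in Python 3

print(dates == list(views_by_date.keys()), total, average)  # True 66 33.0
```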

This function computes a single row of the result:

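def average_daily_views(movie_id, customer_id, data):
    # Only count the dates on which this customer actually watched the movie
    daily_values = [data[movie_id][date][customer_id] for date in data[movie_id]
                    if customer_id in data[movie_id][date]]
    return float(sum(daily_values)) / len(daily_values)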
Then you can loop over it to get the results in whatever form you want. Perhaps:

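def get_averages(data):
    # One (movie_id, customer_id, avg_views) tuple per customer seen for each movie
    return [(movie, customer, average_daily_views(movie, customer, data))
            for movie in data
            for customer in set(c for date in data[movie] for c in data[movie][date])]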
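A self-contained Python 3 sketch of the same helper-function approach follows. It runs on sample data that matches the note in the question: customer 0 viewed 22 and 44 times, and customer 1 viewed 23 times and skipped January 23. The concrete dates are assumptions.

```python
import datetime
from collections import defaultdict

def average_daily_views(movie_id, customer_id, data):
    """Mean views over the dates on which customer_id watched movie_id."""
    daily_values = [per_customer[customer_id]
                    for per_customer in data[movie_id].values()
                    if customer_id in per_customer]
    return sum(daily_values) / len(daily_values)

def get_averages(data):
    """Return (movie_id, customer_id, avg_views) tuples, sorted by ids."""
    result = []
    for movie_id in sorted(data):
        # Every customer who watched this movie on at least one date
        customers = {c for per_customer in data[movie_id].values() for c in per_customer}
        for customer_id in sorted(customers):
            result.append((movie_id, customer_id,
                           average_daily_views(movie_id, customer_id, data)))
    return result

data = defaultdict(lambda: defaultdict(dict))
data[0][datetime.date(2011, 1, 22)] = {0: 22, 1: 23}
data[0][datetime.date(2011, 1, 23)] = {0: 44}

for row in get_averages(data):
    print(row)
# (0, 0, 33.0)
# (0, 1, 23.0)
```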
Here is my take:

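from functools import reduce  # reduce is a builtin on Python 2 but lives in functools on Python 3

pool = [
    (0, (2011,12,22), 0, 22),
    (0, (2011,12,22), 1, 2),
    (0, (2011,12,22), 2, 12),
    (0, (2011,12,22), 7, 2),
    (0, (2011,12,23), 0, 123),
]


def calc(memo, row):
    # Keep a running (count, total views) pair per customer_id (row[2])
    if row[2] in memo:
        num, value = memo[row[2]]
    else:
        num, value = 0, 0

    memo[row[2]] = (num + 1, value + row[3])
    return memo

# dict of customer_id -> (number of rows, sum of views)
v = reduce(calc, pool, {})
# calc average; float() avoids integer division on Python 2
avg = map(lambda x: (x[0], x[1][1] / float(x[1][0])), v.items())

print(dict(avg))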

Here dict(avg) is a dictionary mapping each customer_id to that customer's average number of views.
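The reduce version above groups by customer only. If the per-movie breakdown from the question is needed, one option is to key the running (count, total) pairs by (movie_id, customer_id). This is an adaptation rather than the answer's original code, shown in Python 3, where reduce lives in functools:

```python
from functools import reduce

pool = [
    (0, (2011, 12, 22), 0, 22),
    (0, (2011, 12, 22), 1, 2),
    (0, (2011, 12, 22), 2, 12),
    (0, (2011, 12, 22), 7, 2),
    (0, (2011, 12, 23), 0, 123),
]

def calc(memo, row):
    """Accumulate (count, total) per (movie_id, customer_id)."""
    movie_id, _date, customer_id, views = row
    num, value = memo.get((movie_id, customer_id), (0, 0))
    memo[(movie_id, customer_id)] = (num + 1, value + views)
    return memo

totals = reduce(calc, pool, {})
avg = {key: value / num for key, (num, value) in totals.items()}
print(avg)  # {(0, 0): 72.5, (0, 1): 2.0, (0, 2): 12.0, (0, 7): 2.0}
```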

I think you should restructure your data slightly so that it better serves your purpose:

import collections

# Re-index the data as restructured_data[customer][movie][date] = count
restructured_data = collections.defaultdict(
    lambda: collections.defaultdict(lambda: collections.defaultdict(int)))
for movie in data:
    for date in data[movie]:
        for customer, count in data[movie][date].iteritems():
            restructured_data[customer][movie][date] += count

averages = collections.defaultdict(dict)
for customer in restructured_data:
    for movie in restructured_data[customer]:
        # Average over the dates on which this customer actually watched this movie
        daily_counts = restructured_data[customer][movie]
        avg = sum(daily_counts.itervalues()) / float(len(daily_counts))
        averages[movie][customer] = avg

for movie in averages:
    for customer, avg in averages[movie].iteritems():
        print "%d, %d, %f" % (movie, customer, avg)
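In Python 3, where iteritems and itervalues no longer exist, the same restructuring could be sketched as follows. The sample data is hypothetical and mirrors the note in the question.

```python
import datetime
from collections import defaultdict

# Original layout: data[movie_id][date][customer_id] = views (hypothetical sample)
data = {
    0: {
        datetime.date(2011, 1, 22): {0: 22, 1: 23},
        datetime.date(2011, 1, 23): {0: 44},
    },
}

# Re-index as restructured[customer_id][movie_id][date] = views
restructured = defaultdict(lambda: defaultdict(lambda: defaultdict(int)))
for movie, by_date in data.items():
    for date, by_customer in by_date.items():
        for customer, count in by_customer.items():
            restructured[customer][movie][date] += count

# averages[movie_id][customer_id] = mean views over the dates actually watched
averages = defaultdict(dict)
for customer, by_movie in restructured.items():
    for movie, by_date in by_movie.items():
        averages[movie][customer] = sum(by_date.values()) / len(by_date)

for movie in sorted(averages):
    for customer, avg in sorted(averages[movie].items()):
        print("%d, %d, %f" % (movie, customer, avg))
# 0, 0, 33.000000
# 0, 1, 23.000000
```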

Hope this helps.

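Please post (at least one entry of) the 3D dictionary that holds this data, and, if you can, show us what you want the result to look like. Could you format your defaultdict so it is easy to read? Use pprint.pprint if needed. That is a fairly complex defaultdict. Have you considered using NumPy? Actually, I think you should make it data[customer_id][movie_id][date] = count.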
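On the pprint suggestion: a nested defaultdict prints with its factory functions in the repr, so one approach is to convert it to plain dicts before printing. The to_plain helper below is hypothetical and not part of the standard library.

```python
import datetime
import pprint
from collections import defaultdict

def to_plain(obj):
    """Recursively convert (default)dicts into plain dicts for readable printing."""
    if isinstance(obj, dict):  # defaultdict is a subclass of dict
        return {key: to_plain(value) for key, value in obj.items()}
    return obj

data = defaultdict(lambda: defaultdict(dict))
data[0][datetime.date(2011, 1, 22)][0] = 22

pprint.pprint(to_plain(data))
# {0: {datetime.date(2011, 1, 22): {0: 22}}}
```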