Python: calculate the percentage of return for each category

python pandas

I have the following pandas DataFrame:

          |   Number of visits per year  |
user id   |  2013  | 2014 | 2015 | 2016  |
   A           4       3     6      0     
   B           3       0     7      3
   C          10       6     3      0

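I would like to calculate the percentage of returning users by number of visits. Sorry, I don't have any code yet; I don't know how to get started.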

This is the final result I am looking for:

         |       Number of visits in the year     |
 Year    | 1  | 2 | 3  | 4  | 5 | 6 | 7  | 8  | 9 | 10 |  
 2014      7%   3%  4%   15%  6%  7%  18%  17% 3%   2%   
 2015      3% ....
 2016

So, based on the above, I could say that 15% of the customers who visited the store 4 times in 2013 came back to the store in 2014.

Thanks a lot, everyone.

Update: this is what I have done so far; maybe there is a better way to do it with a loop.

For each year I have a CSV like this:

user_id |    NR_V
   A           4      
   B           3       
   C          10

NR_V stands for the number of visits.

So I loaded each CSV as its own DataFrame; I have df_2009, df_2010, ... up to 2016.

For each file, I add a 0/1 column indicating whether the user shopped again the following year:

 df_2009['shopped2010'] = np.where(df_2009['user_ID'].isin(df_2010['user_ID']), 1, 0)

Then I pivot each DataFrame:

 pivot_2009 = pd.pivot_table(df_2009,index=["NR_V"],aggfunc={"NR_V":len, "shopped2010":np.sum})

Next, for each DataFrame, I create a new DataFrame with a column holding the percentage by number of visits:

p_2009 = pd.DataFrame()
p_2009['%returned2010'] = (pivot_2009['shopped2010']/pivot_2009['NR_V'])*100

Finally, I combine all of these DataFrames into one:

dfs = [p_2009, p_2010, p_2011, p_2012, p_2013, p_2014, p_2015 ]
final = pd.concat(dfs, axis=1)
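
The per-year steps above could probably be folded into a single loop. The following is only a sketch under stated assumptions: the yearly CSVs are assumed to be loaded into a dict keyed by year (the name frames_by_year and the sample rows are made up for illustration), and each frame is assumed to have the user_ID and NR_V columns used above.

```python
import pandas as pd

# Hypothetical per-year data standing in for the yearly CSV files.
frames_by_year = {
    2013: pd.DataFrame({"user_ID": ["A", "B", "C"], "NR_V": [4, 3, 10]}),
    2014: pd.DataFrame({"user_ID": ["A", "C"], "NR_V": [3, 6]}),
    2015: pd.DataFrame({"user_ID": ["A", "B", "C"], "NR_V": [6, 7, 3]}),
    2016: pd.DataFrame({"user_ID": ["B"], "NR_V": [3]}),
}


def return_rates(frames_by_year):
    """Percentage of users, per visit count, who shopped again the next year."""
    rows = {}
    years = sorted(frames_by_year)
    for year, next_year in zip(years, years[1:]):
        current = frames_by_year[year]
        # True where the user also appears in the following year's file.
        returned = current["user_ID"].isin(frames_by_year[next_year]["user_ID"])
        # The mean of a boolean Series grouped by visit count is the share that returned.
        rows[next_year] = returned.groupby(current["NR_V"]).mean() * 100
    # One row per "returned in" year, one column per visit count.
    return pd.DataFrame(rows).T.rename_axis(index="Year", columns="NR_V")


rates = return_rates(frames_by_year)
print(rates)
```

A NaN in the result means that no user had that visit count in the preceding year, so no percentage is defined for that cell.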

Please find my solution below. Note that I am fairly sure it can be improved.


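import pandas as pd

# Step 0: create the DataFrame
df = pd.DataFrame({'2013': [4, 3, 10], '2014': [3, 0, 6], '2015': [6, 7, 3], '2016': [0, 3, 0]}, index=['A', 'B', 'C'])
# Container list for the frequency tables to concatenate
frames = []
# Iterate over the DataFrame one column at a time and compute its value_counts (frequency table)
for name, series in df.items():  # the original used iteritems(), which was removed in pandas 2.0
    frames.append(series.value_counts().rename(name))
# Merge the frequency tables of all columns into one DataFrame (one row per year)
temp_df = pd.concat(frames, axis=1).transpose().fillna(0)
# Find the range of the new DataFrame's columns (visit counts) and append the missing ones
cols = temp_df.columns
lo = cols.min()
hi = cols.max()
for i in range(lo, hi):
    if i not in cols:
        temp_df[i] = 0
# Compute the percentages
final_df = temp_df.div(temp_df.sum(axis=1), axis=0)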
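
The same frequency table can likely be built more compactly. This is a sketch of one possible alternative, not the answer's own method: value_counts(normalize=True) returns relative frequencies directly, and reindex fills in the visit counts that never occur. The names all_counts and freq are introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {"2013": [4, 3, 10], "2014": [3, 0, 6], "2015": [6, 7, 3], "2016": [0, 3, 0]},
    index=["A", "B", "C"],
)

# Every visit count from the smallest to the largest observed value, inclusive.
all_counts = range(int(df.to_numpy().min()), int(df.to_numpy().max()) + 1)

# Relative frequency of each visit count per year; counts that never occur become 0.
freq = pd.DataFrame(
    {
        year: visits.value_counts(normalize=True).reindex(all_counts, fill_value=0.0)
        for year, visits in df.items()
    }
).T

print(freq)
```

Each row of freq sums to 1, so multiplying by 100 gives percentages.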

Consider the sample visits DataFrame df:

df = pd.DataFrame(
    np.random.randint(1, 10, (100, 5)),
    pd.Index(['user_{}'.format(i) for i in range(1, 101)], name='user id'),
    [
        ['Number of visits per year'] * 5,
        [2012, 2013, 2014, 2015, 2016]
    ]
)

df.head()

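You can use value_counts with the parameter normalize=True. Also, since an entry of 8 represents 8 separate visits, it should be counted 8 times. I will use repeat to accomplish this before calling value_counts.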

def count_visits(col):
    v = col.values
    return pd.value_counts(v.repeat(v), normalize=True)

df.apply(count_visits).stack().unstack(0)
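
To make the effect of repeat concrete, here is a small illustration on a made-up array (the names visits, expanded and shares are not part of the answer). It uses the Series.value_counts method rather than the top-level pd.value_counts function, which newer pandas versions deprecate.

```python
import numpy as np
import pandas as pd

visits = np.array([1, 3, 2])

# Each value v is repeated v times, so every individual visit gets its own entry.
expanded = visits.repeat(visits)  # array([1, 3, 3, 3, 2, 2])

# Share of all individual visits that belong to users with each visit count.
shares = pd.Series(expanded).value_counts(normalize=True).sort_index()
print(shares)
```

Without repeat, each of the three visit counts would get a share of 1/3; with it, the count 3 accounts for half of all visits.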

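I use each visitor's index value and check whether the value at the same index (i.e. the same visitor ID) in the following year is greater than 0. The result is then added to a dictionary as True or False, which you can use for a bar chart. I also build two lists (times_returned and returned_at_all) for additional data manipulation.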

import pandas as pd

# Part 1, Building the dataframe.

df = pd.DataFrame({
                   'Visitor_ID':[1,2,3],
                   '2010'      :[4,3,10],
                   '2011'      :[3,0,6],
                   '2012'      :[6,7,3],
                   '2013'      :[0,3,0]    
                   })

df.set_index("Visitor_ID", inplace=True)

# Part 2, preparing the required variables.

def dictionary (max_visitors):
    dictionary={}
    for x in range(max_visitors):
        dictionary["number_{}".format(x)] = []
#    print(dictionary)
    return dictionary

# Part 3, Figuring out if the customer returned.             

def compare_yearly_visits(current_year, next_year):    
    index = 1 
    years = df.columns
    for x in df[current_year]: 
#        print (df[years][current_year][index], 'this year.')
#        print (df[years][next_year][index], 'Next year.')
        how_many_visits = df[years][current_year][index] 
        did_he_return   = df[years][next_year][index]

        if did_he_return > 0: 
            # If the visitor returned, add to a bunch of formats:
            returned_at_all.append([how_many_visits, True])
            times_returned.append([how_many_visits, did_he_return])
            dictionary["number_{}".format(x)].append(True)
        else: 
            ## If the visitor did not return, add to a bunch of formats:
            returned_at_all.append([how_many_visits, False])
            dictionary["number_{}".format(x)].append(False)

        index = index +1 

# Part 4, The actual program:
highest_amount_of_visits = 11 # should be done automatically, max(visits)?        
relevant_years = len(df.columns) -1
times_returned = []
returned_at_all = []

dictionary = dictionary(highest_amount_of_visits)
for column in range(relevant_years):  
#   print (dictionary)
    this_year = df.columns[column]
    next_year = df.columns[column+1]
    compare_yearly_visits(this_year, next_year)
    print ("cumulative dictionary up to:", this_year,"\n", dictionary)
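
If the row-by-row loop is too slow, the percentages might be computed with a groupby instead. This is a sketch under assumptions, not a drop-in replacement for the code above: it loops only over pairs of consecutive years, it treats "returned" as more than 0 visits in the following year, and it skips visitors with 0 visits in the current year. The function name yearly_return_rates is made up for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {"2010": [4, 3, 10], "2011": [3, 0, 6], "2012": [6, 7, 3], "2013": [0, 3, 0]},
    index=pd.Index([1, 2, 3], name="Visitor_ID"),
)


def yearly_return_rates(visits):
    """Percentage of visitors, per visit count, who came back the following year."""
    rates = {}
    for current_year, next_year in zip(visits.columns, visits.columns[1:]):
        # Assumption: visitors with 0 visits this year are not counted at all.
        active = visits[visits[current_year] > 0]
        came_back = active[next_year] > 0
        # Mean of a boolean Series per visit count, expressed as a percentage.
        rates[next_year] = came_back.groupby(active[current_year]).mean() * 100
    # One row per "returned in" year, one column per visit count.
    return pd.DataFrame(rates).T


table = yearly_return_rates(df)
print(table)
```

The visit counts that appear as columns are taken from the data, so no maximum number of visits has to be set by hand.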

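Thanks. Unfortunately, the iteration function killed my kernel immediately. I would like to know how to do this without iterating.

When I run it, I get "python.exe has stopped working" - History saving thread hit an unexpected error (AttributeError("'long' object has no attribute 'acquire'")). History will not be written to the database.

It looks like the failure is caused by something else, please confirm.

@piRSquared, but this only gives the number of customers who came back the following year, not the percentage.

Thanks, I am trying this solution now.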