Count users per category in a nested list in Python

I have a list of two-element sublists, each holding a user and a category. It looks like this:

a = [['user1', 'referral'], ['user2', 'referral'], ['user1', 'referral'], ['user1', 'affiliate'], ['user7', 'affiliate'], ['user1', 'affiliate'], ['user9', 'affiliate'], ['user4', 'cpc'], ['user4', 'referral'], ['user2', 'referral'], ['user7', 'affiliate'], ['user14', 'cpc'], ['user3', 'orgainic'], ['user2', 'orgainic'], ['user4', 'cpc'], ['user2', 'cpc'], ['user8', 'cpc'], ['user2', 'orgainic']]
I want to count the number of distinct users in each category.

Required:

required = [['referral',3],['affiliate',3],['cpc',4],['orgainic',2]]

The output I get:

{'referral': 3, 'affiliate': 2, 'cpc': 4, 'orgainic': 3}

The counts are wrong.

Here is the code I tried:

a = [['user1', 'referral'], ['user2', 'referral'], ['user1', 'referral'], ['user1', 'affiliate'], ['user7', 'affiliate'], ['user1', 'affiliate'], ['user9', 'affiliate'], ['user4', 'cpc'], ['user4', 'referral'], ['user2', 'referral'], ['user7', 'affiliate'], ['user14', 'cpc'], ['user3', 'orgainic'], ['user2', 'orgainic'], ['user4', 'cpc'], ['user2', 'cpc'], ['user8', 'cpc'], ['user2', 'orgainic']]

required = [['referral',3],['affiliate',3],['cpc',4],['orgainic',2]]

c = {}
visits = []
for i in a:
    # print(i)
    for j in i[1:]:
        if j not in c and i[0] not in visits:
            c[j] = 1
            visits.append(i[0])
        elif j in c and i[0] not in visits:
            c[j] = c[j]+1
print(c)

Please help me find a solution.

Here is one approach using collections.defaultdict.

Example:
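The code for this answer is not preserved in this copy of the thread. As a minimal sketch of a defaultdict-based approach (not necessarily the answerer's exact code), one could collect the distinct users of each category in a set and then report the set sizes. The names `users_by_category` and `result` are illustrative assumptions.

```python
from collections import defaultdict

a = [['user1', 'referral'], ['user2', 'referral'], ['user1', 'referral'],
     ['user1', 'affiliate'], ['user7', 'affiliate'], ['user1', 'affiliate'],
     ['user9', 'affiliate'], ['user4', 'cpc'], ['user4', 'referral'],
     ['user2', 'referral'], ['user7', 'affiliate'], ['user14', 'cpc'],
     ['user3', 'orgainic'], ['user2', 'orgainic'], ['user4', 'cpc'],
     ['user2', 'cpc'], ['user8', 'cpc'], ['user2', 'orgainic']]

# Map each category to the set of distinct users seen in it.
# A set silently ignores repeated (user, category) rows.
users_by_category = defaultdict(set)
for user, category in a:
    users_by_category[category].add(user)

# Sort by category name and emit [category, distinct-user count] pairs.
result = [[category, len(users)]
          for category, users in sorted(users_by_category.items())]
print(result)
```

Sorting by category name is a choice made here so the order is deterministic; drop `sorted` to keep first-seen order instead.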

Output:

# [['affiliate', 3], ['cpc', 4], ['orgainic', 2], ['referral', 3]]

Solution based on defaultdict and a for loop

This can be done using defaultdict:
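This answer's code is also missing from this copy. One way such a loop might look, offered as a sketch rather than the original code, is to keep integer counters in a `defaultdict(int)` and remember which (user, category) pairs were already counted. The names `counts` and `seen` are assumptions introduced for illustration.

```python
from collections import defaultdict

a = [['user1', 'referral'], ['user2', 'referral'], ['user1', 'referral'],
     ['user1', 'affiliate'], ['user7', 'affiliate'], ['user1', 'affiliate'],
     ['user9', 'affiliate'], ['user4', 'cpc'], ['user4', 'referral'],
     ['user2', 'referral'], ['user7', 'affiliate'], ['user14', 'cpc'],
     ['user3', 'orgainic'], ['user2', 'orgainic'], ['user4', 'cpc'],
     ['user2', 'cpc'], ['user8', 'cpc'], ['user2', 'orgainic']]

counts = defaultdict(int)  # category -> number of distinct users
seen = set()               # (user, category) pairs already counted

for user, category in a:
    # Count a user once per category, not once overall.
    if (user, category) not in seen:
        seen.add((user, category))
        counts[category] += 1

# Convert to a nested list, sorted by category name.
result = sorted([category, n] for category, n in counts.items())
print(result)
```

Tracking the pair rather than the user alone matters: a single global list of visited users would skip `user1` in `affiliate` after seeing `user1` in `referral`.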

Output:

# [['affiliate', 3], ['cpc', 4], ['orgainic', 2], ['referral', 3]]
Solution based on groupby

Alternatively, this can be done with groupby from itertools:
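The groupby code is likewise not preserved here. A minimal sketch under the assumption that the data is first sorted by category (`itertools.groupby` only merges adjacent items with equal keys) could look like this; the names `by_category` and `result` are illustrative.

```python
from itertools import groupby
from operator import itemgetter

a = [['user1', 'referral'], ['user2', 'referral'], ['user1', 'referral'],
     ['user1', 'affiliate'], ['user7', 'affiliate'], ['user1', 'affiliate'],
     ['user9', 'affiliate'], ['user4', 'cpc'], ['user4', 'referral'],
     ['user2', 'referral'], ['user7', 'affiliate'], ['user14', 'cpc'],
     ['user3', 'orgainic'], ['user2', 'orgainic'], ['user4', 'cpc'],
     ['user2', 'cpc'], ['user8', 'cpc'], ['user2', 'orgainic']]

# groupby only groups consecutive rows, so sort by category first.
by_category = sorted(a, key=itemgetter(1))

# For each category, build a set of its users to drop repeats, then count.
result = [
    [category, len({user for user, _ in rows})]
    for category, rows in groupby(by_category, key=itemgetter(1))
]
print(result)
```

The sort makes this O(n log n), whereas the dictionary-based approaches make a single pass over the data.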

Output:

# [['affiliate', 3], ['cpc', 4], ['orgainic', 2], ['referral', 3]]

This sounds like a use case for pandas, and your list is already in the right shape:

import pandas as pd
a = [['user1', 'referral'], ['user2', 'referral'], ['user1', 'referral'], ['user1', 'affiliate'], ['user7', 'affiliate'], ['user1', 'affiliate'], ['user9', 'affiliate'], ['user4', 'cpc'], ['user4', 'referral'], ['user2', 'referral'], ['user7', 'affiliate'], ['user14', 'cpc'], ['user3', 'orgainic'], ['user2', 'orgainic'], ['user4', 'cpc'], ['user2', 'cpc'], ['user8', 'cpc'], ['user2', 'orgainic']]

df = pd.DataFrame(a)
df.columns=["user", "type"]

unique_per_type = df.groupby("type")["user"].unique()

Now unique_per_type is:

type
affiliate            [user1, user7, user9]
cpc          [user4, user14, user2, user8]
orgainic                    [user3, user2]
referral             [user1, user2, user4]
Name: user, dtype: object

You can then do things like:

# access length by key
len(unique_per_type["affiliate"])

# or use it like a dict
for key, val in unique_per_type.items():
    print(key, len(val))

This solution adds pandas, which is a huge dependency. But once your data is in a DataFrame, you can do a lot with it:

df["user"].unique() # shows all unique users

df.query("user=='user1'") # shows all observations involving user1

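First, let's make the entries unique:

c = {tuple(sublist) for sublist in a}

Now we have unique (user, type) pairs.

For counting we don't need the user, so let's keep only the second element of each pair in a list:

c = [elem[1] for elem in c]

Now we can count easily:

from collections import Counter
c = Counter(c)

Result: Counter({'cpc': 4, 'affiliate': 3, 'referral': 3, 'orgainic': 2})

Now, let's put it all together: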

from collections import Counter

c = Counter(elem[1] for elem in {tuple(sublist) for sublist in a})

How can I convert this into a nested list, as in the required output?

@SmackAlpha, I added a last line to the code sample to convert it to a list of lists.

A very simple solution would be list(c.items()).
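The conversion line mentioned in the comments does not appear in the code as preserved here. As a sketch of what it might look like: `list(c.items())` yields a list of tuples, so a list comprehension is needed if inner lists are really required. The names `as_tuples` and `as_lists` are illustrative assumptions.

```python
from collections import Counter

a = [['user1', 'referral'], ['user2', 'referral'], ['user1', 'referral'],
     ['user1', 'affiliate'], ['user7', 'affiliate'], ['user1', 'affiliate'],
     ['user9', 'affiliate'], ['user4', 'cpc'], ['user4', 'referral'],
     ['user2', 'referral'], ['user7', 'affiliate'], ['user14', 'cpc'],
     ['user3', 'orgainic'], ['user2', 'orgainic'], ['user4', 'cpc'],
     ['user2', 'cpc'], ['user8', 'cpc'], ['user2', 'orgainic']]

# Deduplicate (user, category) pairs, then count categories.
c = Counter(category for _user, category in {tuple(pair) for pair in a})

# list(c.items()) gives (category, count) tuples ...
as_tuples = list(c.items())

# ... while a comprehension gives [category, count] lists.
as_lists = [[category, n] for category, n in c.items()]
print(as_lists)
```

Because the pairs pass through a set, the order of categories in the result is not guaranteed; wrap the comprehension in `sorted(...)` if a stable order matters.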