Count users per category in a nested list in Python
I have a list of two-element sublists. It looks like this:
a = [['user1', 'referral'], ['user2', 'referral'], ['user1', 'referral'], ['user1', 'affiliate'], ['user7', 'affiliate'], ['user1', 'affiliate'], ['user9', 'affiliate'], ['user4', 'cpc'], ['user4', 'referral'], ['user2', 'referral'], ['user7', 'affiliate'], ['user14', 'cpc'], ['user3', 'orgainic'], ['user2', 'orgainic'], ['user4', 'cpc'], ['user2', 'cpc'], ['user8', 'cpc'], ['user2', 'orgainic']]
I want to count the number of distinct users in each category.
Required:
required = [['referral',3],['affiliate',3],['cpc',4],['orgainic',2]]
The output I get:
{'referral': 3, 'affiliate': 2, 'cpc': 4, 'orgainic': 3}
The counts are wrong.
Here is the code I tried:
a = [['user1', 'referral'], ['user2', 'referral'], ['user1', 'referral'], ['user1', 'affiliate'], ['user7', 'affiliate'], ['user1', 'affiliate'], ['user9', 'affiliate'], ['user4', 'cpc'], ['user4', 'referral'], ['user2', 'referral'], ['user7', 'affiliate'], ['user14', 'cpc'], ['user3', 'orgainic'], ['user2', 'orgainic'], ['user4', 'cpc'], ['user2', 'cpc'], ['user8', 'cpc'], ['user2', 'orgainic']]
required = [['referral',3],['affiliate',3],['cpc',4],['orgainic',2]]
c = {}
visits = []
for i in a:
    # print(i)
    for j in i[1:]:
        if j not in c and i[0] not in visits:
            c[j] = 1
            visits.append(i[0])
        elif j in c and i[0] not in visits:
            c[j] = c[j]+1
print(c)
Please help me find a solution.

Here is one approach using collections.defaultdict.

Output:
# [['affiliate', 3], ['cpc', 4], ['orgainic', 2], ['referral', 3]]
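One way the defaultdict approach might look is sketched below: it collects the distinct users of each category in a set and then reports the set sizes. The variable names (`users_by_category`, `result`) and the final sort by category name, chosen so the result matches the ordering shown above, are assumptions rather than part of the original answer.

```python
from collections import defaultdict

a = [['user1', 'referral'], ['user2', 'referral'], ['user1', 'referral'], ['user1', 'affiliate'], ['user7', 'affiliate'], ['user1', 'affiliate'], ['user9', 'affiliate'], ['user4', 'cpc'], ['user4', 'referral'], ['user2', 'referral'], ['user7', 'affiliate'], ['user14', 'cpc'], ['user3', 'orgainic'], ['user2', 'orgainic'], ['user4', 'cpc'], ['user2', 'cpc'], ['user8', 'cpc'], ['user2', 'orgainic']]

# Map each category to the set of distinct users seen in it (names are illustrative).
users_by_category = defaultdict(set)
for user, category in a:
    users_by_category[category].add(user)

# Sort by category name; this ordering is an assumption made to match the output above.
result = [[category, len(users)] for category, users in sorted(users_by_category.items())]
print(result)
```

Because a set ignores repeated additions, a user who appears several times in the same category is counted once, while the same user can still be counted in other categories.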
Solution based on defaultdict and a for loop
This can be done using defaultdict:
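A minimal sketch of the defaultdict-and-loop idea, assuming a running counter per category plus a set of (user, category) pairs that have already been counted. The names `counts` and `seen` are illustrative and not taken from the original answer. Unlike the code in the question, which remembers users globally and so skips a user who shows up in a second category, this tracks the pair.

```python
from collections import defaultdict

a = [['user1', 'referral'], ['user2', 'referral'], ['user1', 'referral'], ['user1', 'affiliate'], ['user7', 'affiliate'], ['user1', 'affiliate'], ['user9', 'affiliate'], ['user4', 'cpc'], ['user4', 'referral'], ['user2', 'referral'], ['user7', 'affiliate'], ['user14', 'cpc'], ['user3', 'orgainic'], ['user2', 'orgainic'], ['user4', 'cpc'], ['user2', 'cpc'], ['user8', 'cpc'], ['user2', 'orgainic']]

counts = defaultdict(int)  # category -> number of distinct users
seen = set()               # (user, category) pairs already counted

for user, category in a:
    if (user, category) not in seen:
        seen.add((user, category))
        counts[category] += 1

# Sorting by category name is an assumption made to match the listed output.
result = sorted([category, n] for category, n in counts.items())
print(result)
```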
Output:
# [['affiliate', 3], ['cpc', 4], ['orgainic', 2], ['referral', 3]]
Solution based on groupby
Alternatively, this can be done using groupby from itertools:
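A possible groupby version is sketched here, assuming the pairs are deduplicated first and then sorted by category, since itertools.groupby only groups adjacent items. The helper name `unique_pairs` is illustrative and not from the original answer.

```python
from itertools import groupby
from operator import itemgetter

a = [['user1', 'referral'], ['user2', 'referral'], ['user1', 'referral'], ['user1', 'affiliate'], ['user7', 'affiliate'], ['user1', 'affiliate'], ['user9', 'affiliate'], ['user4', 'cpc'], ['user4', 'referral'], ['user2', 'referral'], ['user7', 'affiliate'], ['user14', 'cpc'], ['user3', 'orgainic'], ['user2', 'orgainic'], ['user4', 'cpc'], ['user2', 'cpc'], ['user8', 'cpc'], ['user2', 'orgainic']]

# Deduplicate (user, category) pairs, then sort by category so that
# groupby sees each category as one contiguous run.
unique_pairs = sorted({(user, category) for user, category in a}, key=itemgetter(1))

# Count the pairs in each run; every pair is a distinct user in that category.
result = [[category, len(list(group))]
          for category, group in groupby(unique_pairs, key=itemgetter(1))]
print(result)
```

The sort is what puts the categories in alphabetical order in the result.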
Output:
# [['affiliate', 3], ['cpc', 4], ['orgainic', 2], ['referral', 3]]
This sounds like a case for pandas, and your list is already in the right shape:
import pandas as pd
a = [['user1', 'referral'], ['user2', 'referral'], ['user1', 'referral'], ['user1', 'affiliate'], ['user7', 'affiliate'], ['user1', 'affiliate'], ['user9', 'affiliate'], ['user4', 'cpc'], ['user4', 'referral'], ['user2', 'referral'], ['user7', 'affiliate'], ['user14', 'cpc'], ['user3', 'orgainic'], ['user2', 'orgainic'], ['user4', 'cpc'], ['user2', 'cpc'], ['user8', 'cpc'], ['user2', 'orgainic']]
df = pd.DataFrame(a)
df.columns=["user", "type"]
unique_per_type = df.groupby("type")["user"].unique()
Now unique_per_type is:
type
affiliate    [user1, user7, user9]
cpc          [user4, user14, user2, user8]
orgainic     [user3, user2]
referral     [user1, user2, user4]
Name: user, dtype: object
You can do things like:
# access length by key
len(unique_per_type["affiliate"])
# or use it like a dict
for key, val in unique_per_type.items():
    print(key, len(val))
This solution adds pandas, which is a huge dependency. But once your data is in a DataFrame, you can do a lot with it:
df["user"].unique() # shows all unique users
df.query("user=='user1'") # shows all observations involving user1
First, let's make the entries unique:
c = {tuple(sublist) for sublist in a}
Now we have unique (user, type) pairs.
For counting we don't need the user, so let's build a list that holds only the second element of each pair:
c = [elem[1] for elem in c]
Now we can count easily:
from collections import Counter
c = Counter(c)
Result: Counter({'cpc': 4, 'affiliate': 3, 'referral': 3, 'orgainic': 2})
Now, let's put it all together:
from collections import Counter
c = Counter(elem[1] for elem in {tuple(sublist) for sublist in a})
How can I convert this into a nested list, as in the required output?
@SmackAlpha, I added a last line to the code example that converts the result to a list of lists.
A very simple solution could be list(c.items()).
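The conversion discussed in these comments could look roughly like the sketch below. Note that list(c.items()) yields (category, count) tuples rather than lists, so a list comprehension is used to get true nested lists. Ordering the categories by first appearance in `a` (the `order` helper) is an assumption made here to reproduce the `required` list from the question.

```python
from collections import Counter

a = [['user1', 'referral'], ['user2', 'referral'], ['user1', 'referral'], ['user1', 'affiliate'], ['user7', 'affiliate'], ['user1', 'affiliate'], ['user9', 'affiliate'], ['user4', 'cpc'], ['user4', 'referral'], ['user2', 'referral'], ['user7', 'affiliate'], ['user14', 'cpc'], ['user3', 'orgainic'], ['user2', 'orgainic'], ['user4', 'cpc'], ['user2', 'cpc'], ['user8', 'cpc'], ['user2', 'orgainic']]

# Count categories over the unique (user, category) pairs.
c = Counter(elem[1] for elem in {tuple(sublist) for sublist in a})

# list(c.items()) gives tuples; their order follows the set's iteration order.
as_tuples = list(c.items())

# Categories in order of first appearance in `a` (hypothetical helper).
order = list(dict.fromkeys(category for _, category in a))

# Nested lists in that order.
result = [[category, c[category]] for category in order]
print(result)
```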