Python 3.x: if column 1 (table B) exists in column 2 (table A), count the unique values of column 1 (table A)

I have two tables, as shown below:

df_1
title     column_a
    a     {"blabla","dog","cat"} 
    a     {"aaaa","apple","dog"}
    a     {"abcde","apple","cat"}
    b     {"qwert","dog","apple"}
    c     {"bbbbb","dog"}
The second table:

 df_2 
 category
      cat
      dog
    apple
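I want to create a new column in df_2 that, for each df_2["category"], counts the unique values of df_1["title"] among the rows whose df_1["column_a"] contains that category. The result I want looks like this: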

 df_2 
 category  unique_count_of_title
      cat                      1
      dog                      3
    apple                      2

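I did some research, but most answers tell me to group by "column_a", which does not work in my case because a single row contains several values. Any help is appreciated :)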

First, convert the strings that represent sets into actual sets:

import ast

df_1['column_a'] = df_1['column_a'].apply(ast.literal_eval)
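To see what this conversion does to a single cell, here is a small sketch. It assumes (this is not stated in the question) that the data was loaded from a text source such as a CSV file, so each cell is a string like '{"blabla","dog","cat"}' rather than a Python set. ast.literal_eval safely evaluates such a literal into a real set.

```python
import ast

# Assumption: a cell as it might look after loading from a CSV file (a string, not a set)
raw = '{"blabla","dog","cat"}'

# literal_eval evaluates the Python set literal without running arbitrary code
parsed = ast.literal_eval(raw)

print(type(raw).__name__)     # str
print(type(parsed).__name__)  # set
print(sorted(parsed))         # ['blabla', 'cat', 'dog']
```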
Then convert the sets to lists, expand them into one row per element, and count the unique titles for each element:

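A minimal sketch of this step, assuming pandas 0.25 or later (for DataFrame.explode) and that column_a already holds sets after the conversion above. The sample frame below is rebuilt from the question's data, and the result is named s so that it matches the map(s) call that follows.

```python
import pandas as pd

# Sample data from the question; column_a already holds Python sets
df_1 = pd.DataFrame({
    'title': ['a', 'a', 'a', 'b', 'c'],
    'column_a': [{"blabla", "dog", "cat"}, {"aaaa", "apple", "dog"},
                 {"abcde", "apple", "cat"}, {"qwert", "dog", "apple"},
                 {"bbbbb", "dog"}],
})

# Convert each set to a list, explode to one row per (title, element) pair,
# then count the distinct titles for every element
s = (df_1.assign(column_a=df_1['column_a'].apply(list))
         .explode('column_a')
         .groupby('column_a')['title']
         .nunique())

print(s)
```

Because each title is counted at most once per element, "cat" (which appears in two rows, both with title "a") gets a count of 1.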
Finally, map the counts onto the new column:

df_2['unique_count_of_title'] = df_2['category'].map(s)
print (df_2)
  category  unique_count_of_title
0      cat                      1
1      dog                      3
2    apple                      2
Another solution uses a defaultdict of sets, then builds a dictionary of set lengths called d1:

from collections import defaultdict

d = defaultdict(set)
for a, b in df_1[['title','column_a']].to_numpy():
    for val in b:
        d[val].add(a)

print (d)
defaultdict(<class 'set'>, {'dog': {'a', 'c', 'b'}, 'cat': {'a'}, 
                            'blabla': {'a'}, 'aaaa': {'a'}, 
                            'apple': {'a', 'b'}, 
                            'abcde': {'a'}, 'qwert': {'b'}, 
                            'bbbbb': {'c'}})

d1 = {k:len(v) for k, v in d.items()}
df_2['unique_count_of_title'] = df_2['category'].map(d1)
print (df_2)
  category  unique_count_of_title
0      cat                      1
1      dog                      3
2    apple                      2
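What does print(type(df_1['column_a'].iat[0])) return?
Not sure whether that is the problem, because I can't get the expected answer from the solutions below.
Can you run df_1['column_a'] = df_1['column_a'].apply(ast.literal_eval) before my solution? Answer edited.
The first solution works like magic! May I know what this code does: import ast; df_1['column_a'] = df_1['column_a'].apply(ast.literal_eval)
@ChngFongChia - it converts the strings to sets.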
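The check from the comments can be combined with the conversion. This is a sketch under the assumption that column_a may hold strings (for example after reading a CSV file); the two-row frame is a hypothetical cut-down version of df_1. Only cells that are still strings are parsed, so running the cell twice does not fail on values that are already sets.

```python
import ast

import pandas as pd

# Hypothetical frame where column_a was loaded as text
df_1 = pd.DataFrame({
    'title': ['a', 'c'],
    'column_a': ['{"blabla","dog","cat"}', '{"bbbbb","dog"}'],
})

# Inspect what the cells actually contain before choosing a solution
print(type(df_1['column_a'].iat[0]))  # <class 'str'>

# Parse only the cells that are still strings; sets are passed through unchanged
df_1['column_a'] = df_1['column_a'].apply(
    lambda v: ast.literal_eval(v) if isinstance(v, str) else v
)

print(type(df_1['column_a'].iat[0]))  # <class 'set'>
```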
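Another approach unpacks the sets into columns, melts them into rows, then merges with df_2 and counts unique titles:

# reproduce the input data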
import pandas as pd
df_1_data = {'title': ['a','a','a','b','c'], 
             'column_a': [{"blabla","dog","cat"}, {"aaaa","apple","dog"}, {"abcde","apple","cat"}, {"qwert","dog","apple"}, {"bbbbb","dog"}]}
df_2_data = {'category': ['cat', 'dog', 'apple']}
df_1 = pd.DataFrame(df_1_data)
df_2 = pd.DataFrame(df_2_data)

# unpack the sets into columns
df_1 = df_1.set_index('title')['column_a'].apply(lambda x: pd.Series(list(x))).reset_index()

# pivot the columns into rows
df_1 = df_1.melt('title')

# merge both dataframes and count uniquely, grouping by the desired column
s = pd.merge(df_1, df_2, left_on = 'value', right_on = 'category').groupby('category')['title'].nunique()

# update the original dataframe with the unique counts
df_2['unique_count_of_title'] = s[df_2.category].values
print(df_2)
    category    unique_count_of_title
0   cat         1
1   dog         3
2   apple       2
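One caveat with s[df_2.category]: in recent pandas versions, indexing a Series with a list of labels that includes a missing label raises KeyError, so a category that never appears in column_a would break the last step. A sketch of a more forgiving variant uses map and fills the gaps with 0. Here s is rebuilt by hand with the counts shown above, and "bird" is a hypothetical category added only to show the missing case.

```python
import pandas as pd

# Per-category unique title counts, matching the groupby/nunique result above
s = pd.Series({'apple': 2, 'cat': 1, 'dog': 3}, name='title')

# 'bird' is a hypothetical category that does not occur in df_1['column_a']
df_2 = pd.DataFrame({'category': ['cat', 'dog', 'apple', 'bird']})

# map returns NaN for labels missing from s; fillna(0) turns that into a count of 0
df_2['unique_count_of_title'] = df_2['category'].map(s).fillna(0).astype(int)
print(df_2)
```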