Python Pandas groupby: finding common strings

My DataFrame:

    Name              fav_fruit
0   justin              apple
1   bieber justin       apple
2   Kris Justin bieber  apple
3   Kim Lee             orange
4   lee kim             orange
5   mary barnet         orange
6   tom hawkins         pears
7   Sr Tom Hawkins      pears
8   Jose Hawkins        pears
9   Shanita             pineapple
10  Joe                 pineapple

import pandas as pd

df1 = pd.DataFrame({
    'Name': ['justin', 'bieber justin', 'Kris Justin bieber', 'Kim Lee', 'lee kim', 'mary barnet',
             'tom hawkins', 'Sr Tom Hawkins', 'Jose Hawkins', 'Shanita', 'Joe'],
    'fav_fruit': ['apple', 'apple', 'apple', 'orange', 'orange', 'orange',
                  'pears', 'pears', 'pears', 'pineapple', 'pineapple'],
})

After grouping by the fav_fruit column, I want to count the number of common words in the Name column. So for apple the count is 2 (justin, bieber), for orange it is 2 (kim, lee), and for pineapple it is 0.

Expected output:

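    Name                fav_fruit            count
0   justin              apple                2
1   bieber justin       apple                2
2   Kris Justin bieber  apple                2
3   Kim Lee             orange               2
4   lee kim             orange               2
5   mary barnet         orange               2
6   tom hawkins         pears                2
7   Sr Tom Hawkins      pears                2
8   Jose Hawkins        pears                2
9   Shanita             pineapple            0
10  Joe                 pineapple            0

I think you need a custom function: first join all values of a group into one big string, convert it to lowercase and split it into words, then use Counter to keep only the words that occur more than once: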

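from collections import Counter

def f(x):
    # Join all names of the group into one lowercase string, then split into words
    a = ' '.join(x).lower().split()
    # Count the distinct words that occur more than once within the group
    return len([k for k, v in Counter(a).items() if v != 1])

# df1 is the DataFrame defined in the question
df1['count'] = df1.groupby('fav_fruit')['Name'].transform(f)
print (df1)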
                  Name  fav_fruit  count
0               justin      apple      2
1        bieber justin      apple      2
2   Kris Justin bieber      apple      2
3              Kim Lee     orange      2
4              lee kim     orange      2
5          mary barnet     orange      2
6          tom hawkins      pears      2
7       Sr Tom Hawkins      pears      2
8         Jose Hawkins      pears      2
9              Shanita  pineapple      0
10                 Joe  pineapple      0
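
One caveat with joining everything into a single string: a word repeated inside one name (say a hypothetical "tom tom") would be counted as shared even though it occurs in only one row. A minimal sketch of a variant that counts each word at most once per row is shown below. The helper name `count_shared_words` and the `agg` plus `map` route are illustrative choices, not part of the original answer.

```python
from collections import Counter

import pandas as pd


def count_shared_words(names):
    """Count distinct words that occur in at least two names of one group."""
    # set(...) per name: a word repeated inside a single name counts once.
    rows_per_word = Counter(
        word for name in names for word in set(name.lower().split())
    )
    return sum(1 for n_rows in rows_per_word.values() if n_rows > 1)


df = pd.DataFrame({
    "Name": ["justin", "bieber justin", "Kris Justin bieber", "Kim Lee",
             "lee kim", "mary barnet", "tom hawkins", "Sr Tom Hawkins",
             "Jose Hawkins", "Shanita", "Joe"],
    "fav_fruit": ["apple", "apple", "apple", "orange", "orange", "orange",
                  "pears", "pears", "pears", "pineapple", "pineapple"],
})

# One count per fruit, then broadcast back to every row of that fruit.
per_group = df.groupby("fav_fruit")["Name"].agg(count_shared_words)
df["count"] = df["fav_fruit"].map(per_group)
print(df)
```

On the question's data this should give the same counts as the answer above, since no name there repeats a word.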

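I tried something similar using sets, but it does not work if a word is not common to all rows. I am just evaluating your solution and will let you know whether it works on the whole dataset.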
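
To illustrate the difference the comment describes, here is a small sketch comparing a set intersection (words present in every name of a group) with a Counter-based check (words present in at least two names). The function names are hypothetical, and the set version is only a guess at what the commenter tried.

```python
from collections import Counter
from functools import reduce


def words_in_every_name(names):
    # Set intersection: keeps only words that appear in ALL names.
    word_sets = [set(name.lower().split()) for name in names]
    return reduce(set.intersection, word_sets)


def words_in_two_or_more_names(names):
    # Counter: keeps words that appear in at least two names.
    rows_per_word = Counter(
        word for name in names for word in set(name.lower().split())
    )
    return {word for word, n_rows in rows_per_word.items() if n_rows > 1}


apple = ["justin", "bieber justin", "Kris Justin bieber"]
orange = ["Kim Lee", "lee kim", "mary barnet"]

print(sorted(words_in_every_name(apple)))          # ['justin']
print(sorted(words_in_two_or_more_names(apple)))   # ['bieber', 'justin']
print(sorted(words_in_every_name(orange)))         # []
print(sorted(words_in_two_or_more_names(orange)))  # ['kim', 'lee']
```

The intersection misses "bieber" for apple and everything for orange, because "mary barnet" shares no word with the other two names.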