Pandas: how to compute weighted word counts from a pivot table

This is my pivot table:

No  Keyword              Count
1   Sell Laptop Online   10
2   Buy Computer Online  8
3   Laptop and Case      5
This is what I want:

No   Word      Count
1    Online    18
2    Laptop    15
3    Sell      10
4    Buy        8
5    Computer   8
6    and        5
7    Case       5 
What I did was:

df['Word'].apply(lambda x: x.str.split(expand=True).stack()).stack().value_counts()
But the result is:

No   Word      Count
1    Online    2
2    Laptop    2
3    Sell      1
4    Buy       1
5    Computer  1
6    and       1
7    Case      1 
I want word counts weighted by the Count column of the pivot table.

Use:

df1 = (df.set_index('Count')['Keyword']
         .str.split(expand=True)
         .stack()
         .reset_index(name='Word')
         .groupby('Word')['Count']
         .sum()
         .sort_values(ascending=False)
         .reset_index())
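Explanation:

  • set_index moves Count into the index so that this information is not lost
  • str.split(expand=True) creates a DataFrame with one column per word
  • stack reshapes it into a long Series
  • reset_index converts the MultiIndex into columns
  • groupby with sum aggregates Count per word
  • sort_values sorts the resulting Series
  • a final reset_index turns the Series back into a DataFrame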
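
The steps above can be followed one at a time. The snippet below is a minimal sketch that rebuilds the sample data from the question and keeps each intermediate result in its own variable; the names `wide`, `long_form` and `flat` are only illustrative and do not appear in the original answer.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    'Keyword': ['Sell Laptop Online', 'Buy Computer Online', 'Laptop and Case'],
    'Count': [10, 8, 5],
})

# Keep Count as the index, then split each keyword into one column per word.
wide = df.set_index('Count')['Keyword'].str.split(expand=True)

# Reshape to a long Series indexed by (Count, word position).
long_form = wide.stack()

# Turn the MultiIndex into columns; the stacked values become the 'Word' column.
flat = long_form.reset_index(name='Word')

# Sum Count per word, sort descending, and restore a default integer index.
df1 = (flat.groupby('Word')['Count']
           .sum()
           .sort_values(ascending=False)
           .reset_index())
print(df1)
```

Printing `wide`, `long_form` and `flat` in turn shows how the Count value travels with every word until the final aggregation.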

Another solution, faster if the DataFrame is larger:

    import pandas as pd
    from itertools import chain
    
    s = df['Keyword'].str.split()
    
    df = pd.DataFrame({
        'Word' : list(chain.from_iterable(s.values.tolist())), 
        'Count' : df['Count'].repeat(s.str.len())
    })
    
    print (df)
           Word  Count
    0      Sell     10
    0    Laptop     10
    0    Online     10
    1       Buy      8
    1  Computer      8
    1    Online      8
    2    Laptop      5
    2       and      5
    2      Case      5
    
    df1 = df.groupby('Word')['Count'].sum().sort_values(ascending=False).reset_index()
    print (df1)
           Word  Count
    0    Online     18
    1    Laptop     15
    2      Sell     10
    3  Computer      8
    4       Buy      8
    5       and      5
    6      Case      5
    
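Explanation:

  • First, repeat each Count value once per word of the split Keyword (repeat with s.str.len()), flatten the word lists with chain.from_iterable, and build a new DataFrame from both
  • Then aggregate with sum, sort the Series, and finish with reset_index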
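
The same "one row per word, each carrying its Count" idea can also be written with `DataFrame.explode`. This is not part of the original answer; it is a sketch that assumes pandas 0.25 or newer, where `explode` is available, and it makes no claim about relative speed.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    'Keyword': ['Sell Laptop Online', 'Buy Computer Online', 'Laptop and Case'],
    'Count': [10, 8, 5],
})

# Assumption: pandas >= 0.25 (DataFrame.explode).
# Split each keyword into a list of words, then emit one row per word;
# explode repeats the Count value of the original row for every word.
exploded = df.assign(Word=df['Keyword'].str.split()).explode('Word')

# Sum Count per word, sort descending, and restore a default integer index.
df1 = (exploded.groupby('Word')['Count']
               .sum()
               .sort_values(ascending=False)
               .reset_index())
print(df1)
```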
Here is a simple approach that uses one-hot encoding:

    df['Keyword'].str.get_dummies(sep=' ').mul(df['Count'],axis=0).sum(0).to_frame('Count')
    
              Count
    Buy           8
    Case          5
    Computer      8
    Laptop       15
    Online       18
    Sell         10
    and           5
    
If you need more speed, try scikit-learn's MultiLabelBinarizer, i.e.

    from sklearn.preprocessing import MultiLabelBinarizer
    vec = MultiLabelBinarizer()
    
    oh = (vec.fit_transform(df['Keyword'].str.split()) * df['Count'].values[:,None]).sum(0)
    pd.DataFrame({'Count': oh ,'Word':vec.classes_})
    
Explanation:

get_dummies produces the one-hot encoded DataFrame:

        Buy  Case  Computer  Laptop  Online  Sell  and
     0    0     0         0       1       1     1    0
     1    1     0         1       0       1     0    0
     2    0     1         0       1       0     0    1
    
Multiply each row by its Count (mul with axis=0):

       Buy  Case  Computer  Laptop  Online  Sell  and
    0    0     0         0      10      10    10    0
    1    8     0         8       0       8     0    0
    2    0     5         0       5       0     0    5
    
Sum each column (the result shown is the Series, which to_frame then converts to a DataFrame):

    Buy          8
    Case         5
    Computer     8
    Laptop      15
    Online      18
    Sell        10
    and          5
    dtype: int64
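
The multiply-then-sum steps above amount to a matrix-vector product between the transposed one-hot matrix and the Count column. The sketch below is an illustration rather than part of the original answer, and the names `via_mul` and `via_dot` are hypothetical; it checks that both formulations agree on the sample data. Note that one-hot encoding marks a word at most once per row, so a word repeated inside a single keyword would contribute its Count only once.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    'Keyword': ['Sell Laptop Online', 'Buy Computer Online', 'Laptop and Case'],
    'Count': [10, 8, 5],
})

# One-hot encode the words: one row per keyword, one column per distinct word.
dummies = df['Keyword'].str.get_dummies(sep=' ')

# Formulation used in the answer: scale each row by its Count, then sum the columns.
via_mul = dummies.mul(df['Count'], axis=0).sum(0)

# Equivalent formulation: (words x rows) matrix times the Count vector.
via_dot = dummies.T.dot(df['Count'])

print((via_mul == via_dot).all())
print(via_dot.to_frame('Count'))
```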
    