Python: count occurrences of strings across multiple columns for each unique row


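I want to count the occurrences of certain strings across multiple columns and return the totals in new columns.

I know I can use value_counts to count the total occurrences of each value in a given column: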

data['col'].value_counts(dropna=False)
Result:

[["win" TKO technical knockout]     336
[["win" UD unanimous decision]      307
[["win" KO knockout]                225
[["loss" UD unanimous decision]      97
[["loss" TKO technical knockout]     64
[["win" nan null]                    53
[["draw" MD majority decision]       43
[["loss" KO knockout]                41
[["loss" MD majority decision]       35
[["loss" nan null]                   32
[["loss" SD split decision]          29
[["unknown" nan null]                29
[["win" SD split decision]           27
[["draw" PTS null]                   18
[["win" RTD corner retirement]       17
[["draw" SD split decision]          12
[["loss" RTD corner retirement]      11
[["win" MD majority decision]         9
[["loss" DQ disqualification]         6
[["win" PTS null]                     6
[["unknown" NC null]                  3
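The problem is that I want to count, for example, the occurrences of [["win" KO knockout] in each of the relevant columns (the relevant columns are col1 to col20).

Here is a sample of my data: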

{'col1': {0: ['["win" UD unanimous decision'],
  1: ['["win" UD unanimous decision'],
  2: ['["win" TKO technical knockout'],
  3: ['["win" UD unanimous decision'],
  4: ['["win" UD unanimous decision']},
 'col2': {0: ['["win" TKO technical knockout'],
  1: ['["win" TKO technical knockout'],
  2: ['["win" TKO technical knockout'],
  3: ['["win" UD unanimous decision'],
  4: ['["win" UD unanimous decision']},
 'col3': {0: ['["win" TKO technical knockout'],
  1: ['["win" KO knockout'],
  2: ['["win" TKO technical knockout'],
  3: ['["win" TKO technical knockout'],
  4: ['["win" UD unanimous decision']},
 'col4': {0: ['["win" UD unanimous decision'],
  1: ['["win" UD unanimous decision'],
  2: ['["win" KO knockout'],
  3: ['["win" TKO technical knockout'],
  4: ['["win" UD unanimous decision']}}
In this case, the desired output would be:

      win UD   win TKO   win KO 
0       2         2         0
1       2         1         1
2       0         3         1
3       2         2         0
4       4         0         0
Update:

I also tried using size with groupby:

#list of column names
col_outcome = ['col'+str(i) for i in range(1,11)]
data.groupby(col_outcome).size()
However, this returns the following error message:

TypeError: unhashable type: 'list'
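
The error comes from the cell values: each cell holds a Python list, and lists are not hashable, so they cannot serve as group keys. A minimal sketch of one way around it, assuming every cell is a one-element list as in the sample above (the two-column frame here is hypothetical and only mirrors that shape), is to unwrap the lists into plain strings first:

```python
import pandas as pd

# Hypothetical two-column sample in the same shape as the question's data:
# every cell is a one-element list holding the outcome string.
data = pd.DataFrame({
    "col1": [['["win" UD unanimous decision'], ['["win" TKO technical knockout']],
    "col2": [['["win" TKO technical knockout'], ['["win" TKO technical knockout']],
})

# Replace each one-element list with the string it contains.
# Strings are hashable, so they are valid group keys.
unwrapped = data.apply(lambda col: col.map(lambda cell: cell[0]))

# Number of rows for each distinct (col1, col2) combination.
sizes = unwrapped.groupby(["col1", "col2"]).size()
print(sizes)
```

Note that this counts distinct combinations of column values, which is still not the per-row count of each outcome that the question asks for.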

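IIUC, let's use stack to reshape the dataframe from "wide" to "long", then do a little string cleaning with replace and a regex extract, next use groupby with apply and value_counts, and finally use unstack to reshape the result: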

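import pandas as pd

# regex=True is needed on pandas 2.0+, where str.replace matches literally by default
df.stack().str[0].str.replace(r'\[|"', '', regex=True)\
  .str.extract(r'(\w+\s\w+)')\
  .groupby(level=0)[0].apply(pd.Series.value_counts).unstack(fill_value=0)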
Output:

   win KO  win TKO  win UD
0       0        2       2
1       1        1       2
2       1        3       0
3       0        2       2
4       0        0       4
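
For readers who want to run this end to end, here is a step-by-step sketch of the same idea on the four-column sample from the question. It is a variation rather than the exact chain above: the variable names are illustrative, the lists are unwrapped with map, and the grouped value_counts method is used in place of apply.

```python
import pandas as pd

UD = '["win" UD unanimous decision'
TKO = '["win" TKO technical knockout'
KO = '["win" KO knockout'

# The sample from the question: each cell is a one-element list.
df = pd.DataFrame({
    "col1": [[UD], [UD], [TKO], [UD], [UD]],
    "col2": [[TKO], [TKO], [TKO], [UD], [UD]],
    "col3": [[TKO], [KO], [TKO], [TKO], [UD]],
    "col4": [[UD], [UD], [KO], [TKO], [UD]],
})

# Step 1: wide -> long. The index becomes (row label, column name).
long = df.stack()

# Step 2: unwrap each one-element list into its string.
text = long.map(lambda cell: cell[0])

# Step 3: drop '[' and '"', then keep the first two words, e.g. "win UD".
labels = (
    text.str.replace(r'[\["]', "", regex=True)
        .str.extract(r"(\w+\s\w+)")[0]
        .rename("outcome")
)

# Step 4: count each label per original row, then pivot labels into columns.
counts = labels.groupby(level=0).value_counts().unstack(fill_value=0)
print(counts)
```

Splitting the chain into named steps makes it easier to inspect the intermediate long-format Series if the real data contains labels the regex does not anticipate.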

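Can you give us a sample of your data and code so that we can run it by copy/paste?

@PrinceFrancis I added a data sample to my question as a dictionary, limited to 4 columns.

You asked to count the occurrences of [["win" KO knockout], but your expected result is something else. I'm confused, which is why I asked for a simple example.

df.stack().value_counts()? Or df.melt(value_name='vals')['vals'].value_counts()?

@PrinceFrancis That was just an example to make the question clearer; if it causes confusion, I will remove it.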
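
As a rough illustration of the two one-liners suggested in the comments: both give totals over the whole frame rather than per-row counts. The sketch below uses a hypothetical frame of plain strings (not the question's list cells) so that the values are hashable.

```python
import pandas as pd

# Hypothetical, already-cleaned data: plain strings instead of lists.
df = pd.DataFrame({
    "col1": ["win UD", "win TKO"],
    "col2": ["win TKO", "win TKO"],
})

# Overall totals across every cell, via stack ...
totals_stack = df.stack().value_counts()

# ... or via melt; the counts are the same.
totals_melt = df.melt(value_name="vals")["vals"].value_counts()

print(totals_stack)
print(totals_melt)
```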