Python: Counting DataFrame values by category across multiple columns


python pandas


I have a DataFrame of this type (with just 15 columns)

What I want is to organize it so that the result looks like this:

         electronic_used    how_it_works    what_it_says     how_it_looks
         smartphone_right          1          1                 1
         computer_right            2          2                 2
         smartphone_wrong          2          1                 1
         computer_wrong            1          3                 2

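I don't know how to do this, but basically I want to organize it by the electronic device used (electronic_used) and count how many right and wrong answers there are in each category.

Any help would be greatly appreciated.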

You can do this with melt followed by crosstab:

s=df.melt('electronic_used')
new=pd.crosstab(s['electronic_used']+s['value'],s['variable']).reset_index()
new
variable            row_0  how_it_looks  how_it_works  what_it_says
0           computerright             1             1             1
1           computerwrong             1             1             1
2         smartphoneright             1             1             1
3         smartphonewrong             1             1             1
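For reference, here is a self-contained sketch of the same melt + crosstab idea. It is only an illustration: the sample rows are borrowed from the groupby answer further down (not the asker's real data), and the "_" separator is added so the labels match the smartphone_right style requested in the question.

```python
import pandas as pd

# Sample data borrowed from the groupby answer in this thread (an assumption,
# since the asker's original DataFrame is not shown).
df = pd.DataFrame({
    "electronic_used": ["smartphone", "computer", "smartphone", "computer"],
    "how_it_works": ["right", "wrong", "right", "right"],
    "what_it_says": ["wrong", "wrong", "right", "right"],
    "how_it_looks": ["wrong", "wrong", "wrong", "right"],
})

# Wide -> long: one row per (device, question, answer).
long = df.melt(id_vars="electronic_used")

# Build labels such as "smartphone_right" and count them per question.
labels = (long["electronic_used"] + "_" + long["value"]).rename("electronic_used")
counts = pd.crosstab(labels, long["variable"])
print(counts)
```

crosstab fills combinations that never occur with 0, so every device/answer pair that appears at least once gets a full row.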

Use a pivot table after melting:


df1 = df.melt(id_vars = 'electronic_used')
df1.assign(electronic_used = df1.electronic_used + '_' + df1.value, value = 1)\
  .pivot_table(columns = 'variable',index = 'electronic_used', values = 'value', aggfunc = 'sum')\
  .reset_index()

Result:

variable   electronic_used  how_it_looks  how_it_works  what_it_says
0           computer_right             1             1             1
1           computer_wrong             1             1             1
2         smartphone_right             1             1             1
3         smartphone_wrong             1             1             1

Note how the new values of the electronic_used column are built by concatenating the two columns:
electronic_used = df1.electronic_used + '_' + df1.value
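One caveat worth sketching: with aggfunc='sum', a device/answer combination that never occurs for a given question has nothing to sum, so that cell comes out as NaN. Passing fill_value=0 is one way to get a zero instead. The example below is a minimal sketch using the sample rows from the groupby answer in this thread, not the asker's real data.

```python
import pandas as pd

# Sample data borrowed from the groupby answer in this thread (an assumption).
df = pd.DataFrame({
    "electronic_used": ["smartphone", "computer", "smartphone", "computer"],
    "how_it_works": ["right", "wrong", "right", "right"],
    "what_it_says": ["wrong", "wrong", "right", "right"],
    "how_it_looks": ["wrong", "wrong", "wrong", "right"],
})

long = df.melt(id_vars="electronic_used")

# Same label construction as above, with every row counting as 1.
long = long.assign(
    electronic_used=long["electronic_used"] + "_" + long["value"],
    value=1,
)

# fill_value=0 replaces missing combinations (e.g. no smartphone user got
# how_it_looks right) with 0 instead of NaN.
table = long.pivot_table(
    index="electronic_used",
    columns="variable",
    values="value",
    aggfunc="sum",
    fill_value=0,
).reset_index()
print(table)
```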
Since there are only two possible values, "right" or "wrong", you can do two boolean comparisons and combine each with groupby and sum:
import numpy as np
import pandas as pd

data = """
 smartphone           right          wrong           wrong
 computer             wrong          wrong           wrong
 smartphone           right          right           wrong
 computer             right          right           right
"""
columns = ["electronic_used", "how_it_works", "what_it_says", "how_it_looks"]
df = pd.DataFrame(np.array(data.split()).reshape((4, 4)), columns=columns)
df = df.set_index('electronic_used')
right_counts = (df == "right").astype('int').groupby('electronic_used').sum()
wrong_counts = (df == "wrong").astype('int').groupby('electronic_used').sum()
print(right_counts)
print(wrong_counts)

Output:
                 how_it_works  what_it_says  how_it_looks
electronic_used                                          
computer                    1             1             1
smartphone                  2             1             0
                 how_it_works  what_it_says  how_it_looks
electronic_used                                          
computer                    1             1             1
smartphone                  0             1             2
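The two tables above can also be stacked into the single layout the question asks for. The following is a rough sketch of one way to do that, reusing the same sample data; the loop and the variable names (answers, parts, combined) are illustrative choices, not part of the original answer.

```python
import numpy as np
import pandas as pd

data = """
 smartphone           right          wrong           wrong
 computer             wrong          wrong           wrong
 smartphone           right          right           wrong
 computer             right          right           right
"""
columns = ["electronic_used", "how_it_works", "what_it_says", "how_it_looks"]
df = pd.DataFrame(np.array(data.split()).reshape((4, 4)), columns=columns)
answers = df.set_index("electronic_used")

parts = []
for label in ("right", "wrong"):
    # Boolean mask -> 0/1, then sum per device.
    counts = (answers == label).astype(int).groupby(level="electronic_used").sum()
    # Turn "computer" into "computer_right" / "computer_wrong".
    counts.index = counts.index + "_" + label
    parts.append(counts)

# Stack the right and wrong tables on top of each other.
combined = pd.concat(parts).rename_axis("electronic_used").reset_index()
print(combined)
```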

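It might be a good idea to make the title more specific than "organizing columns". For example, how about "Counting DataFrame values by category across multiple columns"? In fact, I think this may be a duplicate of this question:

When I do this, it does sort things the right way, but the counts in all columns are 0, which is incorrect... I don't understand why it doesn't work.

I added the full code and changed some of the data so you can see how it works.

Great! I did it wrong at first, but it worked in the end, thanks!

When I try the second line I get this: TypeError: can only concatenate str (not "float") to str

Maybe some values in your dataset's columns are also numeric, which would cause the error. When I run the code on the data you provided, I get no error on my side.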
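If the TypeError does come from numeric values ending up in the melted value column, as the last comment suggests, two hedged workarounds are to melt only the columns that hold right/wrong answers and to cast both sides to str before concatenating. This is a sketch under that assumption: the rating column below is hypothetical and stands in for whatever numeric column the real dataset contains.

```python
import pandas as pd

df = pd.DataFrame({
    "electronic_used": ["smartphone", "computer", "smartphone", "computer"],
    "how_it_works": ["right", "wrong", "right", "right"],
    "rating": [1.0, 2.0, 3.0, 4.0],  # hypothetical numeric column
})

# Workaround 1: restrict the melt to the answer columns so that numeric
# columns never reach the string concatenation.
long = df.melt(id_vars="electronic_used", value_vars=["how_it_works"])

# Workaround 2: cast both operands to str so that "+" is always a
# str + str concatenation.
labels = (
    long["electronic_used"].astype(str) + "_" + long["value"].astype(str)
).rename("electronic_used")

counts = pd.crosstab(labels, long["variable"])
print(counts)
```

Note that astype(str) also turns missing values into the literal text "nan", so dropping or filling them first may be preferable, depending on the data.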