Python: counting Likert scale results across multiple question columns in pandas

python pandas numpy

I have the following DataFrame:

       Question1        Question2         Question3          Question4
User1  Agree            Agree          Disagree         Strongly Disagree
User2  Disagree         Agree          Agree            Disagree
User3  Agree            Agree          Agree            Agree

Is there a way to convert the DataFrame above into the following format?

              Agree         Disagree         Strongly Disagree
 Question1    2               1                  0
 Question2    3               0                  0
 Question3    2               1                  0
 Question4    1               1                  1

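This is similar to a previous question of mine.

I tried applying stack/pivot as in that earlier question, but could not figure it out. The actual DataFrame has more than 20 questions and a Likert scale with the levels "Strongly Agree", "Agree", "Neutral", "Disagree", and "Strongly Disagree".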

You can apply pd.Series.value_counts to each column. If you do this with apply, the indices are aligned automatically:
df.apply(pd.Series.value_counts)
Out: 
                   Question1  Question2  Question3  Question4
Agree                    2.0        3.0        2.0          1
Disagree                 1.0        NaN        1.0          1
Strongly Disagree        NaN        NaN        NaN          1

A little post-processing:
df.apply(pd.Series.value_counts).fillna(0).astype('int')
Out: 
                   Question1  Question2  Question3  Question4
Agree                      2          3          2          1
Disagree                   1          0          1          1
Strongly Disagree          0          0          0          1
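The result above has the answers as rows and the questions as columns, while the question asks for the opposite orientation and mentions a five-level scale. A minimal sketch of one way to get there, assuming the five labels quoted in the question are spelled exactly as written (the `scale` list and the variable names are mine, not part of the original answer):

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame(
    {
        "Question1": ["Agree", "Disagree", "Agree"],
        "Question2": ["Agree", "Agree", "Agree"],
        "Question3": ["Disagree", "Agree", "Agree"],
        "Question4": ["Strongly Disagree", "Disagree", "Agree"],
    },
    index=["User1", "User2", "User3"],
)

# Assumed full Likert scale, in the order it should appear in the result.
scale = ["Strongly Agree", "Agree", "Neutral", "Disagree", "Strongly Disagree"]

counts = (
    df.apply(pd.Series.value_counts)  # one value_counts per question column
    .reindex(scale)                   # add unused levels and fix the row order
    .fillna(0)
    .astype(int)
    .T                                # questions as rows, levels as columns
)
print(counts)
```

Reindexing before the transpose keeps levels that nobody selected (here "Strongly Agree" and "Neutral") as zero-filled columns instead of dropping them.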

Using pd.get_dummies

pd.get_dummies(df.stack()).groupby(level=1).sum()

           Agree  Disagree  Strongly Disagree
Question1      2         1                  0
Question2      3         0                  0
Question3      2         1                  0
Question4      1         1                  1
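To see why `groupby(level=1)` groups by question, it can help to split the one-liner into steps. The sketch below rebuilds the sample data and uses intermediate names of my own choosing; passing `dtype=int` to `pd.get_dummies` is my addition so the indicator columns are integers regardless of the pandas version.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame(
    {
        "Question1": ["Agree", "Disagree", "Agree"],
        "Question2": ["Agree", "Agree", "Agree"],
        "Question3": ["Disagree", "Agree", "Agree"],
        "Question4": ["Strongly Disagree", "Disagree", "Agree"],
    },
    index=["User1", "User2", "User3"],
)

# stack() moves the column labels into a second index level:
# the result is a Series indexed by (user, question).
stacked = df.stack()

# One indicator column per distinct answer, one row per (user, question).
dummies = pd.get_dummies(stacked, dtype=int)

# Index level 1 holds the question labels, so summing per level-1 group
# counts how often each answer was given to each question.
counts = dummies.groupby(level=1).sum()
print(counts)
```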


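Taking it to another level

We can use numpy.bincount to speed things up, but we have to be careful with the dimensions. The bincount code is the first snippet in the timings below.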
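As a minimal sketch of the bincount idea (a variant, not the answerer's exact code): each cell is mapped to a single integer bin, `column_index * number_of_labels + label_code`, and the bins are counted in one pass. The variable names are mine, the `User4` row is hypothetical and added only so that the number of respondents differs from the number of distinct answers, and `minlength` is used in place of manual zero padding.

```python
import numpy as np
import pandas as pd

# Sample data from the question plus one hypothetical respondent (User4).
df = pd.DataFrame(
    {
        "Question1": ["Agree", "Disagree", "Agree", "Agree"],
        "Question2": ["Agree", "Agree", "Agree", "Disagree"],
        "Question3": ["Disagree", "Agree", "Agree", "Agree"],
        "Question4": ["Strongly Disagree", "Disagree", "Agree", "Agree"],
    },
    index=["User1", "User2", "User3", "User4"],
)

v = df.to_numpy()
n_rows, n_cols = v.shape

# Integer code per cell (row-major order) and the distinct labels.
codes, labels = pd.factorize(v.ravel())
k = labels.size

# Column index of every cell in the raveled array: 0..n_cols-1, once per row.
col_idx = np.tile(np.arange(n_cols), n_rows)

# One bin per (question, label) pair; minlength keeps trailing empty bins.
flat = np.bincount(col_idx * k + codes, minlength=n_cols * k)

counts = pd.DataFrame(flat.reshape(n_cols, k), index=df.columns, columns=labels)
print(counts)
```

The "dimensions" caveat is visible here: the column index has to be repeated once per row of the data, and the bin count has to be padded to `n_cols * k` so that the reshape works even when the last bins are empty.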

Another numpy option
v = df.values
n, m = v.shape
f, u = pd.factorize(v.ravel())

pd.DataFrame(
    np.eye(u.size, dtype=int)[f].reshape(n, m, -1).sum(0),
    df.columns, u
)

           Agree  Disagree  Strongly Disagree
Question1      2         1                  0
Question2      3         0                  0
Question3      2         1                  0
Question4      1         1                  1
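The `np.eye(...)[f]` step may look opaque at first: indexing an identity matrix with the label codes produces one one-hot row per cell, and summing over the user axis turns those rows into counts. A step-by-step sketch of the same computation, with intermediate names that are mine rather than the answerer's:

```python
import numpy as np
import pandas as pd

# Sample data from the question.
df = pd.DataFrame(
    {
        "Question1": ["Agree", "Disagree", "Agree"],
        "Question2": ["Agree", "Agree", "Agree"],
        "Question3": ["Disagree", "Agree", "Agree"],
        "Question4": ["Strongly Disagree", "Disagree", "Agree"],
    },
    index=["User1", "User2", "User3"],
)

v = df.to_numpy()
n_users, n_questions = v.shape

# Integer code per cell and the distinct labels, in order of first appearance.
codes, labels = pd.factorize(v.ravel())

# Row i of the identity matrix is the one-hot vector for label i,
# so fancy indexing with the codes gives one one-hot row per cell.
one_hot = np.eye(labels.size, dtype=int)[codes]

# Restore the (user, question, label) layout and sum over users.
cube = one_hot.reshape(n_users, n_questions, -1)
counts = pd.DataFrame(cube.sum(axis=0), index=df.columns, columns=labels)
print(counts)
```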

Speed difference

%%timeit
v = df.values
f, u = pd.factorize(v.ravel())
n, m = u.size, v.shape[1]
# One column index per cell of v.ravel(): repeat once per row of v.
# (The row count and the number of distinct labels are both 3 in the sample data.)
r = np.tile(np.arange(m), v.shape[0])
b0 = np.bincount(r * n + f)
# Pad with zeros in case the last bins are empty.
pad = np.zeros(n * m - b0.size, dtype=int)
b = np.append(b0, pad)

pd.DataFrame(b.reshape(m, n), df.columns, u)
1000 loops, best of 3: 194 µs per loop

%%timeit
v = df.values
n, m = v.shape
f, u = pd.factorize(v.ravel())

pd.DataFrame(
    np.eye(u.size, dtype=int)[f].reshape(n, m, -1).sum(0),
    df.columns, u
)
1000 loops, best of 3: 195 µs per loop

%timeit pd.get_dummies(df.stack()).groupby(level=1).sum()
1000 loops, best of 3: 1.2 ms per loop

Thank you very much. This works great, and the "extra credit" part of the answer to my last question helped me sort the columns.