Python: counting Likert-scale responses across multiple question columns in pandas
Tags: python, pandas, numpy, group-by, pandas-groupby


I have the following DataFrame:

       Question1        Question2         Question3          Question4
User1  Agree            Agree          Disagree         Strongly Disagree
User2  Disagree         Agree          Agree            Disagree
User3  Agree            Agree          Agree            Agree

Is there a way to convert the DataFrame above into the following format?

              Agree         Disagree         Strongly Disagree
 Question1    2               1                  0
 Question2    3               0                  0
 Question3    2               1                  0
 Question4    1               1                  1

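This is similar to my earlier question.

I tried to apply stack/pivot as suggested in that earlier question, but could not work it out. The actual DataFrame has more than 20 questions and a Likert scale of "Strongly Agree", "Agree", "Neutral", "Disagree", and "Strongly Disagree".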
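
To try the answers on the sample data, the frame from the question can be rebuilt as below. This is only a sketch for reproducibility; the construction itself is not part of the original post.

```python
import pandas as pd

# Sample data copied from the table in the question.
df = pd.DataFrame(
    {
        "Question1": ["Agree", "Disagree", "Agree"],
        "Question2": ["Agree", "Agree", "Agree"],
        "Question3": ["Disagree", "Agree", "Agree"],
        "Question4": ["Strongly Disagree", "Disagree", "Agree"],
    },
    index=["User1", "User2", "User3"],
)
print(df)
```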

You can apply pd.Series.value_counts to each column. If you do this with apply, the indexes are aligned automatically:

df.apply(pd.Series.value_counts)
Out: 
                   Question1  Question2  Question3  Question4
Agree                    2.0        3.0        2.0          1
Disagree                 1.0        NaN        1.0          1
Strongly Disagree        NaN        NaN        NaN          1

A little post-processing:

df.apply(pd.Series.value_counts).fillna(0).astype('int')
Out: 
                   Question1  Question2  Question3  Question4
Agree                      2          3          2          1
Disagree                   1          0          1          1
Strongly Disagree          0          0          0          1
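
The result above has the responses as rows, while the question asks for one row per question. A possible way to get that layout, with the columns in Likert order, is to transpose and reindex. This is a sketch: the `scale` list is taken from the scale named in the question, and the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Question1": ["Agree", "Disagree", "Agree"],
        "Question2": ["Agree", "Agree", "Agree"],
        "Question3": ["Disagree", "Agree", "Agree"],
        "Question4": ["Strongly Disagree", "Disagree", "Agree"],
    },
    index=["User1", "User2", "User3"],
)

# Full Likert scale from the question, in the desired column order.
scale = ["Strongly Agree", "Agree", "Neutral", "Disagree", "Strongly Disagree"]

counts = (
    df.apply(pd.Series.value_counts)  # responses as rows, questions as columns
      .T                              # one row per question
      .reindex(columns=scale)         # fixed order; unseen responses become NaN
      .fillna(0)
      .astype(int)
)
print(counts)
```

Reindexing against the full scale also keeps columns such as "Neutral" in the table when nobody chose them.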

Use pd.get_dummies:

pd.get_dummies(df.stack()).groupby(level=1).sum()

           Agree  Disagree  Strongly Disagree
Question1      2         1                  0
Question2      3         0                  0
Question3      2         1                  0
Question4      1         1                  1
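
For readers unfamiliar with the one-liner, it can be broken into its three steps. This is an illustrative sketch with made-up variable names; `dtype=int` is passed explicitly because newer pandas versions return boolean indicators by default.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Question1": ["Agree", "Disagree", "Agree"],
        "Question2": ["Agree", "Agree", "Agree"],
        "Question3": ["Disagree", "Agree", "Agree"],
        "Question4": ["Strongly Disagree", "Disagree", "Agree"],
    },
    index=["User1", "User2", "User3"],
)

# 1. Stack into a Series indexed by (user, question).
stacked = df.stack()

# 2. One indicator column per distinct response.
dummies = pd.get_dummies(stacked, dtype=int)

# 3. Sum the indicators per question (level 1 of the MultiIndex).
counts = dummies.groupby(level=1).sum()
print(counts)
```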

Taking it up a level

We can use numpy.bincount to speed things up, but we have to be careful with the dimensions.
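
A minimal sketch of the bincount idea, assuming there are no missing responses (pd.factorize would code those as -1, which np.bincount rejects). Each cell is mapped to a flat bin number `column * n_categories + code`, the bins are counted, and the flat counts are reshaped to questions by responses. The variable names are illustrative, and `minlength` is used here in place of manual zero padding.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "Question1": ["Agree", "Disagree", "Agree"],
        "Question2": ["Agree", "Agree", "Agree"],
        "Question3": ["Disagree", "Agree", "Agree"],
        "Question4": ["Strongly Disagree", "Disagree", "Agree"],
    },
    index=["User1", "User2", "User3"],
)

v = df.to_numpy()
n_rows, n_cols = v.shape

# Integer code per cell (row-major order) plus the distinct responses.
codes, uniques = pd.factorize(v.ravel())
n_cat = uniques.size

# Column index of every cell, in the same row-major order as codes.
col_idx = np.tile(np.arange(n_cols), n_rows)

# One bin per (question, response) pair; minlength keeps trailing empty bins.
flat = np.bincount(col_idx * n_cat + codes, minlength=n_cols * n_cat)

result = pd.DataFrame(flat.reshape(n_cols, n_cat), index=df.columns, columns=uniques)
print(result)
```

Note that the column index has to be tiled once per row of the frame. Tiling it once per distinct response only works when the two counts happen to be equal, as they are in this 3-row sample.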


Another numpy option:

v = df.values
n, m = v.shape
f, u = pd.factorize(v.ravel())

pd.DataFrame(
    np.eye(u.size, dtype=int)[f].reshape(n, m, -1).sum(0),
    df.columns, u
)

           Agree  Disagree  Strongly Disagree
Question1      2         1                  0
Question2      3         0                  0
Question3      2         1                  0
Question4      1         1                  1

Timing comparison

%%timeit
# numpy.bincount approach
v = df.values
f, u = pd.factorize(v.ravel())
n, m = u.size, v.shape[1]
# one column index per cell: tile once per row, not once per unique value
r = np.tile(np.arange(m), v.shape[0])
b0 = np.bincount(r * n + f)
pad = np.zeros(n * m - b0.size, dtype=int)
b = np.append(b0, pad)

pd.DataFrame(b.reshape(m, n), df.columns, u)
1000 loops, best of 3: 194 µs per loop

%%timeit
# np.eye one-hot approach
v = df.values
n, m = v.shape
f, u = pd.factorize(v.ravel())

pd.DataFrame(
    np.eye(u.size, dtype=int)[f].reshape(n, m, -1).sum(0),
    df.columns, u
)
1000 loops, best of 3: 195 µs per loop

%timeit pd.get_dummies(df.stack()).groupby(level=1).sum()
1000 loops, best of 3: 1.2 ms per loop


Thank you very much. This works perfectly, and the "extra credit" part of the answer to my previous question helped me sort the columns.