Python: counting Likert-scale results across multiple question columns in pandas
Tags: python, pandas, numpy, group-by, pandas-groupby
I have the following dataframe:
       Question1  Question2  Question3          Question4
User1      Agree      Agree   Disagree  Strongly Disagree
User2   Disagree      Agree      Agree           Disagree
User3      Agree      Agree      Agree              Agree
Is there a way to convert the dataframe above into the following format?
           Agree  Disagree  Strongly Disagree
Question1      2         1                  0
Question2      3         0                  0
Question3      2         1                  0
Question4      1         1                  1
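This is similar to my previous question.
I tried using stack/pivot after looking at that earlier question, but couldn't figure it out. The actual dataframe has more than 20 questions and a Likert scale of "Strongly Agree", "Agree", "Neutral", "Disagree" and "Strongly Disagree".
You can apply pd.Series.value_counts to each column. If you do this with apply, the indexes are aligned automatically: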
df.apply(pd.Series.value_counts)
Out:
                   Question1  Question2  Question3  Question4
Agree                    2.0        3.0        2.0          1
Disagree                 1.0        NaN        1.0          1
Strongly Disagree        NaN        NaN        NaN          1
A little post-processing:
df.apply(pd.Series.value_counts).fillna(0).astype('int')
Out:
                   Question1  Question2  Question3  Question4
Agree                      2          3          2          1
Disagree                   1          0          1          1
Strongly Disagree          0          0          0          1
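The result above has answers as rows and only the answers that actually occur. As a rough sketch of how one might get the layout asked for in the question (questions as rows, every scale point present, in scale order), the snippet below rebuilds the sample data and adds a reindex and a transpose. The `scale` list is taken from the wording of the question; the variable names are just for illustration.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame(
    {
        "Question1": ["Agree", "Disagree", "Agree"],
        "Question2": ["Agree", "Agree", "Agree"],
        "Question3": ["Disagree", "Agree", "Agree"],
        "Question4": ["Strongly Disagree", "Disagree", "Agree"],
    },
    index=["User1", "User2", "User3"],
)

# Full scale in display order, as listed in the question (an assumption
# about the exact spelling used in the real data).
scale = ["Strongly Agree", "Agree", "Neutral", "Disagree", "Strongly Disagree"]

counts = (
    df.apply(pd.Series.value_counts)  # one column of counts per question
    .reindex(scale)                   # add unused answers, fix the row order
    .fillna(0)                        # answers nobody picked become 0
    .astype(int)
    .T                                # questions as rows, answers as columns
)
print(counts)
```

Reindexing before `fillna` means that scale points nobody chose, such as "Neutral" here, still show up as a column of zeros.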
Using pd.get_dummies:
pd.get_dummies(df.stack()).groupby(level=1).sum()
           Agree  Disagree  Strongly Disagree
Question1      2         1                  0
Question2      3         0                  0
Question3      2         1                  0
Question4      1         1                  1
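To make the one-liner easier to follow, here is the same chain split into its three steps on the sample data from the question. This is only an unpacking of the expression above; the intermediate names `long`, `dummies` and `result` are made up for the illustration.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame(
    {
        "Question1": ["Agree", "Disagree", "Agree"],
        "Question2": ["Agree", "Agree", "Agree"],
        "Question3": ["Disagree", "Agree", "Agree"],
        "Question4": ["Strongly Disagree", "Disagree", "Agree"],
    },
    index=["User1", "User2", "User3"],
)

# Step 1: long format, one answer per row, indexed by (user, question).
long = df.stack()

# Step 2: one indicator column per distinct answer.
dummies = pd.get_dummies(long)

# Step 3: index level 1 is the question name, so summing per group
# counts how often each answer was given for each question.
result = dummies.groupby(level=1).sum()
print(result)
```

`level=1` refers to the second level of the stacked index, which holds the original column names.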
Taking it to another level: we can use numpy.bincount to speed things up, but we have to mind the dimensions.
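A minimal sketch of the bincount idea, written as a function so the dimensions are explicit. It assumes there are no missing answers (pd.factorize would code them as -1, which np.bincount rejects). The function name `likert_counts_bincount` and its local names are hypothetical, chosen for this illustration.

```python
import numpy as np
import pandas as pd


def likert_counts_bincount(df):
    """Count answers per question with a single np.bincount call."""
    v = df.to_numpy()
    rows, cols = v.shape
    # codes: integer id per cell (row-major order); labels: distinct answers
    codes, labels = pd.factorize(v.ravel())
    k = labels.size
    # Column id of every flattened cell: 0..cols-1, repeated once per row.
    col_idx = np.tile(np.arange(cols), rows)
    # Each (column, answer) pair maps to its own bin; minlength keeps
    # trailing empty bins so the reshape below always fits.
    flat = np.bincount(col_idx * k + codes, minlength=cols * k)
    return pd.DataFrame(flat.reshape(cols, k), index=df.columns, columns=labels)


# Sample data copied from the question.
df = pd.DataFrame(
    {
        "Question1": ["Agree", "Disagree", "Agree"],
        "Question2": ["Agree", "Agree", "Agree"],
        "Question3": ["Disagree", "Agree", "Agree"],
        "Question4": ["Strongly Disagree", "Disagree", "Agree"],
    },
    index=["User1", "User2", "User3"],
)
print(likert_counts_bincount(df))
```

The column ids are tiled by the number of rows and the bin width is the number of distinct answers; keeping those two sizes apart matters as soon as the number of respondents differs from the number of scale points.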
Another numpy option:
v = df.values
n, m = v.shape
f, u = pd.factorize(v.ravel())
pd.DataFrame(
    np.eye(u.size, dtype=int)[f].reshape(n, m, -1).sum(0),
    df.columns, u
)
           Agree  Disagree  Strongly Disagree
Question1      2         1                  0
Question2      3         0                  0
Question3      2         1                  0
Question4      1         1                  1
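The np.eye expression packs several steps into one line. The sketch below walks through the shapes on a deliberately tiny array (three respondents, two questions) that is made up for this illustration and is not the data from the question.

```python
import numpy as np
import pandas as pd

# Toy data: 3 respondents (rows) x 2 questions (columns).
v = np.array(
    [["Agree", "Agree"], ["Disagree", "Agree"], ["Agree", "Agree"]],
    dtype=object,
)
n, m = v.shape

# f holds one integer code per cell, u the distinct answers.
f, u = pd.factorize(v.ravel())

# Indexing the identity matrix with the codes yields one one-hot row
# per cell: shape (n * m, number of distinct answers).
one_hot = np.eye(u.size, dtype=int)[f]

# Restore the (respondent, question) layout, then add up over respondents.
cube = one_hot.reshape(n, m, -1)
counts = cube.sum(0)  # shape (m, number of distinct answers)

print(pd.DataFrame(counts, ["Q1", "Q2"], u))
```

Summing over axis 0 collapses the respondent axis, which leaves one row per question and one column per answer.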
Speed difference:
%%timeit
v = df.values
f, u = pd.factorize(v.ravel())
n, m = u.size, v.shape[1]
# repeat the column ids once per row of v (not once per unique label)
r = np.tile(np.arange(m), v.shape[0])
b0 = np.bincount(r * n + f)
pad = np.zeros(n * m - b0.size, dtype=int)
b = np.append(b0, pad)
pd.DataFrame(b.reshape(m, n), df.columns, u)
1000 loops, best of 3: 194 µs per loop
%%timeit
v = df.values
n, m = v.shape
f, u = pd.factorize(v.ravel())
pd.DataFrame(
    np.eye(u.size, dtype=int)[f].reshape(n, m, -1).sum(0),
    df.columns, u
)
1000 loops, best of 3: 195 µs per loop
%timeit pd.get_dummies(df.stack()).groupby(level=1).sum()
1000 loops, best of 3: 1.2 ms per loop
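The timings above were taken on a 3 x 4 frame, where fixed overhead dominates. For anyone who wants to compare on something closer to real survey data, here is a rough benchmark sketch using the standard timeit module. The frame size, the random data and the helper names `with_value_counts` and `with_bincount` are all assumptions made for this example, and no timings are claimed here.

```python
import timeit

import numpy as np
import pandas as pd

scale = ["Strongly Agree", "Agree", "Neutral", "Disagree", "Strongly Disagree"]

# Synthetic survey: 10,000 respondents x 20 questions, answers drawn at random.
rng = np.random.default_rng(0)
big = pd.DataFrame(
    rng.choice(scale, size=(10_000, 20)),
    columns=[f"Question{i + 1}" for i in range(20)],
)


def with_value_counts(df):
    # apply + value_counts, transposed so questions are rows
    return df.apply(pd.Series.value_counts).fillna(0).astype(int).T


def with_bincount(df):
    # single bincount over (column id, answer code) pairs
    v = df.to_numpy()
    rows, cols = v.shape
    codes, labels = pd.factorize(v.ravel())
    k = labels.size
    col_idx = np.tile(np.arange(cols), rows)
    flat = np.bincount(col_idx * k + codes, minlength=cols * k)
    return pd.DataFrame(flat.reshape(cols, k), index=df.columns, columns=labels)


# Sanity check: both approaches agree once the columns are put in the same order.
a = with_value_counts(big)
b = with_bincount(big)[a.columns]
same = bool((a.to_numpy() == b.to_numpy()).all())

for func in (with_value_counts, with_bincount):
    seconds = timeit.timeit(lambda: func(big), number=5) / 5
    print(f"{func.__name__}: {seconds * 1e3:.2f} ms per call")
```

Results will vary with hardware and library versions, so it is worth running this on your own data before choosing an approach.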
Thank you very much. This works great, and the "extra credit" part from my last question helped me sort the columns.