Python: Descriptive statistics along the rows of a text file using Pandas

python python-2.7 pandas

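I am reading a text file with Pandas in Python. I am using Python 2.7. The dataset used in this question is related to one from a question I asked earlier. Specifically, the first two rows and the first column of my data consist of text information. Below is a snapshot of a truncated version of my dataset.

The data file is also available. I load the dataset using the helpful answer given to that earlier question:

df = pd.read_csv('dum.txt', sep='\t', header=[0,1], index_col=0)

I would like to get descriptive statistics for the rows of the DataFrame rather than for the columns. I tried using

df.describe()

but it gives descriptive statistics for the columns. I looked at the answers given in a related question, but when I use the answer suggested at that link, I get the following error: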

TypeError: ('unbound method describe() must be called with DataFrame instance as first argument (got Series instance instead)', u'occurred at index foxq1')

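How can I use Pandas to get descriptive statistics for the numeric entries in each row of my dataset? Thanks in advance.

Below are some notes, including the actual code I am using and the error message.

The actual code is as follows: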

df = pd.read_csv('dum.txt',sep='\t', header=[0,1], index_col=0)
df.apply(pd.DataFrame.describe, axis=1)

Error message:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-20-0d7a5fde0f42> in <module>()
----> 1 df.apply(pd.DataFrame.describe, axis=1)
      2 #df.apply(pd.DataFrame.describe, axis=1)

/Users/LG/anaconda2/lib/python2.7/site-packages/pandas/core/frame.pyc in apply(self, func, axis, broadcast, raw, reduce, args, **kwds)
   4260                         f, axis,
   4261                         reduce=reduce,
-> 4262                         ignore_failures=ignore_failures)
   4263             else:
   4264                 return self._apply_broadcast(f, axis)

/Users/LG/anaconda2/lib/python2.7/site-packages/pandas/core/frame.pyc in _apply_standard(self, func, axis, ignore_failures, reduce)
   4356             try:
   4357                 for i, v in enumerate(series_gen):
-> 4358                     results[i] = func(v)
   4359                     keys.append(v.name)
   4360             except Exception as e:

TypeError: ('unbound method describe() must be called with DataFrame instance as first argument (got Series instance instead)', u'occurred at index object1')

Based on the code you referenced, you can simply apply describe along the rows, which gives the following result:

         count  mean       std  min  25%  50%  75%  max
object1    5.0   3.1  1.581139  1.1  2.1  3.1  4.1  5.1
object2    5.0   3.2  1.581139  1.2  2.2  3.2  4.2  5.2
object3    5.0   3.3  1.581139  1.3  2.3  3.3  4.3  5.3
object4    5.0   3.4  1.581139  1.4  2.4  3.4  4.4  5.4
object5    5.0   3.5  1.581139  1.5  2.5  3.5  4.5  5.5
object6    5.0   3.6  1.581139  1.6  2.6  3.6  4.6  5.6
object7    5.0   3.7  1.581139  1.7  2.7  3.7  4.7  5.7
object8    5.0   3.8  1.581139  1.8  2.8  3.8  4.8  5.8
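The TypeError quoted in the question says that each row arrives at describe as a Series rather than a DataFrame. Below is a minimal sketch of two variants that sidestep that check. It assumes a small frame built inline as a hypothetical stand-in for dum.txt (the values and the names `Type` and `Tag` are illustrative, not the asker's file). It is meant to behave the same on Python 2.7 and Python 3.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for dum.txt: two header levels and a text index.
columns = pd.MultiIndex.from_arrays(
    [["T1", "T2", "T3", "T4", "T5", "T6", "T7"],
     ["Tag1", "Tag1", "Tag1", "Tag5", "Tag5", "Tag6", "Tag6"]],
    names=["Type", "Tag"],
)
index = ["object%d" % i for i in range(1, 9)]
# Row i holds 1.i, 2.i, ..., 7.i (for example 1.1, 2.1, ..., 7.1).
values = np.arange(1, 8)[None, :] + np.arange(1, 9)[:, None] / 10.0
df = pd.DataFrame(values, index=index, columns=columns)

# With axis=1 every row is passed to the function as a Series,
# so the Series method is the matching one to apply.
row_stats = df.apply(pd.Series.describe, axis=1)

# Alternative: describe the transposed frame, then transpose back.
row_stats_t = df.T.describe().T

print(row_stats)
```

Both variants return one row of statistics per object. The transpose form avoids the per-row Python call, which may matter for frames with many rows.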

You can try using numpy to get most of the statistics for the rows:

import numpy as np
import pandas as pd

df = pd.read_csv('dum.txt', sep='\t', header=[0,1], index_col=0)
print df 

Type      T1   T2   T3   T4   T5   T6   T7
Tag     Tag1 Tag1 Tag1 Tag5 Tag5 Tag6 Tag6
object1  1.1  2.1  3.1  4.1  5.1  6.1  7.1
object2  1.2  2.2  3.2  4.2  5.2  6.2  7.2
object3  1.3  2.3  3.3  4.3  5.3  6.3  7.3
object4  1.4  2.4  3.4  4.4  5.4  6.4  7.4
object5  1.5  2.5  3.5  4.5  5.5  6.5  7.5
object6  1.6  2.6  3.6  4.6  5.6  6.6  7.6
object7  1.7  2.7  3.7  4.7  5.7  6.7  7.7
object8  1.8  2.8  3.8  4.8  5.8  6.8  7.8

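data = df.values                    # numeric entries as a plain NumPy array
data_mean = np.mean(data, axis=1)   # axis=1 reduces across columns: one value per row
data_std = np.std(data, axis=1)     # population std (ddof=0); describe() uses ddof=1
data_min = np.min(data, axis=1)
data_max = np.max(data, axis=1)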

print data_mean 

[ 4.1  4.2  4.3  4.4  4.5  4.6  4.7  4.8]

print data_std

[ 2.  2.  2.  2.  2.  2.  2.  2.]

print data_min

[ 1.1  1.2  1.3  1.4  1.5  1.6  1.7  1.8]

print data_max

[ 7.1  7.2  7.3  7.4  7.5  7.6  7.7  7.8]
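The separate arrays above can also be gathered into one table shaped like the output of describe. The following is a sketch only, using a hypothetical inline array in place of `df.values` from dum.txt. Note that `np.std` defaults to `ddof=0`, which is why it prints 2.0 above, whereas pandas uses `ddof=1`. The sketch passes `ddof=1` so the numbers line up with describe.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df.values: 8 rows, 7 numeric columns.
data = np.arange(1, 8)[None, :] + np.arange(1, 9)[:, None] / 10.0
index = ["object%d" % i for i in range(1, 9)]

stat_names = ["mean", "std", "min", "25%", "50%", "75%", "max"]
row_stats = pd.DataFrame(
    {
        "mean": np.mean(data, axis=1),
        "std": np.std(data, axis=1, ddof=1),   # sample std, as describe() reports
        "min": np.min(data, axis=1),
        "25%": np.percentile(data, 25, axis=1),
        "50%": np.median(data, axis=1),
        "75%": np.percentile(data, 75, axis=1),
        "max": np.max(data, axis=1),
    },
    index=index,
    columns=stat_names,  # fix the column order explicitly
)

print(row_stats)
```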

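Please include the actual code that causes the problem and the full error message.

@DYZ: I have now included the code and the full error message. I hope it helps.

@DYZ: I am using Python 2.7. I wonder whether that is the reason for my error.

Of course, the code you included (twice!) is not the actual code, because it does not contain the line that caused the error.

@DYZ: Sorry, corrected now.

When I use the method you describe, I get the following error:

TypeError: ('unbound method describe() must be called with DataFrame instance as first argument (got Series instance instead)', u'occurred at index object1')

Joe: Are you using Python 2.7?

Oh, sorry, I am using Python 3.6. That may be the problem.

Joe: I really appreciate your help. The pandas version I am using is `u'0.22.0'`.

I updated from 0.21.x a few months ago because a command was not working properly. I am now on 0.23.4. You could try updating pandas and see whether that solves the problem.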
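The Python 2.7 versus 3.6 difference raised in the comments can be illustrated without pandas. In Python 2, a method looked up on a class is an unbound method that checks the type of its first argument. In Python 3 it is a plain function and accepts any object. The classes `Frame` and `Row` below are hypothetical stand-ins, not pandas types.

```python
import sys


class Frame(object):
    """Hypothetical stand-in for a class that defines describe()."""

    def describe(self):
        return "described a %s" % type(self).__name__


class Row(object):
    """Hypothetical stand-in for an unrelated type, such as a row Series."""


try:
    # Call the method through the class, passing an instance of another type.
    result = Frame.describe(Row())
except TypeError as exc:
    # Python 2 lands here: the unbound method rejects the Row instance.
    result = "TypeError: %s" % exc

print("Python %d: %s" % (sys.version_info[0], result))
```

This is consistent with the same apply call succeeding for the answerer on Python 3.6 and failing for the asker on Python 2.7.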