Python: unpacking a DataFrame with tuple entries into separate DataFrames

python pandas

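I wrote a small class to compute some statistics by bootstrapping without replacement. For those not familiar with this technique: you draw n random subsamples of some data, compute the desired statistic (say, the median) on each subsample, and then compare the values across subsamples. This gives you a measure of the variance of the median obtained over the dataset.

I implemented this in a class, but reduced it to the MWE given by the following function: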

import numpy as np
import pandas as pd

def bootstrap_median(df, n=5000, fraction=0.1):
    if isinstance(df, pd.DataFrame):
        columns = df.columns
    else:
        columns = None
    # Get the values as a ndarray
    arr = np.array(df.values)

    # Get the bootstrap sample through random permutations
    sample_len = int(len(arr)*fraction)
    if sample_len<1:
        sample_len = 1
    sample = []
    for n_sample in range(n):
        sample.append(arr[np.random.permutation(len(arr))[:sample_len]])
    sample = np.array(sample)

    # Compute the median on each sample
    temp = np.median(sample, axis=1)

    # Get the mean and std of the estimate across samples
    m = np.mean(temp, axis=0)
    s = np.std(temp, axis=0)/np.sqrt(len(sample))

    # Convert output to DataFrames if necessary and return
    # (test against None: the truth value of a pandas Index is ambiguous)
    if columns is not None:
        m = pd.DataFrame(data=m[None, ...], columns=columns)
        s = pd.DataFrame(data=s[None, ...], columns=columns)
    return m, s

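Applied to the data column of a small example DataFrame, this prints the DataFrame followed by the result: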

    data  group
0      0      1
1      1      1
2      2      1
3      3      1
4      4      1
5      5      1
6      6      1
7      7      1
8      8      1
9      9      1
10    10      2
11    11      2
12    12      2
13    13      2
14    14      2
15    15      2
16    16      2
17    17      2
18    18      2
19    19      2

(9.5161999999999995, 0.056585753613431718)
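The code that built the example data is not shown above. The following is a minimal sketch of how the printed DataFrame and result could be reproduced. It assumes the data are the integers 0 to 19 split into two groups of ten, and it uses a condensed, seeded stand-in named subsample_median (a hypothetical helper, not the original function) so the block is self-contained.

```python
import numpy as np
import pandas as pd

def subsample_median(values, n=5000, fraction=0.1, seed=0):
    """Condensed stand-in for bootstrap_median (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    arr = np.asarray(values)
    sample_len = max(1, int(len(arr) * fraction))
    # Draw n subsamples without replacement and take the median of each
    medians = np.array([
        np.median(rng.choice(arr, size=sample_len, replace=False))
        for _ in range(n)
    ])
    # Mean of the medians and its standard error across subsamples
    return medians.mean(), medians.std() / np.sqrt(n)

# Assumed example data: values 0..19 in two groups of ten
df = pd.DataFrame({"data": np.arange(20), "group": np.repeat([1, 2], 10)})
result = subsample_median(df["data"])
print(df)
print(result)
```

The exact numbers depend on the random draws, so they will not match the output above digit for digit.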

So far so good, since bootstrap_median returns a tuple of two elements. However, if I do this after a groupby:

In: df.groupby('group')['data'].apply(bootstrap_median)

Out:
group
1     (4.5356, 0.0409710449952)
2    (14.5006, 0.0403772204095)

The value inside each cell is a tuple, as one would expect from apply. I can unpack the result into two DataFrames by iterating over the elements like this:

index = []
data1 = []
data2 = []
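# out is the Series returned by the groupby/apply call above;
# Series.iteritems() was removed in pandas 2.0, so use items()
for g, (m, s) in out.items():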
    index.append(g)
    data1.append(m)
    data2.append(s)
dfm = pd.DataFrame(data=data1, index=index, columns=['E[median]'])
dfm.index.name = 'group'
dfs = pd.DataFrame(data=data2, index=index, columns=['std[median]'])
dfs.index.name = 'group'

This is rather cumbersome, so my question is: is there a more natural pandas way to "unpack" a DataFrame whose values are tuples into separate DataFrames?

Another question seems related, but it deals with string regex replacement rather than unpacking actual tuples.
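One direct way to unpack such a result, offered as a sketch rather than a definitive answer, is to turn the Series of tuples into a list and let the DataFrame constructor expand it into columns. The out Series below is a hand-built stand-in for the groupby/apply result, using the values printed above.

```python
import pandas as pd

# Stand-in for the groupby/apply result: a Series whose values are (m, s) tuples
out = pd.Series(
    [(4.5356, 0.0409710449952), (14.5006, 0.0403772204095)],
    index=pd.Index([1, 2], name="group"),
)

# Expand the tuples into columns, one row per group
unpacked = pd.DataFrame(
    out.tolist(), index=out.index, columns=["E[median]", "std[median]"]
)

# Split into the two single-column DataFrames that the loop above builds
dfm = unpacked[["E[median]"]]
dfs = unpacked[["std[median]"]]
print(dfm)
print(dfs)
```

Because the index of out is reused, the group labels and the index name carry over without extra bookkeeping.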

I think you need to change:

return m, s

to:

return pd.Series([m, s], index=['m','s'])

And then you get:

df1 = df.groupby('group')['data'].apply(bootstrap_median)
print (df1)
group   
1      m     4.480400
       s     0.040542
2      m    14.565200
       s     0.040373
Name: data, dtype: float64

So it is possible to select by xs:

print (df1.xs('s', level=1))
group
1    0.040542
2    0.040373
Name: data, dtype: float64

print (df1.xs('m', level=1))
group
1     4.4804
2    14.5652
Name: data, dtype: float64

Also, if you need a one-column DataFrame, add to_frame:

df1 = df.groupby('group')['data'].apply(bootstrap_median).to_frame()
print (df1)
              data
group             
1     m   4.476800
      s   0.041100
2     m  14.468400
      s   0.040719

print (df1.xs('s', level=1))
           data
group          
1      0.041100
2      0.040719

print (df1.xs('m', level=1))
          data
group         
1       4.4768
2      14.4684

Nice suggestion! By returning a pandas.Series I simply sidestep the problem of having to unpack the tuples and stay within the pandas framework. I'll wait a bit to see whether any other answers about unpacking tuples come in, and if not, I'll accept yours.
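The stacked result can also be pivoted so that m and s become columns, which may be handier than calling xs once per statistic. This is a minimal sketch: mean_and_std is a deterministic, hypothetical stand-in for bootstrap_median, and the example data are assumed to be the integers 0 to 19 in two groups of ten.

```python
import pandas as pd

def mean_and_std(x):
    # Deterministic stand-in for bootstrap_median (hypothetical helper)
    return pd.Series([x.mean(), x.std()], index=["m", "s"])

df = pd.DataFrame({"data": range(20), "group": [1] * 10 + [2] * 10})

# Series with a (group, statistic) MultiIndex, as in the answer above
df1 = df.groupby("group")["data"].apply(mean_and_std)

# Move the inner index level ('m', 's') into columns
wide = df1.unstack()
print(wide)
```

Selecting wide[['m']] or wide[['s']] then gives one single-column DataFrame per statistic.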