Python: vectorized sampling from a DataFrame


python pandas


I have a simple df from which I build a pivot table:

    import pandas as pd

    d = {'one': ['A', 'B', 'B', 'C', 'C', 'C'], 'two': [6., 5., 4., 3., 2., 1.],
         'three': [6., 5., 4., 3., 2., 1.], 'four': [6., 5., 4., 3., 2., 1.]}
    df = pd.DataFrame(d)
    pivot = pd.pivot_table(df, index=['one', 'two'])

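I want to randomly sample one row for each distinct value of 'one' in the resulting pivot object. (In this example 'A' will always be sampled, while 'B' and 'C' have more than one option.) I have just started using pandas 0.18.0 and am aware of the .sample method. I have been experimenting with .groupby, applying a sampling function like this: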

    grouped = pivot.groupby('one').apply(lambda x: x.sample(n=1, replace=False))

Every variation on this theme that I try raises a KeyError, so I think it is time for some fresh eyes on this seemingly simple problem.

Thanks for your help.

The KeyError is raised because 'one' is not a column of pivot; it is the name of an index level:

In [11]: pivot
Out[11]:
         four  three
one two
A   6.0   6.0    6.0
B   4.0   4.0    4.0
    5.0   5.0    5.0
C   1.0   1.0    1.0
    2.0   2.0    2.0
    3.0   3.0    3.0
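One way to confirm this is to look at the index names and the columns of pivot directly. The following is a minimal sketch that rebuilds the frame from the question; index_names and column_names are names introduced here only for illustration.

```python
import pandas as pd

d = {'one': ['A', 'B', 'B', 'C', 'C', 'C'], 'two': [6., 5., 4., 3., 2., 1.],
     'three': [6., 5., 4., 3., 2., 1.], 'four': [6., 5., 4., 3., 2., 1.]}
df = pd.DataFrame(d)
pivot = pd.pivot_table(df, index=['one', 'two'])

# 'one' and 'two' were passed as index=..., so they are levels of the row index.
index_names = list(pivot.index.names)
# Only the remaining value columns are left as columns.
column_names = list(pivot.columns)

print(index_names)
print(column_names)
print('one' in pivot.columns)
```

As far as I know, later pandas releases also let groupby resolve a string key against index level names, so the KeyError may not reproduce there. Passing level= should work in either case.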

You have to use the level argument instead:

In [12]: pivot.groupby(level='one').apply(lambda x: x.sample(n=1, replace=False))
Out[12]:
             four  three
one one two
A   A   6.0   6.0    6.0
B   B   4.0   4.0    4.0
C   C   1.0   1.0    1.0

That's not quite right, because the 'one' level of the index is duplicated! It is slightly better with as_index=False:

In [13]: pivot.groupby(level='one', as_index=False).apply(lambda x: x.sample(n=1))
Out[13]:
           four  three
  one two
0 A   6.0   6.0    6.0
1 B   4.0   4.0    4.0
2 C   2.0   2.0    2.0

Note: a random row is picked on every call, so the output changes from run to run.
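If the extra index level is unwanted altogether, two variations may be worth trying. This is a sketch, not part of the original answer: the first passes group_keys=False so that apply does not prepend the group label, and the second assumes pandas 1.1 or newer, where GroupBy objects have their own sample method. The random_state values are arbitrary and only make the draw repeatable.

```python
import pandas as pd

d = {'one': ['A', 'B', 'B', 'C', 'C', 'C'], 'two': [6., 5., 4., 3., 2., 1.],
     'three': [6., 5., 4., 3., 2., 1.], 'four': [6., 5., 4., 3., 2., 1.]}
df = pd.DataFrame(d)
pivot = pd.pivot_table(df, index=['one', 'two'])

# group_keys=False keeps apply from adding the group label as an extra level,
# so the result keeps the original ('one', 'two') index.
picked_apply = pivot.groupby(level='one', group_keys=False).apply(
    lambda x: x.sample(n=1, random_state=0))

# Assumes pandas >= 1.1: sample one row per group without apply.
picked_direct = pivot.groupby(level='one').sample(n=1, random_state=0)

print(picked_apply)
print(picked_direct)
```

Dropping random_state restores the behaviour described in the note, where each call may return different rows for 'B' and 'C'.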

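As an alternative, here is a potentially more performant variant, which pulls the rows straight out of df by position. The definition of g is not shown; judging by the output it is presumably df.groupby('one'):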

In [21]: df.iloc[[np.random.choice(x) for x in g.indices.values()]]
Out[21]:
   four one  three  two
1   5.0   B    5.0  5.0
3   3.0   C    3.0  3.0
0   6.0   A    6.0  6.0
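The transcript above leaves out the imports and the definition of g. A self-contained sketch follows, under the assumption that g is df.groupby('one'); the seeded generator rng and the names positions and sampled are introduced here for illustration only.

```python
import numpy as np
import pandas as pd

d = {'one': ['A', 'B', 'B', 'C', 'C', 'C'], 'two': [6., 5., 4., 3., 2., 1.],
     'three': [6., 5., 4., 3., 2., 1.], 'four': [6., 5., 4., 3., 2., 1.]}
df = pd.DataFrame(d)

# Assumption: g groups the original frame (not the pivot) by column 'one'.
g = df.groupby('one')

# Seeded only so the sketch is repeatable; drop the seed for a fresh draw.
rng = np.random.default_rng(0)

# g.indices maps each group label to the integer positions of its rows,
# so one position is drawn per group and the rows are fetched in one iloc call.
positions = [rng.choice(pos) for pos in g.indices.values()]
sampled = df.iloc[positions]

print(sampled)
```

Because only integer positions are drawn, no intermediate DataFrame is built per group, which is likely where any speed-up over apply comes from.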

Impressive, Mr. Hayden. Most impressive. :)