How to repeat a command in Python (bootstrap resampling)


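I have a DataFrame (4 data points long) and want to bootstrap it X times.

Example DataFrame:

      Index  A  B
      0      1  2
      1      1  2
      2      1  2
      3      1  2

I found this code for bootstrap resampling: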

      from sklearn.utils import resample  # assumption: resample is scikit-learn's sklearn.utils.resample

      boot = resample(df, replace=True, n_samples=len(df), random_state=1)
      print('Bootstrap Sample: %s' % boot)

But now I want to repeat this X times. How can I do that?

Desired output for X = 20:

  Sample Nr.    Index A B
      1         0   1 2
                1   1 2
                2   1 2
                3   1 2 
     ...
      20        0   1 2
                1   1 2
                1   1 2
                2   1 2

Thank you all.

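Best answer

Method 1: Sample the data in parallel

Since the task amounts to calling a DataFrame's sample method n times, you can consider running those sample calls in parallel.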

import multiprocessing
from itertools import repeat

def sample_data(df, replace, random_state):
    '''Generate one sample of size len(df)'''
    return df.sample(replace=replace, n=len(df), random_state=random_state)

def resample_data(df, replace, n_samples, random_state):
    '''Call sample_data n_samples times in parallel and return the samples as a list'''

    # One worker per CPU core; starmap unpacks each (df, replace, random_state) tuple
    pool = multiprocessing.Pool(multiprocessing.cpu_count())
    bootstrap_samples = pool.starmap(sample_data, zip(repeat(df, n_samples), repeat(replace), repeat(random_state)))
    pool.close()
    pool.join()

    return bootstrap_samples

Now, if I want to generate 15 samples, resample_data returns a list containing 15 samples from df:

samples = resample_data(df, True, n_samples=15, random_state=1)

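Note that with a fixed integer random_state every call draws the same rows, so all samples are identical. To get different results, set random_state to None.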
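If the samples should differ from each other and still be reproducible, one option is to give each sample its own seed. The following is a minimal sketch of that idea, not part of the original answer: the helper name bootstrap_with_seeds and the base_seed parameter are hypothetical, and the toy values are chosen only so that rows are distinguishable. It also sidesteps a possible pitfall of random_state=None in forked worker processes, where the inherited global NumPy random state can lead to repeated samples.

```python
import pandas as pd


def bootstrap_with_seeds(df, n_samples, base_seed=0):
    """Return n_samples bootstrap samples, each with len(df) rows.

    Sample i is drawn with random_state=base_seed + i, so the samples
    use different seeds but the whole list is reproducible.
    """
    return [
        df.sample(n=len(df), replace=True, random_state=base_seed + i)
        for i in range(n_samples)
    ]


# Toy data (hypothetical values, not the data from the question)
df = pd.DataFrame({"A": [1, 2, 3, 4], "B": [10, 20, 30, 40]})

samples = bootstrap_with_seeds(df, n_samples=5, base_seed=1)
again = bootstrap_with_seeds(df, n_samples=5, base_seed=1)  # same seeds, same samples
```

Running the helper twice with the same base_seed gives the same list of samples, which makes a bootstrap analysis repeatable.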

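Method 2: Sample the data sequentially

Another way to sample the data is with a list comprehension. Since the function sample_data is already defined, calling it inside a list comprehension is straightforward: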

def resample_data_linearly(df, replace, n_samples, random_state):
    
    return [sample_data(df, replace, random_state) for _ in range(n_samples)] 

# Generate 10 samples of size len(df)
samples = resample_data_linearly(df, True, n_samples=10, random_state=1)
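The desired output in the question groups the rows under a sample number. One way to get that layout, sketched here with pandas' concat and its keys and names arguments, is to stack the list of samples into a single DataFrame. The toy df and the count of 20 mirror the question; the variable name combined and the per-sample seeds are illustrative choices, not part of the original answer.

```python
import pandas as pd

# Toy data matching the question's example DataFrame
df = pd.DataFrame({"A": [1, 1, 1, 1], "B": [2, 2, 2, 2]})

n_samples = 20

# One bootstrap sample per iteration, each with len(df) rows;
# a different seed per sample keeps the samples from being identical
samples = [
    df.sample(n=len(df), replace=True, random_state=i)
    for i in range(n_samples)
]

# Stack the samples; the keys become an outer index level "Sample Nr."
combined = pd.concat(
    samples,
    keys=list(range(1, n_samples + 1)),
    names=["Sample Nr.", "Index"],
)
print(combined)
```

With everything in one DataFrame, per-sample statistics follow from a groupby on the outer level, for example combined.groupby(level="Sample Nr.").mean().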

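Do you mean you want to get n different bootstrap samples from your data?

Yes, exactly @MiguelTrejo. The code above can only create one bootstrap sample, but I want X of them (possibly > 1000). Thank you very much.

Do you mean the sample function or the resample function? Are the arguments you specified meant for the sample function?

For the resample function. To explain more clearly: 1) we have the original data; 2) we create X resampled datasets from the original data. The code boot = resample(df, replace=True, n_samples=len(df), random_state=1) print('Bootstrap Sample: %s' % boot) creates only 1 resampled dataset from the original data. --> So the goal is to create more resampled datasets from the original data (repeated resampling).

@ Thank you very much. But the output seems to produce a lot of errors: (An attempt has been made to start a new process before the current process has finished its bootstrapping phase. This probably means that you are not using fork to start your child processes and you have forgotten to use the proper idiom in the main module:) for example. @MiguelTrejo

And at the same time, each new sample must have the same length as the original sample. (So n_samples=15 does not make 15 new samples; it creates one new sample with only 15 data points from the original sample.)

The problem seems to be specific to Windows. Maybe you could run your code in a Docker container; it works fine on Linux. If the sample size should be the size of the data, that can be changed; I will edit. @LinhTran

If you can use a list comprehension, see the code example in the latest edit. I hope this works for you.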
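The error quoted in the comments is raised when worker processes are started by spawning rather than forking (the default on Windows) and the script creates its pool at import time. The "proper idiom" the message refers to is the if __name__ == "__main__": guard. Below is a minimal sketch of the parallel approach written that way. The function names one_bootstrap_sample and resample_in_parallel and the base_seed parameter are hypothetical, and the code assumes it is saved and run as a script rather than typed into an interactive session.

```python
import multiprocessing

import pandas as pd


def one_bootstrap_sample(df, seed):
    """Draw one bootstrap sample with as many rows as df."""
    return df.sample(n=len(df), replace=True, random_state=seed)


def resample_in_parallel(df, n_samples, base_seed=0):
    """Draw n_samples bootstrap samples using a pool of worker processes."""
    # One (df, seed) task per sample; each sample gets its own seed
    tasks = [(df, base_seed + i) for i in range(n_samples)]
    with multiprocessing.Pool() as pool:
        return pool.starmap(one_bootstrap_sample, tasks)


if __name__ == "__main__":
    # Only the parent process runs this block. Spawned workers re-import
    # the module and skip it, so they do not try to start pools themselves.
    df = pd.DataFrame({"A": [1, 1, 1, 1], "B": [2, 2, 2, 2]})
    samples = resample_in_parallel(df, n_samples=1000)
    print(len(samples))
```

For a DataFrame this small, the list comprehension from Method 2 is likely to be faster than a process pool, because starting workers and sending the data to them has a cost of its own.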