Python: Getting random data from a DataFrame (Python, Python 3.x, Pandas, Dataframe, Random) - Fatal编程技术网

I have a df with the following values:

name     algo      accuracy
tom       1         88
tommy     2         87
mark      1         88
stuart    3         100
alex      2         99
lincoln   1         88
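
How can I randomly pick 4 records from df, with the condition that at least one record is picked for each unique value of the algo column? Here the algo column has only 3 unique values (1, 2, 3).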

Sample output:

name     algo      accuracy
tom       1         88
tommy     2         87
stuart    3         100
lincoln   1         88

Sample output 2:

name     algo      accuracy
mark      1         88
stuart    3         100
alex      2         99
lincoln   1         88
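To try out the answers, here is a minimal sketch that rebuilds the example DataFrame from the question (assuming the default 0..5 integer index) and checks whether a candidate sample satisfies the condition. The helper name `is_valid_sample` is hypothetical and not part of pandas.

```python
import pandas as pd

# The example data from the question (default RangeIndex 0..5 assumed)
df = pd.DataFrame({
    'name': ['tom', 'tommy', 'mark', 'stuart', 'alex', 'lincoln'],
    'algo': [1, 2, 1, 3, 2, 1],
    'accuracy': [88, 87, 88, 100, 99, 88],
})

def is_valid_sample(sample: pd.DataFrame, df: pd.DataFrame, n: int) -> bool:
    """Hypothetical checker: n distinct rows of df covering every algo value."""
    return bool(
        len(sample) == n
        and sample.index.is_unique
        and sample.index.isin(df.index).all()
        and set(sample['algo']) == set(df['algo'])
    )

# Sample output 1 from the question: tom, tommy, stuart, lincoln
print(is_valid_sample(df.loc[[0, 1, 3, 5]], df, 4))  # True
# No row with algo 3 (stuart is missing), so the condition fails
print(is_valid_sample(df.loc[[0, 1, 2, 4]], df, 4))  # False
```

Both sample outputs in the question pass this check; any 4-row pick that leaves out stuart does not, since he is the only row with algo 3.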

One way:

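import pandas as pd

num_sample, num_algo = 4, 3

# sample one row for each algo (GroupBy.sample requires pandas >= 1.1);
# num_sample // num_algo == 1 here
out = df.groupby('algo').sample(n=num_sample//num_algo)

# add the remaining row(s), drawn from rows that were not selected yet;
# DataFrame.append was removed in pandas 2.0, so use pd.concat
out = pd.concat([out, df.drop(out.index).sample(n=num_sample - len(out))])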
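The same idea can be wrapped in a reusable function. This is a sketch under a few assumptions: the function name `sample_one_per_group_then_fill` and its `col`, `n` and `random_state` parameters are hypothetical, and it takes exactly one row per group before filling the rest uniformly from the unselected rows.

```python
import pandas as pd

def sample_one_per_group_then_fill(df, col, n, random_state=None):
    """Hypothetical helper: one random row per unique value of `col`,
    then fill up to n rows from the rows not yet chosen."""
    k = df[col].nunique()
    if n < k:
        raise ValueError(f"n={n} is smaller than the {k} groups in {col!r}")
    # one row per group (GroupBy.sample requires pandas >= 1.1)
    first = df.groupby(col).sample(n=1, random_state=random_state)
    # the remaining n - k rows are drawn uniformly from the unselected rows
    rest = df.drop(first.index).sample(n=n - k, random_state=random_state)
    return pd.concat([first, rest])

# Usage with the question's data
df = pd.DataFrame({
    'name': ['tom', 'tommy', 'mark', 'stuart', 'alex', 'lincoln'],
    'algo': [1, 2, 1, 3, 2, 1],
    'accuracy': [88, 87, 88, 100, 99, 88],
})
out = sample_one_per_group_then_fill(df, 'algo', 4, random_state=0)
print(out)
```

Passing an integer `random_state` makes the result reproducible; leaving it as `None` gives a fresh random draw each call.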

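Another way is to shuffle the whole data, enumerate the rows within each algo, sort by that enumeration, and take the required number of samples. This takes slightly more code than the first approach, but it is cheaper and produces more balanced algo counts, because rows are taken round-robin across the algos: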

# shuffle the algo column (the index labels move with it)
df_random = df['algo'].sample(frac=1)

# enumerate rows with the same algo: 0, 1, 2, ... within each group
enums = df_random.groupby(df_random).cumcount()

# sort by that enumeration; a stable sort keeps the shuffled order among ties
enums = enums.sort_values(kind='stable')

# the first num_sample index labels are the sampled rows,
# so we can select them with `loc`
out = df.loc[enums.index[:num_sample]]
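As a function, the shuffle-and-enumerate approach might look like the sketch below. The name `sample_round_robin` and its parameters are hypothetical. Every group has exactly one row with rank 0, so as long as `n` is at least the number of unique values, each algo is guaranteed to appear.

```python
import pandas as pd

def sample_round_robin(df, col, n, random_state=None):
    """Hypothetical helper: shuffle, rank rows within each group of `col`,
    and keep the n rows with the lowest within-group rank."""
    shuffled = df[col].sample(frac=1, random_state=random_state)
    # 0 for the first (random) row of each group, 1 for the second, ...
    rank = shuffled.groupby(shuffled).cumcount()
    # stable sort keeps the shuffled order among rows with equal rank
    order = rank.sort_values(kind='stable')
    return df.loc[order.index[:n]]

# Usage with the question's data
df = pd.DataFrame({
    'name': ['tom', 'tommy', 'mark', 'stuart', 'alex', 'lincoln'],
    'algo': [1, 2, 1, 3, 2, 1],
    'accuracy': [88, 87, 88, 100, 99, 88],
})
out = sample_round_robin(df, 'algo', 4, random_state=0)
print(out['algo'].value_counts())
```

With this data, n=4 always yields one row per algo plus a second row from algo 1 or algo 2, so no algo appears more than twice. The first approach, by contrast, can also draw its extra row from the same algo 1 pool more often simply because algo 1 has the most rows.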