Python: randomly split a DataFrame by unique values


I have a DataFrame df that looks like this:

|  A    |  B  | ... |
---------------------
| one   | ... | ... |
| one   | ... | ... |
| one   | ... | ... |
| two   | ... | ... |
| three | ... | ... |
| three | ... | ... |
| four  | ... | ... |
| five  | ... | ... |
| five  | ... | ... |

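As you can see, column A has 5 unique values. I want to split the DataFrame randomly; for example, df1 should contain 3 of the unique values and df2 the other 2. The problem is that the values repeat across rows (one value can appear in several rows), and I don't want the rows belonging to a single value to be split across the two DataFrames.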

So the resulting DataFrames could look like this:


DataFrame df1, with 3 unique values:

|  A    |  B  | ... |
---------------------
| one   | ... | ... |
| one   | ... | ... |
| one   | ... | ... |
| three | ... | ... |
| three | ... | ... |
| five  | ... | ... |
| five  | ... | ... |

DataFrame df2, with 2 unique values:

|  A    |  B  | ... |
---------------------
| two   | ... | ... |
| four  | ... | ... |

Is there an easy way to do this? I thought about grouping, but I don't know how to get started with it.

import numpy as np

v = df['A'].unique()           # Get the unique values of column A
np.random.shuffle(v)           # Shuffle them in place
v1, v2 = np.array_split(v, 2)  # Split the unique values into two arrays

Finally, index the DataFrame with the .isin() method to get the desired result:

r1 = df[df['A'].isin(v1)]
r2 = df[df['A'].isin(v2)]
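The snippets above can be combined into one self-contained sketch. It assumes an example df built like the data in the question, and it uses NumPy's Generator API (np.random.default_rng) instead of np.random.shuffle so that a seed makes the split reproducible; the seed value 42 is arbitrary.

```python
import numpy as np
import pandas as pd

# Example data shaped like the question's table (column B is just a row counter)
df = pd.DataFrame({
    "A": ["one", "one", "one", "two", "three", "three", "four", "five", "five"],
    "B": range(9),
})

rng = np.random.default_rng(42)          # seeded generator so the split is reproducible
v = rng.permutation(df["A"].unique())    # unique values of A in random order
v1, v2 = np.array_split(v, 2)            # 5 values -> sizes 3 and 2

r1 = df[df["A"].isin(v1)]                # every row whose A value landed in v1
r2 = df[df["A"].isin(v2)]                # every row whose A value landed in v2
```

Because the split happens on the unique values rather than on the rows, all rows sharing a value of A always end up in the same frame.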

Setup

import pandas as pd

df = pd.DataFrame({'A': {0: 'one',
  1: 'one',
  2: 'one',
  3: 'two',
  4: 'three',
  5: 'three',
  6: 'four',
  7: 'five',
  8: 'five'},
 'B': {0: 0, 1: 1, 2: 2, 3: 3, 4: 4, 5: 5, 6: 6, 7: 7, 8: 8}})

Solution

# Get 2 unique keys from column A for df1. You can control the split either
# by an absolute count (n=...) or by a fraction (frac=...); see the docs for .sample().
df1_keys = df.A.drop_duplicates().sample(2)
df1 = df[df.A.isin(df1_keys)]
#anything not in df1_keys will be assigned to df2
df2 = df[~df.A.isin(df1_keys)]
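A minimal sketch wrapping this sample()-based solution in a reusable helper. The function name split_by_unique is hypothetical; n and frac are passed straight through to pandas' Series.sample (pass one of them), and random_state makes the split reproducible.

```python
import pandas as pd

def split_by_unique(df, col, n=None, frac=None, random_state=None):
    """Split df into two frames so that each value of `col` lands in exactly one.

    n is the number of unique values for the first frame, frac the fraction
    of unique values; both are forwarded to Series.sample.
    """
    keys = df[col].drop_duplicates().sample(n=n, frac=frac, random_state=random_state)
    mask = df[col].isin(keys)   # True for rows whose value was sampled
    return df[mask], df[~mask]

df = pd.DataFrame({
    "A": ["one", "one", "one", "two", "three", "three", "four", "five", "five"],
    "B": range(9),
})
df1, df2 = split_by_unique(df, "A", n=2, random_state=0)        # 2 values vs. 3
df3, df4 = split_by_unique(df, "A", frac=0.6, random_state=0)   # 60% of 5 values -> 3
```

Using one boolean mask and its negation guarantees that the two frames together contain every row exactly once.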

df1_keys
Out[294]: 
7    five
0     one
Name: A, dtype: object

df1
Out[295]: 
      A  B
0   one  0
1   one  1
2   one  2
7  five  7
8  five  8

df2
Out[296]: 
       A  B
3    two  3
4  three  4
5  three  5
6   four  6

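You need to extract the unique values of column A into a list, split that list into two lists, and then select rows from the DataFrame based on each of those lists.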
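Since the question mentions grouping as a starting point, the same extract, split, and select idea can also be sketched on top of groupby: GroupBy.ngroup() gives each row the integer id of its group, a random permutation of those ids picks whole groups, and a boolean mask selects the rows. The helper split_groups and its parameters are hypothetical, chosen for illustration.

```python
import numpy as np
import pandas as pd

def split_groups(df, col, n_first, seed=None):
    """Randomly assign n_first whole groups of `col` to the first frame, the rest to the second."""
    gb = df.groupby(col, sort=False)
    codes = gb.ngroup()                               # group id (0..ngroups-1) for every row
    rng = np.random.default_rng(seed)
    chosen = rng.permutation(gb.ngroups)[:n_first]    # ids of the groups for the first frame
    mask = codes.isin(chosen)
    return df[mask], df[~mask]

df = pd.DataFrame({
    "A": ["one", "one", "one", "two", "three", "three", "four", "five", "five"],
    "B": range(9),
})
part1, part2 = split_groups(df, "A", n_first=3, seed=1)
```

Working with group ids instead of the values themselves avoids building any intermediate lists, and the mask approach still works when n_first is 0 or equal to the number of groups (one of the frames is simply empty).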