Python: Removing the duplicates of one column when multiple duplicated columns exist

python pandas

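I have a DataFrame with several duplicated columns, but I only want to remove the duplicates of the "class" column while leaving the other duplicated columns intact. As you can see below, many columns are duplicated. However, I only want to deduplicate the "class" columns and keep a single copy. The other columns should stay as they are, and the rows should not change.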

DataFrame:

import pandas as pd

train = pd.DataFrame({'class': {0: 1,
1: 2,
2: 3,
3: 4,
4: 5,
5: 6,
6: 7,
7: 8,
8: 1,
9: 2,
10: 3,
11: 4,
12: 5,
13: 6,
14: 7,
15: 8},
'class.1': {0: 1,
1: 2,
2: 3,
3: 4,
4: 5,
5: 6,
6: 7,
7: 8,
8: 1,
9: 2,
10: 3,
11: 4,
12: 5,
13: 6,
14: 7,
15: 8},
'class.2': {0: 1,
1: 2,
2: 3,
3: 4,
4: 5,
5: 6,
6: 7,
7: 8,
8: 1,
9: 2,
10: 3,
11: 4,
12: 5,
13: 6,
14: 7,
15: 8},
'x_feature_1': {0: -0.30424321,
1: 1.6273111,
2: 0.66127653,
3: 0.0051847840000000004,
4: 1.2861978,
5: -0.47925246,
6: 1.4743277,
7: 0.30530296,
8: -0.30424321,
9: 1.6273111,
10: 0.66127653,
11: 0.0051847840000000004,
12: 1.2861978,
13: -0.47925246,
14: 1.4743277,
15: 0.30530296},
'x_feature_1.1': {0: -0.30424321,
1: 1.6273111,
2: 0.66127653,
3: 0.0051847840000000004,
4: 1.2861978,
5: -0.47925246,
6: 1.4743277,
7: 0.30530296,
8: -0.30424321,
9: 1.6273111,
10: 0.66127653,
11: 0.0051847840000000004,
12: 1.2861978,
13: -0.47925246,
14: 1.4743277,
15: 0.30530296},
'x_feature_2': {0: -0.30424321,
1: 1.6273111,
2: 0.66127653,
3: 0.0051847840000000004,
4: 1.2861978,
5: -0.47925246,
6: 1.4743277,
7: 0.30530296,
8: -0.30424321,
9: 1.6273111,
10: 0.66127653,
11: 0.0051847840000000004,
12: 1.2861978,
13: -0.47925246,
14: 1.4743277,
15: 0.30530296},
'y_feature_1': {0: -0.30424321,
1: 1.6273111,
2: 0.66127653,
3: 0.0051847840000000004,
4: 1.2861978,
5: -0.47925246,
6: 1.4743277,
7: 0.30530296,
8: -0.30424321,
9: 1.6273111,
10: 0.66127653,
11: 0.0051847840000000004,
12: 1.2861978,
13: -0.47925246,
14: 1.4743277,
15: 0.30530296},
'y_feature_2': {0: -0.30424321,
1: 1.6273111,
2: 0.66127653,
3: 0.0051847840000000004,
4: 1.2861978,
5: -0.47925246,
6: 1.4743277,
7: 0.30530296,
8: -0.30424321,
9: 1.6273111,
10: 0.66127653,
11: 0.0051847840000000004,
12: 1.2861978,
13: -0.47925246,
14: 1.4743277,
15: 0.30530296},
'y_feature_2.1': {0: -0.30424321,
1: 1.6273111,
2: 0.66127653,
3: 0.0051847840000000004,
4: 1.2861978,
5: -0.47925246,
6: 1.4743277,
7: 0.30530296,
8: -0.30424321,
9: 1.6273111,
10: 0.66127653,
11: 0.0051847840000000004,
12: 1.2861978,
13: -0.47925246,
14: 1.4743277,
15: 0.30530296},
'z_feature_1': {0: -0.30424321,
1: 1.6273111,
2: 0.66127653,
3: 0.0051847840000000004,
4: 1.2861978,
5: -0.47925246,
6: 1.4743277,
7: 0.30530296,
8: -0.30424321,
9: 1.6273111,
10: 0.66127653,
11: 0.0051847840000000004,
12: 1.2861978,
13: -0.47925246,
14: 1.4743277,
15: 0.30530296},
'z_feature_1.1': {0: -0.30424321,
1: 1.6273111,
2: 0.66127653,
3: 0.0051847840000000004,
4: 1.2861978,
5: -0.47925246,
6: 1.4743277,
7: 0.30530296,
8: -0.30424321,
9: 1.6273111,
10: 0.66127653,
11: 0.0051847840000000004,
12: 1.2861978,
13: -0.47925246,
14: 1.4743277,
15: 0.30530296},
'z_feature_2': {0: -0.30424321,
1: 1.6273111,
2: 0.66127653,
3: 0.0051847840000000004,
4: 1.2861978,
5: -0.47925246,
6: 1.4743277,
7: 0.30530296,
8: -0.30424321,
9: 1.6273111,
10: 0.66127653,
11: 0.0051847840000000004,
12: 1.2861978,
13: -0.47925246,
14: 1.4743277,
15: 0.30530296}})

Expected:

expected = pd.DataFrame({'class': {0: 1,
1: 2,
2: 3,
3: 4,
4: 5,
5: 6,
6: 7,
7: 8,
8: 1,
9: 2,
10: 3,
11: 4,
12: 5,
13: 6,
14: 7,
15: 8},
'x_feature_1': {0: -0.30424321,
1: 1.6273111,
2: 0.66127653,
3: 0.0051847840000000004,
4: 1.2861978,
5: -0.47925246,
6: 1.4743277,
7: 0.30530296,
8: -0.30424321,
9: 1.6273111,
10: 0.66127653,
11: 0.0051847840000000004,
12: 1.2861978,
13: -0.47925246,
14: 1.4743277,
15: 0.30530296},
'x_feature_1.1': {0: -0.30424321,
1: 1.6273111,
2: 0.66127653,
3: 0.0051847840000000004,
4: 1.2861978,
5: -0.47925246,
6: 1.4743277,
7: 0.30530296,
8: -0.30424321,
9: 1.6273111,
10: 0.66127653,
11: 0.0051847840000000004,
12: 1.2861978,
13: -0.47925246,
14: 1.4743277,
15: 0.30530296},
'x_feature_2': {0: -0.30424321,
1: 1.6273111,
2: 0.66127653,
3: 0.0051847840000000004,
4: 1.2861978,
5: -0.47925246,
6: 1.4743277,
7: 0.30530296,
8: -0.30424321,
9: 1.6273111,
10: 0.66127653,
11: 0.0051847840000000004,
12: 1.2861978,
13: -0.47925246,
14: 1.4743277,
15: 0.30530296},
'y_feature_1': {0: -0.30424321,
1: 1.6273111,
2: 0.66127653,
3: 0.0051847840000000004,
4: 1.2861978,
5: -0.47925246,
6: 1.4743277,
7: 0.30530296,
8: -0.30424321,
9: 1.6273111,
10: 0.66127653,
11: 0.0051847840000000004,
12: 1.2861978,
13: -0.47925246,
14: 1.4743277,
15: 0.30530296},
'y_feature_2': {0: -0.30424321,
1: 1.6273111,
2: 0.66127653,
3: 0.0051847840000000004,
4: 1.2861978,
5: -0.47925246,
6: 1.4743277,
7: 0.30530296,
8: -0.30424321,
9: 1.6273111,
10: 0.66127653,
11: 0.0051847840000000004,
12: 1.2861978,
13: -0.47925246,
14: 1.4743277,
15: 0.30530296},
'y_feature_2.1': {0: -0.30424321,
1: 1.6273111,
2: 0.66127653,
3: 0.0051847840000000004,
4: 1.2861978,
5: -0.47925246,
6: 1.4743277,
7: 0.30530296,
8: -0.30424321,
9: 1.6273111,
10: 0.66127653,
11: 0.0051847840000000004,
12: 1.2861978,
13: -0.47925246,
14: 1.4743277,
15: 0.30530296},
'z_feature_1': {0: -0.30424321,
1: 1.6273111,
2: 0.66127653,
3: 0.0051847840000000004,
4: 1.2861978,
5: -0.47925246,
6: 1.4743277
IndexingError: Unalignable boolean Series provided as indexer (index of the boolean Series and of the indexed object do not match).

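# True for every column whose name starts with 'class'
m1 = train.columns.str.startswith('class')
# True when the name before the '.N' suffix already appeared in an earlier column
m2 = train.columns.str.split('.').str[0].duplicated()
# Keep a column unless it is both a 'class' column and a repeat
train = train.loc[:, ~m1 | ~m2]
print(train)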
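
To make the two masks easier to follow, here is a minimal sketch on a much smaller frame. The frame `df`, its values, and the names `is_class`, `is_repeat` and `result` are made up for illustration; only the column naming pattern is taken from the question. Both masks are plain boolean arrays built from `df.columns`, so `.loc` applies them by position and there is no Series index to align.

```python
import pandas as pd

# Hypothetical miniature of the question's frame: same naming pattern, fewer rows.
df = pd.DataFrame({
    'class': [1, 2, 3],
    'class.1': [1, 2, 3],
    'class.2': [1, 2, 3],
    'x_feature_1': [0.1, 0.2, 0.3],
    'x_feature_1.1': [0.1, 0.2, 0.3],
    'y_feature_2': [0.4, 0.5, 0.6],
})

# True for every column named 'class' or 'class.N'.
is_class = df.columns.str.startswith('class')
# True when the part before the first '.' was already seen in an earlier column.
is_repeat = df.columns.str.split('.').str[0].duplicated()

# Drop only the columns that are both a 'class' column and a repeat.
result = df.loc[:, ~(is_class & is_repeat)]
print(list(result.columns))
# ['class', 'x_feature_1', 'x_feature_1.1', 'y_feature_2']
```

`~(is_class & is_repeat)` is the same condition as `~m1 | ~m2` above, written the other way round: a column is removed only when both masks are true, so `x_feature_1.1` survives even though its base name repeats.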