Python: dropping a set of features from a df throws "The truth value of a Series is ambiguous" error
I have a dataset with a large number of features. I applied feature-selection filters and stored the names of the selected features in 4 arrays. I want to drop the features that were not selected.
import pandas as pd

df = pd.read_excel("Anonymizeddataset.xlsx")
df = df.fillna(0)
# the 4 arrays of selected feature names (defined elsewhere)
features_selected_with_nan_value
KBest_select_feature
features_selected_with_mean_value
laso_selected_features
def drop_features(features):
    for index, row in df.iterrows():
        for i in range(len(features)):
            if row != features[i]:  # raises the ValueError: row is a Series, not a column name
                df_with_selected_features = df.drop([row], axis = 1, inplace = True)
    return df_with_selected_features
But it throws an error:
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
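The error most likely comes from `if row != features[i]:`. `df.iterrows()` yields each row as a Series, so comparing it with a string gives a boolean Series (one value per column), and `if` cannot reduce that to a single True/False. A minimal reproduction on a small made-up frame (the column names and values are illustrative only, not the real dataset):

```python
import pandas as pd

# Small illustrative frame standing in for the real dataset
df = pd.DataFrame({"Target": [5704.7, 3200.0], "Predictor 1": [98.013498, 51.224883]})

_, row = next(df.iterrows())       # row is a Series indexed by the column names
comparison = row != "Predictor 1"  # element-wise comparison: a boolean Series, not a bool
print(type(comparison).__name__)   # Series

message = None
try:
    if comparison:                 # ambiguous: which element should decide?
        pass
except ValueError as exc:
    message = str(exc)
print(message)
```

Note also that `iterrows` walks over rows, while the feature names to compare against live in `df.columns`.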
Dataset:
   Target  Predictor 1  Predictor 2  Predictor 3  Predictor 4  Predictor 5  Predictor 6  Predictor 7  Predictor 8  Predictor 9  ...  Predictor 1065  Predictor 1066  Predictor 1067  Predictor 1068  Predictor 1069  Predictor 1070  Predictor 1071  Predictor 1072  Predictor 1073  Predictor 1074
0  5704.7    98.013498    98.380881    66.012913    21.447560          0.0          0.0          0.0    57.549196           12  ...             0.0             0.0             0.0             0.0             0.0             0.0             0.0             0.0             0.0             0.0
1  3200.0    51.224883    98.380881    70.885204    21.447560          0.0          0.0          0.0    57.549196           13  ...             0.0             0.0             0.0             0.0             0.0             0.0             0.0             0.0             0.0             0.0
2  6487.9    44.563802    98.380881    85.757141    21.447560          0.0          0.0          0.0    57.549196           13  ...             0.0             0.0             0.0             0.0             0.0             0.0             0.0             0.0             0.0             0.0
3  1278.3    65.039616    98.380881    18.380713    87.745614          0.0          0.0          0.0    57.549196           13  ...             0.0             0.0             0.0             0.0             0.0             0.0             0.0             0.0             0.0             0.0
4  1368.5     1.905928    98.380881    96.797313    87.745614          0.0          0.0          0.0    57.549196           13  ...             0.0             0.0             0.0             0.0             0.0             0.0             0.0             0.0             0.0             0.0
5 rows × 1075 columns
features_selected_with_nan_value array:
['Predictor 387', 'Predictor 381', 'Predictor 383', 'Predictor 376', 'Predictor 28', 'Predictor 35', 'Predictor 4', 'Predictor 37', 'Predictor 34', 'Predictor 19', 'Predictor 16', 'Predictor 17', 'Predictor 25', 'Predictor 880', 'Predictor 856', 'Predictor 849', 'Predictor 851', 'Predictor 852', 'Predictor 857', 'Predictor 853', 'Predictor 855', 'Predictor 850', 'Predictor 854', 'Predictor 40', 'Predictor 881', 'Predictor 882', 'Predictor 883', 'Predictor 884', 'Predictor 1015', 'Predictor 487', 'Predictor 738', 'Predictor 476', 'Predictor 473', 'Predictor 749', 'Predictor 604', 'Predictor 607', 'Predictor 618', 'Predictor 848', 'Predictor 1014', 'Predictor 1007', 'Predictor 1012', 'Predictor 979', 'Predictor 344', 'Predictor 345', 'Predictor 356', 'Predictor 392', 'Predictor 858', 'Predictor 859', 'Predictor 860', 'Predictor 861', 'Predictor 879', 'Predictor 862', 'Predictor 863', 'Predictor 980', 'Predictor 864', 'Predictor 878', 'Predictor 865', 'Predictor 877', 'Predictor 866', 'Predictor 867', 'Predictor 869', 'Predictor 870', 'Predictor 871', 'Predictor 872', 'Predictor 873', 'Predictor 874', 'Predictor 876', 'Predictor 735', 'Predictor 981', 'Predictor 982', 'Predictor 983', 'Predictor 1011', 'Predictor 1010', 'Predictor 1009', 'Predictor 1008', 'Predictor 875', 'Predictor 1006', 'Predictor 1005', 'Predictor 1004', 'Predictor 1003', 'Predictor 1002', 'Predictor 1001', 'Predictor 1000', 'Predictor 342', 'Predictor 998', 'Predictor 997', 'Predictor 996', 'Predictor 995', 'Predictor 994', 'Predictor 992', 'Predictor 991', 'Predictor 990', 'Predictor 989', 'Predictor 988', 'Predictor 987', 'Predictor 986', 'Predictor 985', 'Predictor 984', 'Predictor 1013', 'Predictor 993']
What am I doing wrong?
If I understand your question correctly, you can do this:
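import pandas as pd
df = pd.read_excel("Anonymizeddataset.xlsx")
df = df.fillna(0)
# Combine the four selections; the parentheses let the expression span two lines
list_columns = (list(features_selected_with_nan_value) + list(KBest_select_feature) +
                list(features_selected_with_mean_value) + list(laso_selected_features))
# Remove names chosen by more than one method (keeps first occurrence order)
list_columns = list(dict.fromkeys(list_columns))
df = df[list_columns]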
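If you would rather keep a `drop_features` helper like the one in the question, a sketch of a loop-free version works on `df.columns` instead of rows and returns a new frame instead of using `inplace=True` (which makes `drop` return `None`, so the original `df_with_selected_features` would end up as `None`). It assumes the `Target` column should always survive; the `keep` parameter is a hypothetical addition for that, and the frame below is a tiny stand-in built from the first two rows shown above.

```python
import pandas as pd

def drop_features(df: pd.DataFrame, features, keep=("Target",)) -> pd.DataFrame:
    """Return a copy of df holding only the selected features plus the `keep` columns."""
    wanted = set(features) | set(keep)
    # Look at column labels, not rows: every label that was not selected is dropped
    to_drop = [col for col in df.columns if col not in wanted]
    return df.drop(columns=to_drop)

# Tiny illustrative frame standing in for the real dataset
df = pd.DataFrame({
    "Target": [5704.7, 3200.0],
    "Predictor 1": [98.013498, 51.224883],
    "Predictor 2": [98.380881, 98.380881],
    "Predictor 3": [66.012913, 70.885204],
})
result = drop_features(df, ["Predictor 3", "Predictor 1"])
print(list(result.columns))  # ['Target', 'Predictor 1', 'Predictor 3']
```

Each of the four arrays can then be passed in turn, e.g. `drop_features(df, features_selected_with_nan_value)`, giving one reduced frame per selection method while leaving `df` itself unchanged.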
Could you provide some sample data and the expected output? I suspect there is a simpler way to filter that avoids the loop.
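One loop-free option is a boolean mask over the column labels built with `Index.isin`. A minimal sketch on a made-up frame follows; keeping `Target` is an assumption, and `"Predictor 999"` is a hypothetical name included only to show that labels absent from the frame are ignored rather than raising `KeyError` as `df[list_columns]` would.

```python
import pandas as pd

# Tiny illustrative frame standing in for the real dataset
df = pd.DataFrame({
    "Target": [5704.7, 3200.0],
    "Predictor 1": [98.013498, 51.224883],
    "Predictor 2": [98.380881, 98.380881],
    "Predictor 3": [66.012913, 70.885204],
})
# "Predictor 999" is hypothetical and not a column of df
selected = ["Predictor 3", "Predictor 1", "Predictor 999"]

# One vectorized membership test over the column labels; no Python loop over rows
mask = df.columns.isin(selected + ["Target"])
filtered = df.loc[:, mask]
print(list(filtered.columns))  # ['Target', 'Predictor 1', 'Predictor 3']
```

Because the mask is evaluated once per existing column, a name that appears in several of the four arrays still yields a single column, and the original column order is preserved.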