Pandas 删除基于列名的重复列_Pandas_Dataframe_Duplicates

Pandas 删除基于列名的重复列

pandas dataframe

Pandas 删除基于列名的重复列,pandas,dataframe,duplicates,Pandas,Dataframe,Duplicates,是否有一种方法可以仅根据列名称删除特定的重复列。价值观并不重要。下表中有名称A、B、C、D重复的列。我只想除去第一个列之外的列A A B C D A B C D 18 75 92 24 88 75 49 7 30 56 42 58 60 72 36 47 85 84 28 84 68 49 71 9 71 70 97 99 63 71 1 14 82 63 67 23 74 38 74 37 8

是否有一种方法可以仅根据列名称删除特定的重复列。价值观并不重要。下表中有名称A、B、C、D重复的列。我只想除去第一个列之外的列A

A   B   C   D   A   B   C   D
18  75  92  24  88  75  49  7
30  56  42  58  60  72  36  47
85  84  28  84  68  49  71  9
71  70  97  99  63  71  1   14
82  63  67  23  74  38  74  37
87  11  47  69  52  24  51  50
46  73  85  55  69  15  3   26
47  100 6   42  26  76  88  54
81  100 79  59  4   19  5   89
93  67  83  51  66  52  52  84
29  26  58  54  31  30  52  17
48  29  7   79  75  90  13  95
77  40  16  30  41  92  98  68
56  11  39  45  65  69  4   75
93  83  98  83  9   21  66  89
2   10  4   29  44  95  53  67

最终结果应该是这样的

A   B   C   D   B   C   D
18  75  92  24  75  49  7
30  56  42  58  72  36  47
85  84  28  84  49  71  9
71  70  97  99  71  1   14
82  63  67  23  38  74  37
87  11  47  69  24  51  50
46  73  85  55  15  3   26
47  100 6   42  76  88  54
81  100 79  59  19  5   89
93  67  83  51  52  52  84
29  26  58  54  30  52  17
48  29  7   79  90  13  95
77  40  16  30  92  98  68
56  11  39  45  69  4   75
93  83  98  83  21  66  89
2   10  4   29  95  53  67

我已经尝试过使用它来删除找到的所有重复列，而不仅仅是一个

df = df.loc[:,~df.columns.duplicated()]

让我们试试

cumcount

s = df.columns.to_series()
out = df.loc[:,~((s.groupby(s).cumcount()==1) & (s =='A'))]
Out[113]: 
     A    B   C   D   B   C   D
0   18   75  92  24  75  49   7
1   30   56  42  58  72  36  47
2   85   84  28  84  49  71   9
3   71   70  97  99  71   1  14
4   82   63  67  23  38  74  37
5   87   11  47  69  24  51  50
6   46   73  85  55  15   3  26
7   47  100   6  42  76  88  54
8   81  100  79  59  19   5  89
9   93   67  83  51  52  52  84
10  29   26  58  54  30  52  17
11  48   29   7  79  90  13  95
12  77   40  16  30  92  98  68
13  56   11  39  45  69   4  75
14  93   83  98  83  21  66  89
15   2   10   4  29  95  53  67

让我们试试

cumcount

s = df.columns.to_series()
out = df.loc[:,~((s.groupby(s).cumcount()==1) & (s =='A'))]
Out[113]: 
     A    B   C   D   B   C   D
0   18   75  92  24  75  49   7
1   30   56  42  58  72  36  47
2   85   84  28  84  49  71   9
3   71   70  97  99  71   1  14
4   82   63  67  23  38  74  37
5   87   11  47  69  24  51  50
6   46   73  85  55  15   3  26
7   47  100   6  42  76  88  54
8   81  100  79  59  19   5  89
9   93   67  83  51  52  52  84
10  29   26  58  54  30  52  17
11  48   29   7  79  90  13  95
12  77   40  16  30  92  98  68
13  56   11  39  45  69   4  75
14  93   83  98  83  21  66  89
15   2   10   4  29  95  53  67

这应该起作用：

# Mark all columns except `A`s as True
cols = ~(df.columns == 'A')

# Add the first occurrence of `A`:
cols[list(df.columns).index('A')] = True

# Here you get `df` only with first occurrence of `A`:
df.loc[:,cols]

这应该起作用：

# Mark all columns except `A`s as True
cols = ~(df.columns == 'A')

# Add the first occurrence of `A`:
cols[list(df.columns).index('A')] = True

# Here you get `df` only with first occurrence of `A`:
df.loc[:,cols]

使用

iloc

和

get\u indexer\u for

a = df.columns.get_indexer_for(['A'])[1:]
df_final = df.iloc[:, [c for c in range(df.shape[1]) if c not in a]]

Out[1131]:
     A    B   C   D   B   C   D
0   18   75  92  24  75  49   7
1   30   56  42  58  72  36  47
2   85   84  28  84  49  71   9
3   71   70  97  99  71   1  14
4   82   63  67  23  38  74  37
5   87   11  47  69  24  51  50
6   46   73  85  55  15   3  26
7   47  100   6  42  76  88  54
8   81  100  79  59  19   5  89
9   93   67  83  51  52  52  84
10  29   26  58  54  30  52  17
11  48   29   7  79  90  13  95
12  77   40  16  30  92  98  68
13  56   11  39  45  69   4  75
14  93   83  98  83  21  66  89
15   2   10   4  29  95  53  67

使用

iloc

和

get\u indexer\u for

a = df.columns.get_indexer_for(['A'])[1:]
df_final = df.iloc[:, [c for c in range(df.shape[1]) if c not in a]]

Out[1131]:
     A    B   C   D   B   C   D
0   18   75  92  24  75  49   7
1   30   56  42  58  72  36  47
2   85   84  28  84  49  71   9
3   71   70  97  99  71   1  14
4   82   63  67  23  38  74  37
5   87   11  47  69  24  51  50
6   46   73  85  55  15   3  26
7   47  100   6  42  76  88  54
8   81  100  79  59  19   5  89
9   93   67  83  51  52  52  84
10  29   26  58  54  30  52  17
11  48   29   7  79  90  13  95
12  77   40  16  30  92  98  68
13  56   11  39  45  69   4  75
14  93   83  98  83  21  66  89
15   2   10   4  29  95  53  67