Renaming duplicate columns in Python pandas


I have the following CSV data file:

id,number,id
132605,1,1
132750,2,1
pandas currently renames the columns to:

       id number id.1
0  132605      1    1
1  132750      2    1
Is there a way to customize how the renaming is done? For example, I would prefer:

       id number id2
0  132605      1   1
1  132750      2   1
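
rename: using the period separator

Assuming the only column names that contain a period (.) are the ones pandas created for duplicate labels, you can pass a custom function to DataFrame.rename.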
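The rename approach described above can be sketched as follows. This is a minimal sketch rather than a definitive implementation: the helper name rename_condition and the inline CSV text are illustrative assumptions, and it relies on pandas' default id.1-style suffixes, so an original column name that already ends in a period plus digits would be rewritten as well.

```python
from io import StringIO

import pandas as pd

# Inline copy of the CSV from the question (assumption: stands in for a real file)
csv_text = "id,number,id\n132605,1,1\n132750,2,1\n"

def rename_condition(label):
    # Hypothetical helper: turn pandas' 'name.N' suffix into 'name{N+1}'.
    # Labels without a numeric '.N' suffix are returned unchanged.
    name, sep, num = label.rpartition('.')
    if sep and num.isdigit():
        return f'{name}{int(num) + 1}'
    return label

# read_csv first mangles the duplicate to 'id.1'; rename then maps it to 'id2'
df = pd.read_csv(StringIO(csv_text)).rename(columns=rename_condition)
print(list(df.columns))  # ['id', 'number', 'id2']
```

The function is applied to every column label, so labels that pandas did not mangle pass through untouched.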

csv.reader: a robust solution

A more robust solution is possible with the csv module from the standard library:

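from collections import defaultdict
from io import StringIO
import csv

import pandas as pd

# CSV text from the question; to read from disk, replace
# StringIO(file) with open('file.csv', 'r')
file = "id,number,id\n132605,1,1\n132750,2,1\n"

# Read only the raw header row, before pandas mangles it
with StringIO(file) as fin:
    headers = next(csv.reader(fin))

def rename_duplicates(original_cols):
    # Append a running count to the second and later occurrences of a label
    count = defaultdict(int)
    for x in original_cols:
        count[x] += 1
        yield f'{x}{count[x]}' if count[x] > 1 else x

df = pd.read_csv(StringIO(file))
df.columns = list(rename_duplicates(headers))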

Short answer

No. You cannot change how the pandas API adds the suffix.

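Long answer

This is handled by the mangle_dupe_cols option of pandas.read_csv, and turning it off is currently not supported.

What you could do is modify _maybe_dedup_names in the pandas.io.parsers source, although that is generally not recommended: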

# Patched excerpt of pandas.io.parsers (a method of the parser class);
# it relies on pandas internals and is not runnable on its own.
def _maybe_dedup_names(self, names):
    if self.mangle_dupe_cols:
        names = list(names)  # so we can index
        # Keep defaultdict(int): a default of 1 would make the
        # while loop below never terminate
        counts = defaultdict(int)
        is_potential_mi = _is_potential_multi_index(names)

        for i, col in enumerate(names):
            cur_count = counts[col]

            while cur_count > 0:
                counts[col] = cur_count + 1

                # Changed from '%s.%d' % (..., cur_count): no '.' separator,
                # and cur_count + 1 so duplicate suffixes start at 2, not 1
                if is_potential_mi:
                    col = col[:-1] + ('%s%d' % (col[-1], cur_count + 1),)
                else:
                    col = '%s%d' % (col, cur_count + 1)
                cur_count = counts[col]

            names[i] = col
            counts[col] = cur_count + 1

    return names
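
If editing the pandas source is not an option, the same counting logic can be applied to the raw header row yourself, with the result assigned to df.columns. The sketch below is an illustration under stated assumptions, not part of the pandas API: dedup_names is a hypothetical standalone function that mirrors the loop above without the MultiIndex branch.

```python
from collections import defaultdict

def dedup_names(names):
    # Hypothetical standalone helper (not part of pandas): suffix repeated
    # labels with 2, 3, ... and keep extending the label while it still
    # collides with one that has already been seen.
    names = list(names)
    counts = defaultdict(int)
    for i, col in enumerate(names):
        cur_count = counts[col]
        while cur_count > 0:
            counts[col] = cur_count + 1
            col = f'{col}{cur_count + 1}'
            cur_count = counts[col]
        names[i] = col
        counts[col] = cur_count + 1
    return names

print(dedup_names(['id', 'number', 'id']))  # ['id', 'number', 'id2']
print(dedup_names(['id', 'id2', 'id']))     # ['id', 'id2', 'id22']
```

Unlike a plain running count, this version does not produce a second id2 when the header already contains a literal id2 column; the price is a less tidy label such as id22.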