Renaming duplicate columns in Python pandas

python pandas csv dataframe indexing


I have the following CSV data file:

id,number,id
132605,1,1
132750,2,1

Pandas currently renames the columns to:

       id number id.1
0  132605      1    1
1  132750      2    1

Is there a way to customize how the renaming is done? For example, I would prefer:

       id number id2
0  132605      1   1
1  132750      2   1

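rename: using the period delimiter

Assuming a period (.) appears in the column names only where pandas has suffixed a duplicate label, you can pass a custom function to DataFrame.rename to rewrite those suffixes.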
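A minimal sketch of the rename approach, under the assumption stated above that a trailing ".N" only ever comes from pandas' own duplicate handling. The helper name drop_period_suffix is hypothetical, and the sample data is the file from the question.

```python
from io import StringIO

import pandas as pd

file = """id,number,id
132605,1,1
132750,2,1"""

# read_csv labels the second "id" column "id.1"
df = pd.read_csv(StringIO(file))

def drop_period_suffix(label):
    # Split once from the right: 'id.1' -> ('id', '.', '1')
    base, sep, suffix = label.rpartition('.')
    if sep and suffix.isdigit():
        # pandas numbers duplicates from 1; shift by one to get id2, id3, ...
        return f'{base}{int(suffix) + 1}'
    # No numeric suffix: leave the label untouched
    return label

df = df.rename(columns=drop_period_suffix)
print(df.columns.tolist())
```

A column that legitimately ends in a period and digits (for example "v1.0") would also be rewritten, which is why this variant is only safe under the assumption above.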

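csv.reader: a robust solution

A more robust solution reads the original header with the csv module from the standard library, so it does not depend on how pandas has already renamed the columns: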

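from collections import defaultdict
from io import StringIO
import csv

import pandas as pd

file = """id,number,id
132605,1,1
132750,2,1"""

# replace StringIO(file) with open('file.csv', 'r', newline='')
with StringIO(file) as fin:
    headers = next(csv.reader(fin))

def rename_duplicates(original_cols):
    # Count each label; the 2nd, 3rd, ... occurrence gets its count appended
    count = defaultdict(int)
    for x in original_cols:
        count[x] += 1
        yield f'{x}{count[x]}' if count[x] > 1 else x

df = pd.read_csv(StringIO(file))
# Overwrite the mangled labels (id, number, id.1) with id, number, id2
df.columns = list(rename_duplicates(headers))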

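Short answer

No. You cannot change how the suffix is added through the pandas API.

Long answer

This is handled by the mangle_dupe_cols option of pandas.read_csv, which currently cannot be turned off.

What you can do is modify the source of _maybe_dedup_names in pandas.io.parsers, although this is generally not recommended: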

def _maybe_dedup_names(self, names):
    if self.mangle_dupe_cols:
        names = list(names)
        # Keep the zero default: a default of 1 would make every new
        # name look like a duplicate and the while loop would never end
        counts = defaultdict(int)
        is_potential_mi = _is_potential_multi_index(names)

        for i, col in enumerate(names):
            cur_count = counts[col]

            while cur_count > 0:
                counts[col] = cur_count + 1

                # Use cur_count + 1 so that suffixes start at 2, and drop
                # the '.' from the format strings
                if is_potential_mi:
                    # was: col[:-1] + ('%s.%d' % (col[-1], cur_count),)
                    col = col[:-1] + ('%s%d' % (col[-1], cur_count + 1),)
                else:
                    # was: '%s.%d' % (col, cur_count)
                    col = '%s%d' % (col, cur_count + 1)
                cur_count = counts[col]

            names[i] = col
            counts[col] = cur_count + 1

    return names
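The renaming loop can be tried outside pandas before touching an installation. The following is a sketch only: dedup_names is a hypothetical standalone function, it omits the MultiIndex branch, and it assumes a zero-default counter with the suffix formatted as cur_count + 1 so that numbering starts at 2.

```python
from collections import defaultdict

def dedup_names(names):
    names = list(names)
    counts = defaultdict(int)  # how many times each label has been seen

    for i, col in enumerate(names):
        cur_count = counts[col]

        # Keep renaming until the candidate label has not been used yet
        while cur_count > 0:
            counts[col] = cur_count + 1
            col = f'{col}{cur_count + 1}'
            cur_count = counts[col]

        names[i] = col
        counts[col] = cur_count + 1

    return names

print(dedup_names(['id', 'number', 'id']))  # ['id', 'number', 'id2']
print(dedup_names(['id', 'id', 'id']))      # ['id', 'id2', 'id3']
print(dedup_names(['id', 'id2', 'id']))     # ['id', 'id2', 'id22']
```

The last call shows the cost of dropping the period: when a file already contains a column named id2, the generated label collides with it and is renamed a second time.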
