Renaming duplicate columns in Python pandas
I have the following CSV data file:
id,number,id
132605,1,1
132750,2,1
Pandas currently renames the columns to:
       id  number  id.1
0  132605       1     1
1  132750       2     1
Is there a way to customize how the duplicates are renamed? For example, I would prefer:
       id  number  id2
0  132605       1    1
1  132750       2    1
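rename: using the period delimiter
Assuming the only column names that contain a period (.) are the mangled duplicate labels, you can pass a custom function to rename.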
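The renaming function itself might look like the sketch below. The name rename_mangled is hypothetical (it is not part of pandas), and the sketch assumes that any label ending in a period followed by digits was produced by pandas' duplicate mangling rather than being a genuine column name.

```python
def rename_mangled(label):
    """Turn a mangled label such as 'id.1' into 'id2'; leave other labels alone."""
    # Split on the last period: 'id.1' -> ('id', '.', '1'); 'id' -> ('', '', 'id')
    base, sep, suffix = label.rpartition('.')
    if sep and suffix.isdigit():
        # pandas numbers duplicates from 1, so add 1 to start the suffix at 2
        return f'{base}{int(suffix) + 1}'
    return label


mangled = ['id', 'number', 'id.1']
renamed = [rename_mangled(c) for c in mangled]
print(renamed)

# With a DataFrame, the same function can be passed to rename:
# df = df.rename(columns=rename_mangled)
```

Because the function only inspects the text of the label, it will also rewrite a legitimate column that happens to end in a period and a number, which is why this approach depends on the assumption stated above.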
csv.reader: a robust solution
A more robust solution is possible with the csv module from the standard library:
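from collections import defaultdict
from io import StringIO
import csv
import pandas as pd

# `file` holds the CSV text from the question
file = 'id,number,id\n132605,1,1\n132750,2,1\n'

# replace StringIO(file) with open('file.csv', 'r')
with StringIO(file) as fin:
    headers = next(csv.reader(fin))

def rename_duplicates(original_cols):
    # yield each label, appending a running count to repeats (id, id2, id3, ...)
    count = defaultdict(int)
    for x in original_cols:
        count[x] += 1
        yield f'{x}{count[x]}' if count[x] > 1 else x

df = pd.read_csv(StringIO(file))
df.columns = list(rename_duplicates(headers))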
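Short answer
No. You cannot change how the suffix is added through the pandas API.
Long answer
This is handled by the mangle_dupe_cols option of pandas.read_csv, which currently cannot be turned off.
What you can do is modify _maybe_dedup_names in the pandas.io.parsers source, although this is generally not recommended: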
def _maybe_dedup_names(self, names):
    if self.mangle_dupe_cols:
        names = list(names)
        # default of 0: the first occurrence of a label is kept unchanged
        counts = defaultdict(int)
        is_potential_mi = _is_potential_multi_index(names)
        for i, col in enumerate(names):
            cur_count = counts[col]
            while cur_count > 0:
                counts[col] = cur_count + 1
                # cur_count + 1 so that the duplicate suffix starts at 2, not 1
                if is_potential_mi:
                    # was: col = col[:-1] + ('%s.%d' % (col[-1], cur_count),)
                    col = col[:-1] + ('%s%d' % (col[-1], cur_count + 1),)
                else:
                    # was: col = '%s.%d' % (col, cur_count)
                    # the '.' is removed from the format string
                    col = '%s%d' % (col, cur_count + 1)
                cur_count = counts[col]
            names[i] = col
            counts[col] = cur_count + 1
    return names
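Since editing library source is fragile, the same counting idea can be tried as a standalone function and applied to the column labels after loading. This is only a sketch: dedup_names is a hypothetical helper, not a pandas function, and it handles plain string labels only (no MultiIndex tuples).

```python
from collections import defaultdict


def dedup_names(names):
    """Return a copy of names where repeated labels get a numeric suffix from 2."""
    names = list(names)
    counts = defaultdict(int)  # how many times each label has been seen so far
    for i, col in enumerate(names):
        cur_count = counts[col]
        # Keep renaming until the candidate label has not been used before
        while cur_count > 0:
            counts[col] = cur_count + 1
            col = f'{col}{cur_count + 1}'
            cur_count = counts[col]
        names[i] = col
        counts[col] = cur_count + 1
    return names


print(dedup_names(['id', 'number', 'id']))
print(dedup_names(['a', 'a', 'a']))
```

The input should be the original header labels (for example, read with csv.reader), because labels taken from an already loaded DataFrame have been mangled to id.1 and no longer compare equal to id.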