Python 2.7: cannot reindex from a duplicate axis


I am trying to merge multiple CSV files that sit in one folder.

They look like this (in reality there are more than two dfs):

df1

df2

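I want to put all the DataFrames in a list and then merge them with reduce. For that to work, they need to have the same index.

I am trying the following code: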

import os
import pandas as pd

combined = []
reindex = [2,3,4,5,6]

folder = r'C:\path_to_files'

for f in os.listdir(folder):

    #read each file
    df = pd.read_csv(os.path.join(folder,f))

    #check for duplicates - returns empty lists
    print df[df.index.duplicated()]

    #reindex
    df.set_index([df.columns[0]], inplace=True)
    df=df.reindex(reindex, fill_value=0)

    #append
    combined.append(df)


#merge on 'LCC' column
final = reduce(lambda left, right: pd.merge(left, right, on=['LCC'], how='outer'), combined)
But this still returns:

Traceback (most recent call last):

  File "<ipython-input-31-45f925f6d48d>", line 9, in <module>
    df=df.reindex(reindex, fill_value=0)

  File "C:\Users\spotter\AppData\Local\Continuum\Anaconda2_2\lib\site-packages\pandas\core\frame.py", line 2741, in reindex
    **kwargs)

  File "C:\Users\spotter\AppData\Local\Continuum\Anaconda2_2\lib\site-packages\pandas\core\generic.py", line 2229, in reindex
    fill_value, copy).__finalize__(self)

  File "C:\Users\spotter\AppData\Local\Continuum\Anaconda2_2\lib\site-packages\pandas\core\frame.py", line 2687, in _reindex_axes
    fill_value, limit, tolerance)

  File "C:\Users\spotter\AppData\Local\Continuum\Anaconda2_2\lib\site-packages\pandas\core\frame.py", line 2698, in _reindex_index
    allow_dups=False)

  File "C:\Users\spotter\AppData\Local\Continuum\Anaconda2_2\lib\site-packages\pandas\core\generic.py", line 2341, in _reindex_with_indexers
    copy=copy)

  File "C:\Users\spotter\AppData\Local\Continuum\Anaconda2_2\lib\site-packages\pandas\core\internals.py", line 3586, in reindex_indexer
    self.axes[axis]._can_reindex(indexer)

  File "C:\Users\spotter\AppData\Local\Continuum\Anaconda2_2\lib\site-packages\pandas\indexes\base.py", line 2293, in _can_reindex
    raise ValueError("cannot reindex from a duplicate axis")

ValueError: cannot reindex from a duplicate axis

You need to check the index for duplicates after setting the first column as the index, not before:

#set index by first column
df.set_index([df.columns[0]], inplace=True)

#check for duplicates - no longer returns an empty result
print df[df.index.duplicated()]

#reindex
df=df.reindex(reindex, fill_value=0)
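
To see why the order matters, here is a minimal sketch with made-up data. The column name area and all values are assumptions for illustration, not taken from the asker's files, and print is written with parentheses so that it runs on Python 2.7 and Python 3 alike. Straight after read_csv the index is just the default 0..n-1 range, so duplicated() finds nothing; the repeated label only becomes visible once the first column is the index.

```python
import pandas as pd

# Hypothetical stand-in for one CSV file: the 'LCC' value 3 appears twice.
df = pd.DataFrame({'LCC': [2, 3, 3, 5], 'area': [10, 20, 30, 40]})

# Before set_index the index is the default integer range, so nothing is duplicated.
dups_before = df[df.index.duplicated()]

# After set_index the index holds the 'LCC' values and the repeat is flagged.
indexed = df.set_index('LCC')
dups_after = indexed[indexed.index.duplicated()]

print(len(dups_before))
print(dups_after)
```

With the default keep='first', only the second occurrence of the repeated label is flagged.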
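Alternatively, check for duplicates in the first column instead of the index; the parameter keep=False returns all of the duplicated rows (if necessary):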

#check duplicates in first column
print df[df.iloc[:, 0].duplicated(keep=False)]

#set index + reindex
df.set_index([df.columns[0]], inplace=True)
df=df.reindex(reindex, fill_value=0)
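
If the check does turn up duplicates, they have to be resolved before reindex can succeed. Which resolution is right depends on what the repeated rows mean in your data, so the following is only a sketch of two common options, again with hypothetical data (the area column and its values are assumptions): keep the first row per key, or aggregate the repeated rows.

```python
import pandas as pd

# Hypothetical stand-in for one CSV file: the 'LCC' value 3 appears twice.
df = pd.DataFrame({'LCC': [2, 3, 3, 5], 'area': [10, 20, 30, 40]})
reindex = [2, 3, 4, 5, 6]

# Option A: keep only the first row for each 'LCC' value, then reindex.
deduped = df.drop_duplicates(subset='LCC', keep='first').set_index('LCC')
result_a = deduped.reindex(reindex, fill_value=0)

# Option B: sum the rows that share an 'LCC' value; groupby makes 'LCC' the index.
summed = df.groupby('LCC').sum()
result_b = summed.reindex(reindex, fill_value=0)

print(result_a)
print(result_b)
```

Either way the index is unique afterwards, so reindex no longer raises the ValueError, and labels missing from the file (4 and 6 here) are filled with 0.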