Python pandas: trouble understanding how merge works

python pandas

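I'm doing something wrong with merge and I can't figure out what it is. To estimate a histogram of a series of integer values, I did the following: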

import pandas as pnd
import numpy  as np

series = pnd.Series(np.random.poisson(5, size = 100))
tmp  = {"series" : series, "count" : np.ones(len(series))}
hist = pnd.DataFrame(tmp).groupby("series").sum()
freq = (hist / hist.sum()).rename(columns = {"count" : "freq"})

If I print hist and freq, this is what I get:

> print hist
        count
series       
0           2
1           4
2          13
3          15
4          12
5          16
6          18
7           7
8           8
9           3
10          1
11          1

> print freq 
        freq
series      
0       0.02
1       0.04
2       0.13
3       0.15
4       0.12
5       0.16
6       0.18
7       0.07
8       0.08
9       0.03
10      0.01
11      0.01

Both are indexed by "series", but if I try to merge:

> df   = pnd.merge(freq, hist, on = "series")

I get a KeyError: "no item named series" exception. If I omit on="series", I get an IndexError: list index out of range exception.

I don't understand what I'm doing wrong. Maybe "series" is an index rather than a column, so I have to do it differently?

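From the docs:

on: Columns (names) to join on. Must be found in both the left and right DataFrame objects. If not passed and left_index and right_index are False, the intersection of the columns in the DataFrames will be inferred to be the join keys.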

I don't know why this isn't in the docstring, but it explains your problem.
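A quick way to confirm the diagnosis is to inspect the frame itself. The sketch below is an illustration rather than the asker's exact code: it uses a small fixed sample in place of the random Poisson draws (an assumption, so the output is reproducible) and the conventional `pd` alias. After the groupby, "series" is the name of the index and is no longer among the columns, which is why `on="series"` found nothing to join on in the pandas version used here. Newer pandas releases also accept index level names in `on`, so the original call may succeed there.

```python
import pandas as pd

# Fixed sample instead of np.random.poisson, purely for reproducibility.
series = pd.Series([0, 1, 1, 2, 2, 2])
hist = pd.DataFrame({"series": series, "count": 1.0}).groupby("series").sum()

# groupby moves the grouping key into the index.
print(list(hist.columns))         # only the aggregated column remains
print(hist.index.name)            # the grouping key is now the index name
print("series" in hist.columns)   # membership test against the columns
```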

You can either give left_index and right_index:

In : pnd.merge(freq, hist, right_index=True, left_index=True)
Out:
        freq  count
series
0       0.01      1
1       0.04      4
2       0.14     14
3       0.12     12
4       0.21     21
5       0.14     14
6       0.17     17
7       0.07      7
8       0.05      5
9       0.01      1
10      0.01      1
11      0.03      3

Or you can make your index a column and use on:

In : freq2 = freq.reset_index()

In : hist2 = hist.reset_index()

In : pnd.merge(freq2, hist2, on='series')
Out:
    series  freq  count
0        0  0.01      1
1        1  0.04      4
2        2  0.14     14
3        3  0.12     12
4        4  0.21     21
5        5  0.14     14
6        6  0.17     17
7        7  0.07      7
8        8  0.05      5
9        9  0.01      1
10      10  0.01      1
11      11  0.03      3

Or, more simply, DataFrame has a join method which does exactly what you want:

In : freq.join(hist)
Out:
        freq  count
series
0       0.01      1
1       0.04      4
2       0.14     14
3       0.12     12
4       0.21     21
5       0.14     14
6       0.17     17
7       0.07      7
8       0.05      5
9       0.01      1
10      0.01      1
11      0.03      3
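The three approaches above should produce the same table. As a minimal sketch, assuming a small fixed sample instead of the random data and the `pd` alias (both are illustration choices, not part of the original post), the results can be compared directly with `pd.testing.assert_frame_equal`, which raises if two frames differ:

```python
import pandas as pd

# Fixed sample instead of np.random.poisson, purely for reproducibility.
series = pd.Series([0, 1, 1, 2, 2, 2])
hist = pd.DataFrame({"series": series, "count": 1.0}).groupby("series").sum()
freq = (hist / hist.sum()).rename(columns={"count": "freq"})

# 1. Merge on the index of both frames.
by_index = pd.merge(freq, hist, left_index=True, right_index=True)

# 2. Turn the index into a column, merge on it, then restore the index.
by_column = pd.merge(
    freq.reset_index(), hist.reset_index(), on="series"
).set_index("series")

# 3. DataFrame.join aligns on the index by default.
by_join = freq.join(hist)

# Both calls raise AssertionError if the frames are not equal.
pd.testing.assert_frame_equal(by_index, by_join)
pd.testing.assert_frame_equal(by_column, by_join)
print(by_join)
```

Since both frames already share the same index, join is the shortest of the three; reset_index plus on is mainly useful when the key has to end up as an ordinary column.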