Python: how to add a line of best fit to a scatter plot

python numpy pandas matplotlib plot

I'm currently working with Pandas and matplotlib to do some data visualization, and I want to add a line of best fit to a scatter plot.

Here is my code:

import matplotlib
import matplotlib.pyplot as plt
import pandas as panda
import numpy as np

def PCA_scatter(filename):

   matplotlib.style.use('ggplot')

   data = panda.read_csv(filename)
   data_reduced = data[['2005', '2015']]

   data_reduced.plot(kind='scatter', x='2005', y='2015')
   plt.show()

PCA_scatter('file.csv')

How can I do this?

You can use np.polyfit() and np.poly1d(). Estimate a first-degree polynomial using the same x values, and add it to the ax object created by the .scatter() plot. An example:

import numpy as np

     2005   2015
0   18882  21979
1    1161   1044
2     482    558
3    2105   2471
4     427   1467
5    2688   2964
6    1806   1865
7     711    738
8     928   1096
9    1084   1309
10    854    901
11    827   1210
12   5034   6253

Estimate a first-degree polynomial:

z = np.polyfit(x=df.loc[:, 2005], y=df.loc[:, 2015], deg=1)
p = np.poly1d(z)
df['trendline'] = p(df.loc[:, 2005])

     2005   2015     trendline
0   18882  21979  21989.829486
1    1161   1044   1418.214712
2     482    558    629.990208
3    2105   2471   2514.067336
4     427   1467    566.142863
5    2688   2964   3190.849200
6    1806   1865   2166.969948
7     711    738    895.827339
8     928   1096   1147.734139
9    1084   1309   1328.828428
10    854    901   1061.830437
11    827   1210   1030.487195
12   5034   6253   5914.228708

And plot:

ax = df.plot.scatter(x=2005, y=2015)
df.set_index(2005, inplace=True)
df.trendline.sort_index(ascending=False).plot(ax=ax)
plt.gca().invert_xaxis()

This produces the scatter plot with the trendline overlaid.

It also gives you the equation of the line:

'y={0:.2f} x + {1:.2f}'.format(z[0],z[1])

y=1.16 x + 70.46
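
For readers who would rather stay in plain matplotlib, the same fit can be sketched without the pandas plotting helpers. This is only an illustration under a few assumptions: the helper name scatter_with_fit is made up here, the columns are named with strings (as in the question's CSV), and the data is the sample table from the answer above.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Sample data copied from the answer above. String column names are an
# assumption for this sketch (the answer itself uses integer names).
df = pd.DataFrame({
    "2005": [18882, 1161, 482, 2105, 427, 2688, 1806,
             711, 928, 1084, 854, 827, 5034],
    "2015": [21979, 1044, 558, 2471, 1467, 2964, 1865,
             738, 1096, 1309, 901, 1210, 6253],
})


def scatter_with_fit(data, x_col, y_col):
    """Hypothetical helper: scatter two columns and overlay a degree-1 fit."""
    x = data[x_col].to_numpy(dtype=float)
    y = data[y_col].to_numpy(dtype=float)

    # Degree-1 least-squares fit; coefficients are returned highest degree first.
    slope, intercept = np.polyfit(x, y, deg=1)

    fig, ax = plt.subplots()
    ax.scatter(x, y)

    # A straight line only needs its two end points.
    x_line = np.array([x.min(), x.max()])
    ax.plot(x_line, slope * x_line + intercept, color="red",
            label=f"y = {slope:.2f} x + {intercept:.2f}")
    ax.set_xlabel(x_col)
    ax.set_ylabel(y_col)
    ax.legend()
    return fig, slope, intercept


fig, slope, intercept = scatter_with_fit(df, "2005", "2015")
```

Calling plt.show() afterwards displays the figure. Drawing the line from only its two end points avoids having to sort the x values or invert the axis.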

Another option:

You can do the whole fit and plot in one go with seaborn.

This answer covers the plotly approach in detail:
#load the libraries

import pandas as pd
import numpy as np
import plotly.express as px
import plotly.graph_objects as go

# create the data
N = 50
x = pd.Series(np.random.randn(N))
y = x*2.2 - 1.8

# plot the data as a scatter plot
fig = px.scatter(x=x, y=y) 

# fit a linear model 
m, c = fit_line(x = x, 
                y = y)

# add the linear fit on top
fig.add_trace(
    go.Scatter(
        x=x,
        y=m*x + c,
        mode="lines",
        line=go.scatter.Line(color="red"),
        showlegend=False)
)
# optionally you can show the slope and the intercept
mid_point = x.mean()

fig.update_layout(
    showlegend=False,
    annotations=[
        go.layout.Annotation(
            x=mid_point,
            y=m*mid_point + c,
            xref="x",
            yref="y",
            text=str(round(m, 2))+'x+'+str(round(c, 2)) ,
        )
    ]
)
fig.show()

where fit_line is defined as follows (it has to be defined before the snippet above is run):
def fit_line(x, y):
    # given one-dimensional x and y Series, return the slope m and intercept c of the least-squares line
    # inspired by the numpy manual - https://docs.scipy.org/doc/numpy/reference/generated/numpy.linalg.lstsq.html
    x = x.to_numpy()  # convert into numpy arrays
    y = y.to_numpy()  # convert into numpy arrays

    A = np.vstack([x, np.ones(len(x))]).T  # design matrix: a column of x and a column of ones for the intercept
    m, c = np.linalg.lstsq(A, y, rcond=None)[0]

    return m, c
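
As a quick sanity check, the least-squares route and np.polyfit should give the same line for a degree-1 fit. The sketch below uses made-up, noise-free data on y = 2.2 x - 1.8 (the coefficients from the plotly snippet) and compares the two; the variable names are illustrative only.

```python
import numpy as np

# Noise-free toy data on the line y = 2.2 x - 1.8 (illustrative values).
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.2 * x - 1.8

# Least squares with an explicit design matrix [x, 1].
A = np.vstack([x, np.ones(len(x))]).T
m_lstsq, c_lstsq = np.linalg.lstsq(A, y, rcond=None)[0]

# The same fit through np.polyfit; coefficients come back highest degree first.
m_poly, c_poly = np.polyfit(x, y, deg=1)

# True when both routes recover the same slope and intercept.
agree = np.allclose([m_lstsq, c_lstsq], [m_poly, c_poly])
print(agree)
```

Either route is fine for a straight line. np.polyfit is shorter, while the design-matrix form extends more naturally to fits with extra regressors.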

The best answer above is to use seaborn.
To add to that: if you want to create multiple plots in a loop, you can still use matplotlib
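import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# raw string so that \s is not treated as an escape sequence
data_reduced = pd.read_csv('fake.txt', sep=r'\s+')
for x in data_reduced.columns:
    # pass x and y as keywords; recent seaborn versions no longer accept them positionally
    sns.regplot(x=data_reduced[x], y=data_reduced['2015'])
    plt.show()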

plt.show() pauses execution so that you can view the plots one at a time.
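
If seaborn is not available, the same loop can be sketched with matplotlib and np.polyfit alone. The DataFrame below is a hypothetical stand-in for the asker's file (the '2010' column and all values are invented for illustration), and the figures are collected in a dict rather than shown immediately.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical stand-in for the asker's table; values are invented.
data = pd.DataFrame({
    "2005": [1.0, 2.0, 3.0, 4.0],
    "2010": [1.5, 2.5, 2.9, 4.2],
    "2015": [2.0, 4.1, 5.9, 8.2],
})

target = "2015"
figures = {}

# One figure per predictor column, each with its own degree-1 fit.
for col in data.columns.drop(target):
    slope, intercept = np.polyfit(data[col], data[target], deg=1)

    fig, ax = plt.subplots()
    ax.scatter(data[col], data[target])

    # Sort x so the fitted line is drawn left to right.
    xs = np.sort(data[col].to_numpy())
    ax.plot(xs, slope * xs + intercept, color="red")
    ax.set_xlabel(col)
    ax.set_ylabel(target)
    figures[col] = fig
```

Adding a plt.show() call at the end of the loop body would display the plots one at a time, as in the seaborn version.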
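The line trendline.plot(ax=ax) gives me an invalid syntax error.

The line z = np.polyfit(x=data_reduced[['2005']], y=data_reduced[['2015']], 1) gives me a "positional argument follows keyword argument" error.

Sorry, you need to add deg before the =1, see the update.

TypeError: expected 1D vector for x, on the line z = np.polyfit(x=data_reduced[['2005']], y=data_reduced[['2015']], deg=1). Is this a problem with my data or with my code?

You need to use .loc[] so that the single column becomes a pd.Series. Selecting with [[]] keeps the single column as a DataFrame, hence the dimension warning. Updated; the same applies to the next line. My bad, it's getting late…

But I want to use matplotlib! :(

This solution is so simple! Thanks a lot!

If you want to view one plot at a time while looping and creating multiple plots, you still need matplotlib's plt.show().

Does this answer your question?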
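
The dimensionality issue raised in the comments can be reproduced in a few lines. This is a minimal sketch with invented data: double brackets return a two-dimensional DataFrame, which np.polyfit rejects, while single brackets (or .loc) return a one-dimensional Series.

```python
import numpy as np
import pandas as pd

# Invented three-row table, just to show the shapes involved.
df = pd.DataFrame({"2005": [1.0, 2.0, 3.0], "2015": [2.0, 4.0, 6.0]})

as_frame = df[["2005"]]   # double brackets: DataFrame, shape (3, 1)
as_series = df["2005"]    # single brackets (or df.loc[:, "2005"]): Series, shape (3,)

try:
    np.polyfit(x=as_frame, y=df["2015"], deg=1)
    raised = False
except TypeError:
    # np.polyfit only accepts a 1-D x, so the (3, 1) frame is rejected.
    raised = True

# The 1-D Series works as expected.
slope, intercept = np.polyfit(x=as_series, y=df["2015"], deg=1)
```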