Python boxplot and scatter plot

I have time series data, and I want to build a scatter plot overlaid on a boxplot. The data looks like this:

           TokenUsed        date
    0              8  2020-01-05
    1              8  2020-01-05
    2              8  2020-01-05
    3              8  2020-01-05
    4              8  2020-01-05
    ...          ...         ...
    51040          7  2020-02-23
    51041          7  2020-02-23
    51042          7  2020-02-23
    51043          7  2020-02-23
    51044          7  2020-02-23
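This time series displays neatly as a boxplot (I had trouble with dates on the x-axis, but solved it by converting them to strings). Now, in my case, I only want to show the data whose sum is above a threshold (>81). The code and the resulting image are below: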

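import matplotlib.pyplot as plt
import seaborn as sns

# df holds the TokenUsed/date data shown above, with "date" already converted to strings
fig, ax = plt.subplots(figsize=(12, 6))

# whis=[0, 100] stretches the whiskers to the min and max, so no points are drawn as outliers
ax = sns.boxplot(x="date", y="TokenUsed", data=df, ax=ax, whis=[0, 100])

# horizontal reference line at the threshold
ax.axhline(81)

plt.locator_params(axis='x', nbins=10)
plt.show()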
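The question mentions working around the date x-axis by converting the dates to strings. A minimal sketch of that conversion, assuming the date column is a pandas datetime64 column (the four-row frame is hypothetical sample data, not the question's 51,045 rows):

```python
import pandas as pd

# Hypothetical sample data shaped like the question's frame
df = pd.DataFrame({
    "TokenUsed": [8, 8, 7, 7],
    "date": pd.to_datetime(["2020-01-05", "2020-01-05", "2020-02-23", "2020-02-23"]),
})

# Format each timestamp as a plain string label; seaborn then treats
# the x values as categories, one box per distinct date
df["date"] = df["date"].dt.strftime("%Y-%m-%d")

print(df["date"].tolist())
# ['2020-01-05', '2020-01-05', '2020-02-23', '2020-02-23']
```

Using the ISO %Y-%m-%d format has the side benefit that sorting the labels as strings also sorts them chronologically.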

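When I add the scatter plot I get image (2), and when I filter to keep only the values > 81 I get image (3). What I don't understand is why the x-axes of these two plots don't seem to match.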

Code:

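Answer: Try editing your filter so that it does not actually drop rows of df. That is, apply the mask specifically to the TokenUsed column so that the values are replaced with NaN (instead of deleting whole rows). Here is how I would do it: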

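# make a copy of df and plot from that, so the original data is untouched
df2 = df.copy()
# replace values below 81 with NaN instead of dropping the rows
df2['TokenUsed'] = df2['TokenUsed'].mask(df2['TokenUsed'] < 81)
ax = sns.scatterplot(x="date", y="TokenUsed", data=df2, ax=ax, color=".25")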


If I apply the same filter you did, I get the same problem.
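A likely explanation for the mismatch, assuming seaborn's default categorical handling: with string x values, each date is placed at an integer slot (0, 1, 2, ...) in the order it is first seen. Dropping rows removes every date that has no value above the threshold, so the surviving dates slide into the first slots and land under the wrong boxes. Masking keeps every row, and therefore every date, in the same order. The toy frame below is hypothetical and only checks this with pandas, without plotting:

```python
import pandas as pd

# Hypothetical toy data: only the last date has values above the threshold
df = pd.DataFrame({
    "date": ["01-05-2020"] * 3 + ["01-12-2020"] * 3 + ["01-19-2020"] * 3,
    "TokenUsed": [10, 20, 30, 40, 50, 60, 90, 100, 110],
})

# Row filtering: dates with no value above 81 vanish from the frame
filtered = df[df["TokenUsed"] > 81]
print(list(filtered["date"].unique()))  # ['01-19-2020']

# Masking: every row (and so every date) survives; low values become NaN
masked = df.copy()
masked["TokenUsed"] = masked["TokenUsed"].mask(masked["TokenUsed"] < 81)
print(list(masked["date"].unique()))  # ['01-05-2020', '01-12-2020', '01-19-2020']
print(int(masked["TokenUsed"].notna().sum()))  # 3
```

In the filtered frame, '01-19-2020' is the first (and only) category, so a scatter plot would put it in slot 0, under the first box.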

Great! That was exactly the problem. I forgot that mask keeps the index!
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

# fake data, only one date has values over 80
dr = ['01-05-2020'] * 100 + ['01-12-2020'] * 100 + ['01-19-2020'] * 100
data = list(np.random.randint(0,80,200)) + list(np.random.randint(50,150,100))
df = pd.DataFrame({'date':dr, 'TokenUsed':data})

fig, ax = plt.subplots(figsize = (12,6))
ax = sns.boxplot(x="date", y="TokenUsed", data=df, ax=ax, whis=[0,100])

# the fix
df2 = df.copy()
df2['TokenUsed'] = df2['TokenUsed'].mask(df2['TokenUsed'] < 81)
ax = sns.scatterplot(x="date", y="TokenUsed", data=df2, ax=ax, color=".25")

ax.axhline(81)
plt.locator_params(axis='x', nbins=10)
plt.show()
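To illustrate the remark about mask keeping the index: Series.mask returns a Series with exactly the same index as its input and only swaps the matching values for NaN, whereas boolean indexing returns a shorter Series that keeps only the surviving labels. Series.where with the opposite condition gives the same result as mask. The small Series below is made-up data:

```python
import pandas as pd

# Made-up values; the default index is 0..4
s = pd.Series([10, 95, 40, 120, 81])

# Boolean indexing drops rows, so only the surviving labels remain
print(s[s > 81].index.tolist())  # [1, 3]

# mask() keeps every label and puts NaN where the condition is True
m = s.mask(s < 81)
print(m.index.tolist())  # [0, 1, 2, 3, 4]
print(m.isna().tolist())  # [True, False, True, False, False]

# where() keeps values where its condition is True, so this is equivalent
print(m.equals(s.where(s >= 81)))  # True
```

Note the boundary: mask(s < 81) keeps a value of exactly 81, which the filter s > 81 drops. Use s <= 81 as the mask condition if the two should agree.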