Python: box plot and scatter plot
I have time series data, and I want to build an overlaid scatter plot and box plot. The data looks like this:
TokenUsed date
0 8 2020-01-05
1 8 2020-01-05
2 8 2020-01-05
3 8 2020-01-05
4 8 2020-01-05
... ... ...
51040 7 2020-02-23
51041 7 2020-02-23
51042 7 2020-02-23
51043 7 2020-02-23
51044 7 2020-02-23
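This time series displays neatly as a box plot (I had trouble using dates on the x-axis, but solved it by converting them to strings). Now, in my case, I only want to show the data above a threshold (> 81). The code and the resulting image are shown below: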
fig, ax = plt.subplots(figsize = (12,6))
ax = sns.boxplot(x="date", y="TokenUsed", data=df, ax= ax, whis=[0,100])
ax.axhline(81)
plt.locator_params(axis='x', nbins=10)
plt.show()
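When I add the scatter plot, I get image (2), and when I filter it down to only the values > 81, I get image (3). What I don't understand is why the x-axes of these two plots don't seem to match.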
Code:
Answer:
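Try changing your filter so that it does not actually drop rows from df. That is, apply a mask specifically to the TokenUsed column so that the values are replaced with NaN (instead of the whole row being removed). Here is how I would do it: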
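# make a new copy of df and use that for the scatter plot
df2 = df.copy()
# replace values below the threshold with NaN instead of dropping the rows
df2['TokenUsed'] = df2['TokenUsed'].mask(df2['TokenUsed'] < 81)
ax = sns.scatterplot(x="date", y="TokenUsed", data=df2, ax=ax, color=".25")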
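I run into the same problem if I use the same filter you applied.
Great! That was exactly the problem. I forgot that the mask function preserves the index!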
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
# fake data, only one date has values over 80
dr = ['01-05-2020'] * 100 + ['01-12-2020'] * 100 + ['01-19-2020'] * 100
data = list(np.random.randint(0,80,200)) + list(np.random.randint(50,150,100))
df = pd.DataFrame({'date':dr, 'TokenUsed':data})
fig, ax = plt.subplots(figsize = (12,6))
ax = sns.boxplot(x="date", y="TokenUsed", data=df, ax=ax, whis=[0,100])
# the fix: mask values below the threshold as NaN instead of filtering out rows
df2 = df.copy()
df2['TokenUsed'] = df2['TokenUsed'].mask(df2['TokenUsed'] < 81)
ax = sns.scatterplot(x="date", y="TokenUsed", data=df2, ax= ax,color=".25")
ax.axhline(81)
plt.locator_params(axis='x', nbins=10)
plt.show()
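To see why masking helps, here is a minimal sketch that compares the two approaches on a small, made-up frame (the values are hypothetical, not the asker's data). Boolean filtering drops whole rows, so any date with nothing above the threshold disappears from the frame. A likely explanation for the shifted x-axis is that the scatter plot then assigns positions only to the dates that remain, so they no longer line up with the box plot's categories. Note that this sketch masks with `<= 81` to match the strict `> 81` threshold from the question, whereas the answer's code uses `< 81`, which also keeps values equal to 81.

```python
import pandas as pd

# Hypothetical toy data: three dates, only the last one has values above 81
df = pd.DataFrame({
    "date": ["2020-01-05"] * 3 + ["2020-01-12"] * 3 + ["2020-01-19"] * 3,
    "TokenUsed": [8, 20, 35, 7, 40, 60, 50, 90, 120],
})

# Boolean filtering drops whole rows, so dates without large values vanish
filtered = df[df["TokenUsed"] > 81]

# mask() keeps every row (and the index) and replaces the masked values with NaN
masked = df.copy()
masked["TokenUsed"] = masked["TokenUsed"].mask(masked["TokenUsed"] <= 81)

print("dates after filtering:", list(filtered["date"].unique()))
print("dates after masking:  ", list(masked["date"].unique()))
print("rows:", len(filtered), "vs", len(masked))
```

Both frames carry the same two points above the threshold, but only the masked one still lists every date, which is what lets the scatter plot share the box plot's x positions.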