Python: Splitting a DataFrame and plotting with different line styles in Plotly


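I'm new to plotly and pandas, and I'm trying to find an elegant solution, because I believe that either I'm not using groupby effectively with plotly, or my data is stacked in a way that keeps me from visualizing it properly.

To make a test chart, I used a fake dataset built by zipping three lists (group, month, spend) together, and split it into "actual" and "forecast" values after a specific month (Mar '20).

When I tried to add a single trace for the forecast df, which contains several months for three different groups, I got a monstrosity of a chart.

When I changed the index to the group and then used loc to subset it into three separate sets (one per group), I managed to produce the chart I wanted, although it feels like a Frankenstein solution.

I'd like to know whether there is a way to plot the initial dataframe and change the line style after a certain point on the x-axis. If not, is there a way to use a single trace on a data subset that contains three different groups (group1, group2, group3)? I'm not sure that using three separate traces and slicing the data over and over is the best solution, and I believe there is a more efficient one.

Here is how I currently get the separate groups: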

# set the group column as the index
forecast = forecast.set_index(['group'])

# split into one frame per group
group1_forecast = forecast.loc['group1']
group2_forecast = forecast.loc['group2']
group3_forecast = forecast.loc['group3']
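The three near-identical loc lines could also be collapsed with groupby. The following is only a minimal sketch: the small `forecast` frame is a stand-in built from a few values in the data further down, and `forecast_by_group` is a hypothetical name, not something from the original code.

```python
import pandas as pd

# Hypothetical stand-in for the question's `forecast` frame (two months only).
forecast = pd.DataFrame({
    "month": ["Apr '20", "Apr '20", "Apr '20", "May '20", "May '20", "May '20"],
    "group": ["group1", "group2", "group3"] * 2,
    "spend": [97.9, 167.7, 84.5, 98.4, 167.9, 85.1],
})

# One sub-frame per group; sort=False keeps groups in order of first appearance.
forecast_by_group = {name: sub for name, sub in forecast.groupby("group", sort=False)}

for name, sub in forecast_by_group.items():
    print(name, sub["spend"].tolist())
```

Each value in the dictionary keeps `group` as a regular column, so no `set_index` call is needed before plotting.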

Here is the (minimal) code for the chart with separate traces:

import plotly.express as px

title = "title"  # any title string

fig = px.line(actual,
              x="month", y="spend", color='group',
              title=title)

# group1 trace
fig.add_scatter(
    x=group1_forecast.month,
    y=group1_forecast.spend,
    mode='lines',
    line=dict(shape='linear', color='purple', width=1, dash='dot'),
    connectgaps=True
)

# group2 trace
fig.add_scatter(
    x=group2_forecast.month,
    y=group2_forecast.spend,
    mode='lines',
    line=dict(shape='linear', color='#33C1FF', width=1, dash='dot'),
    connectgaps=True
)

# group3 trace
fig.add_scatter(
    x=group3_forecast.month,
    y=group3_forecast.spend,
    mode='lines',
    line=dict(shape='linear', color='#FFDD33', width=1, dash='dot'),
    connectgaps=True
)

fig.show()

Here is the data:

import pandas as pd

months = ["Mar '19", "Mar '19", "Mar '19", 
          "Apr '19", "Apr '19", "Apr '19", 
          "May '19", "May '19", "May '19", 
          "Jun '19", "Jun '19", "Jun '19", 
          "Jul '19", "Jul '19", "Jul '19", 
          "Aug '19", "Aug '19", "Aug '19", 
          "Sep '19", "Sep '19", "Sep '19", 
          "Oct '19", "Oct '19", "Oct '19", 
          "Nov '19", "Nov '19", "Nov '19", 
          "Dec '19", "Dec '19", "Dec '19", 
          "Jan '20", "Jan '20", "Jan '20", 
          "Feb '20", "Feb '20", "Feb '20", 
          "Mar '20", "Mar '20", "Mar '20", 
          "Apr '20", "Apr '20", "Apr '20", 
          "May '20", "May '20", "May '20", 
          "Jun '20", "Jun '20", "Jun '20", 
          "Jul '20", "Jul '20", "Jul '20", 
          "Aug '20", "Aug '20", "Aug '20", 
          "Sep '20", "Sep '20", "Sep '20"]

groups = ['group1', 'group2', 'group3',
           'group1', 'group2', 'group3',
           'group1', 'group2', 'group3',
           'group1', 'group2', 'group3',
           'group1', 'group2', 'group3',
           'group1', 'group2', 'group3',
           'group1', 'group2', 'group3',
           'group1', 'group2', 'group3',
           'group1', 'group2', 'group3',
           'group1', 'group2', 'group3',
           'group1', 'group2', 'group3',
           'group1', 'group2', 'group3',
           'group1', 'group2', 'group3',
           'group1', 'group2', 'group3',
           'group1', 'group2', 'group3',
           'group1', 'group2', 'group3',
           'group1', 'group2', 'group3',
           'group1', 'group2', 'group3',
           'group1', 'group2', 'group3']

spend = [57, 150, 75, 
        61.5, 156, 78, 
        66, 150, 75, 
        63, 162, 81, 
        69, 163.5, 81.75,
        76.5, 162, 81, 
        78, 168, 84,
        79.5, 168, 84, 
        84, 162, 81, 
        87, 169.5, 84.75, 
        93, 171, 85.5, 
        96, 169.5, 84.75, 
        97.5, 168, 84,
        97.9, 167.7, 84.5,
        98.4, 167.9, 85.1,
        99.9, 168.1, 85.7,
        100.9, 168, 86.1,
        101.6, 168.4, 86.3,
        102.7, 168.8, 86.9]

spend_by_group_list = list(zip(months, groups, spend))

spend_df = pd.DataFrame(spend_by_group_list, columns = ['month', 'group', 'spend'])
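The question mentions splitting the data into actual and forecast values after Mar '20 but does not show that step. One possible way to do it, sketched here under assumptions, is to parse the month labels into dates and compare against a cutoff. The `date` column, the `cutoff` variable and the three-row `sample` frame are illustrative names and data, and the parsing assumes English month abbreviations.

```python
import pandas as pd

# Three rows in the same shape as spend_df; values copied from the data above.
sample = pd.DataFrame({
    "month": ["Feb '20", "Mar '20", "Apr '20"],
    "group": ["group1", "group1", "group1"],
    "spend": [96, 97.5, 97.9],
})

# Parse "Mar '20"-style labels into timestamps (first day of each month).
sample["date"] = pd.to_datetime(sample["month"], format="%b '%y")

cutoff = pd.Timestamp("2020-03-01")  # last month treated as actual

# Boolean masks keep the original row order within each part.
actual = sample[sample["date"] <= cutoff]
forecast = sample[sample["date"] > cutoff]
```

Splitting on a date rather than on a row position keeps working if months are added or removed from the lists.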

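After creating spend_df, I re-implemented the data processing step. I'm not 100% sure what the root cause of the problem is, because you didn't provide the exact code that reproduces it. However, splitting the groups like this should cause no problems, and the month order should be preserved:

spend_df[spend_df["group"] == "groupN"]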

import pandas as pd

# use spend_df created by your code

# split the different groups
split_month = 13  # first 13 rows per group (Mar '19 to Mar '20) are actual values
ls_actual = []  # by group
ls_forecast = []  # by group
for i in range(3):
    df = spend_df[spend_df["group"] == f"group{i+1}"]
    ls_actual.append(df[:split_month])
    ls_forecast.append(df[split_month:])

actual = pd.concat(ls_actual, axis=0)  # stack vertically

# plot
import plotly.express as px
import plotly.io as pio
pio.renderers.default = "browser"

# actual
ls_colors = ['purple', '#33C1FF', '#FFDD33']
fig = px.line(
    actual, x="month", y="spend", color='group',
    color_discrete_map={f"group{i+1}": ls_colors[i] for i in range(3)},
    title="title"
)

# forecast
for i in range(3):
    fig.add_scatter(
        x=ls_forecast[i].month,
        y=ls_forecast[i].spend,
        mode='lines',
        line=dict(shape='linear', color=ls_colors[i], width=1, dash='dot'),
        connectgaps=True
    )

fig.show()

Result:

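Could you provide some sample data?

You're right, I removed it while editing the post! Thanks for letting me know @billhulld

What are spend_df and forecast? Also, what are go and px? Please make sure the code you post is reproducible, even if some of the aliases are commonly used.

spend_df is the whole df, where the forecast values start after Mar '20 and the actual values run up to Mar '20. px here is plotly express, and title is an arbitrary title :)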