How do I highlight line segments of a plot in matplotlib/seaborn? (Python, Matplotlib, Time Series, Seaborn, Highlight)


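I have multiple time series and multiple labels. Whenever a label is available, I want to highlight the time series in red.

Existing plots: I have a line plot that can highlight certain elements of the plot, for example:

This plot can be converted into a cycle plot that highlights certain periods, like this:

But now the labels can no longer be displayed.

Question: How can I add the labels back to the cycle plot? Any approach is fine, but so far I think highlighting the matching time intervals in red would be best.

Data generation: To generate some sample data: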

%matplotlib inline
import matplotlib.pyplot as plt  # plt is used by the plotting code further down

import pandas as pd
import numpy as np
import seaborn as sns; sns.set()
import matplotlib.dates as mdates

aut_locator = mdates.AutoDateLocator(minticks=3, maxticks=7)
aut_formatter = mdates.ConciseDateFormatter(aut_locator)

import random
random_seed = 47
np.random.seed(random_seed)

random.seed(random_seed)

def generate_df_for_device(n_observations, n_metrics, device_id, geo_id, topology_id, cohort_id):
    # hourly random metrics for a single device, tagged with its ids
    df = pd.DataFrame(np.random.randn(n_observations, n_metrics), index=pd.date_range('2020', freq='H', periods=n_observations))
    df.columns = [f'metrik_{c}' for c in df.columns]
    df['geospatial_id'] = geo_id
    df['topology_id'] = topology_id
    df['cohort_id'] = cohort_id
    df['device_id'] = device_id
    return df

def generate_multi_device(n_observations, n_metrics, n_devices, cohort_levels, topo_levels):
    results = []
    for i in range(1, n_devices + 1):
        # random.randrange excludes the upper bound
        r = random.randrange(1, n_devices)
        cohort = random.randrange(1, cohort_levels)
        topo = random.randrange(1, topo_levels)
        df_single_device = generate_df_for_device(n_observations, n_metrics, i, r, topo, cohort)
        results.append(df_single_device)
    return pd.concat(results)

# hourly data, 1 week of data
n_observations = 7 * 24
n_metrics = 3
n_devices = 20
cohort_levels = 3
topo_levels = 5

df = generate_multi_device(n_observations, n_metrics, n_devices, cohort_levels, topo_levels)
df = df.sort_index()
df = df.reset_index().rename(columns={'index':'hour'})
df['dt'] = df.hour.dt.date

And the labels:

marker_labels = pd.DataFrame({'cohort_id':[1,1, 1], 'marker_type':['a', 'b', 'a'], 'start':['2020-01-2', '2020-01-04 05', '2020-01-06'], 'end':[np.nan, '2020-01-05 16', np.nan]})
marker_labels['start'] = pd.to_datetime(marker_labels['start'])
marker_labels['end'] = pd.to_datetime(marker_labels['end'])
marker_labels.loc[marker_labels['end'].isnull(), 'end'] =  marker_labels.start + pd.Timedelta(days=1) - pd.Timedelta(seconds=1)
marker_labels
A detailed Jupyter notebook, including the sample data and the current plotting code, can be found here:

Edit: Suppose we left-join the labels onto the observations by time interval:

merged_res = (df.reset_index()
         .merge(marker_labels, on='cohort_id', how='left')
         .query('start <= hour <= end')
         .set_index('index')
         .reindex(df.index)
      )

merged_res = merged_res.combine_first(df)
merged_res.marker_type = merged_res.marker_type.fillna('no_labels_reported')
Result:

However:

  • this is still rather messy
  • the individual time series of the devices are aggregated/averaged in the visualization
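
The interval join from the edit can be checked on a small frame before plotting. The following is only a sketch: `obs` and `labels` are made-up stand-ins for `df` and `marker_labels`, and a boolean mask replaces the `.query('start <= hour <= end')` call, which it should be equivalent to. It assumes the label intervals within a cohort do not overlap; otherwise one row would match several labels.

```python
import pandas as pd

# Made-up stand-ins for `df` and `marker_labels` (illustrative values only)
obs = pd.DataFrame({
    "hour": pd.to_datetime([
        "2020-01-01 00:00", "2020-01-01 12:00",
        "2020-01-02 00:00", "2020-01-02 12:00",
        "2020-01-02 00:00", "2020-01-02 12:00",
    ]),
    "cohort_id": [1, 1, 1, 1, 2, 2],
    "metrik_0": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6],
})
labels = pd.DataFrame({
    "cohort_id": [1],
    "marker_type": ["a"],
    "start": pd.to_datetime(["2020-01-02 00:00:00"]),
    "end": pd.to_datetime(["2020-01-02 23:59:59"]),
})

# Left join on cohort_id, keeping the original row number in the "index" column
merged = obs.reset_index().merge(labels, on="cohort_id", how="left")

# Keep only rows whose timestamp falls inside a label interval;
# rows without any label have NaT bounds and drop out here
in_interval = (merged["start"] <= merged["hour"]) & (merged["hour"] <= merged["end"])
matched = merged[in_interval].set_index("index")

# Assignment aligns on the original row index; unmatched rows get the default label
tagged = obs.copy()
tagged["marker_type"] = matched["marker_type"]
tagged["marker_type"] = tagged["marker_type"].fillna("no_labels_reported")
print(tagged)
```

Only the two cohort 1 rows on 2020-01-02 receive the label; the cohort 2 rows at the same timestamps stay unlabeled because the join is keyed on `cohort_id`.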

So far, the best option seems to be hvplot:

import hvplot.pandas  # registers the .hvplot accessor on DataFrames
width, height = 1000, 400  # placeholders: the original post does not define these

merged_res['hour_time'] = merged_res['hour'].dt.hour
merged_res.device_id = merged_res.device_id.astype(str)

for cohort_id in sorted(merged_res.cohort_id.unique()):
    print(cohort_id)
    cohort_df = merged_res[merged_res.cohort_id == cohort_id].set_index('hour_time')
    current_plot = (cohort_df[['metrik_0', 'marker_type', 'device_id', 'dt']]
                    .hvplot(by=['marker_type'], hover_cols=['dt', 'device_id'], width=width, height=height)
                    .opts(active_tools=['box_zoom']))
    display(current_plot)
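This results in:

Since I am still not fully satisfied, I will leave this open (unanswered) to see whether someone comes up with a better solution.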


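In particular, I do not like how the lines are drawn here; points might be better. When a series changes from unlabeled to labeled, it is not drawn continuously with a change of color. Instead it effectively jumps, because a new, separate line is created. Using points would therefore also be only a workaround (though probably better than having jumping lines).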
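
One way to keep a single continuous line that only changes color is matplotlib's `LineCollection`, which draws every segment between two neighboring points with its own color. This is a sketch on synthetic data, not the approach used in the post: the helper name `colored_segments`, the colors, and the rule that a segment is highlighted only when both of its endpoints are labeled are all assumptions made for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs headless
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.collections import LineCollection


def colored_segments(x, y, labeled, base_color="tab:blue", highlight_color="red"):
    """Return a LineCollection for the line (x, y) plus a per-segment mask.

    Hypothetical helper: a segment is highlighted only if both endpoints are labeled.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    labeled = np.asarray(labeled, dtype=bool)

    # Pair each point with its successor: shape (n - 1, 2, 2)
    points = np.column_stack([x, y]).reshape(-1, 1, 2)
    segments = np.concatenate([points[:-1], points[1:]], axis=1)

    seg_labeled = labeled[:-1] & labeled[1:]
    colors = [highlight_color if flag else base_color for flag in seg_labeled]
    return LineCollection(segments, colors=colors, linewidths=2), seg_labeled


# Synthetic example: one day of hourly values, labeled between hour 5 and hour 16
hours = np.arange(24)
values = np.sin(hours / 24 * 2 * np.pi)
labeled = (hours >= 5) & (hours <= 16)

fig, ax = plt.subplots(figsize=(10, 4))
lc, seg_labeled = colored_segments(hours, values, labeled)
ax.add_collection(lc)
ax.set_xlim(hours.min(), hours.max())
ax.set_ylim(-1.1, 1.1)
ax.set_xlabel("hour of the day")
ax.set_ylabel("metrik_0")
```

For the cycle plot, the same helper could be called once per device and day, with `labeled` taken from `marker_type != 'no_labels_reported'` after sorting each group by hour.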

That said, line_dash='dotted' is worth a try, but it is not fully satisfactory either.
# Seaborn version of the cycle plot, one figure per cohort (uses merged_res from the left join above)
for cohort_id in sorted(merged_res.cohort_id.unique()):
    print(cohort_id)
    cohort_df = merged_res[merged_res.cohort_id == cohort_id]

    figsize = (25, 9)
    fig, ax = plt.subplots(figsize=figsize)
    # units='dt' with estimator=None disables aggregation and groups the lines by day
    a1 = sns.lineplot(x=cohort_df['hour'].dt.hour, y='metrik_0', hue='marker_type', units='dt', style='dt', estimator=None, data=cohort_df, ax=ax)
    handles, labels = a1.get_legend_handles_labels()
    # the first legend entry is skipped
    a1.legend(handles=handles[1:], labels=labels[1:], loc='center', bbox_to_anchor=(0.5, -0.25), ncol=6, fontsize=20)

    plt.title(f'cohort_id: {cohort_id}', fontsize=35)
    plt.xlabel('hour of the day', fontsize=35)
    plt.ylabel('metrik_0', fontsize=35)
    plt.show()