Python pandas MultiIndex: use the same second-level index for every first-level index

python pandas matplotlib


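I have a chat log with multiple participants (from WhatsApp) that I have converted into a pandas DataFrame. The aim is to plot the messages sent over time, with a different line/colour per participant, in a few different plot styles: bar chart, line chart, etc. (this is mainly an exercise for me).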

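I have a class object myConvo, where myConvo.message_log is the DataFrame of the conversation. In case it helps, there is some dummy data at the bottom of this post. I first filter the data I want by date: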

start_date=pd.Timestamp("2019-01-01 00:00:00")
end_date=pd.Timestamp("2019-12-31 00:00:00")
filt = (myConvo.message_log["date"] >= start_date) & (myConvo.message_log["date"] <= end_date)
df = myConvo.message_log[filt]
df.set_index("date", inplace=True)

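Side note: my program also has an option to plot the cumulative messages sent by each person, which I currently have to compute one person at a time, like this: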

grouped_df.loc["Person 3"].cumsum()
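The post never shows how grouped_df is built, so the following is only a minimal sketch of one plausible definition: messages counted per sender per calendar day. The tiny DataFrame and the name person3_cumulative are made up for illustration and are not from the original post.

```python
import pandas as pd

# Hypothetical stand-in for the filtered chat log: a DatetimeIndex plus
# 'sender' and 'message' columns, shaped like the dummy data in the post.
df = pd.DataFrame(
    {
        "sender": ["Person 2", "Person 1", "Person 2", "Person 3", "Person 3"],
        "message": ["a", "b", "c", "d", "e"],
    },
    index=pd.DatetimeIndex(
        [
            "2019-01-08 19:22",
            "2019-01-08 19:23",
            "2019-01-10 09:00",
            "2019-06-11 18:53",
            "2019-06-12 08:00",
        ],
        name="date",
    ),
)

# Assumed definition of grouped_df: message count per sender per day.
# The result is a Series with a (sender, date) MultiIndex, and each sender
# only gets the days between their own first and last message.
grouped_df = df.groupby("sender")["message"].resample("D").count()

# Cumulative count for a single person, selected from the first index level.
person3_cumulative = grouped_df.loc["Person 3"].cumsum()
```

Under this assumption each sender covers a different date range, which is exactly why the per-person lists described next end up with different lengths.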

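Ideally, I would like to plot either each person's message count per day (i.e. a plot of grouped_df) or the cumulative messages sent. I don't know how to do this with pandas' built-in plot method, but I have done it before without pandas by passing lists to matplotlib.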

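Now that I am using pandas, I have been converting the data to lists and plotting with matplotlib, which works, but as you will see, the time data (primary index) for Person 3 differs from the time index data for Person 1 or Person 2, so converting these to lists produces a list of a different length for each person. Matplotlib then throws an error when I try to plot them against a single set of x-axis data (in list format).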

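So my question is: how do I plot a MultiIndex DataFrame with the primary index (datetime) as the x-axis and each secondary index as a separate line? Or, how do I make sure the DataFrame uses the same secondary axis values for every user, filling the message count column with zeros for any missing data?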

Dummy data:

{'sender': {Timestamp('2019-07-29 19:58:00'): 'Person 2',
  Timestamp('2019-07-29 20:03:00'): 'Person 1',
  Timestamp('2019-01-08 19:22:00'): 'Person 2',
  Timestamp('2019-01-08 19:23:00'): 'Person 1',
  Timestamp('2019-01-08 19:25:00'): 'Person 2',
  Timestamp('2019-04-08 11:28:00'): 'Person 1',
  Timestamp('2019-04-08 11:29:00'): 'Person 1',
  Timestamp('2019-04-08 12:43:00'): 'Person 1',
  Timestamp('2019-04-08 12:49:00'): 'Person 2',
  Timestamp('2019-04-08 12:51:00'): 'Person 2',
  Timestamp('2019-08-25 22:33:00'): 'Person 1',
  Timestamp('2019-08-27 11:55:00'): 'Person 2',
  Timestamp('2019-08-27 18:35:00'): 'Person 2',
  Timestamp('2019-06-11 18:53:00'): 'Person 3',
  Timestamp('2019-06-11 18:54:00'): 'Person 2',
  Timestamp('2019-06-11 20:42:00'): 'Person 1',
  Timestamp('2019-07-11 00:16:00'): 'Person 2',
  Timestamp('2019-07-11 15:24:00'): 'Person 1',
  Timestamp('2019-07-11 16:06:00'): 'Person 2',
  Timestamp('2019-08-11 11:48:00'): 'Person 2',
  Timestamp('2019-08-11 11:53:00'): 'Person 1',
  Timestamp('2019-08-11 11:55:00'): 'Person 2',
  Timestamp('2019-08-11 11:59:00'): 'Person 3',
  Timestamp('2019-08-11 12:03:00'): 'Person 2',
  Timestamp('2019-12-24 13:40:00'): 'Person 2',
  Timestamp('2019-12-24 13:42:00'): 'Person 1',
  Timestamp('2019-12-24 13:43:00'): 'Person 2',
  Timestamp('2019-12-24 13:44:00'): 'Person 2'},
 'message': {Timestamp('2019-07-29 19:58:00'): 'Hello',
  Timestamp('2019-07-29 20:03:00'): 'Hi there',
  Timestamp('2019-01-08 19:22:00'): "How's things",
  Timestamp('2019-01-08 19:23:00'): 'good',
  Timestamp('2019-01-08 19:25:00'): 'I am glad',
  Timestamp('2019-04-08 11:28:00'): 'Me too.',
  Timestamp('2019-04-08 11:29:00'): 'Indeed we are.',
  Timestamp('2019-04-08 12:43:00'): 'I sure hope this is enough fake conversation for stackoverflow.',
  Timestamp('2019-04-08 12:49:00'): 'Better write a few more messages just in case',
  Timestamp('2019-04-08 12:51:00'): 'Oh yeah.',
  Timestamp('2019-08-25 22:33:00'): "I'm going to stop now.",
  Timestamp('2019-08-27 11:55:00'): 'redacted',
  Timestamp('2019-08-27 18:35:00'): 'redacted',
  Timestamp('2019-06-11 18:53:00'): 'redacted',
  Timestamp('2019-06-11 18:54:00'): 'redacted',
  Timestamp('2019-06-11 20:42:00'): 'redacted',
  Timestamp('2019-07-11 00:16:00'): 'redacted',
  Timestamp('2019-07-11 15:24:00'): 'redacted',
  Timestamp('2019-07-11 16:06:00'): 'redacted',
  Timestamp('2019-08-11 11:48:00'): 'redacted',
  Timestamp('2019-08-11 11:53:00'): 'redacted',
  Timestamp('2019-08-11 11:55:00'): 'redacted',
  Timestamp('2019-08-11 11:59:00'): 'redacted',
  Timestamp('2019-08-11 12:03:00'): 'redacted',
  Timestamp('2019-12-24 13:40:00'): 'redacted',
  Timestamp('2019-12-24 13:42:00'): 'redacted',
  Timestamp('2019-12-24 13:43:00'): 'redacted',
  Timestamp('2019-12-24 13:44:00'): 'redacted'}}
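To experiment with this dump, it can be loaded back into a DataFrame. The sketch below uses only the first few entries and assumes the index should be called date (to match the set_index("date") call in the question); the variable name data is made up here.

```python
import pandas as pd
from pandas import Timestamp

# A shortened copy of the dummy data (a handful of entries only).
data = {
    "sender": {
        Timestamp("2019-07-29 19:58:00"): "Person 2",
        Timestamp("2019-07-29 20:03:00"): "Person 1",
        Timestamp("2019-01-08 19:22:00"): "Person 2",
        Timestamp("2019-06-11 18:53:00"): "Person 3",
    },
    "message": {
        Timestamp("2019-07-29 19:58:00"): "Hello",
        Timestamp("2019-07-29 20:03:00"): "Hi there",
        Timestamp("2019-01-08 19:22:00"): "How's things",
        Timestamp("2019-06-11 18:53:00"): "redacted",
    },
}

# A dict of dicts becomes one column per outer key, indexed by the inner keys.
df = pd.DataFrame(data)
df.index.name = "date"  # assumed name, matching the question's set_index call
df = df.sort_index()    # the dump is not in chronological order
```

Sorting is not strictly required for the groupby/resample approach, but it makes the frame easier to inspect.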

You can unstack sender after the groupby so that it becomes the columns, and then plot:

(df.groupby('sender').message
   .resample('D').count()
   .unstack('sender')
   .plot()
)

Output: [line plot of daily message counts, one line per sender]
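To see why unstacking removes the "lists of different lengths" problem, here is a small self-contained sketch on made-up data. The fill_value=0 argument is an addition that is not in the answer's code; without it, days on which a sender has no entry show up as NaN rather than 0. The names daily and wide are illustrative.

```python
import pandas as pd

# Made-up chat log: three senders active on different days.
df = pd.DataFrame(
    {
        "sender": ["Person 1", "Person 2", "Person 3", "Person 1"],
        "message": ["a", "b", "c", "d"],
    },
    index=pd.DatetimeIndex(
        ["2019-01-08 19:22", "2019-01-08 19:25", "2019-01-09 18:53", "2019-01-10 11:28"],
        name="date",
    ),
)

# Long format: a Series indexed by (sender, date). Each sender only covers
# the days between their own first and last message.
daily = df.groupby("sender")["message"].resample("D").count()

# Wide format: one row per date, one column per sender. All columns now
# share the same date index, so they can be plotted against one x-axis.
wide = daily.unstack("sender", fill_value=0)

# wide.plot() would draw one line per sender (requires matplotlib).
```

Because every column shares the same index after unstacking, the per-sender series all have the same length, and pandas' plot method can draw each column as its own line.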

If you want a cumulative sum, just apply it before .plot():

(df.groupby('sender').message
   .resample('D').count()
   .unstack('sender')
   .cumsum()
   .plot()
)

Output: [line plot of cumulative message counts, one line per sender]
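One detail worth checking with the cumulative version: cumsum skips NaN but leaves it in place, so a sender with no entry on a given day gets a gap instead of a flat line. The sketch below, on made-up data, compares the answer's form with a variant that passes fill_value=0 to unstack (an assumption about what is wanted, not part of the original answer).

```python
import pandas as pd

# Made-up chat log: three senders active on different days.
df = pd.DataFrame(
    {
        "sender": ["Person 1", "Person 2", "Person 3", "Person 1"],
        "message": ["a", "b", "c", "d"],
    },
    index=pd.DatetimeIndex(
        ["2019-01-08 19:22", "2019-01-08 19:25", "2019-01-09 18:53", "2019-01-10 11:28"],
        name="date",
    ),
)

daily = df.groupby("sender")["message"].resample("D").count()

# As in the answer: missing (sender, date) pairs become NaN and stay NaN
# after cumsum, which shows up as breaks in the plotted lines.
cumulative_with_gaps = daily.unstack("sender").cumsum()

# Variant: treat missing days as zero messages, so each running total
# stays flat on days without messages.
cumulative = daily.unstack("sender", fill_value=0).cumsum()

# cumulative.plot() would draw one running total per sender (requires matplotlib).
```

With the zero-filled variant, the last row of cumulative equals each sender's total message count in the selected period.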

Thanks! That solves both problems for me. And such a quick response.
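The question also asked how to keep the MultiIndex layout while giving every sender the same set of dates, with zeros for missing days. One possible approach, sketched here on made-up data and not taken from the answer, is to reindex against the full sender × date product. The names all_days, full_index and daily_full are illustrative.

```python
import pandas as pd

# Made-up chat log: three senders active on different days.
df = pd.DataFrame(
    {
        "sender": ["Person 1", "Person 2", "Person 3", "Person 1"],
        "message": ["a", "b", "c", "d"],
    },
    index=pd.DatetimeIndex(
        ["2019-01-08 19:22", "2019-01-08 19:25", "2019-01-09 18:53", "2019-01-10 11:28"],
        name="date",
    ),
)

daily = df.groupby("sender")["message"].resample("D").count()

# Every calendar day between the earliest and latest day seen for any sender.
dates = daily.index.get_level_values("date")
all_days = pd.date_range(dates.min(), dates.max(), freq="D")

# Cartesian product of senders and days: the index every sender should share.
full_index = pd.MultiIndex.from_product(
    [daily.index.get_level_values("sender").unique(), all_days],
    names=["sender", "date"],
)

# Missing (sender, date) pairs are added with a count of 0.
daily_full = daily.reindex(full_index, fill_value=0)

# Per-sender running totals, still in long (MultiIndex) form.
cumulative_full = daily_full.groupby(level="sender").cumsum()
```

After this, daily_full.loc["Person 3"] and daily_full.loc["Person 1"] have the same length and the same dates, so converting them to lists for matplotlib would no longer produce mismatched lengths.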
