Python: get the longest streak of consecutive weeks per group in pandas

python pandas time-series

I'm currently working with weekly data for different subjects, but there can be long stretches with no data. What I want is to keep, for each id, only the longest streak of consecutive weeks. My data looks like this:

id    week
1      8
1      15
1      60
1      61
1      62
2      10
2      11
2      12
2      13
2      25
2      26

My expected output is:

id    week
1      60
1      61
1      62
2      10
2      11
2      12
2      13

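I got somewhat close by marking a row with 1 whenever week equals week.shift()+1. The problem is that this approach does not mark the first row of a streak, and I also can't filter for the longest streak: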

df.loc[ (df['id'] == df['id'].shift())&(df['week'] == df['week'].shift()+1),'streak']=1

For my example, this gives:

id    week  streak
1      8     nan
1      15    nan
1      60    nan
1      61    1
1      62    1
2      10    nan
2      11    1
2      12    1
2      13    1
2      25    nan
2      26    1

Any ideas on how to achieve what I want?
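One way to repair the attempt above, sketched under the assumption that rows are already sorted by id and week: compare each row with both its previous and its next row inside the same id, and flag it if either neighbour is exactly one week away. The column name `in_streak` is made up for illustration. Note that this flags every streak of two or more weeks, not only the longest one.

```python
import pandas as pd

df = pd.DataFrame({
    "id":   [1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2],
    "week": [8, 15, 60, 61, 62, 10, 11, 12, 13, 25, 26],
})

weeks = df.groupby("id")["week"]

# True when the previous row of the same id is exactly one week earlier
follows_prev = weeks.diff().eq(1)
# True when the next row of the same id is exactly one week later
precedes_next = weeks.diff(-1).eq(-1)

# A row belongs to a streak if it has a consecutive neighbour on either side
df["in_streak"] = follows_prev | precedes_next
print(df)
```

Grouping before calling `diff` removes the need for the separate `df['id'] == df['id'].shift()` check, because differences are never computed across two ids.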

Try this:

df['consec'] = df.groupby(['id',df['week'].diff(-1).ne(-1).shift().bfill().cumsum()]).transform('count')

df[df.groupby('id')['consec'].transform('max') == df.consec]

Output:

   id  week  consec
2   1    60       3
3   1    61       3
4   1    62       3
5   2    10       4
6   2    11       4
7   2    12       4
8   2    13       4
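The one-liner is dense, so here is the same idea unrolled into named steps. This is a sketch rather than the answer's exact code: it builds the streak key with a per-id `diff` and selects the `week` column before `transform`. The names `new_streak`, `streak_id` and `longest` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "id":   [1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2],
    "week": [8, 15, 60, 61, 62, 10, 11, 12, 13, 25, 26],
})

# A new streak starts wherever the gap to the previous week of the same id is not 1
# (the first row of each id has a NaN gap, so it also starts a streak).
new_streak = df.groupby("id")["week"].diff().ne(1)

# Running total of streak starts gives every streak its own label
streak_id = new_streak.cumsum()

# Length of the streak each row belongs to
df["consec"] = df.groupby(["id", streak_id])["week"].transform("count")

# Keep only rows whose streak length equals the maximum for their id
longest = df[df["consec"] == df.groupby("id")["consec"].transform("max")]
print(longest)
```

If an id has two streaks tied for the maximum length, this filter keeps both of them.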

Not as concise as @ScottBoston's answer, but I like this approach:

import numpy as np
import pandas as pd

def max_streak(s):
  a = s.values    # Let's deal with an array

  # I need to know where the differences are not `1`.
  # Also, because I plan to use `diff` again, I'll wrap
  # the boolean array with `True` to make things cleaner
  b = np.concatenate([[True], np.diff(a) != 1, [True]])

  # Tell the locations of the breaks in streak
  c = np.flatnonzero(b)

  # `diff` again tells me the length of the streaks
  d = np.diff(c)

  # `argmax` will tell me the location of the largest streak
  e = d.argmax()

  return c[e], d[e]

def make_thing(df):
  start, length = max_streak(df.week)
  return df.iloc[start:start + length].assign(consec=length)

pd.concat([
  make_thing(g) for _, g in df.groupby('id')    
])

   id  week  consec
2   1    60       3
3   1    61       3
4   1    62       3
5   2    10       4
6   2    11       4
7   2    12       4
8   2    13       4
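To make the array steps concrete, here is a small walk-through on the weeks of id 1 alone. It mirrors the logic of `max_streak` above with longer variable names chosen for illustration.

```python
import numpy as np

weeks = np.array([8, 15, 60, 61, 62])

# True at every position where a new streak begins, plus a closing sentinel
breaks = np.concatenate([[True], np.diff(weeks) != 1, [True]])

# Start index of each streak, followed by the end-of-array sentinel
starts = np.flatnonzero(breaks)

# Distance between consecutive starts is the length of each streak
lengths = np.diff(starts)

# Position of the longest streak (argmax returns the first one on ties)
best = lengths.argmax()
start, length = starts[best], lengths[best]
print(weeks[start:start + length])
```

Because `argmax` returns the first maximum, a tie between two equally long streaks resolves to the earlier one, unlike the `transform('max')` filter, which keeps every tied streak.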

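By applying the following condition you can get another column (streak1):

week == week.shift(-1) - 1

This way you can also identify the first row of each streak. You may need to combine streak and streak1 with a logical OR to get the final result (an XOR would drop the rows in the middle of a streak, since both columns flag them).

Ah, this is nice. The only problem is that I get this error when running the first line:

ValueError: Wrong number of items passed 30, placement implies 1

Do you know what might be happening?

Try upgrading pandas first.

One small suggestion about your second group key: I think it is safer to build it with groupby.

df.groupby('id').week.apply(lambda x: x.diff().ne(1).cumsum())

I tried updating pandas, but now I can't import it. I get this error:

AttributeError: module 'numpy.core.umath' has no attribute 'divmod'

Edit: updated numpy, and now the import works. Now I get:

ValueError: Wrong number of items passed 25, placement implies 1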
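A plausible explanation for this ValueError, offered as an assumption because the commenter's real data is not shown: when the DataFrame has more columns than id and week, `groupby(...).transform('count')` returns one column per non-key column, and that multi-column result cannot be assigned to the single column `consec`. Selecting one column before calling `transform` avoids the problem. The extra columns `score` and `site` below are invented for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "id":    [1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2],
    "week":  [8, 15, 60, 61, 62, 10, 11, 12, 13, 25, 26],
    "score": range(11),            # hypothetical extra column
    "site":  list("abcdefghijk"),  # hypothetical extra column
})

# Streak label built per id, as suggested in the comments
key = df.groupby("id")["week"].diff().ne(1).cumsum()

# Without a column selection, transform returns a DataFrame with several columns
wide = df.groupby(["id", key]).transform("count")
print(wide.shape)

# Selecting a single column yields a Series that fits into one new column
df["consec"] = df.groupby(["id", key])["week"].transform("count")
print(df[df["consec"] == df.groupby("id")["consec"].transform("max")])
```

With only the id and week columns, as in the question's example, the unselected version happens to return a single column, which would explain why the answer's code ran on the sample data but not on the real data.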