Python: split a spreadsheet into multiple parts based on a number sequence

python google-sheets

I have a dataset in a spreadsheet that is basically data about every trip of each New York subway train:

╔═══════╦══════╦══════════════╦════════════════╦═════════╦═══════════════╦══════════════════╗
║ trip  ║  id  ║ arrival_time ║ departure_time ║ stop_id ║ stop_sequence ║     Station      ║
╠═══════╬══════╬══════════════╬════════════════╬═════════╬═══════════════╬══════════════════╣
║ GO505 ║ 20_2 ║ 0:06:00      ║ 0:06:00        ║     237 ║             1 ║ Penn Station     ║
║ GO505 ║ 20_2 ║ 0:18:00      ║ 0:18:00        ║     214 ║             2 ║ Woodside         ║
║ GO505 ║ 20_2 ║ 0:23:00      ║ 0:23:00        ║      55 ║             3 ║ Forest Hills     ║
║ GO505 ║ 20_2 ║ 0:25:00      ║ 0:25:00        ║     107 ║             4 ║ Kew Gardens      ║
║ GO505 ║ 20_2 ║ 0:29:00      ║ 0:32:00        ║     102 ║             5 ║ Jamaica          ║
║ GO505 ║ 20_2 ║ 0:47:00      ║ 0:47:00        ║     183 ║             6 ║ Rockville Centre ║
║ GO505 ║ 20_2 ║ 0:50:00      ║ 0:50:00        ║     225 ║             7 ║ Baldwin          ║
║ GO505 ║ 20_2 ║ 0:53:00      ║ 0:53:00        ║      64 ║             8 ║ Freeport         ║
║ GO505 ║ 20_2 ║ 0:56:00      ║ 0:56:00        ║     226 ║             9 ║ Merrick          ║
║ GO505 ║ 20_2 ║ 0:59:00      ║ 0:59:00        ║      16 ║            10 ║ Bellmore         ║
║ GO505 ║ 20_2 ║ 1:02:00      ║ 1:02:00        ║     215 ║            11 ║ Wantagh          ║
║ GO505 ║ 20_2 ║ 1:05:00      ║ 1:05:00        ║     187 ║            12 ║ Seaford          ║
║ GO505 ║ 20_2 ║ 1:07:00      ║ 1:07:00        ║     136 ║            13 ║ Massapequa       ║
║ GO505 ║ 20_2 ║ 1:09:00      ║ 1:09:00        ║     135 ║            14 ║ Massapequa Park  ║
║ GO505 ║ 20_2 ║ 1:12:00      ║ 1:12:00        ║       8 ║            15 ║ Amityville       ║
║ GO505 ║ 20_2 ║ 1:15:00      ║ 1:15:00        ║      38 ║            16 ║ Copiague         ║
║ GO505 ║ 20_2 ║ 1:18:00      ║ 1:18:00        ║     117 ║            17 ║ Lindenhurst      ║
║ GO505 ║ 20_2 ║ 1:23:00      ║ 1:23:00        ║      27 ║            18 ║ Babylon          ║
║ GO505 ║ 20_3 ║ 1:00:00      ║ 1:00:00        ║      27 ║             1 ║ Babylon          ║
║ GO505 ║ 20_3 ║ 1:05:00      ║ 1:05:00        ║     117 ║             2 ║ Lindenhurst      ║
║ GO505 ║ 20_3 ║ 1:08:00      ║ 1:08:00        ║      38 ║             3 ║ Copiague         ║
║ GO505 ║ 20_3 ║ 1:10:00      ║ 1:10:00        ║       8 ║             4 ║ Amityville       ║
║ GO505 ║ 20_3 ║ 1:13:00      ║ 1:13:00        ║     135 ║             5 ║ Massapequa Park  ║
╚═══════╩══════╩══════════════╩════════════════╩═════════╩═══════════════╩══════════════════╝

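I need to split it into parts based on the numbers in stop_sequence. Each run from 1 to n (here 18) represents one trip of a train. For example, I need to calculate the duration of each trip, i.e. the departure time at the last stop minus the arrival time at the first stop, and there are about 5000 trips. How can I do that? I was hoping to split the column into trips with Python and pandas and compute the time of each trip, but I don't know how.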

My expected output is:

trip id    ║ trip time
GO505 20_2 ║ x:xx:xx
GO505 20_3 ║ x:xx:xx

I'm new to data science. Please help.
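A minimal pandas sketch of the calculation described above, assuming the sheet has been exported (for example as CSV) with the same column names as the table: sort each (trip, id) group by stop_sequence, then subtract the first arrival_time from the last departure_time. Because groupby handles every pair at once, the ~5000 trips do not have to be selected one by one. The names `trip_durations` and `format_hms` are hypothetical, and the sample rows are a few lines copied from the table above.

```python
import pandas as pd


def format_hms(td: pd.Timedelta) -> str:
    """Format a Timedelta as h:mm:ss, e.g. 1:17:00."""
    total = int(td.total_seconds())
    hours, rest = divmod(total, 3600)
    minutes, seconds = divmod(rest, 60)
    return f"{hours}:{minutes:02d}:{seconds:02d}"


def trip_durations(df: pd.DataFrame) -> pd.DataFrame:
    """Last departure minus first arrival for every (trip, id) pair."""
    df = df.copy()
    # "0:06:00"-style strings become Timedelta values that can be subtracted.
    df["arrival_time"] = pd.to_timedelta(df["arrival_time"])
    df["departure_time"] = pd.to_timedelta(df["departure_time"])
    # Put the stops of each trip in order before taking first/last.
    df = df.sort_values(["trip", "id", "stop_sequence"])
    grouped = df.groupby(["trip", "id"])
    duration = grouped["departure_time"].last() - grouped["arrival_time"].first()
    out = duration.rename("trip_time").reset_index()
    out["trip_id"] = out["trip"] + " " + out["id"]
    out["trip_time"] = out["trip_time"].map(format_hms)
    return out[["trip_id", "trip_time"]]


# A few rows copied from the table above; the full sheet could be loaded with
# something like pd.read_csv("trips.csv") (hypothetical file name).
df = pd.DataFrame({
    "trip": ["GO505"] * 5,
    "id": ["20_2", "20_2", "20_2", "20_3", "20_3"],
    "arrival_time": ["0:06:00", "0:18:00", "1:23:00", "1:00:00", "1:13:00"],
    "departure_time": ["0:06:00", "0:18:00", "1:23:00", "1:00:00", "1:13:00"],
    "stop_sequence": [1, 2, 18, 1, 5],
})

result = trip_durations(df)
print(result.to_string(index=False))
```

For these sample rows this gives 1:17:00 for GO505 20_2 and 0:13:00 for GO505 20_3 (the latter only covers the stops shown in the table, which is cut off at stop 5).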

Range A:G -> the data for each train trip

Cell I1:

=QUERY({ArrayFormula(A:A&" "&B:B), ArrayFormula(VALUE(C:D))}, "select Col1, max(Col3)-min(Col2) where Col1 != ' ' group by Col1 label max(Col3)-min(Col2) 'Trip duration' format max(Col3)-min(Col2) 'hh:mm:ss'")


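Could you just select the rows that match the id? That would work for your example.

@B.Go There are about 5000 IDs that are unrelated to each other, so that's not the best approach. My other idea was to write a regular expression that matches all the 18s (Babylon or Penn Station, I guess) and get an array of all the matches.

@B.Go Haha, that won't work either, because sometimes the same route has 18, 17 or 22 stations :(

I assume there is always more than 1 station!? But are the end points always the same 2 cities?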
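If the id column cannot be relied on, the rows can instead be split on stop_sequence itself, as the title asks: a new trip starts every time stop_sequence is 1, so a cumulative sum of that condition numbers the trips regardless of how many stations each has (17, 18, 22, ...), and no regular expression or fixed end station is needed. This sketch assumes the rows are in file order and every trip starts at stop_sequence 1; `durations_by_sequence` is a hypothetical name and the sample rows are trimmed from the table above.

```python
import pandas as pd


def durations_by_sequence(df: pd.DataFrame) -> pd.DataFrame:
    """One row per run, where a run starts wherever stop_sequence == 1.

    Assumes rows are in file order and each run begins at stop_sequence 1;
    runs may have any number of stops.
    """
    # True on the first stop of each run; the running total numbers the runs 1, 2, 3, ...
    run_no = df["stop_sequence"].eq(1).cumsum().rename("run")
    timed = df.assign(
        arrival=pd.to_timedelta(df["arrival_time"]),
        departure=pd.to_timedelta(df["departure_time"]),
    )
    grouped = timed.groupby(run_no)
    return pd.DataFrame({
        "trip": grouped["trip"].first(),
        "id": grouped["id"].first(),
        "stops": grouped.size(),
        # Last departure of the run minus its first arrival.
        "trip_time": grouped["departure"].last() - grouped["arrival"].first(),
    })


# First three stops of 20_2 and the five listed stops of 20_3 from the table above.
df = pd.DataFrame({
    "trip": ["GO505"] * 8,
    "id": ["20_2"] * 3 + ["20_3"] * 5,
    "arrival_time": ["0:06:00", "0:18:00", "0:23:00",
                     "1:00:00", "1:05:00", "1:08:00", "1:10:00", "1:13:00"],
    "departure_time": ["0:06:00", "0:18:00", "0:23:00",
                       "1:00:00", "1:05:00", "1:08:00", "1:10:00", "1:13:00"],
    "stop_sequence": [1, 2, 3, 1, 2, 3, 4, 5],
})

result = durations_by_sequence(df)
print(result)
```

The two runs here have 3 and 5 stops, which shows that the split does not depend on a fixed station count; the id column is only carried along as a label.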