Grouping rows in pandas based on time and a string comparison function - Pandas, Dataframe, Apply, Rolling Computation - Fatal编程技术网



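I have a DataFrame whose rows I want to split into groups based on the time difference between rows and the difference between their strings. The original DataFrame looks like this: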

import datetime
import pandas as pd

data = {
    'timestamp': [
        datetime.datetime(2020,10,10,12,0,0),
        datetime.datetime(2020,10,10,12,0,2),
        datetime.datetime(2020,10,10,12,0,10),
        datetime.datetime(2020,10,10,12,0,12),
        datetime.datetime(2020,10,10,12,0,30),
        datetime.datetime(2020,10,10,12,1,0),
        datetime.datetime(2020,10,10,12,3,0),
        datetime.datetime(2020,10,10,12,3,10),
        datetime.datetime(2020,10,10,12,3,40),
        datetime.datetime(2020,10,10,12,10,0)
        ],
    'row_number': list(range(10)),
    'input': [
        'hello',
        'hello w',
        'hello wor',
        'this is a',
        'hello world',
        'this is a new',
        'hello',
        'hello w',
        'hello wor',
        'hello world'
        ]
}

df = pd.DataFrame(data=data)
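The DataFrame needs to be split into groups based on a difference of 60 seconds or less from the preceding row and a difference of 4 or fewer string characters, so that the rows end up in the following groups. For now I have written each group out as a separate DataFrame, but ideally the result would be a list of lists to upload to BigQuery: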

# Group 1: rows 0, 1, 2, 4
data = {
    'timestamp': [
        datetime.datetime(2020,10,10,12,0,0),
        datetime.datetime(2020,10,10,12,0,2),
        datetime.datetime(2020,10,10,12,0,10),
        datetime.datetime(2020,10,10,12,0,30)
        ],
    'row_number': [0,1,2,4],
    'input': [
        'hello',
        'hello w',
        'hello wor',
        'hello world'
        ]
}
pd.DataFrame(data=data)

# Group 2: rows 3, 5
data = {
    'timestamp': [
        datetime.datetime(2020,10,10,12,0,12),
        datetime.datetime(2020,10,10,12,1,0)
        ],
    'row_number': [3,5],
    'input': [
        'this is a',
        'this is a new',
        ]
}

pd.DataFrame(data=data)

# Group 3: rows 6, 7, 8, 9
data = {
    'timestamp': [
        datetime.datetime(2020,10,10,12,3,0),
        datetime.datetime(2020,10,10,12,3,10),
        datetime.datetime(2020,10,10,12,3,40),
        datetime.datetime(2020,10,10,12,10,0)
        ],
    'row_number': [6,7,8,9],
    'input': [
        'hello',
        'hello w',
        'hello wor',
        'hello world'
        ]
}
pd.DataFrame(data=data)
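The rule above is a pairwise check between a row and the preceding row of its group. A minimal sketch of that check, assuming the "string character difference" is Levenshtein edit distance (an assumption, since the question's own comparison function is not shown) and that both bounds are inclusive. The names levenshtein, continues, MAX_GAP and MAX_EDITS are hypothetical:

```python
from datetime import datetime, timedelta


def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (insert, delete, substitute).
    Hypothetical stand-in for the asker's own string comparison function."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = cur
    return prev[-1]


MAX_GAP = timedelta(seconds=60)  # "60 seconds or less"
MAX_EDITS = 4                    # "4 or fewer characters"


def continues(prev_time, prev_text, cur_time, cur_text) -> bool:
    """True if the current row may follow the previous row in the same group."""
    return (cur_time - prev_time <= MAX_GAP
            and levenshtein(prev_text, cur_text) <= MAX_EDITS)


print(levenshtein('this is a', 'this is a new'))  # 4
print(continues(datetime(2020, 10, 10, 12, 0, 12), 'this is a',
                datetime(2020, 10, 10, 12, 1, 0), 'this is a new'))  # True
```

Under this measure every consecutive pair in the first two groups is within 4 edits, while 'hello wor' and 'this is a' are far apart, which is why row 3 does not join the first group.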
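I already have the string comparison function, but I'm not sure how to plug it into a pandas rolling window with apply, or whether that is even the most efficient approach: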

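df.rolling('60s', on='timestamp').apply(my_string_compare)  # my_string_compare: hypothetical name for my existing comparison function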
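For context, a time-based window such as '60s' needs either a DatetimeIndex or the on= argument, and each row's window covers (t - 60s, t]. As far as I know, Rolling.apply passes the function only numeric window values, so it cannot compare the text in the input column directly. What a rolling window can cheaply report is how many rows fall in each 60-second window, i.e. how many earlier rows a given row could possibly be matched against. A small sketch that rebuilds only the timestamp and row_number columns:

```python
import datetime

import pandas as pd

# Same timestamps as the question's DataFrame; the input column is omitted.
times = [datetime.datetime(2020, 10, 10, 12, m, s) for m, s in
         [(0, 0), (0, 2), (0, 10), (0, 12), (0, 30),
          (1, 0), (3, 0), (3, 10), (3, 40), (10, 0)]]
df = pd.DataFrame({'timestamp': times, 'row_number': list(range(10))})

# Offset window on a sorted DatetimeIndex; default closed='right' gives (t - 60s, t].
window_sizes = df.set_index('timestamp')['row_number'].rolling('60s').count()
print(window_sizes.tolist())  # [1.0, 2.0, 3.0, 4.0, 5.0, 5.0, 1.0, 2.0, 3.0, 1.0]
```

Note that row 5 (12:01:00) counts 5 rows, not 6: the row at exactly 12:00:00 sits on the excluded left edge of its window.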
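I currently do the comparisons over a list of dictionaries, but with ~100k rows and many groups that contain only a single row, looping through all the comparisons takes a very long time.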
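One way to avoid comparing every row against every group (a suggestion, not the asker's code) is a single greedy pass over the rows sorted by timestamp that keeps only the groups still "open". Because timestamps only increase, a group whose last row is more than 60 seconds old can never accept another row and can be retired. The sketch assumes Levenshtein distance as the string comparison, reads "the preceding row" as the last row of a candidate group (rows 2 and 4 belong together even though row 3 lies between them), and breaks ties by trying the most recently created open group first. group_rows, levenshtein and t are hypothetical names:

```python
import datetime


def levenshtein(a: str, b: str) -> int:
    """Edit distance; hypothetical stand-in for the asker's comparison function."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def group_rows(rows, max_gap=datetime.timedelta(seconds=60), max_edits=4):
    """Greedily group (timestamp, row_number, text) tuples sorted by timestamp."""
    closed, active = [], []  # active: groups that may still receive rows
    for ts, row_number, text in rows:
        # Retire groups whose last row is too old; with sorted input
        # they can never match any later row either.
        still_open = []
        for g in active:
            if ts - g[-1][0] > max_gap:
                closed.append(g)
            else:
                still_open.append(g)
        active = still_open
        # Try the most recently created open group first (an arbitrary tie-break).
        for g in reversed(active):
            if levenshtein(g[-1][2], text) <= max_edits:
                g.append([ts, row_number, text])
                break
        else:  # no open group matched: start a new one
            active.append([[ts, row_number, text]])
    closed.extend(active)
    # Order groups by their first row_number for deterministic output.
    return sorted(closed, key=lambda g: g[0][1])


def t(minute, second):
    return datetime.datetime(2020, 10, 10, 12, minute, second)


rows = [(t(0, 0), 0, 'hello'), (t(0, 2), 1, 'hello w'), (t(0, 10), 2, 'hello wor'),
        (t(0, 12), 3, 'this is a'), (t(0, 30), 4, 'hello world'),
        (t(1, 0), 5, 'this is a new'), (t(3, 0), 6, 'hello'), (t(3, 10), 7, 'hello w'),
        (t(3, 40), 8, 'hello wor'), (t(10, 0), 9, 'hello world')]

groups = group_rows(rows)  # list of lists of [timestamp, row_number, input]
print([[r[1] for r in g] for g in groups])  # [[0, 1, 2, 4], [3, 5], [6, 7, 8], [9]]
```

Each row is compared only with groups that received a row in the last 60 seconds, so singleton groups drop out of consideration quickly instead of piling up. With a real DataFrame, rows could be built as list(df.sort_values('timestamp')[['timestamp', 'row_number', 'input']].itertuples(index=False, name=None)). On the sample data row 9 (12:10:00) is 380 seconds after row 8 (12:03:40), so under the 60-second rule it starts its own group rather than joining rows 6, 7 and 8 as in the expected output above.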
