Python: selecting a sub-array by timedelta difference

Tags: python, pandas, dataframe

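When I try to select a sub-array by datetime delta, like this:

dx = dx[(dx[['ts']].diff() > threshold).any(axis=1)]
In the example below it should remove the labels that are too close together, but it does not work.

Full code: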

#!/usr/bin/env python
import re
import pandas as pd
import numpy as np
from datetime import datetime,timedelta
import matplotlib.pyplot as plt

def str2dt(tstr):
    return datetime.strptime(tstr, '%m-%d %H:%M:%S.%f')

def plotme():
    lst = [
    "09-04 11:55:05.011 5",
    "09-04 11:55:15.011 2",
    "09-04 11:55:16.011 3",
    "09-04 11:55:20.011 4",
    "09-04 11:55:25.011 4",
    "09-04 11:55:30.011 4",
    "09-04 11:55:35.011 4",
    "09-04 11:55:40.011 4",
    "09-04 11:55:45.011 9",
    "09-04 11:55:50.011 9",
    "09-04 11:55:55.011 9",
    "09-04 11:56:00.011 9",
    "09-04 11:56:05.011 7",
    "09-04 11:56:15.011 8",
    ]
    scorelst = []
    tslst = []
    index = np.arange(1) # array of numbers for the number of samples
    df = pd.DataFrame(columns=["ts","score"], index=index)
    count = 0
    tsregex = "\d+-\d+ \d+:\d+:\d+.\d+"
    for line in lst:
        m = re.search("(%s).* (\d+)"%(tsregex),line)
        if m:
            tstr = m.group(1)
            tsdt = str2dt(tstr)
            tslst.append(tsdt)
            score = int(m.group(2))
            scorelst.append(score)
            df.ix[count] = [tsdt,score]
            count += 1
    fig, ax = plt.subplots(figsize=(12,6))
    ax.plot(df['ts'],df['score'],label = "score")  

    cols = ["score"]
    dfl = df[(df[cols].shift() != df[cols]).any(axis=1)]
    dfr = df[(df[cols].shift(-1) != df[cols]).any(axis=1)]
    dx = pd.concat([dfl,dfr],ignore_index=True)
    dx = dx.sort_values(['ts'])
    threshold = timedelta(seconds=5) 
    print dx[['ts']].diff()
    print dx[['ts']].diff() > threshold
    dx = dx[(dx[['ts']].diff() > threshold).any(axis=1)]
    fig.autofmt_xdate()
    ax.xaxis.set_ticks(np.array(dx['ts']))
    #ax.yaxis.set_ticks(delta['score'])
    ax.yaxis.grid(True)
    ax.spines['right'].set_visible(False)
    ax.spines['top'].set_visible(False)
    plt.show()
    return

plotme()
Output (plot not preserved):

If I understand correctly, you want to remove rows whose timestamp is too close to the previous row. Try this:

import pandas as pd

lst = [
    "09-04 11:55:05.011 5",
    "09-04 11:55:15.011 2",
    "09-04 11:55:16.011 3",
    "09-04 11:55:20.011 4",
    "09-04 11:55:25.011 4",
    "09-04 11:55:30.011 4",
    "09-04 11:55:35.011 4",
    "09-04 11:55:40.011 4",
    "09-04 11:55:45.011 9",
    "09-04 11:55:50.011 9",
    "09-04 11:55:55.011 9",
    "09-04 11:56:00.011 9",
    "09-04 11:56:05.011 7",
    "09-04 11:56:15.011 8",
]

df = pd.DataFrame(lst)[0].str.extract(r'(?P<Date>.+) (?P<Value>\d+)', expand=True)
df['Date'] = pd.to_datetime(df['Date'], format='%m-%d %H:%M:%S.%f')
df['Value'] = pd.to_numeric(df['Value'])

threshold = pd.Timedelta(seconds=5)
mask = df['Date'].diff() >= threshold
df[mask].plot('Date', 'Value')
Output (plot not preserved):
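Two details of the mask in the answer are easy to miss, so here is a minimal sketch of them. It uses made-up timestamps (not the data from the question), and the names `strict`, `inclusive` and `keep_first` are only illustrative. The first row has no predecessor, so its `diff()` is NaT and any comparison on it is False. Separately, a gap of exactly 5 seconds passes `>=` but fails `>`.

```python
import pandas as pd

# Hypothetical sample timestamps, not the data from the question.
ts = pd.Series(pd.to_datetime([
    "2000-01-01 00:00:00",
    "2000-01-01 00:00:05",
    "2000-01-01 00:00:06",
    "2000-01-01 00:00:16",
]))
threshold = pd.Timedelta(seconds=5)

gaps = ts.diff()                      # first entry is NaT: there is no previous row
strict = gaps > threshold             # a gap of exactly 5 s fails this test
inclusive = gaps >= threshold         # a gap of exactly 5 s passes this test
keep_first = inclusive | gaps.isna()  # additionally keep the row with no predecessor

print(ts[keep_first])
```

Whether the first row should be kept depends on what you want to plot; the `gaps.isna()` term is one possible way to keep it, not part of the answer above.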


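Note that this code only compares each row with the row immediately before it. If you have a long run of rows separated by small gaps (say, 2 seconds), you will end up dropping a large range. Consider normalizing all observations to a fixed time interval.

Great answer! This is really beautiful code! It sounds like I cannot put a datetime object into df.ix[count] and need to convert it with pd.to_datetime as in your code! But what is the reason it doesn't work?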
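To illustrate the caveat about runs of closely spaced rows, here is a rough sketch of two possible workarounds, using made-up data sampled every 2 seconds. The helper `filter_by_min_gap` is hypothetical (it is not a pandas function): it compares each row with the last row that was kept rather than with its immediate predecessor. The `resample` call is one way to read "normalizing to a fixed time interval", here by averaging within 5-second bins.

```python
import pandas as pd

def filter_by_min_gap(times, min_gap):
    """Hypothetical helper: True where a timestamp is at least `min_gap`
    after the most recently KEPT timestamp. The first one is always kept."""
    keep = []
    last_kept = None
    for t in times:
        if last_kept is None or t - last_kept >= min_gap:
            keep.append(True)
            last_kept = t
        else:
            keep.append(False)
    return keep

# Hypothetical data: one reading every 2 seconds.
df = pd.DataFrame({
    "Date": pd.date_range("2000-01-01", periods=6, freq=pd.Timedelta(seconds=2)),
    "Value": [1, 2, 3, 4, 5, 6],
})
threshold = pd.Timedelta(seconds=5)

# Row-to-previous-row comparison: every gap is 2 s, so nothing survives.
pairwise = df[df["Date"].diff() >= threshold]

# Comparison against the last kept row: keeps the rows at 0 s and 6 s.
greedy = df[filter_by_min_gap(df["Date"], threshold)]

# Alternative: average the values inside fixed 5-second bins.
binned = df.resample(threshold, on="Date")["Value"].mean()

print(len(pairwise), greedy["Value"].tolist(), binned.tolist())
```

The loop is plain Python, so it is slower than a vectorized mask on large frames; for plots with a few hundred tick candidates that is unlikely to matter.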