
Python: how to shift a Series' index by 1 row within another index?


I have a pd.DatetimeIndex called "raw_Ix" that contains all the indices I am working with, and two pandas (time) Series ("t1" and "nextloc_ixS"), both with the same time index. The values of "nextloc_ixS" are the indices of t1.index / nextloc_ixS.index shifted by 1 within raw_Ix. To better understand what "nextloc_ixS" is:
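The original illustration did not survive, so here is a minimal stand-in example (with made-up dates) of how such a series might relate to raw_Ix; the shift happens within raw_Ix, not within t1 itself:

```python
import pandas as pd

# Hypothetical data: raw_Ix holds every timestamp; t1's index is a subset of it.
raw_Ix = pd.DatetimeIndex(pd.date_range("2021-01-01", periods=6, freq="D"))
t1 = pd.Series([10, 20, 30], index=raw_Ix[[0, 2, 4]])

# nextloc_ixS carries, for each row of t1, the timestamp that comes one
# position later in raw_Ix (not one position later in t1 itself).
positions = raw_Ix.get_indexer(t1.index)
nextloc_ixS = pd.Series(raw_Ix[positions + 1], index=t1.index)
print(nextloc_ixS)
```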

All three are passed into a function, in which I need them in the following form:

  • I need to drop the rows of t1 whose t1.index is not in raw_Ix (to avoid errors, since raw_Ix may have been manipulated)
  • After that I make a deep copy of t1 (call it t1_copy), because I need the values of nextloc_ixS as the new DatetimeIndex of t1_copy. (Sounds simple, but I am struggling with it)
  • Before replacing the index, however, I probably need to save the old index of t1_copy as a column in t1_copy, for the last step (= step 5)
  • The actual function selects some indices of t1_copy in a specific process and returns "result", a pd.DatetimeIndex containing some of t1_copy's indices, with duplicates
  • I need to shift "result" back by 1, but not via np.searchsorted. (Note: "result" is still artificially shifted forward, so we can set it back by getting the index positions in t1_copy.index and then looking up the "old" indices in the backup column from step 3)
  • I know this sounds somewhat convoluted, so here is the inefficient code I wrote:

    def main(raw_Ix, t1, nextloc, nextloc_ixS=None):

        t1_copy = t1.copy(deep=True).to_frame()

        if nextloc_ixS is not None:
            # step 1: keep only the rows of t1 whose index is in raw_Ix
            t1_copy = t1_copy.loc[t1_copy.index.intersection(pd.DatetimeIndex(raw_Ix))]
            t1_copy = t1_copy[~t1_copy.index.duplicated(keep='first')]  # somehow duplicates came up, I couldn't explain why
            # step 3: back up the old index as a column before replacing it
            t1_copy["index_old"] = t1_copy.index.copy(deep=True)
            # step 2: replace the index with the values of nextloc_ixS
            temp = nextloc_ixS.loc[nextloc_ixS.index.intersection(pd.DatetimeIndex(raw_Ix))]
            temp = temp[~temp.index.duplicated(keep='first')]  # same unexplained duplicates
            t1_copy.set_index(pd.DatetimeIndex(temp.values), inplace=True)
        else:
            # in this case we only need the intersection
            t1_copy = t1_copy.loc[t1_copy.index.intersection(pd.DatetimeIndex(raw_Ix))]
            t1_copy = t1_copy[~t1_copy.index.duplicated(keep='first')]

        # func is a huge nested algorithm; what matters here is that it returns a
        # pd.DatetimeIndex of the same length as t1_copy, made of randomly chosen
        # indices of t1_copy with multiple duplicates
        result = func(t1_copy, raw_Ix)

        if nextloc:
            # step 5: map each artificially shifted timestamp back to the old index
            result_locations = t1_copy.index.get_indexer(result)
            result = pd.DatetimeIndex(t1_copy["index_old"].to_numpy()[result_locations])

        return result
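The "pseudo" part at the end (step 5) can be made concrete without np.searchsorted by using Index.get_indexer on the shifted index and then looking up the backup column; a minimal sketch with made-up timestamps, assuming t1_copy's shifted index is unique:

```python
import pandas as pd

# Sketch with the question's (assumed) names: t1_copy carries the shifted
# index, and "index_old" holds the original index backed up before set_index().
t1_copy = pd.DataFrame(
    {"index_old": pd.to_datetime(["2021-01-01", "2021-01-03"])},
    index=pd.to_datetime(["2021-01-02", "2021-01-04"]),  # artificially shifted index
)
result = pd.DatetimeIndex(["2021-01-04", "2021-01-02", "2021-01-04"])  # with duplicates

# Positions of each result entry in the shifted index, then look up the backup.
locs = t1_copy.index.get_indexer(result)
result_back = pd.DatetimeIndex(t1_copy["index_old"].to_numpy()[locs])
print(result_back)
```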
    
    
    In short: I am trying to shift the index forward and back while avoiding np.searchsorted(), using the two pd.Series instead (or better, call them columns, since they are passed separately from a DataFrame).


    Is there a way to do this efficiently, both in lines of code and in runtime? (it currently takes very many lines)

    Your logic is complex because it has to achieve two things:

  • remove the rows that are not in the list. I use a trick for this so that
    dropna()
    can do the removal
  • shift()
    a column

    This performs quite well: only a fraction of a second on a dataset of more than 0.5m rows.

    import time
    import random
    import datetime as dt
    import numpy as np
    import pandas as pd

    # sample times with random gaps, so some sample times are missing
    d = [d for d in pd.date_range(dt.datetime(2015, 5, 1, 2),
                                  dt.datetime(2020, 5, 1, 4), freq="128s")
         if random.randint(0, 3) < 2]

    # randomly manipulate rawIdx so there are some rows where ts is not in rawIdx
    df = pd.DataFrame({"ts": d,
                       "rawIdx": [x if random.randint(0, 3) <= 2
                                  else x + pd.Timedelta(1, unit="s") for x in d],
                       "val": [random.randint(0, 50) for x in d]}).set_index("ts")

    start = time.time()
    print(f"size before: {len(df)}")
    dfc = df.assign(
        # make it float64 so it can hold NaN; map False to NaN so dropna() can
        # remove the rows that are not in rawIdx
        issue=lambda dfa: np.array(np.where(dfa.index.isin(dfa["rawIdx"]), True, np.nan),
                                   dtype="float64"),
    ).dropna().drop(columns="issue").assign(
        # this is then just a straightforward shift; rawIdx equals the index
        # in the surviving rows thanks to dropna()
        nextloc_ixS=df.rawIdx.shift(-1)
    )

    print(f"size after: {len(dfc)}\ntime: {time.time()-start:.2f}s\n\n{dfc.head().to_string()}")
    
    size before: 616264
    size after: 462207
    time: 0.13s
    
                                     rawIdx  val         nextloc_ixS
    ts                                                              
    2015-05-01 02:02:08 2015-05-01 02:02:08   33 2015-05-01 02:06:24
    2015-05-01 02:06:24 2015-05-01 02:06:24   40 2015-05-01 02:08:33
    2015-05-01 02:10:40 2015-05-01 02:10:40   15 2015-05-01 02:12:48
    2015-05-01 02:12:48 2015-05-01 02:12:48   45 2015-05-01 02:17:04
    2015-05-01 02:17:04 2015-05-01 02:17:04   14 2015-05-01 02:21:21
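To cover the question's final step (shifting "result" back), the shift that dfc records can be inverted with a reverse-lookup Series; a small sketch with made-up data, assuming the shifted timestamps are unique:

```python
import pandas as pd

# Hypothetical frame shaped like dfc: index ts, column nextloc_ixS holding the
# next timestamp (NaT in the last row, as produced by shift(-1)).
dfc = pd.DataFrame(
    {"nextloc_ixS": pd.to_datetime(["2021-01-02", "2021-01-03", pd.NaT])},
    index=pd.to_datetime(["2021-01-01", "2021-01-02", "2021-01-03"]),
)

# Reverse lookup: shifted timestamp -> original timestamp (assumes uniqueness).
back = pd.Series(dfc.index, index=dfc["nextloc_ixS"])
back = back[back.index.notna()]

shifted = pd.DatetimeIndex(["2021-01-03", "2021-01-02"])  # e.g. what func() returned
original = pd.DatetimeIndex(back.loc[shifted])
print(original)
```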