Python: how to find two consecutive rows with the same value (string) in a DataFrame column and add more rows between them?


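How can I find two consecutive rows that have the same value (a string) in one column of a DataFrame and add more rows between them? The DataFrame has a time-series index.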

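For example: if two consecutive rows with the same value in column A have the indexes 5:30 pm and 6:00 pm, I want to add more rows between those two rows in 1-minute increments, i.e. 5:31 pm, 5:32 pm, ..., 5:59 pm.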

Here is one approach:

import pandas as pd
import numpy as np

# say this is your df:
df = pd.DataFrame(index=pd.date_range(periods=6, 
                                      start='12:00', end='12:30'))
df['A'] = [1,1,2,3,3,4]
print(df)

#                         A
#2019-05-09 12:00:00      1
#2019-05-09 12:06:00      1
#2019-05-09 12:12:00      2
#2019-05-09 12:18:00      3
#2019-05-09 12:24:00      3
#2019-05-09 12:30:00      4

# find positions with same value
ends_idx = np.arange(df.shape[0])[
    (df['A'].diff() == 0).values]

print(ends_idx)
# [1 4]

# create index with additional time stamps
old_index = df.index
# np.unique already returns sorted values, so no extra sort is needed
new_index = pd.DatetimeIndex(np.unique(np.concatenate([
    pd.date_range(start=old_index[i-1], 
                  end=old_index[i], freq='min').values
    for i in ends_idx
] + [old_index.values])))

# create a new dataframe
new_df = pd.DataFrame(index=new_index)

# assign a default value
new_df['A'] = np.nan

# assign values from old dataframe
new_df.loc[old_index, 'A'] = df['A']
print(new_df)

#                       A
#2019-05-09 12:00:00  1.0
#2019-05-09 12:01:00  NaN
#2019-05-09 12:02:00  NaN
#2019-05-09 12:03:00  NaN
#2019-05-09 12:04:00  NaN
#2019-05-09 12:05:00  NaN
#2019-05-09 12:06:00  1.0
#2019-05-09 12:12:00  2.0
#2019-05-09 12:18:00  3.0
#2019-05-09 12:19:00  NaN
#2019-05-09 12:20:00  NaN
#2019-05-09 12:21:00  NaN
#2019-05-09 12:22:00  NaN
#2019-05-09 12:23:00  NaN
#2019-05-09 12:24:00  3.0
#2019-05-09 12:30:00  4.0
Edit: for string values in A, you can replace the part that finds the positions with:

# find positions with same value
n = df.shape[0]
# place holders:
ends_idx = np.arange(n) 
same = np.array([False] * n)
# compare values explicitly
same[1:] = df['A'][1:].values == df['A'][:-1].values 
ends_idx = ends_idx[same]
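
The steps above can be wrapped into one helper that works for string columns as well. This is only a minimal sketch: the function name `insert_rows_between_repeats` and the sample values ("on", "off", "idle") are made up for illustration, and it assumes the DataFrame has a unique, sorted DatetimeIndex.

```python
import numpy as np
import pandas as pd


def insert_rows_between_repeats(df, column, freq="min"):
    """Hypothetical helper: add empty rows at `freq` steps between
    consecutive rows whose `column` values are equal."""
    vals = df[column].to_numpy()

    # True where a row holds the same value as the row before it.
    # Comparing the raw arrays works for strings as well as numbers.
    same_as_prev = np.zeros(len(vals), dtype=bool)
    same_as_prev[1:] = vals[1:] == vals[:-1]
    ends_idx = np.flatnonzero(same_as_prev)

    # Merge the extra timestamps for each matching pair into the index.
    new_index = df.index
    for i in ends_idx:
        new_index = new_index.union(
            pd.date_range(df.index[i - 1], df.index[i], freq=freq))

    # reindex keeps the original rows and leaves the new rows as NaN.
    return df.reindex(new_index)


# Illustrative data only: same layout as the example, with strings in A.
df = pd.DataFrame(
    {"A": ["on", "on", "off", "idle", "idle", "on"]},
    index=pd.date_range("2019-05-09 12:00", periods=6, freq="6min"),
)
new_df = insert_rows_between_repeats(df, "A")
print(new_df)
```

If the inserted rows should carry the repeated value instead of NaN, `new_df["A"].ffill()` would fill them, since every new row sits between two rows with equal values.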
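
Can you show us some sample data?

Thanks! But I can't use df.diff for strings. Any idea how to check for consecutive strings?

Hi, I was able to adapt your original snippet and do the check for strings. Here is what I did:

mylist = (df.status != df.status.shift()).values
mylist = np.asarray([not i for i in mylist])
ends_idx = np.arange(df.shape[0])[mylist]

Another approach is to use DataFrame.asfreq:
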
df = pd.DataFrame({'A':[1,1,2,3,3,4]}, index=pd.date_range(periods=6, 
                                        start='12:00', end='12:30'))
print(df)
                     A
2019-05-09 12:00:00  1
2019-05-09 12:06:00  1
2019-05-09 12:12:00  2
2019-05-09 12:18:00  3
2019-05-09 12:24:00  3
2019-05-09 12:30:00  4
df = df.asfreq('min')
print(df)
                       A
2019-05-09 12:00:00  1.0
2019-05-09 12:01:00  NaN
2019-05-09 12:02:00  NaN
2019-05-09 12:03:00  NaN
2019-05-09 12:04:00  NaN
2019-05-09 12:05:00  NaN
2019-05-09 12:06:00  1.0
2019-05-09 12:07:00  NaN
2019-05-09 12:08:00  NaN
2019-05-09 12:09:00  NaN
2019-05-09 12:10:00  NaN
2019-05-09 12:11:00  NaN
2019-05-09 12:12:00  2.0
2019-05-09 12:13:00  NaN
2019-05-09 12:14:00  NaN
2019-05-09 12:15:00  NaN
2019-05-09 12:16:00  NaN
2019-05-09 12:17:00  NaN
2019-05-09 12:18:00  3.0
2019-05-09 12:19:00  NaN
2019-05-09 12:20:00  NaN
2019-05-09 12:21:00  NaN
2019-05-09 12:22:00  NaN
2019-05-09 12:23:00  NaN
2019-05-09 12:24:00  3.0
2019-05-09 12:25:00  NaN
2019-05-09 12:26:00  NaN
2019-05-09 12:27:00  NaN
2019-05-09 12:28:00  NaN
2019-05-09 12:29:00  NaN
2019-05-09 12:30:00  4.0
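
Note that asfreq fills every gap, including the ones between rows whose values differ (12:07 to 12:11 in the output above). If only the gaps between equal consecutive values should be filled, one possible follow-up is to drop the other new rows again. The sketch below is an assumption on my part, not part of the answer above: it uses made-up string values and assumes the original column contains no missing values of its own.

```python
import pandas as pd

# Illustrative data only: strings in A, one row every 6 minutes.
df = pd.DataFrame(
    {"A": ["on", "on", "off", "idle", "idle", "on"]},
    index=pd.date_range("2019-05-09 12:00", periods=6, freq="6min"),
)

dense = df.asfreq("min")        # one row per minute, NaN in the new rows
prev_val = dense["A"].ffill()   # nearest original value before each row
next_val = dense["A"].bfill()   # nearest original value after each row

# Keep the original rows, plus new rows whose neighbours on both sides agree.
keep = dense["A"].notna() | (prev_val == next_val)
result = dense[keep]
print(result)
```

This trades a little extra memory (the full per-minute frame is built first) for not having to loop over the matching pairs.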