Python: How to find all Y elements that occur earlier in time than the corresponding X element (pandas)


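I am using pandas to try to find all the Y elements that occur earlier in time than the corresponding X element, i.e. every row whose Y value shows up in column X only at a later time.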

import pandas as pd

df = {'time':[1,2,3,4,5,6,7,8], 'X':['x','w','r','a','k','y','u','xa'],'Y':['r','xa','a','x','w','u','k','y']}

df = pd.DataFrame.from_dict(df)

   time   X   Y
0     1   x   r
1     2   w  xa
2     3   r   a
3     4   a   x
4     5   k   w
5     6   y   u
6     7   u   k
7     8  xa   y

What I would like to get is:

   time  X   Y
0     1  x   r
1     2  w  xa
2     3  r   a
5     6  y   u

Any ideas?

Try this:

result = df[df.apply(lambda row: row['Y'] in df.loc[row.time:,'X'].values, axis=1)]

print(result)

   time  X   Y
0     1  x   r
1     2  w  xa
2     3  r   a
5     6  y   u
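
Note that `df.loc[row.time:, 'X']` slices by index label, so it gives the right rows here because every `time` value is exactly one more than the row's index label. Below is a minimal sketch of a variant that compares the `time` column directly and so does not depend on the index. The helper name `appears_later_in_x` is made up for illustration and is not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    'time': [1, 2, 3, 4, 5, 6, 7, 8],
    'X': ['x', 'w', 'r', 'a', 'k', 'y', 'u', 'xa'],
    'Y': ['r', 'xa', 'a', 'x', 'w', 'u', 'k', 'y'],
})

def appears_later_in_x(row):
    # X values taken only from rows whose time is strictly later than this row's time
    later_x = df.loc[df['time'] > row['time'], 'X']
    return row['Y'] in later_x.values

# Keep the rows whose Y element shows up in X at a later time
result = df[df.apply(appears_later_in_x, axis=1)]
print(result)
```

This is still a row-wise `apply`, so it is no faster than the one-liner above; it only removes the assumption about the index.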

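You can build two dictionaries that record the time at which each element appears, one for column X and one for column Y. Then use `Series.map` to get a boolean mask, and use that mask to filter the DataFrame with boolean indexing.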
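
Written out as a standalone script, the dictionary approach might look like the sketch below. It uses the same logic as the timed snippet in this answer; the names `x_time` and `y_time` are chosen here for readability, and the sketch assumes that every Y value also occurs somewhere in X (otherwise the lookup raises a `KeyError`).

```python
import pandas as pd

df = pd.DataFrame({
    'time': [1, 2, 3, 4, 5, 6, 7, 8],
    'X': ['x', 'w', 'r', 'a', 'k', 'y', 'u', 'xa'],
    'Y': ['r', 'xa', 'a', 'x', 'w', 'u', 'k', 'y'],
})

# element -> time at which it appears in each column
x_time = dict(zip(df['X'], df['time']))
y_time = dict(zip(df['Y'], df['time']))

# True where the element appears in Y earlier than it appears in X
mask = df['Y'].map(lambda k: x_time[k] > y_time[k])

result = df[mask]
print(result)
```

If an element is repeated within a column, `dict(zip(...))` keeps only its last occurrence, so this sketch fits data like the example, where each element appears once per column.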

Using `df.apply` over axis 1 is not recommended; it should be your last resort.

Here is a timeit analysis that supports that claim:

In [74]: %%timeit
    ...: df[df.apply(lambda row: row['Y'] in df.loc[row.time:,'X'].values, axis=1)]
    ...:
    ...:
2.26 ms ± 203 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [80]: %%timeit
    ...: idx = dict(zip(df['X'],df['time']))
    ...: idx2 = dict(zip(df['Y'],df['time']))
    ...: mask = df['Y'].map(lambda k: idx[k]>idx2[k])
    ...: x = df[mask]
    ...:
    ...:
498 µs ± 30.1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

That is roughly 4.5 times faster (2.26 ms versus 498 µs).
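
The lookup can also be written without a Python-level lambda by mapping Y through a Series indexed by X. This is a sketch of an alternative, not something from the original answer, and it was not part of the timings above. It assumes the values in X are unique, since `Series.map` with a Series argument needs a unique index.

```python
import pandas as pd

df = pd.DataFrame({
    'time': [1, 2, 3, 4, 5, 6, 7, 8],
    'X': ['x', 'w', 'r', 'a', 'k', 'y', 'u', 'xa'],
    'Y': ['r', 'xa', 'a', 'x', 'w', 'u', 'k', 'y'],
})

# Series mapping each X element to the time at which it appears (X assumed unique)
x_time = df.set_index('X')['time']

# Time at which each row's Y element appears in X; NaN if it never does
y_in_x_time = df['Y'].map(x_time)

# Keep rows whose Y element appears in X strictly later than the row's own time.
# Comparisons against NaN are False, so unmatched Y values are dropped.
result = df[y_in_x_time > df['time']]
print(result)
```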

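6 7 u k
Why isn't this row included?

Because the k in column X appears before the k in column Y.

I understand the question now; it makes sense. Thanks for the clarification. Good question, +1.

Not to say the answer is bad, but `df.apply` over axis 1 should only be used as a last resort, because it is very inefficient.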