Python: select DataFrame rows using a term found in a list stored in the same row

python pandas search

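I have a DataFrame with two columns. One column holds the name of a term, and the second holds a list of terms associated with that name. It generally looks like this: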

Name      Terms
Jupiter    [5,planet, big,]
June       [month,6,hot]
Neptune    [blue, planet,big]
Seventeen  [17, number,teen]
Whale      [animal, big, swim]

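What I want to do is look up names in the first column by searching the second column. For example, if I search for the term 'planet', I want to get back a list containing Jupiter and Neptune, or the subset of the DataFrame with those two rows. How can I do this in Python?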
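
For anyone who wants to reproduce this, the frame above can be built roughly as follows. This is only a sketch: it assumes every entry in Terms is a Python list of strings (so 5, 6 and 17 are stored as '5', '6' and '17'), which the table does not confirm.

```python
import pandas as pd

# Sample data reconstructed from the table in the question.
# Assumption: each Terms entry is a list of strings.
df = pd.DataFrame({
    "Name": ["Jupiter", "June", "Neptune", "Seventeen", "Whale"],
    "Terms": [
        ["5", "planet", "big"],
        ["month", "6", "hot"],
        ["blue", "planet", "big"],
        ["17", "number", "teen"],
        ["animal", "big", "swim"],
    ],
})

print(df)
```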

You can use explode (available in pandas 0.25 and later):

df.loc[df.explode('Terms').query('Terms == "planet"').index]

Output:

      Name                Terms
0  Jupiter     [5, planet, big]
2  Neptune  [blue, planet, big]
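
To see why this works, it can help to split the one-liner into steps. The sketch below uses a three-row version of the sample data (an assumption for brevity) and adds a unique() call, which is not in the one-liner above; it is there because a term that appears twice in the same list would otherwise make that row come back twice.

```python
import pandas as pd

# Shortened sample data; assumes each Terms entry is a list of strings.
df = pd.DataFrame({
    "Name": ["Jupiter", "June", "Neptune"],
    "Terms": [
        ["5", "planet", "big"],
        ["month", "6", "hot"],
        ["blue", "planet", "big"],
    ],
})

# explode() gives every list element its own row and repeats
# the original index label for each of them.
exploded = df.explode("Terms")

# Index labels of the exploded rows whose term matches.
hits = exploded.loc[exploded["Terms"] == "planet"].index

# unique() drops repeated labels before selecting from the original frame.
result = df.loc[hits.unique()]
print(result)
```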

Or use a nested list comprehension:

df.loc[[any(n == 'planet' for n in i) for i in df['Terms']]]

Output:

      Name                Terms
0  Jupiter     [5, planet, big]
2  Neptune  [blue, planet, big]
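
The question also asks for a plain list of names. One way to get that from the same boolean mask is sketched below; names_with_term is a hypothetical helper written for this example, not a pandas function, and the data again assumes lists of strings.

```python
import pandas as pd

def names_with_term(frame, term, names_col="Name", terms_col="Terms"):
    """Return the names whose term list contains `term` (hypothetical helper)."""
    # One boolean per row: True when the row's list contains the term.
    mask = [term in terms for terms in frame[terms_col]]
    return frame.loc[mask, names_col].tolist()

df = pd.DataFrame({
    "Name": ["Jupiter", "June", "Neptune", "Seventeen", "Whale"],
    "Terms": [
        ["5", "planet", "big"],
        ["month", "6", "hot"],
        ["blue", "planet", "big"],
        ["17", "number", "teen"],
        ["animal", "big", "swim"],
    ],
})

print(names_with_term(df, "planet"))  # ['Jupiter', 'Neptune']
```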

Timings:

%timeit df.loc[df.explode('Terms').query('Terms == "planet"').index]
7.07 ms ± 95 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%timeit df[df['Terms'].apply(lambda x : True if "Planet" in x else False)]
861 µs ± 43 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%timeit df.loc[[any(n == 'planet' for n in i) for i in df['Terms']]]
674 µs ± 33.2 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
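
%timeit only exists in IPython/Jupyter. If you want to repeat the comparison in a plain script, something like the following sketch with the standard-library timeit module should work. The sample frame and the repeat counts are assumptions, and the numbers will differ by machine and pandas version.

```python
import timeit
import pandas as pd

# Sample data; assumes each Terms entry is a list of strings.
df = pd.DataFrame({
    "Name": ["Jupiter", "June", "Neptune", "Seventeen", "Whale"],
    "Terms": [
        ["5", "planet", "big"],
        ["month", "6", "hot"],
        ["blue", "planet", "big"],
        ["17", "number", "teen"],
        ["animal", "big", "swim"],
    ],
})

# The three approaches from this thread, wrapped as zero-argument callables.
candidates = {
    "explode + query": lambda: df.loc[df.explode("Terms").query('Terms == "planet"').index],
    "apply": lambda: df[df["Terms"].apply(lambda x: "planet" in x)],
    "list comprehension": lambda: df.loc[[any(n == "planet" for n in i) for i in df["Terms"]]],
}

timings = {}
for label, func in candidates.items():
    # Best of 3 repeats of 20 calls each, converted to microseconds per call.
    best = min(timeit.repeat(func, number=20, repeat=3)) / 20
    timings[label] = best * 1e6
    print(f"{label}: {timings[label]:.0f} µs per call")
```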

Try this:

df[df['Terms'].apply(lambda x: True if "planet" in x else False)]
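
A small variation on this, as a sketch: the expression "planet" in x already evaluates to True or False, so the conditional can be dropped. The isinstance check is an extra assumption on my part, in case some rows hold a missing value instead of a list; without it, a None or NaN entry raises a TypeError.

```python
import pandas as pd

# Sample data with one missing entry (hypothetical "Unknown" row) to show the guard.
df = pd.DataFrame({
    "Name": ["Jupiter", "June", "Neptune", "Unknown"],
    "Terms": [
        ["5", "planet", "big"],
        ["month", "6", "hot"],
        ["blue", "planet", "big"],
        None,
    ],
})

# True only for rows that hold a list containing the term.
mask = df["Terms"].apply(lambda x: isinstance(x, list) and "planet" in x)

result = df[mask]
print(result)
```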

Yes, that worked. Thank you very much! This helped a lot!