Python - Searching for an element in a list within a dataframe row

python python-3.x pandas


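I am trying to capture elements from a pandas DataFrame column whose values are lists. The code below captures the entire list whenever the string is present. How can I capture, row by row, only the elements that contain a specific string and ignore the rest?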

Here is what I tried:

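import numpy as np
import pandas as pd

l1 = [1,2,3,4,5,6]
# np.nan rather than np.NaN: the np.NaN alias was removed in NumPy 2.0
l2 = ['hello world \n my world','world is a great place \n we live in it','planet earth',np.nan,'\n save the water','']

df = pd.DataFrame(list(zip(l1,l2)),
            columns=['id','sentence'])
# Split each sentence on newlines into a list of parts (NaN stays NaN)
df['sentence_split'] = df['sentence'].str.split('\n')
print(df)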

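The following filter keeps the right rows, but it still returns the whole list for each matching row:

df[df.sentence_split.str.join(' ').str.contains('world', na=False)]  # does the trick, but still not exactly what I am looking for

Result of this code: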


id  sentence                                  sentence_split
1   hello world \n my world                   [hello world , my world]
2   world is a great place \n we live in it   [world is a great place , we live in it]

But I am looking for:

id  sentence                                  sentence_split
1   hello world \n my world                   hello world; my world
2   world is a great place \n we live in it   world is a great place

You are searching for a string inside the lists stored in a Series. One way to do it is:

# Drop NaN rows
df = df.dropna(subset=["sentence_split"])

Then apply a function that keeps only the list elements containing the string you are looking for:

# Apply this lambda function: keep only the list elements that contain "world"
df["sentence_split"] = df["sentence_split"].apply(lambda x: [i for i in x if "world" in i])

   id                                 sentence             sentence_split
0   1                  hello world \n my world  [hello world ,  my world]
1   2  world is a great place \n we live in it  [world is a great place ]
2   3                             planet earth                         []
4   5                        \n save the water                         []
5   6                                                                  []
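
The output above still contains rows with empty lists, and the matches are lists rather than the "; "-joined strings shown in the question. A minimal sketch of one way to get closer to the requested output follows. The helper name keep_matching, its keyword and sep parameters, and the decision to strip whitespace around each part are assumptions made for illustration, not part of the original answer.

```python
import numpy as np
import pandas as pd

l1 = [1, 2, 3, 4, 5, 6]
l2 = ['hello world \n my world',
      'world is a great place \n we live in it',
      'planet earth', np.nan, '\n save the water', '']

df = pd.DataFrame({'id': l1, 'sentence': l2})


def keep_matching(sentence, keyword='world', sep='; '):
    """Hypothetical helper: split on newlines and keep only parts containing keyword."""
    if not isinstance(sentence, str):
        # NaN (a float) and other non-strings have nothing to split
        return ''
    # Assumption: surrounding whitespace is not wanted in the result
    parts = [part.strip() for part in sentence.split('\n')]
    return sep.join(part for part in parts if keyword in part)


df['sentence_split'] = df['sentence'].apply(keep_matching)

# Keep only the rows where at least one part matched
result = df[df['sentence_split'] != ''].reset_index(drop=True)
print(result)
```

Because the helper handles NaN itself, no dropna call is needed first. If the lists are needed later, returning the filtered list instead of a joined string and filtering rows with .str.len() > 0 would be a reasonable variation.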