Python: removing the text before and after certain characters

I'm not sure I have a good title, so if anyone has a suggestion, I'm open to it.

Suppose I have the following scenario:

Search for "where"

Input:

<Dave likes cake.> <Dave goes to school.> <Where is dave today, after school?/><I do not know where dave is>
<Cindy reads a book><Where is my shoe asked cindy.><Cindy likes bacon.><Cindy goes to the park.><where did cindy go?>
<Sally drinks wine.><The lake is where I am from commented Sally><Cindy watches day time television while watching the kids.><Cindy makes great sandwiches><where is the sandwich cindy made?>
Edit #4: returning all matches

The user who provided my solution mentioned using findall instead of extract to return all of the matches.

This issue is now 100% solved.
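A minimal sketch of the findall approach mentioned in the edit, assuming the three input lines are stored in a DataFrame column named text (the DataFrame construction is an assumption; the pattern is the one from the str.extract answer with the capturing group dropped, so that findall returns whole matches):

```python
import pandas as pd

# The three input lines from the question, one per row.
rows = [
    "<Dave likes cake.> <Dave goes to school.> <Where is dave today, after school?/><I do not know where dave is>",
    "<Cindy reads a book><Where is my shoe asked cindy.><Cindy likes bacon.><Cindy goes to the park.><where did cindy go?>",
    "<Sally drinks wine.><The lake is where I am from commented Sally><Cindy watches day time television while watching the kids.><Cindy makes great sandwiches><where is the sandwich cindy made?>",
]
df = pd.DataFrame({"text": rows})

# Case-insensitive: a '<', then 'where' somewhere before the next '>'.
pattern = r"(?i)<[^<]*?where[^>]*?>"

# str.findall returns a list of every match per row;
# str.extract would return only the first match per row.
all_matches = df["text"].str.findall(pattern)
print(all_matches.tolist())
```

Each row yields a list, so a row with no match gives an empty list rather than NaN.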

One possible solution:

import re

a ='<Dave likes cake.> <Dave goes to school.> <Where is dave today, after school?/>'
b ='<Cindy reads a book><Where is my shoe asked cindy.><Cindy likes bacon.><Cindy goes to the park.>'
def find_where(text):
    # Split the line into segments at each '<'
    segments = text.split('<')
    # Keep the segments that contain "Where" or "where"
    pattern = re.compile(r".*[Ww]here")
    matches = list(filter(pattern.match, segments))
    # Put back the '<' that split() removed and return the first match
    return ['<' + s for s in matches][0]
Calling find_where(a) returns:

'<Where is dave today, after school?/>'
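The function above returns only the first matching segment. A variant that returns every segment containing "where" could look like the sketch below. It uses a different technique (extract each `<...>` segment with re.findall, then filter case-insensitively), and the name find_all_where is hypothetical:

```python
import re

def find_all_where(text):
    """Return every <...> segment of text that contains 'where' (any case)."""
    # A segment is '<', then any run of characters other than '>', then '>'.
    segments = re.findall(r"<[^>]*>", text)
    # Case-insensitive substring check on each segment.
    return [s for s in segments if "where" in s.lower()]

a = ("<Dave likes cake.> <Dave goes to school.> "
     "<Where is dave today, after school?/><I do not know where dave is>")
print(find_all_where(a))
# ['<Where is dave today, after school?/>', '<I do not know where dave is>']
```

If nothing matches, this returns an empty list instead of raising an IndexError.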

Use str.extract:

df.text.str.extract(r'(?i)(<[^<]*?where[^>]*?>)')

                                               0
0          <Where is dave today, after school?/>
1                <Where is my shoe asked cindy.>
2  <The lake is where I am from commented Sally>
Explanation of the regular expression:

(?i)                        # Case-insensitive matching
(                           # Start of capturing group
  <                         # matches the < character
  [^<]                      # matches anything that's *not* <
  *?                        # zero or more times, as few as possible (lazy)
  where                     # matches the substring where
  [^>]                      # matches anything that's *not* >
  *?                        # zero or more times, as few as possible (lazy)
  >                         # matches >
)                           # End of capturing group
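The commented breakdown above can be written directly as a verbose pattern with the standard re module. This is only a sketch: the name WHERE_SEGMENT and the sample line are illustrative, and it shows the plain-Python equivalent rather than the pandas call.

```python
import re

# re.VERBOSE ignores whitespace in the pattern and allows '#' comments;
# re.IGNORECASE plays the role of the inline (?i) flag.
WHERE_SEGMENT = re.compile(
    r"""
    (            # start of capturing group
      <          # a literal '<'
      [^<]*?     # any characters except '<', as few as possible
      where      # the substring 'where'
      [^>]*?     # any characters except '>', as few as possible
      >          # a literal '>'
    )            # end of capturing group
    """,
    re.IGNORECASE | re.VERBOSE,
)

line = "<Cindy reads a book><Where is my shoe asked cindy.><Cindy likes bacon.>"
match = WHERE_SEGMENT.search(line)
print(match.group(1))  # <Where is my shoe asked cindy.>
```

search() stops at the first match, which mirrors what str.extract does for each row.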

Curious what this has to do with pandas?

I edited my post to add context.

Rather than concatenating them into one line, why not store each line in a column of a DataFrame and then use df['my Lines'].str.contains('where')

I edited my post to add more context. Does that make sense?

Thanks for your help, I'll start trying this now. I'll update once I've finished experimenting.

I decided I want to use the other answer because it's simpler. Thanks for trying, I upvoted your post.

I created this DataFrame to test: df2=pd.dataframe(['],columns=['A']) and then I typed: df2.A.str.extract(r'(?I)(')) The error message returned was re.error: nothing to repeat at position 22

When I use that DataFrame, I get

OK, I'll keep testing.

This will only find the first occurrence. Use findall to find all occurrences.

This is perfect!