Python: delete the text before and after certain characters

python python-3.x pandas


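I'm not sure whether I have a good title, so if anyone has a suggestion I'm open to it.

Suppose I have the following scenario:

Search for "where"

Input: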

<Dave likes cake.> <Dave goes to school.> <Where is dave today, after school?/><I do not know where dave is>
<Cindy reads a book><Where is my shoe asked cindy.><Cindy likes bacon.><Cindy goes to the park.><where did cindy go?>
<Sally drinks wine.><The lake is where I am from commented Sally><Cindy watches day time television while watching the kids.><Cindy makes great sandwiches><where is the sandwich cindy made?>

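Edit #4: returning all matches

The user who provided my solution mentioned using findall instead of extract to return all matches rather than only the first one in each row.

This problem is now 100% solved.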
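The switch from extract to findall mentioned in the edit can be sketched as follows. This is only a minimal illustration: the frame holds just the first sample row, and the column name `text` is an assumption borrowed from the `df.text` used in the answer.

```python
import pandas as pd

# Hypothetical one-row frame built from the first sample line; `text` is an assumed column name.
df = pd.DataFrame({
    "text": [
        "<Dave likes cake.> <Dave goes to school.> "
        "<Where is dave today, after school?/><I do not know where dave is>",
    ]
})

# (?i) makes the search case-insensitive; the group captures one whole <...> element.
pattern = r"(?i)(<[^<]*?where[^>]*?>)"

# str.findall returns a list with every match in each row,
# whereas str.extract keeps only the first match per row.
all_matches = df["text"].str.findall(pattern)
print(all_matches[0])
```

Each cell of the result is a Python list, so a row without any match yields an empty list rather than NaN.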

A possible solution:

import re

a = '<Dave likes cake.> <Dave goes to school.> <Where is dave today, after school?/>'
b = '<Cindy reads a book><Where is my shoe asked cindy.><Cindy likes bacon.><Cindy goes to the park.>'

def find_where(text):
    # Split on '<' so that each piece holds the contents of one element
    pieces = text.split('<')
    # Match "where" anywhere in a piece, ignoring case
    pattern = re.compile(r'.*where', re.IGNORECASE)
    matches = [piece for piece in pieces if pattern.match(piece)]
    # Restore the '<' removed by split and return the first match
    return '<' + matches[0]

Calling find_where(a) returns:

'<Where is dave today, after school?/>'
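Note that find_where raises an IndexError when no element contains "where", because it indexes the first item of an empty list. A hedged alternative using only the standard library is sketched below. The helper name `find_first_where` and the pattern `[^<>]*` are illustrative choices, not part of the original answer.

```python
import re

# Matches one whole <...> element containing "where", ignoring case.
# [^<>]* keeps the match inside a single element (an assumption about the input format).
WHERE_ELEMENT = re.compile(r"<[^<>]*where[^<>]*>", re.IGNORECASE)


def find_first_where(text):
    """Return the first <...> element containing 'where', or None if there is none."""
    match = WHERE_ELEMENT.search(text)
    return match.group(0) if match else None


print(find_first_where("<Dave goes to school.> <Where is dave today, after school?/>"))
print(find_first_where("<Dave likes cake.>"))
```

Returning None instead of raising lets the caller decide how to treat a line without a match.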


Use str.extract:

df.text.str.extract(r'(?i)(<[^<]*?where[^>]*?>)')

                                               0
0          <Where is dave today, after school?/>
1                <Where is my shoe asked cindy.>
2  <The lake is where I am from commented Sally>
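To reproduce the output above end to end, the sample input can be loaded into a DataFrame first. This is a sketch under the assumption that each input line is one row of a column named `text` (inferred from `df.text`); the question does not show how the frame was built.

```python
import pandas as pd

# The three sample lines from the question, one per row (assumed layout).
rows = [
    "<Dave likes cake.> <Dave goes to school.> "
    "<Where is dave today, after school?/><I do not know where dave is>",
    "<Cindy reads a book><Where is my shoe asked cindy.><Cindy likes bacon.>"
    "<Cindy goes to the park.><where did cindy go?>",
    "<Sally drinks wine.><The lake is where I am from commented Sally>"
    "<Cindy watches day time television while watching the kids.>"
    "<Cindy makes great sandwiches><where is the sandwich cindy made?>",
]
df = pd.DataFrame({"text": rows})

# With one capture group, str.extract returns a DataFrame whose single
# column (labelled 0) holds the first match found in each row.
first_match = df["text"].str.extract(r"(?i)(<[^<]*?where[^>]*?>)")
print(first_match)
```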


Regular expression explanation:

(?i)                        # Case insensitive matching
(                           # Start of matching group
  <                         # matches the < character
  [^<]                      # matches anything that's *not* <
  *?                        # matches zero-unlimited times
  where                     # matches the substring where
  [^>]                      # matches anything that's *not* >
  *?                        # matches zero-unlimited times
  >                         # matches >
)                           # end of matching group
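The annotated breakdown above can also be written as a runnable pattern with `re.VERBOSE`, which lets the comments live inside the regex itself. This is a sketch rather than the answerer's code: the flags are passed as arguments instead of the inline `(?i)`, and the name `WHERE_ELEMENT` is made up for illustration.

```python
import re

# re.VERBOSE ignores whitespace in the pattern and treats '#' as a comment marker.
WHERE_ELEMENT = re.compile(
    r"""
    (           # start of the capture group
      <         # a literal opening bracket
      [^<]*?    # as few characters as possible that are not an opening bracket
      where     # the word being searched for
      [^>]*?    # as few characters as possible that are not a closing bracket
      >         # a literal closing bracket
    )           # end of the capture group
    """,
    flags=re.IGNORECASE | re.VERBOSE,
)

sample = "<Sally drinks wine.><The lake is where I am from commented Sally>"
match = WHERE_ELEMENT.search(sample)
print(match.group(1))
```

The quantifier `*?` is lazy: it still allows zero or more repetitions, but it stops as early as the rest of the pattern permits.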


Curious what this has to do with pandas?

I edited my post to add context.

Rather than joining them into one line, why not store each line in a column of a DataFrame and then use df['my Lines'].str.contains('where')?

I edited my post to add more context. Does that make sense?

Thanks for the help, trying it now. I will update after I finish experimenting.

I decided to use the other answer because it is simpler. Thanks for the attempt; I upvoted your post.

I created this DataFrame to test: df2 = pd.DataFrame(['…'], columns=['A']) and then typed: df2.A.str.extract(r'(?i)(…)'). The error message returned was re.error: nothing to repeat at position 22. When I use that DataFrame, I get

OK, continuing to test. This only finds the first occurrence. Use findall to find all occurrences.

This is perfect!
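The str.contains suggestion from the comments can be sketched as below. It assumes each element has already been split into its own row; the sample rows are illustrative, and only the column name 'my Lines' comes from the comment.

```python
import pandas as pd

# Hypothetical frame with one <...> element per row.
df = pd.DataFrame({
    "my Lines": [
        "<Dave likes cake.>",
        "<Where is dave today, after school?/>",
        "<The lake is where I am from commented Sally>",
    ]
})

# case=False so that "Where" and "where" both count as hits.
mask = df["my Lines"].str.contains("where", case=False)
hits = df.loc[mask, "my Lines"].tolist()
print(hits)
```

Unlike extract or findall, str.contains only yields a boolean mask, so it selects whole rows instead of pulling the matching element out of a longer line.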