Python: how to find sentences in a DataFrame that contain a single-character word at any position

python pandas dataframe

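I am trying to print the sentences in a DataFrame that contain a one-character word, whether it appears at the start, in the middle, or at the end of the sentence. The code I tried is: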

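import re
import pandas as pd

# Attempted pattern: a bare character class with no word boundaries
lookfor = '[' + re.escape("A-Za-z") + ']'

# Skip the first row if it is a one-word header rather than a sentence
tdata = pd.read_csv(fileinput, nrows=0).columns[0]
skip = int(tdata.count(' ') == 0)
tdata = pd.read_csv(fileinput, names=['sentences'], skiprows=skip)

filtered = tdata[tdata.sentences.str.contains(lookfor, regex=True, na=False)]
print(filtered)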

#a sample set
-----------------------------

#hi, how are; you z
#im  w good thanks
#How  am I
#good, what about  you
#my name is alex
#K hello, alex how are you !
#it  is a car
#great news
#thanks!
-----------------------------

expected output 

-----------------------------
#hi, how are; you z
#im  w good thanks
#How  am I
#K hello, alex how are you !
#it  is a car
-----------------------------

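Even when I write out all the letters in the lookfor pattern, it does not work: it prints every sentence that contains those letters anywhere, not only the sentences where a letter stands alone. Any ideas?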

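Use str.contains with a pattern for a single word character between word boundaries, \b\w{1}\b, and filter with boolean indexing.

EDIT: To exclude A and I, you can use str.replace to remove them before the comparison:

df = df[df['sentences'].str.replace(r'\b[AI]\b', '', regex=True).str.contains(r'\b\w{1}\b')]
print(df)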
                     sentences
0           hi, how are; you z
1            im  w good thanks
5  K hello, alex how are you !
6                 it  is a car
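
For reference, here is a minimal self-contained sketch of the word-boundary approach described above. It rebuilds the question's sample sentences in memory instead of reading a file, and the variable names (has_single, without_ai, and so on) are illustrative assumptions rather than anything from the original answer.

```python
import pandas as pd

# Sample sentences copied from the question
df = pd.DataFrame({
    "sentences": [
        "hi, how are; you z",
        "im  w good thanks",
        "How  am I",
        "good, what about  you",
        "my name is alex",
        "K hello, alex how are you !",
        "it  is a car",
        "great news",
        "thanks!",
    ]
})

# \b\w\b: exactly one word character with a word boundary on each side
has_single = df["sentences"].str.contains(r"\b\w\b", regex=True, na=False)
single_char_rows = df[has_single]

# Same check after blanking out standalone "A" and "I".
# regex=True is passed explicitly because the default of str.replace
# differs between pandas versions.
without_ai = df["sentences"].str.replace(r"\b[AI]\b", "", regex=True)
excluded_rows = df[without_ai.str.contains(r"\b\w\b", regex=True, na=False)]

print(single_char_rows.index.tolist())
print(excluded_rows.index.tolist())
```

On this sample the first filter should keep rows 0, 1, 2, 5 and 6 (the question's expected output), and the second should drop row 2 because its only single-character word is "I".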

Try:

df.loc[df.sentences.str.contains(r"([^\w]|^)\w([^\w]|$)")]

Output:

                     sentences
0           hi, how are; you z
1            im  w good thanks
2                    How  am I
5  K hello, alex how are you !
6                 it  is a car
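
As a rough check of this second pattern, the sketch below runs it on the question's sample sentences. Two things here are assumptions and not part of the answer: the groups are written as non-capturing (?:...) because recent pandas versions may warn when a str.contains pattern has capturing groups, and a lookaround form is added only for comparison.

```python
import pandas as pd

# Sample sentences copied from the question
sentences = pd.Series([
    "hi, how are; you z",
    "im  w good thanks",
    "How  am I",
    "good, what about  you",
    "my name is alex",
    "K hello, alex how are you !",
    "it  is a car",
    "great news",
    "thanks!",
])

# The answer's pattern: a word character preceded by a non-word character
# or the start of the string, and followed by a non-word character or the end
explicit = r"(?:[^\w]|^)\w(?:[^\w]|$)"

# Lookaround form of the same idea: no word character directly before or after
lookaround = r"(?<!\w)\w(?!\w)"

mask_explicit = sentences.str.contains(explicit, regex=True)
mask_lookaround = sentences.str.contains(lookaround, regex=True)

print(sentences[mask_explicit])
print(mask_explicit.equals(mask_lookaround))
```

Both masks should select the same rows as \b\w\b on this data, so the choice between them is mostly a matter of readability.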

Comments (first answer):

Can I do something to exclude letters like A and I?

@programmingfreak - One idea is to replace them with an empty string before the comparison; the answer has been edited.

This answer works great for English text. Do you know how to apply it to Arabic script, so that it matches whenever a single character appears on its own in Arabic, such as بذ

@programmingfreak - Hmm, I think it is best to create a separate question for Arabic matching; I have never worked with Arabic, so no idea.

Comments (second answer):

Can I do something to exclude letters like A and I?

That would be something like:

r"([^\w]|^)[bcdefghj…bcdefghj…]([^\w]|$)"

Just put in the letters you want to keep.
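
The idea of listing only the letters to keep can be sketched by building the character class programmatically. This is an illustration under assumptions: the allowed set is taken to be every ASCII letter except "A" and "I", and the names excluded, allowed and pattern are hypothetical.

```python
import string
import pandas as pd

# Sample sentences copied from the question
sentences = pd.Series([
    "hi, how are; you z",
    "im  w good thanks",
    "How  am I",
    "good, what about  you",
    "my name is alex",
    "K hello, alex how are you !",
    "it  is a car",
    "great news",
    "thanks!",
])

# Letters that count as a standalone single-character word:
# every ASCII letter except "A" and "I" (an illustrative choice)
excluded = set("AI")
allowed = "".join(ch for ch in string.ascii_letters if ch not in excluded)

# Character class built from the allowed letters, wrapped in word boundaries
pattern = r"\b[" + allowed + r"]\b"

kept = sentences[sentences.str.contains(pattern, regex=True)]
print(kept)
```

With this pattern a sentence is still kept when it contains a standalone "I" alongside another allowed single letter, since only the allowed letters are searched for.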

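Or, for the exclusion of A and I in the first answer, combine two conditions. Note that this drops every sentence that contains a standalone A or I:

df = df[~df['sentences'].str.contains(r'\b[AI]\b') &
         df['sentences'].str.contains(r'\b\w{1}\b')]
print(df)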
                     sentences
0           hi, how are; you z
1            im  w good thanks
5  K hello, alex how are you !
6                 it  is a car