Python: How to remove stray single letters from a text field of a DataFrame


I have the following data source:

  import pandas as pd
  df_Msg = pd.DataFrame({'Id': [1, 2, 3], 
               'Sentence': ['I like fictions', 'Thank s you', 'I need to by a new book']})

  print(df_Msg)
Output:

Id   Sentence
1    I like fictions
2    Thank s you
3    I need to by a new book
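I want to remove every letter of the alphabet that appears on its own as a separate word. In this case, I need to remove the letters I, s, and a. To do this, I used replace() as follows: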

df_Msg['Sentence'] = df_Msg['Sentence'].replace('I', '', regex=True)
df_Msg['Sentence'] = df_Msg['Sentence'].replace('s', '', regex=True)
df_Msg['Sentence'] = df_Msg['Sentence'].replace('a', '', regex=True)
Output:

Id  Sentence
1   like fiction
2   Thnk you
3   need to by new book
However, I want the output to be:

Id   Sentence
1    like fictions
2    Thank you
3    need to by new book

Thanks.

IIUC, use word boundaries:

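# regex=True is required in pandas 2.0+, where str.replace defaults to literal matching
print(df_Msg["Sentence"].str.replace(r"\b[A-Za-z]\b\s?", "", regex=True))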

0          like fictions
1              Thank you
2    need to by new book
Name: Sentence, dtype: object
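
To see how the word-boundary pattern treats whitespace, it can be tried directly with Python's `re` module. This is a minimal sketch; the last sample string is an assumption added here to show a trailing single letter, and it is not part of the question's data.

```python
import re

# Same pattern as in the answer above: one letter between word boundaries,
# plus at most one following whitespace character
pattern = re.compile(r"\b[A-Za-z]\b\s?")

samples = [
    "I like fictions",
    "Thank s you",
    "I need to by a new book",
    "foo bar b",  # assumed extra case: single letter at the end
]

cleaned = [pattern.sub("", s) for s in samples]
for before, after in zip(samples, cleaned):
    print(repr(before), "->", repr(after))
```

The three sentences from the question come out as desired, while the last sample keeps the space that preceded the removed letter, so a final `.str.strip()` may be worth adding for such data.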

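To avoid complications with the whitespace around back-to-back single-character words, split each sentence, drop the words that are 1 character long, then join the rest again: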

df_Msg = pd.DataFrame({'Id': [1, 2, 3, 4], 
                      'Sentence': ['I like fictions', 'Thank s you', 'I need to by a new book', 
                                   'a b foo b b bar baz b c']})

s = df_Msg['Sentence'].str.split(expand=True).stack()
df_Msg['sentence'] = s[s.str.len().gt(1)].groupby(level=0).agg(' '.join)
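
One detail of the split/stack/groupby approach is that a row made up only of single-letter words has no group left after filtering, so it would come back as NaN. The sketch below, which assumes a small hypothetical frame `df` and column name `cleaned`, shows one way to fill such rows with an empty string instead.

```python
import pandas as pd

# Hypothetical sample data; the last row contains only single-letter words
df = pd.DataFrame({'Sentence': ['I like fictions',
                                'a b foo b b bar baz b c',
                                'a b']})

# One word per row, indexed by (original row, word position)
s = df['Sentence'].str.split(expand=True).stack()

# Keep words longer than 1 character and re-join them per original row
kept = s[s.str.len().gt(1)].groupby(level=0).agg(' '.join)

# Rows with no surviving words are missing from `kept`; fill them with ''
df['cleaned'] = kept.reindex(df.index, fill_value='')

result = df['cleaned'].tolist()
print(result)
```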

You can do it like this:

import pandas as pd
df_Msg = pd.DataFrame({'Id': [1, 2, 3], 
                   'Sentence': ['I like fictions', 'Thank s you', 'I need to by a new book']})

for i in range(len(df_Msg)):
   words = df_Msg.loc[i, 'Sentence'].split()
   # Build a new list rather than removing items from the list being iterated
   kept = [word for word in words if len(word) > 1]
   # Assign through .loc to avoid chained assignment
   df_Msg.loc[i, 'Sentence'] = " ".join(kept)

Use the apply method on the Sentence column:

df_Msg['Sentence'] = df_Msg.Sentence.apply(lambda x :" ".join([i for i in x.split() if len(i)>1]))

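This is a simple job for a regex replacement, although the challenge is knowing precisely how the surrounding whitespace should be handled at the start, in the middle, or at the end of the string.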
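
One possible way to handle the whitespace question with a regex is to remove the lone letters first and normalize the spacing afterwards. The sketch below is only an illustration: the helper name `drop_single_letters` and the sample strings are assumptions, and the lookarounds are one choice among several. Unlike `\b`, they require whitespace (or the string edge) on both sides, so the `t` in `don't` is left alone.

```python
import re

# Hypothetical helper; not part of pandas or of the answers above
def drop_single_letters(text: str) -> str:
    # A single letter with no non-whitespace character directly before or after it
    without_letters = re.sub(r"(?<!\S)[A-Za-z](?!\S)", "", text)
    # Collapse the gaps left behind and trim both ends
    return re.sub(r"\s+", " ", without_letters).strip()

samples = [
    "I like fictions",
    "Thank s you",
    "a b foo b b bar baz b c",
    "don't stop",
]

results = [drop_single_letters(s) for s in samples]
print(results)
```

On a DataFrame, a function like this could be passed to `df_Msg['Sentence'].apply(drop_single_letters)`.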