Python: Replace a string in a dataframe/column if the row contains only one word
I have fairly messy data, and I am trying to replace rows that may contain only one word or string with an empty string. Here is the original data:
import pandas as pd

df = pd.DataFrame({'some_text': [
'I enjoy read Mark Twain\'s Books',
'Library is very useful',
'/',
'\\',
'/ /',
'',
'I enjoy read Mark Twain\'s Books',
'an',
'the',
'Books are interesting'
]})
I tried the following, but it drops the rows. I don't want to drop them, only replace their contents:
count = df['some_text'].str.split().str.len()
df[~(count==1)]
Desired final output:
I enjoy read Mark Twain's Books
Library is very useful
/ /
I enjoy read Mark Twain's Books
Books are interesting
You can apply a transformation to the column without using a mask:
df['replaced_text'] = df['some_text'].apply(lambda x: '' if len(x.strip().split()) == 1 else x)
print(df.to_string())
Output:
                         some_text                    replaced_text
0  I enjoy read Mark Twain's Books  I enjoy read Mark Twain's Books
1           Library is very useful           Library is very useful
2                                /
3                                \
4                              / /                              / /
5
6  I enjoy read Mark Twain's Books  I enjoy read Mark Twain's Books
7                               an
8                              the
9            Books are interesting            Books are interesting
This is very similar to what you tried: the lambda function checks each string and, if it consists of exactly one whitespace-separated token once surrounding whitespace is stripped, replaces it with ''.
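One caveat with the lambda approach: it calls `.strip()` on every value, so it raises an `AttributeError` if the column contains missing values such as `None` or `NaN`. A minimal sketch of a more defensive variant follows; the helper name `blank_single_word` and the sample data are assumptions made for illustration, not part of the original answer.

```python
import pandas as pd


def blank_single_word(value):
    """Return '' for a string with exactly one whitespace-separated token."""
    if not isinstance(value, str):
        # Leave None/NaN and other non-string values untouched
        return value
    # str.split() with no arguments ignores leading/trailing whitespace
    return '' if len(value.split()) == 1 else value


# Hypothetical sample data, including a missing value
s = pd.Series(['Library is very useful', '/', '/ /', '', ' an ', None])
result = s.apply(blank_single_word)
print(result.tolist())
```

Because `str.split()` without arguments already discards surrounding whitespace, the explicit `.strip()` call is not needed here.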
You can use a simple regular expression here:
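# regex=True is required in recent pandas versions, where str.replace treats the pattern as a literal by default
df['new_text'] = df['some_text'].str.replace(r'^\S+$', '', regex=True)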
>>> df
                         some_text                         new_text
0  I enjoy read Mark Twain's Books  I enjoy read Mark Twain's Books
1           Library is very useful           Library is very useful
2                                /
3                                \
4                              / /                              / /
5
6  I enjoy read Mark Twain's Books  I enjoy read Mark Twain's Books
7                               an
8                              the
9            Books are interesting            Books are interesting
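Building on your own approach: instead of dropping the rows, assign a new value to them:
count = df['some_text'].str.split().str.len()
# Restrict the assignment to the 'some_text' column so other columns are left alone
df.loc[count == 1, 'some_text'] = ""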
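Selecting the column explicitly matters once the DataFrame has more than one column, because assigning through a row mask alone (`df[count == 1] = ""`) overwrites every column in the matching rows. The sketch below illustrates this with `.loc`; the `id` column and the sample rows are hypothetical and only serve the example.

```python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 2, 3],  # hypothetical extra column that must stay intact
    'some_text': ['Library is very useful', 'an', '/ /'],
})

# Number of whitespace-separated tokens in each row
count = df['some_text'].str.split().str.len()

# Blank out only the text column for single-token rows
df.loc[count == 1, 'some_text'] = ''
print(df)
```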
Note that the regular expression `^\S+$` does not replace strings that contain a single word together with leading or trailing whitespace, but it can be adapted as needed.
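One possible adaptation, as a sketch rather than the original answerer's code, is to allow optional whitespace on either side of the single token with `^\s*\S+\s*$`. The sample Series here is made up for illustration.

```python
import pandas as pd

# Hypothetical sample data with padded single-word entries
s = pd.Series(['Books are interesting', ' an ', 'the', '/ /', ''])

# \s* on both sides lets the pattern match a lone token surrounded by whitespace
cleaned = s.str.replace(r'^\s*\S+\s*$', '', regex=True)
print(cleaned.tolist())
# ['Books are interesting', '', '', '/ /', '']
```

Multi-word strings and strings such as '/ /' are left unchanged because the pattern permits only one run of non-whitespace characters.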