How do I remove specific words using regular expressions in Python? (Python, Regex) - Fatal编程技术网


python regex


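I am working on a machine learning model that recommends the job functions an employee could work in, based on their job title. My dataset has two categorical variables (job title, job function). The job title column contains values like these: ["Cairo Senior Sales Representative", "Spanish Technical Support Representative", "Females Only Telesales Specialist"]

I want to ignore many words, for example "only", "Cairo", "Spanish", and "Females".

How can I remove these words from my dataset before feeding the data to the algorithm? I tried putting the words into one big list and then iterating over my job title column to detect and remove them, but I think this approach will be tedious because my dataset contains many such words.

Is there a regular expression technique for detecting and removing these words?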
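If you want to remove a set of words from a given text, you can do it like this:
import re

# Words to strip from the text
biased_words = ["Spanish", "Females", "only", "Cairo"]
# \b keeps each match to a whole word; re.escape guards against regex metacharacters
pattern = r"\b(?:{})\b".format("|".join(map(re.escape, biased_words)))
source_str = "...."  # your source string
compiled_pattern = re.compile(pattern, re.I)  # re.I makes the match case-insensitive
cleaned_str = compiled_pattern.sub("", source_str)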
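The same pattern can be applied to a whole column of job titles. The following is a minimal sketch, assuming the titles are held in a plain Python list; the names `WORDS_TO_DROP`, `clean_title`, and `job_titles` are hypothetical and only serve the illustration. Removing a word leaves extra spaces behind, so the sketch also collapses the leftover whitespace.

```python
import re

# Hypothetical word list and sample titles, for illustration only.
WORDS_TO_DROP = ["Spanish", "Females", "only", "Cairo"]
PATTERN = re.compile(
    r"\b(?:{})\b".format("|".join(map(re.escape, WORDS_TO_DROP))),
    re.IGNORECASE,
)


def clean_title(title):
    """Remove the unwanted words, then collapse leftover whitespace."""
    stripped = PATTERN.sub("", title)
    return " ".join(stripped.split())


job_titles = [
    "Cairo Senior Sales Representative",
    "Spanish Technical Support Representative",
    "Females Only Telesales Specialist",
]
cleaned_titles = [clean_title(t) for t in job_titles]
print(cleaned_titles)
```

Because of the `\b` word boundaries, only whole words are removed, so a longer word that merely contains one of the listed words is left untouched.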

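You can detect the words and replace them with an empty string, e.g. re.sub(r"regex pattern", "", text)

On what basis do you judge these words to be worthless?

@VJAYSLN I don't want the model to be biased toward certain countries or genders. I want to generalize it using the job title.

Edited the question's grammar to make it easier to understand.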
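For a one-off replacement without a precompiled pattern, `re.sub` can be called directly. This is a small sketch with a hypothetical sample title; note that `flags` is passed by keyword, because the fourth positional argument of `re.sub` is `count`, not `flags`.

```python
import re

# Hypothetical sample title, for illustration only.
title = "Females Only Telesales Specialist"

# Pass flags by keyword: the fourth positional argument of re.sub is count.
result = re.sub(r"\b(?:females|only)\b", "", title, flags=re.IGNORECASE)

# Collapse the whitespace left behind by the removed words.
result = " ".join(result.split())
print(result)
```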



