Scikit-learn TF-IDF: can it drop numbers?


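I am doing text analysis and want to ignore words that consist only of digits. From the text "This is 000 Sparta!", only the three words "This", "is", and "Sparta" should be used. Is there a way to do this, and if so, how?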

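The default token pattern of TfidfVectorizer is (?u)\b\w\w+\b, which matches words made of at least two word characters, i.e. [a-zA-Z0-9_] (on Python 3 strings, \w also covers non-ASCII letters and digits). Because digits count as word characters, "000" is kept as a token. You can modify the token pattern to suit your needs. For example, the regex (?ui)\b\w*[a-z]+\w*\b still matches whole words but requires each one to contain at least one letter: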
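from sklearn.feature_extraction.text import TfidfVectorizer

# Keep only tokens that contain at least one letter (case-insensitive match)
tf = TfidfVectorizer(token_pattern=r'(?ui)\b\w*[a-z]+\w*\b')

text = ["This is 000 Sparta!"]
tfidf_matrix = tf.fit_transform(text)
# get_feature_names() was removed in scikit-learn 1.2; use get_feature_names_out()
feature_names = tf.get_feature_names_out()

print(list(feature_names))
# Output: ['is', 'sparta', 'this']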
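To see what each pattern does without building a vectorizer, the two regexes can be compared directly with the standard re module. This is only a sketch: the constant names are illustrative, and the call to lower() is there to mimic the vectorizer's default lowercase=True behaviour.

```python
import re

# Default TfidfVectorizer token pattern and the letter-requiring variant
DEFAULT_PATTERN = r"(?u)\b\w\w+\b"
LETTER_PATTERN = r"(?ui)\b\w*[a-z]+\w*\b"

text = "This is 000 Sparta!".lower()  # mimic lowercase=True

default_tokens = re.findall(DEFAULT_PATTERN, text)
letter_tokens = re.findall(LETTER_PATTERN, text)

print(default_tokens)  # ['this', 'is', '000', 'sparta']
print(letter_tokens)   # ['this', 'is', 'sparta']

# Mixed tokens survive because they contain at least one letter
mixed_tokens = re.findall(LETTER_PATTERN, "room 101 b2b 42nd")
print(mixed_tokens)    # ['room', 'b2b', '42nd']
```

Note that the custom pattern only drops tokens made entirely of digits and underscores. Tokens such as "b2b" or "42nd" are kept, and so are single-letter words, which the default pattern would discard.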

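Alternatively, take a look at how this is handled in the official documentation, where all numbers are treated as a single feature, #NUMBER.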
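The linked documentation is not reproduced here, so the following is only a minimal sketch of the idea of collapsing numbers into one #NUMBER feature. The function name number_normalizing_tokenizer and the rule "a token starting with a digit counts as a number" are assumptions made for illustration.

```python
import re

# Same regex as the vectorizer's default token pattern
TOKEN_RE = re.compile(r"(?u)\b\w\w+\b")


def number_normalizing_tokenizer(doc):
    """Tokenize doc, replacing every token that starts with a digit by '#NUMBER'."""
    return ["#NUMBER" if tok[0].isdigit() else tok for tok in TOKEN_RE.findall(doc)]


tokens = number_normalizing_tokenizer("this is 000 sparta in 480 bc")
print(tokens)  # ['this', 'is', '#NUMBER', 'sparta', 'in', '#NUMBER', 'bc']
```

A callable like this could be passed to the vectorizer as TfidfVectorizer(tokenizer=number_normalizing_tokenizer), in which case token_pattern is not used. Unlike the regex approach above, this keeps the information that a number occurred while preventing every distinct number from becoming its own feature.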



