Python: getting the most frequent n-grams for a specific class - Fatal编程技术网

Python: getting the most frequent n-grams for a specific class

python nlp


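I have a dataset of tweets, each labeled as hateful (1) or non-hateful (0). I vectorized the data as a bag of character n-grams of length 3 to 4 (scikit-learn's CountVectorizer), and I want to extract the most frequent n-grams for each class. The code below works, but it counts n-grams over the whole dataset instead of per class: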

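```python
from sklearn.feature_extraction.text import CountVectorizer

# Character n-grams of length 3 and 4; X is the list of tweet strings
bag_of_words = CountVectorizer(ngram_range=(3, 4), analyzer='char')
bag_of_words_mx = bag_of_words.fit_transform(X)
vocab = bag_of_words.vocabulary_  # maps n-gram -> column index
# Total count of each n-gram over ALL tweets (both classes mixed)
count_values = bag_of_words_mx.toarray().sum(axis=0)

# output n-grams, least to most frequent
for ng_count, ng_text in sorted([(count_values[i], k) for k, i in vocab.items()]):
    if ng_count > 1:
        print(ng_count, ng_text)
```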

Is there a way to sort the vocabulary by class?
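Try bag_of_words_mx[y==0] and bag_of_words_mx[y==1], where y is the array containing the target variable. Each expression keeps only the rows (tweets) of one class, so summing it over axis=0 gives per-class n-gram counts. Note that y must be a NumPy array (or a pandas Series) for y==0 to act as a boolean mask; comparing a plain Python list to 0 just returns False.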


Copyright © 2024. All Rights Reserved by - Fatal编程技术网