Why isn't the number of bigrams equal to the number of words minus 1? (R, n-gram) - Fatal编程技术网


I am writing an R script to find bigrams.

I have a string of 4157 words.

Now, using stylo, I take the bigrams into a vector as follows:

library(stylo)

allBi <- txt.to.words(myLines)
myBigrams <- make.ngrams(allBi, ngram.size = 2)

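If I take 5 words, this code gives me 4 bigrams. But when I use many words, such as 4157, I get fewer bigrams than expected.

The problem is that you did not run any tests to try to find out what is going on. As the tests below show, one (or more) of the 4127 entries in myLines apparently contains no actual "words", as the stylo package sees words:

library(stylo)

This file has 235886 legitimate words on my OS X system:

words <- readLines("/usr/share/dict/words")

Testing FTW (it does not take much work to actually run some tests when things do not go as expected). This check asks whether the number of entries minus the number of bigrams is 1 for a range of vector sizes:

all(sapply(seq(from=2, to=20000, by=100), function(i) {
  return(i - length(make.ngrams(txt.to.words(words[1:i]), ngram.size=2))==1)
}))

So, this is not a vector-size problem. Could the problem be entries in the vector that contain no actual words? Let's test:

# inject some badness
words[4] <- sprintf("  , %s - ", words[4])
words[30] <- "//"
words[900] <- "-1--1-"
words[4000]  <- ".."

Let's see what effect real "badness" has:

txt.to.words(words[c(4, 30, 900, 4000)])
# [1] "aal"

Only one of the four modified entries yields a word. Running the same check again now fails:

all(sapply(seq(from=2, to=20000, by=100), function(i) {
  return(i - length(make.ngrams(txt.to.words(words[1:i]), ngram.size=2))==1)
}))
# [1] FALSE

Use this to find the entries in words that contain no letters:

which(grepl("^[^[:alpha:]]+$", words))
# [1]   30  900 4000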
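
The same effect can be reproduced on a tiny vector without stylo. The following is only a minimal base-R sketch: `tokenize` and `bigrams` are hypothetical helpers written for this illustration, and the splitting rule (split on runs of non-letters) is an assumption that roughly imitates, but is not, what `txt.to.words` does.

```r
# Hypothetical helpers: a rough base-R stand-in, NOT stylo's implementation.

# Split each entry on runs of non-letters and drop the empty tokens.
tokenize <- function(x) {
  tokens <- unlist(strsplit(tolower(x), "[^[:alpha:]]+"))
  tokens[nzchar(tokens)]
}

# Pair each token with its successor; n tokens give n - 1 bigrams.
bigrams <- function(tokens) {
  if (length(tokens) < 2) return(character(0))
  paste(head(tokens, -1), tail(tokens, -1))
}

# Made-up sample data: six entries, two of which contain no letters.
entries <- c("alpha", "  , beta - ", "//", "gamma", "-1--1-", "delta")

no_letters <- grepl("^[^[:alpha:]]+$", entries)
tokens <- tokenize(entries)

length(entries)          # 6 entries
sum(no_letters)          # 2 entries without letters
length(tokens)           # 4 tokens
length(bigrams(tokens))  # 3 bigrams: tokens - 1, not entries - 1
```

In this sketch the bigram count follows the number of tokens actually produced, not the number of entries in the input vector. Comparing `length(myLines)` with the length of the tokenized result is therefore a quick first check. An entry that holds several words would shift the count in the other direction.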



