word2vec for text mining categories

I have a list like this:

.NET
ABAP
Access
Account Management
Accounting
Active Directory
Agile Methodologies
Agile Project Management
AJAX
Algorithms
Analysis
Android
Android Development
AngularJS
Ant
Apache
ASP
ASP.NET
B2B
Banking
BPMN
Budgets
Business Analysis
Business Development
Business Intelligence
Business Planning
Business Process
Business Process Design
Business Process...
I want to reduce the number of variables I have to analyze by creating more abstract categories. Each word in the list above is a variable for me.

I found that word2vec exists, but I can't find any CRAN documentation for it.


How can I use it? What can I do with this data?

Not word2vec, but take a look at this alternative:

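library(XML)
library(dplyr)
library(RecordLinkage)

# Scrape the word list from the code block of the original question
df <- data.frame(words=capture.output(htmlParse("https://stackoverflow.com/questions/35904182/word2vec-for-text-mining-categories")[["//div/pre/code/text()"]]))

# Compare all pairs of words by string similarity, weight the pairs,
# and keep those classified as links at a threshold of 0.8
df %>% compare.dedup(strcmp = TRUE) %>%
             epiWeights() %>%
             epiClassify(0.8) %>%
             getPairs(show = "links", single.rows = TRUE) -> matches

# Give each word the lowest ID it was linked to; unlinked words keep their row number
left_join(mutate(df,ID = 1:nrow(df)), 
          select(matches,id1,id2) %>% arrange(id1) %>% filter(!duplicated(id2)), 
          by=c("ID"="id2")) %>%
    mutate(ID = ifelse(is.na(id1), ID, id1) ) %>%
    select(-id1) -> dfnew

# Rows marked "<--" in the output share an ID with a neighbouring word
head(dfnew, 30)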
#                       words ID
# 1                      .NET  1
# 2                      ABAP  2
# 3                    Access  3
# 4        Account Management  4 # <--
# 5                Accounting  4 # <--
# 6          Active Directory  6
# 7       Agile Methodologies  7 # <--
# 8  Agile Project Management  7 # <--
# 9                      AJAX  9
# 10               Algorithms 10
# 11                 Analysis 11
# 12                  Android 12 # <--
# 13      Android Development 12 # <--
# 14                AngularJS 14
# 15                      Ant 15
# 16                   Apache 16
# 17                      ASP 17 # <--
# 18                  ASP.NET 17 # <--
# 19                      B2B 19
# 20                  Banking 20
# 21                     BPMN 21
# 22                  Budgets 22
# 23        Business Analysis 23 # <--
# 24     Business Development 23 # <--
# 25    Business Intelligence 23 # <--
# 26        Business Planning 23 # <--
# 27         Business Process 23 # <--
# 28  Business Process Design 23 # <--
# 29      Business Process... 23 # <--
# 30        Business Strategy 23 # <--
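
To make the ID-assignment step easier to follow, here is a minimal base R sketch of the same idea on the first five words of the list. The `links` table is a hypothetical stand-in for the `id1`/`id2` columns of `matches`, written by hand rather than computed by RecordLinkage, and the variable names are only for illustration.

```r
# First five entries of the word list from the question
words <- c(".NET", "ABAP", "Access", "Account Management", "Accounting")

# Hypothetical link table (assumption): row 5 was matched to row 4
links <- data.frame(id1 = 4L, id2 = 5L)

# For each id2 keep only the lowest id1 it was linked to
links <- links[order(links$id1), ]
links <- links[!duplicated(links$id2), ]

# Words without a link keep their own row number as group ID
id <- seq_along(words)
hit <- match(id, links$id2)
group <- ifelse(is.na(hit), id, links$id1[hit])

result <- data.frame(words = words, ID = group)
print(result)
```

Here "Accounting" ends up with the ID of "Account Management", as in rows 4 and 5 of the output above; every other word keeps its own row number.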