JSON: comparing words against a raw file in R
I have a raw dataset in JSON format. Let's load it into R:
library("rjson")
setwd("mydir")
getwd()
json_data <- fromJSON(paste(readLines("N1.json"), collapse=""))
uu <- unlist(json_data)
uutext <- uu[names(uu) == "text"]
I need to find the words from mydata2 that occur in the messages in the JSON file, and then write those messages to a new document, "xyz.txt". How can I do this?
chalk indirect pick reaction team skip pumpkin surprise bless ignorance
1 time patient road extent decade cemetery staircase monarch bubble abbey
2 service conglomerate banish pan friendly position tight highlight rice disappear
3 write swear break tire jam neutral momentum requirement relationship matrix
4 inspire dose jump promote trace latest absolute adjust joystick habit
5 wrong behave claim dedicate threat sell particle statement teach lamb
6 eye tissue prescription problem secretion revenge barrel beard mechanism platform
7 forest kick face wisecrack uncertainty ratio complain doubt reflection realism
8 total fee debate hall soft smart sip ritual pill category
9 contain headline lump absorption superintendent digital increase key banner second
I mean:
chalk -1 number1 indirect -2 number2
Template:
Word1-1 number1-1; Word1-2 number 1-2; …; Word 1-10 number 1-10
Word2-1 number2-1; Word2-2 number 2-2; …; Word 2-10 number 2-10
Next time, please include real data. A simplified model:
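library(data.table)
word  <- c("test", "meh", "blah")
jsonF <- c("let's do test", "blah is right", "test blah", "test test")
outp <- list()
for (i in seq_along(word)) {
  # keep the strings that contain word[i] as a literal substring;
  # for case-insensitive matching, drop fixed = TRUE and add ignore.case = TRUE
  outp[[i]] <- as.data.frame(grep(word[i], jsonF, value = TRUE, fixed = TRUE))
}
qq <- rbindlist(outp)
qq <- unique(qq)  # a string matched by several words is kept once
print(qq)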
1: let's do test
2: test blah
3: test test
4: blah is right
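To connect the simplified model back to the question, here is a minimal base-R sketch of the filter-and-write step. It is only an illustration: the `uutext` and `words` vectors below are made-up stand-ins for the real messages and the words of mydata2, and the output goes to a temporary directory rather than the working directory.

```r
# Hypothetical stand-ins: replace with the real uutext and the words of mydata2
uutext <- c("let's do test", "blah is right", "test blah", "nothing here")
words  <- c("test", "meh", "blah")

# One logical vector per word, combined with OR:
# TRUE for every message that contains at least one of the words
hit <- Reduce(`|`, lapply(words, function(w) grepl(w, uutext, fixed = TRUE)))

# Keep each matching message once and write one message per line
matched  <- unique(uutext[hit])
out_file <- file.path(tempdir(), "xyz.txt")  # use "xyz.txt" to write into the working directory
writeLines(matched, out_file)
```

Note that `fixed = TRUE` matches substrings, so "test" would also hit "contest". If whole words are required, a regular expression with word boundaries is probably the better choice.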
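We are unlikely to download something just to answer you. Please post a sample of your data.

Thank you, Alexey! Could you tell me how to write the first 10 words of mydata2 into a new dataset in this format: Word1-1 number1-1; Word1-2 number1-2; …; Word1-10 number1-10 Word2-1 number2-1; Word2-2 number2-2; …; Word2-10 number2-10. I need to arrange it this way for further analysis.

Thanks @fenton. Could you please dput the first 10 words of that file, as well as the first 10-15 lines of the JSON file? For security reasons, not everyone will download an unknown file from an unknown source. What do you mean by "Word1-1 number1-1"? Just list all the words in one column and run seq(1,N) to create an identifier for each word.

(Alexey, I have now edited my post with an example.) Do you see what I mean?

I see, but what is the purpose? You are not going to compare a string of words plus numbers against the JSON, are you? In the original post you wanted to check whether word[i] occurs in a JSON string and write that JSON string to a text file.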
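The "one column plus an identifier" suggestion from the comments could look roughly like this. The small `mydata2` matrix here is a hypothetical stand-in built from a few of the words in the question, not the real data.

```r
# Hypothetical stand-in for mydata2: two rows of three words each
mydata2 <- rbind(c("chalk", "indirect", "pick"),
                 c("time", "patient", "road"))

# Flatten row by row into a single column of words
words <- as.vector(t(mydata2))

# One running identifier per word
ids <- data.frame(id = seq_along(words), word = words,
                  stringsAsFactors = FALSE)
print(ids)
```

`seq_along(words)` is used instead of `seq(1, N)` so that an empty word list yields zero identifiers rather than the sequence 1, 0.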
library(data.table)
x <- LETTERS[1:10]
y <- LETTERS[11:20]
df <- rbind(x, y)
L <- list()
for (i in seq_len(nrow(df))) {
  # row i becomes "word-position row-position" pairs joined by "; "
  L[[i]] <- paste0(df[i, ], "-", seq(1, 10), " ", i, "-", seq(1, 10), collapse = "; ")
}
Fin <- cbind(L)
View(Fin)
> Fin
L
[1,] "A-1 1-1; B-2 1-2; C-3 1-3; D-4 1-4; E-5 1-5; F-6 1-6; G-7 1-7; H-8 1-8; I-9 1-9; J-10 1-10"
[2,] "K-1 2-1; L-2 2-2; M-3 2-3; N-4 2-4; O-5 2-5; P-6 2-6; Q-7 2-7; R-8 2-8; S-9 2-9; T-10 2-10"
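As a side note, the same strings can be built without growing a list in a loop. This is just a sketch of an alternative using `vapply`; the helper `fmt_row` is a name invented here, and the letters again stand in for real words.

```r
x  <- LETTERS[1:10]
y  <- LETTERS[11:20]
df <- rbind(x, y)

# Hypothetical helper: format row i as "word-position row-position" pairs
fmt_row <- function(i) {
  pos <- seq_len(ncol(df))
  paste0(df[i, ], "-", pos, " ", i, "-", pos, collapse = "; ")
}

# One string per row, returned as a plain character vector
Fin2 <- vapply(seq_len(nrow(df)), fmt_row, character(1))
print(Fin2)
```

A character vector like `Fin2` can be passed straight to `writeLines()` if the result needs to go into a text file.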