Converting text to a matrix in R for export to .csv

Tags: r, export-to-csv, strsplit

I have the following text:

Anada - Asociación de nada Address: calle 13 13 Medellin Colombia Other
address: Phone.: 13-13-136131 13-13-13-1313 E-mail: anada@13.co Web page: Category: 3. Private sector Notes:
Atodo - Asociación de todo Address: calle 12 Bogota Colombia
Other address: Phone.: 12-1-23-32  E-mail: Web page: www.atodoooo.com, Category: 99. Public sector Notes: note that there are missing fields.
I want to get a matrix with column names that I can convert to a .csv file, like this:

Company, Address, Other Address, Tel, E-mail, Web page, Category, Sector, Notes
and rows:

Anada - Asociación de nada, calle 13 13 Medellin Colombia, 13-13-136131 13-13-13-1313,anada@13.co,,3,Private,,

Atodo - Asociación de todo,calle 12 Bogota Colombia,,12-1-23-32,www.atodoooo.com,99,Public,note that there are missing fields.

How can I do this in R?

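This may be tedious, but it seems that some string processing is needed.

library(stringr)  # provides str_split() and str_replace_all()

splitlist = 'Address|Other address|Phone|E-mail|Web page|Category'
a = str_split(text[1], ':')  # split one record at every colon

# Strip the field labels from each piece
for (i in seq_along(a[[1]])) {
  a[[1]][i] = str_replace_all(a[[1]][i], splitlist, "")
}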

# [[1]]
# [1] "Atodo - Asociación de todo "           " calle 12 Bogota Colombia "          
# [3] " ."                                   " 12-1-23-32  "                       
# [5] " "                                    " www.atodoooo.com, "                 
# [7] " 99. Public sector Notes"             " note that there are missing fields."
Then you can extract each field with less string handling.

In this case, I can't think of any easier way than regular expressions.
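
To make the "extract each field" step concrete, here is a minimal sketch in base R that captures the text between consecutive labels with one regular expression and writes the result to a .csv file. It is not the answerer's code. It assumes every record contains all the labels in the same order (as the two sample records do), so a record that lacks a label will make `vapply()` fail. The names `pattern`, `fields`, `df` and `out_file`, and the split of "Category" into a number and a sector, are illustrative choices.

```r
text <- c(
  "Anada - Asociación de nada Address: calle 13 13 Medellin Colombia Other address: Phone.: 13-13-136131 13-13-13-1313 E-mail: anada@13.co Web page: Category: 3. Private sector Notes:",
  "Atodo - Asociación de todo Address: calle 12 Bogota Colombia Other address: Phone.: 12-1-23-32  E-mail: Web page: www.atodoooo.com, Category: 99. Public sector Notes: note that there are missing fields."
)

# One lazy capture group per field; the labels are assumed to appear in this order.
pattern <- paste0(
  "^(.*?) Address:(.*?)Other address:(.*?)Phone\\.:(.*?)E-mail:(.*?)",
  "Web page:(.*?)Category:(.*?)Notes:(.*)$"
)

# regexec() + regmatches() give the full match followed by the eight groups.
m <- regmatches(text, regexec(pattern, text, perl = TRUE))
fields <- t(vapply(m, function(x) trimws(x[-1]), character(8)))

df <- as.data.frame(fields, stringsAsFactors = FALSE)
names(df) <- c("Company", "Address", "Other Address", "Tel",
               "E-mail", "Web page", "Category", "Notes")

# Split e.g. "3. Private sector" into a category number and a sector name.
df$Sector <- sub(" sector$", "", sub("^[0-9]+\\. *", "", df$Category))
df$Category <- sub("\\..*$", "", df$Category)

# Put the columns in the order requested in the question.
df <- df[, c("Company", "Address", "Other Address", "Tel", "E-mail",
             "Web page", "Category", "Sector", "Notes")]

out_file <- file.path(tempdir(), "companies.csv")  # hypothetical output path
write.csv(df, out_file, row.names = FALSE)
```

Empty fields come out as empty strings, so the missing e-mail or web page simply becomes an empty cell in the .csv file.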


The following assumes that your records are on one line per entry, that is, the data looks like:

text <- c("Anada - Asociación de nada Address: calle 13 13 Medellin Colombia Other address: Phone.: 13-13-136131 13-13-13-1313 E-mail: anada@13.co Web page: Category: 3. Private sector Notes:", 
          "Atodo - Asociación de todo Address: calle 12 Bogota Colombia Other address: Phone.: 12-1-23-32  E-mail: Web page: www.atodoooo.com, Category: 99. Public sector Notes: note that there are missing fields.")

With that, the approach is basically to:

  • Extract a list of the "header" parts
  • Extract a list of the relevant values
  • Put them back together as a single vector
  • Split them apart again
  • Reshape the result from a "long" format to a "wide" format

The tools used are as follows:

library(devtools)
library(data.table)
library(reshape2)
source_gist("11380733") ## For cSplit

The approach is similar to @won782's:

splitlist <- c("Address:", "Other address:", "Phone.:", "E-mail:", "Web page:",
               "Category:", "Public sector Notes:", "Private sector Notes:")
pattern <- paste0(splitlist, collapse = "|")

The cSplit function works nicely with data.tables, so let's use that directly:

DT <- data.table(V1 = unlist(Combined))       ## unlist the values
DT <- cSplit(DT, "V1", ":")                   ## Split by a colon
DT[, V1_1 := gsub("Public sector |Private sector ", "", V1_1)]  ## Just "notes"
DT[, id := cumsum(V1_1 == "Company")]         ## Add an id column
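
The five steps can also be sketched in base R alone, without the gist. This is not the answerer's code: it is a minimal illustration that assumes one record per element of `text`, uses a plain "Notes:" label so the sector stays in the Category value, and fills a character matrix by index rather than calling a reshape function. The names `labels`, `headers`, `values`, `long`, `wide` and `out_file` are made up for the example.

```r
text <- c(
  "Anada - Asociación de nada Address: calle 13 13 Medellin Colombia Other address: Phone.: 13-13-136131 13-13-13-1313 E-mail: anada@13.co Web page: Category: 3. Private sector Notes:",
  "Atodo - Asociación de todo Address: calle 12 Bogota Colombia Other address: Phone.: 12-1-23-32  E-mail: Web page: www.atodoooo.com, Category: 99. Public sector Notes: note that there are missing fields."
)

labels <- c("Address:", "Other address:", "Phone.:", "E-mail:", "Web page:",
            "Category:", "Notes:")
pattern <- paste0(labels, collapse = "|")

# Step 1: the labels found in each record, with "Company" for the leading text.
headers <- lapply(regmatches(text, gregexpr(pattern, text)),
                  function(h) c("Company", sub("\\.?:$", "", h)))

# Step 2: the values between the labels. strsplit() drops a trailing empty
# piece, so a space is appended to keep the value after a final "Notes:".
values <- strsplit(paste0(text, " "), pattern)
stopifnot(identical(lengths(headers), lengths(values)))

# Steps 3 and 4: one row per (record, field, value), i.e. the "long" format.
long <- data.frame(
  id    = rep(seq_along(text), lengths(headers)),
  field = unlist(headers),
  value = trimws(unlist(values)),
  stringsAsFactors = FALSE
)

# Step 5: long to wide, by filling a character matrix at (record, field).
fields <- unique(long$field)
wide <- matrix("", nrow = length(text), ncol = length(fields),
               dimnames = list(NULL, fields))
wide[cbind(long$id, match(long$field, fields))] <- long$value

out_file <- file.path(tempdir(), "companies_wide.csv")  # hypothetical path
write.csv(wide, out_file, row.names = FALSE)
```

Because the matrix starts out filled with empty strings, a record that lacks one of the labels entirely still gets a row, with that cell left empty.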

Thanks, but it doesn't solve my problem in multiple cases. What solves my problem (except for the first item) is text

Thanks, but the assumption of one line per entry is not correct. The example above contains two entries, but they are not split into lines. All the entries are in one unformatted text, but converting the text to one line per entry should be easy. Which way is the simplest?

@xav, is it possible that "Address:" is always on the first line of an entry? If so, then it should be an easy fix.

I see that the text you converted has already been split. Here is the text as it should be: c("Anada - Asociación de nada Address: calle 13 13 Medellin Colombia Other address: Phone.: 13-13-136131 13-13-13-1313 E-mail: anada@13.co Web page: Category: 3. Private sector Notes: Atodo - Asociación de todo Address: calle 12 Bogota Colombia Other address: Phone.: 12-1-23-32 E-mail: Web page: www.atodoo.com, Category: 99. Public sector Notes: note that there are missing fields.")
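
If "Address:" does always sit on the first line of an entry, regrouping the raw lines into one string per entry could look like the sketch below. It assumes the text was read line by line (for example with readLines()) and wrapped as shown in the question; it does not apply to a single string with no line breaks at all. The names `lines`, `starts` and `id` are illustrative.

```r
lines <- c(
  "Anada - Asociación de nada Address: calle 13 13 Medellin Colombia Other",
  "address: Phone.: 13-13-136131 13-13-13-1313 E-mail: anada@13.co Web page: Category: 3. Private sector Notes:",
  "Atodo - Asociación de todo Address: calle 12 Bogota Colombia",
  "Other address: Phone.: 12-1-23-32  E-mail: Web page: www.atodoooo.com, Category: 99. Public sector Notes: note that there are missing fields."
)

# A line containing "Address:" (capital A, so "Other address:" is skipped)
# is taken to start a new entry.
starts <- grepl("Address:", lines, fixed = TRUE)
id <- cumsum(starts)

# Paste the lines of each entry back together, separated by a space.
text <- unname(vapply(split(lines, id), paste, character(1), collapse = " "))
```

The resulting `text` vector has one element per entry, which is the shape the answers above expect.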