R: unusual structure of a txt file

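I have a text file like this (example):

The original file has 65k lines. I need to load it into R and make it processable. I tried the following functions: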

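  • read.table: did not work (R never returned a result)
  • fread from the data.table package: needs a lot of manual preprocessing of the file, and still does not work as needed because of the quotes, broken lines, and malformed file format
  • scan: returned a vector, and converting it to a matrix did not give the required result

The desired format is a regular data frame: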

    mydata <- structure(list(fieldName = structure(c(3L, 3L), .Label = c("description", 
        "scraped_manufacturer", "title"), class = "factor"), foreign_id = c(13389, 
        13389), is_single_product = structure(1:2, .Label = c("FALSE", 
        "TRUE"), class = "factor"), matched_manufacturers = c("Foden /manId: 76775", 
        "Caterpillar /manId: 74, Skogsjan-Caterpillar /manId: 10329"), 
            matched_products = c("", "C12 /modelId: 32774 /manId: 74"
            ), raw_string = c("CAT FODEN C-12 ENGINE", "CATERPILLAR C-12 ENGINE"
            ), pagesource = structure(c(84L, 84L), .Label = c("", "585e362f6b010083d6962041", 
            "585f270a300000c614b819ed", "585f84be6b0100c6ee962ab1", "585f84dc66010074efac42ca", 
            "585f875a6b0100c7ee963000", "585f878c66010074efac483e", "585f87ad66010075efac4880", 
            "585f88e06b0100b6ee96331c", "585f8b4566010074efac4fcb", "agriaffaires", 
            "apex-auctions", "arlington-plastics-machinery", "auctelia", 
            "auctions-international", "autogilles", "baestlein", "baupool", 
            "bavaria-swiss-ag", "big-iron", "big-machinery", "blackforxx", 
            "blue-group", "bpi-associates", "buk-baumaschinen", "cegema", 
            "christophbusch", "cjm-asset", "classified", "cnc-auction", 
            "cottrill-and-co", "daan", "de-vries", "dechow", "dimex-import-export", 
            "e-farm", "ebay", "ebay-de", "eberle-hald-gmbh", "eggers-landmaschinen", 
            "euro-auctions", "fabricating-machinery-corp", "fastline", 
            "ferwood", "fh-machinery", "first-machinery-auctions-limited", 
            "forklift-international", "ga-tec-gabelstaplertechnik", "gambtec", 
            "geiger", "german-graphics", "goindustry-dovebid", "graf", 
            "gruma-nutzfahrzeuge-gmbh", "hanselmann", "heinrich-kuper-gmbh", 
            "hooray-machinery", "imz-maschinen", "industrial-discount", 
            "ipr-petmachinery", "ironplanet", "ironplanet-com", "karl-guenter-wirths-gmbh", 
            "karner-dechow", "kurt-steiger", "kvd-auctions", "lagermaschinen", 
            "leinweber-landtechnik", "mach4metal", "machinefinder", "machinery-park", 
            "machineryzone", "maschinenbau-rehnen-gmbh", "mideast-equipment", 
            "mmtequipment", "oskar-broziat-maschinen", "perfection-global", 
            "perlick", "perry-videx", "pfeifer-machinery", "plustech-as", 
            "polboto-agri-sp-z-oo", "pressenhaas", "rc-tuxford-exports", 
            "resale", "restlos", "richter-friedewald-gmbh", "ritchie-bros", 
            "rock-and-dirt", "rogiers", "rs-auktionen", "stig-bindner", 
            "surplex", "technikboerse", "themar-trucks", "traktorpool", 
            "unilift", "vebim", "vertimac", "zeppelin-caterpillar", "zoll-auktion", 
            "zuern-gmbh"), class = "factor")), .Names = c("fieldName", 
        "foreign_id", "is_single_product", "matched_manufacturers", "matched_products", 
        "raw_string", "pagesource"), row.names = 1:2, class = "data.frame")
    

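Consider opening the text file in software that can read the RTF type. On Windows machines, Microsoft Word and the built-in WordPad can read .rtf documents. Opened that way, the document shows valid json, without the markup content.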
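Before going through Word, it may be worth confirming that the file really is RTF. A minimal sketch, assuming the file starts with the standard `{\rtf` header; `is_rtf` and the temporary demo file are hypothetical names used only for illustration:

```r
# Hypothetical helper: TRUE when the first line of the file starts with the RTF header "{\rtf".
is_rtf <- function(path) {
  first_line <- readLines(path, n = 1L, warn = FALSE)
  length(first_line) == 1L && startsWith(first_line, "{\\rtf")
}

# Demo on a throwaway file (made-up content, not the original data).
demo_path <- tempfile(fileext = ".txt")
writeLines(c("{\\rtf1\\ansi", "{\"results\": []}", "}"), demo_path)
rtf_detected <- is_rtf(demo_path)
print(rtf_detected)
```

If the check fails on the real file, the Word route may not be the right one.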

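Fortunately, R on Windows can connect to the MS Word object library using the RDCOMClient library, where you can extract the text through the document's properties. After reading in the json text, use the jsonlite library to migrate the content into a data frame: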

    library(RDCOMClient)
    library(jsonlite)
    
    # OPEN WORD APP
    wrdApp = COMCreate("Word.Application")
    wrdDoc = wrdApp$Documents()$Open("C:\\Path\\To\\Data.txt")
    wrdtext = wrdDoc[['Content']]
    
    # EXTRACT TEXT TO R VARIABLE
    doc = wrdtext$Text()
    
    # CLOSE APP
    wrdDoc$Close(FALSE)
    wrdApp$Quit()
    
    # RELEASE RESOURCES
    wrdtext <- wrdDoc <- wrdApp <- NULL
    rm(wrdtext, wrdDoc, wrdApp)
    gc()
    
    # RAW DF: NAME / COLUMNS / VALUES LIST TYPES
    rawdf <- fromJSON(doc)[[1]][[1]][[1]]
    
    # FINAL DF: NORMALIZING VALUES WITH COL NAMES
    finaldf <- setNames(data.frame(rawdf$values, stringsAsFactors = FALSE),
                                   rawdf$columns[[1]])
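
The last step can be tried without Word or the original file. A minimal sketch, assuming the parsed JSON yields a `values` list holding one character matrix and a `columns` list holding one vector of names; the hand-built `rawdf` below is a made-up stand-in, with two column names and values borrowed from the desired data frame in the question:

```r
# Stand-in for the parsed object (assumption: one matrix of values, one vector of column names).
rawdf <- list(
  columns = list(c("foreign_id", "raw_string")),
  values  = list(matrix(c("13389", "13389",
                          "CAT FODEN C-12 ENGINE", "CATERPILLAR C-12 ENGINE"),
                        nrow = 2))
)

# Same normalization as above: matrix columns become data frame columns, then get their names.
finaldf <- setNames(data.frame(rawdf$values, stringsAsFactors = FALSE),
                    rawdf$columns[[1]])

# Everything arrives as character, so numeric fields need an explicit conversion.
finaldf$foreign_id <- as.numeric(finaldf$foreign_id)
str(finaldf)
```

If the real file parses into a different nesting, the `[[1]][[1]][[1]]` indexing and this step would need adjusting.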
    
Once the json text has been saved to a file, run the following in R:

    rawdf <- do.call(rbind, lapply(paste(readLines("C:\\Path\\To\\Data.json", warn=FALSE),
                                          collapse=""), 
                                    jsonlite::fromJSON))[[1]][[1]][[1]]
    
    finaldf <- setNames(data.frame(rawdf$values, stringsAsFactors = FALSE),
                        rawdf$columns[[1]])
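
The `paste(..., collapse = "")` step matters here because `readLines` returns one string per line, and `lapply` would otherwise hand each line to the parser separately. A minimal sketch on a throwaway file (made-up content; `json_path`, `json_lines`, and `json_text` are illustrative names):

```r
# Write a small multi-line JSON document to a temporary file.
json_path <- tempfile(fileext = ".json")
writeLines(c("{", "  \"columns\": [\"a\", \"b\"]", "}"), json_path)

json_lines <- readLines(json_path, warn = FALSE)  # one element per line
json_text  <- paste(json_lines, collapse = "")    # a single string

print(length(json_lines))
print(length(json_text))
```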
    

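This is RTF (Rich Text Format). There is an R package named rtf, but it seems to be only for creating rtf files, not reading them.

There were some suggestions since I am on a Mac, so I used the other approach. Running it on the original file in txt format worked perfectly! Thank you very much!

Awesome! Glad to help. No doubt MS Word for Mac can read the .rtf format as well.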