R: How do I unify multiple encodings into one?


r encoding utf-8 web-scraping


I want my text to be encoded as UTF-8.

However, guess_encoding() returns 7 candidate encodings.

Should I convert all of them to UTF-8?

>guess_encoding(text)
  encoding language confidence
1    UTF-8                0.15

>guess_encoding(text)
  encoding language confidence
1        UTF-8                1.00
2     UTF-16BE                0.10
3     UTF-16LE                0.10
4 windows-1255       he       0.07
5 windows-1255       he       0.06
6   IBM420_ltr       ar       0.04
7   IBM420_rtl       ar       0.02
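
The table above is a ranked list of candidates for one piece of text, with a confidence score for each. It is not a list of encodings that the text is in at the same time. A minimal sketch of reading such a table, assuming a hand-built data frame `guesses` that mirrors part of the output above (it is an illustration, not the real return value of guess_encoding()):

```r
# Hypothetical stand-in for part of the guess_encoding() output shown above.
guesses <- data.frame(
  encoding   = c("UTF-8", "UTF-16BE", "UTF-16LE", "windows-1255"),
  confidence = c(1.00, 0.10, 0.10, 0.07),
  stringsAsFactors = FALSE
)

# Only the most likely candidate is of interest; the rest are fallbacks.
best <- guesses$encoding[which.max(guesses$confidence)]
print(best)
```

With a top confidence of 1.00 for UTF-8, the text is most likely already UTF-8, so the lower-ranked rows would not need separate handling.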

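Does the output of guess_encoding() describe how the text is actually encoded?

I could not fix the encoding with repair_encoding().

I wrote the code below, but it does not seem to work properly. Should I use iconv?

Do I have to convert the text six times?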

iconv(text, from="UTF-16BE", to="UTF8") 
iconv(text, from="UTF-16LE", to="UTF8")
iconv(text, from="windows-1255", to="UTF8") 
#Omitted below
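
iconv() reinterprets the bytes of its input according to `from`, so it is normally called once, with the one encoding the text is really in. Calling it again with a different `from` does not clean the text further. A minimal sketch in base R, assuming Latin-1 bytes as a stand-in for text in a known non-UTF-8 encoding (the sample string is an assumption, not data from the site in the question):

```r
# "café" as Latin-1 bytes, built from raw values so the example does not
# depend on the encoding of this source file.
x <- rawToChar(as.raw(c(0x63, 0x61, 0x66, 0xe9)))
Encoding(x) <- "latin1"

# One conversion, from the real source encoding to UTF-8.
utf8 <- iconv(x, from = "latin1", to = "UTF-8")

print(validUTF8(utf8))   # checks that the result is well-formed UTF-8
print(charToRaw(utf8))   # the single byte e9 becomes the two bytes c3 a9
```

If the text is already valid UTF-8, as the confidence of 1.00 suggests, no iconv() call should be needed at all.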

The question may be hard to follow, so the full code is posted below for reference.

library(httr)
library(rvest)
library(stringr)


# Bulletin board URL
list.url = 'http://kin.naver.com/qna/list.nhn?m=expertAnswer&dirId=70111'

# Vector to store the answer text from every post
texts = c()

# Crawl pages 1 to 10 of the bulletin board
for(i in 1:10){
  url = modify_url(list.url, query=list(page=i))  # Set the page number in the bulletin URL
  h.list = read_html(url, encoding = 'UTF-8')  # Download and parse the list of posts

  # Extract the post links
  title.link1 = html_nodes(h.list, '.title')  # Elements with class "title"
  title.links = html_nodes(title.link1, 'a')  # <a> tags inside those elements

  article.links = html_attr(title.links, 'href')
  article.links = paste0("http://kin.naver.com", article.links)

  # Visit each post and extract the answer
  for(link in article.links){
    h = read_html(link)  # Get the post

    # Answer text
    text = html_text(html_nodes(h, '#contents_layer_1'))
    text = str_trim(repair_encoding(text))  # Repair the text just extracted, then trim it
    texts = c(texts, text)  # Append to the collected answers

    print(link)

  }
}

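Have you tried enc2utf8? If so, was there anything wrong with the result? Please also mention which library you are using: guess_encoding exists in several packages, at least rvest and readr, and it looks like you are using rvest.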
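
For context, enc2utf8() is a base R function applied to a character vector. It is not an encoding name, so it cannot be passed as the `encoding` argument of read_html(). A minimal sketch, assuming a Latin-1 sample string built for illustration:

```r
# "café" as Latin-1 bytes, declared as latin1 so R knows how to read them.
x <- rawToChar(as.raw(c(0x63, 0x61, 0x66, 0xe9)))
Encoding(x) <- "latin1"

# enc2utf8() converts according to the declared encoding of each element.
y <- enc2utf8(x)

print(Encoding(y))
print(validUTF8(y))
```

Note that enc2utf8() relies on the encoding mark of the string. For unmarked strings it assumes the native encoding, so it may not help when the bytes were misread in the first place.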

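@Moody_Mudskipper I don't quite understand what you mean, but I am using rvest. Do you mean changing "h.list = read_html(url, encoding = 'UTF-8')" to "h.list = read_html(url, encoding = 'enc2utf8')"? When I do that, I get the same error (a warning message) as above.

@Moody_Mudskipper My code scrapes a simple Q&A site, similar to Stack Overflow. After a keyword is entered, it fetches the required elements from the URL of each question. In my full code I scrape the title, the content, and the answers, but the problem only occurs in the answers part. So do I only need to change the encoding in the answers part?

@Moody_Mudskipper Thank you for your answer, and for taking the time to respond to this post. If you have the chance, I would like to send you a more detailed email. If you don't mind, could I have an email address?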