Unusual characters when reading a file with read.csv


r utf-8 character-encoding

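The file I am reading contains one word per line. I am having trouble with some of the words, because they seem to contain unusual characters. See the example below, which uses the first word in my list.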

stopwords <- read.csv("stopwords_fr.txt", stringsAsFactors = FALSE, header = FALSE, encoding = "UTF-8")$V1
stopwords[1]        # "a"; if you copy and paste this character, quotes included, into RStudio, a small red dot appears before the a
stopwords[1] == "a" # FALSE

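Here is the source I got the file from:

According to Notepad++, the file is encoded as UTF-8-BOM. However, using "UTF-8-BOM" as the encoding does not help, although this answer seems to work: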

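stopwords

Can you link to your data file, or something similar? You also need to define what makes a word "unusual", since a word could be unusual in any language. As a native Brit, I find most words with any accents "unusual" :(

If it is the first character, it may be a byte order mark - see the earlier question... The simplest fix may be to set the first value to "a" (typed from the keyboard!)

@Spacedman: I have edited the question to add the source and a reproducible example. I know "unusual" is not a great term, but I have not found a better one…

@AndrewGustar I think you are right. Opening the file in Notepad++, I see that its encoding is UTF-8-BOM, but that encoding does not seem to be available in R. I have read your link, but I do not see how to solve my problem in R.

Here is a possible answer… - fileEncoding instead of encoding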
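The byte order mark guess from the comments can be checked by looking at the code points and raw bytes of the suspicious string. The sketch below does not read the original file: the variable word is a hypothetical stand-in for stopwords[1], built by hand as a BOM (U+FEFF) followed by "a".

```r
# Hypothetical stand-in for stopwords[1]: a BOM (U+FEFF) followed by "a"
word <- "\ufeffa"

# Number of characters: the invisible BOM counts as one
n_chars <- nchar(word)

# Unicode code points: 65279 is U+FEFF, 97 is "a"
code_points <- utf8ToInt(word)

# Raw UTF-8 bytes: a UTF-8 BOM is the byte sequence ef bb bf
bytes <- charToRaw(enc2utf8(word))

# The comparison with a plain "a" fails because of the leading BOM
is_plain_a <- word == "a"

print(n_chars)
print(code_points)
print(bytes)
print(is_plain_a)
```

If the first code point of the real stopwords[1] is 65279, the extra character is a byte order mark and not part of the word itself.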

"a" == "a" # FALSE

# Passing "UTF-8-BOM" through encoding leaves the byte order mark in place
stopwords <- read.csv("stopwords_fr.txt", stringsAsFactors = FALSE, header = FALSE, encoding = "UTF-8-BOM")$V1
stopwords[1] # "ï»¿a"
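The suggestion from the comments, fileEncoding instead of encoding, can be sketched as follows. As far as the R documentation describes it, encoding only marks the strings that were read as being in a given encoding, while fileEncoding re-encodes the file as it is read, and "UTF-8-BOM" is accepted there as an input encoding that drops a leading byte order mark. This is a minimal sketch under assumptions, not the original setup: the temporary file and its three words are made up for illustration, and the final sub() call is an alternative that does not come from the thread.

```r
# Build a small stand-in file that starts with a UTF-8 BOM (ef bb bf).
# The path and the words are illustrative, not the original stopwords_fr.txt.
path <- tempfile(fileext = ".txt")
con <- file(path, open = "wb")
writeBin(c(as.raw(c(0xef, 0xbb, 0xbf)), charToRaw("a\nau\naux\n")), con)
close(con)

# fileEncoding re-encodes the input while reading; "UTF-8-BOM" drops a leading BOM
stopwords <- read.csv(path, stringsAsFactors = FALSE, header = FALSE,
                      fileEncoding = "UTF-8-BOM")$V1
first_is_a <- stopwords[1] == "a"
print(first_is_a)

# Alternative (not from the thread): strip a leading BOM from strings already read
with_bom <- "\ufeffa"
cleaned <- sub("^\ufeff", "", with_bom)
print(cleaned == "a")

# Remove the temporary file
unlink(path)
```

With the real file, the same call would use "stopwords_fr.txt" in place of path.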