Cannot get R to compare strings that appear equal


r excel encoding string-matching


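I recently received an Excel file of unknown origin and used Excel to export a subset of it to the text file "example0.txt" (see the Dropbox link below). In this dataset, two strings that look exactly the same fail to compare as equal in R.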

# what = "raw" is a character value, so the tokens are read as strings
a <- scan("example0.txt", what = "raw")
a
[1] "ÖSTVÅG"   "FALKVÅG" "ÖSTVÅG"

# cell a[1] and a[3] appear similar ("ÖSTVÅG"). However,

a[1]==a[3]                  # I was expecting TRUE but I get FALSE
nchar(a[1]) == nchar(a[3])  # I was expecting TRUE (n=6) but I get FALSE

# and similarly,

nchar(a[2]) == 8            # I was expecting n=7

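The file in your Dropbox does not seem to give the results above.

@Nuno Prista - I can't see the Dropbox (corporate firewall), but if you compare the two files you mention (e.g. example0.txt and example0correct.txt) byte by byte, for example in a hex viewer, there must be a difference. Did you check that?

If a[3] contains a [...]. See enc2utf8(a); stringi::stri_width(a[-2]) yields 6 twice.

@lukeA Thanks! I can now identify them. Is there a quick way to convert them to the same "format"? I mean, to get a[3]==a[1]? I need it to be able to (e.g.) sort the strings.

For example by extracting only specific character classes; see ?stringi::`stringi-search-charclass` and ?stringi::stri_extract_all_charclass.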
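One possible way to get both strings into the same "format" is Unicode normalization. The sketch below is only an illustration and rests on an assumption the thread does not confirm: that one string stores Ö and Å as single precomposed code points while the other stores them as a base letter plus a combining mark. The variables `composed` and `decomposed` are hypothetical stand-ins for a[1] and a[3], not the contents of the actual file. `stri_trans_nfc` comes from the stringi package, which has to be installed.

```r
library(stringi)  # assumed to be installed; not part of base R

# Hypothetical stand-ins for a[1] and a[3]: the same visible text,
# once precomposed (NFC form) and once decomposed (NFD form).
composed   <- "\u00D6STV\u00C5G"     # Ö and Å as single code points
decomposed <- "O\u0308STVA\u030AG"   # O + combining diaeresis, A + combining ring

composed == decomposed               # FALSE: the code point sequences differ
nchar(composed)                      # 6
nchar(decomposed)                    # 8

# Inspect the underlying code points to see where the strings differ.
utf8ToInt(composed)
utf8ToInt(decomposed)

# Normalize both strings to NFC before comparing them.
stri_trans_nfc(composed) == stri_trans_nfc(decomposed)   # TRUE
```

If the real difference in the file turns out to be something else (for instance a mismatch between the declared and actual file encoding), normalization alone will not make the strings equal, so checking the code points with `utf8ToInt` first is probably the safer order.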