Replace non-ASCII characters with a defined list of strings in R, without loops


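I want to replace non-ASCII characters with their ASCII equivalents (Spanish only, for now). If I have "á", I want to replace it with "a", and so on.

I built a function for this, latin2ascii (it works fine), but I don't want to use loops, including internal loops such as sapply.

gsubfn(), in the package of the same name, is really nice for this sort of thing: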

library(gsubfn)

# Create a named list, in which:
#   - the names are the strings to be looked up
#   - the values are the replacement strings
mapL <- c("á","é","í","ó","ú","Á","É","Í","Ó","Ú","ñ","Ñ","ü","Ü")
mapA <- c("a","e","i","o","u","A","E","I","O","U","n","N","u","U")

# ll <- setNames(as.list(mapA), mapL) # An alternative to the 2 lines below
ll <- as.list(mapA)
names(ll) <- mapL


# Try it out
string <- "ÍÓáÚ"
gsubfn("[áéíóúÁÉÍÓÚñÑüÜ]", ll, string)
# [1] "IOaU"

I like Josh's version, but I thought I'd add another "vectorized" solution. It returns a vector of unaccented strings, and it relies only on base functions.

x=c('íÁuÚ','uíÚÁ')

mapL<-c("á","é","í","ó","ú","Á","É","Í","Ó","Ú","ñ","Ñ","ü","Ü")
mapA<-c("a","e","i","o","u","A","E","I","O","U","n","N","u","U")
split=strsplit(x,split='')
m=lapply(split,match,mapL)
mapply(function(split,m) paste(ifelse(is.na(m),split,mapA[m]),collapse='') , split, m)
# "iAuU" "uiUA"
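
To make the lookup step above easier to follow, here is a small sketch that runs the same strsplit/match/ifelse idea on a single string and shows each intermediate value. The names chars, idx and out are only illustrative and are not part of the original answer; the output assumes a UTF-8 locale.

```r
mapL <- c("á","é","í","ó","ú","Á","É","Í","Ó","Ú","ñ","Ñ","ü","Ü")
mapA <- c("a","e","i","o","u","A","E","I","O","U","n","N","u","U")

# Split one string into single characters
chars <- strsplit("uíÚ", split = "")[[1]]
chars
# "u" "í" "Ú"

# Position of each character in mapL (NA when it is not an accented character)
idx <- match(chars, mapL)
idx
# NA 3 10

# Keep the original character where there was no match, otherwise look up mapA
out <- ifelse(is.na(idx), chars, mapA[idx])
paste(out, collapse = "")
# "uiU"
```

Note that lapply and mapply in the full version are still loops internally, so this approach is "vectorized" mainly in the sense that it accepts a whole character vector at once.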

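Thanks! Works perfectly. Just one question (only out of curiosity): do you know whether the gsubfn function uses any kind of internal loop? Should it be faster than sapply?

@Alvaro -- I don't think gsubfn() is particularly fast, "just" convenient and elegant.

Also see chartr in base R, which seems well suited to the problem as stated. If the real problem has variations, such as replacing a character with a two-character sequence, gsubfn could still handle it but chartr could not.

@G.Grothendieck -- Thanks for pointing that out. I've appended it to the answer:
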
A <- paste(mapA, collapse="")
L <- paste(mapL, collapse="")
chartr(L, A, "ÍÓáÚ")
# [1] "IOaU"
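
Two follow-ups to the chartr discussion, as a rough sketch rather than a definitive recipe. First, chartr() is vectorized over its third argument, so it handles the whole character vector from the other answer in one call. Second, because chartr() only maps single characters to single characters, the multi-character case mentioned in the comments needs something else; the base-R variant below uses gregexpr() and `regmatches<-`. The mapping map2 and the sample words are made up for illustration, and the output assumes a UTF-8 locale.

```r
mapL <- c("á","é","í","ó","ú","Á","É","Í","Ó","Ú","ñ","Ñ","ü","Ü")
mapA <- c("a","e","i","o","u","A","E","I","O","U","n","N","u","U")

# 1) chartr() translates every element of a character vector in one call
x <- c("íÁuÚ", "uíÚÁ")
res_chartr <- chartr(paste(mapL, collapse = ""), paste(mapA, collapse = ""), x)
res_chartr
# "iAuU" "uiUA"

# 2) Hypothetical mapping with multi-character replacements (not from the post)
map2 <- c("ñ" = "ny", "ü" = "ue", "á" = "a")
y <- c("pingüino", "año", "más")

# Locate every occurrence of any key, then swap each hit for its mapped value
hits <- gregexpr(paste(names(map2), collapse = "|"), y)
regmatches(y, hits) <- lapply(regmatches(y, hits), function(h) unname(map2[h]))
y
# "pingueino" "anyo" "mas"
```

The second variant still calls lapply, so it is not loop-free; it is only meant to show that the multi-character case can be covered without extra packages.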