R: Replace duplicates in a matrix


I have prepared the following test code for you:

##### Test here
test = tibble::tribble(
  ~Name1,           ~Name2,           ~Name3,
  "Paul Walker",    "Paule Walkr",    "Heiko Knaup",
  "Ferdinand Bass", "Ferdinand Base", "Michael Herre"
)
library(stringdist)
#> Error in gsub("[()c\"]", "", soundexspalten): object 'soundexspalten' not found
soundexmatrix1 = gsub("0000", "", soundexmatrix0)
#> Error in gsub("0000", "", soundexmatrix0): object 'soundexmatrix0' not found
Created on 2021-06-03 by the reprex package (v2.0.0)

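Now I want to replace every duplicate in soundexmatrix1 with the string "DUPLICATE", so that the dimensions of the matrix stay the same and all duplicates are visible at a glance.

Is there any way to do this?
Thanks for your help!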

To check for duplicates within each row (see the update), this should do what you want, and more cleanly:

# Feel free to load the packages you're using.
# library(stringdist)
# library(tibble)

test <- tibble::tribble(
  ~Name1,           ~Name2,           ~Name3,
  "Paul Walker",    "Paule Walkr",    "Heiko Knaup",
  "Ferdinand Bass", "Ferdinand Base", "Michael Herre"
)

# Get phonetic codes cleanly.
result <- as.matrix(apply(X = test, MARGIN = 2,
                          FUN = stringdist::phonetic, method = c("soundex"), useBytes = FALSE))

# Find all blank codes ("0000").
blanks <- result == "0000"

# # Find all duplicates, as compared across ENTIRE matrix; ignore blank codes.
# all_duplicates <- !blanks & duplicated(result, MARGIN = 0)

# Find duplicates, as compared within EACH ROW; ignore blank codes.
row_duplicates <- !blanks & t(apply(X = result, MARGIN = 1, FUN = duplicated))

# Replace blank codes ("0000") with blanks (""); and replace duplicates (found
# within rows) with "DUPLICATE".
result[blanks] <- ""
result[row_duplicates] <- "DUPLICATE"

# View result.
result
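The least obvious step in the answer is the row-wise check: apply() with MARGIN = 1 returns one column per input row, so its result has to be transposed with t() before it lines up with the original matrix. The following base-R sketch illustrates that step on a hand-typed matrix of soundex codes. This is an assumption made so the sketch runs without stringdist: the values correspond to the answer's three-row example, with "0000" standing in for blank codes, rather than being computed with stringdist::phonetic(). The names codes, per_row, row_dups and marked are illustrative only.

```r
# Hand-typed soundex codes (assumed values, standing in for the output of
# stringdist::phonetic() so this sketch needs only base R).
codes <- matrix(
  c("P442", "P442", "H225",
    "F635", "F635", "M246",
    "0000", "0000", "H225"),
  nrow = 3, byrow = TRUE,
  dimnames = list(NULL, c("Name1", "Name2", "Name3"))
)

# apply() with MARGIN = 1 returns one COLUMN per input row ...
per_row <- apply(codes, MARGIN = 1, FUN = duplicated)
# ... so transpose it back to one ROW per input row.
row_dups <- t(per_row)

# Treat "0000" as blank: never flag it, and show it as "".
blanks <- codes == "0000"
marked <- codes
marked[blanks] <- ""
marked[!blanks & row_dups] <- "DUPLICATE"

print(marked)
```

Because the replacement goes through logical indexing, marked keeps the dimensions and column names of codes, which is what the question asks for. Without the t() call this square example would not raise an error; it would silently flag every cell of the second row instead.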
Update: At the poster's request, I modified the code so that it compares codes only within each row, rather than across the entire result matrix. Running it with test replaced by this extended input, which adds a row containing blank codes and a name that is duplicated in the matrix but not within its own row:

test <- tibble::tribble(
    ~Name1,           ~Name2,           ~Name3,
    "Paul Walker",    "Paule Walkr",    "Heiko Knaup",
    "Ferdinand Bass", "Ferdinand Base", "Michael Herre",
    "",               "01234 56789",    "Heiko Knaup"
# | ^^              | ^^^^^^^^^^^^^   | ^^^^^^^^^^^^^                   |
# | Coded as "0000" | Coded as "0000" | Duplicate in matrix, NOT in row |
)

gives the following result:

     Name1  Name2       Name3 
[1,] "P442" "DUPLICATE" "H225"
[2,] "F635" "DUPLICATE" "M246"
[3,] ""     ""          "H225"

Hello, this is very helpful, but I only want to put "DUPLICATE" where there is a duplicate within the same row, not for every duplicate in the entire matrix.

Ah, I think I have just the thing. Give me a few hours. I assume you're on Central European Summer Time, @MaxH.?

Hi @MaxH., I just updated the code so that it only compares duplicates within rows.

Please don't add updates; just edit the post so that it is the best it can be at the time. If a question is edited in a way that invalidates reasonable answers, roll it back.
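The update is the difference between the answer's commented-out whole-matrix line, duplicated(result, MARGIN = 0), and the row-wise line it kept. A minimal sketch comparing the two, again assuming hand-typed codes for the three-row test input (with "0000" for blanks) in place of stringdist::phonetic() output; the names codes, all_dups, row_dups and flagged are illustrative only.

```r
# Hand-typed soundex codes (assumed values for the three-row test input).
codes <- matrix(
  c("P442", "P442", "H225",
    "F635", "F635", "M246",
    "0000", "0000", "H225"),
  nrow = 3, byrow = TRUE,
  dimnames = list(NULL, c("Name1", "Name2", "Name3"))
)
blanks <- codes == "0000"

# Whole-matrix comparison: MARGIN = 0 treats each cell as a single element,
# scans in column-major order, and returns a logical matrix of the same shape.
all_dups <- !blanks & duplicated(codes, MARGIN = 0)

# Row-wise comparison, as in the updated answer.
row_dups <- !blanks & t(apply(codes, MARGIN = 1, FUN = duplicated))

# Cells that only the whole-matrix version flags.
flagged <- which(all_dups & !row_dups, arr.ind = TRUE)
print(flagged)
```

This should report a single cell, row 3 and column 3 (the second "H225"), matching the "Duplicate in matrix, NOT in row" annotation in the test input. With MARGIN = 0 the first occurrence is decided in column-major order, so the whole-matrix version keeps whichever copy comes first column by column, not row by row.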