R: Extract only the first and last names from a list of names

I am using R for data manipulation. I have a long list of names that looks like this:

"names"

[1] ""                               
[2] "Victoria Marie"                 
[3] "Ori Mann"                     
[4] "Lina Pearl Right"          
[5] "David Berg"                     
[6] "Anthony Lee"                  
[7] "Brian Michael Ingraham"         
[8] "Jay Ling"

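I want to extract only the first and last name from every entry into new columns and discard any middle names. How can I do this? I tried the following code: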

mat  = matrix(unlist(names), ncol=2, byrow=TRUE)

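This just walks through every word of every entry and fills the columns with all of them in order, so middle names are kept and the rows no longer line up with the people.

Any help would be greatly appreciated.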

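Here is one way to do this in base R that also handles the possibility of suffixes. If you come across other suffixes (e.g., "II"), you can add them to the vector after %in%.
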
# some representative data
names <- list("", "Ed Smith", "Jennifer Jason Leigh", "Ed Begley, Jr.")

# use strsplit to get a list of vectors of each name broken into its parts,
# keying off the space between names
names.split <- strsplit(unlist(names), " ")

# make new vectors with the first and last names, based on their position in
# those vectors. for last names, make the result conditional on whether or
# not a recognized suffix is in the last spot, and get rid of any 
# punctuation attached to the last name if there was a suffix.
name.first <- sapply(names.split, function(x) x[1])
name.last <- sapply(names.split, function(x) {
  # this deals with empty name slots in your original list, returning NA
  if (length(x) == 0) {
    NA
  # now check for a suffix; if one is there, use the penultimate item
  # after stripping it of any punctuation
  } else if (x[length(x)] %in% c("Jr.", "Jr", "Sr.", "Sr")) {
    gsub("[[:punct:]]", "", x[length(x) - 1])
  } else {
    x[length(x)]
  }
})
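
Since the question asks for the first and last names in separate columns, here is a minimal sketch of how the same approach could be wrapped up to return a data frame. The helper name `split_names`, its `suffixes` argument, and the column names `first` and `last` are assumptions made for illustration; they are not part of the answer above. It uses `vapply` instead of `sapply` so that both columns are always character vectors.

```r
# Hypothetical helper (not from the original answer): applies the same
# strsplit-based logic and returns one row per input name.
split_names <- function(full_names,
                        suffixes = c("Jr.", "Jr", "Sr.", "Sr", "II", "III")) {
  # break each name into its space-separated parts
  parts <- strsplit(unlist(full_names), " ")

  # first name: first part, or NA for an empty entry
  first <- vapply(parts, function(x) {
    if (length(x) == 0) NA_character_ else x[1]
  }, character(1))

  # last name: last part, or the part before a recognized suffix
  last <- vapply(parts, function(x) {
    n <- length(x)
    if (n == 0) {
      NA_character_
    } else if (n > 1 && x[n] %in% suffixes) {
      gsub("[[:punct:]]", "", x[n - 1])
    } else {
      x[n]
    }
  }, character(1))

  data.frame(first = first, last = last, stringsAsFactors = FALSE)
}

people <- split_names(c("", "Victoria Marie", "Lina Pearl Right", "Ed Begley, Jr."))
print(people)
```

Note that an entry with a single word ends up with that word in both columns, and multi-word surnames are still cut down to their final word, so entries like these may need to be handled by hand.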

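Do you mean you want the first and last names in separate columns, or just a single column of strings reduced to first and last name?

Yes, first names in one column and last names in another. Thanks.

Be careful: if your dataset contains a lot of people, you are almost guaranteed to have someone whose last name spans two or more words, or whose family name comes first, or...

OK, here you go.

Thanks! But I will probably just ignore the anomalies in my analysis.

Wow, this is great! It works perfectly! I can't thank you enough.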