R: Extract only first and last names from a list of names

I'm using R for data manipulation. I have a long list of names that looks like this:

"names"

[1] ""                               
[2] "Victoria Marie"                 
[3] "Ori Mann"                     
[4] "Lina Pearl Right"          
[5] "David Berg"                     
[6] "Anthony Lee"                  
[7] "Brian Michael Ingraham"         
[8] "Jay Ling"             
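
I want to extract just the first and last names for the whole list into new columns and drop any middle names. How can I do this? I used the following code: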

mat  = matrix(unlist(names), ncol=2, byrow=TRUE)
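This just runs through all the names in each entry and puts them all into the columns in order, so the full names are never split into first and last.

Any help would be appreciated.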

Here is one way to do this in base R that also handles the possibility of suffixes. If you find other suffixes (e.g., "II"), you can add them to the vector after %in%.

# some representative data
names <- list("", "Ed Smith", "Jennifer Jason Leigh", "Ed Begley, Jr.")

# use strsplit to get a list of vectors of each name broken into its parts,
# keying off the space between names
names.split <- strsplit(unlist(names), " ")

# make new vectors with the first and last names, based on their position in
# those vectors. for last names, make the result conditional on whether or
# not a recognized suffix is in the last spot, and get rid of any
# punctuation attached to the last name if there was a suffix.
name.first <- sapply(names.split, function(x) x[1])
name.last <- sapply(names.split, function(x) {
  if (length(x) == 0) {
    # this deals with empty name slots in your original list, returning NA
    NA
  } else if (x[length(x)] %in% c("Jr.", "Jr", "Sr.", "Sr")) {
    # a suffix is in the last spot, so use the penultimate item
    # after stripping it of any punctuation
    gsub("[[:punct:]]", "", x[length(x) - 1])
  } else {
    x[length(x)]
  }
})
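
Since the goal is separate columns, the same approach can be wrapped up so that it returns a data frame. The following is only a sketch under a few assumptions: `split_first_last` and the column names `first` and `last` are hypothetical names chosen for illustration, the input is assumed to be a character vector, and the sample data are the names from the question.

```r
# Hypothetical helper (not part of base R): splits full names on spaces
# and returns a data frame with one column for first names and one for
# last names. Middle names are dropped.
split_first_last <- function(full_names,
                             suffixes = c("Jr.", "Jr", "Sr.", "Sr")) {
  parts <- strsplit(full_names, " ", fixed = TRUE)

  # first name: the first piece of each split name (NA for empty entries)
  first <- vapply(parts, function(x) x[1], character(1))

  # last name: the final piece, or the one before it if the final piece
  # is a recognized suffix
  last <- vapply(parts, function(x) {
    n <- length(x)
    if (n == 0) {
      NA_character_
    } else if (n > 1 && x[n] %in% suffixes) {
      gsub("[[:punct:]]", "", x[n - 1])
    } else {
      x[n]
    }
  }, character(1))

  data.frame(first = first, last = last, stringsAsFactors = FALSE)
}

# the names from the question
full_names <- c("", "Victoria Marie", "Ori Mann", "Lina Pearl Right",
                "David Berg", "Anthony Lee", "Brian Michael Ingraham",
                "Jay Ling")

result <- split_first_last(full_names)
print(result)
```

Empty entries come back as NA in both columns, and a single-word name would appear as both first and last name, so such rows may need a separate check.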

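Do you mean you want the first and last names in separate columns, or just a single column of strings reduced to first and last name?

Yes, first names in one column and last names in another. Thank you.

Be careful: if your dataset includes a lot of people, you are almost guaranteed that someone's last name will span two or more words, or that the last name comes first, or...

OK, here you go.

Thanks! But I will probably ignore the anomalies in my analysis.

Gosh, that's great! It works really well! I can't thank you enough.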
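
For anyone who would rather look at the anomalies mentioned in the comments than ignore them, one rough way to find candidates is to count the space-separated parts of each name and flag anything that is not exactly two parts. This is a minimal sketch, not a reliable detector: `full_names`, `n_parts`, and `needs_review` are illustrative names, and a name in "last first" order with exactly two parts would not be flagged.

```r
# illustrative data: an empty entry, a plain two-part name,
# a name with a middle name, and a name with a suffix
full_names <- c("", "Victoria Marie", "Lina Pearl Right", "Ed Begley, Jr.")

# number of space-separated parts in each name
n_parts <- lengths(strsplit(full_names, " ", fixed = TRUE))

# anything other than exactly two parts may deserve a manual look
needs_review <- n_parts != 2

print(full_names[needs_review])
```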