Identify matching strings in a column and create a new column with a number for each string in R

I have a tibble that looks somewhat like the following:

block  description
1      enroll
1      enroll
1      motivated
1      motivated
1      motivated
2      openemail
2      openemail

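I would like to create a new column with a unique number for each unique value in the "description" column. There are many more unique values in "description" than shown here. Is there a way in R to identify which values match each other and then generate a new value for each, so that the resulting tibble looks like this: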

block  description  question
1      enroll       1
1      enroll       1 
1      motivated    2
1      motivated    2
1      motivated    2
2      openemail    3
2      openemail    3

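I am planning to use mutate() to create the new column, but I'm not sure what the input should be. Ideally there is a way to do this without typing out every unique value that might appear under "description".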
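One way to avoid typing the values is to number each description by the position where it first appears. The sketch below is a minimal base R version, assuming a plain data.frame in place of the tibble; it uses match() with unique(), a swapped-in alternative that does not come from the answers in this thread:

```r
# Sample data from the question (a plain data.frame stands in for the tibble)
df <- data.frame(
  block = c(1, 1, 1, 1, 1, 2, 2),
  description = c("enroll", "enroll", "motivated", "motivated",
                  "motivated", "openemail", "openemail")
)

# unique() keeps the values in order of first appearance;
# match() returns each value's position in that vector
df$question <- match(df$description, unique(df$description))

print(df)
```

Inside a dplyr pipeline the same expression should fit directly into mutate(), e.g. mutate(question = match(description, unique(description))).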

EDIT: A combination of a couple of the solutions below worked best for me:

block  description
1      enroll
1      enroll
1      motivated
1      motivated
1      motivated
2      openemail
2      openemail

df$question <- as.integer(factor(df$description, levels = unique(df$description)))

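A base R solution (factor() makes this work even when description is a character column; the levels are sorted alphabetically, which here happens to match the order of appearance):
> df$question <- as.numeric(factor(df$description))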
> df
  block description question
1     1      enroll        1
2     1      enroll        1
3     1   motivated        2
4     1   motivated        2
5     1   motivated        2
6     2   openemail        3
7     2   openemail        3

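Calling as.numeric() directly on the description column only returns level codes when the column is a factor, which data.frame() produced by default before R 4.0.0 (stringsAsFactors = TRUE). On a character vector it returns NA with a warning, which may explain a new column that exists but is not filled in. A small sketch of the difference, assuming the descriptions from the question:

```r
descr <- c("enroll", "enroll", "motivated", "motivated",
           "motivated", "openemail", "openemail")

# On a character vector, as.numeric() tries to parse numbers and yields NA
chr_codes <- suppressWarnings(as.numeric(descr))

# Converting to a factor first gives the integer level codes;
# the default levels are sorted alphabetically
fct_codes <- as.numeric(factor(descr))

print(chr_codes)
print(fct_codes)
```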
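We can use as.integer(factor()) to get the desired result. Specifying the levels manually with unique() avoids the alphabetical sorting that happens when levels is left at its default, so the levels follow the order in which the values appear in the data frame. Note that I reordered some rows to make it clear that the first occurrence of a value determines its index.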

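library(tidyverse)

# Rows reordered so that openemail first appears before motivated
df <- tribble(
  ~block, ~description,
  1, "enroll",
  2, "openemail",
  1, "enroll",
  2, "openemail",
  1, "motivated",
  1, "motivated",
  1, "motivated"
)

df %>%
  mutate(index = as.integer(factor(description, levels = unique(description))))
#> # A tibble: 7 x 3
#>   block description index
#>   <dbl> <chr>       <int>
#> 1     1 enroll          1
#> 2     2 openemail       2
#> 3     1 enroll          1
#> 4     2 openemail       2
#> 5     1 motivated       3
#> 6     1 motivated       3
#> 7     1 motivated       3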

Created on 2018-04-13 by the reprex package (v0.2.0).
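The effect of levels = unique(...) can be checked without any packages. A minimal base R sketch, assuming the reordered descriptions enroll, openemail, enroll, openemail, then motivated three times:

```r
# openemail now appears before motivated
descr <- c("enroll", "openemail", "enroll", "openemail",
           "motivated", "motivated", "motivated")

# Default levels are sorted: enroll, motivated, openemail
sorted_idx <- as.integer(factor(descr))

# Levels in order of first appearance: enroll, openemail, motivated
appearance_idx <- as.integer(factor(descr, levels = unique(descr)))

print(sorted_idx)      # 1 3 1 3 2 2 2
print(appearance_idx)  # 1 2 1 2 3 3 3
```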
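Is block relevant to computing the unique values?

For the solutions using mutate I get two errors: 1) object '' not found, and 2) group_indices cannot be used on an object of class "character". Any idea how to fix these?

The solution starting with df$index does not assign the numbers in the order in which the values first appear in df, but in alphabetical order. If you need the numbers assigned in order of appearance, use the approach above, which avoids the default sorting of factor levels done by as.factor.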
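group_indices won't work as a solution if you don't want the values sorted, so I removed it. Although it is now redundant, my only suggestion for the group_indices error is that the function call was mistyped: perhaps you passed "description" instead of description?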
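In dplyr verbs, as with base R's with(), a bare name such as description refers to the column, while a quoted "description" is just a one-element character string. A base R sketch of the difference, using with() as a stand-in for dplyr's data masking (an assumption made so the example needs no packages):

```r
df <- data.frame(description = c("enroll", "motivated", "openemail"))

# Bare name: evaluated inside df, so it refers to the column
unquoted <- with(df, as.integer(factor(description, levels = unique(description))))

# Quoted name: just the literal string "description", a single value
quoted <- with(df, as.integer(factor("description")))

print(unquoted)  # 1 2 3
print(quoted)    # 1
```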

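This code runs and the example output looks correct, but when I try View(df) the index column isn't there, and calling df$index returns NULL. Do you know why?

That's because I didn't include an assignment step, so that the example would print its output. If you do

df <- df %>% mutate(...)

it will overwrite df.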
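mutate() returns a new data frame and leaves its input unchanged, so the result has to be assigned back. The same behaviour can be seen with base R's transform(), used here as a stand-in (an assumption, so the sketch needs no packages):

```r
df <- data.frame(description = c("enroll", "motivated", "enroll"))

# transform() returns a modified copy; df itself is not changed
transform(df, index = match(description, unique(description)))
has_index_before <- "index" %in% names(df)   # FALSE

# Assigning the result back overwrites df with the new column
df <- transform(df, index = match(description, unique(description)))
has_index_after <- "index" %in% names(df)    # TRUE
```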

This creates a separate column for question, but doesn't populate it.