R: gsub() on multiple data frames in a loop/lapply


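I have two data frames, each with a column named "Title" that contains strings. I need to reduce these strings so that I can merge the data frames on them. I would like to do this as cleanly as possible in a loop, so that I only have to write the gsub call once.

Suppose I have: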

df_1 <-read.table(text="
id Title
1 some_average_title
2 another:_one
3 the_third!
4 and_'the'_last
",header=TRUE,sep="")
But it does not work, even though no error message is produced.

I also considered
lapply(list(dt_1, dt_2), function(w) { w$Title

get() will let you fetch multiple data sets programmatically.
data.table will make it easy to modify the column in each table.

## CREATING A FEW MORE DATA SETS
df_3 <- df_2
df_4 <- df_1
set.seed(1)
df_3$id <- sample(20, 4)
df_4$id <- sample(20, 4)

library(data.table)

dt_1 <- as.data.table(df_1)
dt_2 <- as.data.table(df_2)
dt_3 <- as.data.table(df_3)
dt_4 <- as.data.table(df_4)

## OR programmatically:

Numb_of_DTs <- 4

names_of_dt_objects <- paste("dt", 1:Numb_of_DTs, sep="_")  # dt_1, dt_2, etc
names_of_df_objects <- paste("df", 1:Numb_of_DTs, sep="_")  # df_1, df_2, etc

## look up each data.frame by name and store a data.table copy under the matching dt_ name
for (i in 1:Numb_of_DTs)
  assign(names_of_dt_objects[[i]], as.data.table(get(names_of_df_objects[[i]])))


for (dt.nm in names_of_dt_objects) {
  get(dt.nm)[, Title := gsub("[ .':!_]", "", Title)]
  ## set the key for merging in the next step
  setkey(get(dt.nm), Title)
  ## You might want to insert a line to clean up the column names, using 
  ##   setnames(get(dt.nm), OLD_NAMES, NEW_NAMES)
}


Reduce(merge, lapply(names_of_dt_objects, function(x) get(x)))
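
The final Reduce(merge, ...) step folds a two-table merge over the whole list of tables. A minimal sketch of that idea in base R follows; it swaps plain data.frames and an explicit by = "Title" in for the keyed data.tables used above, and the tables a, b and c3 are hypothetical toy data, not taken from the question.

```r
# Hypothetical toy tables that share an already cleaned "Title" column.
a  <- data.frame(Title = c("anotherone", "thethird"), id_a = 1:2)
b  <- data.frame(Title = c("thethird", "anotherone"), id_b = 3:4)
c3 <- data.frame(Title = c("anotherone", "thethird"), id_c = 5:6)

# Reduce() applies the two-argument function pairwise from the left,
# i.e. merge(merge(a, b), c3), always joining on "Title".
merged <- Reduce(function(x, y) merge(x, y, by = "Title"), list(a, b, c3))
merged
```

Because every table contributes a differently named id column here, the result has one row per Title and one id column per input table.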

This works:

for(df in c("df_1", "df_2")){
  assign(df, transform(get(df), Title =  gsub(" |\\.|'|:|!|\\'|_", "", Title)))
}
Test:

df_1
  id            Title
1  1 someaveragetitle
2  2       anotherone
3  3         thethird
4  4       andthelast
And:

df_2
  id            Title
1  1 someaveragetitle
2  2       anotherone
3  3         thethird
4  4       andthelast


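The loop above works because assign() receives the name of a whole object. As a rough illustration of how a similar loop can fail without any error, the sketch below assumes an attempt that passes a string such as "df_demo$Title" to assign(); df_demo is a hypothetical one-row data frame, not taken from the question.

```r
# Hypothetical one-row data frame, used only for this illustration.
df_demo <- data.frame(Title = "a_b", stringsAsFactors = FALSE)

# assign() does not parse its first argument: this creates a new variable
# whose name is literally "df_demo$Title" and leaves df_demo untouched.
assign("df_demo$Title", gsub("_", "", df_demo$Title))
df_demo$Title            # still the original string
get("df_demo$Title")     # the stray variable holding the cleaned string

# Rebuilding the whole data frame and assigning it under its own name works.
assign("df_demo", transform(get("df_demo"), Title = gsub("_", "", Title)))
df_demo$Title
```
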
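Building on @David's comment and @Carlos's answer, with a little extra:

Use mget to fetch the data.frames and, if desired, list2env to overwrite the original data.frames.

mget + lapply will perform the transformation: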

lapply(mget(ls(pattern = "df_\\d")), function(w)
  transform(w, Title = gsub(" |\\.|'|:|!|\\'|_", "", Title)))
# $df_1
#   id            Title
# 1  1 someaveragetitle
# 2  2       anotherone
# 3  3         thethird
# 4  4       andthelast
# 
# $df_2
#   id            Title
# 1  1 someaveragetitle
# 2  2       anotherone
# 3  3         thethird
# 4  4       andthelast
...but the result stays in a list and does not affect the original data.frames:

# df_1
#   id              Title
# 1  1 some_average_title
# 2  2       another:_one
# 3  3         the_third!
# 4  4     and_'the'_last
If you really do want to overwrite the data.frames, try:

list2env(
  lapply(mget(ls(pattern = "df_\\d")), function(w) 
    transform(w, Title = gsub(" |\\.|'|:|!|\\'|_", "", Title))), 
  envir = .GlobalEnv)
df_1
#   id            Title
# 1  1 someaveragetitle
# 2  2       anotherone
# 3  3         thethird
# 4  4       andthelast
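
If overwriting objects in .GlobalEnv feels too invasive, the same mget + lapply + list2env pattern can be pointed at a dedicated environment instead. This is only a sketch under assumptions: the environment e, the helper clean_title and the two small data frames are hypothetical, and the bracketed character class is used in place of the alternation pattern.

```r
# Hypothetical environment holding the data frames, so the global workspace is untouched.
e <- new.env()
e$df_1 <- data.frame(id = 1:2, Title = c("some_average_title", "another:_one"),
                     stringsAsFactors = FALSE)
e$df_2 <- data.frame(id = 1:2, Title = c("the_third!", "and_'the'_last"),
                     stringsAsFactors = FALSE)

# Hypothetical helper: strip spaces, dots, quotes, colons, bangs and underscores.
clean_title <- function(w) transform(w, Title = gsub("[ .':!_]", "", Title))

# mget() looks the names up in `e`; list2env() writes the cleaned copies back into `e`.
cleaned <- lapply(mget(ls(e, pattern = "^df_\\d+$"), envir = e), clean_title)
invisible(list2env(cleaned, envir = e))

e$df_1$Title
e$df_2$Title
```

Anchoring the ls() pattern with ^ and $ keeps unrelated objects whose names merely contain df_ followed by a digit out of the list.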

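lapply(list(df_1, df_2), function(w) gsub(" |\\.|'|:|!|\\'|_", "", w$Title))? Or, more generally, lapply(mget(ls(pattern = "df_\\d")), function(w) gsub(" |\\.|'|:|!|\\'|_", "", w$Title))

The regular expression "[ .':!_]" is easier to read.

It will not work because you are expecting assign to interpret the character result as a language object, but it is not set up to do that.

@G.Grothendieck: Good suggestion. In my (real) case, if I also replace \\\'“|”|\“

If some of the strings you match are a single character and others are longer, you can still do that: [ab]|cd will match a, b, or cd.

+1 for clean and fast code. Note: if your object names do not follow a pattern that ls() can match, use mget(c("df_1", "2ndftrm")).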
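
The regex remarks in the comments can be checked with a short sketch. The titles vector repeats the strings from the question; the test string "abcdxcyd" is a made-up example.

```r
titles <- c("some_average_title", "another:_one", "the_third!", "and_'the'_last")

# Alternation between single characters versus a bracketed character class.
alt <- gsub(" |\\.|'|:|!|_", "", titles)
cls <- gsub("[ .':!_]", "", titles)
identical(alt, cls)

# A class can be combined with a multi-character alternative:
# this removes every "a", every "b" and every "cd".
mixed <- gsub("[ab]|cd", "", "abcdxcyd")
mixed
```

Inside a bracketed class the dot is literal, so it needs no escaping there, which is part of why the class form is shorter.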