R - Loop through data.frames in a list - modify characters of a column (list element)
I have several thousand *.csv files (each with a unique name), but the header columns are the same in every file, e.g. "Timestamp", "System_Name", "CPU_ID", and so on.
My question is how to replace the values in "System_Name" (a system name such as "as12535.org.at"), or any other combination of characters, and anonymize them. I would appreciate any hints or pointers in the right direction...
Here is the structure of the CSV files:
"Timestamp","System_Name","CPU_ID","User_CPU","User_Nice_CPU","System_CPU","Idle_CPU","Busy_CPU","Wait_IO_CPU","User_Sys_Pct"
"1161025010002000","as06240.org.xyz:LZ","-1","1.83","0.00","0.56","97.28","2.72","0.33","3.26"
"1161025010002000","as06240.org.xyz:LZ","-1","1.83","0.00","0.56","97.28","2.72","0.33","3.26"
"1161025010002000","as06240.org.xyz:LZ","-1","1.83","0.00","0.56","97.28","2.72","0.33","3.26"
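I tried the R package anonymizer, which works fine at the vector level, but I ran into problems with the thousands of CSV files I read into R. What I tried is the following: create a list that holds every CSV file as a data frame.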
# initialize a list
r.path <- setwd("mypath")
ldf <- list()
# creates the list of all the csv files in my directory - but filter for
# files with Unix in the filename for testing.
listcsv <- dir(pattern = ".UnixM.")
for (i in seq_along(listcsv)) {
ldf[[i]] <- read.csv(file = listcsv[i])
}
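How can I now read in all the CSV files, change or anonymize all or even part of the "System_Name" column, and do this in an R loop for every CSV in my directory? It does not need to be elegant; I am happy as long as it does the job :-)

The common pattern for doing this is: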
df <- do.call(
rbind,
lapply(dir(pattern = "UnixM"),
read.csv, stringsAsFactors = FALSE)
)
df$System_Name <- anonymizer::anonymize(df$System_Name)
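The question also asks about anonymizing only part of the name. The following is a minimal base-R sketch of one way that might be done, not part of the original answer. It assumes the identifying part is the text before the first dot, and `mask_host` and the `host001` naming scheme are hypothetical names introduced here for illustration.

```r
# Hypothetical helper: replace only the host part (text before the first dot)
# with a pseudonym and keep the rest of the name unchanged.
mask_host <- function(x) {
  host <- sub("\\..*$", "", x)           # text before the first dot
  rest <- substring(x, nchar(host) + 1)  # remainder, starting at the dot
  pseudonym <- sprintf("host%03d", match(host, unique(host)))
  paste0(pseudonym, rest)
}

masked <- mask_host(c("as06240.org.xyz:LZ", "as12535.org.at", "as06240.org.xyz:LZ"))
# masked is c("host001.org.xyz:LZ", "host002.org.at", "host001.org.xyz:LZ")
```

Identical host names receive the same pseudonym within one call, so rows from the same system remain grouped after masking.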
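Use lapply to apply the desired function to every element of the list. I don't know how anonymizer works; in a hypothetical case where the function is something like anonymizer(column):
lapply(list, function(x) anonymizer(x$System_Name))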
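Note that a call of the form lapply(list, function(x) f(x$System_Name)) returns only the transformed columns, not the data frames. A small sketch of the variant that returns the modified data frames follows; `anonymizer_stub` is a stand-in invented for this example, not a function from the anonymizer package.

```r
# Stand-in for the real anonymizing function (an assumption for this sketch).
anonymizer_stub <- function(column) rep("REDACTED", length(column))

# Two small data frames standing in for the list read from the CSV files.
ldf <- list(
  data.frame(System_Name = c("as06240.org.xyz:LZ", "as06240.org.xyz:LZ"),
             CPU_ID = c(-1, -1), stringsAsFactors = FALSE),
  data.frame(System_Name = "as12535.org.at",
             CPU_ID = 0, stringsAsFactors = FALSE)
)

# Assign the anonymized column back and return the whole data frame,
# so the result is again a list of data frames.
ldf_anon <- lapply(ldf, function(x) {
  x$System_Name <- anonymizer_stub(x$System_Name)
  x
})
```

Returning `x` as the last expression is what keeps the other columns; without it, each list element would be just the character vector.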
# Alternative: keep one data frame per file instead of binding them together
listdf <- lapply(
dir(pattern = "UnixM"),
function(filename) {
df <- read.csv(filename, stringsAsFactors = FALSE)
df$System_Name <- anonymizer::anonymize(df$System_Name)
df
}
)
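Since the goal is to process every CSV in a directory, the per-file approach can be extended to write each anonymized file back to disk. The sketch below is one possible way to do that in base R, under stated assumptions: `indir`, `outdir`, `read_chr`, and the `system_001` pseudonym scheme are hypothetical, and a simple lookup table is used in place of the anonymizer package so the example runs on its own. All columns are read as character so that long values such as the timestamps are written back unchanged.

```r
# Sketch only: directories, file names and the pseudonym scheme are assumptions.
indir  <- tempfile("csv_in_")
outdir <- tempfile("csv_out_")
dir.create(indir)
dir.create(outdir)

# Two small demo files standing in for the real CSVs.
write.csv(data.frame(Timestamp = "1161025010002000",
                     System_Name = "as06240.org.xyz:LZ",
                     CPU_ID = "-1"),
          file.path(indir, "a_UnixM_1.csv"), row.names = FALSE)
write.csv(data.frame(Timestamp = c("1161025010002000", "1161025010002000"),
                     System_Name = c("as06240.org.xyz:LZ", "as12535.org.at"),
                     CPU_ID = c("-1", "0")),
          file.path(indir, "b_UnixM_2.csv"), row.names = FALSE)

# Read every column as character to avoid reformatting numeric-looking values.
read_chr <- function(f) read.csv(f, colClasses = "character")

files <- list.files(indir, pattern = "UnixM", full.names = TRUE)

# Pass 1: collect all distinct system names so the mapping is shared across files.
all_names <- unique(unlist(lapply(files, function(f) read_chr(f)$System_Name)))
lookup <- setNames(sprintf("system_%03d", seq_along(all_names)), all_names)

# Pass 2: replace the names and write each file under its original file name.
for (f in files) {
  df <- read_chr(f)
  df$System_Name <- unname(lookup[df$System_Name])
  write.csv(df, file.path(outdir, basename(f)), row.names = FALSE)
}
```

Building the lookup in a first pass means the same system gets the same pseudonym in every file. Keeping `lookup` would allow the mapping to be reversed, so it should be discarded or stored securely depending on how strict the anonymization needs to be.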