R: Creating & assigning duplicate records

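I have a series of media sources to which I must assign county names. For sources that map to a single county (e.g., local newspapers), this is fairly simple: I created a county name variable with a switch that assigns the county name based on the source name. Sample: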

switchfun <- function(x) {
  switch(x,
         'Morning Call' = 'Lehigh',
         'Inquirer' = 'Philadelphia',
         'Daily Ledger' = 'Mercer',
         'Null')
}

County.Name <- as.character(lapply(Source, switchfun))
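One caveat with the switch approach: switch() coerces a non-character EXPR to integer, and this also happens for factors (with a warning), so a factor Source column would be matched by position rather than by name. A minimal sketch of a named-vector lookup that avoids this is shown below. The name county_lookup and the three-row sample input are assumptions for illustration, and unmatched sources come back as NA rather than the 'Null' default used above.

```r
# Lookup table: names are sources, values are counties (pairs taken from the sample above)
county_lookup <- c("Morning Call" = "Lehigh",
                   "Inquirer"     = "Philadelphia",
                   "Daily Ledger" = "Mercer")

# Hypothetical sample input, stored as a factor on purpose
Source <- factor(c("Morning Call", "Daily Ledger", "NPR"))

# as.character() makes the lookup go by name, not by factor code;
# unname() drops the names carried over from the lookup table;
# sources missing from the table become NA
County.Name <- unname(county_lookup[as.character(Source)])
print(County.Name)
```

Because the lookup is vectorized, no lapply() call is needed, and new sources are added by extending the vector.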
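In the current file, NPR, Associated Press, and Yahoo News have no associated county (NA).

The dput of the current file (7 rows) comes first, followed by the dput of the desired file layout (16 rows):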

structure(list(Source = structure(c(5L, 2L, 4L, 3L, 7L, 1L, 6L
), .Label = c("Associated Press", "Daily Ledger", "Herald Tribune", 
"Inquirer", "Morning Call", "NPR", "Yahoo News"), class = "factor"), 
County = structure(c(1L, 2L, 4L, 3L, NA, NA, NA), .Label = c("Lehigh", 
"Mercer", "Montgomery", "Philadelphia"), class = "factor"), 
Score = c(3L, 10L, 4L, 8L, 1L, 3L, 6L)), .Names = c("Source", 
"County", "Score"), class = "data.frame", row.names = c(NA, -7L
))
structure(list(Source = structure(c(5L, 2L, 4L, 3L, 7L, 7L, 7L, 
7L, 1L, 1L, 1L, 1L, 6L, 6L, 6L, 6L), .Label = c("Associated Press", 
"Daily Ledger", "Herald Tribune", "Inquirer", "Morning Call", 
"NPR", "Yahoo News"), class = "factor"), County = structure(c(1L, 
2L, 4L, 3L, 1L, 2L, 4L, 3L, 1L, 2L, 4L, 3L, 1L, 2L, 4L, 3L), .Label = c("Lehigh", 
"Mercer", "Montgomery", "Philadelphia"), class = "factor"), Score = c(3L, 
10L, 4L, 8L, 1L, 1L, 1L, 1L, 3L, 3L, 3L, 3L, 6L, 6L, 6L, 6L)), .Names = c("Source", 
"County", "Score"), class = "data.frame", row.names = c(NA, -16L
))

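In the desired layout, each national source and its score is assigned to all four counties in the data set. E.g., Yahoo News and its score of 1 are repeated four times, once each for Lehigh, Philadelphia, Montgomery, and Mercer counties, and the Yahoo News record with county NA disappears. In my actual data set I have about 100 counties, so Yahoo News and its associated variables (e.g., score, date, author; I have about 60 variables in total) would be replicated 100 times. I would also like the counties of these newly "replicated" records to go into the County.Name variable that I created with the switch function above. I don't need two county name fields; all of the newly created counties should sit under County.Name.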

If I understand correctly, this might be one possibility:

# a (minimal) data frame with all unique source-county combinations
src_cnt <- data.frame(source = c("Morning Call", "AP", "AP", "AP"), county = c("Lehigh", "Lehigh", "Mercer", "Phila"))

# a data frame with a unique score for each source
src_score <- data.frame(source = c("Morning Call", "AP"), score = c(10, 3))

merge(src_cnt, src_score)
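It would be great if you could give us some example data and show the desired result. I guess you may be looking for merge, but it is hard to say without a better representation of the data.

Sorry, it was late & I was tired. Updated w/ more explanation & dput output for reproducibility.

I actually have a unique ID that also needs to be accounted for, so I modified it to: src_cnt
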
# Assuming your current data is named dd
# select the national sources, i.e. the sources where County is missing
src_national <- dd$Source[is.na(dd$County)]

# select unique counties
counties <- unique(dd$County[!is.na(dd$County)])

# create all combinations of national sources and counties
src_cnt <- expand.grid(Source = src_national, County = counties)

# add score from current data to national sources
src_cnt2 <- merge(src_cnt, dd[is.na(dd$County), c("Source", "Score")], by = "Source")

# add national sources to local sources in dd
dd2 <- rbind(dd[!is.na(dd$County), ], src_cnt2)

# order by Source and County
# assuming desired data is named `desired`
library(plyr)
desired2 <- arrange(df = desired, Source, County) 
dd2 <- arrange(df = dd2, Source, County)
all.equal(desired2, dd2)
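
The question mentions about 60 variables per record, and the merge step above carries over only Score. A possible variant, sketched here under assumptions rather than taken from the answer, uses merge() with by = NULL, which returns the Cartesian product of two data frames and so keeps every column of the national rows. The data frame dd is rebuilt by hand from the question's dput with character columns for simplicity; the names is_national, national, and national_by_county are illustrative, and base order() stands in for plyr::arrange.

```r
# Current data from the question, rebuilt by hand (character columns for simplicity)
dd <- data.frame(
  Source = c("Morning Call", "Daily Ledger", "Inquirer", "Herald Tribune",
             "Yahoo News", "Associated Press", "NPR"),
  County = c("Lehigh", "Mercer", "Philadelphia", "Montgomery", NA, NA, NA),
  Score  = c(3L, 10L, 4L, 8L, 1L, 3L, 6L),
  stringsAsFactors = FALSE
)

# TRUE for national sources, i.e. rows without a county
is_national <- is.na(dd$County)

# National rows with every column except County (works for any number of columns)
national <- dd[is_national, setdiff(names(dd), "County"), drop = FALSE]

# One row per distinct county found among the local sources
counties <- data.frame(County = unique(dd$County[!is_national]),
                       stringsAsFactors = FALSE)

# by = NULL makes merge() return every national row paired with every county
national_by_county <- merge(national, counties, by = NULL)

# Stack local rows and expanded national rows, using dd's column order
dd2 <- rbind(dd[!is_national, ], national_by_county[, names(dd)])

# Sort by Source and County and reset the row names
dd2 <- dd2[order(dd2$Source, dd2$County), ]
rownames(dd2) <- NULL
print(dd2)
```

With the sample data this gives 4 local rows plus 3 national sources times 4 counties. If the real data has a unique ID column, it is copied along with the other columns, so each replicated record keeps the ID of its original row.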