Merging multiple datasets into one using Stata

I am trying to build one complete dataset by merging several smaller datasets:

cd "\\files"
use "\\files\Creatinine.dta"

*merging with report data for baseline demographics *
merge m:1 id using "Archive\Report.dta"
* keeping only those transplanted 2002-2015 *
drop if tx1 <= date("01/01/2002", "DMY") | tx1 >= date("31/12/2015", "DMY")
drop _merge
* labelling variables *
label define org 1 "Heart" 2 "Lung" 3 "Liver" 5 "Multiple" 6 "Small Bowel" 7 "Pancreas" 8 "Stomach"
label values organ1 organ2 organ3 org  
label values multi1 multi2 multi3 multi4 org
label variable organ1 "First Organ"
label variable organ2 "Second Organ"
label variable organ3 "Third Organ"
label variable donor_type1 "First Donor Type"
label variable tx1 "Date of First Transplant"
label variable tx2 "Date of Second Transplant"
label variable tx3 "Date of Third Transplant"
label variable dob "Date of Birth"
label variable tx1_loc "First Transplant Location"
label variable multi1 "Multiple Organ 1"
label variable multi2 "Multiple Organ 2"
label variable multi3 "Multiple Organ 3"
label variable multi4 "Multiple Organ 4"
label variable censor_date "Censor Date"
label define loc 1 "Hospital" 
label values tx1_loc loc
label define sex1 1 "Male" 2 "Female"
label values sex sex1
label variable sex "Sex of Child"
label define donor 1 "Living" 2 "Deceased" 
label values donor_type1 donor
order dob sex tx1 tx1_loc organ1 donor_type1 multi1 multi2 multi3 multi4 organ2 tx2_date organ3 tx3_date censor_date DeathDate, after(id)


* Data Cleaning *
generate dateCollected = date(DateCollected, "DMY")
format %tdCCYY/NN/DD dateCollected
codebook dateCollected
drop DateCollected
rename dateCollected DateCollected
order DateCollected TimeCollected, after(Test)

*dropping duplicates *
sort id DateCollected TimeCollected Result
quietly by id DateCollected TimeCollected Result: gen dup=cond(_N==1,0,_n)
drop if dup > 1
drop dup

*save *
save "\\files\Injury.dta"

However, this gives me a type mismatch error.

I think this is caused by the date formats differing between the Creatinine file and the Report file.

Please take a look and advise. Many thanks.


Data

Creatinine.dta (only one result shown; each id has multiple results)

Report.dta (only one id shown)


Note that the problem appears after the merge has been executed.

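You get the r(109) error because you are trying to generate a new variable by applying the date() function to a numeric variable. This function expects a string (variable) as input.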
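One way to confirm this diagnosis before converting anything is to check the storage type of the variable. The sketch below is a minimal, self-contained illustration. It builds a one-observation toy dataset (an assumption that mirrors how DateCollected appears to be stored) and only calls date() when the variable really is a string.

```stata
* Toy data (hypothetical): a numeric daily date, as DateCollected appears to be
clear
set obs 1
generate DateCollected = date("12/05/2007", "DMY")
format %tdDD/NN/CCYY DateCollected

* describe reports the storage type (e.g. float or str10) and the display format
describe DateCollected

* confirm exits with a nonzero return code if the variable is not a string
capture confirm string variable DateCollected
if _rc == 0 {
    * String input: date() is appropriate here
    generate dateCollected = date(DateCollected, "DMY")
    format %tdCCYY/NN/DD dateCollected
}
else {
    display "DateCollected is already numeric; date() would fail with r(109)"
}
```

On the toy data the else branch runs, because the variable is numeric even though it displays like a date string.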

I do not know why you are trying to do this, but if you simply want to create and use dateCollected for further work while keeping DateCollected as a backup, you can clone it:

clonevar dateCollected = DateCollected

Edit:

To elaborate on my comment:

. clear
. set obs 1
number of observations (_N) was 0, now 1

. generate DateCollected_String = "12/05/2007"

. generate DateCollected = date(DateCollected_String, "DMY")
. format %tdDD/NN/CCYY DateCollected

. browse

. generate dateCollected = date(DateCollected, "DMY")
type mismatch
r(109);
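If DateCollected is indeed already a numeric daily date, the display format the question was aiming for (%tdCCYY/NN/DD) can probably be obtained without any conversion, since format changes only how the value is shown. A minimal sketch on toy data, assuming that a different display format is all that is needed:

```stata
* Toy data (hypothetical): same setup as in the log above
clear
set obs 1
generate DateCollected = date("12/05/2007", "DMY")
format %tdDD/NN/CCYY DateCollected

* Change only the display format; the stored numeric value is untouched
format %tdCCYY/NN/DD DateCollected
list DateCollected
```

Under that assumption, the generate, drop and rename steps in the question's data-cleaning block would not be required.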

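The DateCollected variable comes from Creatinine.dta. I need it to be a string to match the other dates recorded in Report.dta (these are already strings).

The values of string variables are shown in red in the Data Editor. In the attached picture, DateCollected is clearly not a string variable. It is a formatted numeric variable, which is why you get the error. Whether you want to turn it into a string is a completely different story and not what you asked. By the way, judging from the other picture, the same holds for the other date variables, e.g. censor_date, DeathDate, etc. They all appear to be numeric.

Sorry for the confusion. How do I convert all the dates to strings instead of numeric?

Just type:
generate new_DateCollected_string = string(DateCollected, "%tdDD/NN/CCYY")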
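To apply the same conversion to several date variables at once, a loop is one option. The sketch below is only an illustration on toy data: the variable names dob and tx1 and their values are taken from the question's Report.dta sample, while the _str suffix is a hypothetical naming choice.

```stata
* Toy data (hypothetical): two numeric daily dates
clear
set obs 1
generate dob = date("15/04/2007", "DMY")
generate tx1 = date("29/01/2009", "DMY")
format %td dob tx1

* Create a string copy of each date, leaving the numeric original in place
foreach v of varlist dob tx1 {
    generate `v'_str = string(`v', "%tdDD/NN/CCYY")
}
list
```

Keeping the numeric originals is deliberate here, since comparisons such as the drop if tx1 <= date(...) line in the question only work on numeric dates.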

Report.dta sample (one id shown):

id      dob         sex  tx1        tx1_loc organ1  donor_type1 multi1  multi2  multi3  multi4  organ2  tx2_date    organ3  tx3_date censor_date DeathDate
2010003 15-Apr-07   2    29-Jan-09  1       1       2                                                                                30-Jun-16  