Stata: Append by ID and timestamp

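I have two datasets. One contains product assortment information at the grocery store/day level. It reflects all products available at a given store on a given date.

The other dataset contains data on the individuals who visited these stores on a given date.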

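As you can see in screenshot 2, the same individual (highlighted, panid=1101758) purchased only two products, Michelob and Sam Adams, in week 1677 2 at store 234140, while we know that on that same day this individual had 4 options in total at that store, namely 2 additional Budweisers (screenshot 1, highlighted obs.).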

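I need to merge/append these two datasets per store/day for each individual, so that the final dataset shows that an individual made these two purchases and, in addition, that two more options were available to that individual at that store on that day. This particular individual would therefore have 4 observations: 2 purchased options and 2 more available options. I have a wide variety of stores, days, and individuals.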

// Store/day product availability (brand must be declared as a string)
clear
input store day str9 brand
1 1 "Bud"
1 1 "Bud"
1 1 "Michelob"
1 1 "Sam Adams"
1 1 "Coors"
end


// Household purchases by store/day (brand must be declared as a string)
clear
input hh store day str9 brand
1 1 1 "Michelob"
1 1 1 "Sam Adams"
2 1 1 "Bud"
2 1 1 "Bud"
3 1 1 "Coors"
end

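In the Stata code above, you can see that it is another individual (hh 2) who purchased the 2 Budweisers. The same has to be done for that individual: the data should show that they had 4 options (Michelob, Sam Adams, Bud, Bud) but ended up choosing only the 2 Budweisers. (The toy data also includes Coors, which is why each household has five rows in the desired result.)

Here is an example of the final result I would like to get: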

input hh store day str9 brand choice
1 1 1 "Michelob" 1
1 1 1 "Sam Adams" 1
1 1 1 "Bud" 0
1 1 1 "Bud" 0
1 1 1 "Coors" 0

2 1 1 "Bud" 1
2 1 1 "Bud" 1
2 1 1 "Michelob" 0
2 1 1 "Sam Adams" 0
2 1 1 "Coors" 0

3 1 1 "Coors" 1
3 1 1 "Michelob" 0
3 1 1 "Sam Adams" 0
3 1 1 "Bud" 0
3 1 1 "Bud" 0
end

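Here is one approach. It involves creating a counter for duplicate products within store and day, using joinby to create all possible combinations of hh and products by store and day, and finally merging to get the choice variable.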

// Import hh data
clear
input hh store day str9 brand
1 1 1 "Michelob"
1 1 1 "Sam Adams"
2 1 1 "Bud"
2 1 1 "Bud"
3 1 1 "Coors"
end

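// Number repeated purchases of the same brand within household, store,
// and day, so the counter lines up with the store-side n_brand created later
bysort hh store day brand: gen n_brand = _n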
gen choice = 1

tempfile hh hh_join
save `hh'

// Create dataset for use with joinby to create all possible combinations
// of hh and products per day/store
drop brand n_brand choice
duplicates drop
save `hh_join'

// Import store data
clear
input store day str9 brand
1 1 "Bud"
1 1 "Bud"
1 1 "Michelob"
1 1 "Sam Adams"
1 1 "Coors"
end

// Create number of duplicate products for merging
bysort store day brand: gen n_brand = _n

// Create all possible combinations of hh and products per day/store
joinby store day using `hh_join'
order hh store day brand n_brand
sort hh store day brand n_brand

// Merge with hh data to get choice variable
merge 1:1 hh store day brand n_brand using `hh'
drop _merge

// Replace choice with 0 if missing
replace choice = 0 if missing(choice)

list, noobs sepby(hh)

The result is:

. list, noobs sepby(hh)

  +-------------------------------------------------+
  | hh   store   day       brand   n_brand   choice |
  |-------------------------------------------------|
  |  1       1     1         Bud         1        0 |
  |  1       1     1         Bud         2        0 |
  |  1       1     1       Coors         1        0 |
  |  1       1     1    Michelob         1        1 |
  |  1       1     1   Sam Adams         1        1 |
  |-------------------------------------------------|
  |  2       1     1         Bud         1        1 |
  |  2       1     1         Bud         2        1 |
  |  2       1     1       Coors         1        0 |
  |  2       1     1    Michelob         1        0 |
  |  2       1     1   Sam Adams         1        0 |
  |-------------------------------------------------|
  |  3       1     1         Bud         1        0 |
  |  3       1     1         Bud         2        0 |
  |  3       1     1       Coors         1        1 |
  |  3       1     1    Michelob         1        0 |
  |  3       1     1   Sam Adams         1        0 |
  +-------------------------------------------------+
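A few quick checks can confirm that a result like the listing above has the intended shape. The following is only a sketch: it re-enters the listed rows so that it runs on its own, and the helper variables n_options and n_bought are names introduced here for illustration, not part of the original answer.

```stata
// Result from the listing above, re-entered so the checks run on their own
clear
input hh store day str9 brand n_brand choice
1 1 1 "Bud"       1 0
1 1 1 "Bud"       2 0
1 1 1 "Coors"     1 0
1 1 1 "Michelob"  1 1
1 1 1 "Sam Adams" 1 1
2 1 1 "Bud"       1 1
2 1 1 "Bud"       2 1
2 1 1 "Coors"     1 0
2 1 1 "Michelob"  1 0
2 1 1 "Sam Adams" 1 0
3 1 1 "Bud"       1 0
3 1 1 "Bud"       2 0
3 1 1 "Coors"     1 1
3 1 1 "Michelob"  1 0
3 1 1 "Sam Adams" 1 0
end

// The merge key must identify rows uniquely; isid exits with an error if not
isid hh store day brand n_brand

// choice should be a 0/1 indicator with no missing values left
assert inlist(choice, 0, 1)

// Every household at a given store/day should see the same number of options
// (n_options is an illustrative helper variable)
bysort hh store day: gen n_options = _N
bysort store day (n_options): assert n_options[1] == n_options[_N]

// Purchases per household (n_bought is an illustrative helper variable)
bysort hh store day: egen n_bought = total(choice)
tabulate hh n_bought
```

On real data the same checks can be run directly after the final replace step, without re-entering anything.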

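Thanks for providing example data. Could you also show an example of the desired result? You would need to create a choice variable in each dataset and then append.

Thanks, I see what you mean. That command may not work, because appending will not create the full set of options for each individual. I am updating my example to illustrate this.

fillin might help.

fillin: for this example, would fillin be used at the hh/store/day level?

Hello! Thank you very much for your input.

Dear Wouter, unfortunately this code does not work on the real data. The final merge step fails. I traced every step of the code and found that the joinby in the step "Create all possible combinations of hh and products per day/store" creates many duplicates in the real dataset, so merge 1:1 results in an error. This is strange, because the toy dataset seems to be an exact representation of the real data. So the problem seems to lie with joinby.

The dataset you use with joinby should contain only three variables: store, day and panid. Is that the case? Otherwise, without seeing the data, it is hard to tell what is going wrong. It would be easier if you posted an example of the real data using dataex.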
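One possible cause of the duplicates discussed in the comments, offered here only as a guess since the real data were never posted, is that the real household file carries extra variables. In that case dropping only brand, n_brand and choice leaves rows that still differ within a household/store/day, and duplicates drop cannot collapse them. The sketch below illustrates this with a purely hypothetical price variable added to the toy data; in the real data the identifier would be panid rather than hh.

```stata
// Toy household data with one extra, purely hypothetical variable (price)
clear
input hh store day str9 brand price
1 1 1 "Michelob"  7.99
1 1 1 "Sam Adams" 8.49
2 1 1 "Bud"       5.99
2 1 1 "Bud"       5.99
3 1 1 "Coors"     6.49
end

// Dropping only brand leaves price behind, so rows can still differ within
// a household/store/day and duplicates drop cannot collapse them
preserve
drop brand
duplicates drop
count                           // 4 rows remain: hh 1 still appears twice
duplicates report hh store day  // shows the surplus copies on the join key
restore

// Keeping just the key variables gives one row per household/store/day
keep hh store day
duplicates drop
isid hh store day               // exits with an error if keys are not unique
count                           // 3 rows, one per household
```

Running isid on the key variables right before joinby is a cheap way to catch this situation before the later merge 1:1 fails.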