Stata 在另一个变量中使用带有多个条件的vlookup填充新变量
1) 应为变量Stata 在另一个变量中使用带有多个条件的vlookup填充新变量,stata,lookup,Stata,Lookup,1) 应为变量sku中列出的每个唯一观测值创建一个新变量,该变量包含重复值 2) 只要观察值的sku值与变量本身在同一子类别(subc)中,这些新创建的变量应在门店/周级别分配自己产品的price值。例如,在eta2,3中,第3、4和5行中的观测值具有相同的值,因为它们都属于与sku#3相同的子类别。[eta2,3表示sku 3,subc 2。] 3) x表示这是当前正在复制的产品/子类别的原始值 4) 如果观察不属于同一子类别,则应反映“0” 橙色是给定的数据。绿色表示步骤1、2和3中的值。白
sku
中列出的每个唯一观测值创建一个新变量,该变量包含重复值
2) 只要观察值的sku值与变量本身在同一子类别(subc
)中,这些新创建的变量应在门店/周级别分配自己产品的price
值。例如,在eta2,3中,
第3、4和5行中的观测值具有相同的值,因为它们都属于与sku#3相同的子类别。[eta2,3
表示sku 3,subc 2。]
3) x
表示这是当前正在复制的产品/子类别的原始值
4) 如果观察不属于同一子类别,则应反映“0”
橙色是给定的数据。绿色表示步骤1、2和3中的值。白细胞是第4步
我无法提供我自己的解决方案,因为我正在寻找解决方案
使用现有观测值生成变量的方法并没有给我结果
我还知道它必须是forvalues
、foreach
和levelsof
命令的组合
clear
input units price sku week store subc
3 4.3 1 1 1 1
2 3 2 1 1 1
1 2.5 3 1 1 2
4 12 5 1 1 2
5 12 6 1 1 3
35 4.3 1 1 2 1
23 3 2 1 2 1
12 2.5 3 1 2 2
35 12 5 1 2 2
35 12 6 1 2 3
3 20 1 2 1 1
2 30 2 2 1 1
4 40 3 2 2 2
1 50 4 2 2 2
9 10 5 2 2 2
2 90 6 2 2 3
end
更新
根据Nick Cox的反馈,这是给出我一直在寻找的结果的最终代码:
clear
input units price sku week store subc
35 4.3 1 1 1 1
23 3 2 1 1 1
12 2.5 3 1 1 2
10 1 4 1 1 2
35 12 5 1 1 2
35 12 6 1 1 3
35 5.3 1 2 1 1
23 4 2 2 1 1
12 3.5 3 2 1 2
10 2 4 2 1 2
35 13 5 2 1 2
35 13 6 2 1 3
end
egen joint = group(subc sku), label
bysort store week : gen freq = _N
su freq, meanonly
local jmax = r(max)
drop freq
tostring subc sku, replace
gen new = subc + "_"+sku
su joint, meanonly
forval j = 1/`r(max)'{
local J = new[`j']
gen eta`J' = .
}
sort subc week store sku
egen joint1 = group(subc week store), label
gen long id = _n
su joint1, meanonly
quietly forval i = 1/`r(max)' {
su id if joint1 == `i', meanonly
local jmin = r(min)
local jmax = r(max)
forval j = `jmin'/`jmax' {
local subc = subc[`j']
local sku = sku[`j']
replace eta`subc'_`sku' = price[`j'] in `jmin'/`jmax'
replace eta`subc'_`sku' = 0 in `j'/`j'
}
}
我代表您担心,在任何大小的数据集中,您所要求的将意味着许多额外的变量。我代表你想知道,无论你想用它们做什么,你是否都需要它们 除此之外,这似乎是你想要的。当然,电子表格视图中的列标题不是合法的变量名。披露:尽管我是
levelsof
的原始作者,但我不喜欢在这里使用它
clear
input units price sku week store subc
35 4.3 1 1 1 1
23 3 2 1 1 1
12 2.5 3 1 1 2
10 1 4 1 1 2
35 12 5 1 1 2
35 12 6 1 1 3
end
sort subc sku
* subc identifiers guaranteed to be integers 1 up
egen subc_id = group(subc), label
* observation numbers in a variable
gen long id = _n
* how many subc? loop over the range
su subc_id, meanonly
forval i = 1/`r(max)' {
* which subc is this one? look it up using -summarize-
* assuming that subc is numeric!
su subc if subc_id == `i', meanonly
local I = r(min)
* which observation numbers for this subc?
* given the prior sort, they are all contiguous
su id if subc_id == `i', meanonly
* for each observation in the subc, find out the sku and copy its price
* to all observations in that subc
forval j = `r(min)'/`r(max)' {
local J = sku[`j']
gen eta_`I'_`J' = cond(subc_id == `i', price[`j'], 0)
}
}
list subc eta*, sepby(subc)
+------------------------------------------------------------------+
| subc eta_1_1 eta_1_2 eta_2_3 eta_2_4 eta_2_5 eta_3_6 |
|------------------------------------------------------------------|
1. | 1 4.3 3 0 0 0 0 |
2. | 1 4.3 3 0 0 0 0 |
|------------------------------------------------------------------|
3. | 2 0 0 2.5 1 12 0 |
4. | 2 0 0 2.5 1 12 0 |
5. | 2 0 0 2.5 1 12 0 |
|------------------------------------------------------------------|
6. | 3 0 0 0 0 0 12 |
+------------------------------------------------------------------+
注:
N1。在您的示例中,subc
编号为1、2等。我的额外变量subc\u id
确保即使在实际数据中标识符不是很干净,也为true
N2。表情
cond(subc_id == `i', price[`j'], 0)
也可能是
(subc_id == `i') * price[`j']
N3。不同的数据结构似乎更有效率
编辑:这是另一个数据结构的代码和结果
clear
input units price sku week store subc
35 4.3 1 1 1 1
23 3 2 1 1 1
12 2.5 3 1 1 2
10 1 4 1 1 2
35 12 5 1 1 2
35 12 6 1 1 3
end
sort subc sku
egen subc_id = group(subc), label
bysort subc : gen freq = _N
su freq, meanonly
local jmax = r(max)
drop freq
forval j = 1/`jmax' {
gen eta`j' = .
gen which`j' = .
}
gen long id = _n
su subc_id, meanonly
quietly forval i = 1/`r(max)' {
su id if subc_id == `i', meanonly
local jmin = r(min)
local jmax = r(max)
local k = 1
forval j = `jmin'/`jmax' {
replace which`k' = sku[`j'] in `jmin'/`jmax'
replace eta`k' = price[`j'] in `jmin'/`jmax'
local ++k
}
}
list subc sku *1 *2 *3 , sepby(subc)
+------------------------------------------------------------+
| subc sku eta1 which1 eta2 which2 eta3 which3 |
|------------------------------------------------------------|
1. | 1 1 4.3 1 3 2 . . |
2. | 1 2 4.3 1 3 2 . . |
|------------------------------------------------------------|
3. | 2 3 2.5 3 1 4 12 5 |
4. | 2 4 2.5 3 1 4 12 5 |
5. | 2 5 2.5 3 1 4 12 5 |
|------------------------------------------------------------|
6. | 3 6 12 6 . . . . |
+------------------------------------------------------------+
我代表您担心,在任何大小的数据集中,您所要求的将意味着许多额外的变量。我代表你想知道,无论你想用它们做什么,你是否都需要它们 除此之外,这似乎是你想要的。当然,电子表格视图中的列标题不是合法的变量名。披露:尽管我是
levelsof
的原始作者,但我不喜欢在这里使用它
clear
input units price sku week store subc
35 4.3 1 1 1 1
23 3 2 1 1 1
12 2.5 3 1 1 2
10 1 4 1 1 2
35 12 5 1 1 2
35 12 6 1 1 3
end
sort subc sku
* subc identifiers guaranteed to be integers 1 up
egen subc_id = group(subc), label
* observation numbers in a variable
gen long id = _n
* how many subc? loop over the range
su subc_id, meanonly
forval i = 1/`r(max)' {
* which subc is this one? look it up using -summarize-
* assuming that subc is numeric!
su subc if subc_id == `i', meanonly
local I = r(min)
* which observation numbers for this subc?
* given the prior sort, they are all contiguous
su id if subc_id == `i', meanonly
* for each observation in the subc, find out the sku and copy its price
* to all observations in that subc
forval j = `r(min)'/`r(max)' {
local J = sku[`j']
gen eta_`I'_`J' = cond(subc_id == `i', price[`j'], 0)
}
}
list subc eta*, sepby(subc)
+------------------------------------------------------------------+
| subc eta_1_1 eta_1_2 eta_2_3 eta_2_4 eta_2_5 eta_3_6 |
|------------------------------------------------------------------|
1. | 1 4.3 3 0 0 0 0 |
2. | 1 4.3 3 0 0 0 0 |
|------------------------------------------------------------------|
3. | 2 0 0 2.5 1 12 0 |
4. | 2 0 0 2.5 1 12 0 |
5. | 2 0 0 2.5 1 12 0 |
|------------------------------------------------------------------|
6. | 3 0 0 0 0 0 12 |
+------------------------------------------------------------------+
注:
N1。在您的示例中,subc
编号为1、2等。我的额外变量subc\u id
确保即使在实际数据中标识符不是很干净,也为true
N2。表情
cond(subc_id == `i', price[`j'], 0)
也可能是
(subc_id == `i') * price[`j']
N3。不同的数据结构似乎更有效率
编辑:这是另一个数据结构的代码和结果
clear
input units price sku week store subc
35 4.3 1 1 1 1
23 3 2 1 1 1
12 2.5 3 1 1 2
10 1 4 1 1 2
35 12 5 1 1 2
35 12 6 1 1 3
end
sort subc sku
egen subc_id = group(subc), label
bysort subc : gen freq = _N
su freq, meanonly
local jmax = r(max)
drop freq
forval j = 1/`jmax' {
gen eta`j' = .
gen which`j' = .
}
gen long id = _n
su subc_id, meanonly
quietly forval i = 1/`r(max)' {
su id if subc_id == `i', meanonly
local jmin = r(min)
local jmax = r(max)
local k = 1
forval j = `jmin'/`jmax' {
replace which`k' = sku[`j'] in `jmin'/`jmax'
replace eta`k' = price[`j'] in `jmin'/`jmax'
local ++k
}
}
list subc sku *1 *2 *3 , sepby(subc)
+------------------------------------------------------------+
| subc sku eta1 which1 eta2 which2 eta3 which3 |
|------------------------------------------------------------|
1. | 1 1 4.3 1 3 2 . . |
2. | 1 2 4.3 1 3 2 . . |
|------------------------------------------------------------|
3. | 2 3 2.5 3 1 4 12 5 |
4. | 2 4 2.5 3 1 4 12 5 |
5. | 2 5 2.5 3 1 4 12 5 |
|------------------------------------------------------------|
6. | 3 6 12 6 . . . . |
+------------------------------------------------------------+
我正在添加另一个答案,它处理
subc
和week
的组合。前面的讨论确定,您试图做的是为每个观察添加一个额外的变量。这不是个好主意!充其量,您可能只有许多新变量,大部分是零。最坏的情况是,你会遇到斯塔塔的极限
因此,我将不支持您沿着同一条道路走得更远,而是展示如何生成我在上一个答案中讨论的第二个数据结构。事实上,您没有指出(a)为什么需要所有这些变量,它们只是现有数据的重新分配;(b) 你的应对策略是什么;(c) 为什么rangestat
(SSC)或其他一些程序无法从一开始就消除创建它们的需要
clear
input units price sku week store subc
35 4.3 1 1 1 1
23 3 2 1 1 1
12 2.5 3 1 1 2
10 1 4 1 1 2
35 12 5 1 1 2
35 12 6 1 1 3
35 5.3 1 2 1 1
23 4 2 2 1 1
12 3.5 3 2 1 2
10 2 4 2 1 2
35 13 5 2 1 2
35 13 6 2 1 3
end
sort subc week sku
egen joint = group(subc week), label
bysort joint : gen freq = _N
su freq, meanonly
local jmax = r(max)
drop freq
forval j = 1/`jmax' {
gen eta`j' = .
gen which`j' = .
}
gen long id = _n
su joint, meanonly
quietly forval i = 1/`r(max)' {
su id if joint == `i', meanonly
local jmin = r(min)
local jmax = r(max)
local k = 1
forval j = `jmin'/`jmax' {
replace which`k' = sku[`j'] in `jmin'/`jmax'
replace eta`k' = price[`j'] in `jmin'/`jmax'
local ++k
}
}
list subc week sku *1 *2 *3 , sepby(subc week)
+-------------------------------------------------------------------+
| subc week sku eta1 which1 eta2 which2 eta3 which3 |
|-------------------------------------------------------------------|
1. | 1 1 1 4.3 1 3 2 . . |
2. | 1 1 2 4.3 1 3 2 . . |
|-------------------------------------------------------------------|
3. | 1 2 1 5.3 1 4 2 . . |
4. | 1 2 2 5.3 1 4 2 . . |
|-------------------------------------------------------------------|
5. | 2 1 3 2.5 3 1 4 12 5 |
6. | 2 1 4 2.5 3 1 4 12 5 |
7. | 2 1 5 2.5 3 1 4 12 5 |
|-------------------------------------------------------------------|
8. | 2 2 3 3.5 3 2 4 13 5 |
9. | 2 2 4 3.5 3 2 4 13 5 |
10. | 2 2 5 3.5 3 2 4 13 5 |
|-------------------------------------------------------------------|
11. | 3 1 6 12 6 . . . . |
|-------------------------------------------------------------------|
12. | 3 2 6 13 6 . . . . |
+-------------------------------------------------------------------+
我正在添加另一个答案,它处理
subc
和week
的组合。前面的讨论确定,您试图做的是为每个观察添加一个额外的变量。这不是个好主意!充其量,您可能只有许多新变量,大部分是零。最坏的情况是,你会遇到斯塔塔的极限
因此,我将不支持您沿着同一条道路走得更远,而是展示如何生成我在上一个答案中讨论的第二个数据结构。事实上,您没有指出(a)为什么需要所有这些变量,它们只是现有数据的重新分配;(b) 你的应对策略是什么;(c) 为什么rangestat
(SSC)或其他一些程序无法从一开始就消除创建它们的需要
clear
input units price sku week store subc
35 4.3 1 1 1 1
23 3 2 1 1 1
12 2.5 3 1 1 2
10 1 4 1 1 2
35 12 5 1 1 2
35 12 6 1 1 3
35 5.3 1 2 1 1
23 4 2 2 1 1
12 3.5 3 2 1 2
10 2 4 2 1 2
35 13 5 2 1 2
35 13 6 2 1 3
end
sort subc week sku
egen joint = group(subc week), label
bysort joint : gen freq = _N
su freq, meanonly
local jmax = r(max)
drop freq
forval j = 1/`jmax' {
gen eta`j' = .
gen which`j' = .
}
gen long id = _n
su joint, meanonly
quietly forval i = 1/`r(max)' {
su id if joint == `i', meanonly
local jmin = r(min)
local jmax = r(max)
local k = 1
forval j = `jmin'/`jmax' {
replace which`k' = sku[`j'] in `jmin'/`jmax'
replace eta`k' = price[`j'] in `jmin'/`jmax'
local ++k
}
}
list subc week sku *1 *2 *3 , sepby(subc week)
+-------------------------------------------------------------------+
| subc week sku eta1 which1 eta2 which2 eta3 which3 |
|-------------------------------------------------------------------|
1. | 1 1 1 4.3 1 3 2 . . |
2. | 1 1 2 4.3 1 3 2 . . |
|-------------------------------------------------------------------|
3. | 1 2 1 5.3 1 4 2 . . |
4. | 1 2 2 5.3 1 4 2 . . |
|-------------------------------------------------------------------|
5. | 2 1 3 2.5 3 1 4 12 5 |
6. | 2 1 4 2.5 3 1 4 12 5 |
7. | 2 1 5 2.5 3 1 4 12 5 |
|-------------------------------------------------------------------|
8. | 2 2 3 3.5 3 2 4 13 5 |
9. | 2 2 4 3.5 3 2 4 13 5 |
10. | 2 2 5 3.5 3 2 4 13 5 |
|-------------------------------------------------------------------|
11. | 3 1 6 12 6 . . . . |
|-------------------------------------------------------------------|
12. | 3 2 6 13 6 . . . . |
+-------------------------------------------------------------------+
提示:
rangestat
(SSC)提供了一种工具,用于查看子组内变量的其他值,包括严格排除当前观察值的情况。搜索Statistist存档将显示数百篇文章提到它。提示:rangestat
(SSC)提供了一种工具,用于查看子组内变量的其他值,包括当前观察被严格排除的情况。搜索Statalist档案将显示数百篇提到它的帖子。亲爱的尼克-我现在正在浏览你慷慨的代码。我有没有办法计算可能创建的变量的数量?(只是数字)对于第一个代码,它与观察的数量相同,不是吗?在您的电子表格中有1,1,2 1,3 2,4 2,5 2,6,但仅第二个下标就足以作为计数器。好的,我看到此代码仅在子类别/sku级别识别价格。我们需要确保每个子类别/sku eta的价格在“门店/周”级别填写。我应该在代码中修改什么?是的,您在文本中确实说过“存储周”:数据示例只有一周。环绕egen关节=组