Stata: populating new variables with a vlookup-style lookup on another variable under multiple conditions

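1) A new variable should be created for each unique value listed in the variable sku, which contains repeated values.

2) These newly created variables should be assigned, at the store/week level, the price value of their own product, as long as the observation's sku is in the same subcategory (subc) as the variable itself. For example, in eta2,3 the observations in rows 3, 4 and 5 have the same value because they all belong to the same subcategory as sku #3. [eta2,3 denotes sku 3, subc 2.]

3) x marks the original value of the product/subcategory currently being copied.

4) If an observation does not belong to the same subcategory, it should show "0".

Orange is the given data. Green marks the values from steps 1, 2 and 3. White cells are step 4.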

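I can't offer a solution of my own, since a solution is what I am looking for. Approaches that generate variables from existing observations have not given me the result I need.

I also know it must be some combination of the forvalues, foreach and levelsof commands.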

clear
input units price   sku week    store   subc
3   4.3 1   1   1   1
2   3   2   1   1   1
1   2.5 3   1   1   2
4   12  5   1   1   2
5   12  6   1   1   3
35  4.3 1   1   2   1
23  3   2   1   2   1
12  2.5 3   1   2   2
35  12  5   1   2   2
35  12  6   1   2   3   
3   20  1   2   1   1
2   30  2   2   1   1
4   40  3   2   2   2
1   50  4   2   2   2
9   10  5   2   2   2
2   90  6   2   2   3
end

Update: Based on Nick Cox's feedback, this is the final code that gives the result I was looking for:

clear
input units price   sku week    store   subc
35  4.3 1   1   1   1
23  3   2   1   1   1
12  2.5 3   1   1   2
10  1   4   1   1   2
35  12  5   1   1   2
35  12  6   1   1   3
35  5.3 1   2   1   1
23  4   2   2   1   1
12  3.5 3   2   1   2
10  2   4   2   1   2
35  13  5   2   1   2
35  13  6   2   1   3
end

egen joint = group(subc sku), label 

bysort store week : gen freq = _N
su freq, meanonly 
local jmax = r(max) 
drop freq

tostring subc sku, replace
gen new = subc + "_"+sku 


su joint, meanonly 
forval j = 1/`r(max)'{     
 local J = new[`j'] 
    gen eta`J' = . 
} 

sort  subc week store sku 
egen joint1 = group(subc week store), label 

gen long id = _n 
su joint1, meanonly  

quietly forval i = 1/`r(max)' { 
   su id if joint1 == `i', meanonly
   local jmin = r(min) 
   local jmax = r(max) 

   forval j = `jmin'/`jmax' {  
   local subc = subc[`j'] 
   local sku = sku[`j'] 
   replace eta`subc'_`sku' = price[`j'] in `jmin'/`jmax' 
   replace eta`subc'_`sku' = 0 in `j'/`j'  
   }
}    

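I worry on your behalf that, in a dataset of any size, what you are asking for will mean a great many extra variables. I also wonder on your behalf whether you need all of them for whatever you intend to do with them.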
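
Before committing to the wide layout, it may help to check how many eta variables it would need. The following is only a sketch on the six-observation example, assuming one variable per distinct (subc, sku) pair; the variable name pairtag is made up for illustration.

```stata
* Sketch: count the distinct (subc, sku) pairs, i.e. the number of
* eta_* variables the wide layout would create.
clear
input units price sku week store subc
35  4.3 1 1 1 1
23  3   2 1 1 1
12  2.5 3 1 1 2
10  1   4 1 1 2
35  12  5 1 1 2
35  12  6 1 1 3
end

* tag() marks exactly one observation in each distinct (subc, sku) group
egen byte pairtag = tag(subc sku)

* count the tagged observations; the result is left in r(N)
count if pairtag
display "eta variables needed: " r(N)

drop pairtag
```

On the example data this reports 6. The same count on the real data can be compared with the maximum number of variables your edition of Stata allows (see help limits).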

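That aside, this seems to be what you want. Naturally, the column headers in your spreadsheet view are not legal variable names. Disclosure: although I am the original author of levelsof, I prefer not to use it here.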

clear
input units price   sku week    store   subc
35  4.3 1   1   1   1
23  3   2   1   1   1
12  2.5 3   1   1   2
10  1   4   1   1   2
35  12  5   1   1   2
35  12  6   1   1   3
end

sort subc sku 
* subc identifiers guaranteed to be integers 1 up 
egen subc_id = group(subc), label 

* observation numbers in a variable  
gen long id = _n 

* how many subc? loop over the range 
su subc_id, meanonly 
forval i = 1/`r(max)' { 

   * which subc is this one? look it up using -summarize-
   * assuming that subc is numeric!    
   su subc if subc_id == `i', meanonly  
   local I = r(min) 

   * which observation numbers for this subc? 
   * given the prior sort, they are all contiguous 
   su id if subc_id == `i', meanonly 

   * for each observation in the subc, find out the sku and copy its price 
   * to all observations in that subc  
   forval j = `r(min)'/`r(max)' { 
       local J = sku[`j'] 
       gen eta_`I'_`J' = cond(subc_id == `i', price[`j'], 0) 
   }
}    

list subc eta*, sepby(subc)

     +------------------------------------------------------------------+
     | subc   eta_1_1   eta_1_2   eta_2_3   eta_2_4   eta_2_5   eta_3_6 |
     |------------------------------------------------------------------|
  1. |    1       4.3         3         0         0         0         0 |
  2. |    1       4.3         3         0         0         0         0 |
     |------------------------------------------------------------------|
  3. |    2         0         0       2.5         1        12         0 |
  4. |    2         0         0       2.5         1        12         0 |
  5. |    2         0         0       2.5         1        12         0 |
     |------------------------------------------------------------------|
  6. |    3         0         0         0         0         0        12 |
     +------------------------------------------------------------------+
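
Notes:

N1. In your example, subc is numbered 1, 2, and so on. My extra variable subc_id ensures that this holds even if the identifiers in your real data are not so clean.

N2. The expression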

cond(subc_id == `i', price[`j'], 0)

could also be

(subc_id == `i') * price[`j'] 
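
The two forms in N2 agree whenever the price being copied is nonmissing. They differ when it is missing, because arithmetic with a missing value yields missing in Stata. The toy data and the names via_cond and via_mult below are made up purely to illustrate this point.

```stata
* Sketch: cond() versus multiplication when the copied price is missing.
clear
input price subc_id
4.3 1
.   1
2.5 2
end

* copy the price in observation 2 (missing) to subcategory 1, zero elsewhere
gen via_cond = cond(subc_id == 1, price[2], 0)
gen via_mult = (subc_id == 1) * price[2]

* via_cond is 0 outside subcategory 1;
* via_mult is missing everywhere, because 0 * . is missing
list
```

If prices can be missing in the real data, cond() is the form that still delivers zeros for the other subcategories.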

N3. A different data structure seems more efficient.

EDIT: Here are code and results for a different data structure.

clear
input units price   sku week    store   subc
35  4.3 1   1   1   1
23  3   2   1   1   1
12  2.5 3   1   1   2
10  1   4   1   1   2
35  12  5   1   1   2
35  12  6   1   1   3
end

sort subc sku 
egen subc_id = group(subc), label 

bysort subc : gen freq = _N
su freq, meanonly 
local jmax = r(max) 
drop freq

forval j = 1/`jmax' { 
    gen eta`j' = . 
    gen which`j' = . 
} 

gen long id = _n 
su subc_id, meanonly  

quietly forval i = 1/`r(max)' { 
   su id if subc_id == `i', meanonly
   local jmin = r(min) 
   local jmax = r(max) 

   local k = 1 
   forval j = `jmin'/`jmax' { 
       replace which`k' = sku[`j'] in `jmin'/`jmax' 
       replace eta`k' = price[`j'] in `jmin'/`jmax' 
       local ++k 
   }
}    

   list subc sku *1 *2 *3 , sepby(subc)

     +------------------------------------------------------------+
     | subc   sku   eta1   which1   eta2   which2   eta3   which3 |
     |------------------------------------------------------------|
  1. |    1     1    4.3        1      3        2      .        . |
  2. |    1     2    4.3        1      3        2      .        . |
     |------------------------------------------------------------|
  3. |    2     3    2.5        3      1        4     12        5 |
  4. |    2     4    2.5        3      1        4     12        5 |
  5. |    2     5    2.5        3      1        4     12        5 |
     |------------------------------------------------------------|
  6. |    3     6     12        6      .        .      .        . |
     +------------------------------------------------------------+

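I am adding another answer that handles combinations of subc and week. The earlier discussion established that what you are trying to do amounts to adding an extra variable for every observation. That is not a good idea! At best, you will just have many new variables that are mostly zeros. At worst, you will run into Stata's limits.

So I won't support you going further down the same road, but will instead show how to produce the second data structure discussed in my previous answer. In fact, you have not indicated (a) why you want all these variables, which are just a redistribution of existing data; (b) what your strategy is for handling them; (c) why rangestat (SSC) or some other program could not remove the need to create them in the first place.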

clear
input units price   sku week    store   subc
35  4.3 1   1   1   1
23  3   2   1   1   1
12  2.5 3   1   1   2
10  1   4   1   1   2
35  12  5   1   1   2
35  12  6   1   1   3
35  5.3 1   2   1   1
23  4   2   2   1   1
12  3.5 3   2   1   2
10  2   4   2   1   2
35  13  5   2   1   2
35  13  6   2   1   3
end

sort subc week sku 
egen joint = group(subc week), label 

bysort joint : gen freq = _N
su freq, meanonly 
local jmax = r(max) 
drop freq

forval j = 1/`jmax' { 
    gen eta`j' = . 
    gen which`j' = . 
} 

gen long id = _n 
su joint, meanonly  

quietly forval i = 1/`r(max)' { 
   su id if joint == `i', meanonly
   local jmin = r(min) 
   local jmax = r(max) 

   local k = 1 
   forval j = `jmin'/`jmax' { 
       replace which`k' = sku[`j'] in `jmin'/`jmax' 
       replace eta`k' = price[`j'] in `jmin'/`jmax' 
       local ++k 
   }
}    

list subc week sku *1 *2 *3 , sepby(subc week)

     +-------------------------------------------------------------------+
     | subc   week   sku   eta1   which1   eta2   which2   eta3   which3 |
     |-------------------------------------------------------------------|
  1. |    1      1     1    4.3        1      3        2      .        . |
  2. |    1      1     2    4.3        1      3        2      .        . |
     |-------------------------------------------------------------------|
  3. |    1      2     1    5.3        1      4        2      .        . |
  4. |    1      2     2    5.3        1      4        2      .        . |
     |-------------------------------------------------------------------|
  5. |    2      1     3    2.5        3      1        4     12        5 |
  6. |    2      1     4    2.5        3      1        4     12        5 |
  7. |    2      1     5    2.5        3      1        4     12        5 |
     |-------------------------------------------------------------------|
  8. |    2      2     3    3.5        3      2        4     13        5 |
  9. |    2      2     4    3.5        3      2        4     13        5 |
 10. |    2      2     5    3.5        3      2        4     13        5 |
     |-------------------------------------------------------------------|
 11. |    3      1     6     12        6      .        .      .        . |
     |-------------------------------------------------------------------|
 12. |    3      2     6     13        6      .        .      .        . |
     +-------------------------------------------------------------------+
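
As a rough illustration of point (c), here is one way rangestat might be used without creating any of these variables. This is a sketch only: it assumes rangestat has been installed from SSC, and it assumes that what is ultimately wanted is a summary (here the mean) of the prices of the other products in the same store, subcategory and week, which the question does not actually state.

```stata
* Sketch: summarize the prices of the *other* skus in the same
* store/subc/week directly, without building wide eta variables.
* rangestat is user-written: ssc install rangestat
clear
input units price sku week store subc
35  4.3 1 1 1 1
23  3   2 1 1 1
12  2.5 3 1 1 2
10  1   4 1 1 2
35  12  5 1 1 2
35  12  6 1 1 3
35  5.3 1 2 1 1
23  4   2 2 1 1
12  3.5 3 2 1 2
10  2   4 2 1 2
35  13  5 2 1 2
35  13  6 2 1 3
end

* interval(week 0 0) restricts each observation to its own week;
* by() restricts to the same store and subcategory;
* excludeself leaves the current observation out of the calculation
rangestat (mean) price, interval(week 0 0) by(store subc) excludeself

* by default the result is stored in price_mean
list store week subc sku price price_mean, sepby(subc week)
```

For sku 6, which is alone in its subcategory, there are no other observations to summarize, so price_mean is missing.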

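Hint: rangestat (SSC) offers a tool for looking at other values of a variable within subgroups, including the case where the current observation is strictly excluded. Searching the Statalist archives will turn up hundreds of posts that mention it.

Dear Nick, I am going through your generous code now. Is there a way for me to count the number of variables that might be created? (Just the number.)

For the first code, it is the same as the number of observations, isn't it? In your spreadsheet you have 1,1,2 1,3 2,4 2,5 2,6 but the second subscript alone would suffice as a counter.

OK, I see this code identifies prices only at the subcategory/sku level. We need to make sure the price for each subcategory/sku eta is filled in at the "store/week" level. What should I modify in the code?

Yes, you did say "store week" in your text: the data example had only one week. Loop around egen joint = group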