For loop Stata: extracting values and saving them as scalars (and more)
This question is a follow-up to an earlier one. Consider these data:
set seed 123456
set obs 5000
g firmid = "firm" + string(_n) /* Observation (firm) id */
g nw = floor(100*runiform()) /* Number of workers in a firm */
g double lat = 39+runiform() /* Latitude in decimal degree of a firm */
g double lon = -76+runiform() /* Longitude in decimal degree of a firm */
The first 10 observations are:
+--------------------------------------+
| firmid nw lat lon |
|--------------------------------------|
1. | firm1 81 39.915526 -75.505018 |
2. | firm2 35 39.548523 -75.201567 |
3. | firm3 10 39.657866 -75.17988 |
4. | firm4 83 39.957938 -75.898837 |
5. | firm5 56 39.575881 -75.169157 |
6. | firm6 73 39.886184 -75.857255 |
7. | firm7 27 39.33288 -75.724665 |
8. | firm8 75 39.165549 -75.96502 |
9. | firm9 64 39.688819 -75.232764 |
10. | firm10 76 39.012228 -75.166272 |
+--------------------------------------+
I need to compute the distances between firm 1 and all other firms. So the vincenty command looks like this:
. scalar theLat = 39.915526
. scalar theLon = -75.505018
. vincenty lat lon theLat theLon, hav(distance_km) inkm
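The vincenty command creates the variable distance_km, which holds the distance between each observation and firm 1. Here, I manually copy and paste the two numbers, 39.915526 and -75.505018.
Question 1: What is the syntax for extracting these numbers?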
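Now I can keep observations according to distance_km.
The following is basically the same strategy and is based on your "ultimate goal". Again, depending on the size of your original dataset, it may be useful. joinby creates observations, so you may exceed the Stata limit. However, I believe it does what you want.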
clear all
set more off
set seed 123456
set obs 10
g firmid = _n /* Observation (firm) id */
g nw = floor(100*runiform()) /* Number of workers in a firm */
g double lat = 39+runiform() /* Latitude in decimal degree of a firm */
g double lon = -76+runiform() /* Longitude in decimal degree of a firm */
gen dum = 1
list
* joinby procedure
tempfile main
save "`main'"
rename (firmid lat lon nw) =0
joinby dum using "`main'"
drop dum
* Pretty print
sort firmid0 firmid
order firmid0 firmid
list, sepby(firmid0)
* Uncomment if you do not want to include workers in the "base" firm.
*drop if firmid0 == firmid
* Compute distance
vincenty lat0 lon0 lat lon, hav(distance_km) inkm
keep if distance_km <= 40 // an arbitrary distance
list, sepby(firmid0)
* Compute workers of nearby-firms
collapse (sum) nw_sum=nw (mean) nw0 lat0 lon0, by(firmid0)
list
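The joinby step above pairs every firm with every firm because all rows share the constant key dum. As a minimal sketch of that behaviour (three hypothetical firms, no coordinates, nothing beyond built-in Stata commands), the joined dataset should contain 3 x 3 = 9 rows:

```stata
* Illustration only: three made-up firms, joined to themselves on a constant key
clear
set obs 3
gen firmid = _n
gen dum = 1

tempfile main
save "`main'"

* Suffix the "base" copy, then form all pairwise combinations
rename firmid firmid0
joinby dum using "`main'"
drop dum

sort firmid0 firmid
list firmid0 firmid, sepby(firmid0)

* Every base firm is matched with every firm, itself included
assert _N == 9
```

The row count grows with the square of the number of firms, which is why the full joinby can run into the observation limit on large datasets.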
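However inefficient the joinby approach may be, some tests using timer show that most of the computation time goes to the vincenty command, which you cannot avoid. Below are the times (in seconds) for 10,000 observations, using an Intel Core i5 processor and a conventional (non-SSD) hard drive. Timer 1 is the total, and timers 2, 3 and 4 are its components (approximately). Timer 3 corresponds to vincenty: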
. timer list
1: 1953.99 / 1 = 1953.9940
2: 169.19 / 10000 = 0.0169
3: 1669.95 / 10000 = 0.1670
4: 94.47 / 10000 = 0.0094
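Of course, note that in both codes distances are computed twice (for example, both firm1-firm2 and firm2-firm1), which could probably be avoided. As it stands, 110,000 observations would take a long time. On the positive side, I noticed that the second setup needs very little RAM compared with the first setup on the same number of observations. In fact, my 4 GB machine freezes with the first setup.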
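One possible way to avoid the duplicated distance computation is sketched below. It is not part of the original answer: it assumes the user-written vincenty command is installed (ssc install vincenty), reuses the arbitrary 40 km cutoff, and introduces a hypothetical tempfile named half. The idea is to keep each unordered pair once, compute the distance, and then mirror the surviving pairs.

```stata
* Sketch only; assumes vincenty is installed (ssc install vincenty)
clear all
set seed 123456
set obs 10
g firmid = _n
g nw = floor(100*runiform())
g double lat = 39+runiform()
g double lon = -76+runiform()
gen dum = 1

tempfile main
save "`main'"
rename (firmid lat lon nw) =0
joinby dum using "`main'"
drop dum

* Keep each unordered pair once, so vincenty runs on N(N-1)/2 rows
keep if firmid0 < firmid
vincenty lat0 lon0 lat lon, hav(distance_km) inkm
keep if distance_km <= 40

* Mirror the surviving pairs so each firm appears as a "base" firm again
tempfile half   // hypothetical name, used only in this sketch
save "`half'"
rename (firmid0 lat0 lon0 nw0 firmid lat lon nw) (firmid lat lon nw firmid0 lat0 lon0 nw0)
append using "`half'"

* Workers in neighbouring firms; self-pairs were dropped above
collapse (sum) nw_sum=nw (mean) nw0 lat0 lon0, by(firmid0)

* Add the base firm's own workers, as the code that keeps self-pairs does
replace nw_sum = nw_sum + nw0
list
```

Two caveats: joinby still creates all N x N rows before the keep, so this halves the vincenty work but not the peak memory; and a firm with no neighbour within the cutoff drops out of the result, whereas the code that keeps self-pairs retains it.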
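Note also that, although I used the same seed as you, the data are different because I created a different number of observations (not 5000), which changes the variable-creation process.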
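(By the way, if you want to save a value as a scalar, you can use:
scalar latitude = lat[1]
).
Thanks, see page 16. I learned a lot. The joinby command works very well for this small dataset. However, my original dataset has more than 110,000 observations, so the system would crash. I may have to truncate the data, collapse it to sums, and then merge the one-observation file into the original data for one firm. Then I may have to repeat this process for all the other firms.
@BillTP I added some extra code that implements some of the things you mentioned, to get around the observation limit. Maybe it will give you some ideas.
To the downvoters: I would appreciate feedback on the reasons for the vote. I see no need to downvote an answer without explaining why, especially when it has been accepted by the original poster.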
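Returning to Question 1, the subscript syntax lat[1] can replace the copied and pasted numbers. The sketch below is a hedged illustration: it assumes the user-written vincenty command is installed (ssc install vincenty) and that firm 1 sits in the first row, and it keeps the scalar names theLat and theLon from the question.

```stata
* Sketch only; assumes vincenty is installed (ssc install vincenty)
clear
set seed 123456
set obs 5000
g firmid = "firm" + string(_n)
g nw = floor(100*runiform())
g double lat = 39+runiform()
g double lon = -76+runiform()

* Read firm 1's coordinates by explicit subscripting (row 1)
scalar theLat = lat[1]
scalar theLon = lon[1]
display %12.6f theLat "  " %12.6f theLon

* Same call as in the question, now without hand-typed numbers
vincenty lat lon theLat theLon, hav(distance_km) inkm
```

If the data are not sorted so that the firm of interest is in row 1, the subscript has to point at the row that holds it.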
clear all
set more off
* Create empty database
gen x = .
tempfile results
save "`results'", replace
* Create input for exercise
set seed 123456
set obs 500
g firmid = _n /* Observation (firm) id */
g nw = floor(100*runiform()) /* Number of workers in a firm */
g double lat = 39+runiform() /* Latitude in decimal degree of a firm */
g double lon = -76+runiform() /* Longitude in decimal degree of a firm */
gen dum = 1
*list
* Save number of firms
local size = _N
display "`size'"
* joinby procedure
tempfile main
save "`main'"
timer clear 1
timer clear 2
timer clear 3
timer clear 4
quietly {
    timer on 1
    forvalues i=1/`size'{
        timer on 2
        use "`main'" in `i', clear // assumed sorted on firmid
        rename (firmid lat lon nw) =0
        joinby dum using "`main'", unmatched(using)
        drop _merge dum
        order firmid0 firmid
        timer off 2
        timer on 3
        vincenty lat0 lon0 lat lon, hav(dist) inkm
        timer off 3
        keep if dist <= 40 // an arbitrary distance
        timer on 4
        collapse (sum) nw_sum=nw (mean) nw0 lat0 lon0, by(firmid0)
        append using "`results'"
        save "`results'", replace
        timer off 4
    }
    timer off 1
}
use "`results'", clear
sort firmid0
drop x
list
timer list