Converting day and month variables to numeric (Stata)


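I have data on online job postings, but some of the variables are stored as strings. I would like them to be numeric so that I can create a time-series graph.

The three variables I want to convert to numeric look like this: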

dataex month posted_date revenue
[CODE]
* Example generated by -dataex-. To install: ssc install dataex
clear
input str10 month str19 posted_date str32 revenue
"March_2021" "2021-03-08 10:44:15" "Less than $1 million (USD)"      
"March_2021" "2021-03-08 10:44:15" "Less than $1 million (USD)"      
"Dec_2020"   "2020-12-13 08:04:59" "$10+ billion (USD)"              
"Nov_2020"   "44150.33611"         "$10+ billion (USD)"              
"Dec_2020"   "2021-01-04 04:59:40" "$10+ billion (USD)"              
"Nov_2020"   "44167.24444"         "$10+ billion (USD)"              
"Dec_2020"   "2020-12-16 10:49:38" "$10+ billion (USD)"              
"Nov_2020"   "44167.24514"         "$10+ billion (USD)"              
"Nov_2020"   "44172.01319"         "$10+ billion (USD)"              
"Dec_2020"   "2020-12-30 05:52:25" "$10+ billion (USD)"              
"April_2021" "2021-04-21 04:16:12" ""                                
"April_2021" "2021-04-21 04:16:12" ""                                
"Feb_2021"   "2021-03-01 01:03:09" ""                                
"Feb_2021"   "2021-03-01 01:03:09" ""                                
"Feb_2021"   "2021-03-01 01:03:09" ""                                
"April_2021" "2021-04-21 05:57:59" ""                                
"April_2021" "2021-04-21 05:57:59" ""                                
"Dec_2020"   "2020-12-22 08:13:06" "$500 million to $1 billion (USD)"
I would like the new variables to look like this:

month_n posted_date_n revenue_n 
02/21   09/02/21       $500m_1B
03/21   14/03/21       +10B
04/21   11/04/21       +1m
So, following the instructions, I ran the following code:

// Destring string variables that hold numeric values
gen posted_date_n = real(posted_date)
gen month_n = real(month)
gen revenue_n = real(revenue)
However, I do not get what I want. Instead, the data look like this:

dataex revenue_n posted_date_n month_n
[CODE]
* Example generated by -dataex-. To install: ssc install dataex
clear
input float(revenue_n posted_date_n month_n)
.        . .
.        . .
.        . .
. 44150.34 .
.        . .
. 44167.25 .
.        . .
. 44167.25 .
. 44172.01 .
.        . .
.        . .
.        . .

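I can run code that converts the data into the format you want, but not for date values such as 44150.33611. As @JR96 pointed out, these appear to be in Excel format.

I suggest using the split command. Nick Cox has written a very handy article about it that is worth reading.

This is not exactly what you asked for, but in my view it is closer than nothing. Sample output: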

month_n, year_n, posted_date_n
March, 2021, 03/08/2021
March, 2021, 03/08/2021

Here everything is formatted as dates that Stata can recognize. Perhaps someone else can chime in on how to combine the month_n and year_n columns?
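One possible way to do the combining asked about above is sketched here. It assumes the month strings always have the form Name_Year, and the toy dataset and the stub name part are made up for illustration; this is not necessarily what the answerer ran.

```stata
* Hypothetical mini-dataset mirroring the month variable in the question
clear
input str10 month
"March_2021"
"Dec_2020"
"Feb_2021"
end

* split on the underscore: creates the string variables part1 and part2
split month, parse("_") generate(part)

* Rejoin the two pieces with a space and let monthly() read them as month-year
gen month_n = monthly(part1 + " " + part2, "MY")
format month_n %tm

list month part1 part2 month_n
```

The result is a single numeric monthly date, so separate month and year columns are not needed afterwards.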

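See help datetime for functions that convert string variables to numeric date and date-time variables. Your first variable is doable. For the second, you need to know what the other values, such as "44150.33611", mean. For the third, you could look at recode.

Values like "44150.33611" are in Excel's date-time format (for example, I believe this one is 2020-11-15 8:04:00).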
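If those values really are Excel serial dates, a conversion could look roughly like the sketch below. It assumes Excel's 1900 date system, in which serial 0 corresponds to 30dec1899; the variable names excel_serial and posted_dt and the three-row dataset are hypothetical.

```stata
* Hypothetical rows: one timestamp string and two Excel-style serials
clear
input str19 posted_date
"2021-03-08 10:44:15"
"44150.33611"
"44167.24444"
end

* real() returns missing for the timestamp string, so only the serials survive
gen double excel_serial = real(posted_date)

* Assumption: 1900 date system, so shift the origin from 30dec1899 to 01jan1960
gen posted_date_n = floor(excel_serial) + td(30dec1899)
format posted_date_n %td

* Optional: keep the time of day as a Stata datetime (milliseconds, stored as double)
gen double posted_dt = round((excel_serial + td(30dec1899)) * 86400000, 1000)
format posted_dt %tc

list
```

Under that assumption, 44150.33611 should display as 15nov2020, which agrees with the date suggested in the comment above. Rows holding ordinary timestamps stay missing here and need the clock() route instead.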
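real() works on string dates only in exceptional cases, for example if a year was mistakenly imported as a string. Sorry, recode only works on numeric variables, so perhaps just look at replace.

There is no need to split anything here. The OP can use
gen month_n = monthly(month, "MY")
gen posted_date_n = dofc(clock(posted_date, "YMDhms"))
which can then be formatted as %tm and %td respectively.

That's good. I suggest you post your code as an answer.

Thanks, I would, but they seem to have abandoned ship on their question...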
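Put together, the suggestions in these comments might look like the following sketch. The dataset is a simplified, hypothetical version of the question's data: the dollar signs are left out of the revenue strings because $ marks a global macro in Stata code. The category codes and the label name revenue_lbl are illustrative choices, and replacing the underscore before calling monthly() is only a precaution that may not be necessary.

```stata
* Toy data modelled on the question (values simplified; hypothetical)
clear
input str10 month str19 posted_date str32 revenue
"March_2021" "2021-03-08 10:44:15" "Less than 1 million (USD)"
"Dec_2020"   "2020-12-13 08:04:59" "10+ billion (USD)"
"Dec_2020"   "2020-12-22 08:13:06" "500 million to 1 billion (USD)"
"April_2021" "2021-04-21 04:16:12" ""
end

* Month: swap the underscore for a space, then parse as month-year
gen month_n = monthly(subinstr(month, "_", " ", .), "MY")
format month_n %tm

* Posted date: parse the timestamp, then keep only the daily date
gen posted_date_n = dofc(clock(posted_date, "YMDhms"))
format posted_date_n %td

* Revenue: recode needs a numeric variable, so assign codes with replace
gen byte revenue_n = .
replace revenue_n = 1 if strpos(revenue, "Less than")
replace revenue_n = 2 if strpos(revenue, "500 million")
replace revenue_n = 3 if strpos(revenue, "10+ billion")
label define revenue_lbl 1 "<1m" 2 "500m-1B" 3 "10B+"
label values revenue_n revenue_lbl

list month_n posted_date_n revenue_n
```

Empty revenue strings stay missing, and any posted_date stored as an Excel serial comes out missing from clock(), so those rows need separate handling.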