Converting day and month variables to numeric (Stata)
I have data on online job postings, but some of the variables are stored as strings and I want them as numeric so that I can create a time-series plot. The three variables I want to convert to numeric look like this:
dataex month posted_date revenue
[CODE]
* Example generated by -dataex-. To install: ssc install dataex
clear
input str10 month str19 posted_date str32 revenue
"March_2021" "2021-03-08 10:44:15" "Less than $1 million (USD)"
"March_2021" "2021-03-08 10:44:15" "Less than $1 million (USD)"
"Dec_2020" "2020-12-13 08:04:59" "$10+ billion (USD)"
"Nov_2020" "44150.33611" "$10+ billion (USD)"
"Dec_2020" "2021-01-04 04:59:40" "$10+ billion (USD)"
"Nov_2020" "44167.24444" "$10+ billion (USD)"
"Dec_2020" "2020-12-16 10:49:38" "$10+ billion (USD)"
"Nov_2020" "44167.24514" "$10+ billion (USD)"
"Nov_2020" "44172.01319" "$10+ billion (USD)"
"Dec_2020" "2020-12-30 05:52:25" "$10+ billion (USD)"
"April_2021" "2021-04-21 04:16:12" ""
"April_2021" "2021-04-21 04:16:12" ""
"Feb_2021" "2021-03-01 01:03:09" ""
"Feb_2021" "2021-03-01 01:03:09" ""
"Feb_2021" "2021-03-01 01:03:09" ""
"April_2021" "2021-04-21 05:57:59" ""
"April_2021" "2021-04-21 05:57:59" ""
"Dec_2020" "2020-12-22 08:13:06" "$500 million to $1 billion (USD)"
end
[/CODE]
I would like the new variables to look like this:
month_n posted_date_n revenue_n
02/21 09/02/21 $500m_1B
03/21 14/03/21 +10B
04/21 11/04/21 +1m
So, following the instructions, I ran the following code:
// Convert string variables that hold numeric values to numeric
gen posted_date_n = real(posted_date)
gen month_n = real(month)
gen revenue_n = real(revenue)
However, I could not get what I wanted. Instead, the data look like this:
dataex revenue_n posted_date_n month_n
[CODE]
* Example generated by -dataex-. To install: ssc install dataex
clear
input float(revenue_n posted_date_n month_n)
. . .
. . .
. . .
. 44150.34 .
. . .
. 44167.25 .
. . .
. 44167.25 .
. 44172.01 .
. . .
. . .
. . .
end
[/CODE]
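I can run code that converts the data into the format you want, but not for date values such as 44150.33611. These appear to be in Excel format, as @JR96 pointed out. I suggest using the split command; a very handy article by Nick Cox is useful reading. This does not do exactly what you asked for, but in my view it is closer than nothing. Example output: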
month_n, year_n, posted_date_n
March, 2021, 03/08/2021
March, 2021, 03/08/2021
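Here, everything is formatted as a date that Stata can recognize. Perhaps someone else can chime in on how to combine the month_n and year_n columns?

See help datetime for functions that convert string variables to numeric datetime variables. Your first variable is doable. For the second variable, we need to know what values such as "44150.33611" mean. For the third variable, you could look at recode.

Values like "44150.33611" are in Excel's date-time format (for example, I believe this one is 2020-11-15 8:04:00).

real() works on string dates only in exceptional cases, for example if a year was wrongly imported as a string. Sorry, recode only works on numeric variables, so perhaps just look at replace.

No need to split anything here. The OP can use gen month_n = monthly(month, "MY") and gen posted_date_n = dofc(clock(posted_date, "YMDhms")), which can then be formatted as %tm and %td respectively.

This is good. I suggest you submit your code as an answer.

Thanks, I would, but they seem to have abandoned their question...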
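Pulling the comments together, a minimal sketch of the whole conversion might look like the following. It assumes the question's string variables month, posted_date and revenue are already in memory. It also assumes the Excel-style values use Excel's 1900 date system, which is consistent with the 2020-11-15 reading suggested above but is not confirmed by the original poster. The helper variables posted_dt and excel_serial, the value label revenue_lbl, and the integer codes for the revenue brackets are made up for illustration, and only the three brackets visible in the example are handled.

```stata
* Sketch only: assumes month, posted_date and revenue (all strings) are in memory.

* 1. Month: "March_2021" -> Stata monthly date.
*    Swapping the underscore for a space is a precaution and may be unnecessary.
gen month_n = monthly(subinstr(month, "_", " ", .), "MY")
format month_n %tm

* 2. Posting date. Values that real() can read are treated as Excel serial
*    date-times; everything else is parsed as "YYYY-MM-DD hh:mm:ss".
gen double excel_serial = real(posted_date)
gen double posted_dt = clock(posted_date, "YMDhms") if missing(excel_serial)

*    Assumption: Excel 1900 date system, so serial 0 is 30 Dec 1899.
*    Days are converted to milliseconds (86400000 per day) to get a %tc value.
replace posted_dt = round((excel_serial + td(30dec1899)) * 86400000) if !missing(excel_serial)
format posted_dt %tc

*    Daily date for plotting; the time of day is dropped.
gen posted_date_n = dofc(posted_dt)
format posted_date_n %td
drop excel_serial

* 3. Revenue: ordered integer codes with value labels (codes are hypothetical).
*    Substrings are matched so that no string literal contains a dollar sign,
*    which Stata may try to read as the start of a global macro reference.
gen byte revenue_n = .
replace revenue_n = 1 if strpos(revenue, "Less than")
replace revenue_n = 2 if strpos(revenue, "500 million to")
replace revenue_n = 3 if strpos(revenue, "10+ billion")
label define revenue_lbl 1 "<1m" 2 "500m-1B" 3 "10B+"
label values revenue_n revenue_lbl

* Quick visual check of the result.
list month month_n posted_date posted_date_n revenue_n in 1/5
```

Keeping posted_dt as a double matters because %tc values are counts of milliseconds and lose precision in a float. The daily posted_date_n is small enough that the default storage type is fine. Observations with an empty revenue string stay missing in revenue_n.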