Stata: importing a .txt file with inconsistent delimiters
I have a .txt file with rather odd delimiters. The data look like this:
|ABC4|,|Name1|,|NameRaw1|,|y|,|XY1|,10000.0,| |,|FOURTH QUARTER REPORT|,||
|ABC5|,|Name2, extraname|,|NameRaw2|,,|XY2|,266539.0,|pac |,|MID-YEAR REPORT|,||
|ABC6|,|Name3|,|NameRaw3|,|y|,|X,Y3|,60000.0,|name |,|YEAR-END REPORT|,|XYZ|
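So one problem is that some variables have no pipes at all, like the sixth variable, which is just a number, while others lack pipes only when they are empty, like the fourth variable, which appears either as ,, or as ,|y|, in the line. Some values also contain commas, so I cannot simply use the comma as the delimiter. So there are basically two problems: the pipes are applied inconsistently, and commas appear inside values as well as between them.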
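I am looking for a way to handle this. Does anyone know how to do it?

If the full dataset is messier than this example, I really don't want to know. But the approach below has some logic to it for the example as given: give empty fields explicit pipes, peel off fields 1 to 5 at each "|,", take field 6 up to the next comma, peel off fields 7 and 8 the same way as the first five, and treat whatever remains as field 9.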
* Example generated by -dataex-. To install: ssc install dataex
clear
input str100 whatever
"|ABC4|,|Name1|,|NameRaw1|,|y|,|XY1|,10000.0,| |,|FOURTH QUARTER REPORT|,||"
"|ABC5|,|Name2, extraname|,|NameRaw2|,,|XY2|,266539.0,|pac |,|MID-YEAR REPORT|,||"
"|ABC6|,|Name3|,|NameRaw3|,|y|,|X,Y3|,60000.0,|name |,|YEAR-END REPORT|,|XYZ|"
end

* Work on a copy, and give empty fields explicit pipes (,, becomes ,||,)
gen work = whatever
replace work = subinstr(work, ",,", ",||,", .)

* Fields 1 to 5 are piped: peel each one off at the first "|,"
forval j = 1/5 {
    gen work`j' = substr(work, 1, strpos(work, "|,") + 1)
    replace work = subinstr(work, work`j', "", 1)
}

* Field 6 is a bare number: it ends at the next comma
gen work6 = substr(work, 1, strpos(work, ","))
replace work = subinstr(work, work6, "", 1)

* Fields 7 and 8 are piped again; whatever remains is field 9
forval j = 7/8 {
    gen work`j' = substr(work, 1, strpos(work, "|,") + 1)
    replace work = subinstr(work, work`j', "", 1)
}
gen work9 = work
drop work

* Strip the pipes, then drop the trailing comma left over from the split
forval j = 1/9 {
    replace work`j' = trim(subinstr(work`j', "|", "", .))
    replace work`j' = substr(work`j', 1, length(work`j') - 1) if substr(work`j', -1, 1) == ","
}

list
     +-----------------------------------------------------------------------------------+
  1. | whatever                                                                          |
     | |ABC4|,|Name1|,|NameRaw1|,|y|,|XY1|,10000.0,| |,|FOURTH QUARTER REPORT|,||        |
     |-----------------------------------------------------------------------------------|
     | work1 | work2            | work3    | work4 | work5 | work6    | work7            |
     | ABC4  | Name1            | NameRaw1 | y     | XY1   | 10000.0  |                  |
     |-----------------------------------------------------------------------------------|
     | work8                 | work9                                                     |
     | FOURTH QUARTER REPORT |                                                           |
     +-----------------------------------------------------------------------------------+
     +-----------------------------------------------------------------------------------+
  2. | whatever                                                                          |
     | |ABC5|,|Name2, extraname|,|NameRaw2|,,|XY2|,266539.0,|pac |,|MID-YEAR REPORT|,||  |
     |-----------------------------------------------------------------------------------|
     | work1 | work2            | work3    | work4 | work5 | work6    | work7            |
     | ABC5  | Name2, extraname | NameRaw2 |       | XY2   | 266539.0 | pac              |
     |-----------------------------------------------------------------------------------|
     | work8                 | work9                                                     |
     | MID-YEAR REPORT       |                                                           |
     +-----------------------------------------------------------------------------------+
     +-----------------------------------------------------------------------------------+
  3. | whatever                                                                          |
     | |ABC6|,|Name3|,|NameRaw3|,|y|,|X,Y3|,60000.0,|name |,|YEAR-END REPORT|,|XYZ|      |
     |-----------------------------------------------------------------------------------|
     | work1 | work2            | work3    | work4 | work5 | work6    | work7            |
     | ABC6  | Name3            | NameRaw3 | y     | X,Y3  | 60000.0  | name             |
     |-----------------------------------------------------------------------------------|
     | work8                 | work9                                                     |
     | YEAR-END REPORT       | XYZ                                                       |
     +-----------------------------------------------------------------------------------+
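
A different route may also be worth trying, offered only as a sketch and not tested on the full file. Since the pipes behave like text qualifiers, one could convert every pipe to a double quote and then let -import delimited- bind the quoted fields, so that commas inside a value stay in one field. This assumes the raw file contains no double quotes of its own and that pipes never occur inside a value. The file names raw.txt and quoted.txt are hypothetical, and v6 is simply the default name -import delimited- gives the sixth column when no variable names are read.

```stata
* Assumption: raw.txt is the original file; quoted.txt is a scratch copy.
* Step 1: rewrite every pipe as a double quote (\Q is -filefilter-'s code for ").
filefilter "raw.txt" "quoted.txt", from("|") to("\Q") replace

* Step 2: read the result as ordinary comma-separated text, binding quoted
* fields so that embedded commas do not split a value. Everything is read
* as string first so nothing is converted silently.
import delimited using "quoted.txt", delimiters(",") bindquotes(strict) ///
    varnames(nonames) stringcols(_all) clear

* Step 3: the sixth column was never piped and should hold numbers only.
destring v6, replace

list in 1/3
```

If this reads the example correctly, it avoids hard-coding which fields are piped and which are not. It is still worth comparing a few rows against the raw text, particularly rows with empty fields such as || and ,, to confirm they come through as empty strings.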