Stata flag when word found, not strpos


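I have some data with strings, and I want to flag observations where a word is found. A word is defined as a sequence that sits at the start of the string, at the end of the string, or is delimited by spaces. strpos finds the substring wherever it occurs, but I am looking for something similar to subinword. Is there a way in Stata to use the functionality of subinword without replacing anything, and flag the word instead?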

clear 
input id str50 strings
1 "the thin th man"
2  "this old then"
3 "th to moon"
4 "moon blank th"
end

gen th_pos = 0
replace th_pos = 1 if strpos(strings, "th") > 0

The code above flags every observation, because they all contain "th", but my desired output is:

ID      strings          th_sub
1   "the thin th man"      1
2   "this old then"        0
3   "th to moon"           1
4   "moon blank th"        1

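A small trick is that "th" as a word will be preceded and followed by a space, except when it occurs at the beginning or the end of the string. The exceptions are no real challenge: padding the string with a space at each end works around them.

gen wanted = strpos(" " + strings + " ", " th ") > 0

Otherwise, there is a rich set of regular expression functions to play with.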

The example code in the question, which flags but does not do what you want, condenses to one line:

gen th_pos = strpos(strings, "th") > 0

A more direct answer is that you don't have to replace anything. You just get Stata to tell you what would happen if you did:

gen WANTED = strings != subinword(strings, "th", "", .)

If removing the word (when present) would change the string, then the word must be present.
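As a quick check, the two approaches above can be run side by side on the question's sample data. This is only a minimal sketch; the variable names by_padding and by_subinword are made up for illustration and do not come from the original answers.

```stata
clear
input id str50 strings
1 "the thin th man"
2 "this old then"
3 "th to moon"
4 "moon blank th"
end

* Pad both ends with a space so a word at the start or end is also space-delimited
gen byte by_padding = strpos(" " + strings + " ", " th ") > 0

* Ask what subinword() would do: if deleting the word changes the string, the word is there
gen byte by_subinword = strings != subinword(strings, "th", "", .)

* The two flags should agree on every observation
assert by_padding == by_subinword

list id strings by_padding by_subinword, noobs
```

On this data both flags come out as 1, 0, 1, 1 for ids 1 to 4, which matches the desired th_sub column in the question.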

Regular expressions are very useful for this kind of exercise. Word boundaries, indicated by \b, let you search for whole words, as in "\bword\b":

gen wanted = ustrregexm(strings, "\bth\b")
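One difference is worth keeping in mind: \b matches at any boundary between word and non-word characters, so punctuation also counts as a boundary, whereas the space-padding approach only recognises spaces. The sketch below uses three made-up strings (they are not from the question) to show where the two definitions diverge. It assumes Stata 14 or later, since ustrregexm is one of the Unicode string functions; the variable names by_space and by_regex are hypothetical.

```stata
clear
input id str50 strings
1 "th, to moon"
2 "moon (th)"
3 "then again"
end

* Space-delimited definition: pad both ends, then look for " th "
gen byte by_space = strpos(" " + strings + " ", " th ") > 0

* Regex word boundary: punctuation next to "th" also ends the word
gen byte by_regex = ustrregexm(strings, "\bth\b")

list id strings by_space by_regex, noobs
```

Which definition is appropriate depends on the data. If words may be followed by commas, brackets, or full stops, the regex version is probably closer to what is wanted; if only spaces should count as delimiters, the padding or subinword approach fits the question's definition more literally.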