Format messy text into a data frame using R


How can I format my_words?


Here is my solution. If the regular expression below is unclear, you may want to consult the stringr package cheat sheet.

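In short, I split the string at three kinds of boundaries:

a digit followed by a letter or a blank
a lowercase letter followed by an uppercase letter, a digit, or a dollar sign
a colon followed by an uppercase letter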
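To make the three boundary rules concrete, here is a minimal sketch that applies them to a short made-up string. The names `split_rules` and `demo` are hypothetical, and the ASCII classes (`[0-9]`, `[A-Za-z]`, `\\s`) are a simplifying assumption standing in for the POSIX-style classes used in the answer.

```r
library(stringr)

# Hypothetical pattern: each alternative is a zero-width boundary
# (lookbehind + lookahead), so splitting never consumes a character.
split_rules <- paste(
  "(?<=[0-9])(?=[A-Za-z]|\\s)",    # digit, then a letter or whitespace
  "(?<=[a-z])(?=[A-Z0-9]|\\$)",    # lowercase, then uppercase, digit, or "$"
  "(?<=:)(?=[A-Z])",               # colon, then uppercase
  sep = "|"
)

# Made-up sample shaped like the financial text in the question
demo <- "Net sales:Products$46,529 $42,354Services13,156"

# str_split() returns a list; take the first element and trim whitespace
tokens <- str_trim(str_split(demo, split_rules)[[1]])
tokens
```

On this sample the split should give six tokens: "Net sales:", "Products", "$46,529", "$42,354", "Services" and "13,156". Commas and decimal points stay inside the numbers because none of the rules fires on them.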
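This task is large and messy, and it involves a lot of regular expressions. I don't think you can just say "here is my problem, solve it!"; you need to show at least some effort of your own. Here is something that might get you on track: something along the lines of unlist() combined with stringr::str_split() applied to my_words?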
my_words = "Three Months EndedNine Months EndedJune 27, 2020June 29, 2019June 27, 2020June 29, 2019Net sales:   Products$46,529 $42,354 $170,598 $162,354    Services13,156 11,455 39,219 33,780 Total net sales59,685 53,809 209,817 196,134 Cost of sales:   Products32,693 29,473 116,089 109,758    Services4,312 4,109 13,461 12,297 Total cost of sales37,005 33,582 129,550 122,055 Gross margin22,680 20,227 80,267 74,079 Operating expenses:Research and development4,758 4,257 13,774 12,107 Selling, general and administrative4,831 4,426 14,980 13,667 Total operating expenses9,589 8,683 28,754 25,774 Operating income13,091 11,544 51,513 48,305 Other income/(expense), net46 367 677 1,305 Income before provision for income taxes13,137 11,911 52,190 49,610 Provision for income taxes1,884 1,867 7,452 8,040 Net income$11,253 $10,044 $44,738 $41,570 Earnings per share:Basic$2.61 $2.20 $10.25 $8.92 Diluted$2.58 $2.18 $10.16 $8.86 Shares used in computing earnings per share:Basic4,312,573 4,570,633 4,362,571 4,660,175 Diluted4,354,788 4,601,380 4,404,695 4,691,759"
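# Split at zero-width boundaries: digit|letter-or-blank, lowercase|uppercase-digit-or-$, colon|uppercase
my_words = stringr::str_split(my_words, "(?<=[:digit:])(?=[:alpha:]|[:blank:])|(?<=[:lower:])(?=[:upper:]|[:digit:]|\\$)|(?<=:)(?=[:upper:])")
# str_split() returns a list; take its first element and trim surrounding whitespace
my_words = stringr::str_trim(my_words[[1]])
my_words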
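The split above yields a flat character vector, not yet a data frame. One possible way to finish the job is sketched below in base R, under two assumptions that are not stated in the original: every data row is a label immediately followed by exactly four numeric tokens, and header or section labels (which have no numbers after them) can be dropped. The sample `tokens` vector, the helper `to_number`, and the column names `col1` to `col4` are all hypothetical.

```r
# Hypothetical sample of tokens, shaped like the output of the split
tokens <- c("Net sales:", "Products", "$46,529", "$42,354", "$170,598", "$162,354",
            "Services", "13,156", "11,455", "39,219", "33,780")

# A token counts as numeric if it is digits with optional "$", "," and "."
is_num <- grepl("^\\$?[0-9][0-9,.]*$", tokens)

# Assumption: a row starts at a label followed by exactly n_cols numeric tokens
n_cols <- 4
starts <- which(!is_num)
starts <- starts[vapply(starts, function(i) {
  idx <- i + seq_len(n_cols)
  max(idx) <= length(tokens) && all(is_num[idx])
}, logical(1))]

# Strip "$" and thousands separators before converting to numeric
to_number <- function(x) as.numeric(gsub("[$,]", "", x))

# One row of numbers per label; vapply returns n_cols x n_rows, so transpose
values <- t(vapply(starts, function(i) to_number(tokens[i + seq_len(n_cols)]),
                   numeric(n_cols)))

df <- data.frame(item = tokens[starts], values, stringsAsFactors = FALSE)
names(df)[-1] <- paste0("col", seq_len(n_cols))
df
```

With the sample tokens this keeps "Products" and "Services" as rows and skips "Net sales:", since no numbers follow it directly. On the full text, labels such as "Products" or "Basic" repeat under different sections, so carrying the preceding section label into an extra column would probably be needed to keep the rows distinguishable.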