Keeping the delimiters in strsplit with a regex combination
Tags: r, regex, strsplit

I am scraping some data that requires me to use strsplit in combination with a regex. I have already worked out how to split the string, but I am having trouble applying the guidance on keeping the delimiters.

Below is an example of the string I am scraping:

You can put the consuming patterns of your regular expression into lookbehinds:
> text<-c("This activity center is fun and helps give your birds exercise! With climbing ladders, a swing, tightrope and an assortment of engaging toys, the Activity Center has everything your bird needs to relieve stress and boredom all in one place. Relieves stress & boredom Durable & brightly colored wood Easy to clean bottom Simple installation & assemblyMaterial: WoodDimensions (Overall): 12.0 inches (H) x 15.0 inches (W) x 18.5 inches (L)Weight: 6.0 poundsHolds up to: 20.0 poundsIntended Pet Type: BirdCare and Cleaning: Hand washPet activity: ClimbTCIN: 16707835UPC: 030172025594Item Number (DPCI): 083-01-0246Report incorrect product information")
> strsplit(text, "(?<=[0-9])(?=[A-Z])|(?<=[a-z])(?=[A-Z])|(?<=\\))(?=[A-Z])", perl=TRUE)
[[1]]
[1] "This activity center is fun and helps give your birds exercise! With climbing ladders, a swing, tightrope and an assortment of engaging toys, the Activity Center has everything your bird needs to relieve stress and boredom all in one place. Relieves stress & boredom Durable & brightly colored wood Easy to clean bottom Simple installation & assembly"
[2] "Material: Wood"
[3] "Dimensions (Overall): 12.0 inches (H) x 15.0 inches (W) x 18.5 inches (L)"
[4] "Weight: 6.0 pounds"
[5] "Holds up to: 20.0 pounds"
[6] "Intended Pet Type: Bird"
[7] "Care and Cleaning: Hand wash"
[8] "Pet activity: Climb"
[9] "TCIN: 16707835"
[10] "UPC: 030172025594"
[11] "Item Number (DPCI): 083-01-0246"
[12] "Report incorrect product information"
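
To see why the lookarounds keep the delimiters, here is a minimal sketch on a shortened, made-up string (the variable names `x`, `consumed` and `parts` are only illustrative). A consuming pattern swallows the characters it matches, while a lookbehind/lookahead pair matches the empty position between them, so nothing is lost:

```r
# Shortened sample in the same shape as the scraped text (illustrative only)
x <- "Material: WoodWeight: 6.0 poundsTCIN: 16707835UPC: 030172025594"

# Consuming pattern: the lowercase/digit and the capital are part of the
# match, so strsplit() throws them away.
consumed <- strsplit(x, "[a-z0-9][A-Z]", perl = TRUE)[[1]]
print(consumed)
# "Material: Woo" "eight: 6.0 pound" "CIN: 1670783" "PC: 030172025594"

# Zero-width pattern: the same characters are only asserted, not matched,
# so every character stays in one of the pieces.
parts <- strsplit(x, "(?<=[a-z0-9])(?=[A-Z])", perl = TRUE)[[1]]
print(parts)
# "Material: Wood" "Weight: 6.0 pounds" "TCIN: 16707835" "UPC: 030172025594"

# Gluing the pieces back together reproduces the input exactly.
identical(paste(parts, collapse = ""), x)
```

Note that `perl = TRUE` is required, because lookarounds are not available in the default regex engine used by `strsplit`.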
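Why don't you put the consuming parts into lookbehinds?

Follow-up question (I am happy to post a separate question or to edit my question above): does strsplit let me send the substrings to columns when the content of the substrings is inconsistent? For example, I might be processing a second string that does not contain the Material substring. When converting to a data frame, I would still like to have a Material column, with the value NA for strings that lack that substring.

@RoDeY That is simply not possible with a splitting approach. You should consider matching, or even capturing, because the number of captured "fields" is always constant (the number of captures is defined by the number of capturing groups in the regex). There is some nice code written by someone else; see whether you want to follow that approach.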
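
As a rough sketch of the matching idea from the last comment (this is not the code that comment refers to): the helper `extract_field`, the sample vector `items` and the field list `fields` below are all hypothetical names, and the sketch assumes every field looks like `Label: value` and ends where a lowercase letter, digit or closing parenthesis is directly followed by a capital letter. Each field is captured separately, so a missing field yields NA instead of shifting the columns:

```r
# Hypothetical helper: capture the value after "label: " or return NA.
extract_field <- function(x, label) {
  # \\Q...\\E quotes the label literally; the lazy group stops at the next
  # field boundary (same lookaround idea as the split) or at end of string.
  pattern <- paste0("\\Q", label, "\\E: (.*?)(?=(?<=[a-z0-9)])[A-Z]|$)")
  hits <- regmatches(x, regexec(pattern, x, perl = TRUE))
  vapply(hits,
         function(hit) if (length(hit) >= 2) hit[2] else NA_character_,
         character(1))
}

# Made-up inputs: the second string has no Material field.
items <- c("Material: WoodWeight: 6.0 poundsTCIN: 16707835",
           "Weight: 2.5 poundsTCIN: 123")

fields <- c("Material", "Weight", "TCIN")
df <- as.data.frame(
  setNames(lapply(fields, function(f) extract_field(items, f)), fields),
  stringsAsFactors = FALSE
)
print(df)
```

Because the set of columns comes from `fields` rather than from whatever each string happens to contain, every row has the same columns. The `perl` argument of `regexec` requires R 3.3.0 or later.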