Find the row ID containing the string of one column, for every row of another column, in R/RStudio


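If a row in my data frame has text in the "ReferenceText" column, the corresponding text in the "Text" column is a reply comment. If "ReferenceText" is NA, the corresponding text in the "Text" column is an original post.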

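If possible, I would like to use regular expressions (I was thinking of gregexpr and regmatches), but any other form of pattern matching in R/RStudio that can do the following would be fine: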

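For each "ReferenceText" observation, I want to find the matching text in the "Text" column and put the "ID" of that "Text" observation in my "PostID" column. I also want to index the order in which the reply comments to each post were posted ("Sequence").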

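For example, suppose the original post (PostID=5) is replied to, so that its "Text" appears in "ReferenceText" in rows 6 and 9. The "PostID" observation in row 6 would then be marked "PostID=5", and its "Sequence" observation would be marked "PostID=5 Sequence=1". When the original post is replied to or repeated again in "ReferenceText" (row 9), the "Sequence" observation would be marked "PostID=5 Sequence=2". I have a fairly large data set (160,000+ observations), so I would appreciate a function that can do this. Any ideas?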

I hope this makes sense. Note that any R/RStudio solution will do.

I am pasting the dput output of my data.frame object:

## dput output assigned to my sampleDF data frame
> dput(sampleDF)
structure(list(ID = 1:30, Screen.Name = c("User 1", "User 2", 
"User 3", "User 4", "User 5", "User 6", "User 7", "User 8", "User 5", 
"User 9", "User 9", "User 1", "User 1", "User 10", "User 8", 
"User 11", "Company", "User 12", "User 13", "User 14", "User 15", 
"User 16", "User 17", "User 18", "User 19", "User 13", "User 20", 
"User 21", "Uer 21", "User 21"), Text = c("Can anyone tell me where in the bloody world this TROLL came from.  Is he the troll of the week at the national trolling academy?  https://www.facebook.com/joseph.barnhorst", 
"company's \"You're Kinda a Big Deal\" promotion is kinda lame and insulting.  How about a service that actually is up to speed as advertised?  Now THATwould be a big deal for company.", 
"Hope I win sumthing!", "Im paying 90 dollars for a reason, so fix whatever is broken so I can actually use my phone!", 
"How do you sign up for the Your A Big Deal Sweepstakes?", "http://company.promo.eprize.com/sweepstakes/:b=chrome/?INTCID=TSC:MyS:MyA:Skn:013113:EngagementSweeps#", 
"Thanks for your giant mess up. I'm down 370 dollars.", "When will the blackberry 10 be available ?", 
"Thank you for the link but should the email get you there also?  I clicked on the mobile ad in and e-bill options to finish the sign up for those and received an error message for both.  Perhaps a link isn't working properly there either.", 
"Wait, Could it be Joseph is upset cause his Milkshake didn't bring the boys to his yard? Must see, look at this > http://www.youtube.com/watch?v=gFK8zYYoMtQ", 
"What a putz.... lol", "LMBO!", "The Blackberry Q10 will be available to US carriers in April.", 
"My mobile hot spot just shut off Randomly and now It tell me to set it up again I already have it on my plan", 
"I'm really interested in seeing this phone, I hope it's as good if not better than iPhones cause nothing new has challenged apple really", 
"Turn off the LTE in Carlisle it does work like at all. Or please fix. Won't load anything under LTE", 
"Need a hand? Check out this redesigned umbrella handle that lets you keep texting even during a downpour. http://bit.ly/12fniVl", 
"why we are still having service issues,  ", "unlimited data isn't worth anything when you can't get service with or without a femtocell and tech support has been next to useless over the past few months.", 
"Because everyone should be texting while walking in the rain... face palm.", 
"When is the LTE going to be available in NYC?? I was told end of last year... but it's Feb now.....", 
"Get us 4g already", "So silly. But people will buy it I am sure", 
"rubbish", "Free umbrella with company phones?! I bet it helps with the sewage internet connection you guys have.", 
"oh and more loveliness.. just had a company rep hang up on me..this is twice...nice job.", 
"Oh wow", "The pressure is getting to them. The CEO has put them in a no win situation.", 
"Never", "LTE means Lying To Everyone."), ReferenceText = c("NA", 
"NA", "NA", "NA", "NA", "How do you sign up for the Your A Big Deal Sweepstakes?", 
"NA", "NA", "How do you sign up for the Your A Big Deal Sweepstakes?", 
"Can anyone tell me where in the bloody world this TROLL came from.  Is he the troll of the week at the national trolling academy?  https://www.facebook.com/joseph", 
"Can anyone tell me where in the bloody world this TROLL came from.  Is he the troll of the week at the national trolling academy?  https://www.facebook.com/joseph", 
"Can anyone tell me where in the bloody world this TROLL came from.  Is he the troll of the week at the national trolling academy?  https://www.facebook.com/joseph", 
"When will the blackberry 10 be available ?", "NA", "When will the blackberry 10 be available ?", 
"NA", "NA", "NA", "NA", "Need a hand? Check out this redesigned umbrella handle that lets you keep texting even during a downpour. http://bit.ly/12fniVl", 
"NA", "Need a hand? Check out this redesigned umbrella handle that lets you keep texting even during a downpour. http://bit.ly/12fniVl", 
"Need a hand? Check out this redesigned umbrella handle that lets you keep texting even during a downpour. http://bit.ly/12fniVl", 
"Need a hand? Check out this redesigned umbrella handle that lets you keep texting even during a downpour. http://bit.ly/12fniVl", 
"Need a hand? Check out this redesigned umbrella handle that lets you keep texting even during a downpour. http://bit.ly/12fniVl", 
"NA", "Need a hand? Check out this redesigned umbrella handle that lets you keep texting even during a downpour. http://bit.ly/12fniVl", 
"oh and more loveliness.. just had a company rep hang up on me..this is twice...nice job. ", 
"When is the LTE going to be available in NYC?? I was told end of last year... but it's Feb now.....", 
"NA"), PostID = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA), Sequence = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA), DATE = c("2/1/2013", "2/1/2013", "2/1/2013", 
"2/1/2013", "2/1/2013", "2/1/2013", "2/1/2013", "2/1/2013", "2/1/2013", 
"2/1/2013", "2/1/2013", "2/1/2013", "2/1/2013", "2/1/2013", "2/1/2013", 
"2/1/2013", "2/1/2013", "2/1/2013", "2/1/2013", "2/1/2013", "2/1/2013", 
"2/1/2013", "2/1/2013", "2/1/2013", "2/1/2013", "2/1/2013", "2/1/2013", 
"2/1/2013", "2/1/2013", "2/1/2013"), X_M__millitary_time_ = c("16:46:20", 
"16:52:07", "16:55:54", "17:08:41", "17:10:08", "17:13:01", "17:13:17", 
"17:15:17", "17:19:01", "17:36:39", "17:41:08", "17:42:44", "17:45:42", 
"17:50:08", "17:50:53", "17:53:25", "18:00:01", "18:01:18", "18:03:37", 
"18:04:26", "18:05:41", "18:10:58", "18:11:17", "18:11:20", "18:11:41", 
"18:11:58", "18:13:19", "18:17:13", "18:18:34", "18:19:53"), 
timestampM = c("2/1/2013 16:46", "2/1/2013 16:52", "2/1/2013 16:55", 
"2/1/2013 17:08", "2/1/2013 17:10", "2/1/2013 17:13", "2/1/2013 17:13", 
"2/1/2013 17:15", "2/1/2013 17:19", "2/1/2013 17:36", "2/1/2013 17:41", 
"2/1/2013 17:42", "2/1/2013 17:45", "2/1/2013 17:50", "2/1/2013 17:50", 
"2/1/2013 17:53", "2/1/2013 18:00", "2/1/2013 18:01", "2/1/2013 18:03", 
"2/1/2013 18:04", "2/1/2013 18:05", "2/1/2013 18:10", "2/1/2013 18:11", 
"2/1/2013 18:11", "2/1/2013 18:11", "2/1/2013 18:11", "2/1/2013 18:13", 
"2/1/2013 18:17", "2/1/2013 18:18", "2/1/2013 18:19"), timestampN = c("2/1/2013 16:46", 
"2/1/2013 16:52", "2/1/2013 16:55", "2/1/2013 17:08", "2/1/2013 17:10", 
"2/1/2013 17:13", "2/1/2013 17:13", "2/1/2013 17:15", "2/1/2013 17:19", 
"2/1/2013 17:36", "2/1/2013 17:41", "2/1/2013 17:42", "2/1/2013 17:45", 
"2/1/2013 17:50", "2/1/2013 17:50", "2/1/2013 17:53", "2/1/2013 18:00", 
"2/1/2013 18:01", "2/1/2013 18:03", "2/1/2013 18:04", "2/1/2013 18:05", 
"2/1/2013 18:10", "2/1/2013 18:11", "2/1/2013 18:11", "2/1/2013 18:11", 
"2/1/2013 18:11", "2/1/2013 18:13", "2/1/2013 18:17", "2/1/2013 18:18", 
"2/1/2013 18:19")), .Names = c("ID", "Screen.Name", "Text", 
"ReferenceText", "PostID", "Sequence", "DATE", "X_M__millitary_time_", 
"timestampM", "timestampN"), class = "data.frame", row.names = c(NA, 
-30L))

I see you have both the regex tag and the r tag. Some of the best regex people on the planet come here but know nothing about R. If you need a solution that uses regular expressions, you may want to restate the R part of the problem in simpler terms. If you do not need regex, please remove that tag.

sampleDF$PostID
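
As a rough sketch of one way to fill "PostID" and "Sequence" as the question describes, the code below uses literal prefix matching with base R's startsWith() instead of regular expressions. It rests on several assumptions that are not confirmed by the original post: missing references are stored as the string "NA" (as in the dput output), a reference may be a truncated copy of the post or carry trailing spaces (both occur in the sample), the first matching row is the original post, and row order is chronological. The toy data frame `df` and its values are hypothetical stand-ins for sampleDF.

```r
# Toy data in the same shape as sampleDF (hypothetical values).
df <- data.frame(
  ID = 1:6,
  Text = c("Where did this troll come from? https://example.com/joseph.b",
           "How do you sign up?",
           "Here is the link.",
           "LOL",
           "Thanks, but the email link fails.",
           "Oh wow"),
  ReferenceText = c("NA",
                    "NA",
                    "How do you sign up?",
                    "Where did this troll come from? https://example.com/joseph",
                    "How do you sign up? ",
                    "NA"),
  stringsAsFactors = FALSE
)

# The sample stores missing references as the string "NA"; turn them into real NA.
ref <- trimws(df$ReferenceText)
ref[ref == "NA"] <- NA_character_
txt <- trimws(df$Text)

# Look up each distinct reference only once. startsWith() is a literal test,
# so "?" and URLs need no regex escaping, and truncated references still match.
uniq_ref <- unique(ref[!is.na(ref)])
first_hit <- vapply(uniq_ref, function(r) {
  hit <- which(startsWith(txt, r))
  if (length(hit)) df$ID[hit[1]] else NA_integer_
}, integer(1), USE.NAMES = FALSE)

# Map every row's reference back to the ID of the post it quotes.
df$PostID <- first_hit[match(ref, uniq_ref)]

# Number the replies to each post in row order (assumed chronological).
df$Sequence <- NA_integer_
is_reply <- !is.na(df$PostID)
df$Sequence[is_reply] <- ave(seq_len(nrow(df))[is_reply],
                             df$PostID[is_reply], FUN = seq_along)

print(df[, c("ID", "PostID", "Sequence")])
```

Because the loop runs over distinct references rather than over all rows, the cost grows with the number of distinct quoted posts times the number of rows, which may be acceptable for 160,000 observations if most replies quote a modest number of posts. If every reference were an exact copy of its post, `match(ref, txt)` alone would replace the vapply() step.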