Merging strings and timestamps based on a condition in R
I have transcripts of speech with timestamps:
df
line speaker utterance timestamp
1 0001 ID16.1 ah-ha 00:00:07.060 - 00:00:07.660
3 0002 <NA> yes 00:00:07.964 - 00:00:08.610
5 0003 <NA> okay so where do we know each other from 00:00:16.350 - 00:00:22.170
7 0004 ID16.2 U uh Upper Rhine Cruises? maybe? 00:00:23.400 - 00:00:26.600
9 0005 ID16.3 yeah? ((pause)) well I do n't- 00:00:26.305 - 00:00:28.210
11 0006 ID16.1 (...) Meg? 00:00:27.385 - 00:00:29.305
13 0007 <NA> do you know Meg? 00:00:29.100 - 00:00:33.879
I have been trying to solve this with paste0, dplyr::lag and dplyr::lead, but have not made any progress.
Reproducible data:
df <- structure(list(line = c("0001", "0002", "0003", "0004", "0005",
"0006", "0007"), speaker = c("ID16.1", NA, NA, "ID16.2",
"ID16.3", "ID16.1", NA), utterance = c("ah-ha", "yes",
"okay so where do we know each other from",
"U uh Upper Rhine Cruises? maybe? ", "yeah? ((pause)) well I do n't-",
"(...) Meg?", "do you know Meg?"
), timestamp = c("00:00:07.060 - 00:00:07.660", "00:00:07.964 - 00:00:08.610",
"00:00:16.350 - 00:00:22.170", "00:00:23.400 - 00:00:26.600",
"00:00:26.305 - 00:00:28.210", "00:00:27.385 - 00:00:29.305",
"00:00:29.100 - 00:00:33.879")), row.names = c(1L, 3L, 5L, 7L,
9L, 11L, 13L), class = "data.frame")
Try dplyr::group_by. FYI, the data you display differs from df, which changes the aggregation.
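library(dplyr)

df %>%
  # a non-NA speaker starts a new group; NA rows join the preceding speaker
  group_by(notna = cumsum(!is.na(speaker))) %>%
  summarise(
    line = first(line),
    speaker = first(speaker),
    utterance = paste(utterance, collapse = " "),
    # keep the first start time and the last end time of each group
    timestamp = paste(unlist(strsplit(timestamp, "[ -]+"))[c(1, n() * 2)], collapse = " - "),
    .groups = "drop"
  ) %>%
  select(-notna)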
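`summarise()` ungrouping output (override with `.groups` argument)
# A tibble: 4 x 4
  line  speaker utterance                                          timestamp
1 0001  ID16.1  ah-ha yes okay so where do we know each other from 00:00:07.060 - 00:00:22.170
2 0004  ID16.2  U uh Upper Rhine Cruises? maybe?                   00:00:23.400 - 00:00:26.600
3 0005  ID16.3  yeah? ((pause)) well I do n't-                     00:00:26.305 - 00:00:28.210
4 0006  ID16.1  (...) Meg? do you know Meg?                        00:00:27.385 - 00:00:33.879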
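To see how the two key steps behave, here is a minimal base R sketch using the speaker and timestamp columns from the question. The names `turn` and `merge_span` are made up for illustration and are not part of the answer above. The sketch assumes that rows with an NA speaker always belong to the nearest preceding non-NA speaker, and that the last row of a group carries the end time you want (it does not look for the latest end time).

```r
# Columns copied from the question's df.
speaker <- c("ID16.1", NA, NA, "ID16.2", "ID16.3", "ID16.1", NA)
timestamp <- c("00:00:07.060 - 00:00:07.660", "00:00:07.964 - 00:00:08.610",
               "00:00:16.350 - 00:00:22.170", "00:00:23.400 - 00:00:26.600",
               "00:00:26.305 - 00:00:28.210", "00:00:27.385 - 00:00:29.305",
               "00:00:29.100 - 00:00:33.879")

# !is.na(speaker) is TRUE where a new turn starts; cumsum() turns that into
# a running counter, so NA rows share the id of the preceding speaker row.
turn <- cumsum(!is.na(speaker))
print(turn)  # 1 1 1 2 3 4 4

# Hypothetical helper: split every "start - end" string on runs of spaces
# and hyphens, then keep the first start and the last end of the group.
merge_span <- function(ts) {
  parts <- unlist(strsplit(ts, "[ -]+"))
  paste(parts[c(1, length(parts))], collapse = " - ")
}

# One merged timestamp per turn.
merged <- vapply(split(timestamp, turn), merge_span, character(1))
print(unname(merged))
```

Because the group id only increases when a speaker is present, the same speaker ID appearing again later (ID16.1 in rows 1 and 6) still yields two separate groups.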
Thanks for the hint. I have changed the displayed data.