How do I convert a character object (a parsed web page) into a tidy object in R?

I have parsed text like the following and want to store it in a tidy object:

clear.text
[1] "Alan Turing\n\nFrom Wikipedia, the free encyclopedia\n\nJump to navigation\tJump to search\n\n\"Turing\" redirects here. For other uses, see Turing (disambiguation).\n\nmathematician and computer scientist\n\nAlan Turing\n\nOBE FRS\n\nTuring aged 16\n\nBorn (1912-06-23)23 June 1912\n\nM...

The result is:

'tidy.character' is deprecated.

# A tibble: 1 x 1
                                                                                 x
                                                                             <chr>
1 "Alan Turing\n\nFrom Wikipedia, the free encyclopedia\n\nJump to navigation\tJum

So how can I convert plain text like this into a tidy format?

Thanks for your help.

If you have a Wikipedia link or other HTML, the unnest_tokens() function in tidytext can parse and tidy it directly:

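library(tidytext)
library(tidyverse)

# Read the raw HTML line by line, wrap it in a one-column tibble,
# and let unnest_tokens() strip the markup and split it into words.
read_lines("https://en.wikipedia.org/wiki/Alan_Turing") %>%
  tibble(text = .) %>%
  unnest_tokens(word, text, format = "html")
#> # A tibble: 15,460 x 1
#>    word
#>    <chr>
#>  1 alan
#>  2 turing
#>  3 wikipedia
#>  4 this
#>  5 is
#>  6 a
#>  7 good
#>  8 article
#>  9 follow
#> 10
#> # … with 15,450 more rows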

Created on 2018-12-18 by the reprex package (v0.2.1)
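If the text is already in memory as a character vector, as in the question, the same idea should work without downloading anything. The following is a minimal sketch, assuming a shortened stand-in for `clear.text`; the names `tidy.words` and `tidy.lines` are made up for illustration and are not from the original post.

```r
library(dplyr)
library(tidytext)

# Shortened stand-in for the `clear.text` string from the question (assumption)
clear.text <- "Alan Turing\n\nFrom Wikipedia, the free encyclopedia\n\nmathematician and computer scientist"

# Wrap the character vector in a one-column tibble, then emit one word per row.
# unnest_tokens() lowercases tokens by default.
tidy.words <- tibble(text = clear.text) %>%
  unnest_tokens(word, text)

# Alternative: one row per non-empty line, splitting on runs of newlines
# and keeping the original capitalization.
tidy.lines <- tibble(text = clear.text) %>%
  unnest_tokens(line, text, token = "regex", pattern = "\n+", to_lower = FALSE)

print(tidy.words)
print(tidy.lines)
```

Which token unit to use depends on the analysis: words suit counting and sentiment work, while lines or paragraphs keep more of the page structure.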

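sessionInfo() output in a code block, along with all the necessary library() calls, is very handy for reproducing your example. Also, please consider using textreadr::read_html instead of the htm2txt package, because htm2txt is super dangerous (it uses regular expressions to mangle HTML content, and that may eventually hurt you).

What do you mean by a "tidy object"? I don't have htm2txt installed, but the deprecation warning says that you are calling tidy on a character vector. What kind of result do you want to get?

I know this is not the right place to ask, but I don't know how else to contact you: could you answer a question about career guidance for data engineers?
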
tidy.text <- tidy(clear.text)

'tidy.character' is deprecated.

# A tibble: 1 x 1
                                                                                 x
                                                                             <chr>
1 "Alan Turing\n\nFrom Wikipedia, the free encyclopedia\n\nJump to navigation\tJum
>
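
Following the comment's advice to use a real HTML parser rather than regular expressions, here is a hedged sketch that extracts paragraph text with rvest and then tidies it. It assumes rvest 1.0 or later (for `html_elements()` and `html_text2()`); the inline HTML snippet and the name `tidy.paragraphs` are invented for illustration and stand in for a downloaded page.

```r
library(rvest)
library(dplyr)
library(tidytext)

# Small inline HTML document standing in for a downloaded page (assumption)
html <- "<html><body><h1>Alan Turing</h1><p>Mathematician and computer scientist.</p><p>Born 23 June 1912.</p></body></html>"

# Parse the markup and pull out the visible text of each <p> element
paragraphs <- read_html(html) %>%
  html_elements("p") %>%
  html_text2()

# Keep a paragraph index so each word can be traced back to its source paragraph
tidy.paragraphs <- tibble(paragraph = seq_along(paragraphs), text = paragraphs) %>%
  unnest_tokens(word, text)

print(tidy.paragraphs)
```

Selecting elements with a CSS selector such as "p" also leaves out navigation text like "Jump to navigation", which a plain HTML-to-text conversion would keep.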