Scraping data from a PDF using R


r pdf web-scraping


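I want to extract data from this PDF (ski jumping results).

I am interested in all of the data except the bib, club, and date of birth.
I am trying to use the pdftools library:

pdf_text("raw/data.pdf") %>% strsplit(split = "\n")
This is where I am stuck. The problem is that the points column (gate compensation) is sometimes empty and sometimes not, and I don't know how to handle that.
My desired output looks like this: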

Rank|Athlete       |Nation|(...)|Jump_1|Round_1|Jump_2|Round_2|Tot_points
1   |KLIMOV Evgeniy|RUS   |(...)|127.5 |130    |131.5 |133.4  |263.4
Can someone help me?

Please have a look at this:

library(tidyverse)

text <- pdftools::pdf_text("http://medias4.fis-ski.com/pdf/2019/JP/3088/2019JP3088RL.pdf")

list <- str_remove_all(text, "\\X+?TOTAL\\s+RANK\n") %>%
  str_trim() %>%
  str_split("\n\\s{10,}(?=\\p{L})") %>%
  modify_depth(1, ~ str_split(.x, "\\s{2,}") %>%
    map(~ .x[1:13] %>% set_names(paste0("x", 1:13))))

## Just the first page
df <- bind_rows(!!!list[[1]])
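The `.x[1:13]` step in the code above appears to rely on R returning `NA` when a vector is indexed past its end, so rows with fewer fields still end up with the same number of columns. Here is a minimal base R sketch of that idea. The sample rows, the second athlete, and the column names `x1` to `x5` are made up for illustration and are not taken from the PDF.

```r
# Hypothetical sample rows (not read from the PDF); the second row lacks its last field.
rows <- c(
  "1   KLIMOV Evgeniy    RUS   127.5   130.0",
  "2   ATHLETE Two    NOR   126.0"
)

# Split each row on runs of two or more whitespace characters,
# so single spaces inside a name do not break the field apart.
fields <- strsplit(trimws(rows), "\\s{2,}", perl = TRUE)

# Pad every row to the same length; indexing past the end yields NA.
n_cols <- max(lengths(fields))
padded <- lapply(fields, function(x) x[seq_len(n_cols)])

# Bind the rows into a character data frame with placeholder column names.
df <- as.data.frame(do.call(rbind, padded), stringsAsFactors = FALSE)
names(df) <- paste0("x", seq_len(n_cols))
```

Note that this only pads at the end of a row. If the empty gate compensation value sits in the middle of a row, the later fields would shift left, so the resulting columns should be checked against the PDF before relying on them.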

