How to read an (.RDS) file from a URL in Python?


python r


I am trying to get data from nflfastR. My equivalent R code is:

data <- readRDS(url('https://raw.githubusercontent.com/guga31bb/nflfastR-data/master/data/play_by_play_2019.rds'))
data

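Rds and Rdata files are difficult to read in languages other than R because the format, although open, is undocumented. There are therefore not many options for reading them in Python. One is what you are proposing. Another is to use pyreadr, but you have to download the file to disk first, because pyreadr cannot read directly from a URL: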

import pyreadr
from urllib.request import urlopen

link = "https://raw.githubusercontent.com/guga31bb/nflfastR-data/master/data/play_by_play_2019.rds"

# pyreadr cannot read from a URL, so save the file to disk first
with urlopen(link) as response, open("play_by_play_2019.rds", "wb") as fhandle:
    fhandle.write(response.read())

result = pyreadr.read_r("play_by_play_2019.rds")
print(result.keys())  # an .rds file holds one object, stored under the key None

Edit

pyreadr 0.3.7 now includes a function to download files:

import pyreadr

url = "https://github.com/hadley/nycflights13/blob/master/data/airlines.rda?raw=true"
dst_path = "/some/path/on/disk/airlines.rda"
# download_file returns the destination path, which read_r then parses
res = pyreadr.read_r(pyreadr.download_file(url, dst_path))

If you only want to read the nflfastR data, you can read it directly in Python as follows:

import pandas as pd
pd.read_csv('https://github.com/guga31bb/nflfastR-data/blob/master/data/' \
                         'play_by_play_2019.csv.gz?raw=True',
                         compression='gzip', low_memory=False)

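So far, however, there is no way to do this purely in Python. Reading a local (.rds) file is hard enough, and reading one from a URL is something I have never seen implemented. You therefore have to download the file locally first; after that you can read it directly with the pyreadr package you mentioned, or with rpy2 (if you have R installed).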
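The download-then-read approach described above could be wrapped up roughly as follows. This is only a sketch: the helper names `download_to_tempfile` and `read_rds_from_url` are made up for illustration, and it assumes pyreadr is installed and that the file is a plain .rds holding a single data frame.

```python
import shutil
import tempfile
from pathlib import Path
from urllib.request import urlopen


def download_to_tempfile(url, suffix=".rds"):
    """Stream the resource at `url` into a temporary file and return its path."""
    with urlopen(url) as response:
        # delete=False keeps the file on disk after the handle is closed
        with tempfile.NamedTemporaryFile(suffix=suffix, delete=False) as tmp:
            shutil.copyfileobj(response, tmp)
            return Path(tmp.name)


def read_rds_from_url(url):
    """Download an .rds file and parse it with pyreadr (hypothetical helper)."""
    import pyreadr  # third-party; assumed to be installed

    path = download_to_tempfile(url)
    try:
        result = pyreadr.read_r(str(path))
    finally:
        path.unlink()  # remove the temporary copy
    # An .rds file holds a single unnamed object, stored under the key None
    return result[None]
```

Calling `read_rds_from_url(link)` with the play-by-play URL from the question should then return a pandas DataFrame. Writing to a temporary file avoids leaving a copy in the working directory, at the cost of downloading again on every call.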

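In R, unlike Python, you do not have to qualify each function with the package it comes from unless you run into a name conflict. Also, R has no built-in methods: every function you call lives in a package. R does, however, ship with several default packages for routine work, such as utils, base, and stats. Specifically, your working R code calls two functions from the base package, which can be written with the double-colon qualifier as base::url and base::readRDS.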
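Since both calls live in base, one way to reuse them from Python is to let rpy2 evaluate the same R expression in an embedded R session. The following is a rough sketch, not a definitive implementation: it assumes R, rpy2, and pandas are installed, the function names `build_r_expression` and `read_rds_via_r` are hypothetical, and the exact conversion behaviour may differ between rpy2 versions.

```python
def build_r_expression(url):
    """Return R source that reads an .rds file from `url` with base-qualified calls."""
    # Keep the URL from breaking out of the R string literal
    if '"' in url or "\\" in url:
        raise ValueError("URL must not contain quotes or backslashes")
    return f'base::readRDS(base::url("{url}"))'


def read_rds_via_r(url):
    """Evaluate the expression in embedded R and return the converted result."""
    # rpy2 and a local R installation are assumed; imported lazily
    import rpy2.robjects as ro
    from rpy2.robjects import pandas2ri
    from rpy2.robjects.conversion import localconverter

    # Inside this context, R data frames are converted to pandas DataFrames
    with localconverter(ro.default_converter + pandas2ri.converter):
        return ro.r(build_r_expression(url))
```

Here R itself performs the download and the parsing, so nothing needs to be written to disk from the Python side.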

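The URL helped us. But I am still curious about how to read an (.Rds) file from a URL.
@RetroInvader, see my answer below on how to read an .Rds file from a URL using rpy2.
What is wrong with your code? Please post the error or the undesired results. Also, please post the library lines used in the working R code; we need to know where url comes from.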