Downloading a file directly from an RSS feed with Ruby, handling redirects


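I am writing a program in Ruby that downloads a file from an RSS feed to my local hard drive. I previously wrote this application in Perl and figured that a good way to learn Ruby would be to recreate it in Ruby.

In the Perl program (which works), I can download the original file directly from the server that hosts it, keeping the original file name, and it works great. In the Ruby program (which does not work), I have to "stream" the data from the file I want into a new file that I create on my hard drive. Unfortunately, this does not work: the "streamed" data always comes back empty. My assumption is that Perl handles some kind of redirect to retrieve the file directly, and Ruby does not.

I am posting both programs (they are relatively small) in the hope that this helps solve my problem. If you have questions, please let me know. As a side note, when I pointed this program at a more static URL (a JPEG), it downloaded the file just fine. That is why I suspect some kind of redirect is causing the problem.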

Ruby code (does not work)

Perl code (works)


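get_response returns an instance of a class from the HTTPResponse hierarchy. Usually that is HTTPSuccess, but when there is a redirect it is HTTPRedirection. A simple recursive method that follows the redirects solves this. How to handle it properly is covered in the Net::HTTP documentation under the heading "Following Redirection".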
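The recursive approach described above could look roughly like the sketch below. The method name `fetch_following_redirects` and the default limit of 5 are illustrative choices, not something taken from the original post; the limit is there so that a redirect loop cannot recurse forever.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper (name and default limit are assumptions):
# fetches a URL and follows up to `limit` HTTP redirects.
def fetch_following_redirects(url, limit = 5)
  raise ArgumentError, 'too many HTTP redirects' if limit <= 0

  response = Net::HTTP.get_response(URI.parse(url))

  case response
  when Net::HTTPSuccess
    response
  when Net::HTTPRedirection
    # The Location header may be relative, so resolve it against the current URL.
    next_url = URI.join(url, response['location']).to_s
    fetch_following_redirects(next_url, limit - 1)
  else
    # Raises an HTTP error for any other non-2xx response.
    response.value
  end
end
```

With a helper like this, the body of the final response could be written to disk with something like `File.binwrite("#{episode_id}.torrent", fetch_following_redirects(torrent_url).body)`, where `episode_id` and `torrent_url` are the variables from the question.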

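Yes, the URL you are retrieving appears to return a 302 (redirect). Net::HTTP requires/allows you to handle redirects yourself. You would typically use a recursive technique like the one AboutRuby mentioned (although it has been suggested that you should check not only the "Location" field but also look for a meta refresh in the response).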
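As a rough illustration of the meta refresh point, the sketch below pulls the target URL out of an HTML `<meta http-equiv="refresh">` tag with a regular expression. The constant and method names are hypothetical, and a regex is only an approximation: it assumes `http-equiv` comes before `content`, and a real HTML parser would be more robust.

```ruby
# Hypothetical pattern (an assumption, not from the original post): matches
# <meta http-equiv="refresh" content="N; url=TARGET"> and captures TARGET.
META_REFRESH = /<meta[^>]*http-equiv\s*=\s*["']?refresh["']?[^>]*content\s*=\s*["']\s*\d+\s*;\s*url\s*=\s*["']?([^"'>\s]+)/i

# Returns the meta refresh target URL found in `html`, or nil if there is none.
def meta_refresh_target(html)
  match = META_REFRESH.match(html)
  match && match[1]
end
```

A redirect-following loop could call this on the body of a successful HTML response and fetch the returned URL when it is not nil.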

If you are not interested in the low-level interaction, open-uri will handle the redirects for you:

require 'open-uri'

# URI.open follows redirects; since Ruby 3.0, plain Kernel#open no longer accepts URLs.
File.open("#{episode_id}.torrent", 'wb') { |torrent_file| torrent_file.write(URI.open(torrent_url).read) }
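For larger files, a variant of the one-liner above that copies the download to disk in chunks, rather than reading it all into one string, might look like the following. The helper name `download_to_file` is an assumption made for illustration; `URI.open` and `IO.copy_stream` are standard Ruby.

```ruby
require 'open-uri'

# Hypothetical helper (the name is an assumption): downloads `url` to `path`.
# open-uri follows redirects; the block form closes both handles when done.
def download_to_file(url, path)
  URI.open(url) do |remote|
    File.open(path, 'wb') { |local| IO.copy_stream(remote, local) }
  end
  path
end
```

Using the variables from the question, the call would be something like `download_to_file(torrent_url, "#{episode_id}.torrent")`.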

# Perl program from the question (works)
use strict;
use XML::Parser;
use LWP::UserAgent;
use HTTP::Status;
use DBI;
my $dbh = DBI->connect("dbi:SQLite:dbname=fiend.db", "", "", { RaiseError => 1, AutoCommit => 1 });
my $userAgent = new LWP::UserAgent; # Create new user agent
$userAgent->agent("Mozilla/4.0"); # Spoof our user agent as Mozilla
$userAgent->timeout(20); # Set timeout limit for request
my $currentTag = ""; # Stores what tag is currently being parsed
my $torrentUrl = ""; # Stores the data found in any <link> node
my $isDownloaded = 0; # 1 or zero that states whether or not we've downloaded a particular episode
my $shows = $dbh->selectall_arrayref("SELECT id, name, current_season, last_episode FROM shows ORDER BY name");
my $id = 0;
my $name = "";
my $season = 0;
my $last_episode = 0;
foreach my $show (@$shows) { 
    $isDownloaded = 0;
    ($id, $name, $season, $last_episode) = (@$show);
    $season = sprintf("%02d", $season); # Zero-pad the season to two digits (e.g. 6 becomes 06)
    $last_episode = sprintf("%02d", ($last_episode + 1)); # Increment the last episode by one and zero-pad it to two digits (e.g. 6 becomes 07)
    print("Checking $name S" . $season . "E" . "$last_episode \n"); 
    my $request = new HTTP::Request(GET => "http://btjunkie.org/rss.xml?query=$name S" . $season . "E" . $last_episode . "&o=52"); # Retrieve the torrent feed
    my $rssFeed = $userAgent->request($request);  # Store the feed in a variable for later access
    if($rssFeed->is_success) { # We retrieved the feed
        my $parser = new XML::Parser(); # Make a new instance of XML::Parser
        $parser->setHandlers # Set the functions that will be called when the parser encounters different kinds of data within the XML file.
        (
            Start => \&startHandler, # Handles start tags (e.g. <title>)
            End   => \&endHandler, # Handles end tags (e.g. </title>)
            Char  => \&DataHandler # Handles data inside of start and end tags
        );
        $parser->parsestring($rssFeed->content); # Parse the feed
    }
}

#
# Called every time XML::Parser encounters a start tag
# @param: $parseInstance {object} | Instance of the XML::Parser. Passed automatically when feed is parsed. 
# @param: $element {string} | The name of the XML element being parsed (e.g. "title"). Passed automatically when feed is parsed. 
# @attributes {array} | An array of all of the attributes of $element
# @returns: void
#
sub startHandler {
    my($parseInstance, $element, %attributes) = @_;
    $currentTag = $element;
}
#
# Called every time XML::Parser encounters anything that is not a start or end tag (i.e, all the data in between tags)
# @param: $parseInstance {object} | Instance of the XML::Parser. Passed automatically when feed is parsed. 
# @param: $element {string} | The character data found between tags (despite the variable name). Passed automatically when feed is parsed. 
# Note: XML::Parser passes no attributes to this handler, so %attributes is always empty here
# @returns: void
#
sub DataHandler {
    my($parseInstance, $element, %attributes) = @_;
    if($currentTag eq "link" && $element ne "\n") {
        $torrentUrl = $element;
    }
}
#
# Called every time XML::Parser encounters an end tag
# @param: $parseInstance {object} | Instance of the XML::Parser. Passed automatically when feed is parsed. 
# @param: $element {string} | The name of the XML element being parsed (e.g. "title"). Passed automatically when feed is parsed. 
# Note: XML::Parser passes no attributes to this handler, so %attributes is always empty here
# @returns: void
#
sub endHandler {
    my($parseInstance, $element, %attributes) = @_;
    if($element eq "item" && $isDownloaded == 0) { # We just finished parsing an <item> element so let's attempt to download a torrent
        print("DOWNLOADING: $torrentUrl" . "/download.torrent \n");
        system("echo.|lwp-download " . $torrentUrl . "/download.torrent"); # We echo the "return" key into the command to force it to skip any file-overwrite prompts
        if(unlink("download.torrent.html")) { # We tried to download a 'locked' torrent
            $isDownloaded = 0; # Forces program to download next torrent on list from current show
        }
        else {
            $isDownloaded = 1;
            $dbh->do("UPDATE shows SET last_episode = ? WHERE id = ?", undef, $last_episode, $id); # Update DB with new show information (bound parameters avoid quoting problems)
        }   
    }
}