PHP: How to extract the news content, like Safari's "Reader mode"

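Safari has a "Reader mode" that strips everything from a web page except the article text.

Now I need to fetch a site's HTML source and then use PHP to extract the actual news content, the way Safari's "Reader mode" does.
Can you help me?

Someone pointed out that just posting a link to another post is not very helpful, so I am updating this answer. Since then I have started using a PHP port of Arc90's Readability, and it works very well.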

Here is a link to the PHP port of Readability.js:

Here is a simple implementation example:

$url = 'http://';
$html = file_get_contents($url);

if (function_exists('tidy_parse_string')) {
    $tidy = tidy_parse_string($html, array(), 'UTF8');
    $tidy->cleanRepair();
    $html = $tidy->value;
}

// give it to Readability
$readability = new Readability($html, $url);
// echo $readability->html;
// echo htmlspecialchars($tidy($readability->html, true));

// print debug output?
// useful to compare against Arc90's original JS version -
// simply click the bookmarklet with FireBug's console window open
$readability->debug = false;
// convert links to footnotes?
$readability->convertLinksToFootnotes = false;

$readability->lightClean = false;
// $readability->revertForcedParagraphElements = false;

// process it
$result = $readability->init();
// store reference to dom content processed by Readability
$content = $readability->getContent();

echo '<h1>'.$readability->getTitle()->textContent.'</h1>';
echo $content->innerHTML;
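
// Variant: fetch the page with a custom helper instead of file_get_contents().
// Note: getData() is not defined anywhere in this post.
$url = 'http://';
//$html = file_get_contents($url);
$html = getData($url);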

if (function_exists('tidy_parse_string')) {
    $tidy = tidy_parse_string($html, array(), 'UTF8');
    $tidy->cleanRepair();
    $html = $tidy->value;
}

$readability = new Readability($html, $url);

//...
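
The second snippet above calls getData(), which the post never shows. The following is only a guess at what such a helper could look like, assuming the cURL extension is installed; the function body, the timeout parameter, and the user-agent string are all hypothetical and not taken from the original answer.

```php
<?php
// Hypothetical stand-in for the undefined getData() helper.
// Assumes the cURL extension is loaded; all option values are illustrative.
function getData(string $url, int $timeout = 15): string
{
    $ch = curl_init($url);
    if ($ch === false) {
        throw new RuntimeException('Could not initialise cURL');
    }

    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,      // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,      // follow HTTP redirects
        CURLOPT_MAXREDIRS      => 5,
        CURLOPT_CONNECTTIMEOUT => $timeout,
        CURLOPT_TIMEOUT        => $timeout,
        CURLOPT_ENCODING       => '',        // accept any encoding cURL supports
        CURLOPT_USERAGENT      => 'Mozilla/5.0 (compatible; ExampleFetcher/1.0)',
    ]);

    $body = curl_exec($ch);
    if ($body === false) {
        // Surface the transport error instead of handing an empty page to the parser.
        throw new RuntimeException('Fetch failed: ' . curl_error($ch));
    }

    return $body;
}
```

Compared with file_get_contents(), a helper like this makes it easier to set timeouts, follow redirects, and send a browser-like user agent, which some news sites may require before they serve the full page.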


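It is based on an algorithm that tries to identify the main content section of a site. There is no clear standard for this; you have to experiment and implement it yourself.

Please add the essential parts of the solution. If the link goes dead, your answer will be lost.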
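
To make the comment about "an algorithm that tries to identify the main content section" more concrete, here is a deliberately simplified scoring heuristic. It is not the actual Readability algorithm; the function name extractMainContent(), the candidate tags, the 25-byte threshold, and the weights are all assumptions chosen for illustration. The idea: reward containers whose direct paragraphs hold long, comma-rich text, and penalise containers whose text is mostly links.

```php
<?php
// Simplified content-scoring sketch. NOT the real Readability algorithm;
// thresholds and weights are illustrative assumptions.
function extractMainContent(string $html): ?string
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);                 // tolerate malformed real-world markup
    $doc->loadHTML('<?xml encoding="UTF-8">' . $html); // hint the parser that input is UTF-8
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);

    // Remove elements that rarely hold article text.
    $junk = $xpath->query('//script|//style|//nav|//footer|//aside|//form');
    foreach (iterator_to_array($junk) as $node) {
        $node->parentNode->removeChild($node);
    }

    $best = null;
    $bestScore = 0.0;

    foreach ($xpath->query('//div|//article|//section|//td') as $candidate) {
        $score = 0.0;

        // Score only direct <p> children, so the innermost wrapper wins.
        foreach ($xpath->query('./p', $candidate) as $p) {
            $text = trim($p->textContent);
            $len = strlen($text);                     // byte length; good enough for a heuristic
            if ($len < 25) {
                continue;                             // skip captions, bylines, etc.
            }
            $score += 1 + substr_count($text, ',') + min(floor($len / 100), 3);
        }

        // Link density: share of the container's text that sits inside <a> tags.
        $textLen = strlen(trim($candidate->textContent));
        $linkLen = 0;
        foreach ($xpath->query('.//a', $candidate) as $a) {
            $linkLen += strlen(trim($a->textContent));
        }
        $linkDensity = $textLen > 0 ? $linkLen / $textLen : 1.0;
        $score *= (1 - $linkDensity);

        if ($score > $bestScore) {
            $bestScore = $score;
            $best = $candidate;
        }
    }

    // Return the winning container's HTML, or null if nothing scored.
    return $best === null ? null : ($doc->saveHTML($best) ?: null);
}
```

A real implementation would also need to handle text that is not wrapped in paragraph tags, class and id name hints, and sibling merging, which is why using an existing port is usually the more practical route.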