PHP implode result still not as expected

I am processing some text files with the following code:

Code:

$file = file($files);
$lines = str_replace("'", '', $file);
$noMultipleSpace = removeMultipleSpaces($lines);
$fileContents = array();
foreach($noMultipleSpace as $line) {
    if (isLatin($line) && count(preg_split('/\s+/', $line)) > 25) {
        $newContent = preg_split('/\\.\\s*/', $line);
        foreach($newContent as $newsContent) {
            $pos1 = stripos($newsContent, ':');
            if ($pos1 == false && count(preg_split('/\s+/', $newsContent) > 3) && isLatin($newsContent)) {
                $fileContents[] = $newsContent;
            }
        }
        $content = implode('.', $fileContents);
    }
}
with the following helper functions:

function isLatin($string) {
 return preg_match('/^\\s*[a-z,A-Z]/', $string) > 0;
}

function removeMultipleSpaces($string){
 return preg_replace('/\s+/', ' ',$string);
}
However, after the implode the dot gets stuck to the next sentence, for example:
sentence1.Sentence2
What I expect is: sentence1. Sentence2. What is wrong? Thanks :)

The input is a text file, for example:

ChengXiang Zhai
Department of Computer Science University of Illinois at Urbana Champaign

ABSTRACT
Temporal Text Mining (TTM) is concerned with discovering temporal patterns in text 
information collected over time. Since most text information bears some time stamps, TTM has many applications in multiple domains, such as summarizing events in news articles and
revealing research trends in scientific literature. In this paper, we study a particular TTM 
task ­ discovering and summarizing the evolutionary patterns of themes in a text stream. We
define this new text mining problem and present general probabilistic methods for solving
this problem through (1) discovering latent themes from text; (2) constructing an evolution
graph of themes; and (3) analyzing life cycles of themes. Evaluation of the proposed methods
on two different domains (i.e., news articles and literature) shows that the proposed 
methods can discover interesting evolutionary theme patterns effectively. Categories and 
Subject Descriptors: H.3.3 [Information Search and Retrieval]: Clustering General Terms: 
Algorithms Keywords: Temporal text mining, evolutionary theme patterns, theme threads, 
clustering

1.

INTRODUCTION

I only want to extract the important sentences, from
Temporal Text Mining (TTM)…
up to
effectively

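Some of your sentences appear to end with a trailing space, so the implode separator ends up after that space instead of directly after the sentence. The space most likely comes from the newline that file() keeps at the end of each line, which removeMultipleSpaces() turns into a single space.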
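A minimal sketch of that effect, using two wrapped lines from the sample input as a hypothetical stand-in for what file() returns. The variable names here are illustrative only; the regexes are the ones from the question.

```php
<?php
// Hypothetical sample: two wrapped lines of the abstract, as file() returns them
// (file() keeps the trailing "\n" unless FILE_IGNORE_NEW_LINES is passed).
$lines = array(
    "Temporal Text Mining (TTM) is concerned with discovering temporal patterns in text\n",
    "information collected over time. Since most text information bears some time stamps\n",
);

$fragments = array();
foreach ($lines as $line) {
    // Same regex as removeMultipleSpaces(): the trailing "\n" becomes a trailing " ".
    $line = preg_replace('/\s+/', ' ', $line);
    // Same split as the question: the dot and any whitespace after it are removed.
    foreach (preg_split('/\.\s*/', $line) as $piece) {
        $fragments[] = $piece;
    }
}

// Without trim(): "... patterns in text .information collected over time.Since ... time stamps "
echo implode('.', $fragments), "\n";

// With trim(): the stray space before the dot is gone ("... in text.information ...").
echo implode('.', array_map('trim', $fragments)), "\n";
```

Trimming removes the space in front of the dot, but implode('.') still puts no space after it, and a sentence that wraps onto the next line ("text" / "information" here) gets a dot inserted at the line break.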

Try this:

$file = file($files);
$lines = str_replace("'", '', $file);
$noMultipleSpace = removeMultipleSpaces($lines);
$fileContents = array();
foreach ($noMultipleSpace as $line) {
    if (isLatin($line) && count(preg_split('/\s+/', $line)) > 25) {
        $newContent = preg_split('/\\.\\s*/', $line);
        foreach ($newContent as $newsContent) {
            // Use === : stripos() returns 0 when ':' is the first character, and 0 == false.
            $pos1 = stripos($newsContent, ':');
            // count() must close before "> 3"; otherwise it is called on a boolean.
            if ($pos1 === false && count(preg_split('/\s+/', $newsContent)) > 3 && isLatin($newsContent)) {
                $fileContents[] = $newsContent;
            }
        }
        // Strip the trailing space left by the newline at the end of each line.
        $fileContents = array_map('trim', $fileContents);
        $content = implode('.', $fileContents);
    }
}
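If the goal is exactly "sentence1. Sentence2.", trimming alone is not enough, because implode('.') never adds a space. A possible sketch, under assumptions not in the original: the whole file is read as one string (e.g. with file_get_contents()) so wrapped sentences stay intact, the wanted text follows an "ABSTRACT" heading as in the sample input, and extractSentences() is a hypothetical helper name. It splits only on a dot followed by whitespace or the end of the text, so "i.e.," and "H.3.3" are not cut apart.

```php
<?php
// Hypothetical helper, not part of the original code: process the whole file as ONE string
// so that sentences wrapping across lines stay intact, then rejoin them with ". ".
function extractSentences(string $text): string
{
    // Collapse every whitespace run (newlines included) into a single space.
    $text = trim(preg_replace('/\s+/', ' ', $text));

    // Assumption: the wanted sentences follow an "ABSTRACT" heading, as in the sample input.
    $start = strpos($text, 'ABSTRACT');
    if ($start !== false) {
        $text = substr($text, $start + strlen('ABSTRACT'));
    }

    // Split on a dot followed by whitespace or the end of the text,
    // so "i.e.," and "H.3.3" stay intact.
    $pieces = preg_split('/\.(?:\s+|$)/', $text);

    $kept = array();
    foreach ($pieces as $piece) {
        $piece = trim($piece);
        // Filters modelled on the question: no colon, more than 3 words, starts with a Latin letter.
        if ($piece !== ''
            && strpos($piece, ':') === false
            && count(preg_split('/\s+/', $piece)) > 3
            && preg_match('/^[a-zA-Z]/', $piece) === 1) {
            $kept[] = $piece;
        }
    }

    if (count($kept) === 0) {
        return '';
    }
    // ". " between sentences, plus a final "." because the split removed every sentence's dot.
    return implode('. ', $kept) . '.';
}
```

Usage would be something like echo extractSentences(file_get_contents($files)); which, for the sample input, should keep the sentences from "Temporal Text Mining (TTM)" through "effectively." and drop the "Categories and Subject Descriptors: ..." line because it contains a colon.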

What exactly are you trying to achieve? Please also provide a sample input.
@clentfort I've added it, thanks :)