Clean text in PHP

php parsing

This is a string:

--0-1946616131-1282798399=:21360 Content-Type: text/plain; charset=us-ascii --------------
------ do not change ---------------------------- Ticket ID : #987336 --------------------
------------------------------------------- Hello, This is my problem try to solve this 
thank u --0-1946616131-1282798399=:21360 Content-Type: text/html; charset=us-ascii"

Now I want to remove:

--0-1946616131-1282798399=:21360 Content-Type: text/plain; charset=us-ascii

and the corresponding text/html part from it. In other words, I want to clean up the text.

How can I do this?

You can either do this with a regular expression or try two splits. Here is the second option:

//the original string
$string = "--0-1946616131-1282798399=:21360 Content-Type: text/plain; charset=us-ascii -------------------- do not change ---------------------------- Ticket ID : #987336 --------------------------------------------------------------- Hello, This is my problem try to solve this thank u --0-1946616131-1282798399=:21360 Content-Type: text/html; charset=us-ascii";
//split the string into lines separated by --0-
$splitstring = explode("--0-",$string);
print "<pre>";
print_r($splitstring);
print "</pre>";
//create an array that will be our final clean strings
$cleanstrings = array();
//go through each of our lines
foreach($splitstring as $line){
    //if it has content
    if (strlen($line)>0) {
        //then split it again to get rid of the junk sections
        $splitline = explode("charset=us-ascii",$line);
        //if the second part of the string has content
        if (isset($splitline[1]) && strlen($splitline[1])>0) {
            //then add it to our list of clean strings
            $cleanstrings[] = $splitline[1];
        }
    }
}
print "<pre>";
print_r($cleanstrings);
print "</pre>";

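Use this simple one-liner, where $text is the input text: a single str_replace call that replaces the unwanted marker string with an empty string.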
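A minimal sketch of that literal-replacement idea, assuming the marker strings are known in advance and never vary (which other replies on this page doubt). The helper name stripKnownMarkers and the $markers list are illustrative and not part of the original answer.

```php
<?php
// Hypothetical helper: removes known, fixed marker strings from a message body.
function stripKnownMarkers(string $text, array $markers): string
{
    // str_replace accepts an array of search strings and removes every occurrence.
    return trim(str_replace($markers, '', $text));
}

// Assumption: these two markers are exactly the ones seen in the sample string.
$markers = [
    '--0-1946616131-1282798399=:21360 Content-Type: text/plain; charset=us-ascii',
    '--0-1946616131-1282798399=:21360 Content-Type: text/html; charset=us-ascii',
];

$text = '--0-1946616131-1282798399=:21360 Content-Type: text/plain; charset=us-ascii '
      . 'Hello, This is my problem '
      . '--0-1946616131-1282798399=:21360 Content-Type: text/html; charset=us-ascii';

echo stripKnownMarkers($text, $markers), "\n";
```

This only works while the boundary text stays identical from message to message, so it is a stopgap rather than a parser.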

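Please clarify whether this string varies (and if so, how) or whether it is always the same.

Also, it looks as if something is going wrong at the point where you obtain this string in the first place. Or do you have no control over the incoming string?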

Functions to look at: …, … and …

This appears to be part of a multipart email message. If that is the case, the parts you want to remove are unpredictable.

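The break between the different parts should be specified in the message header, for example Content-Type: multipart/mixed; boundary="frontier".

Here boundary="frontier" means that each new part of the message is introduced by the line --frontier.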

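Since the sender of the message is completely free to choose any text as the boundary, you cannot predict it without looking at the message headers. Unless you have very specific boundaries, it is almost impossible to reliably remove the boundary text after the fact. The "cleanup" needs to happen while the message is being parsed.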
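As a rough sketch of what "cleaning up while parsing" could look like, the following reads the boundary from the Content-Type header and splits the body on it. It assumes the raw header value and the raw body are available with their line breaks intact. The helper names extractBoundary and splitMultipart are hypothetical, and nested multipart parts, folded headers and transfer encodings are deliberately ignored.

```php
<?php
// Hypothetical helper: read the boundary parameter from a Content-Type header value.
function extractBoundary(string $contentType): ?string
{
    // Matches boundary="frontier" as well as the unquoted form boundary=frontier.
    if (preg_match('/boundary\s*=\s*(?:"([^"]+)"|([^\s;]+))/i', $contentType, $m)) {
        return $m[1] !== '' ? $m[1] : $m[2];
    }
    return null;
}

// Hypothetical helper: split a multipart body into parts (headers and content).
function splitMultipart(string $body, string $boundary): array
{
    $parts = [];
    // Each part is introduced by "--" followed by the boundary text.
    $chunks = explode('--' . $boundary, $body);
    array_shift($chunks); // drop the preamble before the first delimiter
    foreach ($chunks as $chunk) {
        if (strncmp($chunk, '--', 2) === 0) {
            break; // "--boundary--" is the closing delimiter
        }
        // A blank line separates the part's headers from its content.
        $pieces = preg_split('/\R\R/', trim($chunk), 2);
        $parts[] = [
            'headers' => $pieces[0],
            'content' => $pieces[1] ?? '',
        ];
    }
    return $parts;
}

$header = 'multipart/mixed; boundary="frontier"';
$body = "This is the preamble.\r\n"
      . "--frontier\r\n"
      . "Content-Type: text/plain\r\n"
      . "\r\n"
      . "Hello, this is the plain text part.\r\n"
      . "--frontier\r\n"
      . "Content-Type: text/html\r\n"
      . "\r\n"
      . "<p>Hello</p>\r\n"
      . "--frontier--\r\n";

$boundary = extractBoundary($header);
$parts = $boundary === null ? [] : splitMultipart($body, $boundary);
foreach ($parts as $part) {
    // Keep only the plain-text part.
    if (stripos($part['headers'], 'text/plain') !== false) {
        echo $part['content'], "\n";
    }
}
```

Because the boundary is taken from the header rather than guessed, the same code works whatever text the sending mail system chose.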

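If you are dealing with a very limited, predictable set of boundaries, you should pin down their format and try to remove them with a regular expression.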
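A minimal sketch of that regular-expression route, assuming every boundary looks like the one in the sample (--0-, digits, a hyphen, digits, =:, digits) and is followed on the same line by a Content-Type header with a charset. The pattern is inferred from that single sample and the helper name stripPredictableBoundaries is hypothetical.

```php
<?php
// Hypothetical helper: strip boundary markers shaped like the one in the sample,
// together with the Content-Type header that follows each of them.
function stripPredictableBoundaries(string $text): string
{
    // Assumed format: --0-<digits>-<digits>=:<digits> Content-Type: <type>; charset=<name>
    $pattern = '/--0-\d+-\d+=:\d+\s+Content-Type:\s*[\w\/.+-]+;\s*charset=[\w-]+/i';
    $cleaned = (string) preg_replace($pattern, '', $text);
    // Collapse the whitespace left behind and trim the ends.
    return trim((string) preg_replace('/\s+/', ' ', $cleaned));
}

$text = '--0-1946616131-1282798399=:21360 Content-Type: text/plain; charset=us-ascii '
      . 'Hello, This is my problem try to solve this thank u '
      . '--0-1946616131-1282798399=:21360 Content-Type: text/html; charset=us-ascii';

echo stripPredictableBoundaries($text), "\n";
```

If a sender uses a different boundary format, the pattern simply does not match and the text is returned with its markers still in place.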

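This is easy with str_replace. However, I guess that is not what you really want. Is it always exactly the same text that you want to remove? I guess not. What are the rules for removing it?

Where does the text come from? It looks like part of a multipart email. Are you parsing emails? If so, how? Or are you trying to clean up emails that were parsed incorrectly?

Yes, I am parsing email. This message came from Yahoo Mail. All I do is read the POP3 mail and save it in the DB (subject, sender, body).

I posted this option because I suspect the part he wants to remove is not always the same, so he would either have to write a complex replace function that catches every possibility, or do it like this.

Actually, I am reading POP3 email, and this is the body of a message from Yahoo Mail. Gmail and other mail bodies are fine, but Yahoo is a problem. I want to save the mail in a database.

Actually, this text is part of a multipart email from Yahoo. I am using the IMAP functions to parse the email and save it in the database (sender, date, subject and body). That mail came from Yahoo; Gmail is fine. Is there a good way to do this (read POP3 email and save it in the DB)?

@Emrul Yes, you need to parse the mail properly. Mail can be encoded in many ways; the fact that the mail you receive from Gmail contains only plain text does not mean that plain text is all there is. You also need to anticipate multipart messages, in which case you need to look at functions such as imap_fetchstructure and imap_fetchbody. You also need to anticipate the different transfer encodings and text encodings that a mail may use.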
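The following sketch shows one way the two functions named above might be combined to fetch the plain-text part of a message. It assumes the PHP IMAP extension is installed and that a connection has already been opened with imap_open. The helper names findPlainTextSection, decodePartBody and fetchPlainTextBody are hypothetical, and attached messages (message/rfc822) and charset conversion are not handled.

```php
<?php
// Hypothetical helper: find the section number and transfer encoding of the first
// text/plain part in a structure object as returned by imap_fetchstructure().
function findPlainTextSection(object $structure, string $prefix = ''): ?array
{
    $subtype = strtoupper($structure->subtype ?? '');
    // type 0 means "text" (TYPETEXT in the IMAP extension).
    if (($structure->type ?? null) === 0 && $subtype === 'PLAIN') {
        return [
            'section'  => $prefix === '' ? '1' : $prefix,
            'encoding' => $structure->encoding ?? 0,
        ];
    }
    // Multipart messages list their children in ->parts; sections count from 1.
    foreach ($structure->parts ?? [] as $index => $part) {
        $section = ($prefix === '' ? '' : $prefix . '.') . ($index + 1);
        $found = findPlainTextSection($part, $section);
        if ($found !== null) {
            return $found;
        }
    }
    return null;
}

// Hypothetical helper: undo the transfer encoding reported for a part.
function decodePartBody(string $raw, int $encoding): string
{
    switch ($encoding) {
        case 3: // base64 (ENCBASE64)
            return (string) base64_decode($raw);
        case 4: // quoted-printable (ENCQUOTEDPRINTABLE)
            return quoted_printable_decode($raw);
        default: // 7bit, 8bit, binary: returned unchanged
            return $raw;
    }
}

// Hypothetical helper: fetch the plain-text body of one message.
// $imap is a connection from imap_open(); this function is not called here.
function fetchPlainTextBody($imap, int $msgNo): ?string
{
    $structure = imap_fetchstructure($imap, $msgNo);
    if ($structure === false) {
        return null;
    }
    $found = findPlainTextSection($structure);
    if ($found === null) {
        return null;
    }
    $raw = imap_fetchbody($imap, $msgNo, $found['section']);
    return $raw === false ? null : decodePartBody($raw, $found['encoding']);
}
```

Because the body is fetched part by part, boundary lines and part headers never reach the stored text, so nothing has to be stripped afterwards.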

$newtext = str_replace('--0-1946616131-1282798399=:21360 Content-Type: text/plain; charset=us-ascii', '', $text);

Content-Type: multipart/mixed; boundary="frontier"

--frontier
Content-Type: text/plain