PHP: How to replace spaces inside HTML tags without removing the tags


php html regex


Suppose I have this string:

$string = '<p > ¡Esto es una prueba! < /p > <p> <strong > Prueba 123 </strong> </p> <p> <strong> < a href="https://matricom.net"> MATRICOM < / a> </ strong> </p> <p> <strong > Todas las pruebas aquí ... </strong > < /p>';


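What I want to do is use PHP to fix the HTML tags (they are malformed because of the spaces). I have found a few different regular expressions online, for example: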

$html = trim(preg_replace('/<\s+>/', '<>', $text));

and:

$html = preg_replace("//", "", $text);
I am trying to get a string like the following as output (with the spaces removed at the start and end of each HTML tag):

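'<p> ¡Esto es una prueba! </p> <p> <strong> Prueba 123 </strong> </p> <p> <strong> <a href="https://matricom.net"> MATRICOM </a> </strong> </p> <p> <strong> Todas las pruebas aquí ... </strong> </p>'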

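Background: Google Translate tends to add random spaces to its translation results, and they break the HTML structure. I am just looking for a quick way to clean up the tags. I have been searching for two days for a way to do this, but I cannot find anything that matches exactly what I am looking for.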
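I don't recommend this approach. Regular expressions are great, but you are likely to miss some cases, because Google Translate seems to add random spaces at random positions. It cannot be made reliable for every case.
I would suggest sending plain text, or using the format=html parameter of the V2 Translate API so that Google Translate interprets the HTML markup correctly.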
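To make the format=html suggestion concrete, here is a minimal sketch that only builds the request for the V2 endpoint and does not send it. The helper name buildTranslateRequest and the placeholder API key are assumptions for illustration; check the parameter names against the official API reference before relying on them.

```php
<?php
// Sketch: build (but do not send) a Translation API v2 request with format=html.
// buildTranslateRequest and 'YOUR_API_KEY' are hypothetical, for illustration only.
function buildTranslateRequest(string $html, string $target, string $apiKey): array
{
    return [
        'url'  => 'https://translation.googleapis.com/language/translate/v2',
        'body' => http_build_query([
            'q'      => $html,    // the markup to translate
            'target' => $target,  // target language code, e.g. 'es'
            'format' => 'html',   // tell the API that q contains HTML markup
            'key'    => $apiKey,
        ]),
    ];
}

$request = buildTranslateRequest('<p><strong>Test 123</strong></p>', 'es', 'YOUR_API_KEY');
echo $request['url'], PHP_EOL;
echo $request['body'], PHP_EOL;
```

The body can then be sent as a POST with whatever HTTP client you already use.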

If you cannot use the official approach above, split the text on the tags before sending it to Google Translate, so that the translator gets cleaner input.
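One possible way to do that splitting is sketched below, assuming the markup is simple enough that a tag never contains a literal > inside an attribute value. The function translateTextOnly and its $translate callback are hypothetical; the callback stands in for whatever translation call you actually make.

```php
<?php
// Split markup into tag and text segments and pass only the text to a translator.
// translateTextOnly and $translate are hypothetical names for illustration.
function translateTextOnly(string $html, callable $translate): string
{
    // PREG_SPLIT_DELIM_CAPTURE keeps the captured tags in the result array.
    $parts = preg_split('/(<[^>]*>)/', $html, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);

    $out = '';
    foreach ($parts as $part) {
        if ($part[0] === '<' || trim($part) === '') {
            $out .= $part;              // tag or pure whitespace: keep untouched
        } else {
            $out .= $translate($part);  // text segment: translate it
        }
    }
    return $out;
}

// Stand-in "translator" that upper-cases, to show which segments are touched.
$result = translateTextOnly(
    '<p><strong>Test 123</strong> and <a href="https://matricom.net">link</a></p>',
    'strtoupper'
);
echo $result, PHP_EOL;
// prints: <p><strong>TEST 123</strong> AND <a href="https://matricom.net">LINK</a></p>
```

The tags never reach the translator, so they cannot be damaged. The trade-off is that each text fragment is translated without the rest of its sentence, which may hurt translation quality.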
This is what I came up with, using your string:

< *(\/*) *(.+?) *>

<        Matches a < char
 *       Matches zero or more spaces. There is a ' ' (space) before *
(\/*)    Matches zero or more /. The () indicates capturing group 1
 *       Matches zero or more spaces. Do notice the ' ' before *
(        Start of capturing group 2
.+       Matches any character except a line break
?        Lazy matching
)        End of capturing group 2
 *       Matches zero or more spaces. Again a ' ' before *
>        Matches a > char

Then use it like this:

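$cleaned = preg_replace('/< *(\/*) *(.+?) *>/', '<$1$2>', $html);
echo $cleaned;

# Input string
# "<p > ¡Esto es una prueba! < /p > <p> <strong > Prueba 123 </strong> </p> <p> <strong> < a href="https://matricom.net"> MATRICOM < / a> </ strong> </p> <p> <strong > Todas las pruebas aquí ... </strong > < /p>"

# Cleaned string
# "<p> ¡Esto es una prueba! </p> <p> <strong> Prueba 123 </strong> </p> <p> <strong> <a href="https://matricom.net"> MATRICOM </a> </strong> </p> <p> <strong> Todas las pruebas aquí ... </strong> </p>"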
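This removes the spaces from tags written in forms such as <p >, < /p > and < / a>.

But it will not remove the spaces between a tag name and its attributes. So < a href="https://matricom.net"> is converted to <a href="https://matricom.net">. Spaces within attributes are allowed to stay (even though they are not recommended). If I have missed a case, please let me know and I will try to incorporate it.

You can use preg_replace('~...~u', '$1', $string):

$string = '<p > ¡Esto es una prueba! < /p > <p> <strong > Prueba 123 </strong> </p> <p> <strong> < a href="https://matricom.net"> MATRICOM < / a> </ strong> </p> <p> <strong > Todas las pruebas aquí ... </strong > < /p>';
echo preg_replace('~...~u', '$1', $string);
// => '<p> ¡Esto es una prueba! </p> <p> <strong> Prueba 123 </strong> </p> <p> <strong> <a href="https://matricom.net"> MATRICOM </a> </strong> </p> <p> <strong> Todas las pruebas aquí ... </strong> </p>'

Regex details: the match is replaced with the Group 1 value (when the optional group matches, it is an empty string or /).

What else have you tried? Using simple string replacement?

Hi, did the answer below help?

Does this pattern differ functionally from < *(\/*) *(.+?) *> as in my answer? I am trying to learn. I mean, both produce the same output, so which one is the better choice? Also, \s will match tabs and newlines as well.

@奋进 That is an interesting solution. You can enhance it like this: 1) replace the spaces with \s to match any whitespace (note the u flag I used, which makes \s match any Unicode whitespace), 2) replace . with [^>] so that a match stays within a single tag.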
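The two enhancements suggested in the last comment can be sketched as follows. This is one assumed way of combining them with the < *(\/*) *(.+?) *> pattern, not a pattern quoted from the thread, and the function name tidyTags is hypothetical.

```php
<?php
// Sketch of the space-based pattern with the two suggested changes applied:
//   1) \s instead of a literal space, with the u flag for UTF-8 input
//   2) [^>] instead of . so that a match cannot run past the end of one tag
// tidyTags is a hypothetical helper name.
function tidyTags(string $html): string
{
    // preg_replace returns null on error (e.g. invalid UTF-8); fall back to the input.
    return preg_replace('~<\s*(/*)\s*([^>]+?)\s*>~u', '<$1$2>', $html) ?? $html;
}

$string = '<p > ¡Esto es una prueba! < /p > <p> <strong > Prueba 123 </strong> </p>';
echo tidyTags($string), PHP_EOL;
// prints: <p> ¡Esto es una prueba! </p> <p> <strong> Prueba 123 </strong> </p>
```

Like the original pattern, this only trims whitespace next to the angle brackets and the closing slash. Whitespace between attributes is left as it is.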