Python: why do I get different results when I save and extract the response string from a web service?


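My code runs fine and I have no trouble extracting what I need. My problem is that the same operation gives different results depending on whether I use the web service's response directly or paste that same response into a variable. I have been blocked by this for several days and hope you can help.

Note: the answer to the suggested duplicate question does not apply to me; this is not a duplicate.

I'm using a web service. The response is stored in the variable answerService, which is a very long string. From it I extract the content of the span tags that have the following structure: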

<span style="font-weight:bold"> xxx </span>
"xxx" is what I want to extract
 #with that I get the "xxx"
 arraySpan = re.findall(r'<span style="font-weight:bold">(.*?)<', answerService)

Now, if I paste the web service's response into my code as sameStringOfAnswer, the result is different:

print(arraySpan)
['ADV', 'áGILMENTE']

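Logically, the response is the same and never changes. Yet for some strange reason, when I take the response straight from the web service I do not get the result I expect, which is ['ADV', 'áGILMENTE'].

The key part of the response is the 2 span elements, which always arrive with the structure I need.

This is my code: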

import requests
import re
session = requests.Session()

# Visit the landing page so the session receives a CGISESSID cookie
session.get('http://cartago.lllf.uam.es/grampal/grampal.cgi')
cookie = session.cookies.get_dict()
# The session ID is passed as the csrf value in the web service request
getId = cookie["CGISESSID"]
getService = requests.get("http://cartago.lllf.uam.es/grampal/grampal.cgi?m=analiza&csrf="+getId+"&e="+"ágilmente", cookies=cookie)

answerService=getService.text
#get the value of the <span>
arraySpan = re.findall(r'<span style="font-weight:bold">(.*?)<', answerService)
print(answerService)
print("array",arraySpan)

# same code, but run on the web service's response pasted in as a string literal
sameStringOfAnswer='<html xmlns="http://www.w3.org/TR/REC-html40"><head><title>Grampal </title><meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"><meta name="Content-Language" content="EN"><meta name="author" content="jmguirao@ugr.es"><link rel="icon" type="image/ico" href="/favicon.ico"/><style type="text/css">html,body,form,ul,li,h1,h3,p{margin:0; padding:0}body{font-family: Arial, Helvetica, sans-serif; background-color:#fff}a{text-decoration: none;}a:hover{text-decoration: underline}ul{list-style-type: none}td{padding: 0.5pc 2pc 0pc 0pc}.nav{float: right; padding: 0.5pc 0.5pc 0.5pc 0.5pc; margin-left:5px}.nav li{display:inline; border-left: 1px solid #444; padding:0 0.4em;}.nav li.first{border-left:0}.hide{display:none}input{text-indent: 2px}input[type="submit"]{text-indent: 0}DIV.delPage{padding: 0.5ex 5em 0.5em 5em; background-color:#ffd6ba;}.delMain{padding: 2ex 0.5em 0.5pc 0.5em;}.post{margin-bottom: 0.25pc; font-size: 100%; padding-top: 0.5ex;}.posts, #posts{padding: 0.5ex 0.5em 0.5pc 50px;}.banner{padding: 0.5ex 0 0.5pc 0.5em;background-color: #ffc6aa;clear: both}.banner h1{font-weight: bolder; font-size: 150%;margin:0; padding:0 0 0 26px; display: inline;}h2{font-weight: bolder; font-size: 140%; color: red; margin:0; padding:0 0 0 26px; display: inline;}.resaltado{font-weight: bolder;font-size: 100%}</style></head><body><div class="banner"><ul class="hide"><li><a href="#content">skip to content</a></li></ul><ul class="nav">Análsis de:<li class="first"><a title="Analizador morfosintáctico" href="/grampal/grampal.cgi?m=analiza&e=ágilmente">palabras</a></li><li><a title="Desambiguador contextual" href="/grampal/grampal.cgi?m=etiqueta&e=ágilmente">oraciones</a></li><li><a title="Etiquetado de textos" href="/grampal/grampal.cgi?m=xml">textos</a></li><li><a title="Formas de una palabra" href="/grampal/grampal.cgi?m=genera&e=ágilmente">Generación de formas</a></li><!--<li><a title="Transcripción fonética" 
href="/grampal/grampal.cgi?m=transcribe&e=ágilmente">Transcripción</a></li>--><li><a href="/grampal/grampal.cgi?m=etiquetario">Etiquetario</a></li><li><a href="/grampal/grampal.cgi?m=autores">Autores</a></li></ul><h1>Grampal</h1></div><div class="delPage" style="font-size: 80%;"><form method="GET" action="/grampal/grampal.cgi"><input type="hidden" name="m" value="analiza"><input type="hidden" name="csrf" value="94508700a0ae409a90718299ae00b0e0"><span class="resaltado">Palabra : </span><input name="e" size="60" value="ágilmente"><input type="submit" value="Analiza"> &nbsp;</form></div><br><h2>ágilmente</h2><div class="delMain"><div id="posts"><table><tr><td style="font-style:italic;font-size:90%">categoría&nbsp;<span style="font-weight:bold"> ADV </span></td><td style="font-style:italic;font-size:90%">lema&nbsp;<span style="font-weight:bold"> áGILMENTE </span></td></tr></table></div></div></body></html>'
arraySpan = re.findall(r'<span style="font-weight:bold">(.*?)<', sameStringOfAnswer)
print(arraySpan)

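The HTML that actually comes back from the web service has a line break inside the span, and by default . in a regular expression does not match a newline: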
<span style="font-weight:bold"> ADV\n </span>

You can test the difference yourself:
>>> pattern = r'<span style="font-weight:bold">(.*?)<'
>>> re.findall(pattern, '<span style="font-weight:bold">AAA\n<')
[]
>>> re.findall(pattern, '<span style="font-weight:bold">AAA<')
['AAA']
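
As a minimal sketch of how the pattern could be adjusted, assuming the newline inside the span is the only difference between the live response and the pasted string: either pass re.DOTALL so that . also matches newlines, or use the [\s\S] character class. The html string below is a made-up sample for illustration, not the actual service response.

```python
import re

# Hypothetical sample that mimics the live response: a newline sits inside the first span.
html = ('<span style="font-weight:bold"> ADV\n </span>'
        '<span style="font-weight:bold"> áGILMENTE </span>')

pattern = r'<span style="font-weight:bold">(.*?)<'

# Default behaviour: "." stops at "\n", so the first span is not captured.
default_matches = re.findall(pattern, html)

# Option 1: re.DOTALL lets "." match newlines too.
dotall_matches = re.findall(pattern, html, flags=re.DOTALL)

# Option 2: [\s\S] matches any character, newline included, without a flag.
charclass_matches = re.findall(r'<span style="font-weight:bold">([\s\S]*?)<', html)

# Strip surrounding whitespace (including the stray newline) from each value.
cleaned = [m.strip() for m in dotall_matches]

print(default_matches)
print(dotall_matches)
print(cleaned)
```

Both options capture the same text here; the final strip() removes the padding and the line break that the service places inside the tag.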


Why are you parsing HTML with regex?

@TheIncorrigible1 I'm new to Python; maybe this is bad practice, but it is the way I found to extract what I need.

@TheIncorrigible1 Please don't mark my question as solved. Whether or not this is bad practice, my code works, and the same problem could appear if I did it another way. Please take a look at my problem; it is a bit strange.

@Ralf It is not a duplicate; I ask you not to mark my question as one. My code works fine and I have no trouble extracting what I need. My problem is that the same operation gives different results depending on whether I use the web service's response directly or a copy of it saved in a variable. I have been blocked by this for several days and hope you can help.

You are a genius, I think I finally understand my problem, even though in theory I was already capturing everything inside the tags. So what is the best way to get the content inside these tags?

The answers in the linked question suggest using ([\s\S]*?) (or some variant of it) instead of (.*?).

@unusuario You should read more about regex to find a good solution for your use case.

You should really use a parser. Try BeautifulSoup. Here is some code to get you started.
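
The parser suggestion can be sketched with the standard library alone. The example below uses html.parser.HTMLParser as a dependency-free stand-in for BeautifulSoup, which would express the same idea in fewer lines. The class BoldSpanParser, the helper extract_bold_spans, and the sample html string are hypothetical names and data introduced for illustration only.

```python
from html.parser import HTMLParser


class BoldSpanParser(HTMLParser):
    """Collect the text of every <span> whose style contains font-weight:bold."""

    def __init__(self):
        super().__init__()
        self.in_bold_span = False
        self.values = []

    def handle_starttag(self, tag, attrs):
        # Normalise the style attribute so "font-weight: bold" also matches.
        style = (dict(attrs).get("style") or "").replace(" ", "")
        if tag == "span" and "font-weight:bold" in style:
            self.in_bold_span = True
            self.values.append("")

    def handle_endtag(self, tag):
        if tag == "span":
            self.in_bold_span = False

    def handle_data(self, data):
        # Text may arrive in several chunks, so accumulate it.
        if self.in_bold_span:
            self.values[-1] += data


def extract_bold_spans(html):
    """Return the stripped text of each bold span in the given HTML string."""
    parser = BoldSpanParser()
    parser.feed(html)
    parser.close()
    return [value.strip() for value in parser.values]


# Hypothetical sample with the same newline that broke the regex.
html = ('<td>categoría&nbsp;<span style="font-weight:bold"> ADV\n </span></td>'
        '<td>lema&nbsp;<span style="font-weight:bold"> áGILMENTE </span></td>')

result = extract_bold_spans(html)
print(result)
```

Because the parser works on tags instead of raw characters, line breaks or extra spaces inside the span no longer affect which values are found.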