Python: Amazon Mechanical Turk unexpectedly fails on some URLs?

Tags: python, amazon-web-services, boto, mechanicalturk

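I am trying to include a link in a HIT request on Amazon Mechanical Turk using boto, but I keep getting an error saying that my XML is invalid. I gradually cut my HTML down to the bare minimum and found that some valid links fail for no apparent reason. Can anyone with boto or AWS expertise help me work out why?

I followed these two guides:

Here is my example: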

from boto.mturk.connection import MTurkConnection
from boto.mturk.question import QuestionContent,Question,QuestionForm,Overview,AnswerSpecification,SelectionAnswer,FormattedContent,FreeTextAnswer
from config import *

HOST = 'mechanicalturk.sandbox.amazonaws.com'

mtc = MTurkConnection(aws_access_key_id=ACCESS_ID,
                      aws_secret_access_key=SECRET_KEY,
                      host=HOST)

title = 'HIT title'
description = ("HIT description.")
keywords = 'keywords'

s1 = """<![CDATA[<p>Here comes a link <a href='%s'>LINK</a></p>]]>""" % "http://www.example.com"
s2 = """<![CDATA[<p>Here comes a link <a href='%s'>LINK</a></p>]]>""" % "https://www.google.com/search?q=example&site=imghp&tbm=isch"

def makeahit(s):
    overview = Overview()
    overview.append_field('Title', 'HIT title itself')
    overview.append_field('FormattedContent',s)

    qc = QuestionContent()
    qc.append_field('Title','The title')

    fta = FreeTextAnswer()

    q = Question(identifier="URL",
                 content=qc,
                 answer_spec=AnswerSpecification(fta))

    question_form = QuestionForm()
    question_form.append(overview)
    question_form.append(q)

    mtc.create_hit(questions=question_form,
                   max_assignments=1,
                   title=title,
                   description=description,
                   keywords=keywords,
                   duration = 30,
                   reward=0.05)

makeahit(s1) # SUCCESS!
makeahit(s2) # FAIL?

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 25, in makeahit
  File "/usr/local/lib/python2.7/dist-packages/boto/mturk/connection.py", line 263, in create_hit
    return self._process_request('CreateHIT', params, [('HIT', HIT)])
  File "/usr/local/lib/python2.7/dist-packages/boto/mturk/connection.py", line 821, in _process_request
    return self._process_response(response, marker_elems)
  File "/usr/local/lib/python2.7/dist-packages/boto/mturk/connection.py", line 836, in _process_response
    raise MTurkRequestError(response.status, response.reason, body)
boto.mturk.connection.MTurkRequestError: MTurkRequestError: 200 OK
<?xml version="1.0"?>
<CreateHITResponse><OperationRequest><RequestId>19548ab5-034b-49ec-86b2-9e499a3c9a79</RequestId></OperationRequest><HIT><Request><IsValid>False</IsValid><Errors><Error><Code>AWS.MechanicalTurk.XHTMLParseError</Code><Message>There was an error parsing the XHTML data in your request.  Please make sure the data is well-formed and validates against the appropriate schema. Details: The reference to entity "site" must end with the ';' delimiter. Invalid content: &lt;FormattedContent&gt;&lt;![CDATA[&lt;p&gt;Here comes a link &lt;a href='https://www.google.com/search?q=example&amp;site=imghp&amp;tbm=isch'&gt;LINK&lt;/a&gt;&lt;/p&gt;]]&gt;&lt;/FormattedContent&gt; (1369323038698 s)</Message></Error></Errors></Request></HIT></CreateHITResponse>
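Any idea why s2 fails while s1 succeeds, when both are valid links? Both link targets work:

Is it something to do with the query string? With HTTPS?

Update

I am going to run some tests, but for now my candidate hypotheses are:

  • HTTPS does not work (so I will see whether I can get another HTTPS link through)
  • URLs with parameters do not work (so I will see whether I can get another URL with parameters through)
  • Google does not allow its searches to be posted this way? (if 1 and 2 fail!)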

  • You need to escape the ampersands in the URL, i.e.
    & => &amp;

    At the end of s2, use

    q=example&amp;site=imghp&amp;tbm=isch
    
    instead of

    q=example&site=imghp&tbm=isch
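
A minimal sketch of the escaping described above, using only the Python standard library (it makes no boto call and does not contact Mechanical Turk). The helper names `formatted_content` and `is_well_formed` are hypothetical, and the local XML parse is only an assumed stand-in for the XHTML check that the service performs on the content inside the CDATA section.

```python
from xml.sax.saxutils import escape
import xml.etree.ElementTree as ET

# Same fragment as in the question; the URL goes into a single-quoted attribute.
TEMPLATE = "<p>Here comes a link <a href='%s'>LINK</a></p>"


def xml_escape_url(url):
    # escape() rewrites & < > as &amp; &lt; &gt;. The extra mapping also
    # covers a single quote, since the attribute is single-quoted.
    return escape(url, {"'": "&apos;"})


def formatted_content(url):
    # Hypothetical helper: builds the string passed as 'FormattedContent'.
    return "<![CDATA[" + TEMPLATE % xml_escape_url(url) + "]]>"


def is_well_formed(fragment):
    # Local approximation of the server-side check: does the fragment parse as XML?
    try:
        ET.fromstring(fragment)
        return True
    except ET.ParseError:
        return False


url = "https://www.google.com/search?q=example&site=imghp&tbm=isch"

print(is_well_formed(TEMPLATE % url))                  # raw '&' in the href
print(is_well_formed(TEMPLATE % xml_escape_url(url)))  # '&amp;' in the href
print(formatted_content(url))
```

The escaping changes only the markup, not the link: an XML parser decodes `&amp;` back to `&` when it reads the attribute, so the worker still lands on the original URL.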
    

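    Do you really need to XML-escape the URL? Maybe I am wrong, but I thought that is what CDATA is for. Is that incorrect?

    That should be right, but I think it is worth a try.

    Did you find the answer?

    No, but I have not followed up on the ideas in my update yet.

    Amp is usually "%26", right? But isn't & allowed in URLs? The URL is actually valid when I type it into my address bar with the &. That is how Google builds its queries natively. Are you saying this is a restriction imposed by Amazon, or something else I should know about?

    This is XML escaping, which is different from the escaping needed in URLs. & is allowed in URLs, but the document Amazon reads is XML, so when it sees "&" it treats it as the start of a special character (an entity reference). Here is a useful discussion: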