
Python: parsing an HTML form using lxml.html with XPath syntax


Here is the form. The exact same form appears twice in the page source.

<form method="POST" action="/login/?tok=sess">
<input type="text" id="usern" name="username" value="" placeholder="Username"/>
<input type="password" id="passw" name="password" placeholder="Password"/>
<input type="hidden" name="ses_token" value="token"/>
<input id="login" type="submit" name="login" value="Log"/>
</form>
Because the form appears twice, the action attribute of both forms is printed:

['/login/?session=sess', '/login/?session=sess']
How can I get it to print just one? I only need one, since the two forms are identical.

I also have a second question.

How do I get the value of the token? I am referring to this line:

 <input type="hidden" name="ses_token" value="token"/>
However, since more than one element has an attribute named value, this is what gets printed:

['', 'token', 'Log In', '', 'token', 'Log In'] # or something close to that
How can I get just the token, and only once?

Is there a better way to do this?

Use find() instead of xpath(), because find() returns only the first match.
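To make the difference concrete, here is a minimal sketch that contrasts the two calls on a page containing the same form twice. The page content and variable names are made up for illustration. It also shows a positional XPath predicate as one possible way to stay with xpath(), which is an alternative to the find() suggestion rather than part of it.

```python
import lxml.html

# Hypothetical page: the same form repeated twice, as in the question.
FORM = (
    '<form method="POST" action="/login/?session=sess">'
    '<input type="hidden" name="ses_token" value="token"/>'
    '</form>'
)
tree = lxml.html.fromstring("<html><body>" + FORM + FORM + "</body></html>")

# xpath() returns a list with one entry per match, so both forms show up.
all_actions = tree.xpath('//form/@action')

# find() returns only the first matching element (or None if nothing matches).
first_form = tree.find('.//form')
first_action = first_form.get('action')

# Alternative: keep xpath() but select only the first form in the document.
first_only = tree.xpath('(//form)[1]/@action')

# No form carries this id, so find() returns None instead of an empty list.
missing = tree.find('.//form[@id="nope"]')

print(all_actions)
print(first_action)
print(first_only)
print(missing)
```

Since find() returns None when nothing matches, it is worth checking the result before reading attributes from it.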

Here is an example based on the code you provided:

import lxml.html


pagesource = """<form method="POST" action="/login/?session=sess">
<input type="text" id="usern" name="username" value="" placeholder="Username"/>
<input type="password" id="passw" name="password" placeholder="Password"/>
<input type="hidden" name="ses_token" value="token"/>
<input id="login" type="submit" name="login" value="Log In"/>
</form>
<form method="POST" action="/login/?session=sess">
<input type="text" id="usern" name="username" value="" placeholder="Username"/>
<input type="password" id="passw" name="password" placeholder="Password"/>
<input type="hidden" name="ses_token" value="token"/>
<input id="login" type="submit" name="login" value="Log In"/>
</form>
"""

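# Parse the page, then take only the first <form> element.
tree = lxml.html.fromstring(pagesource)
form = tree.find('.//form')

# form.action reads the action attribute; .value reads the input's value.
print("Action:", form.action)
print("Token:", form.find('.//input[@name="ses_token"]').value)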
Hope that helps.

Output:

Action: /login/?session=sess
Token: token
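As a possible variation on the answer above, lxml.html also exposes form-specific helpers, so the token can be read without writing a path expression at all. The sketch below assumes a page shaped like the one in the question. The PAGE string is illustrative, and it contains the form only once to keep the example short.

```python
import lxml.html

# Illustrative page modelled on the form from the question.
PAGE = """<html><body>
<form method="POST" action="/login/?session=sess">
<input type="text" id="usern" name="username" value="" placeholder="Username"/>
<input type="password" id="passw" name="password" placeholder="Password"/>
<input type="hidden" name="ses_token" value="token"/>
<input id="login" type="submit" name="login" value="Log In"/>
</form>
</body></html>"""

tree = lxml.html.fromstring(PAGE)

# tree.forms lists every <form> in document order; index 0 is the first one.
form = tree.forms[0]

# form.action is the form's action attribute.
action = form.action

# form.fields behaves like a dict mapping input names to their values.
token = form.fields['ses_token']

print("Action:", action)
print("Token:", token)
```

When the page repeats the form, tree.forms simply contains more entries, and taking index 0 still yields a single result.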