python - Parsing an HTML form using lxml.html with XPath syntax
Here is the form. The exact same form appears twice in the source code:
<form method="POST" action="/login/?tok=sess">
<input type="text" id="usern" name="username" value="" placeholder="Username"/>
<input type="password" id="passw" name="password" placeholder="Password"/>
<input type="hidden" name="ses_token" value="token"/>
<input id="login" type="submit" name="login" value="Log"/>
</form>
Since the form appears twice, my query prints the action attribute of both:
['/login/?session=sess', '/login/?session=sess']
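How can I make it print only one? I only need one, since the two forms are exactly the same.
I also have a second question.
How do I get the value of the token?
I am talking about this line: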
<input type="hidden" name="ses_token" value="token"/>
However, since more than one element has an attribute named value, this is what gets printed:
['', 'token', 'Log In', '', 'token', 'Log In'] # or something close to that
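How can I get just the token? Only one of them.
Is there a better way to do this?
Use find() instead of xpath(), because find() returns only the first match.
Here is an example based on the code you provided: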
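To make the difference concrete, here is a minimal sketch that contrasts the two calls. The two-link markup and the variable names are hypothetical and only serve to show the return types: xpath() gives back a list of every match, while find() gives back the first matching element (or None if nothing matches).

```python
import lxml.html

# Hypothetical markup with two matching elements, used only for illustration.
doc = lxml.html.fromstring('<div><a href="/one">1</a><a href="/two">2</a></div>')

# xpath() collects every match into a list.
all_hrefs = doc.xpath('//a/@href')

# find() stops at the first matching element and returns it directly.
first_link = doc.find('.//a')

print(all_hrefs)
print(first_link.get('href'))
```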
import lxml.html
pagesource = """<form method="POST" action="/login/?session=sess">
<input type="text" id="usern" name="username" value="" placeholder="Username"/>
<input type="password" id="passw" name="password" placeholder="Password"/>
<input type="hidden" name="ses_token" value="token"/>
<input id="login" type="submit" name="login" value="Log In"/>
</form>
<form method="POST" action="/login/?session=sess">
<input type="text" id="usern" name="username" value="" placeholder="Username"/>
<input type="password" id="passw" name="password" placeholder="Password"/>
<input type="hidden" name="ses_token" value="token"/>
<input id="login" type="submit" name="login" value="Log In"/>
</form>
"""
tree = lxml.html.fromstring(pagesource)
# find() returns only the first matching element, so the duplicate form is ignored
form = tree.find('.//form')
print("Action:", form.action)
print("Token:", form.find('.//input[@name="ses_token"]').value)
Hope that helps.
Output:
Action: /login/?session=sess
Token: token
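If you would rather stay with xpath(), one possible approach is to restrict the query to the first form with a positional predicate and then take the single item out of the returned list. The sketch below assumes the same markup as above, trimmed to the hidden field for brevity. It also shows the forms and fields helpers that lxml.html provides as another way to reach the same value.

```python
import lxml.html

# Same two identical forms as above, reduced to the field of interest.
pagesource = """<form method="POST" action="/login/?session=sess">
<input type="hidden" name="ses_token" value="token"/>
</form>
<form method="POST" action="/login/?session=sess">
<input type="hidden" name="ses_token" value="token"/>
</form>
"""
tree = lxml.html.fromstring(pagesource)

# (//form)[1] selects only the first form in document order;
# xpath() still returns a list, so take its single item with [0].
action = tree.xpath('(//form)[1]/@action')[0]
token = tree.xpath('(//form)[1]//input[@name="ses_token"]/@value')[0]

# Alternative: lxml.html exposes the forms and their fields directly.
first_form = tree.forms[0]
token_from_fields = first_form.fields['ses_token']

print("Action:", action)
print("Token:", token)
```

The parentheses matter here: (//form)[1] picks the first form in the whole document, whereas //form[1] would pick every form that is the first form child of its parent.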