Problem crawling wordreference with Python

python xpath web-scraping web-crawler

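I'm trying to crawl wordreference, but without success.

The first problem I ran into is that a large part of the page is loaded via JavaScript. That shouldn't be a big problem, though, because I can see what I need in the page source.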

For example, I want to extract the first two meanings of a given word from this URL:

http://www.wordreference.com/es/translation.asp?tranword=crane

I need to extract grulla and grúa.
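The URL above seems to pass the looked-up word in the `tranword` query parameter. Assuming that pattern holds for other words (it is inferred from this single example), a small helper can build the lookup URL for any word. `translation_url` and `BASE_URL` are hypothetical names introduced for illustration.

```python
from urllib.parse import urlencode

# Base path copied from the example URL; assumed to work for other words too
BASE_URL = 'http://www.wordreference.com/es/translation.asp'


def translation_url(word):
    """Build the English-to-Spanish lookup URL for `word` (hypothetical helper)."""
    # urlencode escapes spaces and non-ASCII characters in the query string
    return BASE_URL + '?' + urlencode({'tranword': word})


print(translation_url('crane'))
```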

Here is my code:

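import urllib.request  # Python 3; the original question used Python 2's urllib2

import lxml.html as lh

url = 'http://www.wordreference.com/es/translation.asp?tranword=crane'
# Download the page (with urllib's default User-Agent) and parse it as HTML
doc = lh.parse(urllib.request.urlopen(url))
# Text of every <td class="ToWrd"> cell, i.e. the translation cells
trans = doc.xpath('//td[@class="ToWrd"]/text()')

for i in trans:
    print(i)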

The result is an empty list.

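I also tried crawling it with Scrapy, without success. I'm not sure what exactly is going on. The only way I've managed to fetch the page is with curl, but that's sloppy; I'd like to do it in Python in an elegant way.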

Many thanks.

It looks like you need to send a User-Agent header.
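A minimal sketch of that fix using only the standard library, assuming Python 3 (`urllib.request` replaces Python 2's `urllib2`). The User-Agent string is an arbitrary browser-like example, not a value the site is known to require, and `build_request` / `fetch_html` are hypothetical helper names.

```python
import urllib.request

URL = 'http://www.wordreference.com/es/translation.asp?tranword=crane'
# Illustrative browser-like User-Agent; the exact value the site accepts is an assumption
BROWSER_UA = 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko)'


def build_request(url, user_agent=BROWSER_UA):
    """Wrap `url` in a Request whose User-Agent replaces urllib's Python-urllib/x.y default."""
    return urllib.request.Request(url, headers={'User-Agent': user_agent})


def fetch_html(url, timeout=10):
    """Download `url` with the custom header and return the raw body bytes (network call)."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as response:
        return response.read()
```

Passing the bytes returned by `fetch_html(URL)` to `lxml.html.fromstring` should then let the question's XPath `//td[@class="ToWrd"]/text()` run unchanged.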

Also, simply switching to requests works (by default it automatically sends a python-requests/<version> User-Agent):

Prints:

grulla 
grúa 
plataforma 
...
grulla blanca 
grulla trompetera
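The question only needs the first two meanings. Judging by the output above, the extracted text nodes carry trailing whitespace, so a small post-processing step can strip them and keep the first two non-empty entries. This is a sketch: `first_meanings` is a hypothetical helper, it assumes the page lists the main translations first (as it does for crane), and skipping repeated entries is a design choice rather than something the thread asks for.

```python
def first_meanings(cells, n=2):
    """Return the first `n` non-empty, whitespace-stripped translations."""
    meanings = []
    for cell in cells:
        if len(meanings) >= n:
            break
        text = cell.strip()
        # Skip blank text nodes and translations already collected
        if text and text not in meanings:
            meanings.append(text)
    return meanings


# Sample values copied from the printed output above
sample = ['grulla ', 'grúa ', 'plataforma ', 'grulla blanca ']
print(first_meanings(sample))
```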

Thanks, but why doesn't it work with urllib's User-Agent? I've crawled other sites without any problem, so why not this one?
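The thread does not say why this particular site rejects urllib. One plausible explanation, offered only as an assumption, is that the server filters requests by User-Agent and treats urllib's default `Python-urllib/x.y` value differently from browser-like agents, while most other sites don't check the header at all. You can at least confirm what urllib sends by default:

```python
import urllib.request

# build_opener() returns the same kind of OpenerDirector that urlopen() uses;
# its addheaders list holds the default User-agent sent with every request
opener = urllib.request.build_opener()
default_headers = dict(opener.addheaders)
print(default_headers['User-agent'])
```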