Python: After splitting HTML divs by heading type, how do I extract the one I'm interested in?


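Given a page, for example the jobs page fetched in the code below, where two jobs (we will ignore the "open applications" for now) are fully described one after the other, I can detect whether any job matches a keyword by applying the following XPath: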

//*[self::h2 or self::h3 or self::h4][contains(., 'Country Manager')]
In Python:

import urllib2
import lxml.html as lh    
url = 'http://jobs.kelkoo.co.uk/'
response = urllib2.urlopen(url)
content = response.read()
root = lh.fromstring(content)
job_titles = root.xpath("//*[self::h2 or self::h3 or self::h4][contains(., 'Country Manager')]")
I can then determine which heading type is involved:

tags = [e.tag for e in job_titles]
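
To see what the heading query returns, here is a minimal sketch run against an inline snippet instead of the live page. The markup, and every job title other than "Country Manager - Uk", is made up for illustration.

```python
import lxml.html as lh

# Invented sample markup: three headings of different levels.
HTML = """<div>
<h2>Country Manager - Uk</h2><div>specs</div>
<h3>Assistant Country Manager</h3><div>specs</div>
<h4>Software Engineer</h4><div>specs</div>
</div>"""

root = lh.fromstring(HTML)

# Same expression as in the question: any h2/h3/h4 whose text contains the keyword.
job_titles = root.xpath(
    "//*[self::h2 or self::h3 or self::h4][contains(., 'Country Manager')]"
)

# Pair each match with its tag name so the heading level is known.
found = [(e.tag, e.text_content().strip()) for e in job_titles]
print(found)
# [('h2', 'Country Manager - Uk'), ('h3', 'Assistant Country Manager')]
```

The matches come back in document order, so the tag list lines up with the titles as they appear on the page.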
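Knowing that we are dealing with, say, an <h2>, I would like to extract the individual job specification. I know I can describe each <div> with the following: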

//div[count(preceding-sibling::h2)=1]
However, how do I tie together what I know about the job title's position, its tag type, and the description above?

I tried putting the keyword back into the XPath described above, but I am told it is not a valid expression:

//div[count(preceding-sibling::h2[contains(text(), 'Country Manager')]=1]
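
The expression is most likely rejected because the parenthesis opened by count( is never closed: the ] ends the h2 predicate and =1 follows while count( is still open. The sketch below checks this on invented sample markup in which each job sits in its own wrapper div; it is an illustration under that assumption, not a test against the live page.

```python
import lxml.html as lh
from lxml import etree

# Invented sample markup: one wrapper div per job.
HTML = """
<div id="jobs">
  <div class="jobitem">
    <h2>Country Manager - Uk</h2>
    <div class="jobspecs">Site: London</div>
    <div class="jobdesc">Role overview</div>
  </div>
  <div class="jobitem">
    <h2>Software Engineer</h2>
    <div class="jobspecs">Site: Elsewhere</div>
    <div class="jobdesc">Build things</div>
  </div>
</div>
"""
root = lh.fromstring(HTML)

# As posted: count( is never closed.
broken = "//div[count(preceding-sibling::h2[contains(text(), 'Country Manager')]=1]"
# Balanced version: close count(...) before comparing with 1.
fixed = "//div[count(preceding-sibling::h2[contains(text(), 'Country Manager')])=1]"

try:
    root.xpath(broken)
    error = None
except etree.XPathError as exc:  # base class of lxml's XPath errors
    error = exc

classes = [div.get("class") for div in root.xpath(fixed)]
print(type(error).__name__)
print(classes)  # ['jobspecs', 'jobdesc']
```

Note that the balanced expression only isolates one job when the headings do not share a parent with the divs of other jobs, as in the wrapper layout assumed here.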
Find the div with class="jobspecs" that follows the matching heading.

For the Country Manager job this prints:

Country Manager - Uk
Contract type: Permanent
Hours per week: 40
Site: London
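
The snippet that produced this output did not survive in this copy. As a rough sketch of the idea, and not the original answer's code, the following walks from each matching heading to its following-sibling div with class="jobspecs" and collects the non-empty list items. The inline markup is an abridged stand-in for the live page.

```python
import lxml.html as lh

# Abridged stand-in for one job on the page.
HTML = """<div class="jobitem">
<h2>Country Manager - Uk</h2>
<div class="jobspecs"><ul>
<li><span class="label">Contract type: </span>Permanent</li>
<li><span class="label">Hours per week: </span>40</li>
<li></li>
<li><span class="label">Site: </span>London</li>
</ul></div>
</div>"""

root = lh.fromstring(HTML)
lines = []
titles = root.xpath(
    "//*[self::h2 or self::h3 or self::h4][contains(., 'Country Manager')]"
)
for title in titles:
    lines.append(title.text_content().strip())
    # First following sibling div carrying the jobspecs class.
    for li in title.xpath("following-sibling::div[@class='jobspecs'][1]//li"):
        text = li.text_content().strip()
        if text:  # skip the empty <li> placeholders
            lines.append(text)

print("\n".join(lines))
```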
----
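Each job on the page is in a <div class="jobitem"> (the markup excerpt and the query are listed further down).

So you could:

  • select the div class="jobitem" elements whose heading contains your keyword (the query uses XPath variables, which lxml supports, but you can use a literal [contains(., "Country Manager")] instead)
  • loop over them
  • inside the loop, use [@class="jobspecs"], as suggested by @alecxe, to select the child elements you want, relatively (with an XPath expression starting with ./ to be safe)

Something like this: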

>>> for job in jobs:
...     title = job.xpath('normalize-space(h2|h3|h4)')
...     specs = job.xpath('string(./div[@class="jobspecs"])').strip()
...     desc = job.xpath('string(./div[@class="jobdesc"])').strip()
...     print('-------')
...     print(title)
...     print('-------')
...     print(specs)
...     print('-------')
...     print(desc)
...     
... 
-------
Country Manager - Uk
-------
Contract type: Permanent
                    Hours per week: 40

                    Site: London
-------
Role overview:
Reporting in to the European Commercial Director, the UK/IE Country Manager is a senior manager with full responsibility for the sales, traffic and product functions across two countries. He/She will drive the UK sales and traffic functions and manage a team of highly skilled digital account managers based in London.
The role involves sales planning, account growth planning, forecasting, data analysis and high level presentations with senior internal and external parties. The CM is responsible for the Gross Margin position and goals of the country, managing yield prices, cost of sale prices and the overall financial management of conversion over a large number of merchants and traffic partners.
The critical equations of broking between revenue, cost of leads and understanding the merchant perspective on volume, performance and quality is key to this role. This person will need little day to day management and will be a natural leader who is respected for their knowledge, commitment and ability.
Accountabilities and Deliverables:
-Develop strong relationships with key UK merchants and agencies that drive growth and take best advantage of all opportunities
- Work closely with EU counterparts to identify and maximise pan-euro opportunities where required, drive these deals through to completion either on own initiative or as part of the wider European team
- Use initiative to identity and push new opportunities; from growth of existing channels to creation of new ones
- Full control and management of the UK/IE commercial teams; able to delegate tasks and responsibilities while respecting their staffs experience and ability;
Previous Experience/Skills required:
- 6+ years experience in a proven sales/marketing management role, in digital/e-commerce.
- Understanding of the price comparison market.
- Understanding of digital marketing and online advertising.
- Contacts in online retail
Person Specification/Competencies:
- Good negotiation skills and ability to close deals quickly.
- Very strong communication and presentation skills to get best results in both local country and where required across Europe (proven track record in creating and maintaining a productive network)
- Excellent internal and external customer relationship and interpersonal skills.
- Team player with strong work ethic and ability to adapt to and drive change.
- Commercially minded.
- Strategic thinker and able to think analytically at a detailed level.
- Proven leadership skills.
- Ability to strongly influence those outside direct control for positive results.
- Displays respect to all colleagues and encourages this behaviour in own team.
- Able to deliver at a consistently high level in a demanding commercial environment.
- Manages conflict in a positive and assertive manner for best outcome
Academic Background:
- Strong academic background; preferably a minimum 2:1 degree or equivalen
Requirements/Other Information:
- Role holder must be able to travel freely across Europe and be eligible to work in the UK
Good reasons to join us
- Company highly recognized in its market
- Help our customer drive core of their business
- Opportunity to show your full potential in a growing business
- Chance to work with incredibly smart, talented, and interesting folks                

                        Apply


                        Download details
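
The reason for starting the inner expressions with ./ can be checked on a small example: an expression starting with // is evaluated from the document root even when it is called on a single job element. The two-job markup below is invented for illustration.

```python
import lxml.html as lh

# Invented sample markup with two job items.
HTML = """
<div id="jobs">
  <div class="jobitem">
    <h2>Country Manager - Uk</h2>
    <div class="jobspecs">Site: London</div>
  </div>
  <div class="jobitem">
    <h2>Software Engineer</h2>
    <div class="jobspecs">Site: Elsewhere</div>
  </div>
</div>
"""
root = lh.fromstring(HTML)
first_job = root.xpath("//div[@class='jobitem']")[0]

# Relative: only children of this job element.
relative = first_job.xpath("./div[@class='jobspecs']")
# Absolute: every matching div in the whole document.
absolute = first_job.xpath("//div[@class='jobspecs']")

print(len(relative), len(absolute))  # 1 2
```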

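It looks like I can do the following, but is it always reliable?
for i in e.itersiblings(): print(i.text_content())
There are many sites like this, so I don't know the class name of the div in question in advance. Great answer, thanks.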
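
On reliability: itersiblings() yields every following sibling, so on a page where all headings share one parent it would also return the content of later jobs. One possible way around that when class names are unknown (an assumption, not something proposed in the answers) is to stop at the next heading. The section_after helper and the markup are hypothetical.

```python
import lxml.html as lh

HEADINGS = {"h2", "h3", "h4"}

def section_after(heading):
    """Collect the text of siblings after `heading`, stopping at the next heading."""
    parts = []
    for sibling in heading.itersiblings():
        if sibling.tag in HEADINGS:
            break  # the next job starts here
        text = " ".join(sibling.text_content().split())
        if text:
            parts.append(text)
    return parts

# Invented flat layout: headings and content divs are all siblings.
HTML = """<div>
<h2>Country Manager - Uk</h2>
<div>Site: London</div>
<div>Role overview</div>
<h2>Software Engineer</h2>
<div>Build things</div>
</div>"""

root = lh.fromstring(HTML)
heading = root.xpath("//h2[contains(., 'Country Manager')]")[0]
section = section_after(heading)
print(section)  # ['Site: London', 'Role overview']
```

This does not rely on any class name, but it does assume that a job's content is made of siblings of its heading.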
Markup of one job on the page (excerpt):

<div class="jobitem">
    <h2>Country Manager - Uk</h2>
    <div class="jobspecs">
        <ul>
            <li><span class="label">Contract type: </span>Permanent</li>
            <li><span class="label">Hours per week: </span>40</li>
            <li></li>
            <li><span class="label">Site: </span>London</li>
            <li></li>
            <li></li>
        </ul>
    </div>
    <div class="jobdesc">
        <p>Role overview:</p>
        ...

Query that selects the job items whose heading matches the keyword:

import urllib2
import lxml.html as lh
url = 'http://jobs.kelkoo.co.uk/'
response = urllib2.urlopen(url)
content = response.read()
root = lh.fromstring(content)
jobs = root.xpath('''
    //div[@class='jobitem']
         [child::*[self::h2 or self::h3 or self::h4]
                  [contains(., $query)]]''',
    query="Country Manager")
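
A small sketch of why the $query variable can be convenient: the value is passed separately from the expression, so the same expression can be reused for any keyword, including one that contains a quote character. The find_jobs helper and the sample markup are hypothetical.

```python
import lxml.html as lh

# Invented sample markup; one title contains an apostrophe on purpose.
HTML = """
<div id="jobs">
  <div class="jobitem"><h2>Country Manager - Uk</h2></div>
  <div class="jobitem"><h3>Buyer's Assistant</h3></div>
  <div class="jobitem"><h4>Software Engineer</h4></div>
</div>
"""
root = lh.fromstring(HTML)

EXPR = """//div[@class='jobitem']
            [child::*[self::h2 or self::h3 or self::h4]
                     [contains(., $query)]]"""

def find_jobs(tree, query):
    """Return the titles of job items whose heading contains `query`."""
    jobs = tree.xpath(EXPR, query=query)  # keyword argument binds $query
    return [job.xpath("normalize-space(h2|h3|h4)") for job in jobs]

managers = find_jobs(root, "Country Manager")
buyers = find_jobs(root, "Buyer's")
print(managers)  # ['Country Manager - Uk']
print(buyers)    # ["Buyer's Assistant"]
```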