Python: How to get the <p> element whose <h3> tag contains the word "Contact:" using Scrapy
Tags: python, xpath, scrapy

I'm trying to scrape contact information from a database using the Scrapy shell.

<div class="info-section">
                    <h3>State(s) Served:</h3>
                    <p>Nationwide (US)</p>  </div>
<div class="info-section">
                    <h3>Year Founded:</h3>
                    <p>1985</p>  </div>

<div class="info-section">
                    <h3>Description:</h3>
                    <p>Corporate tax accounting/consulting. Specialties:  280E Compliance/Planning, Research & Development Tax Credits, Cost Segregation, IRS Representation, Certified Financial Auditing.</p> </div>
                                    <div class="info-section">
                        <h3>Contact:</h3>
                        <p><a href="/cdn-cgi/l/email-protection" class="__cf_email__" data-cfemail="93f1e1eaf2fdd3f0e3f2fef7bdf0fcfe">[email&#160;protected]</a> | 847-382-1166 X28</p>
                    </div>

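State(s) Served: Nationwide (US)

Year Founded: 1985

Description: Corporate tax accounting/consulting. Specialties: 280E Compliance/Planning, Research & Development Tax Credits, Cost Segregation, IRS Representation, Certified Financial Auditing.

Contact: [email protected] | 847-382-1166 X28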


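I selected the info sections with
sel = response.css('.info-section')
and I can then iterate over the <p> elements, but how do I select only the <p> tag that holds the contact information, and then get its text?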

If you need the text that comes after the email link, you can try the following:

>>> txt = """<div class="info-section">
...                     <h3>State(s) Served:</h3>
...                     <p>Nationwide (US)</p>  </div>
... <div class="info-section">
...                     <h3>Year Founded:</h3>
...                     <p>1985</p>  </div>
... 
... <div class="info-section">
...                     <h3>Description:</h3>
...                     <p>Corporate tax accounting/consulting. Specialties:  280E Compliance/Planning, Research & Development Tax Credits, Cost Segregation, IRS Representation, Certified Financial Auditing.</p> </div>
...                                     <div class="info-section">
...                         <h3>Contact:</h3>
...                         <p><a href="/cdn-cgi/l/email-protection" class="__cf_email__" data-cfemail="93f1e1eaf2fdd3f0e3f2fef7bdf0fcfe">[email&#160;protected]</a> | 847-382-1166 X28</p>
...                     </div>"""
>>> from scrapy import Selector
>>> sel = Selector(text=txt)
>>> sel.xpath('//h3[contains(text(), "Contact")]/following-sibling::p/a/following-sibling::text()').get()
u' | 847-382-1166 X28'

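Please paste your Python code so we have more information.

I'm using the Scrapy shell, just trying to select these elements with response from the command line.

Your contact information has a class... you can use that class instead.

In this case, can't the XPath expression be shortened to the following?
//h3[contains(text(), "Contact")]/following-sibling::p/text()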
>>> sel.xpath('//h3[contains(text(), "Contact")]/following-sibling::p/text()').get()
u' | 847-382-1166 X28'