Python: using a regular expression in Scrapy to get the argument of an onclick function

python web-scraping scrapy

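Using Scrapy, I only want to get the argument of the onclick function. I use response.css() to extract the link.

If I then try to get the argument with a regular expression, I get an error (AttributeError: 'list' object has no attribute 're').

Required output: 173543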

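extract() returns the text data as a list of strings. To match a selector against a regular expression, you need to call .re() on the selector itself: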

html = """<table class="table table-striped table-bordered table-hover Tax" >
            <thead>
                <tr>
                    <th>Sr No.</th>
                    <th>Name</th>
                    <th>Registration No</th>
                    <th>Address</th>
                    <th>Sectors</th>
                </tr>
            </thead>
            <tbody>
<tr>

    <td>1</td><td> <a href="javascript:void(0)" onclick='show_info("173543");'> ABCD</a></td>
                    <td>Address</td>
                    <td>12345</td>
                    <td>Data Not Found</td>
                </tr></tbody></table>"""

from scrapy.selector import Selector

response = Selector(text=html)
# Call .re() on the selector itself, not on the list returned by .extract()
links = response.css(".table.table-striped.table-bordered.table-hover.Tax>tbody>tr>td>a").xpath("./@onclick").re(r"show_info\((.+?)\)")

print(links)

Hope this helps :)
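To see why the original attempt fails, here is a minimal sketch that needs only the standard library. The list `extracted` is a hand-written stand-in (an assumption, not real Scrapy output) for what .extract() would return for the onclick attribute: a plain Python list of strings, which has no .re() method.

```python
import re

# Stand-in for the value .extract() returns: a plain list of strings
extracted = ['show_info("173543");']

error_message = None
try:
    # Lists have no .re() method, so this raises AttributeError
    extracted.re(r"show_info\((.+?)\)")
except AttributeError as exc:
    error_message = str(exc)

# A plain string can still be matched with the standard re module
match = re.search(r"show_info\((.+?)\)", extracted[0])
argument = match.group(1) if match else None

print(error_message)
print(argument)
```

Calling .re() on the selector, as in the answer above, avoids the intermediate list altogether.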

I use the XPath contains() function to select the right onclick attribute, and parse its content with re_first().
Comments:

.extract()[0].re("show_info\((.+?)\)")?

Thanks @Andersson, I got my answer by removing .extract() and text.

Please add some explanation to your answer.

Or, as Gangabass pointed out, you can use re_first() as a shortcut to get only the first match.

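Output of the first snippet (the u prefix appears only under Python 2; note that the captured value still includes the double quotes):

[u'"173543"']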
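Since the required output is 173543 without quotes, the pattern can be tightened so that the quotes fall outside the capture group. The sketch below uses the standard re module on a hand-copied onclick value; the variable names are illustrative only.

```python
import re

# The onclick attribute value from the sample HTML
onclick = 'show_info("173543");'

# Original pattern: the capture group includes the surrounding quotes
with_quotes = re.findall(r"show_info\((.+?)\)", onclick)

# Quotes moved outside the group: only the argument itself is captured
without_quotes = re.findall(r'show_info\("(.+?)"\)', onclick)

# Stricter variant, assuming the argument is always numeric
digits_only = re.findall(r'show_info\("(\d+)"\)', onclick)

print(with_quotes, without_quotes, digits_only)
```

The same patterns can be passed to a selector's .re() method in place of the original one.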

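# contains() selects the <a> whose onclick calls show_info;
# re_first() returns the first match, here without the quotes: 173543
link_id = response.xpath('//td/a[contains(@onclick, "show_info")]/@onclick').re_first(r'"([^"]+)"')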