Python: how to extract information from a page using a regular expression


python regex web-scraping


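I'm having trouble capturing the content of "name": it often appears before "pluralName". Is there a better way to do this (ideally the best one performance-wise)? Thanks for your help.

Note: I'm using Python.

The block of the page that contains the information I need: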

{"count":0,"items":[]},"shortUrl":"http:\/\/4sq.com\/11nP13T","likes":{"count":22,"groups":[{"type":"others","count":22,"items":[]}],"summary":"22 Likes"},"ratingColor":"FF9600","id":"5172311be4b0ecc0a12a9953","canonicalPath":"\/v\/kee-hiong-klang-bak-kut-teh\/5172311be4b0ecc0a12a9953","canonicalUrl":"https:\/\/foursquare.com\/v\/kee-hiong-klang-bak-kut-teh\/5172311be4b0ecc0a12a9953","rating":5.3,"categories":[**{"pluralName":"Chinese Restaurants","name":"Chinese Restaurant",**"icon":{"prefix":"https:\/\/ss3.4sqi.net\/img\/categories_v2\/food\/asian_","mapPrefix":"https:\/\/ss3.4sqi.net\/img\/categories_map\/food\/chinese","suffix":".png"},"id":"4bf58dd8d48988d145941735","shortName":"Chinese","primary":true},{"pluralName":"Asian Restaurants","name":"Asian Restaurant","icon":{"prefix":"https:\/\/ss3.4sqi.net\/img\/categories_v2\/food\/asian_","mapPrefix":"https:\/\/ss3.4sqi.net\/img\/categories_map\/food\/asian","suffix":".png"},"id":"4bf58dd8d48988d142941735","shortName":"Asian"}],"createdAt":1366438171,"tips":{"count":25,"groups":[{"count":25,"items":[{"logView":true,"text":"Portion is quite small and expensive. Service attitude is so so. The BKT taste is not my preference.One of the up car restaurants in SS2 which I'll never go back again. ðŸ‘Ž","likes":{"count":1,"groups":[{"type":"others","count":1,"items":[{"photo":{"prefix":"https:\/\/irs0.4sqi.net\/img\/user\/","suffix":"\/43964080-5LYADRF2EEP2RWPL.jpg"},"lastName":".w","firstName":"Jackie","id":"43964080","canonicalPath":"\/user\/43964080","canonicalUrl":"https:\/\/foursquare.com\/user\/43964080","gender":"female"}]}],"summary":"1 like"},"id":"541c2b73498eb0cfe1f76b9e","canonicalPath":"\/item\/541c2b73498eb0cfe1f76b9e","canonicalUrl":"https:\/\/foursquare.com\/item\/541c2b73498eb0cfe1f76b9e","createdAt":1.411132275E9,"todo":{"count":0},"user":{"photo":{"prefix":"https:\/\/irs1.4sqi.net\/img\/user\/","suffix":"\/5765949-NW4BAJWFBCVLRR1M.jpg"}

Use re.findall. Try the pattern given at the bottom of this page.

Don't use a regexp at all.

Instead, use a JSON parser and access the resulting object. That is far more robust.

import json  # part of the Python standard library
o = json.loads(page_text)  # page_text: the complete JSON text of the page, as a str
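
To make this suggestion concrete, here is a minimal sketch of reading the category names from the parsed object. It assumes the page yields a complete, valid JSON document; the `sample` string is a trimmed, hypothetical stand-in modeled on the fragment in the question, not the real page, and the variable names are illustrative.

```python
import json

# Hypothetical, trimmed stand-in for the venue JSON shown in the question.
sample = '''
{"rating": 5.3,
 "categories": [
   {"pluralName": "Chinese Restaurants", "name": "Chinese Restaurant", "primary": true},
   {"pluralName": "Asian Restaurants", "name": "Asian Restaurant"}
 ]}
'''

venue = json.loads(sample)

# Each category is a plain dict, so the order of "name" and "pluralName"
# in the raw text no longer matters.
names = [category["name"] for category in venue["categories"]]
print(names)
```

Because the lookup is by key, the same code keeps working whether "name" comes before or after "pluralName".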

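What exactly do you need to match? Can you provide the expected output?

I want to match, in this example, "Asian Restaurant", but I will be running this on other pages where the "name" tag has different values.

Thanks. There are more results in the output, but they are easier to work with.

Good answer! It might be clearer why this answer is better if you showed an example of how to use the resulting object, perhaps even in the context of the posted question.

The JSON fragment he shared is FUBAR and can't be repaired.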
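
If only a cut-off fragment like the one in the question is available, one possible workaround is to parse just the "categories" array instead of the whole text. The sketch below uses `json.JSONDecoder.raw_decode`, which decodes a single JSON value starting at a given index and ignores what follows. The helper name `category_names` and the `fragment` string are hypothetical, and the approach assumes the array itself is intact.

```python
import json

# Hypothetical fragment: cut off at both ends, like the block in the question.
fragment = (
    '"summary":"22 Likes"},"rating":5.3,"categories":['
    '{"pluralName":"Chinese Restaurants","name":"Chinese Restaurant","primary":true},'
    '{"pluralName":"Asian Restaurants","name":"Asian Restaurant"}'
    '],"createdAt":1366438171,"tips":{"count":25,"groups":[{"cou'
)

def category_names(text):
    """Return the "name" of every entry in the first "categories" array."""
    key = '"categories":'
    start = text.find(key)
    if start == -1:
        return []
    # raw_decode parses one JSON value starting at the given index and
    # stops at its end, so the broken tail of the fragment is never read.
    categories, _end = json.JSONDecoder().raw_decode(text, start + len(key))
    return [c["name"] for c in categories if "name" in c]

print(category_names(fragment))
```

If the array itself is truncated, `raw_decode` raises `json.JSONDecodeError`, which the caller would need to handle.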

import re  # pattern from the re.findall answer; test_str holds the page text
print(re.findall(r'(?:"pluralName":"[^"]*","name":"([^"]*))|(?:"name":"([^"]*)","pluralName")', test_str))
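
Because that pattern has two capture groups, `re.findall` returns a list of tuples in which one element of each tuple is empty. A simpler variant may be enough: the key `"name"` includes its opening quote, so it cannot match inside `"pluralName"` or `"shortName"`. The sketch below assumes the values contain no escaped quotes; `test_str` is a hypothetical sample modeled on the question, and `NAME_RE` is an illustrative name.

```python
import re

# Hypothetical sample modeled on the fragment in the question.
test_str = (
    '"categories":[{"pluralName":"Chinese Restaurants","name":"Chinese Restaurant",'
    '"shortName":"Chinese","primary":true},'
    '{"pluralName":"Asian Restaurants","name":"Asian Restaurant","shortName":"Asian"}]'
)

# Compile once if the same pattern is applied to many pages.
NAME_RE = re.compile(r'"name":"([^"]*)"')

# A single capture group makes findall return a flat list of strings.
names = NAME_RE.findall(test_str)
print(names)
```

On a full page this would also pick up any other `"name"` key outside the categories list, so the result may need filtering.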
