Python: Beautiful Soup and JavaScript


I am parsing some pages with Beautiful Soup, but one of them contains this JavaScript code:

<script type="text/javascript">   


var utag_data = {
            customer_id   : "_PHL2883198554", 
            customer_type : "New",
            loyalty_id : "N",
            declined_loyalty_interstitial : "false",
            site_version  : "Desktop Site",
            site_currency: "de_DE_EURO",
            site_region: "uk",
            site_language: "en-GB",


            customer_address_zip : "",
            customer_email_hash :  "",
            referral_source :  "",
            page_type : "product",
            product_category_name : ["Lingerie"],
            product_category_id :[jQuery("meta[name=defaultParent]").attr("content")],
            product_id : ["5741462261401"],
            product_image_url : ["http://images.urbanoutfitters.com/is/image/UrbanOutfitters/5741462261401_001_b?$detailmain$"],
            product_brand : ["Pretty Polly"],
            product_selling_price : ["20.0"],
            promo_id : "6",
            product_referral : ["WOMENS-SHAPEWEAR-LINGERIE-SOLUTIONS-EU"],
            product_name : ["Pretty Polly Shape It Up Tummy Shaping Camisole"],
            is_online_only : true,
            is_back_in_stock : false
}
</script>



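How can I get some of the values out of this input? Should I treat it as plain text, that is, read it into a variable, split it, and then pick out the data I need?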

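First get the text of the script tag, for example with

js_text = soup.find('script', type="text/javascript").text

Then you can use a regular expression to find the data. There is probably a simpler way to do this, but a regex is not hard to write either.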

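import re

# One "name : value" pair per line; the trailing comma is optional.
regex = re.compile(r'^\s*(\w+)\s*:\s*(.*?)\s*,?\s*$', re.MULTILINE)
# findall returns a list of (name, value) tuples, one per matching line.
pairs = regex.findall(js_text)

This returns a list of (name, value) tuples in the order they appear in the script: (name, value), (name2, value2), and so on. String values keep their surrounding quotes. You can process the list further or convert it to a dictionary with dict(pairs).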
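As a rough end-to-end sketch of this approach, the snippet below applies the same kind of line-by-line pattern to a shortened copy of the script text and builds a dictionary from the result. The sample string, the `parse_utag` helper name, and the quote-stripping step are illustrative assumptions, not part of the original answer. Values that are arrays or JavaScript expressions are left as raw strings.

```python
import re

# Shortened stand-in for the text of the <script> tag (assumed sample).
js_text = '''
var utag_data = {
            customer_id   : "_PHL2883198554",
            customer_type : "New",
            page_type : "product",
            product_selling_price : ["20.0"],
            is_online_only : true,
            is_back_in_stock : false
}
'''

# One "name : value" pair per line; the trailing comma is optional.
PAIR_RE = re.compile(r'^\s*(\w+)\s*:\s*(.*?)\s*,?\s*$', re.MULTILINE)


def parse_utag(text):
    """Return a dict of name -> raw value string (hypothetical helper)."""
    result = {}
    for name, value in PAIR_RE.findall(text):
        # Drop surrounding double quotes from plain string values only.
        if len(value) >= 2 and value[0] == '"' and value[-1] == '"':
            value = value[1:-1]
        result[name] = value
    return result


data = parse_utag(js_text)
print(data["customer_id"])            # _PHL2883198554
print(data["product_selling_price"])  # ["20.0"]
```

Because the pattern works one line at a time, it only suits objects laid out with one property per line, as in the question. A value that spans several lines would need a different approach.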

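@user3761151 Add the re.MULTILINE flag; I forgot to mention that. I have edited my answer. Full documentation on how to use regular expressions in Python is available online.

But what if I need a string like this: this.products = ko.observableArray([{"productId": 537477, …elemets}]). Is it possible to write a regex for it?

@user3761151 I have a hard time understanding what you actually need here, but with a regex you can extract almost anything you want from the string you have. Knowing regex is essential for any string-handling work, so I strongly recommend spending an evening or two learning it.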
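For the follow-up about `ko.observableArray(...)`, one possible approach is to capture the bracketed argument with a regex and hand it to `json.loads`. This is only a sketch: it assumes the array literal is valid JSON and that the sequence `])` does not occur inside it. The sample string and its `name` field are made up for illustration, since the real payload is elided in the comment.

```python
import json
import re

# Hypothetical script text; the elided fields of the real page are not reproduced.
js_text = 'this.products = ko.observableArray([{"productId": 537477, "name": "example"}]);'

# Capture the array literal passed to ko.observableArray(...).
# Assumes valid JSON inside the parentheses and no "])" within the data itself.
ARRAY_RE = re.compile(
    r'this\.products\s*=\s*ko\.observableArray\((\[.*?\])\)',
    re.DOTALL,
)

match = ARRAY_RE.search(js_text)
products = json.loads(match.group(1)) if match else []
print(products[0]["productId"])  # 537477
```

Parsing the captured text as JSON avoids writing a separate regex for each field. If the literal is not valid JSON (unquoted keys, for instance), `json.loads` raises an error and a different strategy is needed.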