Web crawler: I want to extract values from a graph using Python and Beautiful Soup (tags: web-crawler, screen-scraping)


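Here is the website whose graph I want to extract values from (click here to view the graph). Below is the code that scrapes the other data from that page, such as the specifications and related products, and collects the different sellers listed on the page. I am working on my final-year project, and to complete it I only need to scrape these values from the graph.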

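from bs4 import BeautifulSoup
from django.shortcuts import render

# Note: several class strings were garbled when this page was extracted.
# Those marked "verify" are a best-effort reconstruction and must be
# checked against the live page before use.

def specification(self, request):
    product_names = list()
    product_prices = list()
    image_source = list()
    product_href = list()
    rows = list()
    table_column_1 = list()
    table_column_2 = list()
    stores = list()
    store_redirect_links = list()
    other_prices = list()
    href = request.GET.get('url')
    l = href
    scrap1 = self.page_load(href)
    content1 = BeautifulSoup(scrap1.text, "html.parser")

    # Related products shown in the carousel
    containers = content1.findAll("li", {
        "class": "crsl__itm prd-sldr__itm prd-sldr__itm--s"
    })
    for container in containers:
        name = container.find("div", {
            "class": "prdp ttl"  # verify
        }).get_text()
        print(name)
        price = container.find("div", {
            "class": "prdp prc"  # verify
        }).get_text()
        print(price)
        source = container.findAll('div')[1]
        print(source)
        source = source.img['data-src']
        href = container.find('a')['href']
        product_names.append(name)
        product_prices.append(price)
        image_source.append(source)
        product_href.append(href)

    # Product id of the main product
    data_id = content1.find('div', {
        "class": "crd prd prmry crd"  # verify
    })
    pid_input = data_id.find('li', {
        'class': 'float--right'
    })
    pid_input = pid_input.find('input', {
        "class": "aPC"  # verify
    })
    pid = pid_input['data-pid']

    # Quick specification list
    info = content1.find('ul', {
        'class': 'nav soft-half--top soft-half--left quick-spec nav--vtop three-cols'  # verify
    })
    info_rows = info.findAll('li')
    for row in info_rows:
        rows.append(row.text)

    # Name, price and image of the main product
    name = content1.find("h1", {
        "class": "bold txt xl"  # verify
    }).text
    name = name.replace('Price', '')
    # The attrs argument must be a dict ("class": ...), not a set ("class", ...)
    price = content1.find("div", {
        "class": "txt xl bold spcolor lowest price"  # verify
    }).text
    source = content1.find("li", {
        "class": "crsl__..."  # truncated in the source; fill in from the page
    })
    source = source.img['data-src']

    # Primary seller link, or "Out of stock" when there is none
    if content1.find('div', {
        "class": "crd hdr bg--warning"  # verify
    }) is None:
        r = content1.find('div', {
            "class": "push-half--bottom"  # verify
        })
        if r.find('a') is not None:
            redirect = r.find('a')['href']
            redirect_name = r.text
        else:
            redirect_name = "Out of stock"
            redirect = ""
    else:
        redirect_name = "Out of stock"
        redirect = ""

    # Other sellers and their prices
    multiple_stores = content1.findAll('div', {
        'class': 'invt__itm inventoryListItem'
    })
    for store in multiple_stores:
        n = store.find('div', {
            "class": "soft-half--bottom"  # verify
        })
        store_name = n.img['src']
        store_redirect_link = n.find('a')['href']
        other_price = store.find('span', {
            "class": "prc invt"  # verify
        }).text
        if other_price == "-":
            other_price = "Not available"
        stores.append(store_name)
        store_redirect_links.append(store_redirect_link)
        other_prices.append(other_price)

    # Full specification tables; the first row of each table is skipped
    tables = content1.findAll("table", {
        "class": "sp"  # verify
    })
    for table in tables:
        table_rows = table.findAll('tr')
        i = 0
        for tr in table_rows:
            if i >= 1:
                colum1 = tr.findAll("td")[0].text
                colum2 = tr.findAll("td")[1].text
                table_column_1.append(colum1)
                table_column_2.append(colum2)
            i = i + 1

    spec = {
        "index1": range(0, len(product_names)),
        "product_names": product_names,
        "product_prices": product_prices,
        "image_source": image_source,
        "product_href": product_href,
        "name": name,
        "price": price,
        "source": source,
        "redirect": redirect,
        "redirect_name": redirect_name,
        "pid": pid,
        "index": range(0, len(rows) - 1),
        "rows": rows,
        "total_stores": range(0, len(stores)),
        "stores": stores,
        "store_redirect_links": store_redirect_links,
        "other_prices": other_prices,
        "total_specs": range(0, len(table_column_1)),
        "table_column_1": table_column_1,
        "table_column_2": table_column_2,
        "l_p": l
    }
    return render(request, 'product_details.html', spec)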
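The scraped prices in the view above stay as display strings, with "-" standing in for a missing price. If the values are later meant to be plotted or compared, it may help to convert them to numbers first. The following is only a minimal sketch: `parse_price` is a hypothetical helper, not part of the original code, and it assumes prices use commas as group separators and a dot as the decimal point.

```python
import re

def parse_price(text, missing="-"):
    """Convert a scraped price string to a float, or None when unavailable.

    Hypothetical helper: assumes ',' groups digits and '.' marks decimals.
    """
    cleaned = text.strip()
    if not cleaned or cleaned == missing:
        return None
    # Take the first run of digits (with optional separators and decimals),
    # so a currency prefix such as "Rs." does not leak into the number.
    match = re.search(r"\d[\d,]*(?:\.\d+)?", cleaned)
    if match is None:
        return None
    return float(match.group(0).replace(",", ""))

print(parse_price("Rs. 12,499"))
print(parse_price("-"))
```

Returning `None` rather than the string "Not available" keeps the numeric list homogeneous; the template can still render a placeholder for missing entries.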

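Your question would be easier to read if you included the image directly rather than as an external link. Also, is your question about image processing (estimating trends in a 2D image), or about extracting values from a block of HTML text? Thank you.

Not image processing, sir. I want to extract the values from the graph.

This is scraping (scrape, scraped, scraping), not scrapping.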
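If the graph is drawn in the browser by a JavaScript charting library, its data points are often embedded in the page as a literal inside a `<script>` tag, and in that case no image processing is needed. Whether that holds for this particular site is an assumption that has to be checked in the page source. The sketch below works on the text of one such script (which Beautiful Soup can supply, for example from the elements returned by `content1.find_all("script")`). The helper `extract_json_after`, the variable name `chartData`, and the sample data are all made up for illustration, and the approach only works when the embedded literal is valid JSON.

```python
import json
import re

def extract_json_after(script_text, var_name):
    """Return the JSON value assigned to var_name in script_text, or None.

    Hypothetical helper: looks for `var_name = <json>` and decodes the
    JSON value that starts right after the equals sign.
    """
    match = re.search(r"\b" + re.escape(var_name) + r"\s*=\s*", script_text)
    if match is None:
        return None
    try:
        # raw_decode parses one JSON value and ignores whatever follows it,
        # such as the trailing semicolon of the JavaScript statement.
        value, _end = json.JSONDecoder().raw_decode(script_text, match.end())
    except json.JSONDecodeError:
        return None
    return value

# Made-up script body standing in for the text of a real <script> tag.
sample_script = (
    'var chartData = [{"date": "2019-01-01", "price": 100}, '
    '{"date": "2019-02-01", "price": 95}];'
)

points = extract_json_after(sample_script, "chartData")
prices = [point["price"] for point in points]
print(prices)
```

If no script on the page contains the data, the graph is probably filled in by a separate request made after the page loads; the browser's network tab would show that request, and its response can then be fetched directly.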