Python: Parsing a text block in HTML with BeautifulSoup (IndustryAbout)

Tags: python, regex, parsing, beautifulsoup

I want to parse mine entries from IndustryAbout. In this example, I am working with a single entry page.

The interesting block in the HTML is:

<strong>Commodities: Copper, Nickel, Platinum, Palladium, Gold</strong><br /><strong>Area: Lappi</strong><br /><strong>Type: Copper Concentrator Plant</strong><br /><strong>Annual Production: 17,200 tonnes of Copper (2015), 8,800 tonnes of Nickel (2015), 31,900 tonnes of Platinum, 25,100 ounces of Palladium, 12,800 ounces of Gold (2015)</strong><br /><strong>Owner: Kevitsa Mining Oy</strong><br /><strong>Shareholders: Boliden AB (100%)</strong><br /><strong>Activity since: 2012</strong>

Second version:

import requests
from bs4 import BeautifulSoup
import csv  # intended for the CSV output, not used yet

links = ["34519-kevitsa-copper-concentrator-plant", "34520-kevitsa-copper-mine", "34356-glogow-copper-refinery"]

for link in links:

    page = requests.get("https://www.industryabout.com/country-territories-3/2199-finland/copper-mining/"+link)
    soup = BeautifulSoup(page.content, 'lxml')
    rows = soup.select("strong")
    d = {}

    for r in rows:
        # Keep only "name: value" entries; skip text with no colon or with more than one
        parts = r.text.split(":")
        if len(parts) == 2:
            name, value = parts
            d[name.strip()] = value.strip()
    print(d)

Is this what you want?

import requests
from bs4 import BeautifulSoup

page = requests.get("https://www.industryabout.com/country-territories-3/2199-finland/copper-mining/34519-kevitsa-copper-concentrator-plant")
soup = BeautifulSoup(page.content, 'html.parser')

rows = soup.select("strong")
d = {}
for r in rows:
    parts = r.text.split(":")
    # Keep only "name: value" pairs. Text without a colon cannot be unpacked into a pair,
    # and text with more than one colon (links or scripts) is probably not interesting for you.
    if len(parts) == 2:
        name, value = parts
        d[name.strip()] = value.strip()  # strip the space left after the colon
print(d)
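
Since the question is also tagged regex, here is a rough sketch of the same extraction done with `re` alone. It runs offline on a shortened copy of the block quoted in the question (the Annual Production entry is left out for brevity), and it assumes every entry has the form `<strong>name: value</strong>`. The names `html`, `pattern` and `fields` are made up for this illustration. A regular expression is more fragile than an HTML parser, so treat it as an alternative for this one snippet only.

```python
import re

# Shortened copy of the block quoted in the question (Annual Production omitted).
html = (
    "<strong>Commodities: Copper, Nickel, Platinum, Palladium, Gold</strong><br />"
    "<strong>Area: Lappi</strong><br />"
    "<strong>Type: Copper Concentrator Plant</strong><br />"
    "<strong>Owner: Kevitsa Mining Oy</strong><br />"
    "<strong>Shareholders: Boliden AB (100%)</strong><br />"
    "<strong>Activity since: 2012</strong>"
)

# Group 1: the name, i.e. everything up to the first colon.
# Group 2: the value, i.e. the rest of the text up to the closing tag.
pattern = re.compile(r"<strong>([^:<]+):\s*(.*?)</strong>")

fields = {name.strip(): value.strip() for name, value in pattern.findall(html)}
print(fields)
```

Because the name group stops at the first colon, a value that itself contains a colon stays in one piece here, whereas the `split(":")` approach above skips such entries.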

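How can I add a CSV writer? That would make it easier for me, and for others who would like to analyze this page, to finish the job.

@pickenpack What do you mean by "add a CSV writer"?

I tried to post the code here, but that failed. I have added a "second" version to my question, in which I tried to keep the parser simple (even when entries are missing) and to add CSV output of the array.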
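
One possible way to get the CSV output asked about here is `csv.DictWriter` from the standard library. The sketch below assumes the dict built for each page is appended to a list instead of being printed. The helper `write_records` and the sample `records` are hypothetical: the first record reuses values from the block quoted in the question, and the second is invented to show a page with a missing entry. The header is the union of all keys, and `restval=""` leaves the cell empty where a page lacks an entry.

```python
import csv
import io


def write_records(records, fh):
    """Write a list of dicts to the file-like object fh as CSV, one row per dict."""
    # Collect every key seen, in first-seen order, so the header covers all pages.
    fieldnames = []
    for record in records:
        for key in record:
            if key not in fieldnames:
                fieldnames.append(key)
    # restval fills the cells of records that lack one of the columns.
    writer = csv.DictWriter(fh, fieldnames=fieldnames, restval="")
    writer.writeheader()
    writer.writerows(records)
    return fieldnames


# Hypothetical sample data standing in for the dicts parsed from each page.
records = [
    {"Area": "Lappi", "Owner": "Kevitsa Mining Oy", "Activity since": "2012"},
    {"Area": "Lappi", "Owner": "Kevitsa Mining Oy"},  # "Activity since" missing
]

buffer = io.StringIO()
write_records(records, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

To write to disk instead of a string buffer, pass a file opened with `newline=""`, for example `open("mines.csv", "w", newline="", encoding="utf-8")`, where the file name is only a placeholder.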