JavaScript: how to extract CSS values from a website page
Tags: javascript, python, css, web-scraping
Is there any way to extract CSS values from a website page using a CSS class name? Given a parent class name, I want to get all of its CSS values together with those of its child classes. For example, the web page CSS:
.container {
    width: 80%;
}
.btn-wrap {
    padding: 3px;
    width: 25%;
    text-align: center;
}
.text-box {
    margin: 0 auto;
    width: 50%;
}
.frm-btn-grp {
    padding: 3px;
    width: 100%;
    text-align: center;
    .btn-success {
        border: 1px solid green;
        padding: 7px 24px;
        border-radius: 2px;
        color: white;
        background-color: green;
        width: 100px;
    }
}
If I give .frm-btn-grp as input, it should return:
.frm-btn-grp {
    padding: 3px;
    width: 100%;
    text-align: center;
    .btn-success {
        border: 1px solid green;
        padding: 7px 24px;
        border-radius: 2px;
        color: white;
        background-color: green;
        width: 100px;
    }
}
Is this possible?

Here is some web scraping to get you started:
import re
import urllib.request as ureq

sample_url = "https://stackoverflow.com/questions/59685137/how-to-extract-css-values-from-website-page"
with ureq.urlopen(sample_url) as req:
    data = req.read().decode('utf-8')

#-- Split the HTML by line ending; keep the lines that mention 'text/css'
css_lines = [i.strip() for i in data.split('\n') if len(i) > 0 and 'text/css' in i]

#-- Create a simple regular expression to extract the CSS URL from the HTML.
#-- Note: (?P<css_url>...) names the captured group so it can be read back
#-- with .group('css_url'); [^"]+ stops the match at the closing quote.
css_pat = r'href="(?P<css_url>[^"]+)"'
p = re.compile(css_pat)

#-- Create a list and append our matches to it.
css_urls = []
for i in css_lines:
    m = p.search(i)
    #-- search() returns None when a line has no href, so check before .group()
    if m:
        css_urls.append(m.group('css_url'))
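The line-by-line regex can miss stylesheet links that omit type="text/css", span several lines, or use relative paths. As a possible alternative, here is a minimal sketch built on the standard library's html.parser and urllib.parse.urljoin. The names StylesheetLinkParser and find_stylesheet_urls, and the example.com URLs, are made up for illustration and are not part of the original answer.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class StylesheetLinkParser(HTMLParser):
    """Collect the href of every <link rel="stylesheet"> tag (hypothetical helper)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.css_urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr_map = dict(attrs)
        # rel may hold several space-separated tokens, or be missing entirely
        rel_tokens = (attr_map.get("rel") or "").lower().split()
        href = attr_map.get("href")
        if "stylesheet" in rel_tokens and href:
            # urljoin resolves relative hrefs against the page URL
            self.css_urls.append(urljoin(self.base_url, href))


def find_stylesheet_urls(html, base_url):
    """Return absolute stylesheet URLs found in an HTML string."""
    parser = StylesheetLinkParser(base_url)
    parser.feed(html)
    return parser.css_urls


# Made-up HTML standing in for a downloaded page
sample_html = """
<html><head>
<link rel="stylesheet" type="text/css" href="/static/site.css">
<link rel="icon" href="/favicon.ico">
<link href="https://cdn.example.com/theme.css" rel="stylesheet">
</head><body></body></html>
"""
print(find_stylesheet_urls(sample_html, "https://example.com/page"))
```

The resulting list can be used in place of css_urls; because every entry is already absolute, it can be passed straight to urlopen.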
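From there you can do whatever you like: iterate over the URLs to fetch all the CSS data, open all the CSS files and merge them into one, and so on.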
with ureq.urlopen(css_urls[0]) as req:
    css_data = req.read().decode('utf-8')

#-- Here's a sample printout of a css file for this page
#-- I added some .replace() statements to make it prettier :-)
print(css_data[:500]
      .replace(',', ',\n')
      .replace('{', ' {\n\t')
      .replace(';', ';\n\t')
      .replace('}', '\n\t}\n\n')
      )
Truncated output:
html,
body,
div,
span,
{...}
output,
ruby,
section,
summary,
time,
mark,
audio,
video {
margin:0;
padding:0;
border:0;
font:inherit;
font-size:100%;
vertical-align:baseline
}
article,
a
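Once the CSS text has been downloaded, pulling out the block for one class, as the question asks, can be approximated by finding the selector and counting braces until the block closes. The sketch below is only an illustration: extract_rule is a hypothetical helper, the sample CSS is a shortened copy of the question's example, and it assumes the stylesheet has no braces inside comments or strings. A real CSS parser would be more robust.

```python
def extract_rule(css_text, selector):
    """Return the first rule whose selector is exactly `selector`,
    including nested blocks, or None if there is no such rule."""
    pos = 0
    while True:
        start = css_text.find(selector, pos)
        if start == -1:
            return None
        brace = css_text.find("{", start)
        if brace == -1:
            return None
        # The rule's prelude begins after the previous '{', '}' or ';'
        prelude_start = max(css_text.rfind(c, 0, start) for c in "{};") + 1
        # Require an exact match so ".btn" does not match ".btn-wrap"
        if css_text[prelude_start:brace].strip() == selector:
            break
        pos = start + 1

    # Walk forward, tracking nesting depth until the opening brace is closed
    depth = 0
    for i in range(brace, len(css_text)):
        if css_text[i] == "{":
            depth += 1
        elif css_text[i] == "}":
            depth -= 1
            if depth == 0:
                return css_text[start:i + 1]
    return None  # unbalanced braces


# Shortened version of the CSS from the question
sample_css = """
.container {
    width: 80%;
}
.frm-btn-grp {
    padding: 3px;
    width: 100%;
    .btn-success {
        color: white;
        width: 100px;
    }
}
.text-box {
    margin: 0 auto;
}
"""
print(extract_rule(sample_css, ".frm-btn-grp"))
```

Because the brace counter follows nesting depth, the returned text for .frm-btn-grp includes the nested .btn-success block and stops before .text-box.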
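What have you tried so far?
More curious than anything... why do you want to do this?
@roganjosh Right now I have to extract all the CSS values manually, which takes a lot of time, so I'm looking for some code to automate it. Is that possible?
That restates what you want to do, but I asked why.
@Bryan I haven't started on anything. I've searched for related material, but I still don't know how to get started on this idea. That's why I posted this question.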