Python: get all CSS properties for a specific class from any CSS file

python css regex parsing

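Is there a relatively simple way to get the combined properties of a class using some kind of parser in Python, or do I need to use some regular expressions to get it?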

.container_12, .container_16 {
    margin-left:auto;
    margin-right:auto;
    width:960px
}
.grid_1, .grid_2, .grid_3, .grid_4, .grid_5 {
    display:inline;
    float:left;
    margin-left:10px;
    margin-right:10px
}
.featured_container .container_12 .grid_4 a {
    color: #1d1d1d;
    float: right;
    width: 235px;
    height: 40px;
    text-align: center;
    line-height: 40px;
    border: 4px solid #141a20;
}

For the CSS snippet above, if I search for "container_12", it should return:

  {
        margin-left:auto;
        margin-right:auto;
        width:960px
        color: #1d1d1d;
        float: right;
        width: 235px;
        height: 40px;
        text-align: center;
        line-height: 40px;
        border: 4px solid #141a20;
    }

Duplicate properties are fine; I will use a dictionary to store them later, so that won't be a problem.

You can use:

\.container_12\b[^{]*{([\s\S]*?)}

The result you are after will be in capture group \1, so just iterate over the matches and do whatever you want with them.
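One way to apply the pattern above from Python is sketched below, using only the standard `re` module. The function name `properties_for_class` and the shortened `CSS` sample are illustrative assumptions, not part of the original answer, and the braces are escaped for readability. Later declarations overwrite earlier ones with the same name.

```python
import re

# Shortened copy of the CSS from the question (assumption: trimmed for brevity).
CSS = """\
.container_12, .container_16 {
    margin-left:auto;
    margin-right:auto;
    width:960px
}
.grid_1, .grid_2 {
    display:inline
}
.featured_container .container_12 .grid_4 a {
    color: #1d1d1d;
    float: right;
    width: 235px
}
"""

def properties_for_class(css_text, class_name):
    """Merge the declarations of every rule whose selector mentions class_name."""
    # Same pattern as above, with the class name escaped and inserted.
    pattern = re.compile(r"\." + re.escape(class_name) + r"\b[^{]*\{([\s\S]*?)\}")
    props = {}
    for body in pattern.findall(css_text):
        for declaration in body.split(";"):
            name, sep, value = declaration.partition(":")
            if sep:  # skip fragments that contain no "name: value" pair
                props[name.strip()] = value.strip()
    return props

props = properties_for_class(CSS, "container_12")
print(props)
```

This is a rough filter rather than a CSS parser: comments, `@media` blocks, or values containing `;` or `}` would confuse it.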

Here is a rough parser for CSS:

import pyparsing as pp

# punctuation is important during parsing, but just noise afterwards; suppress it
LBRACE, RBRACE = map(pp.Suppress, "{}")

# read a ':' and any following whitespace
COLON = (":" + pp.Empty()).suppress()

obj_ref = pp.Word(".", pp.alphanums+'_') | pp.Word(pp.alphas, pp.alphanums+'_')
attr_name = pp.Word(pp.alphas, pp.alphanums+'-_')
attr_spec = pp.Group(attr_name("name") + COLON + pp.restOfLine("value"))

# each of your format specifications is one or more comma-delimited lists of obj_refs,
# followed by zero or more attr_specs in {}'s
# using a pp.Dict will auto-define an associative array from the parsed keys and values
spec = pp.Group(pp.delimitedList(obj_ref)[1,...]('refs')
                + LBRACE
                + pp.Dict(attr_spec[...])("attrs")
                + RBRACE)

# the parser will parse 0 or more specs    
parser = spec[...]

Parsing the CSS source:

result = parser.parseString(css_source)
print(result.dump())

gives:

[['.container_12', '.container_16', [['margin-left', 'auto;'], ['margin-right', 'auto;'], ['width', '960px']]], ['.grid_1', '.grid_2', '.grid_3', '.grid_4', '.grid_5', [['display', 'inline;'], ['float', 'left;'], ['margin-left', '10px;'], ['margin-right', '10px']]], ['.featured_container', '.container_12', '.grid_4', 'a', [['color', '#1d1d1d;'], ['float', 'right;'], ['width', '235px;'], ['height', '40px;'], ['text-align', 'center;'], ['line-height', '40px;'], ['border', '4px solid #141a20;']]]]
[0]:
  ['.container_12', '.container_16', [['margin-left', 'auto;'], ['margin-right', 'auto;'], ['width', '960px']]]
  - attrs: [['margin-left', 'auto;'], ['margin-right', 'auto;'], ['width', '960px']]
    - margin-left: 'auto;'
    - margin-right: 'auto;'
    - width: '960px'
  - refs: ['.container_12', '.container_16']
[1]:
  ['.grid_1', '.grid_2', '.grid_3', '.grid_4', '.grid_5', [['display', 'inline;'], ['float', 'left;'], ['margin-left', '10px;'], ['margin-right', '10px']]]
  - attrs: [['display', 'inline;'], ['float', 'left;'], ['margin-left', '10px;'], ['margin-right', '10px']]
    - display: 'inline;'
    - float: 'left;'
    - margin-left: '10px;'
    - margin-right: '10px'
  - refs: ['.grid_1', '.grid_2', '.grid_3', '.grid_4', '.grid_5']
[2]:
  ['.featured_container', '.container_12', '.grid_4', 'a', [['color', '#1d1d1d;'], ['float', 'right;'], ['width', '235px;'], ['height', '40px;'], ['text-align', 'center;'], ['line-height', '40px;'], ['border', '4px solid #141a20;']]]
  - attrs: [['color', '#1d1d1d;'], ['float', 'right;'], ['width', '235px;'], ['height', '40px;'], ['text-align', 'center;'], ['line-height', '40px;'], ['border', '4px solid #141a20;']]
    - border: '4px solid #141a20;'
    - color: '#1d1d1d;'
    - float: 'right;'
    - height: '40px;'
    - line-height: '40px;'
    - text-align: 'center;'
    - width: '235px;'
  - refs: ['.featured_container', '.container_12', '.grid_4', 'a']

Use a defaultdict(dict) to accumulate the attributes by referenced CSS object:

from collections import defaultdict
accum = defaultdict(dict)
for res in result:
    for name in res.refs:
        accum[name].update(res.attrs)

from pprint import pprint
pprint(accum['.container_12'])

gives:

{'border': '4px solid #141a20;',
 'color': '#1d1d1d;',
 'float': 'right;',
 'height': '40px;',
 'line-height': '40px;',
 'margin-left': 'auto;',
 'margin-right': 'auto;',
 'text-align': 'center;',
 'width': '235px;'}
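Because `dict.update` overwrites repeated keys, the accumulation above keeps only the last value of a repeated property (here `width` ends up as the 235px value). If every value should be kept, one possible variation is to collect lists instead. The sketch below is an assumption-laden illustration: `parsed_rules` is a hypothetical hand-written stand-in for the parser output, and `accum_all` is an invented name.

```python
from collections import defaultdict

# Hypothetical stand-in for the parsed result: (selector parts, declarations) pairs.
parsed_rules = [
    ([".container_12", ".container_16"],
     [("margin-left", "auto"), ("margin-right", "auto"), ("width", "960px")]),
    ([".featured_container", ".container_12", ".grid_4", "a"],
     [("color", "#1d1d1d"), ("float", "right"), ("width", "235px")]),
]

# Map each selector part to {property: [values in source order]}.
accum_all = defaultdict(lambda: defaultdict(list))
for refs, attrs in parsed_rules:
    for ref in refs:
        for prop, value in attrs:
            accum_all[ref][prop].append(value)

print(dict(accum_all[".container_12"]))
```

The last element of each list is the value that the `update`-based version would have kept.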

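I tried using this exact solution, and maybe I'm doing something wrong, but can this handle an entire CSS file? I'm passing it in like this:

temp_css_file = open(root + "/" + "style.css", "r")
temp_css_content = temp_css_file.read()
temp_css_file.close()
parse_css(temp_css_content, ".container_12")

But the problem is that both prints come back empty. The only change I made was to put it in a function. The file being passed in is valid, so the problem isn't there. I tried a short text and the result seems to be the same:

test_text = ".container_12, .container_16 { margin-left:auto; margin-right:auto; width:960px }"
result = parser.parseString(test_text)
print("")
print(result.dump())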

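[]

The parser relies on the newlines shown in the original CSS sample to detect the end of each value. Your short text doesn't put the key:value; elements on separate lines. Try

test_text = ".container_12, .container_16 {\nmargin-left:auto;\nmargin-right:auto;\nwidth:960px\n}"

If you need to support multiple key:value pairs on the same line, then you need to refine the definition of attr_spec so that it reads up to the next ';' or '}' instead of using restOfLine. If you want to sift these rules out of a full CSS file, you could also try using searchString instead of parseString. No guarantee that this won't match other unwanted stuff. This is far from a complete CSS parser.

Usually the files don't have properties on the same line; that was just an example from trying to figure out why my file doesn't seem to work. This is one of the files I tested it with. I'm reading it in Python as shown in the comment above. Should I do any conversion to make it work with your parser?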
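The refinement suggested in the comments (reading each value up to the next ';' or '}' rather than using restOfLine) might look something like the sketch below. It assumes pyparsing 2.4.1 or later for the `[...]` notation; `attr_value` and `SEMI` are names introduced here for illustration. The rest mirrors the parser from the answer.

```python
import pyparsing as pp

LBRACE, RBRACE, SEMI = map(pp.Suppress, "{};")
COLON = pp.Suppress(":")

obj_ref = pp.Word(".", pp.alphanums + "_") | pp.Word(pp.alphas, pp.alphanums + "_")
attr_name = pp.Word(pp.alphas, pp.alphanums + "-_")

# Read the value up to (not including) the next ';' or '}', then trim whitespace.
attr_value = pp.Regex(r"[^;}]+").setParseAction(lambda tokens: tokens[0].strip())

# The trailing ';' is optional so the last declaration in a block may omit it.
attr_spec = pp.Group(attr_name("name") + COLON + attr_value("value") + pp.Optional(SEMI))

spec = pp.Group(pp.delimitedList(obj_ref)[1, ...]("refs")
                + LBRACE
                + pp.Dict(attr_spec[...])("attrs")
                + RBRACE)
parser = spec[...]

# The single-line text from the comment thread, which the original parser rejected.
test_text = ".container_12, .container_16 { margin-left:auto; margin-right:auto; width:960px }"
result = parser.parseString(test_text)
print(result.dump())
```

With this change the values no longer carry a trailing ';'. Values that themselves contain ';' or '}' (for example inside quoted strings) would still be split incorrectly, so this remains a rough parser.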