Parsing and changing strings in YAML with C#

c# python parsing yaml

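I am looking for a way to parse a YAML file, change every string in it, and then save the file without changing the structure of the original. As I see it, I should not use regular expressions for this but some kind of YAML parser. An example of the YAML input follows: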

receipt:     Oz-Ware Purchase Invoice
date:        2007-08-06
customer:
    given:   Dorothy

items:
    - part_no:   A4786
      descrip:   Water Bucket (Filled)

    - part_no:   E1628
      descrip:   High Heeled "Ruby" Slippers
      size:      8

bill-to:  &id001
    street: |
            123 Tornado Alley
            Suite 16
    city:   East Centerville
    state:  KS

ship-to:  *id001

specialDelivery:  >
    Follow the Yellow Brick
    Road to the Emerald City.
...

Expected output:

receipt:     ###Oz-Ware Purchase Invoice###
date:        ###2007-08-06###
customer:
    given:   ###Dorothy###

items:
    - part_no:   ###A4786###
      descrip:   ###Water Bucket (Filled)###

    - part_no:   ###E1628###
      descrip:   ###High Heeled "Ruby" Slippers###
      size:      ###8###

bill-to:  ###&id001###
    street: |
            ###123 Tornado Alley
            Suite 16###
    city:   ###East Centerville###
    state:  ###KS###

ship-to:  ###*id001###

specialDelivery:  >
    ###Follow the Yellow Brick
    Road to the Emerald City.###
...

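Is there a good YAML parser that can handle complex YAML files, change the strings, and save the data back without damaging the structure of the document? Maybe you have another idea for solving this. Basically, I want to walk through every string from the top of the document and apply some modification to it.

Any hints are welcome.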

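The YAML specification says:

In the representation model, mapping keys do not have an order. To serialize a mapping, it is necessary to impose an ordering on its keys. This order is a serialization detail and should not be used when composing the representation graph (and hence for the preservation of application data). In every case where node order is significant, a sequence must be used. For example, an ordered mapping can be represented as a sequence of mappings, where each mapping is a single key: value pair. YAML provides convenient compact notation for this case.

So you really should not expect YAML to maintain any order when loading and saving a document.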
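The spec's suggestion of representing an ordered mapping as a sequence of single-pair mappings can be sketched in plain Python, without any YAML library. The names `ordered_pairs` and `to_ordered_dict` are made up for this illustration, and the values are simplified to strings.

```python
from collections import OrderedDict

# An ordered mapping expressed as a sequence of mappings: each list entry
# is a mapping holding exactly one key: value pair, and the list carries
# the order.
ordered_pairs = [
    {"receipt": "Oz-Ware Purchase Invoice"},
    {"date": "2007-08-06"},
    {"customer": "Dorothy"},
]


def to_ordered_dict(pairs):
    """Fold a list of single-pair mappings into an OrderedDict (hypothetical helper)."""
    result = OrderedDict()
    for pair in pairs:
        if len(pair) != 1:
            raise ValueError("each entry must hold exactly one key: value pair")
        (key, value), = pair.items()
        result[key] = value
    return result


mapping = to_ordered_dict(ordered_pairs)
print(list(mapping))
```

Because the order lives in the sequence rather than in the mapping, it survives any loader that is free to reorder mapping keys.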

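That being said, I completely understand where you are coming from. Because YAML documents are meant for humans, maintaining a certain order is definitely helpful. Unfortunately, because of the specification, most implementations use unordered data structures to represent key/value mappings. In C# and Python, that would be a dictionary, and dictionaries are, by design, without order.

But both C# and Python do have ordered dictionary types, and at least for Python there have been some efforts in the past to maintain key order using ordered dictionaries: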

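A specialized ordered mapping type, which PyYAML supports.
A discussion that includes a possible `OrderedLoader`, along with a short solution using YAML constructors and a possible implementation of the loader.
A library that also provides this functionality.
Finally, a way to load YAML into `OrderedDict`s.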

That is the Python side; I am sure there are similar efforts for C# implementations.
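The idea of a loader that builds ordered dictionaries is not specific to YAML. As a rough standard-library analogy (this is JSON, not YAML, and not one of the implementations referred to above), Python's `json` module lets the caller choose the mapping type through `object_pairs_hook`, so keys arrive in document order:

```python
import json
from collections import OrderedDict

doc = (
    '{"receipt": "Oz-Ware Purchase Invoice", '
    '"date": "2007-08-06", '
    '"customer": {"given": "Dorothy"}}'
)

# object_pairs_hook receives every JSON object as a list of (key, value)
# pairs in document order; OrderedDict keeps that order, also for nested
# objects.
data = json.loads(doc, object_pairs_hook=OrderedDict)

print(list(data))
print(json.dumps(data))
```

An ordered YAML loader works on the same principle: it replaces the default mapping constructor with one that builds an ordered type.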

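Most YAML parsers are built for reading YAML, whether written by other programs or edited by humans, and for writing YAML that will be read by other programs. What is notoriously lacking is the ability of parsers to write YAML that is still human readable:

the order of mapping keys is undefined
comments get thrown away
the scalar literal block style, if any, is dropped
spacing around scalars is discarded
the scalar folding information, if any, is dropped

Loading the dump of a loaded handcrafted YAML file results in the same internal data structures as the initial load, but the intermediate dump normally does not look like the original (handcrafted) YAML.

If you have a Python program: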

import ruamel.yaml as yaml

yaml_str = """\
receipt:     Oz-Ware Purchase Invoice
date:        2007-08-06
customer:
    given:   Dorothy

items:
    - part_no:   A4786
      descrip:   Water Bucket (Filled)

    - part_no:   E1628
      descrip:   High Heeled "Ruby" Slippers
      size:      8

bill-to:  &id001
    street: |
            123 Tornado Alley
            Suite 16
    city:   East Centerville
    state:  KS

ship-to:  *id001

specialDelivery:  >
    Follow the Yellow Brick
    Road to the Emerald City.
"""

data1 = yaml.load(yaml_str, Loader=yaml.Loader)
dump_str = yaml.dump(data1, Dumper=yaml.Dumper)
data2 = yaml.load(dump_str, Loader=yaml.Loader)

then the following assertions hold:

assert data1 == data2
assert dump_str != yaml_str

The intermediate dump_str looks like:

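bill-to: &id001 {city: East Centerville, state: KS, street: '123 Tornado Alley
    Suite 16
    '}
customer: {given: Dorothy}
date: 2007-08-06
items:
- {descrip: Water Bucket (Filled), part_no: A4786}
- {descrip: High Heeled "Ruby" Slippers, part_no: E1628, size: 8}
receipt: Oz-Ware Purchase Invoice
ship-to: *id001
specialDelivery: 'Follow the Yellow Brick Road to the Emerald City.
  '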

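The above is the default behavior of many YAML parsers in other languages and of online YAML conversion services. For some parsers it is the only behavior provided.

The reason I started ruamel.yaml as an enhancement of PyYAML was to make going from handcrafted YAML to internal data and back to YAML produce something more human readable (what I call round-tripping), and to preserve more information (especially comments).

Round-tripping the same input with ruamel.yaml gives you: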

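receipt: Oz-Ware Purchase Invoice
date: 2007-08-06
customer:
  given: Dorothy
items:
- part_no: A4786
  descrip: Water Bucket (Filled)
- part_no: E1628
  descrip: High Heeled "Ruby" Slippers
  size: 8
bill-to: &id001
  street: |
    123 Tornado Alley
    Suite 16
  city: East Centerville
  state: KS
ship-to: *id001
specialDelivery: 'Follow the Yellow Brick Road to the Emerald City.
  '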

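My focus has been on comments, key ordering, and the literal block style. Spacing around scalars and folded scalars are not (yet) treated specially.

Starting from there (you could do this in PyYAML as well, but you would not have ruamel.yaml's built-in key order preservation), you can either provide a special emitter or hook into the system at a lower level, overriding some methods in emitter.py (and making sure you can still call the originals for the cases you do not need to handle):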

def rewrite_write_plain(self, text, split=True):
    if self.state == self.expect_block_mapping_simple_value:
        text = '###' + text + '###'
        while self.column < 20:
            text = ' ' + text
            self.column += 1
    self._org_write_plain(text, split)

def rewrite_write_literal(self, text):
    if self.state == self.expect_block_mapping_simple_value:
        last_nl = False
        if text and text[-1] == '\n':
            last_nl = True
            text = text[:-1]
        text = '###' + text + '###'
        if False:
            extra_indent = ''
            while self.column < 15:
                text = ' ' + text
                extra_indent += ' '
                self.column += 1
            text = text.replace('\n', '\n' + extra_indent)
        if last_nl:
            text += '\n'
    self._org_write_literal(text)

def rewrite_write_single_quoted(self, text, split=True):
    if self.state == self.expect_block_mapping_simple_value:
        last_nl = False
        if text and text[-1] == u'\n':
            last_nl = True
            text = text[:-1]
        text = u'###' + text + u'###'
        if last_nl:
            text += u'\n'
    self.write_folded(text)

def rewrite_write_indicator(self, indicator, need_whitespace,
                    whitespace=False, indention=False):
    if indicator and indicator[0] in u"*&":
        indicator = u'###' + indicator + u'###'
        while self.column < 20:
            indicator = ' ' + indicator
            self.column += 1
    self._org_write_indicator(indicator, need_whitespace, whitespace,
                              indention)

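# Assumption: data and dumper are not defined in the snippet as posted;
# here data comes from the round-trip loader and dumper is the round-trip
# dumper class of the older ruamel.yaml API.
data = yaml.load(yaml_str, Loader=yaml.RoundTripLoader)
dumper = yaml.RoundTripDumper

# Keep each original method reachable under an _org_* name, then install
# the replacement on the dumper class.
dumper._org_write_plain = dumper.write_plain
dumper.write_plain = rewrite_write_plain
dumper._org_write_literal = dumper.write_literal
dumper.write_literal = rewrite_write_literal
dumper._org_write_single_quoted = dumper.write_single_quoted
dumper.write_single_quoted = rewrite_write_single_quoted
dumper._org_write_indicator = dumper.write_indicator
dumper.write_indicator = rewrite_write_indicator

print(yaml.dump(data, Dumper=dumper, indent=4))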
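The pattern used above (keep a reference to the original method, install a replacement, and delegate to the original) can be tried in isolation without ruamel.yaml. The sketch below uses a made-up `ToyEmitter` class, not the real emitter, and its `write_plain` signature is an assumption chosen only to mirror the idea.

```python
import io


class ToyEmitter:
    """Hypothetical stand-in for a YAML emitter; not ruamel.yaml's class."""

    def __init__(self):
        self.stream = io.StringIO()

    def write_plain(self, text):
        # Write one plain scalar per line.
        self.stream.write(text + "\n")


def rewrite_write_plain(self, text):
    # Decorate the scalar, then delegate to the saved original method.
    self._org_write_plain("###" + text + "###")


# Save the original on the class, then install the replacement.
ToyEmitter._org_write_plain = ToyEmitter.write_plain
ToyEmitter.write_plain = rewrite_write_plain

emitter = ToyEmitter()
emitter.write_plain("Dorothy")
emitter.write_plain("KS")
output = emitter.stream.getvalue()
print(output, end="")
```

Patching the class rather than an instance means every emitter created afterwards picks up the replacement, which is why the original is stored under a separate name first.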

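The result is acceptable for further processing in C#.

Have you tried YamlDotNet? It seems to provide what you need.

Most YAML parsers will discard the extra spaces before values and lose all implicit alignment information. The parsers I know of will also interpret anchors and references when reading the document (and create references to the same data). I can show you how to do most of this in Python (folded style scalars are a problem) if that is an option, but since this is tagged C#, I will not do so unless you confirm that is fine.

@Dreamweaver Thank you very much for the suggestion, but I could not find any example of how to iterate over and change each string.

@Anthon Although I would prefer C#, I can use a Python solution as an alternative. If there is no answer written in C#, I will accept your solution.

Thanks for your solution, but I got some more complex files: rua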
