Converting quotes to LaTeX format with Python

Tags: python, regex, python-3.x, latex

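tl;dr version

I have a paragraph that may contain quotations (e.g. "blah blah", 'this one also', etc.). With the help of Python 3.0, I have to replace them with LaTeX-style quotes (e.g. ``blah blah'', `this also', etc.).

Background

I have a lot of plain-text files (more than 100). I have to build a LaTeX document from content extracted from these files, after doing a little text processing on them. I am using Python 3.0 for this. I can get everything else working (escaping characters, sections, and so on), but I cannot get the quotes right.

I can find the patterns with a regular expression (as described above), but how do I replace them with the given pattern? I don't know how to use the re.sub() function in this case, because my string may contain multiple instances of quotes. There is a related question, but how do I implement it in Python?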

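Design considerations
  • I have only considered the regular "double quotes" and 'single quotes'. There may be other quotation marks as well.
  • LaTeX closing quotes are also single quotes. We do not want to capture a LaTeX closing quote (e.g. the end of ``LaTeX double quote'') and mistake it for a single quote (around nothing).
  • Word contractions and possessives contain single quotes (e.g. John's). They are characterised by alphabetic characters on both sides of the quote.
  • Regular plural nouns show ownership with a single quote after the word (e.g. the actresses' roles).

Solution

The input file (test.txt) and the output (output.txt) are listed with the code further down.

(Note: each line of them is prefixed with a comment character to stop the post from formatting the output!)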
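
The design considerations above can be sketched as two re.sub() calls. This is an illustrative reconstruction, not the answer's original code: the function names texify_single_quote and texify_double_quote come from the file-processing loop in this post, but the patterns are assumptions chosen to satisfy the listed considerations, and only ASCII letters are treated as alphabetic.

```python
import re

# Hypothetical patterns, not taken from the original answer.
# Opening ': not preceded by a letter or another ' (skips John's, parents' and LaTeX '').
# Closing ': not followed by a letter or another ' (skips the apostrophe in don't).
SINGLE_QUOTED = re.compile(r"(?<![A-Za-z'])'(?!')(.+?)'(?![A-Za-z'])")
DOUBLE_QUOTED = re.compile(r'"(.*?)"')


def texify_single_quote(text):
    """Rewrite 'phrase' as `phrase' (LaTeX single quotes)."""
    return SINGLE_QUOTED.sub(r"`\1'", text)


def texify_double_quote(text):
    """Rewrite "phrase" as ``phrase'' (LaTeX double quotes)."""
    return DOUBLE_QUOTED.sub(r"``\1''", text)


line = "John's dog barked 'Woof!', and Fred's parents' \"loving\" cat ran away."
print(texify_double_quote(texify_single_quote(line)))
# John's dog barked `Woof!', and Fred's parents' ``loving'' cat ran away.
```

re.sub() replaces every non-overlapping match by default, so a string with several quoted phrases is handled in a single call.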

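Explanation

We will break down this regex pattern, (?

Regular expressions are great for some tasks, but they are still limited. Writing a parser for this task seems less error-prone.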

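I wrote a simple function for this task and added comments. If you still have questions about the implementation, please ask.

The code (convert_quotes, listed at the end of this post) prints: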

    This is my ``test" String
    This is my ``test' String
    This is my ``test' String
    This is my ``test" String which has ``two" quotes
    This is my ``test' String which has ``two' quotes
    This is my ``test' String which has ``two" quotes
    This is my ``test" String which has ``two' quotes
    
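Note: parsing nested quotes is ambiguous.

For example, the string "bob said: "alice said: hello"" is nested in proper language.

But the string "bob said: hi" and "alice said: hello" is not nested.

If this is your case, you may want to first parse the nested quotes into different quote characters, or use parentheses () to disambiguate the nested quotes.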
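
The ambiguity shows up as soon as quote characters are paired strictly in order of appearance, which is what a simple scanner does. A minimal sketch, where quoted_spans is a hypothetical helper (not part of the answer's code) that assumes an even number of quote characters:

```python
def quoted_spans(txt, quote_char='"'):
    """Return the text between the 1st and 2nd quote, the 3rd and 4th, and so on.

    Assumes txt contains an even number of quote_char characters.
    """
    return txt.split(quote_char)[1::2]


nested = '"bob said: "alice said: hello""'
flat = '"bob said: hi" and "alice said: hello"'

print(quoted_spans(nested))  # ['bob said: ', '']
print(quoted_spans(flat))    # ['bob said: hi', 'alice said: hello']
```

With in-order pairing, the nested string yields 'bob said: ' and an empty span, so the inner quotation is lost, while the flat string pairs up as intended.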

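Comments:

Thanks for such a nice explanation, but the single-quote function (texify_single_quote) is not working. :/

No worries! Can you tell me in what way it isn't working? It seems to work fine on my system.

Ah, I see. I think it may be when we have a string like this: (this is my 'test' string, this is a "double"). Possibly the single quote after 'test' and the single quote after "double" are being matched. Sorry, I will have to read more about regex to find the answer. I'll get back to you when I can. P.S. I am using parentheses because I can't use these quotes everywhere in a code block!

Anyway, thanks for your effort. +1 :)

Hmm, so I tried it on the text file (see edit), and if you just apply the single filter before the double filter, things seem to go well. I am not sure whether that is workable (but I will keep looking into a more "robust" solution).

Interesting approach!
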
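    # File-processing loop for the regex answer. It assumes that
    # texify_single_quote() and texify_double_quote() are already defined.
    with open("test.txt", 'r') as fd_in, open("output.txt", 'w') as fd_out:
        for line in fd_in:

            # Test for commutativity: the order of the two filters should not matter
            assert texify_single_quote(texify_double_quote(line)) == texify_double_quote(texify_single_quote(line))

            line = texify_single_quote(line)
            line = texify_double_quote(line)
            fd_out.write(line)

    # Input file (test.txt):
    # 'single', 'single', "double"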
    # 'single', "double", 'single'
    # "double", 'single', 'single'
    # "double", "double", 'single'
    # "double", 'single', "double"
    # I'm a 'single' person
    # I'm a "double" person?
    # Ownership for plural words; the peoples' 'rights'
    # John's dog barked 'Woof!', and Fred's parents' 'loving' cat ran away.
    # "A double-quoted phrase, with a 'single' quote inside"
    # 'A single-quoted phrase with a "double quote" inside, with contracted words such as "don't"'
    # 'A single-quoted phrase with a regular noun such as actresses' roles'

    # Output (output.txt):
    # `single', `single', ``double''
    # `single', ``double'', `single'
    # ``double'', `single', `single'
    # ``double'', ``double'', `single'
    # ``double'', `single', ``double''
    # I'm a `single' person
    # I'm a ``double'' person?
    # Ownership for plural words; the peoples' `rights'
    # John's dog barked `Woof!', and Fred's parents' `loving' cat ran away.
    # ``A double-quoted phrase, with a `single' quote inside''
    # `A single-quoted phrase with a ``double quote'' inside, with contracted words such as ``don't'''
    # `A single-quoted phrase with a regular noun such as actresses' roles'
    
    the_text = '''
    This is my \"test\" String
    This is my \'test\' String
    This is my 'test' String
    This is my \"test\" String which has \"two\" quotes
    This is my \'test\' String which has \'two\' quotes
    This is my \'test\' String which has \"two\" quotes
    This is my \"test\" String which has \'two\' quotes
    '''
    
    
    def convert_quotes(txt, quote_type):
        """Replace the opening quote of every quote_type pair with ``."""
        # find the positions of all quotes of this type
        quotes_pos = []
        idx = -1

        while True:
            idx = txt.find(quote_type, idx + 1)
            if idx == -1:
                break
            quotes_pos.append(idx)

        # quotes must come in opening/closing pairs
        if len(quotes_pos) % 2 == 1:
            raise ValueError('bad number of quotes of type %s' % quote_type)

        # replace each opening quote with ``
        new_txt = []
        last_pos = -1

        for i, pos in enumerate(quotes_pos):
            # skip the odd-indexed (closing) quotes - they are left unchanged
            if i % 2 == 1:
                continue
            new_txt.append(txt[last_pos + 1:pos])
            new_txt.append('``')
            last_pos = pos

        # append the rest of the string after the last opening quote
        new_txt.append(txt[last_pos + 1:])

        return ''.join(new_txt)

    # convert the single quotes first, then the double quotes
    print(convert_quotes(convert_quotes(the_text, '\''), '"'))
    
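
LaTeX convention also uses distinct closing quotes (`...' and ``...''), so the same pairing idea can be extended to rewrite both members of each pair. A minimal sketch, assuming the text contains no apostrophes (they would be counted as single quotes); convert_quote_pairs and to_latex_quotes are hypothetical names, not part of the answer above:

```python
def convert_quote_pairs(txt, quote_char, opening, closing):
    """Replace the 1st, 3rd, ... quote_char with opening and the 2nd, 4th, ... with closing."""
    parts = txt.split(quote_char)
    # n quote characters give n + 1 parts, so an even part count means unbalanced quotes
    if len(parts) % 2 == 0:
        raise ValueError('bad number of quotes of type %s' % quote_char)

    pieces = [parts[0]]
    for i, part in enumerate(parts[1:]):
        pieces.append(opening if i % 2 == 0 else closing)
        pieces.append(part)
    return ''.join(pieces)


def to_latex_quotes(txt):
    # Single quotes go first: the double-quote pass emits '' as its closing
    # mark, which a later single-quote pass would miscount as two quotes.
    txt = convert_quote_pairs(txt, "'", "`", "'")
    return convert_quote_pairs(txt, '"', "``", "''")


print(to_latex_quotes('This is my \'test\' String which has "two" quotes'))
# This is my `test' String which has ``two'' quotes
```

The order of the two passes matters here, which is why the sketch wraps them in a single function instead of leaving the order to the caller.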