Python: only on lines beginning with *CHI:

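I'm trying to write a Python script that tags every English word, but only on lines beginning with *CHI:, by appending "@s:eng" to the end of the word. The code doesn't seem to work, though. Currently, it looks like this: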

import re

with open("transcript 0623.cha", encoding='utf8') as f:

    text = f.read()

    new_text = re.sub("A-Za-z", "A-Za-z@s:eng", text)
    with open("transcript 0623_out.cha", "w", encoding='utf8') as result:
        result.write(new_text)
Can you suggest how I could improve the code?

Here is a sample of the contents of transcript 0623:

@Begin
@Languages: zho , eng
@Participants:  TEA Teacher , CHI Child
@ID:    zho,|change_me_later|TEA|||||Teacher|||
@ID:    zho,|change_me_later|CHI|||||Child|||
@Transcriber:   CKX
@Activities:    Storytelling
@Comment:   child used the malay word sayang
*TEA:   ok ,   来   ,   开始   .
*CHI:   呃   ,   the   boy@s   .
*TEA:   嗯   .
*CHI:   have a frog@s .
*TEA:   ok .
*TEA:   ok do you know what is boy in chinese ?
*TEA:   can you help me tell the story in chinese ?
*TEA:   ok then do you know what is a frog in chinese ?
*TEA:   ok , come .
*TEA:   go to the next page .
*CHI:   when the boy sleeping , then the frog come out@s .
*TEA:   ok .
*TEA:   还有 吗   ?
*CHI:   the cat also sleeping@s .
*TEA:   ok .
*TEA:   do you know what is cat in chinese ?
*TEA:   嗯   ,   what   is   it   ?
*CHI:   猫   .
*TEA:   ok .
*TEA:   so can you use your chinese for cat to help me tell the story ?
*TEA:   嗯   ?
*CHI:   猫   睡觉   .
*TEA:   啊   ,   很   好   .
*TEA:   还有   吗   ?
*CHI:   frog come out@s .
*TEA:   ok .
*TEA:   很   好   .
*TEA:   还有   吗   ?
*CHI:   next one@s .
*TEA:   ok .
*CHI:   the boy wake up@s .
*CHI:   and , the frog is gone@s .
*TEA:   嗯   .
*CHI:   then , maybe , the frog went out the window@s .
*TEA:   嗯   ,   ok   .
*CHI:   the boy is looking for the frog@s .
*TEA:   嗯   .
*CHI:   the cat is looking for the frog@s .
*TEA:   ok   what   is   cat   in   chinese   again   ?
*CHI:   what@s ?
*TEA:   what is cat in chinese again ?
*CHI:   猫   .
*TEA:   嗯   .
*TEA:   ok can you use the chinese word for cat to tell me the story again ?
*TEA:   嗯   ?
*CHI:   猫   looking   for   the@s   .
*TEA:   啊   .
*CHI:   for the@s .
*TEA:   嗯   .
*CHI:   frog@s .
*TEA:   ok .
*TEA:   very good .
*TEA:   anything else ?
*TEA:   ok .
*CHI:   the@s   猫   go   in@s   .
*CHI:   and put the bottle in here@s .
*TEA:   嗯   .
*CHI:   the boy has do this@s .
*TEA:   嗯   .
*CHI:   the cat fall down@s .
*TEA:   ok   what   is   cat   in   chinese   again   ?
*CHI:   猫   fall   down@s   .
*TEA:   嗯   .
*CHI:   and get the bottle@s .
*CHI:   get the bottle@s .
*TEA:   ok .
*TEA:   very good .
*TEA:   ok anything else ?
*TEA:   anything else ?
*TEA:   ok .
*CHI:   the   boy   go   and   sayang   the   cat@s   .
*TEA:   嗯   .
*TEA:   what is cat in chinese ?
*CHI:   the , the boy go and sayang the@s 猫 .
*TEA:   啊   ,   ok   .
*TEA:   very good .
*CHI:   and then the bottle break@s .
*TEA:   ok .
*TEA:   very good .
*TEA:   anything else ?
*TEA:   come .
*TEA:   ok this whole thing is together .
*CHI:   the boy is calling for the frog@s .
*TEA:   嗯   .
*CHI:   the cat is looking underneath the table@s .
*TEA:   ok   what   is   cat   in   chinese   again   ?
*CHI:   the@s 猫 looking for the frog underneath@s .
*TEA:   嗯   ,   ok   .
*CHI:   they looking inside the hole if the frog is here@s .
*TEA:   嗯   .
*TEA:   anything else ?
*CHI:   then the boy is here@s .
*TEA:   啊 , ok very good .
*TEA:   anything else ?
*CHI:   the boy fall down into the water@s .
*CHI:   and the cat also@s .
*CHI:   and then the log break@s .
*TEA:   嗯   .
*TEA:   do you know what is water in chinese ?
*TEA:   what is it ?
*CHI:   水   .
*TEA:   ok can you tell me the story again with the word , with the , with
    the chinese word for water ?
*TEA:   嗯   ?
*CHI:   the boy fall down@s .
*CHI:   and   the@s   猫   too@s   .
*CHI:   and   both   of   them   fall   in   the@s   水   .
*TEA:   ok , very good .
*CHI:   and then they all get wet@s .
*TEA:   嗯   .
*TEA:   ok .
*CHI:   they found some water on the log@s .
*TEA:   嗯   .
*CHI:   they found so many frogs@s .
*TEA:   嗯   .
*CHI:   and is this the frog that they have@s ?
*TEA:   嗯   .
*TEA:   ok .
*CHI:   then they say bye bye .
*TEA:   嗯   .
*TEA:   you know how to say bye bye in chinese ?
*CHI:   再见   .
*TEA:   ok .
*TEA:   can you repeat this part again in chinese ?
*CHI:   and   then   the   boy   and   the   cat   and   the   frog
    say@s   再见   .
*TEA:   ok   what   is   cat   in   chinese   again   ?
*CHI:   猫   .
*TEA:   啊   .
*TEA:   can you repeat the whole thing ?
*CHI:   the boy and the@s 猫 and the , and the frog@s .
*TEA:   嗯   .
*CHI:   say@s   再见   .
*TEA:   ok .
*TEA:   very good .
*TEA:   thank you for telling me the story ok ?
@End 

Your regular expressions are incorrect:

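The search pattern "A-Za-z" has no square brackets, so it is not a character class: it looks for the literal text "uppercase A, hyphen, uppercase Z, lowercase a, hyphen, lowercase z", which never appears in the transcript. If you only want to check lines that start with "*CHI:", then "*CHI:" should be part of your search pattern.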

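The replacement pattern inserts the literal text "A-Za-z@s:eng" in place of each match; it does not reuse the word that was matched. You need to capture the parts of the text you want to keep, then reuse them and append "@s:eng" to the end of each word.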
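Both points can be checked with a short sketch that uses only the standard re module (the sample string is the "have a frog" *CHI: line from the transcript with its @s marker left off). The bracketless pattern leaves the text unchanged, while a character class combined with the \g<0> backreference (the whole match) reuses each matched word:

```python
import re

sample = "*CHI:   have a frog ."

# Without brackets the pattern is the literal text "A-Za-z", which does not
# occur in the line, so re.sub returns the string unchanged.
unchanged = re.sub("A-Za-z", "A-Za-z@s:eng", sample)

# [A-Za-z]+ is a character class repeated one or more times; \g<0> in the
# replacement reinserts the whole matched word before the new suffix.
tagged = re.sub(r"[A-Za-z]+", r"\g<0>@s:eng", sample)

print(unchanged)  # *CHI:   have a frog .
print(tagged)     # *CHI@s:eng:   have@s:eng a@s:eng frog@s:eng .
```

Note that "CHI" gets tagged as well, which is why the *CHI: prefix has to be handled separately from the words that follow it.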

Here is something you can use:

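import re

i_path = "transcript 0623.cha"
o_path = "transcript 0623_out.cha"

mark_pattern = re.compile("(\\*CHI:.*)")
word_pattern = re.compile("([A-Za-z]+)")

with open(i_path, encoding='utf8') as i_file, open(o_path, "w", encoding='utf8') as o_file:
    for line in i_file:
        # Split into possible words
        parts = line.split()
        # Blank lines and lines that are not *CHI: lines are copied unchanged
        if not parts or mark_pattern.match(parts[0]) is None:
            o_file.write(line)
            continue

        # Got a CHI line
        new_line = line
        for word in parts[1:]:
            matches = word_pattern.match(word)
            if matches:
                old = f"\\b{word}\\b"
                new = f"{matches.group(1)}@s:eng"
                new_line = re.sub(old, new, new_line, count=1)

        o_file.write(new_line)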
Explanation:

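  • mark_pattern = re.compile("(\\*CHI:.*)")
    • The pattern that matches lines starting with "*CHI:". You need to escape the * at the beginning because * is a regex metacharacter (it repeats the preceding item); the backslash makes it match a literal asterisk.
    • The pattern is compiled once and reused because the re documentation says that "using re.compile() and saving the resulting regular expression object for reuse is more efficient when the expression will be used several times in a single program."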
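  • word_pattern = re.compile("([A-Za-z]+)")
    • The pattern that matches words. [] specifies a set of characters (here the ASCII letters A-Z and a-z), + matches one or more repetitions of the preceding item, and the parentheses capture the matched letters so they can be retrieved later.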
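  • for line in i_file
    • Processing the file line by line is easier (and more memory-efficient), and it lets you debug the regex search-and-replace one line at a time. It may be possible to do all of this with a single read() / readlines(), but I prefer readability.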
  • parts = line.split()
    • To find the words, split the line on whitespace into possible words.
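  • .match(..)
    • This returns a match object, or None if the pattern does not match at the start of the string. If the regex pattern contains captures (()), you can access them with .group(). This is used to change word into word@s:eng.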
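The behaviour of the two patterns and of .match()/.group() described above can be checked on a few tokens from the sample transcript. A minimal sketch; the tokens list and the helper variables exist only for illustration:

```python
import re

mark_pattern = re.compile("(\\*CHI:.*)")
word_pattern = re.compile("([A-Za-z]+)")

# Without the backslash, "*" has nothing to repeat and compilation fails.
try:
    re.compile("*CHI:")
    unescaped_compiles = True
except re.error:
    unescaped_compiles = False

# .match() returns a match object, or None when the pattern does not match.
chi_is_match = mark_pattern.match("*CHI:") is not None
tea_is_match = mark_pattern.match("*TEA:") is not None

# group(1) is the text captured by (...): the leading run of ASCII letters.
tokens = ["the@s", "猫", "frog@s", "?", "what@s"]
stems = []
for token in tokens:
    m = word_pattern.match(token)
    stems.append(m.group(1) if m else None)

print(unescaped_compiles, chi_is_match, tea_is_match)  # False True False
print(stems)  # ['the', None, 'frog', None, 'what']
```

Chinese characters and punctuation fall outside [A-Za-z], so .match() returns None for them and they are skipped.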
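First, I check whether the first word (parts[0]) matches the "CHI" pattern. If it doesn't, I write the line to the output file unchanged. If it does, I process the line word by word.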
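That first check can be sketched as a small predicate; is_chi_line is a hypothetical name used only here. Because line.split() returns an empty list for a blank line, the sketch tests for that before looking at parts[0], which would otherwise raise IndexError:

```python
import re

mark_pattern = re.compile("(\\*CHI:.*)")


def is_chi_line(line: str) -> bool:
    """True if the first whitespace-separated token of line starts with *CHI:."""
    parts = line.split()
    if not parts:  # blank line: there is no parts[0] to check
        return False
    return mark_pattern.match(parts[0]) is not None


for sample in ["*CHI:   猫   .\n", "*TEA:   ok .\n", "\n", "    say@s   再见   .\n"]:
    print(repr(sample), is_chi_line(sample))
```

Because only the first token is examined, indented continuation lines of a *CHI: utterance (such as the line beginning with say@s in the sample) are not recognised as CHI lines and are passed through untagged.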

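For each possible word, I check whether it matches the word pattern. If it does, I use re.sub to replace the old word in the line with word@s:eng. This match-then-replace step is repeated for every word, accumulating the replacements in new_line. Note that by using matches.group(1), I replace the @s already present in the original line (for example, "frog@s" becomes "frog@s:eng").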
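To experiment with the match-then-replace loop without reading or writing files, the same logic can be wrapped in a function that takes a single line. This is a sketch with a hypothetical name, tag_line; its body mirrors the answer's loop:

```python
import re

word_pattern = re.compile("([A-Za-z]+)")


def tag_line(line: str) -> str:
    """Apply the answer's per-word loop to one *CHI: line (hypothetical helper)."""
    parts = line.split()
    new_line = line
    for word in parts[1:]:  # skip the speaker code in parts[0]
        matches = word_pattern.match(word)
        if matches:
            old = f"\\b{word}\\b"
            new = f"{matches.group(1)}@s:eng"
            # count=1: only the first match in the current new_line is replaced
            new_line = re.sub(old, new, new_line, count=1)
    return new_line


print(tag_line("*CHI:   have a frog@s .\n"), end="")
# *CHI:   have@s:eng a@s:eng frog@s:eng .
```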

I used f-strings for old and new. If you are not on Python 3.6+, you can use regular string concatenation/formatting instead.
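For example, with the token frog@s (whose group(1) is frog), the f-string, str.format and concatenation versions of old and new all build the same strings:

```python
word = "frog@s"  # a token from the transcript
stem = "frog"    # what matches.group(1) returns for it

# f-strings (Python 3.6+), as in the answer
old_f = f"\\b{word}\\b"
new_f = f"{stem}@s:eng"

# str.format, for older Python 3 versions
old_fmt = "\\b{}\\b".format(word)
new_fmt = "{}@s:eng".format(stem)

# plain concatenation works too
old_cat = "\\b" + word + "\\b"
new_cat = stem + "@s:eng"

print(old_f == old_fmt == old_cat, new_f == new_fmt == new_cat)  # True True
```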

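Results:

I: *CHI:   have a frog@s .
O: *CHI:   have@s:eng a@s:eng frog@s:eng .
I: *CHI:   the cat is looking underneath the table@s .
O: *CHI:   the@s:eng@s:eng cat@s:eng is@s:eng looking@s:eng underneath@s:eng the table@s:eng .
(punctuation is ignored)
I: *CHI:   what@s ?
O: *CHI:   what@s:eng ?
(non-English words in the line are ignored)
I: *CHI:   猫   fall   down@s   .
O: *CHI:   猫   fall@s:eng   down@s:eng   .
(lines that don't start with CHI are not affected)
I: *TEA:   ok this whole thing is together .
O: *TEA:   ok this whole thing is together .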
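In the "the cat is looking underneath the table@s" result, the first "the" is tagged twice and the second "the" not at all. Each re.sub(..., count=1) call searches new_line from the start again, and \bthe\b also matches the already-tagged "the@s:eng". One way around this (a swapped-in technique, not the answer's code) is to rewrite the line in a single pass by giving re.sub a function as the replacement, so every token is visited exactly once. A minimal sketch, assuming the same rules as the answer: only lines whose first token starts with *CHI:, and only tokens that begin with ASCII letters; tag_token and tag_chi_line are hypothetical names.

```python
import re

mark_pattern = re.compile("(\\*CHI:.*)")
word_pattern = re.compile("([A-Za-z]+)")
# One whitespace-separated token; whitespace itself is never matched, so the
# original spacing of the line is preserved.
token_pattern = re.compile(r"\S+")


def tag_token(match: re.Match) -> str:
    """re.sub callback: tag a token that starts with ASCII letters, keep others."""
    token = match.group(0)
    word = word_pattern.match(token)
    if word is None:
        return token  # punctuation, Chinese characters, ...
    return f"{word.group(1)}@s:eng"  # same rule as the answer: "frog@s" -> "frog@s:eng"


def tag_chi_line(line: str) -> str:
    """Tag the words of a *CHI: line in one pass; return other lines unchanged."""
    parts = line.split()
    if not parts or mark_pattern.match(parts[0]) is None:
        return line
    # Split off the speaker code so that "CHI" itself is not tagged.
    head, code, rest = line.partition(parts[0])
    # re.sub visits every token exactly once, left to right.
    return head + code + token_pattern.sub(tag_token, rest)


print(tag_chi_line("*CHI:   the cat is looking underneath the table@s .\n"), end="")
# *CHI:   the@s:eng cat@s:eng is@s:eng looking@s:eng underneath@s:eng the@s:eng table@s:eng .
```

In the file loop, each line would then simply be written with o_file.write(tag_chi_line(line)).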
