Python: tag English words only on lines starting with *CHI:

python regex python-3.x

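I am trying to write a Python script that tags every English word by appending "@s:eng" to it, but only on lines that start with *CHI:. The code does not seem to work. At the moment it looks like this: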

import re

with open("transcript 0623.cha", encoding='utf8') as f:
    text = f.read()
    new_text = re.sub("A-Za-z", "A-Za-z@s:eng", text)
    with open("transcript 0623_out.cha", "w", encoding='utf8') as result:
        result.write(new_text)

Could you suggest how I can improve the code?

Sample content of transcript 0623 is shown below:

@Begin
@Languages: zho , eng
@Participants:  TEA Teacher , CHI Child
@ID:    zho,|change_me_later|TEA|||||Teacher|||
@ID:    zho,|change_me_later|CHI|||||Child|||
@Transcriber:   CKX
@Activities:    Storytelling
@Comment:   child used the malay word sayang
*TEA:   ok ,   来   ,   开始   .
*CHI:   呃   ,   the   boy@s   .
*TEA:   嗯   .
*CHI:   have a frog@s .
*TEA:   ok .
*TEA:   ok do you know what is boy in chinese ?
*TEA:   can you help me tell the story in chinese ?
*TEA:   ok then do you know what is a frog in chinese ?
*TEA:   ok , come .
*TEA:   go to the next page .
*CHI:   when the boy sleeping , then the frog come out@s .
*TEA:   ok .
*TEA:   还有 吗   ?
*CHI:   the cat also sleeping@s .
*TEA:   ok .
*TEA:   do you know what is cat in chinese ?
*TEA:   嗯   ,   what   is   it   ?
*CHI:   猫   .
*TEA:   ok .
*TEA:   so can you use your chinese for cat to help me tell the story ?
*TEA:   嗯   ?
*CHI:   猫   睡觉   .
*TEA:   啊   ,   很   好   .
*TEA:   还有   吗   ?
*CHI:   frog come out@s .
*TEA:   ok .
*TEA:   很   好   .
*TEA:   还有   吗   ?
*CHI:   next one@s .
*TEA:   ok .
*CHI:   the boy wake up@s .
*CHI:   and , the frog is gone@s .
*TEA:   嗯   .
*CHI:   then , maybe , the frog went out the window@s .
*TEA:   嗯   ,   ok   .
*CHI:   the boy is looking for the frog@s .
*TEA:   嗯   .
*CHI:   the cat is looking for the frog@s .
*TEA:   ok   what   is   cat   in   chinese   again   ?
*CHI:   what@s ?
*TEA:   what is cat in chinese again ?
*CHI:   猫   .
*TEA:   嗯   .
*TEA:   ok can you use the chinese word for cat to tell me the story again ?
*TEA:   嗯   ?
*CHI:   猫   looking   for   the@s   .
*TEA:   啊   .
*CHI:   for the@s .
*TEA:   嗯   .
*CHI:   frog@s .
*TEA:   ok .
*TEA:   very good .
*TEA:   anything else ?
*TEA:   ok .
*CHI:   the@s   猫   go   in@s   .
*CHI:   and put the bottle in here@s .
*TEA:   嗯   .
*CHI:   the boy has do this@s .
*TEA:   嗯   .
*CHI:   the cat fall down@s .
*TEA:   ok   what   is   cat   in   chinese   again   ?
*CHI:   猫   fall   down@s   .
*TEA:   嗯   .
*CHI:   and get the bottle@s .
*CHI:   get the bottle@s .
*TEA:   ok .
*TEA:   very good .
*TEA:   ok anything else ?
*TEA:   anything else ?
*TEA:   ok .
*CHI:   the   boy   go   and   sayang   the   cat@s   .
*TEA:   嗯   .
*TEA:   what is cat in chinese ?
*CHI:   the , the boy go and sayang the@s 猫 .
*TEA:   啊   ,   ok   .
*TEA:   very good .
*CHI:   and then the bottle break@s .
*TEA:   ok .
*TEA:   very good .
*TEA:   anything else ?
*TEA:   come .
*TEA:   ok this whole thing is together .
*CHI:   the boy is calling for the frog@s .
*TEA:   嗯   .
*CHI:   the cat is looking underneath the table@s .
*TEA:   ok   what   is   cat   in   chinese   again   ?
*CHI:   the@s 猫 looking for the frog underneath@s .
*TEA:   嗯   ,   ok   .
*CHI:   they looking inside the hole if the frog is here@s .
*TEA:   嗯   .
*TEA:   anything else ?
*CHI:   then the boy is here@s .
*TEA:   啊 , ok very good .
*TEA:   anything else ?
*CHI:   the boy fall down into the water@s .
*CHI:   and the cat also@s .
*CHI:   and then the log break@s .
*TEA:   嗯   .
*TEA:   do you know what is water in chinese ?
*TEA:   what is it ?
*CHI:   水   .
*TEA:   ok can you tell me the story again with the word , with the , with
    the chinese word for water ?
*TEA:   嗯   ?
*CHI:   the boy fall down@s .
*CHI:   and   the@s   猫   too@s   .
*CHI:   and   both   of   them   fall   in   the@s   水   .
*TEA:   ok , very good .
*CHI:   and then they all get wet@s .
*TEA:   嗯   .
*TEA:   ok .
*CHI:   they found some water on the log@s .
*TEA:   嗯   .
*CHI:   they found so many frogs@s .
*TEA:   嗯   .
*CHI:   and is this the frog that they have@s ?
*TEA:   嗯   .
*TEA:   ok .
*CHI:   then they say bye bye .
*TEA:   嗯   .
*TEA:   you know how to say bye bye in chinese ?
*CHI:   再见   .
*TEA:   ok .
*TEA:   can you repeat this part again in chinese ?
*CHI:   and   then   the   boy   and   the   cat   and   the   frog
    say@s   再见   .
*TEA:   ok   what   is   cat   in   chinese   again   ?
*CHI:   猫   .
*TEA:   啊   .
*TEA:   can you repeat the whole thing ?
*CHI:   the boy and the@s 猫 and the , and the frog@s .
*TEA:   嗯   .
*CHI:   say@s   再见   .
*TEA:   ok .
*TEA:   very good .
*TEA:   thank you for telling me the story ok ?
@End

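Your regular expression is incorrect:

- The search pattern is looking for the literal text "capital A, hyphen, capital Z, lowercase a, hyphen, lowercase z". If you only want to check lines that start with "*CHI:", then "*CHI:" should be part of your search pattern.
- The replacement pattern replaces whatever was matched with the literal text "A-Za-z@s:eng". You need to capture the parts of the text you want to keep, reuse them, and add "@s:eng" at the end of each word.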
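The difference between the literal pattern and a character class can be checked on a single line from the transcript. This is only a small illustration; the variable names `line` and `unchanged` are made up for the example.

```python
import re

line = "*CHI:   have a frog@s ."

# Without square brackets the pattern is the literal text "A-Za-z",
# which never occurs in the line, so re.sub leaves it untouched.
unchanged = re.sub("A-Za-z", "A-Za-z@s:eng", line)
print(unchanged == line)  # True

# A character class followed by "+" matches whole runs of ASCII letters.
print(re.findall("[A-Za-z]+", line))  # ['CHI', 'have', 'a', 'frog', 's']
```

Note that the character class also picks up `CHI` and the `s` of `@s`, which is why the speaker prefix and the existing marker need separate handling.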

Here is what you can use:

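import re

i_path = "transcript 0623.cha"
o_path = "transcript 0623_out.cha"

mark_pattern = re.compile("\\*CHI:.*")
word_pattern = re.compile("([A-Za-z]+)")

with open(i_path, encoding='utf8') as i_file, open(o_path, "w", encoding='utf8') as o_file:
    for line in i_file:
        # Split into possible words
        parts = line.split()
        # Blank lines have no first token; copy them through unchanged
        if not parts or mark_pattern.match(parts[0]) is None:
            o_file.write(line)
            continue

        # Got a CHI line
        new_line = line
        for word in parts[1:]:
            matches = word_pattern.match(word)
            if matches:
                old = f"\\b{word}\\b"
                new = f"{matches.group(1)}@s:eng"
                # count=1 rewrites only the first occurrence of this token
                new_line = re.sub(old, new, new_line, count=1)

        o_file.write(new_line)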

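Explanation:

```
mark_pattern = re.compile("\\*CHI:.*")
```
- The pattern that matches lines starting with "*CHI:". You need to escape the `*` at the start, because an unescaped `*` is a repetition operator in regular expressions.
- The `re` documentation says that "using re.compile() and saving the resulting regular expression object for reuse is more efficient when the expression will be used several times in a single program."
```
word_pattern = re.compile("([A-Za-z]+)")
```
- The pattern that matches words. Use `[]` to indicate a set of characters, then `+` to match one or more repetitions of the preceding pattern.
```
for line in i_file
```
- Processing the file line by line is easier (and more memory-efficient). You can easily debug the regex search and replacement on each line. It may be possible to do all of this with a single `read()` / `readlines()`, but I prefer readability.
```
parts = line.split()
```
- To find the words, split the line into possible words.
```
.match(..)
```
- See the `re` documentation for `match`.
- It returns a match object, and if the regex pattern contains captures (`()`), you can access them with `.group()`. This is used to change `word` to `word@s:eng`.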
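The `.match()` / `.group()` step can be tried in isolation. The following is a minimal sketch; the helper name `tag_token` is hypothetical and only wraps the two calls described above.

```python
import re

word_pattern = re.compile("([A-Za-z]+)")

def tag_token(token):
    """Return the leading ASCII letters of token plus "@s:eng", or None."""
    # match() only succeeds if the token starts with at least one letter.
    m = word_pattern.match(token)
    if m is None:
        return None
    # group(1) is the captured run of letters, without any trailing "@s".
    return f"{m.group(1)}@s:eng"

print(tag_token("frog@s"))  # frog@s:eng
print(tag_token("have"))    # have@s:eng
print(tag_token("猫"))       # None
print(tag_token("."))       # None
```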

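First, I check whether the first word (`parts[0]`) matches the "CHI" pattern. If it does not, the line is written to the output file as-is. If it does, processing continues word by word.

For each possible word, check whether it matches the word pattern. If it does, use `re.sub` to replace the old word in the line with `word@s:eng`. Repeat this match-and-replace for each word, accumulating the replacements in `new_line`. Note that by using `matches.group(1)`, I am replacing the original `@s` in the line (as in "frog@s" becoming "frog@s:eng").

I used f-strings for `old` and `new`. If you are not on Python 3.6+, you can use regular string concatenation/formatting.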
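For interpreters older than Python 3.6, one substitution step might be written without f-strings as sketched below. The sample `line`, `word`, and `stem` values are illustrative only, and `re.escape` is an extra precaution that is not part of the answer's code: it keeps any regex metacharacters in the token from being interpreted as pattern syntax.

```python
import re

line = "*CHI:   have a frog@s .\n"
word = "frog@s"   # the token as it appears in the line
stem = "frog"     # what matches.group(1) would return for this token

# Plain concatenation and str.format instead of f-strings.
old = "\\b" + re.escape(word) + "\\b"
new = "{}@s:eng".format(stem)

new_line = re.sub(old, new, line, count=1)
print(new_line, end="")  # *CHI:   have a frog@s:eng .
```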
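Results:

I: *CHI:   have a frog@s .
O: *CHI:   have@s:eng a@s:eng frog@s:eng .

I: *CHI:   the cat is looking underneath the table@s .
O: *CHI:   the@s:eng@s:eng cat@s:eng is@s:eng looking@s:eng underneath@s:eng the table@s:eng .

(punctuation is ignored)
I: *CHI:   what@s ?
O: *CHI:   what@s:eng ?

(non-English words in the line are ignored)
I: *CHI:   猫   fall   down@s   .
O: *CHI:   猫   fall@s:eng   down@s:eng   .

(lines that do not start with CHI are unaffected)
I: *TEA:   ok this whole thing is together .
O: *TEA:   ok this whole thing is together .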
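One caveat is visible in the second result: when a word such as "the" occurs twice on a line, `count=1` rewrites the first occurrence both times, so the first "the" is tagged twice and the second is left untagged. A possible alternative, offered only as a sketch, is to let a single `re.sub` call tag every word in the body of the line. The names `WORD` and `tag_lines` are hypothetical, and the code assumes that indented lines continue the previous speaker's utterance, as they appear to in the sample.

```python
import re

# Assumed token shape: ASCII letters, optionally followed by an existing "@s".
WORD = re.compile(r"\b([A-Za-z]+)(?:@s\b)?")

def tag_lines(lines):
    """Yield lines, tagging every English word on *CHI: tiers with @s:eng."""
    in_chi = False
    for line in lines:
        if line[:1].isspace():
            # Indented (or blank) line: keep the speaker of the previous line.
            body_start = 0
        else:
            in_chi = line.startswith("*CHI:")
            body_start = len("*CHI:")
        if in_chi:
            # Tag every word after the speaker prefix in a single pass.
            line = line[:body_start] + WORD.sub(r"\1@s:eng", line[body_start:])
        yield line

sample = [
    "*CHI:   the cat is looking underneath the table@s .\n",
    "*CHI:   and   the   frog\n",
    "    say@s   再见   .\n",
    "*TEA:   ok this whole thing is together .\n",
]
for out_line in tag_lines(sample):
    print(out_line, end="")
```

Because the substitution runs once over the whole body, repeated words are each tagged exactly once. Running it on a file would only require passing the open input file to `tag_lines` and writing each yielded line to the output file.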