Python: using .replace effectively on text


python python-3.x replace


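I'm trying to capitalize every word in a section of text that appears only once. I have the part that finds which words appear only once, but when I replace the original word with its `.upper` version, some other words get capitalized as well. It's a small program, so here is the code: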

from collections import Counter
from string import punctuation

path = input("Path to file: ")
with open(path) as f:
    word_counts = Counter(word.strip(punctuation) for line in f for word in line.replace(")", " ").replace("(", " ")
                          .replace(":", " ").replace("", " ").split())

wordlist = open(path).read().replace("\n", " ").replace(")", " ").replace("(", " ").replace("", " ")

unique = [word for word, count in word_counts.items() if count == 1]

for word in unique:
    print(word)
    wordlist = wordlist.replace(word, str(word.upper()))

print(wordlist)

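The result should be

Genesis 37:1 Jacob lived in the land of his father's SOJOURNINGS, in the land of Canaan.

since "sojournings" is the first word that appears only once. Instead, the output also capitalizes letters inside other words, because some of the single-occurrence words also appear as substrings of other words.

Any ideas?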

I rewrote the code pretty significantly, since some of the chained `replace` calls might prove to be unreliable:

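import string

# The sentence.
sentence = "Genesis 37:1 Jacob lived in the land of his father's sojournings, in the land of Canaan."

# Remove punctuation (Python 3 form of str.translate).
rm_punc = sentence.translate(str.maketrans("", "", string.punctuation))
words = rm_punc.split(" ")  # split on spaces to get a list of words

# Find all the unique words.
single_occurrences = []
for word in words:
    # If the word occurs only once, append it to the list.
    if words.count(word) == 1:
        single_occurrences.append(word)

# For every unique word, find its index and capitalize the letter at that index
# in the initial string (the letter at that index is also the first letter of
# the word). Note that strings are immutable, so we are actually creating a new
# string on each iteration. Also, sometimes small words occur inside other
# words, e.g. 'an' inside 'land'. To make sure that our call to `index()`
# doesn't find these small words, we keep track of `start`, which makes sure
# we only ever search from the end of the previously found word.
start = 0
for word in single_occurrences:
    try:
        word_idx = start + sentence[start:].index(word)
    except ValueError:
        # Could not find the word in the sentence. Skip it.
        pass
    else:
        # Update the counter.
        start = word_idx + len(word)

        # Rebuild the sentence with the capitalized letter.
        first_letter = sentence[word_idx].upper()
        sentence = sentence[:word_idx] + first_letter + sentence[word_idx + 1:]

print(sentence)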

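Replacing text by pattern calls for regex.

Your text is a bit tricky, you have to:

- remove digits
- remove punctuation
- split into words
- care about capitalization: `'It's'` vs `'it's'`
- only replace full matches: when replacing mote, `'remote'` vs `'mote'`
- etc.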
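The two trickier points in the list (full matches and capitalization) can be illustrated with a small sketch. The sample string is made up for illustration and is not from the question's text:

```python
import re

# Hypothetical sample text, chosen only to show the two pitfalls.
text = "A remote mote. It's odd, but it's true."

# Plain str.replace also rewrites the "mote" inside "remote".
naive = text.replace("mote", "MOTE")

# \b anchors the pattern at word boundaries, so only the standalone word matches.
whole_word = re.sub(r"\bmote\b", "MOTE", text)

# Lower-casing before comparing treats "It's" and "it's" as the same word.
same_word = "It's".lower() == "it's".lower()

print(naive)       # A reMOTE MOTE. It's odd, but it's true.
print(whole_word)  # A remote MOTE. It's odd, but it's true.
print(same_word)   # True
```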

This should do it - see the inline comments for explanations:

bible.txt is taken from your link.

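Can you give the input and the desired output? There is an easier way to do this, but without the input it is hard to give code.

@Error - Syntactical Remorse There is an input; it's a specific part of the Bible, because I couldn't think of any better test material.

Please post a minimal example as formatted text in the question. Links (especially download links) are dangerous.

What is the purpose of `replace("", " ")` in your code?

@hwaring That one... has no purpose. Removed it.

You should look at `Counter` - your for loop to get the single occurrences is very inefficient (O(n**2)). Consider a text of `a a a a a a a a a a a a a a a a a a a a`: it splits the text into 20 `a` and counts `'a'` 20 times. `Counter` is much faster - it needs only one pass through the data.

If a unique word sits inside another word, as in `'seventeen'`: if `'en'` is a unique word, it will be replaced, giving `'sevENteEN'`.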
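Both points from the comments can be checked with a short sketch. The word list is a made-up example, not the question's input:

```python
from collections import Counter

# Hypothetical word list for illustration.
words = "a a a b a a c a".split()

# Quadratic: list.count rescans the whole list once per word.
slow_unique = [w for w in words if words.count(w) == 1]

# Linear: Counter tallies every word in a single pass, lookups are O(1).
counts = Counter(words)
fast_unique = [w for w in words if counts[w] == 1]

print(slow_unique == fast_unique)  # True
print(fast_unique)                 # ['b', 'c']

# The substring pitfall: str.replace does not know about word boundaries.
print("seventeen".replace("en", "EN"))  # sevENteEN
```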

import re
from collections import Counter, defaultdict
from string import punctuation, digits

with open(r"SO\AllThingsPython\P4\bible.txt") as f:
    s = f.read()

# get a set of unwanted characters and clean the text
ps = set(punctuation + digits)
s2 = ''.join(c for c in s if c not in ps)

# split into words
s3 = s2.split()

# collect every capitalization of each word that occurs in the text
repl = defaultdict(set)
for word in s3:
    repl[word.upper()].add(word)  # e.g. {..., 'IN': {'In', 'in'}, 'THE': {'The', 'the'}, ...}

# count all words in upper case and keep those that occur only once
single_occurrence_upper_words = [w for w, n in Counter(w.upper() for w in s3).most_common() if n == 1]
text = s

# escape the punctuation so characters such as ']' and '\' are safe inside [...]
delimiters = re.escape(punctuation) + " "

# now the replace part - for all upper single words
for upp in single_occurrence_upper_words:

    # for all capitalizations of that word occurring in the text
    for orig in repl[upp]:

        # use a regex to find the original word from our repl dict with
        # space/punctuation before and after it and replace it with the uppercase word
        text = re.sub(f"(?<=[{delimiters}])({re.escape(orig)})(?=[{delimiters}])", upp, text)

print(text)

Genesis 37:1 Jacob lived in the land of his father's SOJOURNINGS, in the land of Canaan.

2 These are the GENERATIONS of Jacob.

Joseph, being seventeen years old, was pasturing the flock with his brothers. He was a boy with the sons of Bilhah and Zilpah, his father's wives. And Joseph brought a BAD report of them to their father. 3 Now Israel loved Joseph more than any other of his sons, because he was the son of his old age. And he made him a robe of many colors. [a] 4 But when his brothers saw that their father loved him more than all his brothers, they hated him
and could not speak PEACEFULLY to him. 

<snipp>
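As a more compact variation on the same idea (not the code from either answer), the counting and the replacing can each be done in one pass by handing `re.sub` a function instead of a replacement string. This is only a sketch: the helper name `upper_unique_words` and the pattern `[A-Za-z']+` are assumptions made for illustration, and the pattern treats any run of ASCII letters and apostrophes as a word, so digits and other punctuation are never part of a match.

```python
import re
from collections import Counter

# Assumption: a "word" is a run of ASCII letters and apostrophes.
WORD = re.compile(r"[A-Za-z']+")


def upper_unique_words(text):
    """Upper-case every word that occurs exactly once, ignoring case."""
    # One pass to count the words case-insensitively.
    counts = Counter(m.group().lower() for m in WORD.finditer(text))

    def repl(match):
        word = match.group()
        # Only whole matched words are rewritten, never substrings.
        return word.upper() if counts[word.lower()] == 1 else word

    # One more pass to rebuild the text; everything between words is kept as-is.
    return WORD.sub(repl, text)


print(upper_unique_words("The cat saw the dog."))  # The CAT SAW the DOG.
```

On a single verse most words are unique, so nearly everything is upper-cased; the counts only become meaningful when the function is given the whole passage at once.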