Python: count specific punctuation marks in a given text without using regex or other modules

python string dictionary

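I have a text file containing a large amount of text written as paragraphs.
I need to count certain punctuation marks:

- without using any module, not even `regex`
- count `,` and `;`
- also count `'` and `-`, but only under certain circumstances. Specifically:
  - count `'` marks, but only when they appear as apostrophes surrounded by letters, i.e. indicating a contraction such as "shouldn't" or "won't". (The apostrophe is included as an indication of more informal writing, perhaps direct speech.)
  - count `-` signs, but only when they are surrounded by letters, indicating a compound word such as "self-esteem"

Any other punctuation or characters, e.g. digits, should be regarded as white space, so they serve to end words.
Note: some of the texts we will use include a double hyphen, i.e. `--`. This is to be regarded as a space character.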


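I started by creating a string that stored some punctuation marks, for example `punctuation_string = ";./'-"`, but it gave me the total; what I need is the count for each individual punctuation mark.

Because of this, I have to change the `certain_char` variable several times.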
with open("/Users/abhishekabhishek/downloads/l.txt") as f:
    text_lis = f.read().split()
punctuation_count = {}
certain_char = "/"  # has to be edited by hand for every mark
freq_count = 0
for word in text_lis:
    for char in word:
        if char in certain_char:
            freq_count += 1
punctuation_count[certain_char] = freq_count

I need the values displayed like this:
; 40
. 10
/ 5
' 16

and so on.
But what I get is the total (71).
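The following should work:
text = open("/Users/abhishekabhishek/downloads/l.txt").read()
text = text.replace("--", " ")  # a double hyphen counts as a space
for symbol in "-'":
    # drop hyphens/apostrophes that touch a space, i.e. are not inside a word
    text = text.replace(symbol + " ", " ")
    text = text.replace(" " + symbol, " ")
for symbol in ",;/'-":
    print(symbol, text.count(symbol))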
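As a hedged illustration, the same replace-and-count idea can be wrapped in a function that takes a string, which makes it easy to try on a short sample. The name `count_by_replace`, its `symbols` parameter, and the sample sentence are assumptions made for this sketch, not part of the answer. Note that it only discards apostrophes and hyphens that touch a space, so one sitting next to other punctuation (for example `-.`) would still be counted.

```python
def count_by_replace(text, symbols=",;'-"):
    """Count each symbol after discarding hyphens/apostrophes that touch a space.

    Hypothetical helper for illustration only.
    """
    text = text.replace("--", " ")  # a double hyphen is treated as a space
    for symbol in "-'":
        # A mark with a space on either side is not inside a word: drop it.
        text = text.replace(symbol + " ", " ")
        text = text.replace(" " + symbol, " ")
    return {symbol: text.count(symbol) for symbol in symbols}


sample = "I won't go -- it's a self-made plan; 'quoted' words, - done"
counts = count_by_replace(sample)
for symbol, count in counts.items():
    print(symbol, count)
```

On this sample the two contractions and the one compound word are counted, while the quotation marks around `quoted` and the free-standing hyphens are not.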
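Since you don't want to import anything, this will be slow and take some time, but it should work:
file = open("/Users/abhishekabhishek/downloads/l.txt")  # file path from the question
lines = file.readlines()
file.close()
search_values = {',': 0, ';': 0, "'": 0, '-': 0}  # dictionary holding the number of occurrences
whitespace = [' ', '\n', '-', '1', '2']  # ...and so on: add anything you need to this list
for line in lines:
    for ch_index in range(len(line)):
        char = line[ch_index]
        if char == ',' or char == ';':
            search_values[char] += 1
        elif char == "'" or char == '-':
            # count only when both neighbours exist and neither is "whitespace"
            if 0 < ch_index < len(line) - 1 \
                    and line[ch_index - 1] not in whitespace \
                    and line[ch_index + 1] not in whitespace:
                search_values[char] += 1
for key in search_values:
    print(str(key) + ': ' + str(search_values[key]))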

This is obviously not optimal, and using regex would be better here, but it should work.
If you have any questions, feel free to ask.
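A possible variant of this character-by-character scan, sketched under the assumption that "surrounded by letters" can be tested with the built-in `str.isalpha()` instead of a hand-maintained whitespace list. The function name `count_marks` and the sample sentence are made up for this illustration.

```python
def count_marks(text):
    """Count , and ; everywhere, and ' and - only between two letters.

    Hypothetical helper for illustration only.
    """
    counts = {",": 0, ";": 0, "'": 0, "-": 0}
    text = text.replace("--", " ")  # a double hyphen is treated as a space
    for i, char in enumerate(text):
        if char in ",;":
            counts[char] += 1
        elif char in "'-":
            # Both neighbours must exist and both must be letters.
            has_neighbours = 0 < i < len(text) - 1
            if has_neighbours and text[i - 1].isalpha() and text[i + 1].isalpha():
                counts[char] += 1
    return counts


sample = "Well-known fact: she won't, can't; 'no' -- 3-4 x-."
print(count_marks(sample))
```

Because digits and other punctuation are not letters, `3-4`, `x-.` and the quotation marks around `no` are all ignored without having to list every separator by hand.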
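You need to create a dictionary in which each entry stores the count of one of these punctuation marks.

For commas and semicolons, we can simply do a string search to count the number of occurrences in a word. But we need to handle `'` and `-` slightly differently.

This should take care of all the cases: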
with open("/Users/abhishekabhishek/downloads/l.txt") as f:
    text_words = f.read().split()
punctuation_count = {}
punctuation_count[','] = 0
punctuation_count[';'] = 0
punctuation_count["'"] = 0
punctuation_count['-'] = 0


def search_for_single_quotes(word):
    single_quote = "'"
    search_char_index = word.find(single_quote)
    search_char_count = word.count(single_quote)
    if search_char_index == -1 and search_char_count != 1:
        return
    index_before = search_char_index - 1
    index_after = search_char_index + 1
    # Check if the characters before and after the quote are alphabets,
    # and the alphabet after the quote is the last character of the word.
    # Will detect `won't`, `shouldn't`, but not `ab'cd`, `y'ess`
    if index_before >= 0 and word[index_before].isalpha() and \
            index_after == len(word) - 1 and word[index_after].isalpha():
        punctuation_count[single_quote] += 1


def search_for_hyphens(word):
    hyphen = "-"
    search_char_index = word.find(hyphen)
    if search_char_index == -1:
        return
    index_before = search_char_index - 1
    index_after = search_char_index + 1
    # Check if the character before and after hyphen is an alphabet.
    # You can also change it check for characters as well as numbers
    # depending on your use case.
    if index_before >= 0 and word[index_before].isalpha() and \
            index_after < len(word) and word[index_after].isalpha():
        punctuation_count[hyphen] += 1


for word in text_words:
    for search_char in [',', ';']:
        search_char_count = word.count(search_char)
        punctuation_count[search_char] += search_char_count
    search_for_single_quotes(word)
    search_for_hyphens(word)


print(punctuation_count)
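As written, the two helpers above inspect only the first occurrence found by `word.find(...)`, and the apostrophe check accepts exactly one letter after the apostrophe, so a word such as `mother-in-law` contributes one hyphen and `they're` is not counted. One way to relax this, sketched under the assumption that every occurrence with a letter on each side should count, is to loop with the `start` argument of `str.find`. The helper name `count_surrounded` is hypothetical.

```python
def count_surrounded(word, mark):
    """Count occurrences of mark in word that have a letter on both sides.

    Hypothetical helper for illustration only.
    """
    total = 0
    index = word.find(mark)
    while index != -1:
        # The mark must not be the first or last character of the word.
        if 0 < index < len(word) - 1 \
                and word[index - 1].isalpha() and word[index + 1].isalpha():
            total += 1
        index = word.find(mark, index + 1)  # continue after this occurrence
    return total


print(count_surrounded("mother-in-law", "-"))
print(count_surrounded("they're", "'"))
```

Inside the main loop, `punctuation_count["-"] += count_surrounded(word, "-")` (and likewise for `"'"`) could stand in for the two `search_for_...` calls.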

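But it says the count should only be updated when the hyphen and single quote are surrounded by letters as well. This doesn't handle the `-.` case.
Of course there are exceptions, and this isn't perfect; adding more code would make it more like "spaghetti", which is why we have regex. Regex is what I would recommend using in production anyway.
Yes, I know using regex is better, but I'm not allowed to use it.
If you need to search for anything else, you can easily extend the code with additional if statements and more entries in search_chars and search_values.
So, you can't use built-in terms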