JSON file: Separate word count for different objects using Python

python json text nlp

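For a current research project, I plan to count the unique words of different objects in a JSON file. Ideally, the output file should show separate word-count summaries (the number of occurrences of each unique word) for the texts in "Text Main", "Text Pro", and "Text Con". Is there a smart way to make this happen?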

At the moment, I receive the following error message:

File "index.py", line 10, in <module>
text = data["Text_Main"]
TypeError: list indices must be integers or slices, not str

The corresponding code is as follows:

# Import relevant libraries
import string
import json
import csv
import textblob

# Open JSON file and slice by object
file = open("Glassdoor_A.json", "r")
data = json.load(file)
text = data["Text_Main"]

# Create an empty dictionary
d = dict()

# Loop through each line of the file
for line in text:
    # Remove the leading spaces and newline character
    line = line.strip()

    # Convert the characters in line to
    # lowercase to avoid case mismatch
    line = line.lower()

    # Remove the punctuation marks from the line
    line = line.translate(line.maketrans("", "", string.punctuation))

    # Split the line into words
    words = line.split(" ")

    # Iterate over each word in line
    for word in words:
        # Check if the word is already in dictionary
        if word in d:
            # Increment count of word by 1
            d[word] = d[word] + 1
        else:
            # Add the word to dictionary with count 1
            d[word] = 1

# Print the contents of dictionary
for key in list(d.keys()):
    print(key, ":", d[key])

# Save results as CSV
with open('Glassdoor_A.csv', 'w', newline='') as file:
    writer = csv.writer(file)
    writer.writerow(["Word", "Occurrences", "Percentage"])
    # One row per word: the word and its count
    writer.writerows([key, d[key]] for key in d)

First, the key should be "Text Main"; second, you need to access the first dict in the list. So simply extract the text variable as follows:

text = data[0]["Text Main"]

This will fix the error message.
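To make the cause of the error concrete, here is a minimal sketch. It assumes the file holds a list containing one object, as the answer implies; the sample string and its field values are made up for illustration and are not taken from the actual Glassdoor_A.json.

```python
import json

# Hypothetical sample mirroring the assumed layout: a list with one object.
raw = '[{"Text Main": "Great place to work", "Text Pro": "Good pay", "Text Con": "Long hours"}]'
data = json.loads(raw)

# Indexing the list with a string reproduces the error from the question.
try:
    data["Text Main"]
except TypeError as exc:
    error_message = str(exc)

# Index the list first, then look up the key on the resulting dict.
text = data[0]["Text Main"]

print(error_message)
print(text)
```

Checking `type(data)` right after loading is a quick way to see whether the top level is a list or a dict before indexing into it.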
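Your JSON file contains a single object inside a list. To reach the content you want, first access the object via data[0]; you can then access the string field. I would change the code to: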
# Open JSON file and slice by object
file = open("Glassdoor_A.json", "r")
data = json.load(file)
json_obj = data[0]
text = json_obj["Text_Main"]

Alternatively, as quamrana stated, you can access the field in a single line with text = data[0]["Text_Main"].
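For the original goal of separate counts per field, one possible approach is to loop over every object in the list and keep one `collections.Counter` per field. This is only a sketch: the helper name `count_words`, the key names in `FIELDS`, and the sample records are assumptions for illustration, so adjust the keys to whatever the real file uses ("Text Main" or "Text_Main").

```python
import json
import string
from collections import Counter

# Assumed key names; change these to match the actual JSON file.
FIELDS = ["Text Main", "Text Pro", "Text Con"]

def count_words(records, fields=FIELDS):
    """Return one Counter of lowercase words per field, across all records."""
    strip_punct = str.maketrans("", "", string.punctuation)
    counts = {field: Counter() for field in fields}
    for record in records:
        for field in fields:
            text = record.get(field) or ""  # tolerate missing or null fields
            words = text.lower().translate(strip_punct).split()
            counts[field].update(words)
    return counts

# Hypothetical sample data standing in for json.load(open("Glassdoor_A.json")).
sample = json.loads('''[
  {"Text Main": "Great team, great pay.", "Text Pro": "Flexible hours", "Text Con": "Slow promotion"},
  {"Text Main": "Pay is fair", "Text Pro": "Nice team", "Text Con": "Long hours, slow tools"}
]''')

counts = count_words(sample)
for field in FIELDS:
    print(field, counts[field].most_common(3))
```

Using `split()` without an argument splits on any run of whitespace, which avoids the empty-string "words" that `split(" ")` produces for consecutive spaces.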
Thanks, this indeed fixed the error message. As output, I now get single letters in column A and numbers in column B. What needs to be changed so that words are counted rather than single letters? Also, all of the counted letters come from the first line of the file, while the remaining data is not included in the analysis/output. As mentioned, without the "Text_Main" specification the code works with whole words.

You need to ask a new question on Stack Overflow. This should be a pattern in software development: write a small piece of code, hit a problem, find a solution, and fix it. Then write the next small piece of code, and so on. Your question above combines too many moving parts, each of which may have a problem, and each problem in turn hides the next one.

That is understandable. Let me summarize the essential parts in a new question.

Thanks, this was helpful. However, the output I now receive consists of single letters. What might be the reason? Without the "Text_Main" object specification, the code counts complete words. Also, all of the counted letters come from the first line of the file, while the remaining data is not included in the analysis/output.
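A likely explanation for the single letters, offered as a sketch rather than a confirmed diagnosis: once `text` is a string instead of a file or list of lines, `for line in text` iterates over its characters, so each "line" is one letter. Likewise, `data[0]` selects only the first object, which would explain why the remaining records are ignored. The sample string below is made up for illustration.

```python
import string
from collections import Counter

text = "Great team, great pay."  # hypothetical value of one text field

# Iterating over a string yields single characters, so this counts letters.
letter_counts = Counter()
for line in text:
    letter_counts.update(line.strip().lower().split(" "))

# Cleaning the whole string and splitting it once yields words instead.
cleaned = text.lower().translate(str.maketrans("", "", string.punctuation))
word_counts = Counter(cleaned.split())

print(letter_counts.most_common(3))
print(word_counts.most_common(3))
```

To cover every record rather than just the first one, the same word-level counting can be applied inside a `for record in data:` loop.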