Python: remove the frequency of key phrases from the keyword they contain


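I have a list of keyword frequencies, shown below, where each frequency was computed by matching the keyword against survey responses. However, I want to remove the frequencies of "public health", "health issue", and "health condition" from the frequency of "health". In the same way, I want to remove the frequency of "public health officials" from "public health". How can I do this in Python?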

Keyword                    Frequency
health                     56
healthcare                 23
health condition           5
health issue               4
public health              7
public health officials    2

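For experimenting with the table, a minimal sketch of how it could be loaded into a DataFrame is shown here. The column names "keyword" and "frequency" are the ones used by the answer code further down; the English keyword strings are an assumption based on the question's wording.

```python
import pandas as pd

# Values copied from the question's table; column names follow the
# answer code ("keyword", "frequency").
df = pd.DataFrame({
    "keyword": [
        "health",
        "healthcare",
        "health condition",
        "health issue",
        "public health",
        "public health officials",
    ],
    "frequency": [56, 23, 5, 4, 7, 2],
})

print(df)
```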
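With this script, you use the variable ele to specify the keyword to subtract from.

The frequency of every element that starts with ele is then subtracted from it. For "public health", you simply mirror the same code with ele = "public health".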
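The script this answer refers to is not preserved on the page. The following is only a sketch of what a prefix-based approach might look like, not the answerer's actual code: the function name `subtract_prefixed` and the plain-dict representation are assumptions made for illustration.

```python
# Frequencies from the question's table (plain dict for brevity).
freq = {
    "health": 56,
    "healthcare": 23,
    "health condition": 5,
    "health issue": 4,
    "public health": 7,
    "public health officials": 2,
}


def subtract_prefixed(freq, ele):
    """Hypothetical helper: return a copy of freq in which every phrase
    that starts with `ele` as a whole word is subtracted from freq[ele]."""
    result = dict(freq)
    for key, count in freq.items():
        # Requiring a trailing space keeps "healthcare" from matching "health".
        if key != ele and key.startswith(ele + " "):
            result[ele] -= count
    return result


step1 = subtract_prefixed(freq, "health")          # 56 - 5 - 4 = 47
step2 = subtract_prefixed(step1, "public health")  # 7 - 2 = 5
print(step2)
```

Note that a pure prefix test does not catch "public health" under "health", because that phrase does not start with "health". Covering that case needs a containment test rather than a prefix test.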


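Based on the previous answer, I came up with the following:

Tokenize the keywords and find the maximum keyword length (in tokens)

Loop over the keyword column from the maximum length down to single words

Remove the duplicated keyword frequency whenever a shorter keyword appears inside a longer key phrase

Update the previous DataFrame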

import pandas as pd
import numpy as np
import spacy

import en_core_web_md
nlp = en_core_web_md.load()

df = .....
df_3 = df

# find the max length of the column value
max_length = df["keyword"].map(lambda x: len(nlp(x))).max()

length = max_length

# loop until the length ends with column value that only have one word
while length > 1 :

  element = []
  value = []
  element_2 = []
  value_2 = []
  element_3 = []
  value_3 = []

  # select column value by different length
  for index, row in df_3.iterrows():
      a = len(nlp(row["keyword"]))
      if a == length:
          element.append(row["keyword"])
          value.append(row["frequency"])
      if a < length:
          element_2.append(row["keyword"])
          value_2.append(row["frequency"])
      if a > length:
          element_3.append(row["keyword"])
          value_3.append(row["frequency"])

  d_1 = dict(zip(element, value))
  d_2 = dict(zip(element_2, value_2))
  d_3 = dict(zip(element_3, value_3))

  # remove the duplicated keyword frequency when a shorter keyword
  # appears inside a longer key phrase

  for key1, value1 in d_1.items():
      for key2, value2 in d_2.items():
          if key2 in key1:
              d_2[key2] = value2 - value1


  new_key = []
  new_value = []

  # update the original dataframe
  for key, value in d_2.items():
      new_key.append(key)
      new_value.append(value)
  for key, value in d_1.items():
      new_key.append(key)
      new_value.append(value)
  for key, value in d_3.items():
      new_key.append(key)
      new_value.append(value)

  df_3 = pd.DataFrame({"keyword":new_key, "frequency":new_value})

  length -= 1

This answer looks simple; a more elegant one is welcome~
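One possible more compact variant of the same longest-first idea is sketched here. It is not the author's code: it swaps spaCy tokenization for `str.split()` and replaces the substring test with a whole-word match, and the names `contains_words` and `exclusive_counts` are made up for illustration.

```python
freq = {
    "health": 56,
    "healthcare": 23,
    "health condition": 5,
    "health issue": 4,
    "public health": 7,
    "public health officials": 2,
}


def contains_words(phrase, keyword):
    """True if the words of keyword occur as a contiguous run in phrase."""
    p, k = phrase.split(), keyword.split()
    return any(p[i:i + len(k)] == k for i in range(len(p) - len(k) + 1))


def exclusive_counts(freq):
    """Subtract each phrase's count from every shorter keyword it contains,
    processing the longest phrases first."""
    result = dict(freq)
    # Longest first: a phrase has already been reduced by the longer
    # phrases containing it before it is subtracted from shorter ones.
    by_length = sorted(result, key=lambda s: len(s.split()), reverse=True)
    for long_key in by_length:
        n = len(long_key.split())
        for short_key in result:
            if len(short_key.split()) < n and contains_words(long_key, short_key):
                result[short_key] -= result[long_key]
    return result


result = exclusive_counts(freq)
print(result["health"], result["public health"])  # 40 5
```

Because "public health" is reduced to 5 before it is subtracted from "health", the 2 occurrences of "public health officials" are removed from "health" only once, giving 56 - 2 - 5 - 4 - 5 = 40. If a DataFrame is needed, the dict can be converted back with `pd.DataFrame(list(result.items()), columns=["keyword", "frequency"])`.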

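Can you show us the minimal code you have tried, and where you got stuck?

Yes, there is no need to consolidate the information; just remove from a keyword's frequency the frequency of the key phrases that already contain that keyword.

Please review the on-topic and how-to-ask guidance. "Show me how to solve this coding problem?" is off-topic for Stack Overflow. You have to make an honest attempt at the solution, and then ask a specific question about your implementation. Stack Overflow is not intended to replace existing tutorials and documentation. Simply dumping your homework here is not acceptable.

Please don't answer questions that shouldn't have been asked. We don't want Stack Overflow to become a homework service.

Thank you, Lucas. Based on your answer, I found a systematic way.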