Python BERT: modifying the run_squad.py prediction file


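I am new to BERT, and I am trying to edit the output of run_squad.py in order to build a question answering system and obtain an output file with the following structure: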

{
    "data": [
        {
            "id": "ID1",
            "title": "Alan_Turing",
            "question": "When Alan Turing was born?",
            "context": "Alan Mathison Turing (23 June 1912 – 7 June 1954) was an English mathematician, computer scientist, logician, cryptanalyst, philosopher and theoretical biologist. [...] . However, both Julius and Ethel wanted their children to be brought up in Britain, so they moved to Maida Vale, London, where Alan Turing was born on 23 June 1912, as recorded by a blue plaque on the outside of the house of his birth, later the Colonnade Hotel. Turing had an elder brother, John (the father of Sir John Dermot Turing, 12th Baronet of the Turing baronets).",
            "answers": [
              {"text": "on 23 June 1912", "probability": 0.891726, "start_logit": 4.075, "end_logit": 4.15},
              {"text": "on 23 June", "probability": 0.091726, "start_logit": 2.075, "end_logit": 1.15},
              {"text": "June 1912", "probability": 0.051726, "start_logit": 1.075, "end_logit": 0.854}
            ]
        },
        {
            "id": "ID2",
            "title": "Title2",
            "question": "Question2",
            "context": "Context 2 ...",
            "answers": [
              {"text": "text1", "probability": 0.891726, "start_logit": 4.075, "end_logit": 4.15},
              {"text": "text2", "probability": 0.091726, "start_logit": 2.075, "end_logit": 1.15},
              {"text": "text3", "probability": 0.051726, "start_logit": 1.075, "end_logit": 0.854}
            ]
        }
    ]
}
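First, in BERT's read_squad_examples function (line 227), a SQuAD JSON file (the input file) is read into a list of SquadExample objects. This file contains the first four fields I need: id, title, question, and context.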

After that, the SquadExample objects are converted to features, and then the write_predictions phase (line 741) can begin.

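In BERT, write_predictions writes an output file named nbest_predictions.json, which contains all the candidate answers for a given context together with their associated probabilities.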

At lines 891-898, I think the last four fields I need (text, probability, start_logit, end_logit) are appended:

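    # Excerpt from write_predictions; nbest and probs are built earlier in
    # that function, and collections is imported at the top of run_squad.py.
    nbest_json = []
    for (i, entry) in enumerate(nbest):
      output = collections.OrderedDict()
      output["text"] = entry.text
      output["probability"] = probs[i]
      output["start_logit"] = entry.start_logit
      output["end_logit"] = entry.end_logit
      nbest_json.append(output)
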
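As far as I can tell, the probability field is a softmax over the sum of each candidate's start and end logits. Below is a minimal sketch of that calculation, assuming this reading of run_squad.py is correct. The helper compute_softmax and the sample logits are illustrative stand-ins, not code copied from BERT:

```python
import math


def compute_softmax(scores):
    """Turn a list of raw scores into probabilities that sum to 1."""
    if not scores:
        return []
    max_score = max(scores)  # subtract the max for numerical stability
    exp_scores = [math.exp(s - max_score) for s in scores]
    total = sum(exp_scores)
    return [e / total for e in exp_scores]


# Hypothetical (start_logit, end_logit) pairs for three candidate answers.
candidates = [(4.075, 4.15), (2.075, 1.15), (1.075, 0.854)]

# Each candidate is scored by the sum of its two logits.
total_scores = [start + end for start, end in candidates]
probs = compute_softmax(total_scores)

for (start, end), prob in zip(candidates, probs):
    print(start, end, round(prob, 6))
```

If this is right, the probabilities of one question's candidates always sum to 1, so they are only comparable within a single question, not across questions.
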
The output file nbest_predictions.json has the following structure:

{
    "ID-1": [
        {
            "text": "text1", 
            "probability": 0.3617, 
            "start_logit": 4.0757, 
            "end_logit": 4.1554
        }, {
            "text": "text2", 
            "probability": 0.0036, 
            "start_logit": -0.5180, 
            "end_logit": 4.1554
        }
    ], 
    "ID-2": [
        {
            "text": "text1", 
            "probability": 0.2487, 
            "start_logit": -1.6009, 
            "end_logit": -0.2818
        }, {
            "text": "text2", 
            "probability": 0.0070, 
            "start_logit": -0.9566, 
            "end_logit": -1.5770
        }
    ]
}
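
To make the layout concrete: the file is a single JSON object that maps each question id to a list of candidate answers. A small sketch of reading it and picking the most probable answer per id follows. The inline sample stands in for the real file, and best_answer is a hypothetical helper name:

```python
import json

# Inline sample that mirrors the nbest_predictions.json layout shown above.
nbest_text = """
{
    "ID-1": [
        {"text": "text1", "probability": 0.3617, "start_logit": 4.0757, "end_logit": 4.1554},
        {"text": "text2", "probability": 0.0036, "start_logit": -0.5180, "end_logit": 4.1554}
    ],
    "ID-2": [
        {"text": "text1", "probability": 0.2487, "start_logit": -1.6009, "end_logit": -0.2818},
        {"text": "text2", "probability": 0.0070, "start_logit": -0.9566, "end_logit": -1.5770}
    ]
}
"""

nbest = json.loads(nbest_text)


def best_answer(candidates):
    """Return the candidate dict with the highest probability."""
    return max(candidates, key=lambda c: c["probability"])


# Map each question id to the text of its most probable answer.
best_by_id = {qid: best_answer(cands)["text"] for qid, cands in nbest.items()}
print(best_by_id)
```

With a real run, json.load on the opened nbest_predictions.json file would replace the json.loads call on the inline string.
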
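Now... I don't quite understand how the nbest_predictions file is generated. How can I edit this function to get a JSON file with the structure I described at the beginning of this post?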

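With this in mind, I think I have two possibilities:

  • Create a new data structure and append the fields I need
  • Edit the write_predictions function so that nbest_predictions.json is structured the way I want

What is the best solution?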

For now, I have written a new function that reads the input file and appends my id, title, question, and context to a data structure:

    import json
    import tensorflow as tf


    def read_squad_examples2(input_file, is_training):
      """Read a SQuAD json file and return a JSON string of question records."""
      with tf.gfile.Open(input_file, "r") as reader:
        input_data = json.load(reader)["data"]

      sup_data = []

      for entry in input_data:
        entry_title = entry["title"]
        for paragraph in entry["paragraphs"]:
          paragraph_text = paragraph["context"]
          for qa in paragraph["qas"]:
            # Build a fresh dict for every question. Reusing one shared dict
            # would make every list element point to the same (last) record.
            data = {
                "id": qa["id"],
                "title": entry_title,
                "question": qa["question"],
                "context": paragraph_text,
            }
            sup_data.append(data)

      # is_training is unused here; it is kept to mirror read_squad_examples.
      return json.dumps(sup_data)
    
What I get is:

    [{
        "question": "Question 1?",
        "id": "ID 1 ",
        "context": "The context 1",
        "title": "Title 1"
    }, {
        "question": "Question 2?",
        "id": "ID 2 ",
        "context": "The context 2",
        "title": "Title 2"
    }]
    
At this point, how can I append the answers field, containing "text", "probability", "start_logit", and "end_logit", to this data structure?

Thanks in advance.
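
For reference, the merge being asked about could be sketched as a join on the question id. This is only a sketch under assumptions: records is the list of dicts built from the input file (before any json.dumps call), nbest is the parsed content of nbest_predictions.json, and merge_predictions is a hypothetical helper, not part of run_squad.py:

```python
import json


def merge_predictions(records, nbest):
    """Attach each question's n-best answers to a copy of its record."""
    merged = []
    for record in records:
        item = dict(record)  # copy so the input list is left untouched
        # Questions without predictions get an empty answers list.
        item["answers"] = nbest.get(record["id"], [])
        merged.append(item)
    return {"data": merged}


# Hypothetical inputs shaped like the structures shown in the post.
records = [
    {"id": "ID1", "title": "Title 1", "question": "Question 1?", "context": "The context 1"},
    {"id": "ID2", "title": "Title 2", "question": "Question 2?", "context": "The context 2"},
]
nbest = {
    "ID1": [
        {"text": "text1", "probability": 0.3617, "start_logit": 4.0757, "end_logit": 4.1554},
        {"text": "text2", "probability": 0.0036, "start_logit": -0.518, "end_logit": 4.1554},
    ],
}

result = merge_predictions(records, nbest)
print(json.dumps(result, indent=4))
```

Doing the join after prediction leaves write_predictions untouched, at the cost of reading the input file a second time.
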