Google Cloud Platform: setting up BigQuery datasets with Terraform


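I am new to GCP and Terraform. I am developing Terraform scripts to provision about 50 BQ datasets, each with at least 10 tables. Not all tables have the same schema.

I have written the scripts that create the datasets and tables, but I am struggling to add schemas to the tables and need help. I am using Terraform variables to build the scripts.

Here is my code. I need to integrate logic that creates the schemas for the tables.

variables.tf terraform.tfvars main.tf
I have tried every way I could think of to create schemas for the tables in a dataset, but none of them worked.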

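Presumably all of the tables are meant to have the same schema.

I would try it like this:

In the resource "google_bigquery_table" "table" block,

for example, after labels, you can add:

schema = file("${path.root}/subdirectories-path/table_schema.json")

where

  • ${path.root}
    - is the directory of the root module, where your Terraform files are
  • subdirectories-path
    - zero or more subdirectories
  • table_schema.json
    - a JSON file containing the schema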
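
To show where that line sits, here is a minimal sketch of a complete resource block. The dataset ID, table ID, and the "schemas" subdirectory are hypothetical placeholders, and the dataset is assumed to exist already.

```hcl
# Sketch only: "example_dataset", "example_table" and the "schemas"
# subdirectory are placeholders, not values from the question.
resource "google_bigquery_table" "table" {
  dataset_id = "example_dataset" # assumed to exist already
  table_id   = "example_table"

  labels = {
    environment = "development"
  }

  # file() reads the JSON schema from disk and returns it as a string,
  # which is the form the schema argument expects.
  schema = file("${path.root}/schemas/table_schema.json")
}
```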
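==> Update: 14 February 2021

Following the request for an example in which the table schemas differ, here is one with minimal modifications to the original question.

variables.tf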

variable "project_id" {
  description = "The target project"
  type        = string
  default     = "ishim-sample"
}

variable "region" {
  description = "The region where resources are created => europe-west2"
  type        = string
  default     = "europe-west2"
}

variable "zone" {
  description = "The zone in the europe-west region for resources"
  type        = string
  default     = "europe-west2-b"
}

# ===========================
variable "test_bq_dataset" {
  type = list(object({
    id       = string
    location = string
  }))
}

variable "test_bq_table" {
  type = list(object({
    dataset_id = string
    table_id   = string
    schema_id  = string
  }))
}
terraform.tfvars

test_bq_dataset = [
  {
    id       = "ds1"
    location = "EU"
  },
  {
    id       = "ds2"
    location = "EU"
  }
]

test_bq_table = [
  {
    dataset_id = "ds1"
    table_id   = "table1"
    schema_id  = "table-schema-01.json"
  },
  {
    dataset_id = "ds2"
    table_id   = "table2"
    schema_id  = "table-schema-02.json"
  },
  {
    dataset_id = "ds1"
    table_id   = "table3"
    schema_id  = "table-schema-03.json"
  },
  {
    dataset_id = "ds2"
    table_id   = "table4"
    schema_id  = "table-schema-04.json"
  }
]
Example JSON schema file: table-schema-01.json

[
  {
    "name": "table_column_01",
    "mode": "REQUIRED",
    "type": "STRING",
    "description": ""
  },
  {
    "name": "_gcs_file_path",
    "mode": "REQUIRED",
    "type": "STRING",
    "description": "The GCS path to the file for loading."
  },
  {
    "name": "_src_file_ts",
    "mode": "REQUIRED",
    "type": "TIMESTAMP",
    "description": "The source file modification timestamp."
  },
  {
    "name": "_src_file_name",
    "mode": "REQUIRED",
    "type": "STRING",
    "description": "The file name of the source file."
  },
  {
    "name": "_firestore_doc_id",
    "mode": "REQUIRED",
    "type": "STRING",
    "description": "The hash code (based on the file name and its content, so each file has a unique hash) used as a Firestore document id."
  },
  {
    "name": "_ingested_ts",
    "mode": "REQUIRED",
    "type": "TIMESTAMP",
    "description": "The timestamp when this record was processed during ingestion into the BigQuery table."
  }
]
main.tf

provider "google" {
  project = var.project_id
  region  = var.region
  zone    = var.zone
}

resource "google_bigquery_dataset" "test_dataset_set" {
  project    = var.project_id
  count      = length(var.test_bq_dataset)
  dataset_id = var.test_bq_dataset[count.index]["id"]
  location   = var.test_bq_dataset[count.index]["location"]

  labels = {
    "environment" = "development"
  }
}

resource "google_bigquery_table" "test_table_set" {
  project    = var.project_id
  count      = length(var.test_bq_table)
  dataset_id = var.test_bq_table[count.index]["dataset_id"]
  table_id   = var.test_bq_table[count.index]["table_id"]
  schema     = file("${path.root}/bq-schema/${var.test_bq_table[count.index]["schema_id"]}")

  labels = {
    "environment" = "development"
  }
  depends_on = [
    google_bigquery_dataset.test_dataset_set,
  ]
}
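Project directory structure (screenshot)

Bear in mind the subdirectory name "bq-schema", because it is used in the "schema" attribute of the "google_bigquery_table" resource in the "main.tf" file.

BigQuery console (screenshot)

Result of the "terraform apply" command.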

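The Terraform google_bigquery_table resource has an optional schema argument that expects a JSON string.

The documentation at the link above includes an example: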

resource "google_bigquery_table" "default" {
  dataset_id = google_bigquery_dataset.default.dataset_id
  table_id   = "bar"

  time_partitioning {
    type = "DAY"
  }

  labels = {
    env = "default"
  }

  schema = <<EOF
[
  {
    "name": "permalink",
    "type": "STRING",
    "mode": "NULLABLE",
    "description": "The Permalink"
  },
  {
    "name": "state",
    "type": "STRING",
    "mode": "NULLABLE",
    "description": "State where the head office is located"
  }
]
EOF

}
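
As a possible alternative to the heredoc, the same schema could be written as an HCL list and converted with Terraform's built-in jsonencode() function. This is only a sketch: like the documentation example, it assumes a google_bigquery_dataset.default resource is defined elsewhere in the configuration.

```hcl
resource "google_bigquery_table" "default" {
  # Assumes google_bigquery_dataset.default is declared elsewhere.
  dataset_id = google_bigquery_dataset.default.dataset_id
  table_id   = "bar"

  # jsonencode() turns the HCL list of objects into the JSON string
  # that the schema argument expects.
  schema = jsonencode([
    {
      name        = "permalink"
      type        = "STRING"
      mode        = "NULLABLE"
      description = "The Permalink"
    },
    {
      name        = "state"
      type        = "STRING"
      mode        = "NULLABLE"
      description = "State where the head office is located"
    }
  ])
}
```

Writing the schema in HCL lets terraform validate catch syntax errors in it, whereas a heredoc or external file is only checked as JSON when the provider parses it.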
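I should have added this to the question: the tables do not all have the same schema.

You could probably define a Terraform variable, either a map of "table name => schema file name" or a list of schema file names, so that the same count loop picks the right file instead of the constant "table_schema.json".

Could you please share an example using a map? I have 50 BQ datasets with 10 tables each, and I would rather not hard-code values. I am looking for a way to create the schemas from variables (as I did for the tables and datasets).

I see! I did not know that the schemas differ. That definitely calls for another approach. I believe @al-dann's answer offers a better one.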
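
The map-based variant suggested in the comments could be sketched roughly as follows. The variable name bq_tables, its object shape, and the key format are assumptions made for illustration; the "bq-schema" subdirectory and the google_bigquery_dataset.test_dataset_set resource are taken from the other answer and are assumed to be present in the same configuration.

```hcl
# Hypothetical variable: each key is a unique label for one table
# (for example "ds1/table1"), so tables with the same name in different
# datasets do not collide.
variable "bq_tables" {
  type = map(object({
    dataset_id  = string
    table_id    = string
    schema_file = string
  }))
}

resource "google_bigquery_table" "tables" {
  # for_each creates one table per map entry, addressed by its key.
  for_each = var.bq_tables

  dataset_id = each.value.dataset_id
  table_id   = each.value.table_id

  # Each entry names its own schema file inside the bq-schema directory.
  schema = file("${path.root}/bq-schema/${each.value.schema_file}")

  # Assumes the datasets are created by this resource from the other answer.
  depends_on = [
    google_bigquery_dataset.test_dataset_set,
  ]
}

# Example terraform.tfvars entry (illustrative values):
#
# bq_tables = {
#   "ds1/table1" = {
#     dataset_id  = "ds1"
#     table_id    = "table1"
#     schema_file = "table-schema-01.json"
#   }
# }
```

One reason to prefer for_each over count here is that resources are tracked by map key rather than list position, so removing one entry does not shift the addresses of the remaining tables.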