Google Cloud Platform: how do I run Cloud DLP (Data Loss Prevention) on all BigQuery tables in my project?
According to the documentation, when creating an inspect job you need to specify a table reference:
{
  "inspectJob":{
    "storageConfig":{
      "bigQueryOptions":{
        "tableReference":{
          "projectId":"bigquery-public-data",
          "datasetId":"usa_names",
          "tableId":"usa_1910_current"
        },
        "rowsLimit":"1000",
        "sampleMethod":"RANDOM_START",
        "identifyingFields":[
          {
            "name":"name"
          }
        ]
      }
    },
    "inspectConfig":{
      "infoTypes":[
        {
          "name":"FIRST_NAME"
        }
      ],
      "includeQuote":true
    },
    "actions":[
      {
        "saveFindings":{
          "outputConfig":{
            "table":{
              "projectId":"[PROJECT-ID]",
              "datasetId":"testingdlp",
              "tableId":"bqsample3"
            },
            "outputSchema":"BASIC_COLUMNS"
          }
        }
      }
    ]
  }
}
This means I would have to create one inspect job per table. I want to find sensitive data across all of my BigQuery resources; how can I do that?

To run DLP across all your BigQuery resources, you have two options:
- Programmatically list the BigQuery tables, then trigger one inspect job per table
Pros: cheaper; from 1 GB up to 50 TB, $1.00 per GB
Cons: it is a batch operation, so it does not run in real time
Python example illustrating the idea:
from google.cloud import bigquery

client = bigquery.Client()
datasets = list(client.list_datasets(project=project_id))
for dataset in datasets:
    tables = client.list_tables(dataset)
    for table in tables:
        # Create an inspect job for table.table_id
        pass
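The "create an inspect job" step in the loop above can be filled in by building the request body for each table and submitting it with the DLP client's create_dlp_job. A minimal sketch, assuming the google-cloud-dlp Python client (which takes snake_case dict requests); the findings dataset and table names below are placeholders:

```python
def build_inspect_job(project_id, dataset_id, table_id):
    """Build the inspect-job body for one BigQuery table,
    mirroring the JSON request shown earlier."""
    return {
        "storage_config": {
            "big_query_options": {
                "table_reference": {
                    "project_id": project_id,
                    "dataset_id": dataset_id,
                    "table_id": table_id,
                },
                "rows_limit": 1000,
                "sample_method": "RANDOM_START",
            }
        },
        "inspect_config": {
            "info_types": [{"name": "FIRST_NAME"}],
            "include_quote": True,
        },
        "actions": [
            {
                "save_findings": {
                    "output_config": {
                        # Placeholder findings table; use your own.
                        "table": {
                            "project_id": project_id,
                            "dataset_id": "dlp_findings",
                            "table_id": "findings",
                        },
                        "output_schema": "BASIC_COLUMNS",
                    }
                }
            }
        ],
    }


job = build_inspect_job("bigquery-public-data", "usa_names", "usa_1910_current")
# Inside the table loop you would then submit it, e.g.:
# from google.cloud import dlp_v2
# dlp = dlp_v2.DlpServiceClient()
# dlp.create_dlp_job(request={"parent": f"projects/{project_id}",
#                             "inspect_job": job})
```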
- Programmatically list the BigQuery tables, query each table and call the DLP streaming API
Pros: it is a real-time operation
Cons: more expensive; above 1 GB, $3.00 per GB
Java example illustrating the idea:
String url = String.format(
    "jdbc:bigquery://https://www.googleapis.com/bigquery/v2:443;OAuthType=3;ProjectId=%s;",
    projectId);
DataSource ds = new com.simba.googlebigquery.jdbc42.DataSource();
ds.setURL(url);
Connection conn = ds.getConnection();
DatabaseMetaData databaseMetadata = conn.getMetaData();
ResultSet tablesResultSet = databaseMetadata.getTables(
    conn.getCatalog(), null, "%", new String[]{"TABLE"});
while (tablesResultSet.next()) {
    // Query your table data and call the DLP streaming API
}
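The per-row call to the DLP streaming API in the loop above means sending each row's data to the content.inspect endpoint. A minimal sketch of building that per-row request, kept in Python for consistency with the earlier example; the project name and row value are placeholders, and the actual inspect_content call is shown commented:

```python
def build_inspect_content_request(project_id, row_text):
    """Build a DLP content.inspect request for one row's data."""
    return {
        "parent": f"projects/{project_id}",
        "inspect_config": {
            "info_types": [{"name": "FIRST_NAME"}],
            "include_quote": True,
        },
        "item": {"value": row_text},
    }


# Called once per row fetched from BigQuery in the loop above:
request = build_inspect_content_request("my-project", "Jane, Smith, 1990")
# from google.cloud import dlp_v2
# dlp = dlp_v2.DlpServiceClient()
# response = dlp.inspect_content(request=request)
# for finding in response.result.findings:
#     ...  # handle each finding
```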
The billing figures above are current at the time of writing; for the latest numbers, check the DLP pricing.