Azure Data Factory: execution order of multiple activities in a pipeline

I have two blob files to copy to Azure SQL tables. My pipeline consists of two activities:
{
    "name": "NutrientDataBlobToAzureSqlPipeline",
    "properties": {
        "description": "Copy nutrient data from Azure BLOB to Azure SQL",
        "activities": [
            {
                "type": "Copy",
                "typeProperties": {
                    "source": {
                        "type": "BlobSource"
                    },
                    "sink": {
                        "type": "SqlSink",
                        "writeBatchSize": 10000,
                        "writeBatchTimeout": "60.00:00:00"
                    }
                },
                "inputs": [
                    {
                        "name": "FoodGroupDescriptionsAzureBlob"
                    }
                ],
                "outputs": [
                    {
                        "name": "FoodGroupDescriptionsSQLAzure"
                    }
                ],
                "policy": {
                    "timeout": "01:00:00",
                    "concurrency": 1,
                    "executionPriorityOrder": "NewestFirst"
                },
                "scheduler": {
                    "frequency": "Minute",
                    "interval": 15
                },
                "name": "FoodGroupDescriptions",
                "description": "#1 Bulk Import FoodGroupDescriptions"
            },
            {
                "type": "Copy",
                "typeProperties": {
                    "source": {
                        "type": "BlobSource"
                    },
                    "sink": {
                        "type": "SqlSink",
                        "writeBatchSize": 10000,
                        "writeBatchTimeout": "60.00:00:00"
                    }
                },
                "inputs": [
                    {
                        "name": "FoodDescriptionsAzureBlob"
                    }
                ],
                "outputs": [
                    {
                        "name": "FoodDescriptionsSQLAzure"
                    }
                ],
                "policy": {
                    "timeout": "01:00:00",
                    "concurrency": 1,
                    "executionPriorityOrder": "NewestFirst"
                },
                "scheduler": {
                    "frequency": "Minute",
                    "interval": 15
                },
                "name": "FoodDescriptions",
                "description": "#2 Bulk Import FoodDescriptions"
            }
        ],
        "start": "2015-07-14T00:00:00Z",
        "end": "2015-07-14T00:00:00Z",
        "isPaused": false,
        "hubName": "gymappdatafactory_hub",
        "pipelineMode": "Scheduled"
    }
}
As far as I understand, the second activity only starts once the first activity has completed. How do I execute this pipeline, rather than going to the dataset slices and running them manually? Also, how can pipelineMode be set to run only once, instead of Scheduled?

For the activities to run synchronously (in order), the output of the first activity needs to be an input of the second:
{
    "name": "NutrientDataBlobToAzureSqlPipeline",
    "properties": {
        "description": "Copy nutrient data from Azure BLOB to Azure SQL",
        "activities": [
            {
                "type": "Copy",
                "typeProperties": {
                    "source": {
                        "type": "BlobSource"
                    },
                    "sink": {
                        "type": "SqlSink",
                        "writeBatchSize": 10000,
                        "writeBatchTimeout": "60.00:00:00"
                    }
                },
                "inputs": [
                    {
                        "name": "FoodGroupDescriptionsAzureBlob"
                    }
                ],
                "outputs": [
                    {
                        "name": "FoodGroupDescriptionsSQLAzureFirst"
                    }
                ],
                "policy": {
                    "timeout": "01:00:00",
                    "concurrency": 1,
                    "executionPriorityOrder": "NewestFirst"
                },
                "scheduler": {
                    "frequency": "Minute",
                    "interval": 15
                },
                "name": "FoodGroupDescriptions",
                "description": "#1 Bulk Import FoodGroupDescriptions"
            },
            {
                "type": "Copy",
                "typeProperties": {
                    "source": {
                        "type": "BlobSource"
                    },
                    "sink": {
                        "type": "SqlSink",
                        "writeBatchSize": 10000,
                        "writeBatchTimeout": "60.00:00:00"
                    }
                },
                "inputs": [
                    {
                        "name": "FoodGroupDescriptionsSQLAzureFirst"
                    },
                    {
                        "name": "FoodDescriptionsAzureBlob"
                    }
                ],
                "outputs": [
                    {
                        "name": "FoodDescriptionsSQLAzureSecond"
                    }
                ],
                "policy": {
                    "timeout": "01:00:00",
                    "concurrency": 1,
                    "executionPriorityOrder": "NewestFirst"
                },
                "scheduler": {
                    "frequency": "Minute",
                    "interval": 15
                },
                "name": "FoodDescriptions",
                "description": "#2 Bulk Import FoodDescriptions"
            }
        ],
        "start": "2015-07-14T00:00:00Z",
        "end": "2015-07-14T00:00:00Z",
        "isPaused": false,
        "hubName": "gymappdatafactory_hub",
        "pipelineMode": "Scheduled"
    }
}
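One pitfall when chaining this way: each dataset in the inputs array must be its own JSON object. A single object cannot carry two "name" keys; most parsers silently keep only the last one, which would drop the dependency on the first activity's output. A quick check with Python's json module, using the dataset names from this pipeline:

```python
import json

# Two "name" keys in ONE object: the parser keeps only the last value,
# so the dependency on the first activity's output silently disappears.
merged = json.loads(
    '{"name": "FoodGroupDescriptionsSQLAzureFirst",'
    ' "name": "FoodDescriptionsAzureBlob"}'
)
print(merged)  # {'name': 'FoodDescriptionsAzureBlob'}

# Each input as its OWN object keeps both datasets in the inputs array.
inputs = json.loads(
    '[{"name": "FoodGroupDescriptionsSQLAzureFirst"},'
    ' {"name": "FoodDescriptionsAzureBlob"}]'
)
print([d["name"] for d in inputs])
# ['FoodGroupDescriptionsSQLAzureFirst', 'FoodDescriptionsAzureBlob']
```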
Notice that the output of the first activity, "FoodGroupDescriptionsSQLAzureFirst", becomes an input of the second activity.

If I understand correctly, you want to execute both activities without manually executing the dataset slices. You can do exactly that by defining the input datasets as external. For example:
{
    "name": "FoodGroupDescriptionsAzureBlob",
    "properties": {
        "type": "AzureBlob",
        "linkedServiceName": "AzureBlobStore",
        "typeProperties": {
            "folderPath": "mycontainer/folder",
            "format": {
                "type": "TextFormat",
                "rowDelimiter": "\n",
                "columnDelimiter": "|"
            }
        },
        "external": true,
        "availability": {
            "frequency": "Day",
            "interval": 1
        }
    }
}
Note that the external property is set to true. This automatically moves the dataset slices into the Ready state, with no manual runs needed.
Unfortunately, there is no way to mark a pipeline as run-once. Once the pipeline has run, you can set the isPaused property to true to prevent further executions.
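A sketch of the property to flip, shown in the context of the pipeline JSON from the question (only isPaused changes; everything else stays as deployed):

```json
{
    "name": "NutrientDataBlobToAzureSqlPipeline",
    "properties": {
        "isPaused": true
    }
}
```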
Note: the external property can only be set to true on input datasets.

All activities whose input datasets are marked as external will execute in parallel.
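The execution-order rules above can be sketched as a toy dependency model (this is an illustration of the scheduling idea, not the actual Data Factory engine): an activity may start once all of its input datasets are ready, external datasets are ready immediately, and an activity's outputs become ready when it finishes.

```python
# Toy model of dependency-driven activity scheduling (illustrative only):
# activities map name -> (input datasets, output datasets).
def run_order(activities, external):
    ready = set(external)          # external datasets are ready immediately
    remaining = dict(activities)
    waves = []                     # activities in the same wave run in parallel
    while remaining:
        wave = [a for a, (ins, _) in remaining.items()
                if all(i in ready for i in ins)]
        if not wave:
            raise RuntimeError("unsatisfiable dependencies")
        for a in wave:
            ready.update(remaining.pop(a)[1])  # outputs become ready
        waves.append(sorted(wave))
    return waves

# Original pipeline: both activities read only external blobs -> one wave.
parallel = run_order(
    {"FoodGroupDescriptions": (["FoodGroupDescriptionsAzureBlob"],
                               ["FoodGroupDescriptionsSQLAzureFirst"]),
     "FoodDescriptions": (["FoodDescriptionsAzureBlob"],
                          ["FoodDescriptionsSQLAzureSecond"])},
    external={"FoodGroupDescriptionsAzureBlob", "FoodDescriptionsAzureBlob"})

# Chained pipeline: the second activity also takes the first's output -> two waves.
chained = run_order(
    {"FoodGroupDescriptions": (["FoodGroupDescriptionsAzureBlob"],
                               ["FoodGroupDescriptionsSQLAzureFirst"]),
     "FoodDescriptions": (["FoodDescriptionsAzureBlob",
                           "FoodGroupDescriptionsSQLAzureFirst"],
                          ["FoodDescriptionsSQLAzureSecond"])},
    external={"FoodGroupDescriptionsAzureBlob", "FoodDescriptionsAzureBlob"})

print(parallel)  # one wave containing both activities
print(chained)   # two waves: FoodGroupDescriptions first, then FoodDescriptions
```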