Amazon web services 如何找到将数据从S3加载到红移的平均时间_Amazon Web Services_Amazon S3_Amazon Redshift

Amazon web services 如何找到将数据从S3加载到红移的平均时间

amazon-web-services amazon-s3 amazon-redshift

Amazon web services 如何找到将数据从S3加载到红移的平均时间,amazon-web-services,amazon-s3,amazon-redshift,Amazon Web Services,Amazon S3,Amazon Redshift,我有超过8个模式和200多个表，数据由不同模式中的CSV文件加载我想了解SQL脚本，了解如何找到将所有200个表的数据从S3加载到Redshift的平均时间。您可以查看以了解查询运行所需的时间您可能需要解析查询文本来发现加载了哪些表，但是您可以使用历史加载时间来计算每个表的典型加载时间以下是一些特别有用的表格：：包含已在用户定义的查询队列服务类中完成运行的查询的度量信息，例如已处理的行数、CPU使用情况、输入/输出和磁盘使用情况。：返回有关数据库查询的执行信息。：此表记录每个数据文件

我有超过8个模式和200多个表，数据由不同模式中的CSV文件加载

我想了解SQL脚本，了解如何找到将所有200个表的数据从S3加载到Redshift的平均时间。

您可以查看以了解查询运行所需的时间

您可能需要解析查询文本来发现加载了哪些表，但是您可以使用历史加载时间来计算每个表的典型加载时间

以下是一些特别有用的表格：

：包含已在用户定义的查询队列服务类中完成运行的查询的度量信息，例如已处理的行数、CPU使用情况、输入/输出和磁盘使用情况。：返回有关数据库查询的执行信息。：此表记录每个数据文件加载到数据库表中时的进度。您可以检查以发现查询运行所需的时间

您可能需要解析查询文本来发现加载了哪些表，但是您可以使用历史加载时间来计算每个表的典型加载时间

以下是一些特别有用的表格：

：包含已在用户定义的查询队列服务类中完成运行的查询的度量信息，例如已处理的行数、CPU使用情况、输入/输出和磁盘使用情况。：返回有关数据库查询的执行信息。：此表记录每个数据文件加载到数据库表中时的进度。

有一个聪明的方法。您应该有一个将数据从S3迁移到红移的ETL脚本

假设您有一个shell脚本，只需在该表的ETL逻辑开始之前捕获时间戳，让我们调用该start，在该表的ETL逻辑结束之后捕获另一个时间戳，让我们调用该end，并在脚本结束时获取差异：

#!bin/sh
    .
    .
    .

start=$(date +%s) #capture start time

#ETL Logic
        [find the right csv on S3]
        [check for duplicates, whether the file has already been loaded etc]
        [run your ETL logic, logging to make sure that file has been processes on s3]
        [copy that table to Redshift, log again to make sure that table has been copied]
        [error logging, trigger emails, SMS, slack alerts etc]
        [ ... ]


end=$(date +%s) #Capture end time


duration=$((end-start)) #Difference (time taken by the script to execute)

echo "duration is $duration"

PS：持续时间将以秒为单位，您可以维护日志文件、DB表条目等。时间戳将以epoc为单位，您可以根据日志记录的位置使用以下功能：

秒到秒时间$duration-用于MySQL

选择时间戳“epoch”+1511680982*间隔“1秒”作为mytimestamp-对于Amazon Redshift，然后在epoch中取两个实例的差异。

有一种聪明的方法。您应该有一个将数据从S3迁移到红移的ETL脚本

#!bin/sh
    .
    .
    .

start=$(date +%s) #capture start time

#ETL Logic
        [find the right csv on S3]
        [check for duplicates, whether the file has already been loaded etc]
        [run your ETL logic, logging to make sure that file has been processes on s3]
        [copy that table to Redshift, log again to make sure that table has been copied]
        [error logging, trigger emails, SMS, slack alerts etc]
        [ ... ]


end=$(date +%s) #Capture end time


duration=$((end-start)) #Difference (time taken by the script to execute)

echo "duration is $duration"

PS：持续时间将以秒为单位，您可以维护日志文件、DB表条目等。时间戳将以epoc为单位，您可以根据日志记录的位置使用以下功能：

秒到秒时间$duration-用于MySQL

选择时间戳“epoch”+1511680982*间隔“1秒”作为mytimestamp-for Amazon Redshift，然后在epoch中取两个实例的差异。

运行此查询以了解复制查询的工作速度

select q.starttime,  s.query, substring(q.querytxt,1,120) as querytxt,
       s.n_files, size_mb, s.time_seconds,
       s.size_mb/decode(s.time_seconds,0,1,s.time_seconds)  as mb_per_s
from (select query, count(*) as n_files,
     sum(transfer_size/(1024*1024)) as size_MB, (max(end_Time) -
         min(start_Time))/(1000000) as time_seconds , max(end_time) as end_time
      from stl_s3client where http_method = 'GET' and query > 0
       and transfer_time > 0 group by query ) as s
LEFT JOIN stl_Query as q on q.query = s.query
where s.end_Time >=  dateadd(day, -7, current_Date)
order by s.time_Seconds desc, size_mb desc, s.end_time desc
limit 50;

一旦你知道你从S3中推送了多少mb/s，你就可以根据大小大致确定每个文件需要多长时间。

运行此查询以了解复制查询的工作速度

select q.starttime,  s.query, substring(q.querytxt,1,120) as querytxt,
       s.n_files, size_mb, s.time_seconds,
       s.size_mb/decode(s.time_seconds,0,1,s.time_seconds)  as mb_per_s
from (select query, count(*) as n_files,
     sum(transfer_size/(1024*1024)) as size_MB, (max(end_Time) -
         min(start_Time))/(1000000) as time_seconds , max(end_time) as end_time
      from stl_s3client where http_method = 'GET' and query > 0
       and transfer_time > 0 group by query ) as s
LEFT JOIN stl_Query as q on q.query = s.query
where s.end_Time >=  dateadd(day, -7, current_Date)
order by s.time_Seconds desc, size_mb desc, s.end_time desc
limit 50;

一旦你知道你从S3中推送了多少mb/s，你就可以根据大小大致确定每个文件需要多长时间