Airflow: How do I store SQL output in a DataFrame using Airflow?


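I want to read data from SQL into a DataFrame, apply some transformations, and then load the result into another table.

The problem is that the connection string for the tables is only accessible through Airflow, so I need to use Airflow as the medium for both reading and writing the data.

How can this be done?

My code: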

Task1 = PostgresOperator(
    task_id='Task1',
    postgres_conn_id='REDSHIFT_CONN',
    sql="SELECT * FROM Western.trip limit 5 ",
    params={'limit': '50'},
    dag=dag,
)

The output of this task needs to be stored in a DataFrame (df) and, after transformation, loaded back into another table.

How can this be done?

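I doubt there is a built-in operator for this, but you can easily write a custom operator.

Extend `PostgresOperator`, or just extend `BaseOperator` or any other operator of your choice. All custom code goes into the overridden `execute()` method.
Then use a `PostgresHook` to obtain a pandas DataFrame by calling its `get_pandas_df()` function.
Perform whatever transformations you need on the `df`.
Finally, use the hook's `insert_rows()` function to insert the data into the table.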



UPDATE-1
As requested, I am adding the code for the operator here:
from typing import Dict, Any, List, Tuple

from airflow.hooks.postgres_hook import PostgresHook
from airflow.operators.postgres_operator import PostgresOperator
from airflow.utils.decorators import apply_defaults
from pandas import DataFrame


class MyCustomOperator(PostgresOperator):

    @apply_defaults
    def __init__(self, destination_table: str, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.destination_table: str = destination_table

    def execute(self, context: Dict[str, Any]):
        # create PostgresHook
        self.hook: PostgresHook = PostgresHook(postgres_conn_id=self.postgres_conn_id,
                                               schema=self.database)
        # read data from Postgres-SQL query into pandas DataFrame
        df: DataFrame = self.hook.get_pandas_df(sql=self.sql, parameters=self.parameters)
        # perform transformations on df here
        df['column_to_be_doubled'] = df['column_to_be_doubled'].multiply(2)
        # ... further transformations go here
        # convert pandas DataFrame into list of tuples
        rows: List[Tuple[Any, ...]] = list(df.itertuples(index=False, name=None))
        # insert list of tuples in destination Postgres table
        self.hook.insert_rows(table=self.destination_table, rows=rows)

Note: this snippet is for reference only; it has NOT been tested.
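The read, transform and write steps of the operator can be tried outside Airflow. The following is a minimal sketch, assuming an in-memory SQLite database as a stand-in for the real warehouse; the table names `trip` and `trip_doubled` and their columns are made up for illustration. It reads with pandas' SQL reader (which, as far as I know, is what `get_pandas_df` wraps), then converts the DataFrame to a list of tuples exactly as the operator does before inserting.

```python
import sqlite3

import pandas as pd

# In-memory SQLite stands in for the real database (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trip (city TEXT, distance REAL)")
conn.executemany("INSERT INTO trip VALUES (?, ?)", [("A", 2.5), ("B", 4.0)])
conn.execute("CREATE TABLE trip_doubled (city TEXT, distance REAL)")

# Read: load the query result into a DataFrame.
df = pd.read_sql_query("SELECT city, distance FROM trip ORDER BY city", conn)

# Transform: double one column, as in the operator above.
df["distance"] = df["distance"].multiply(2)

# Write: DataFrame -> list of plain tuples -> parameterised INSERT.
rows = list(df.itertuples(index=False, name=None))
conn.executemany("INSERT INTO trip_doubled VALUES (?, ?)", rows)
conn.commit()

result = conn.execute(
    "SELECT city, distance FROM trip_doubled ORDER BY city"
).fetchall()
print(result)
```

Inside Airflow, the hook's `insert_rows` takes the place of the explicit `executemany` call.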

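Further modifications / improvements

The `destination_table` param does not have to be hard-coded; it can be read from elsewhere.
If the destination table does not necessarily reside in the same Postgres schema, we can take another param such as `destination_postgres_conn_id` in `__init__` and use it to create a `destination_hook`, on which we can call the `insert_rows` method.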
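The two-hook variation can be sketched as a small helper that only relies on the `get_pandas_df` and `insert_rows` methods. This is a hedged illustration, not tested against a live database: `copy_with_transform` and `FakeHook` are hypothetical names introduced here so the example runs without Airflow. In a real operator, the two arguments would presumably be `PostgresHook` instances built from `postgres_conn_id` and `destination_postgres_conn_id`.

```python
from typing import Any, Callable, List, Tuple

import pandas as pd


def copy_with_transform(
    source_hook: Any,
    destination_hook: Any,
    sql: str,
    destination_table: str,
    transform: Callable[[pd.DataFrame], pd.DataFrame],
) -> int:
    """Read via source_hook, transform, write via destination_hook."""
    df = source_hook.get_pandas_df(sql=sql)
    df = transform(df)
    # Convert the DataFrame into a list of plain tuples for insert_rows.
    rows: List[Tuple[Any, ...]] = list(df.itertuples(index=False, name=None))
    destination_hook.insert_rows(table=destination_table, rows=rows)
    return len(rows)


class FakeHook:
    """Hypothetical stand-in for an Airflow hook (illustration only)."""

    def __init__(self, frame=None):
        self.frame = frame
        self.inserted = {}

    def get_pandas_df(self, sql, parameters=None):
        # Ignores the SQL and returns canned data.
        return self.frame.copy()

    def insert_rows(self, table, rows, **kwargs):
        # Records what would have been written to the table.
        self.inserted.setdefault(table, []).extend(rows)


source = FakeHook(pd.DataFrame({"city": ["A", "B"], "fare": [10.0, 20.0]}))
destination = FakeHook()
count = copy_with_transform(
    source,
    destination,
    sql="SELECT * FROM trip",
    destination_table="trip_doubled",
    transform=lambda d: d.assign(fare=d["fare"] * 2),
)
print(count, destination.inserted)
```

Keeping the source and destination hooks separate means the same logic works whether both tables live behind one connection or two.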
Here is a very simple, basic example of reading data from a database into a DataFrame:
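    from airflow.hooks.mysql_hook import MySqlHook

    # Get the hook for the Airflow connection named "Employees"
    mysqlserver = MySqlHook("Employees")
    # Execute the query and load the result into a DataFrame
    df = mysqlserver.get_pandas_df(sql="select * from employees LIMIT 10")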

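Thanks for the tip.
I also save the DataFrame to a file in order to pass it to the next task. This is not recommended when running on a cluster, because the next task may be executed on a different server that cannot see the file.
This complete code should work as is: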
from airflow import DAG
from airflow.operators.python import PythonOperator
from airflow.utils.dates import days_ago
from airflow.hooks.mysql_hook import MySqlHook

dag_id = "db_test"
args = {
    "owner": "airflow",
}

base_file_path = "dags/files/"

def export_func(task_instance):
    import time

    # Get the hook
    mysqlserver = MySqlHook("Employees")
    # Execute the query
    df = mysqlserver.get_pandas_df(sql="select * from employees LIMIT 10")

    # Generate somewhat unique filename
    path = "{}{}_{}.ftr".format(base_file_path, dag_id, int(time.time()))
    # Save as a binary feather file
    df.to_feather(path)
    print("Export done")

    # Push the path to xcom
    task_instance.xcom_push(key="path", value=path)


def import_func(task_instance):
    import pandas as pd

    # Get the path from xcom
    path = task_instance.xcom_pull(task_ids="export_df", key="path")
    # Read the binary file
    df = pd.read_feather(path)

    print("Import done")
    # Do what you want with the dataframe
    print(df)

with DAG(
    dag_id,
    default_args=args,
    schedule_interval=None,
    start_date=days_ago(2),
    tags=["test"],
) as dag:

    export_task = PythonOperator(
        task_id="export_df",
        python_callable=export_func,
    )

    import_task = PythonOperator(
        task_id="import_df",
        python_callable=import_func,
    )

    export_task >> import_task
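Because the file handoff above breaks when tasks run on different servers, one alternative for small DataFrames is to serialise the frame to a JSON string and push that string through XCom instead of a path. The following is a sketch under that assumption; the helper names `dataframe_to_xcom` and `dataframe_from_xcom` are hypothetical, and this approach only suits small results, since XCom values are stored in the Airflow metadata database.

```python
import io

import pandas as pd


def dataframe_to_xcom(df: pd.DataFrame) -> str:
    """Serialise a small DataFrame to a JSON string suitable for XCom."""
    return df.to_json(orient="split")


def dataframe_from_xcom(payload: str) -> pd.DataFrame:
    """Rebuild the DataFrame from the JSON string pulled from XCom."""
    return pd.read_json(io.StringIO(payload), orient="split")


original = pd.DataFrame({"name": ["Ann", "Bob"], "salary": [1000, 1200]})
payload = dataframe_to_xcom(original)
restored = dataframe_from_xcom(payload)
print(restored)
```

In the DAG above, `export_func` would push `payload` with `task_instance.xcom_push(key="df", value=payload)` and `import_func` would pull it and call `dataframe_from_xcom`. For larger data, shared storage that every worker can reach is the safer choice.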

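I am also looking for a similar solution.
Good question, I am facing the same issue. Looking forward to a solution.
@LuckyGuess, do you have a solution on your side?
@Bernardostearns reisen, do you have a solution on your side?
@Bernardostearnsreisen, could you take a look at this question? I really like how you explain your answers with examples.
I think this would be useful to a lot of people. Could you illustrate it with an example? That would be really helpful.
@y2k-Shubam, could you give me an example? It would be helpful.
@y2k-shubham, it would be very helpful if you could show some sample code; I have a similar problem.
@LDF_VARUM_ELLAM_SHERIAAVUM, @pankaj, @sneha nair, @Ria Alves I have updated the answer with a code snippet for reference.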