Python: error when importing one Databricks notebook into another


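I am trying to run one Jupyter notebook from inside another notebook in Databricks.

The following code fails with the error "df3 is not defined", even though df3 is defined: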

input_file = pd.read_csv("/dbfs/mnt/container_name/input_files/xxxxxx.csv")
df3 = input_file
%run ./NotebookB

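The first line of code in NotebookB is as follows (all of the Markdown renders in Databricks without any problem):

I do not get this error in my plain Jupyter notebooks. For example, the following code works: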

input_file = pd.read_csv("xxxxxx.csv")
df3 = input_file
%run "NotebookB.ipynb"

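Basically, when NotebookB runs in Databricks, the definition of df3 seems to be ignored or forgotten, which leads to the "not defined" error.

Both Jupyter notebooks are in the same workspace folder in Databricks.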

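I see that you want to pass structured data (such as a DataFrame) from one Azure Databricks notebook to another by calling it.

Please refer to the official documentation on how to use the functions dbutils.notebook.run and dbutils.notebook.exit.

Below is the Python sample code from the "Pass structured data" section of that documentation: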

%python

# Example 1 - returning data through temporary tables.
# You can only return one string using dbutils.notebook.exit(), but since called notebooks reside in the same JVM, you can
# return a name referencing data stored in a temporary table.

## In callee notebook
sqlContext.range(5).toDF("value").createOrReplaceGlobalTempView("my_data")
dbutils.notebook.exit("my_data")

## In caller notebook
returned_table = dbutils.notebook.run("LOCATION_OF_CALLEE_NOTEBOOK", 60)
global_temp_db = spark.conf.get("spark.sql.globalTempDatabase")
display(table(global_temp_db + "." + returned_table))

So, to pass a pandas DataFrame in your code, first convert it to a PySpark DataFrame with the spark.createDataFrame function, as shown below:

df3 = spark.createDataFrame(input_file)

Then pass it along with the code below:

df3.createOrReplaceGlobalTempView("df3")
dbutils.notebook.exit("df3")

You also need to swap the roles of NotebookA and NotebookB, so that NotebookB acts as the caller and calls NotebookA.
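Putting the pieces of this answer together, the two sides might look roughly like the sketch below. It is only an illustration under assumptions: the function names publish_as_global_view and fetch_from_callee are hypothetical, and spark and dbutils are passed in as arguments here, whereas in a Databricks notebook they are already available as globals.

```python
def publish_as_global_view(spark, dbutils, pandas_df, view_name="df3"):
    """Callee side (NotebookA): register the data and hand back the view name."""
    # Convert the pandas DataFrame to a Spark DataFrame so it can back a view.
    spark_df = spark.createDataFrame(pandas_df)
    spark_df.createOrReplaceGlobalTempView(view_name)
    # Only a single string can be returned, so return the name of the view.
    dbutils.notebook.exit(view_name)


def fetch_from_callee(spark, dbutils, callee_path, timeout_seconds=60):
    """Caller side (NotebookB): run the callee and read the view it names."""
    view_name = dbutils.notebook.run(callee_path, timeout_seconds)
    global_temp_db = spark.conf.get("spark.sql.globalTempDatabase")
    # Read the global temp view and convert it back to pandas.
    return spark.table(f"{global_temp_db}.{view_name}").toPandas()
```

In NotebookA the last cell would call publish_as_global_view(spark, dbutils, input_file), and NotebookB would start with df3 = fetch_from_callee(spark, dbutils, "./NotebookA"). The relative path is an assumption based on both notebooks living in the same folder.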


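In Notebook A, save the df to a CSV file and call Notebook B, passing the path to the CSV as an argument. Notebook B reads the path, performs some operations, and overwrites the CSV. Notebook A then reads from the same path and gets the desired result.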
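The hand-off described above could be sketched roughly as follows. This is not the answerer's code, only a minimal illustration under assumptions: the function names and the "csv_path" argument name are hypothetical, dbutils is passed in explicitly (in a notebook it is a global), the doubling step is a placeholder for "some operations", and the standard csv module stands in for whatever pandas or Spark reader you would normally use.

```python
import csv


def run_callee_step(dbutils):
    """Body of Notebook B: read the CSV named by the caller, transform it, overwrite it."""
    csv_path = dbutils.widgets.get("csv_path")  # hypothetical argument name
    with open(csv_path, newline="") as f:
        # Placeholder transformation: keep the value and add a doubled column.
        rows = [[row[0], float(row[0]) * 2] for row in csv.reader(f)]
    with open(csv_path, "w", newline="") as f:
        csv.writer(f).writerows(rows)
    dbutils.notebook.exit(csv_path)


def run_caller_step(dbutils, values, csv_path, callee_path="./NotebookB"):
    """Body of Notebook A: write the CSV, run Notebook B, read the result back."""
    with open(csv_path, "w", newline="") as f:
        csv.writer(f).writerows([[v] for v in values])
    # The third argument is passed to the callee, which reads it via dbutils.widgets.get.
    dbutils.notebook.run(callee_path, 60, {"csv_path": csv_path})
    with open(csv_path, newline="") as f:
        return list(csv.reader(f))
```

On Databricks the path would typically live under /dbfs/ so that both notebooks can reach it through the local file API.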

For example:

Notebook A (caller)

Output:

+---+
| id|
+---+
|  0|
|  1|
|  2|
|  3|
|  4|
+---+

+---+---+
|_c0|_c1|
+---+---+
|  0|0.0|
|  1|2.0|
|  2|4.0|
|  3|6.0|
|  4|8.0|
+---+---+

Notebook B (callee)


I have not managed to solve this yet; I ran into all sorts of errors. My temporary workaround, which works fine, is to save df3 to blob storage and then read that file in NotebookB.