Python 3.x data analysis: how to count Null, NaN and empty-string values?

python-3.x pandas pyspark

I'm new to PySpark, and I have a sample dataset:

   Ticker_Modelo  Ticker        Type                           Period  Product           Geography  Source     Unit  Test
0  Model1_Index   Model1 Index  NWE Forties Hydrocraking       Daily   Refinery Margins  NWE        Bloomberg  None  3
1  Model2_Index   Model2 Index  NWE Bonny Light Hydrocraking   Daily   Refinery Margins  NWE        Bloomberg  None  5
2  Model3_Index   Model3 Index  USGC LLS FCC                   Daily   Refinery Margins  USGC       Bloomberg  None  12
3  Model4_Index   Model4 Index  USGC Maya Coking               Daily   Refinery Margins  USGC       Bloomberg  None  67
4  Model6_Index   Model6 Index  USMC WTI FCC                   Daily   Refinery Margins  USMC       Bloomberg  None  45
5  Model5_Index   Model5 Index  USMC WCSS Coking               Daily   Refinery Margins  USMC       Bloomberg  None  22
6  Model7_Index   Model7 Index  USEC Hibernia FCC              Daily   Refinery Margins  USEC       Bloomberg  None
7  Model8_Index   Model8 Index  Singapore Dubai Hydrocracking  Daily   Refinery Margins  Singapore  Bloomberg  None  Null

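I need to profile this data and store the results in a database.

I tried Optimus() and panda_profiler(), but they run the profiling and produce an HTML report. I need some of the individual values, and they don't give me a way to compute them.

I need to count how many null/NaN/empty-string values each column contains and build a new table from those counts.

I'm using pandas and PySpark.

I found an answer that I thought would help, but when I tried applying it to a single column as a test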

data_df.filter((data_df["Ticker_Modelo"] == "") | data_df["Ticker_Modelo"].isNull() | isnan(data_df["Ticker_Modelo"])).count()

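it gave me an error:

AttributeError: 'Series' object has no attribute 'isNull'

Beyond that, I don't know how to apply it to all columns and transpose the result to get something like this: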

               Count_nulls
Ticker_Modelo  0
Ticker         0
Type           0
Period         0
Product        0
Geography      0
Source         0
Unit           0
Test           2

You can do the following:

First, convert all Null/None values to pandas NaN:

import numpy as np
df = df.replace(['None', 'Null'], np.nan)  # replace returns a new DataFrame, so assign it back
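The question also asks about empty strings, which the list above does not cover. A minimal sketch, assuming the cells are strings and that whitespace-only cells should count as empty (both are assumptions, and the small frame below is made up for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical frame standing in for the question's data.
df = pd.DataFrame({
    "Ticker": ["Model1 Index", "Model7 Index", "Model8 Index", ""],
    "Unit": ["None", "None", "bbl", "None"],
    "Test": ["3", "", "Null", "  "],
})

# Literal placeholder strings -> NaN (replace returns a new DataFrame).
cleaned = df.replace(["None", "Null"], np.nan)
# Empty or whitespace-only strings -> NaN, matching the whole cell with a regex.
cleaned = cleaned.replace(r"^\s*$", np.nan, regex=True)

# Number of missing cells per column after both replacements.
null_counts = cleaned.isnull().sum()
print(null_counts)
```

The original `df` is left untouched here, which makes it easy to compare the counts before and after the replacement.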

Then count the NaN values in each column and name the resulting column:

df.isnull().sum(axis=0).to_frame().rename(columns={0 : 'Count_Nulls'})
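As for the AttributeError in the question: `isNull` is a method of PySpark columns, while the message shows that `data_df` there is a pandas object, whose counterpart is `isna()` (alias `isnull()`). A rough pandas translation of that filter, without modifying the data, might look like the sketch below. The sample frame and the name `MISSING_STRINGS` are hypothetical, and treating the strings 'None' and 'Null' as missing is an assumption carried over from the answer.

```python
import pandas as pd

# Hypothetical sample; the Python None stands for a genuinely missing value.
data_df = pd.DataFrame({
    "Ticker_Modelo": ["Model1_Index", "Model7_Index", "Model8_Index"],
    "Source": ["Bloomberg", None, "Bloomberg"],
    "Test": ["3", "", "Null"],
})

# Strings treated as missing (an assumption, see the lead-in).
MISSING_STRINGS = ["", "None", "Null"]

# Single column: pandas counterpart of the filter tried in the question.
col = data_df["Test"]
test_missing = int((col.isna() | col.isin(MISSING_STRINGS)).sum())

# All columns at once: one row per input column, as in the desired output.
mask = data_df.isna() | data_df.isin(MISSING_STRINGS)
count_nulls = mask.sum().to_frame(name="Count_nulls")
print(count_nulls)
```

Because the result is an ordinary DataFrame indexed by column name, it could then be written to a database table, for example with `DataFrame.to_sql`.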