Python 3.x data profiling: how to count Null, NaN and empty-string values?


I'm new to pyspark, and I have a sample dataset:

   Ticker_Modelo  Ticker        Type                           Period  Product           Geography  Source     Unit  Test
0  Model1_Index   Model1 Index  NWE Forties Hydrocraking       Daily   Refinery Margins  NWE        Bloomberg  None  3
1  Model2_Index   Model2 Index  NWE Bonny Light Hydrocraking   Daily   Refinery Margins  NWE        Bloomberg  None  5
2  Model3_Index   Model3 Index  USGC LLS FCC                   Daily   Refinery Margins  USGC       Bloomberg  None  12
3  Model4_Index   Model4 Index  USGC Maya Coking               Daily   Refinery Margins  USGC       Bloomberg  None  67
4  Model6_Index   Model6 Index  USMC WTI FCC                   Daily   Refinery Margins  USMC       Bloomberg  None  45
5  Model5_Index   Model5 Index  USMC WCSS Coking               Daily   Refinery Margins  USMC       Bloomberg  None  22
6  Model7_Index   Model7 Index  USEC Hibernia FCC              Daily   Refinery Margins  USEC       Bloomberg  None
7  Model8_Index   Model8 Index  Singapore Dubai Hydrocracking  Daily   Refinery Margins  Singapore  Bloomberg  None  Null

I need to profile this data and store the result in a database.

I tried Optimus() and panda_profiler(), but they run the profiling and produce an HTML report. I need a few specific values, and they cannot compute them.

I need to count how many null/NaN/empty-string values each column contains and build a new table from those counts.

I'm using pandas and PySpark.

I found an answer that I thought would help, but when I tried applying it to a single column:

data_df.filter((data_df["Ticker_Modelo"] == "") | data_df["Ticker_Modelo"].isNull() | isnan(data_df["Ticker_Modelo"])).count()

it gave me this error:

AttributeError: 'Series' object has no attribute 'isNull'

Beyond that, I don't know how to apply it to all columns and transpose the output to get a result like this:

               Count_nulls
Ticker_Modelo  0
Ticker         0
Type           0
Period         0
Product        0
Geography      0
Source         0
Unit           0
Test           2

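You can do the following:

First, convert the 'None'/'Null' placeholder strings, and empty strings as well, to pandas NaN. Note that replace returns a new DataFrame, so the result has to be assigned back:

import numpy as np
df = df.replace(['None', 'Null', ''], np.nan)

Then count the missing values in each column: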

df.isnull().sum(axis=0).to_frame().rename(columns={0 : 'Count_Nulls'})
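
The error in the question arises because isNull is a method of PySpark columns, whereas a pandas Series exposes isna/isnull instead. As a minimal pandas-only sketch of the whole count, the snippet below builds the per-column table in one pass. The small DataFrame is a hypothetical cut-down of the sample data, and the helper name count_missing and its markers parameter are assumptions introduced here for illustration, not part of the original answer.

```python
import pandas as pd

# Hypothetical miniature of the sample data: only three columns and four rows.
df = pd.DataFrame({
    "Ticker_Modelo": ["Model1_Index", "Model2_Index", "Model7_Index", "Model8_Index"],
    "Geography": ["NWE", "NWE", "USEC", "Singapore"],
    "Test": ["3", "5", "", "Null"],
})


def count_missing(frame, markers=("Null", "")):
    """Count real NaN/None values plus the given placeholder strings per column."""
    is_marker = frame.isin(list(markers))  # literal placeholder strings
    is_nan = frame.isna()                  # genuine NaN / None values
    counts = (is_marker | is_nan).sum(axis=0)
    # One row per original column, matching the layout asked for in the question.
    return counts.to_frame(name="Count_nulls")


result = count_missing(df)
print(result)
```

Which strings count as missing is left as a parameter because it depends on the data: adding "None" to markers, for example, would also count every row of a column that holds that literal string.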