Python Pandas: empty Series, DataFrame constructor vs. CSV


I am trying to aggregate the data in a DataFrame by specific columns. When I use the DataFrame constructor, it works:

import pandas as pd

df = pd.DataFrame([
        ["Firewall-1","outside","tcp","4.4.4.4",53,"1.1.1.1",1025,"outbound","allowed","",2], 
        ["Firewall-1","outside","tcp","4.4.4.4",53,"1.1.1.1",1026,"outbound","allowed","",2], 
        ["Firewall-1","outside","tcp","4.4.4.4",22,"1.1.1.1",1028,"outbound","allowed","",2], 
        ["Firewall-1","outside","tcp","3.3.3.3",22,"2.2.2.2",2200,"outbound", "allowed","",2]
    ], 
    columns=["dvc","src_interface","transport","src_ip","src_port","dest_ip","dest_port","direction", "action", "cause", "count"])

index_cols = df.columns.tolist()
index_cols.remove("dest_port") 
df = df.groupby(index_cols)["dest_port"].apply(list)
df = df.reset_index()
DataFrame

          dvc src_interface transport   src_ip  src_port  dest_ip  dest_port direction   action cause  count
0  Firewall-1       outside       tcp  4.4.4.4        53  1.1.1.1       1025  outbound  allowed            2
1  Firewall-1       outside       tcp  4.4.4.4        53  1.1.1.1       1026  outbound  allowed            2
2  Firewall-1       outside       tcp  4.4.4.4        22  1.1.1.1       1028  outbound  allowed            2
3  Firewall-1       outside       tcp  4.4.4.4        22  1.1.1.1       1029  outbound  allowed            2
4  Firewall-1       outside       tcp  3.3.3.3        22  2.2.2.2       2200  outbound  allowed            2
Output

   dvc         src_interface  transport  src_ip   src_port  dest_ip  direction  action   cause  count
    Firewall-1  outside        tcp        3.3.3.3  22        2.2.2.2  outbound   allowed         2              [2200]
                                          4.4.4.4  22        1.1.1.1  outbound   allowed         2        [1028, 1029]
                                                   53        1.1.1.1  outbound   allowed         2        [1025, 1026]
The problem appears when I try to import the data from CSV:

import glob
import pandas as pd

fwdata = pd.concat([pd.read_csv(f) for f in glob.glob('*.csv')], ignore_index=True)
df = pd.DataFrame(fwdata)

index_cols = df.columns.tolist()
index_cols.remove("dest_port")
df = df.groupby(index_cols)["dest_port"].apply(list)
df.reset_index()
print(df.head(10))
DataFrame: same as above

Output

Series([], Name: dest_port, dtype: float64) 
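The CSV file contains exactly the same data as the constructor above, but it seems to be handled differently. Any help would be greatly appreciated. Thanks in advance.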

CSV

dvc,"src_interface",transport,"src_ip","src_port","dest_ip","dest_port",direction,action,cause,count
"Firewall-1",outside,tcp,"4.4.4.4",53,"1.1.1.1",1025,outbound,allowed,"",2
"Firewall-1",outside,tcp,"4.4.4.4",53,"1.1.1.1",1026,outbound,allowed,"",2
"Firewall-1",outside,tcp,"4.4.4.4",22,"1.1.1.1",1028,outbound,allowed,"",2
"Firewall-1",outside,tcp,"3.3.3.3",22,"2.2.2.2",2200,outbound,allowed,"",2

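The problem is the empty data in the "cause" column. By default, read_csv parses empty fields as NaN, and groupby drops every row whose group key contains NaN, so there is nothing left to aggregate. You can fix this with either of the following approaches.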

Drop the column:

df.drop(columns=['column_name'], inplace=True)
Fill the column with data:

df['column_name'] = df['column_name'].fillna('')

(For these examples, column_name = 'cause'.)
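
To make the diagnosis concrete, here is a minimal sketch that reproduces the empty result from the CSV text in the question and then applies the fill approach. The in-memory `io.StringIO` buffer and the variable names (`csv_text`, `broken`, `fixed`, `alt`) are assumptions made for illustration. The last part shows a different technique that is not part of this answer: passing `keep_default_na=False` to `read_csv` so that empty fields stay empty strings.

```python
import io

import pandas as pd

# CSV text copied from the question; an in-memory buffer stands in for the file.
csv_text = '''dvc,"src_interface",transport,"src_ip","src_port","dest_ip","dest_port",direction,action,cause,count
"Firewall-1",outside,tcp,"4.4.4.4",53,"1.1.1.1",1025,outbound,allowed,"",2
"Firewall-1",outside,tcp,"4.4.4.4",53,"1.1.1.1",1026,outbound,allowed,"",2
"Firewall-1",outside,tcp,"4.4.4.4",22,"1.1.1.1",1028,outbound,allowed,"",2
"Firewall-1",outside,tcp,"3.3.3.3",22,"2.2.2.2",2200,outbound,allowed,"",2
'''

df = pd.read_csv(io.StringIO(csv_text))

# Every "cause" value was parsed as NaN rather than as an empty string.
print(df["cause"].isna().all())

index_cols = [c for c in df.columns if c != "dest_port"]

# With NaN in a group key, groupby drops every row, so the result is empty.
broken = df.groupby(index_cols)["dest_port"].apply(list)
print(len(broken))

# Fill approach: replace NaN with an empty string, then group again.
df["cause"] = df["cause"].fillna("")
fixed = df.groupby(index_cols)["dest_port"].apply(list).reset_index()
print(fixed[["src_ip", "src_port", "dest_port"]])

# Alternative (not from the answer): keep empty fields as "" while reading.
raw = pd.read_csv(io.StringIO(csv_text), keep_default_na=False)
alt = raw.groupby(index_cols)["dest_port"].apply(list).reset_index()
```

Note that `keep_default_na=False` turns off NaN detection for every column, so it is probably only suitable when no column relies on the default missing-value markers.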

If you run this on just a single file, have you verified that you actually get data back from pd.read_csv(f)?
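
One way to run that check, as a rough sketch: read each file on its own and report its row count and any columns that came back entirely NaN, before concatenating anything. The helper name `summarize_csv_files` and the temporary demo file `fw.csv` are hypothetical and exist only for this example.

```python
import glob
import os
import tempfile

import pandas as pd


def summarize_csv_files(pattern):
    """Hypothetical helper: per file, report the row count and all-NaN columns."""
    summary = {}
    for path in sorted(glob.glob(pattern)):
        frame = pd.read_csv(path)
        all_nan = [c for c in frame.columns if frame[c].isna().all()]
        summary[os.path.basename(path)] = {
            "rows": len(frame),
            "all_nan_columns": all_nan,
        }
    return summary


# Demo with a throwaway file so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "fw.csv"), "w") as fh:
        fh.write('dvc,dest_port,cause,count\n')
        fh.write('"Firewall-1",1025,"",2\n')
        fh.write('"Firewall-1",1026,"",2\n')
    report = summarize_csv_files(os.path.join(tmp, "*.csv"))

print(report)
```

A file that reports zero rows points at a read problem, while a column listed under `all_nan_columns` is a likely candidate for the grouping issue described above.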