Filtering data and printing column headers in Python


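I have a CSV file with a bunch of different rows. The row labels are things like width, length, height, and so on, and there are about 50 integer cells holding the value for each column. The column labels are shapes such as rectangle, square, etc.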

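Say that in this example the rectangle is missing its width but has a height and a length, and the square is missing its length and height. I want to write a Python script that prints:

square, length, height

rectangle, width

and so on for the 40-odd other shapes that are missing some data. In the CSV file the missing cells are simply blank rather than holding a null value. I believe it would look something like this: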

import pandas as pd
data = pd.read_csv('shapes.csv')
# Filter the data accordingly.
data = data[data['width'] > 0]
data = data[data['row'] == 'width']

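I believe this only looks at width? I want it to check for a width and, if there is an integer width, fine, move on to the next column and look for a length, and so on. Thanks in advance.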

Consider an example DataFrame:

>>> x = pd.DataFrame({'a': [None, 1, 2], 'b': [None, None, 3], 'c': [1, 2, 3]})
>>> x
     a    b  c
0  NaN  NaN  1
1  1.0  NaN  2
2  2.0  3.0  3

Solution:

>>> x.apply(lambda r: list(r[r.isnull()].index), axis=1)
0    [a, b]
1       [b]
2        []
dtype: object
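
The example above has one record per row, whereas the question describes the opposite layout: shapes as columns and dimensions as row labels. A minimal sketch of the same idea for that layout follows. The CSV text, the empty header of the first column, and the extra `circle` column are assumptions made for illustration; `pd.read_csv` turns blank cells into NaN by default.

```python
import io

import pandas as pd

# Hypothetical stand-in for shapes.csv: the first column holds the row labels.
csv_text = """,rectangle,square,circle
width,,4,3
length,7,,3
height,4,,3
"""

# index_col=0 makes width/length/height the index; blank cells become NaN.
df = pd.read_csv(io.StringIO(csv_text), index_col=0)

# For each shape (column), collect the row labels whose value is missing.
missing = {
    shape: col.index[col.isna().to_numpy()].tolist()
    for shape, col in df.items()
    if col.isna().any()
}

for shape, dims in missing.items():
    print(",".join([shape] + dims))
```

Shapes with no blank cells (here `circle`) are skipped, so only the incomplete ones are printed.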

Something like this should do the job:


import pandas as pd
from itertools import compress

data = {
    'Shape': ['square', 'rectangle'],
    'width': [4, ''],
    'height': ['', 4],
    'length': ['', 7]}

df = pd.DataFrame(data)

params = ['width', 'height', 'length']

def f(row):
    # Blank cells are stored as empty strings, so a str value marks a missing entry.
    # Build a list (not a lazy map object) so it can be tested and reused.
    missing = [isinstance(row[p], str) for p in params]
    if any(missing):
        return row['Shape'], list(compress(params, missing))

for idx, row in df.iterrows():
    print(f(row))

# Result
# ('square', ['height', 'length'])
# ('rectangle', ['width'])
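
As a possible variation on the approach above, the blank-string check can be done for the whole table at once instead of value by value. This is only a sketch under the same assumption that blanks arrive as empty strings; the `circle` row is added here purely to show that complete shapes are skipped.

```python
import pandas as pd

df = pd.DataFrame({
    'Shape': ['square', 'rectangle', 'circle'],
    'width': [4, '', 3],
    'height': ['', 4, 3],
    'length': ['', 7, 3],
})
params = ['width', 'height', 'length']

# Boolean table: True wherever a cell is an empty string.
mask = df.set_index('Shape')[params].eq('')

# Keep only shapes with at least one blank cell and list the blank columns.
missing = {
    shape: list(mask.columns[row.to_numpy()])
    for shape, row in mask.iterrows()
    if row.any()
}

for shape, cols in missing.items():
    print(shape, cols)
```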
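  • Synthesized more data to better demonstrate the approach
  • First, filter down to only the rows that have missing values
    df.loc[df.isna().any(axis=1)]
  • Then loop over the columns, selecting the ones whose value is missing
  • Finally, take the resulting Series, missing, and print it out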
import pandas as pd
import numpy as np

df = pd.DataFrame({"shape": ["Triangle", "Acute triangle", "Equilateral triangle", "Heptagonal triangle", "Isosceles triangle", "Golden Triangle", "Obtuse triangle", "Rational triangle", "Right triangle", "Isosceles right triangle", "Kepler triangle", "Scalene triangle", "Quadrilateral", "Cyclic quadrilateral", "Kite", "Parallelogram", "Rhombus", "Lozenge", "Rhomboid", "Rectangle", "Square", "Tangential quadrilateral", "Trapezoid", "Isosceles trapezoid", "Pentagon", "Hexagon", "Lemoine hexagon", "Heptagon", "Octagon", "Nonagon", "Decagon", "Hendecagon", "Dodecagon", "Tridecagon", "Tetradecagon", "Pentadecagon", "Hexadecagon", "Heptadecagon", "Octadecagon", "Enneadecagon"]})
df = df.assign(**{c:np.random.choice([np.nan]+list(range(3,10)), len(df)) for c in ["width","height","length"]})


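# Keep only rows with at least one NaN, then join the shape name
# with the names of the columns whose value is missing.
missing = df.loc[df.isna().any(axis=1)].apply(
    lambda r: ",".join(
        [r["shape"]] + [c for c in r.drop("shape").index.values
                        if np.isnan(r[c])]
    ),
    axis=1,
)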

print("\n".join(missing.tolist()))

Output (the data is random, so it differs between runs):

Equilateral triangle,height
Isosceles right triangle,width,height
Quadrilateral,width,length
Tangential quadrilateral,height,length
Isosceles trapezoid,width,height
Heptagon,width,height
Tridecagon,width,length
Tetradecagon,width,height
Heptadecagon,height,length
Octadecagon,width,length
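
Because the sample data is drawn at random, the listing above cannot be reproduced exactly. A hedged sketch of a repeatable variant follows: it seeds NumPy's generator so the same table is produced on every run. The seed value 0 and the shortened shape list are arbitrary choices made for illustration, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Arbitrary seed so repeated runs build the same table.
rng = np.random.default_rng(0)
shapes = ["Triangle", "Kite", "Rhombus", "Rectangle", "Square", "Pentagon"]

df = pd.DataFrame({"shape": shapes})
for c in ["width", "height", "length"]:
    # Each cell is either NaN (missing) or an integer-valued float from 3 to 9.
    df[c] = rng.choice([np.nan] + list(range(3, 10)), len(df))

# True wherever a dimension is missing, indexed by shape name.
flags = df.set_index("shape").isna()

# One "shape,col,col" line per shape that has at least one missing value.
lines = [
    ",".join([shape] + list(flags.columns[row.to_numpy()]))
    for shape, row in flags.iterrows()
    if row.any()
]
print("\n".join(lines))
```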