Python: How do I parse a file and find the missing data in each column?


I'll do my best to explain my problem:

I'm writing a Python script and I'm stuck at one point. I have the following table/matrix:

Data xx_a_xx xx_a_xx xx_b_xx xx_b_xx xx_rd_xx xx_rd_xx xx_c_xx xx_c_xx
Data xx_a_xx xx_a_xx xx_b_xx xx_b_xx xx_rd_xx xx_rd_xx xx_c_xx xx_c_xx
Data xx_a_xx xx_a_xx xx_b_xx xx_rd_xx xx_rd_xx xx_c_xx xx_c_xx
Data xx_a_xx xx_a_xx xx_b_xx xx_b_xx xx_rd_xx xx_rd_xx xx_c_xx xx_c_xx
Data xx_a_xx xx_a_xx xx_b_xx xx_b_xx xx_rd_xx xx_c_xx xx_c_xx
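The data follows this pattern: each "x" is a digit that varies, but the letters appear in the same order in every row and always come in pairs. What I want is a function that inserts a marker wherever data is missing, giving the following output: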

Data xx_a_xx xx_a_xx xx_b_xx xx_b_xx xx_rd_xx xx_rd_xx xx_c_xx xx_c_xx
Data xx_a_xx xx_a_xx xx_b_xx xx_b_xx xx_rd_xx xx_rd_xx xx_c_xx xx_c_xx
Data xx_a_xx xx_a_xx **<NA>**    xx_b_xx xx_rd_xx xx_rd_xx xx_c_xx xx_c_xx
Data xx_a_xx xx_a_xx xx_b_xx xx_b_xx xx_rd_xx xx_rd_xx xx_c_xx xx_c_xx
Data xx_a_xx xx_a_xx xx_b_xx xx_b_xx **<NA>**     xx_rd_xx xx_c_xx xx_c_xx

Any suggestions? Thanks.

First, split each line string on the space character, then throw away the word "Data".

Then convert the strings of the form "xx_a_xx" into keys. Say you write a function string_to_key that maps "99_a_72" to "a"; map that function over the list.

After that, you only need to check that the keys appear in the expected order and fill in an N/A value wherever they do not.

....
def string_to_key(string):
    # raises IndexError on malformed items that contain no "_"
    return string.split("_")[1]
....
reference = ("a","a","b","b","rd","rd","c","c")

for row in rows:  # assuming rows is the result of reading all lines
    items = row.split()[1:]  # throw away "Data"
    keys = [string_to_key(string) for string in items]
    result = []
    for num, ref_key in enumerate(reference):
        if num < len(keys) and keys[num] == ref_key:
            result.append(items[num])
        else:
            result.append("**NA**")
            # insert placeholders so the later columns stay aligned with reference
            keys.insert(num, "")
            items.insert(num, "")
    print("Data " + " ".join(result))
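
The same idea can be sketched as a small self-contained function, which may be easier to reuse and test. This is only a sketch: the names fill_missing, fill_file and the marker parameter are hypothetical and do not come from the question, and the reference tuple is assumed to be the column pattern shown above. It walks the reference from left to right, so when one half of a pair is missing the marker lands in the second slot of that pair; from the data alone there is no way to tell which of the two is actually absent.

```python
# Assumed column pattern, taken from the rows shown in the question.
REFERENCE = ("a", "a", "b", "b", "rd", "rd", "c", "c")


def string_to_key(item):
    """Map an item such as "99_a_72" to its middle part, "a"."""
    # Raises IndexError if the item contains no underscore.
    return item.split("_")[1]


def fill_missing(line, reference=REFERENCE, marker="**NA**"):
    """Return the line with `marker` placed where a column is missing."""
    label, *items = line.split()  # label is the leading "Data" word
    result = []
    pos = 0  # index of the next item that has not been placed yet
    for ref_key in reference:
        if pos < len(items) and string_to_key(items[pos]) == ref_key:
            result.append(items[pos])
            pos += 1
        else:
            result.append(marker)
    return " ".join([label] + result)


def fill_file(path):
    """Yield the repaired version of every non-empty line in the file."""
    with open(path) as handle:
        for line in handle:
            if line.strip():
                yield fill_missing(line)
```

Calling fill_missing on a single line returns the repaired line as a string, and iterating over fill_file(path) does the same for a whole file, where path is whatever file holds the table.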

I think my answer is concise and the question itself is fine, so I ask @jornsharpe and the others who put it on hold to reconsider.