Python: how to parse a file and find the missing data in each column?
I'll do my best to explain my problem: I'm writing a Python script and I'm stuck at one point. I have the following table/matrix:
Data xx_a_xx xx_a_xx xx_b_xx xx_b_xx xx_rd_xx xx_rd_xx xx_c_xx xx_c_xx
Data xx_a_xx xx_a_xx xx_b_xx xx_b_xx xx_rd_xx xx_rd_xx xx_c_xx xx_c_xx
Data xx_a_xx xx_a_xx xx_b_xx xx_rd_xx xx_rd_xx xx_c_xx xx_c_xx
Data xx_a_xx xx_a_xx xx_b_xx xx_b_xx xx_rd_xx xx_rd_xx xx_c_xx xx_c_xx
Data xx_a_xx xx_a_xx xx_b_xx xx_b_xx xx_rd_xx xx_c_xx xx_c_xx
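The data follows this pattern: each "x" is a digit that varies, but the letters appear in the same order in every row and always in pairs. What I'd like is a function that inserts a marker wherever data is missing, giving the following output: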
Data xx_a_xx xx_a_xx xx_b_xx xx_b_xx xx_rd_xx xx_rd_xx xx_c_xx xx_c_xx
Data xx_a_xx xx_a_xx xx_b_xx xx_b_xx xx_rd_xx xx_rd_xx xx_c_xx xx_c_xx
Data xx_a_xx xx_a_xx **<NA>** xx_b_xx xx_rd_xx xx_rd_xx xx_c_xx xx_c_xx
Data xx_a_xx xx_a_xx xx_b_xx xx_b_xx xx_rd_xx xx_rd_xx xx_c_xx xx_c_xx
Data xx_a_xx xx_a_xx xx_b_xx xx_b_xx **<NA>** xx_rd_xx xx_c_xx xx_c_xx
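Any suggestions? Thanks.
You would first split the row string on the space character, then throw away the word "Data".
Then you can reduce each "xx_a_xx"-style string to its key: say you write a function string_to_key that maps "99_a_72" to "a", and map that function over the list.
After that you only need to check whether the keys come in the expected order, and fill in an N/A value wherever they don't.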
....
def string_to_key(string):
    return string.split("_")[1]  # raises IndexError on malformed lines!
....
reference = ("a", "a", "b", "b", "rd", "rd", "c", "c")
for row in rows:  # assuming rows is the result of reading all lines
    items = row.split()[1:]  # throw away "Data"
    keys = [string_to_key(string) for string in items]
    result = []
    for num, ref_key in enumerate(reference):
        if num < len(keys) and keys[num] == ref_key:
            result.append(items[num])
        else:
            result.append("**NA**")
            # insert placeholders so the remaining tokens stay aligned with reference
            keys.insert(num, "")
            items.insert(num, "")
    print("Data " + " ".join(result))
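For anyone who wants to try the approach above on sample data, here is a minimal self-contained sketch for Python 3. The helper name fill_missing, the constants REFERENCE and NA, and the sample numbers are my own assumptions for illustration; they are not part of the question. Unlike the snippet above, this string_to_key returns None for a malformed token instead of raising.

```python
# Expected key for each column, in order (taken from the pattern in the question).
REFERENCE = ("a", "a", "b", "b", "rd", "rd", "c", "c")
NA = "**<NA>**"  # hypothetical marker, matching the desired output in the question


def string_to_key(token):
    """Return the middle part of an 'xx_key_xx' token, or None if malformed."""
    parts = token.split("_")
    return parts[1] if len(parts) == 3 else None


def fill_missing(line, reference=REFERENCE, na=NA):
    """Align the tokens of one non-empty line against reference, inserting na for absent columns."""
    label, *items = line.split()  # label is the leading "Data" word
    result = []
    pos = 0  # index of the next token in items that has not been placed yet
    for ref_key in reference:
        if pos < len(items) and string_to_key(items[pos]) == ref_key:
            result.append(items[pos])
            pos += 1
        else:
            # The token at pos is kept for a later column; a token whose key
            # never matches is not consumed, so every later column gets the marker.
            result.append(na)
    return " ".join([label] + result)


sample = [
    "Data 11_a_12 13_a_14 15_b_16 17_b_18 19_rd_20 21_rd_22 23_c_24 25_c_26",
    "Data 11_a_12 13_a_14 15_b_16 19_rd_20 21_rd_22 23_c_24 25_c_26",
]
for line in sample:
    print(fill_missing(line))
```

Note that when one token of a pair is missing, the marker ends up in the second slot of that pair, whereas the desired output in the question shows it in the first. Both tokens of a pair carry the same key, so the key alone cannot tell which of the two is absent; if the numbers encode that, the comparison would have to look at them as well.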
I think my answer is concise and the question is fine too, so I'd ask @jornsharpe and the others who put this on hold to reconsider.