Python: how to fill in missing values based on a column comparison
I want to fill in the missing values in column 2 based on the corresponding value in column 1.
import pandas as pd
data={"col1":["A","B","C","A","B","C","A","B","A"], "col2":["{hey1}"," ","{hello2}","{hey2}","{he1}","{hello3}","set()","set()","{hey1}"]}
df=pd.DataFrame(data=data)
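The fill should follow a rule like this:
For example, suppose A appears four times. Three of those times it has a corresponding col2 value, and the fourth time the value is missing.
The missing value should then be the combination of those three values. In this example the three values are hey1, hey2, hey1, so the missing fourth entry
should contain hey2,hey1.
set() is a garbage value that I do not want, so I would like to remove it before doing the column comparison.
Expected output: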
col1 col2
A hey1
B he1
C hello2
A hey2
B he1
C hello3
A hey1,hey2
B he1
A hey1
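One way to read the rule above is "fill each missing col2 cell with the unique non-missing values that share its col1 key". The following is a minimal pandas sketch under that assumption, not necessarily what the asker had in mind. It also assumes every cell is a string, that blanks and the literal text "set()" both count as missing, and that the joined values should keep first-seen order. The names cleaned, joined_unique and fill are introduced here for illustration only.

```python
import pandas as pd

data = {"col1": ["A", "B", "C", "A", "B", "C", "A", "B", "A"],
        "col2": ["{hey1}", " ", "{hello2}", "{hey2}", "{he1}",
                 "{hello3}", "set()", "set()", "{hey1}"]}
df = pd.DataFrame(data)

# Trim whitespace and the surrounding braces, then mark blanks and the
# junk text "set()" as missing (NaN).
cleaned = df["col2"].str.strip().str.strip("{}")
cleaned = cleaned.mask(cleaned.isin(["", "set()"]))

def joined_unique(values):
    # dict.fromkeys drops duplicates while keeping first-seen order.
    return ",".join(dict.fromkeys(values.dropna()))

# For every row, the comma-joined unique values of its col1 group.
fill = cleaned.groupby(df["col1"]).transform(joined_unique)

# Keep existing values; use the group-wide string only where col2 was missing.
df["col2"] = cleaned.fillna(fill)
print(df)
```

Because the fill string is built from the whole group, a missing cell can be filled from rows that come later in the table, which is what the second row (B) of the expected output requires.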
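Thanks for the solution. When I apply the same code to my real data, I get the error "list index out of range" on the line new_val = new_col2[i - 1].
Updated. Now, if the first element is an empty string, it is left empty.
I edited the question with a small change: I am getting set() values, which are garbage and which I want to get rid of. When the solution above encounters set(), it throws the error
'set' object has no attribute 'strip'
Can you help?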
import collections

data = {"col1": ["A", "B", "C", "A", "B", "C", "A", "B", "A"],
        "col2": ["", " ", "hello2", "hey2", "he1", "hello3", " ", "", ""]}
col1 = data["col1"]
col2 = data["col2"]

# Values seen so far for each key in col1
d = collections.defaultdict(list)
new_col2 = []
for i, (key, value) in enumerate(zip(col1, col2)):
    if not value.strip():
        # Missing value: join everything seen so far for this key
        new_val = ", ".join(d[key])
        if not new_val:
            # Nothing seen yet: reuse the previous row's value, if any
            if len(new_col2) >= 1:
                new_val = new_col2[i - 1]
            else:
                new_val = ""
        new_col2.append(new_val)
    else:
        d[key].append(value)
        new_col2.append(value)
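The "'set' object has no attribute 'strip'" error reported in the comments suggests that some cells in the real data hold actual Python set objects rather than strings. A minimal sketch of a cleaning step that could run before the loop, assuming that is the case: normalize is a hypothetical helper, and the sample list raw is made up for illustration.

```python
def normalize(value):
    """Return a clean string for one col2 cell (hypothetical helper)."""
    if isinstance(value, (set, frozenset)):
        # A real set object: an empty set becomes "", otherwise its
        # members are joined in sorted order.
        return ",".join(sorted(str(v) for v in value))
    text = str(value).strip()
    if text == "set()":
        # The literal text "set()" is treated as junk, i.e. missing.
        return ""
    # Drop the surrounding braces of values such as "{hey1}".
    return text.strip("{}").strip()

raw = ["{hey1}", " ", set(), "set()", {"he1"}]
cleaned = [normalize(v) for v in raw]
print(cleaned)  # ['hey1', '', '', '', 'he1']
```

After this step every cell is a plain string, so value.strip() in the loop no longer fails and junk cells are handled the same way as blanks.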