Pandas: how to compare and flag the closest string rows

Tags: pandas, group-by, pandas-groupby, transform

My goal is to detect whether consecutive rows are equal, based on the res column, using the transform() or apply() function.

My DataFrame:

import pandas as pd

data = [[111, 123, "aa", 0],
        [111, 124, "bb", 1],
        [111, 125, "bb", 2],
        [111, 126, "cc", 0],
        [111, 127, "dd", 1],
        [111, 128, "cc", 2],
        [222, 133, "xx", 1],
        [222, 134, "yy", 2],
        [222, 135, "zz", 0],
        [222, 136, "zz", 1]]
df = pd.DataFrame(data, columns=["uuid", "foo_id", "res", "num"])

What I am looking for:

uuid, foo_id, res, num, flag
111, 123, "aa", 0, 0
111, 124, "bb", 1, 1
111, 125, "bb", 2, 1
111, 126, "cc", 0, 0
111, 127, "dd", 1, 0
111, 128, "cc", 2, 0
222, 133, "xx", 1, 0
222, 134, "yy", 2, 0
222, 135, "zz", 0, 1
222, 136, "zz", 1, 1

What I tried:

df['flag'] = df.groupby('uuid')['res'].transform(lambda x: 1 if x == x.shift(-1) else 0)

This returns:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
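
A note on why this raises: x == x.shift(-1) compares element-wise and produces a boolean Series, and a conditional expression cannot reduce a multi-element Series to a single True or False. The following is a minimal sketch of the failure and of an element-wise alternative. The Series s is made-up illustration data, not the asker's frame.

```python
import pandas as pd

# Hypothetical three-row column standing in for one group's "res" values.
s = pd.Series(["aa", "bb", "bb"])

# Element-wise comparison with the next row: the result is a boolean Series.
mask = s == s.shift(-1)

# Using that Series as the condition of "1 if ... else 0" asks for one truth
# value for the whole Series, which pandas refuses to guess.
error_message = None
try:
    flag = 1 if mask else 0
except ValueError as exc:
    error_message = str(exc)

# Converting the mask itself keeps one flag per row.
flags = mask.astype(int)
print(error_message)
print(flags.tolist())
```

Casting the boolean mask with astype(int) keeps the comparison vectorized, which is the same idea the answer below builds on.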

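IIUC, you can try comparing each row with its neighbours within each group (an earlier version of this answer used Series.duplicated per group):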

f = lambda x: (x.eq(x.shift()) | x.eq(x.shift(-1))).astype(int)
df['flag'] = df.groupby('uuid')['res'].transform(f)
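
For reference, here is a self-contained sketch that applies the lambda above to the question's data and compares the result with the desired flag column. The expected list is copied from the question. The names matches_expected, res_by_uuid and flag_alt, and the lambda-free variant built on groupby(...).shift(), are additions for illustration and not part of the original answer.

```python
import pandas as pd

data = [[111, 123, "aa", 0],
        [111, 124, "bb", 1],
        [111, 125, "bb", 2],
        [111, 126, "cc", 0],
        [111, 127, "dd", 1],
        [111, 128, "cc", 2],
        [222, 133, "xx", 1],
        [222, 134, "yy", 2],
        [222, 135, "zz", 0],
        [222, 136, "zz", 1]]
df = pd.DataFrame(data, columns=["uuid", "foo_id", "res", "num"])

# Flag a row when its "res" equals the previous or the next "res"
# inside the same "uuid" group.
f = lambda x: (x.eq(x.shift()) | x.eq(x.shift(-1))).astype(int)
df["flag"] = df.groupby("uuid")["res"].transform(f)

# Desired flags, copied from the question.
expected = [0, 1, 1, 0, 0, 0, 0, 0, 1, 1]
matches_expected = df["flag"].tolist() == expected

# Assumed alternative without a lambda: shift within each group, then
# compare against the original column, which shares the same index.
res_by_uuid = df.groupby("uuid")["res"]
df["flag_alt"] = (
    df["res"].eq(res_by_uuid.shift()) | df["res"].eq(res_by_uuid.shift(-1))
).astype(int)

print(df)
print(matches_expected)
```

Because the shift happens per group, the last row of group 111 is never compared with the first row of group 222.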


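Duplicated within the group, or consecutive? For example, in a group [aa, bb, aa], should aa be flagged?

As @ALollz says: suppose a value is repeated twice in a group, what do you want returned in that case?

@anky You are right, this needs a solution without the duplicated() function.

Can you add more duplicates to a group and show what you want? For group 111, if you have two sets of duplicates, aa,aa,xx,xx,yy, do you want 1,2,1,2,1? Or 0,0,1,1,2?

I want to flag and compare the nearest rows. I have updated my DataFrame to make this clear.

@Adil Blanco I didn't realize you had updated the question. Can you replace f with f = lambda x: (x.eq(x.shift()) | x.eq(x.shift(-1))).astype(int)? I have edited my answer.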