Jaccard index between timestamps in Python
I have UNIX timestamps that I converted to strings, plus a given input of time strings, and I need to get the Jaccard index between the two. The data is stored as time intervals in two-dimensional arrays:
unix_converted = [['00:00:00', '00:00:03'], ['00:00:03', '00:00:06'], ['00:00:12', '00:00:15']]
input_timestamps = [['00:00:00', '00:00:03'], ['00:00:03', '00:00:06'], ['00:00:06', '00:00:09']]
def jaccard_index(s1, s2):
    raise NotImplementedError
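Do I have to convert these time intervals to datetime objects, or is there a simpler way? And how do I get the index itself?

You can take advantage of Python's native support for sets to compute your Jaccard index: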
unix_converted = [['00:00:00', '00:00:03'], ['00:00:03', '00:00:06'], ['00:00:12', '00:00:15']]
input_timestamps = [['00:00:00', '00:00:03'], ['00:00:03', '00:00:06'], ['00:00:06', '00:00:09']]

def jaccard_index(s1, s2):
    # Represent each interval as a single 'start-end' string so it can be hashed.
    s1 = {'-'.join(each) for each in s1}
    s2 = {'-'.join(each) for each in s2}
    return len(s1.intersection(s2)) / len(s1.union(s2))

print(jaccard_index(unix_converted, input_timestamps))  # outputs 0.5
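On the question of whether datetime objects are needed: if the strings are always zero-padded HH:MM:SS, they can be turned into plain seconds with a little string splitting. The following is only a minimal sketch under that assumption; the helper names to_seconds and to_intervals are made up for illustration and are not part of the question or the answer.

```python
def to_seconds(hms):
    """Convert an 'HH:MM:SS' string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(':'))
    return hours * 3600 + minutes * 60 + seconds


def to_intervals(pairs):
    """Turn [['HH:MM:SS', 'HH:MM:SS'], ...] into [(start_s, end_s), ...]."""
    return [(to_seconds(start), to_seconds(end)) for start, end in pairs]


unix_converted = [['00:00:00', '00:00:03'], ['00:00:03', '00:00:06'], ['00:00:12', '00:00:15']]
print(to_intervals(unix_converted))  # [(0, 3), (3, 6), (12, 15)]
```

Numeric tuples like these can be compared by duration, which the exact-match set approach cannot do.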
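Edit: I assume that by Jaccard index you mean the Jaccard similarity, i.e. the intersection over the union of the given lists. The code below computes the Jaccard similarity for the case where the timestamps are not necessarily measured in the same intervals. Its time complexity is O(len(s1)^2 + len(s2)^2).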
unix_converted = [(1, 3), (6, 10), (11, 12)]
input_timestamps = [(1, 3), (4, 7)]
def jaccard_index(s1, s2):
    def _set_sum(start1, end1, start2, end2):
        """ returns sum if there is an overlap and None otherwise """
        if start2 <= start1 <= end2:
            return start2, max(end1, end2)
        if start1 <= start2 <= end1:
            return start1, max(end1, end2)
        return None  # separate sets

    def _set_intersection(start1, end1, start2, end2):
        """ returns intersection if there is an overlap and None otherwise """
        if start2 <= start1 <= end2:
            return start1, min(end1, end2)
        if start1 <= start2 <= end1:
            return start2, min(end1, end2)
        return None  # separate sets

    # Calculate A u B
    sum = []
    for x, y in s1 + s2:
        matched_elem = False
        for i, (x2, y2) in enumerate(sum):
            set_sum = _set_sum(x, y, x2, y2)
            if set_sum is not None:
                sum[i] = set_sum
                matched_elem = True
                break
        if not matched_elem:
            sum.append((x, y))

    # join overlapping timestamps
    element_is_joined = [False for _ in sum]
    for i, (x, y) in enumerate(sum):
        if not element_is_joined[i]:
            for j, (x2, y2) in enumerate(sum):
                if element_is_joined[j] or i == j:
                    continue
                set_sum = _set_sum(x, y, x2, y2)
                if set_sum is not None:  # overlap is found
                    sum[j] = set_sum
                    element_is_joined[i] = True
                    break

    sum_ = 0
    for (x, y), is_joined in zip(sum, element_is_joined):
        if not is_joined:
            sum_ += y - x
    if sum_ == 0:
        raise ValueError('Division by zero')

    # calculate A ^ B
    intersection = 0
    for x, y in s1:
        for x2, y2 in s2:
            set_intersection = _set_intersection(x, y, x2, y2)
            if set_intersection is not None:
                intersection += set_intersection[1] - set_intersection[0]

    return intersection / sum_

print(jaccard_index(unix_converted, input_timestamps)) #outputs 0.333333
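For longer lists, the same quantity can probably be obtained more cheaply by sorting and merging the intervals first. What follows is a sketch of that alternative, not the answer's own method: the function names merge_intervals, covered_length and interval_jaccard are hypothetical, and it relies on the identity |A ∩ B| = |A| + |B| - |A ∪ B| for the total covered lengths. Sorting dominates, so the cost should be roughly O(n log n) for n intervals in total.

```python
def merge_intervals(intervals):
    """Return sorted, non-overlapping intervals covering the same points."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps or touches the previous interval: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def covered_length(intervals):
    """Total length covered by the intervals, counting overlaps once."""
    return sum(end - start for start, end in merge_intervals(intervals))


def interval_jaccard(s1, s2):
    """Length of the intersection divided by length of the union."""
    union = covered_length(list(s1) + list(s2))
    if union == 0:
        raise ValueError('both inputs cover zero length')
    # |A n B| = |A| + |B| - |A u B|
    intersection = covered_length(s1) + covered_length(s2) - union
    return intersection / union


unix_converted = [(1, 3), (6, 10), (11, 12)]
input_timestamps = [(1, 3), (4, 7)]
print(interval_jaccard(unix_converted, input_timestamps))  # about 0.3333, i.e. 3 / 9
```

On the answer's sample data the covered lengths are 7 and 5, the union covers 9, so the intersection is 3 and the ratio matches the 0.333333 printed above.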
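Please consider explaining what the Jaccard index is, and actually providing an attempt at solving the problem.
I assume you want to compute the Jaccard index between two lists, i.e. the arguments of the function jaccard_index(s1, s2). Those are expected to be unix_converted and input_timestamps, eh?
Please also provide the expected output.
Note: this calculation is valid if the timestamps in s1 and s2 have the same intervals.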
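To illustrate why the note about identical intervals matters, here is a small sketch of the set-based calculation applied to aligned and to shifted intervals. The function name set_jaccard and the sample data are invented for this illustration; the point is only that exact-match sets see no overlap between intervals that differ in either endpoint.

```python
def set_jaccard(s1, s2):
    """Jaccard similarity where an interval only counts if both endpoints match."""
    a = {tuple(pair) for pair in s1}
    b = {tuple(pair) for pair in s2}
    return len(a & b) / len(a | b)


# Same 3-second grid: one interval is shared out of three distinct ones.
aligned_a = [['00:00:00', '00:00:03'], ['00:00:03', '00:00:06']]
aligned_b = [['00:00:00', '00:00:03'], ['00:00:06', '00:00:09']]
print(set_jaccard(aligned_a, aligned_b))  # 1 / 3

# Intervals that overlap for two seconds but do not match exactly.
shifted_a = [['00:00:00', '00:00:04']]
shifted_b = [['00:00:02', '00:00:06']]
print(set_jaccard(shifted_a, shifted_b))  # 0.0
```

When the intervals do not share a common grid, a length-based calculation such as the second answer's is likely the better fit.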