合并具有重叠时间范围的时间范围元组列表
我有一个元组列表,其中每个元组是(start-time, end-time)
。 我正在尝试合并所有重叠的时间范围,并返回一个不同时间范围的列表。 例如
[(1, 5), (2, 4), (3, 6)] ---> [(1,6)] [(1, 3), (2, 4), (5, 8)] ---> [(1, 4), (5,8)]
这是我如何实现它。
# Algorithm # initialranges: [(a,b), (c,d), (e,f), ...] # First we sort each tuple then whole list. # This will ensure that a<b, c<d, e<f ... and a < c < e ... # BUT the order of b, d, f ... is still random # Now we have only 3 possibilities #================================================ # b<c<d: a-------b Ans: [(a,b),(c,d)] # c---d # c<=b<d: a-------b Ans: [(a,d)] # c---d # c<d<b: a-------b Ans: [(a,b)] # c---d #================================================ def mergeoverlapping(initialranges): i = sorted(set([tuple(sorted(x)) for x in initialranges])) # initialize final ranges to [(a,b)] f = [i[0]] for c, d in i[1:]: a, b = f[-1] if c<=b<d: f[-1] = a, d elif b<c<d: f.append((c,d)) else: # else case included for clarity. Since # we already sorted the tuples and the list # only remaining possibility is c<d<b # in which case we can silently pass pass return f
我想弄清楚是否
- 是在一些python模块的内置函数,可以更有效地做到这一点? 要么
- 有没有更完美的方式来实现相同的目标?
你的帮助表示赞赏。 谢谢!
Pythonic:提高效率的几种方法:
- 消除
set()
结构,因为algorithm应该在主循环期间删除重复。 - 如果您只需要迭代结果,请使用
yield
来生成值。 - 减less中间对象的构造,例如:将
tuple()
调用移动到生成最终值的位置,使您不必构造并丢弃额外的元组,并重新使用saved
的列表来存储当前时间范围以进行比较。
码:
def merge(times): saved = list(times[0]) for st, en in sorted([sorted(t) for t in times]): if st <= saved[1]: saved[1] = max(saved[1], en) else: yield tuple(saved) saved[0] = st saved[1] = en yield tuple(saved) data = [ [(1, 5), (2, 4), (3, 6)], [(1, 3), (2, 4), (5, 8)] ] for times in data: print list(merge(times))
然后列出元组,然后列出,如果t1.right> = t2.left =>合并并重新启动与新的列表,…
– >
def f(l, sort = True): if sort: sl = sorted(tuple(sorted(i)) for i in l) else: sl = l if len(sl) > 1: if sl[0][1] >= sl[1][0]: sl[0] = (sl[0][0], sl[1][1]) del sl[1] if len(sl) < len(l): return f(sl, False) return sl
sorting部分:使用标准sorting,它已经将元组的比例正确化了。
sorted_tuples = sorted(initial_ranges)
合并部分。 它也消除了重复的范围,所以不需要set
。 假设你有current_tuple
和next_tuple
。
c_start, c_end = current_tuple n_start, n_end = next_tuple if n_start <= c_end: merged_tuple = min(c_start, n_start), max(c_end, n_end)
我希望这个逻辑足够清楚。
要查看下一个元组,可以使用对已sorted tuples
索引访问; 无论如何,这是一个完全已知的序列。
对所有边界进行sorting,然后采用边界开始后跟随边界的所有对。
def mergeOverlapping(initialranges): def allBoundaries(): for r in initialranges: yield r[0], True yield r[1], False def getBoundaries(boundaries): yield boundaries[0][0] for i in range(1, len(boundaries) - 1): if not boundaries[i][1] and boundaries[i + 1][1]: yield boundaries[i][0] yield boundaries[i + 1][0] yield boundaries[-1][0] return getBoundaries(sorted(allBoundaries()))
嗯,不是那么漂亮,但至less写起来很有趣!
编辑:多年后,upvote,我意识到我的代码是错的! 这是新的版本,只是为了好玩:
def mergeOverlapping(initialRanges): def allBoundaries(): for r in initialRanges: yield r[0], -1 yield r[1], 1 def getBoundaries(boundaries): openrange = 0 for value, boundary in boundaries: if not openrange: yield value openrange += boundary if not openrange: yield value def outputAsRanges(b): while b: yield (b.next(), b.next()) return outputAsRanges(getBoundaries(sorted(allBoundaries())))
基本上,我用-1或1来标记边界,然后按值sorting,只有当大括号和大括号之间的平衡为零时才输出。
晚了,但可能会帮助别人寻找这个。 我有一个类似的问题,但与字典。 给定一个时间范围的列表,我想find重叠,并尽可能合并它们。 @samplebias答案的一点修改让我这样:
合并function:
def merge_range(ranges: list, start_key: str, end_key: str): ranges = sorted(ranges, key=lambda x: x[start_key]) saved = dict(ranges[0]) for range_set in sorted(ranges, key=lambda x: x[start_key]): if range_set[start_key] <= saved[end_key]: saved[end_key] = max(saved[end_key], range_set[end_key]) else: yield dict(saved) saved[start_key] = range_set[start_key] saved[end_key] = range_set[end_key] yield dict(saved)
数据:
data = [ {'start_time': '09:00:00', 'end_time': '11:30:00'}, {'start_time': '15:00:00', 'end_time': '15:30:00'}, {'start_time': '11:00:00', 'end_time': '14:30:00'}, {'start_time': '09:30:00', 'end_time': '14:00:00'} ]
执行:
print(list(merge_range(ranges=data, start_key='start_time', end_key='end_time')))
输出:
[ {'start_time': '09:00:00', 'end_time': '14:30:00'}, {'start_time': '15:00:00', 'end_time': '15:30:00'} ]