Dictionaries of dictionaries merge

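I need to merge multiple dictionaries. Here is what I have, for instance:

    dict1 = {1:{"a":{A}}, 2:{"b":{B}}}
    dict2 = {2:{"c":{C}}, 3:{"d":{D}}

A, B, C and D are leaves of the tree, like {"info1":"value", "info2":"value2"}.

There is an unknown level (depth) of dictionaries; it could be {2:{"c":{"z":{"y":{C}}}}}.

In my case it represents a directory/file structure, with nodes being docs and leaves being files.

I want to merge them to obtain:

    dict3 = {1:{"a":{A}}, 2:{"b":{B},"c":{C}}, 3:{"d":{D}}}

I'm not sure how I could do that easily with Python.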

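This is actually quite tricky, particularly if you want a useful error message when things are inconsistent while correctly accepting duplicate but consistent entries (something no other answer here does...).

Assuming you don't have huge numbers of entries, a recursive function is easiest: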

    def merge(a, b, path=None):
        "merges b into a"
        if path is None: path = []
        for key in b:
            if key in a:
                if isinstance(a[key], dict) and isinstance(b[key], dict):
                    merge(a[key], b[key], path + [str(key)])
                elif a[key] == b[key]:
                    pass # same leaf value
                else:
                    raise Exception('Conflict at %s' % '.'.join(path + [str(key)]))
            else:
                a[key] = b[key]
        return a

    # works
    print(merge({1:{"a":"A"},2:{"b":"B"}}, {2:{"c":"C"},3:{"d":"D"}}))
    # has conflict
    merge({1:{"a":"A"},2:{"b":"B"}}, {1:{"a":"A"},2:{"b":"C"}})

Note that this mutates a: the contents of b are added to a (which is also returned). If you want to keep a, you could call it like merge(dict(a), b).
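One caveat worth sketching: dict(a) makes only a shallow copy, so nested dicts inside a are still shared with the copy and still get mutated. The snippet below repeats the merge function so it runs on its own, and uses copy.deepcopy as one possible way to leave a fully untouched. The variable names are only for illustration.

```python
import copy

def merge(a, b, path=None):
    """Merge b into a (same logic as the function above)."""
    if path is None:
        path = []
    for key in b:
        if key in a:
            if isinstance(a[key], dict) and isinstance(b[key], dict):
                merge(a[key], b[key], path + [str(key)])
            elif a[key] == b[key]:
                pass  # same leaf value
            else:
                raise Exception('Conflict at %s' % '.'.join(path + [str(key)]))
        else:
            a[key] = b[key]
    return a

a = {2: {"b": "B"}}
b = {2: {"c": "C"}}

shallow = merge(dict(a), b)
# Only the top level was copied: a[2] and shallow[2] are the same object,
# so the nested dict inside a picked up the new key.
shallow_leaked = "c" in a[2]

a = {2: {"b": "B"}}
deep = merge(copy.deepcopy(a), b)
# With a deep copy, a is left exactly as it was.
deep_leaked = "c" in a[2]
```

Whether the extra copying is acceptable depends on how large the dictionaries are.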

agf pointed out (below) that you may have more than two dicts, in which case you can use:

    reduce(merge, [dict1, dict2, dict3...])

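where everything will be added to dict1.

[Note: I edited my initial answer to mutate the first argument; that makes the "reduce" easier to explain.]

PS: in Python 3, you will also need from functools import reduce.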
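A small, self-contained sketch of the reduce usage under Python 3. The three sample dicts are made up for illustration, and merge is repeated from above so the snippet runs on its own:

```python
from functools import reduce

def merge(a, b, path=None):
    """Merge b into a (same logic as the function above)."""
    if path is None:
        path = []
    for key in b:
        if key in a:
            if isinstance(a[key], dict) and isinstance(b[key], dict):
                merge(a[key], b[key], path + [str(key)])
            elif a[key] == b[key]:
                pass  # same leaf value
            else:
                raise Exception('Conflict at %s' % '.'.join(path + [str(key)]))
        else:
            a[key] = b[key]
    return a

dict1 = {1: {"a": "A"}}
dict2 = {2: {"b": "B"}}
dict3 = {2: {"c": "C"}, 3: {"d": "D"}}

# reduce calls merge(dict1, dict2), then merge(<that result>, dict3)
merged = reduce(merge, [dict1, dict2, dict3])

# The accumulator is dict1 itself, so the result and dict1 are one object.
same_object = merged is dict1
```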

Here's an easy way to do it using generators:

    def mergedicts(dict1, dict2):
        for k in set(dict1.keys()).union(dict2.keys()):
            if k in dict1 and k in dict2:
                if isinstance(dict1[k], dict) and isinstance(dict2[k], dict):
                    yield (k, dict(mergedicts(dict1[k], dict2[k])))
                else:
                    # If one of the values is not a dict, you can't continue merging it.
                    # Value from second dict overrides one in first and we move on.
                    yield (k, dict2[k])
                    # Alternatively, replace this with exception raiser to alert you of value conflicts
            elif k in dict1:
                yield (k, dict1[k])
            else:
                yield (k, dict2[k])

    dict1 = {1:{"a":"A"},2:{"b":"B"}}
    dict2 = {2:{"c":"C"},3:{"d":"D"}}

    print(dict(mergedicts(dict1,dict2)))

This prints:

    {1: {'a': 'A'}, 2: {'c': 'C', 'b': 'B'}, 3: {'d': 'D'}}

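One issue with this question is that the values of the dict can be arbitrarily complex pieces of data. Based on these and other answers, I came up with this code: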

    class YamlReaderError(Exception):
        pass

    def data_merge(a, b):
        """merges b into a and return merged result

        NOTE: tuples and arbitrary objects are not handled as it is totally ambiguous what should happen"""
        key = None
        # ## debug output
        # sys.stderr.write("DEBUG: %s to %s\n" %(b,a))
        try:
            if a is None or isinstance(a, (str, int, float)):
                # border case for first run or if a is a primitive
                # (on Python 2, also check unicode and long)
                a = b
            elif isinstance(a, list):
                # lists can be only appended
                if isinstance(b, list):
                    # merge lists
                    a.extend(b)
                else:
                    # append to list
                    a.append(b)
            elif isinstance(a, dict):
                # dicts must be merged
                if isinstance(b, dict):
                    for key in b:
                        if key in a:
                            a[key] = data_merge(a[key], b[key])
                        else:
                            a[key] = b[key]
                else:
                    raise YamlReaderError('Cannot merge non-dict "%s" into dict "%s"' % (b, a))
            else:
                raise YamlReaderError('NOT IMPLEMENTED "%s" into "%s"' % (b, a))
        except TypeError as e:
            raise YamlReaderError('TypeError "%s" in key "%s" when merging "%s" into "%s"' % (e, key, b, a))
        return a

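My use case is merging YAML files, where I only have to deal with a subset of the possible data types. Hence I can ignore tuples and other objects. For me, a sensible merge logic means:

- replace scalars
- append lists
- merge dicts by adding missing keys and updating existing keys

Everything else, and anything unforeseen, results in an error.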
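The three rules can be sketched compactly for Python 3 as follows. This is only an illustration under stated assumptions: the name merge_values and the sample configuration data are hypothetical, and a plain TypeError stands in for the custom exception.

```python
def merge_values(a, b):
    """Merge b into a: replace scalars, append to lists, merge dicts."""
    if isinstance(a, dict):
        if not isinstance(b, dict):
            raise TypeError('Cannot merge non-dict %r into dict %r' % (b, a))
        for key, value in b.items():
            # existing keys are merged recursively, missing keys are added
            a[key] = merge_values(a[key], value) if key in a else value
        return a
    if isinstance(a, list):
        # lists are appended to, never replaced
        if isinstance(b, list):
            a.extend(b)
        else:
            a.append(b)
        return a
    if a is None or isinstance(a, (str, int, float)):
        return b  # scalars are simply replaced
    raise TypeError('Merging into %r is not supported' % (a,))

base = {"name": "app", "ports": [80], "db": {"host": "localhost"}}
override = {"name": "app-prod", "ports": [443], "db": {"port": 5432}}
result = merge_values(base, override)
```

Here the scalar name is replaced, the ports list grows, and the db dict gains a key while keeping its existing one.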

If you have an unknown level of dictionaries, then I would suggest a recursive function:

    def combineDicts(dictionary1, dictionary2):
        output = {}
        for item, value in dictionary1.items():  # in Python 2, use .iteritems()
            if item in dictionary2:
                if isinstance(dictionary2[item], dict):
                    # note: pop() removes the key from dictionary2, mutating the caller's dict
                    output[item] = combineDicts(value, dictionary2.pop(item))
            else:
                output[item] = value
        # whatever is left in dictionary2 (new keys and non-dict values) wins
        for item, value in dictionary2.items():
            output[item] = value
        return output

Based on @andrew cooke's answer. This version handles nested lists of dicts and also offers an option to update values:

    def merge(a, b, path=None, update=True):
        "http://stackoverflow.com/questions/7204805/python-dictionaries-of-dictionaries-merge"
        "merges b into a"
        if path is None: path = []
        for key in b:
            if key in a:
                if isinstance(a[key], dict) and isinstance(b[key], dict):
                    # pass update along so the flag also applies to nested dicts
                    merge(a[key], b[key], path + [str(key)], update=update)
                elif a[key] == b[key]:
                    pass # same leaf value
                elif isinstance(a[key], list) and isinstance(b[key], list):
                    # merges element by element; assumes b's list is no longer than a's
                    for idx, val in enumerate(b[key]):
                        a[key][idx] = merge(a[key][idx], b[key][idx], path + [str(key), str(idx)], update=update)
                elif update:
                    a[key] = b[key]
                else:
                    raise Exception('Conflict at %s' % '.'.join(path + [str(key)]))
            else:
                a[key] = b[key]
        return a

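Dictionaries of dictionaries merge

As this is the canonical question (in spite of certain non-generalities), I'm providing the canonical Pythonic approach to solving this issue.

Simplest case: "leaves are nested dicts that end in empty dicts":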

    d1 = {'a': {1: {'foo': {}}, 2: {}}}
    d2 = {'a': {1: {}, 2: {'bar': {}}}}
    d3 = {'b': {3: {'baz': {}}}}
    d4 = {'a': {1: {'quux': {}}}}

This is the simplest case for recursion, and I would recommend two naive approaches:

    def rec_merge1(d1, d2):
        '''return new merged dict of dicts'''
        for k, v in d1.items(): # in Python 2, use .iteritems()!
            if k in d2:
                d2[k] = rec_merge1(v, d2[k])
        d3 = d1.copy()
        d3.update(d2)
        return d3

    def rec_merge2(d1, d2):
        '''update first dict with second recursively'''
        for k, v in d1.items(): # in Python 2, use .iteritems()!
            if k in d2:
                d2[k] = rec_merge2(v, d2[k])
        d1.update(d2)
        return d1

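I believe I would prefer the second to the first, but keep in mind that the original state of the first dict would then have to be rebuilt from its origin. Here's the usage: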

    >>> from functools import reduce # only required for Python 3.
    >>> reduce(rec_merge1, (d1, d2, d3, d4))
    {'a': {1: {'quux': {}, 'foo': {}}, 2: {'bar': {}}}, 'b': {3: {'baz': {}}}}
    >>> reduce(rec_merge2, (d1, d2, d3, d4))
    {'a': {1: {'quux': {}, 'foo': {}}, 2: {'bar': {}}}, 'b': {3: {'baz': {}}}}

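Complex case: "leaves are of any other type"

So if they end in dicts, it's a simple case of merging the final empty dicts. If not, it's not so trivial. If they are strings, how do you merge them? Sets can be updated similarly, so we could give them that treatment, but we lose the order in which they were merged. So does order matter?

In lieu of more information, the simplest approach is to give them the standard update treatment if both values are not dicts: i.e. the second dict's value overwrites the first's, even if the second dict's value is None and the first's value is a dict with a lot of info.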

    d1 = {'a': {1: 'foo', 2: None}}
    d2 = {'a': {1: None, 2: 'bar'}}
    d3 = {'b': {3: 'baz'}}
    d4 = {'a': {1: 'quux'}}

    try:
        from collections.abc import MutableMapping  # Python 3.3+
    except ImportError:
        from collections import MutableMapping  # Python 2

    def rec_merge(d1, d2):
        '''
        Update two dicts of dicts recursively,
        if either mapping has leaves that are non-dicts,
        the second's leaf overwrites the first's.
        '''
        for k, v in d1.items(): # in Python 2, use .iteritems()!
            if k in d2:
                # this next check is the only difference!
                if all(isinstance(e, MutableMapping) for e in (v, d2[k])):
                    d2[k] = rec_merge(v, d2[k])
                # we could further check types and merge as appropriate here.
        d3 = d1.copy()
        d3.update(d2)
        return d3

And now

    from functools import reduce
    reduce(rec_merge, (d1, d2, d3, d4))

returns

    {'a': {1: 'quux', 2: 'bar'}, 'b': {3: 'baz'}}

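Application to the original question:

I've had to remove the curly braces around the letters and put them in single quotes for this to be legitimate Python (otherwise they would be set literals in Python 2.7+), and append a missing brace:

    dict1 = {1:{"a":'A'}, 2:{"b":'B'}}
    dict2 = {2:{"c":'C'}, 3:{"d":'D'}}

and rec_merge(dict1, dict2) now returns:

    {1: {'a': 'A'}, 2: {'c': 'C', 'b': 'B'}, 3: {'d': 'D'}}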

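This matches the desired outcome of the original question (after changing, e.g., {A} to 'A').

This version of the function accepts N dictionaries, and only dictionaries: passing anything else raises a TypeError. The merge itself accounts for key conflicts. Instead of overwriting data with values from a dictionary further down the merge chain, it creates a set of the values and appends to that; no data is lost.

It might not be the most efficient solution on the page, but it is the most thorough, and you will not lose any information when you merge your 2 to N dicts.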

    from functools import reduce  # only required for Python 3

    def merge_dicts(*dicts):
        if not all(isinstance(d, dict) for d in dicts):
            raise TypeError("Object in *dicts not of type dict")
        if len(dicts) < 2:
            raise ValueError("Requires 2 or more dict objects")

        def merge(a, b):
            for d in set(a.keys()).union(b.keys()):
                if d in a and d in b:
                    if type(a[d]) == type(b[d]):
                        if not isinstance(a[d], dict):
                            # collect the distinct values; a single value stays a scalar
                            ret = list({a[d], b[d]})
                            if len(ret) == 1:
                                yield (d, ret[0])
                            else:
                                yield (d, sorted(ret))
                        else:
                            yield (d, dict(merge(a[d], b[d])))
                    else:
                        raise TypeError("Conflicting key:value type assignment")
                elif d in a:
                    yield (d, a[d])
                elif d in b:
                    yield (d, b[d])
                else:
                    raise KeyError

        return reduce(lambda x, y: dict(merge(x, y)), dicts[1:], dicts[0])

    print(merge_dicts({1:1,2:{1:2}},{1:2,2:{3:1}},{4:4}))

Output: {1: [1, 2], 2: {1: 2, 3: 1}, 4: 4}

This should help in merging all items from dict2 into dict1:

    for item in dict2:
        if item in dict1:
            for leaf in dict2[item]:
                dict1[item][leaf] = dict2[item][leaf]
        else:
            dict1[item] = dict2[item]

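Please test it and tell us whether this is what you wanted.

EDIT:

The solution above merges only one level, but correctly solves the example given by the OP. To merge multiple levels, recursion should be used.

This simple recursive procedure will merge one dictionary into another while overriding conflicting keys: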

    #!/usr/bin/env python2.7

    def merge_dicts(dict1, dict2):
        """ Recursively merges dict2 into dict1 """
        if not isinstance(dict1, dict) or not isinstance(dict2, dict):
            return dict2
        for k in dict2:
            if k in dict1:
                dict1[k] = merge_dicts(dict1[k], dict2[k])
            else:
                dict1[k] = dict2[k]
        return dict1

    print (merge_dicts({1:{"a":"A"}, 2:{"b":"B"}}, {2:{"c":"C"}, 3:{"d":"D"}}))
    print (merge_dicts({1:{"a":"A"}, 2:{"b":"B"}}, {1:{"a":"A"}, 2:{"b":"C"}}))

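Output:

    {1: {'a': 'A'}, 2: {'c': 'C', 'b': 'B'}, 3: {'d': 'D'}}
    {1: {'a': 'A'}, 2: {'b': 'C'}}

There's a slight problem with andrew cooke's answer: in some cases it modifies the second argument b when you modify the returned dict. Specifically, it's because of this line: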

    if key in a:
        ...
    else:
        a[key] = b[key]

If b[key] is a dict, it is simply assigned to a, meaning any subsequent modification to that dict affects both a and b.

    a={}
    b={'1':{'2':'b'}}
    c={'1':{'3':'c'}}
    merge(merge(a,b), c) # {'1': {'3': 'c', '2': 'b'}}
    a # {'1': {'3': 'c', '2': 'b'}} (as expected)
    b # {'1': {'3': 'c', '2': 'b'}} <----
    c # {'1': {'3': 'c'}} (unmodified)

To solve this, the line would have to be replaced with this:

    if isinstance(b[key], dict):
        a[key] = clone_dict(b[key])
    else:
        a[key] = b[key]

Where clone_dict is:

    def clone_dict(obj):
        clone = {}
        for key, value in obj.items():  # in Python 2, use .iteritems()
            if isinstance(value, dict):
                clone[key] = clone_dict(value)
            else:
                clone[key] = value
        return clone

Still, this obviously doesn't account for list, set and other types, but I hope it illustrates the pitfalls of merging dicts.
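For containers beyond dicts, the standard library's copy.deepcopy is one option that could be used instead of a hand-written clone function. A minimal sketch with made-up sample data, showing that later changes to the copy do not reach the source:

```python
import copy

b = {"1": {"2": "b", "tags": ["x"]}}
a = {}

for key in b:
    # deepcopy recursively copies nested dicts, lists, sets and most other objects
    a[key] = copy.deepcopy(b[key])

# Modify the copy, including the nested list
a["1"]["3"] = "c"
a["1"]["tags"].append("y")
```

After this, b still holds only its original content, at the cost of copying every nested value.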

And for completeness' sake, here is my version, to which you can pass multiple dicts:

    from functools import reduce  # only required for Python 3

    def merge_dicts(*args):
        def clone_dict(obj):
            clone = {}
            for key, value in obj.items():  # in Python 2, use .iteritems()
                if isinstance(value, dict):
                    clone[key] = clone_dict(value)
                else:
                    clone[key] = value
            return clone

        def merge(a, b, path=None):
            if path is None:
                path = []
            for key in b:
                if key in a:
                    if isinstance(a[key], dict) and isinstance(b[key], dict):
                        merge(a[key], b[key], path + [str(key)])
                    elif a[key] == b[key]:
                        pass
                    else:
                        raise Exception('Conflict at `{path}\''.format(path='.'.join(path + [str(key)])))
                else:
                    # clone nested dicts so the result never shares them with the inputs
                    if isinstance(b[key], dict):
                        a[key] = clone_dict(b[key])
                    else:
                        a[key] = b[key]
            return a
        return reduce(merge, args, {})

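I had two dictionaries (a and b), each of which could contain any number of nested dictionaries. I wanted to merge them recursively, with b taking precedence over a.

Considering the nested dictionaries as trees, what I wanted was:

- To update a so that every path to every leaf in b is represented in a
- To overwrite subtrees of a if a leaf is found in the corresponding path in b
- To maintain the invariant that all of b's leaf nodes remain leaves.

The existing answers were a little complicated for my taste and left some details on the shelf. I hacked together the following, which passes the unit tests for my data set.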

    def merge_map(a, b):
        if not isinstance(a, dict) or not isinstance(b, dict):
            return b

        for key in b.keys():
            a[key] = merge_map(a[key], b[key]) if key in a else b[key]
        return a

Example (formatted for clarity):

    a = {
        1 : {'a': 'red',
             'b': {'blue': 'fish', 'yellow': 'bear' },
             'c': { 'orange': 'dog'},
        },
        2 : {'d': 'green'},
        3: 'e'
    }

    b = {
        1 : {'b': 'white'},
        2 : {'d': 'black'},
        3: 'e'
    }

    >>> merge_map(a, b)
    {1: {'a': 'red',
         'b': 'white',
         'c': {'orange': 'dog'},},
     2: {'d': 'black'},
     3: 'e'}

The paths in b that needed to be maintained were:

1 -> 'b' -> 'white'
2 -> 'd' -> 'black'
3 -> 'e'

a had these unique and non-conflicting paths:

1 -> 'a' -> 'red'
1 -> 'c' -> 'orange' -> 'dog'

so they are still represented in the merged map.

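Of course, the code will depend on your rules for resolving merge conflicts. Here's a version that takes an arbitrary number of arguments and merges them recursively to an arbitrary depth, without using any object mutation. It uses the following rules to resolve merge conflicts:

- dictionaries take precedence over non-dict values ({"foo": {...}} takes precedence over {"foo": "bar"})
- later arguments take precedence over earlier arguments (if you merge {"a": 1}, {"a": 2} and {"a": 3} in order, the result will be {"a": 3})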

    try:
        from collections.abc import Mapping  # Python 3.3+
    except ImportError:
        try:
            from collections import Mapping  # Python 2
        except ImportError:
            Mapping = dict

    def merge_dicts(*dicts):
        """
        Return a new dictionary that is the result of merging the arguments together.
        In case of conflicts, later arguments take precedence over earlier arguments.
        """
        updated = {}
        # grab all keys
        keys = set()
        for d in dicts:
            keys = keys.union(set(d))

        for key in keys:
            values = [d[key] for d in dicts if key in d]
            # which ones are mapping types? (aka dict)
            maps = [value for value in values if isinstance(value, Mapping)]
            if maps:
                # if we have any mapping types, call recursively to merge them
                updated[key] = merge_dicts(*maps)
            else:
                # otherwise, just grab the last value we have, since later arguments
                # take precedence over earlier arguments
                updated[key] = values[-1]
        return updated
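To see both rules at work, here is a small self-contained sketch. It repeats a condensed form of the function for Python 3 so the snippet runs on its own; the three input dicts are made up for illustration.

```python
from collections.abc import Mapping

def merge_dicts(*dicts):
    """Condensed form of the function above; returns a new dict."""
    updated = {}
    keys = set()
    for d in dicts:
        keys |= set(d)
    for key in keys:
        values = [d[key] for d in dicts if key in d]
        maps = [v for v in values if isinstance(v, Mapping)]
        # mappings win over non-mappings; otherwise the last value wins
        updated[key] = merge_dicts(*maps) if maps else values[-1]
    return updated

first = {"foo": {"x": 1}, "a": 1}
second = {"foo": "bar", "a": 2}
third = {"foo": {"y": 2}, "a": 3}

merged = merge_dicts(first, second, third)
```

The string "bar" is dropped because dicts exist for the same key, "a" takes the last value, and none of the inputs is modified.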

I've been testing your solutions and decided to use this one in my project:

    from functools import reduce  # only required for Python 3

    def mergedicts(dict1, dict2, conflict, no_conflict):
        for k in set(dict1.keys()).union(dict2.keys()):
            if k in dict1 and k in dict2:
                yield (k, conflict(dict1[k], dict2[k]))
            elif k in dict1:
                yield (k, no_conflict(dict1[k]))
            else:
                yield (k, no_conflict(dict2[k]))

    dict1 = {1:{"a":"A"}, 2:{"b":"B"}}
    dict2 = {2:{"c":"C"}, 3:{"d":"D"}}

    #this helper function allows for recursion and the use of reduce
    def f2(x, y):
        return dict(mergedicts(x, y, f2, lambda x: x))

    print(dict(mergedicts(dict1, dict2, f2, lambda x: x)))
    print(dict(reduce(f2, [dict1, dict2])))

Passing functions as parameters is the key to extending jterrace's solution so that it behaves like all the other recursive solutions.

The easiest way I can think of is:

    #!/usr/bin/python

    from copy import deepcopy

    def dict_merge(a, b):
        if not isinstance(b, dict):
            return b
        result = deepcopy(a)
        for k, v in b.items():  # in Python 2, use .iteritems()
            if k in result and isinstance(result[k], dict):
                result[k] = dict_merge(result[k], v)
            else:
                result[k] = deepcopy(v)
        return result

    a = {1:{"a":'A'}, 2:{"b":'B'}}
    b = {2:{"c":'C'}, 3:{"d":'D'}}

    print(dict_merge(a,b))

Output:

    {1: {'a': 'A'}, 2: {'c': 'C', 'b': 'B'}, 3: {'d': 'D'}}

Since dict views support set operations, I was able to greatly simplify jterrace's answer.

    def merge(dict1, dict2):
        for k in dict1.keys() - dict2.keys():
            yield (k, dict1[k])

        for k in dict2.keys() - dict1.keys():
            yield (k, dict2[k])

        for k in dict1.keys() & dict2.keys():
            yield (k, dict(merge(dict1[k], dict2[k])))

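Any attempt to combine a dict with a non-dict (technically, an object with a 'keys' method and an object without one) raises an AttributeError. This includes both the initial call to the function and the recursive calls. This is exactly what I wanted, so I left it. You could easily catch the AttributeError thrown by the recursive call and then yield any value you please.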
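A minimal sketch of that last suggestion, for Python 3. The name merge_tolerant and the choice to let the second dict's value win are assumptions made for this illustration, not part of the answer above.

```python
def merge_tolerant(dict1, dict2):
    """Like the generator above, but tolerates non-dict values on shared keys."""
    for k in dict1.keys() - dict2.keys():
        yield (k, dict1[k])

    for k in dict2.keys() - dict1.keys():
        yield (k, dict2[k])

    for k in dict1.keys() & dict2.keys():
        try:
            # the generator runs while dict() consumes it, so a missing
            # keys() method surfaces here as AttributeError
            merged = dict(merge_tolerant(dict1[k], dict2[k]))
        except AttributeError:
            # assumption for this sketch: the value from dict2 wins
            merged = dict2[k]
        yield (k, merged)

d1 = {1: {"a": "A"}, 2: "x"}
d2 = {1: {"a": "B", "b": "C"}, 2: {"y": 1}}
result = dict(merge_tolerant(d1, d2))
```

The initial call still raises AttributeError if either top-level argument is not a dict, as in the original.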

I have another, slightly different solution:

    def deepMerge(d1, d2, inconflict = lambda v1,v2 : v2) :
        ''' merge d2 into d1. using inconflict function to resolve the leaf conflicts '''
        for k in d2:
            if k in d1 :
                if isinstance(d1[k], dict) and isinstance(d2[k], dict) :
                    deepMerge(d1[k], d2[k], inconflict)
                elif d1[k] != d2[k] :
                    d1[k] = inconflict(d1[k], d2[k])
            else :
                d1[k] = d2[k]
        return d1

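By default it resolves conflicts in favor of the values from the second dict, but you can easily override this; with some witchery you may even be able to throw exceptions out of it. :)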
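One way this could look: a conflict handler that raises instead of choosing a value. The handler name raise_on_conflict is hypothetical, and deepMerge is repeated so the snippet runs on its own.

```python
def deepMerge(d1, d2, inconflict=lambda v1, v2: v2):
    """Merge d2 into d1, using inconflict to resolve leaf conflicts."""
    for k in d2:
        if k in d1:
            if isinstance(d1[k], dict) and isinstance(d2[k], dict):
                deepMerge(d1[k], d2[k], inconflict)
            elif d1[k] != d2[k]:
                d1[k] = inconflict(d1[k], d2[k])
        else:
            d1[k] = d2[k]
    return d1

def raise_on_conflict(v1, v2):
    # hypothetical handler: refuse to pick a winner
    raise ValueError('Conflicting leaf values: %r vs %r' % (v1, v2))

# No conflicting leaves, so the handler is never called
merged = deepMerge({1: {"a": "A"}}, {1: {"b": "B"}}, raise_on_conflict)

try:
    deepMerge({1: {"a": "A"}}, {1: {"a": "Z"}}, raise_on_conflict)
    conflict_detected = False
except ValueError:
    conflict_detected = True
```

A handler such as lambda v1, v2: [v1, v2] would instead keep both values.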
