Calculate the difference in keys contained in two Python dictionaries
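Suppose I have two Python dictionaries, dictA and dictB. I need to find out whether there are any keys that are present in dictB but not in dictA. What is the fastest way to go about it?
Should I convert the dictionary keys into a set and then go about it?
Interested in knowing your thoughts…
Thanks for your responses.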
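Apologies for not stating my question properly. My scenario is this: I have a dictA that can be the same as dictB, or may be missing some keys compared to dictB, or else the values of some keys may differ and have to be set to those of the dictA keys.
The problem is that the dictionaries have no standard layout and can contain values that are dicts of dicts.
Say
dictA={'key1':a, 'key2':b, 'key3':{'key11':cc, 'key12':dd}, 'key4':{'key111':{....}}}
dictB={'key1':a, 'key2':newb, 'key3':{'key11':cc, 'key12':newdd, 'key13':ee}.......
So the value of 'key2' has to be reset to the new value and 'key13' has to be added inside the dict. The key value does not have a fixed format. It can be a simple value, a dict, or a dict of dicts.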
You can use set operations on the keys:
diff = set(dictb.keys()) - set(dicta.keys())
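On Python 3 the key views returned by dict.keys() already behave like sets, so the explicit set() conversions become optional. A minimal sketch; the example dictionaries below are made up for illustration and are not from the question:

```python
# Example dictionaries (illustrative values only).
dicta = {"a": 1, "b": 2}
dictb = {"a": 1, "b": 3, "c": 4}

# In Python 3, dict.keys() returns a set-like view, so set operators work directly.
missing_from_a = dictb.keys() - dicta.keys()   # keys only in dictb
common = dictb.keys() & dicta.keys()           # keys present in both

print(missing_from_a)   # {'c'}
print(sorted(common))   # ['a', 'b']
```

Both results are ordinary set objects, so they can be iterated or compared like any other set.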
Here is a class to find all the possibilities: what was added, what was removed, which key-value pairs are the same, and which key-value pairs have changed.
class DictDiffer(object):
    """
    Calculate the difference between two dictionaries as:
    (1) items added
    (2) items removed
    (3) keys same in both but changed values
    (4) keys same in both and unchanged values
    """
    def __init__(self, current_dict, past_dict):
        self.current_dict, self.past_dict = current_dict, past_dict
        self.set_current, self.set_past = set(current_dict.keys()), set(past_dict.keys())
        self.intersect = self.set_current.intersection(self.set_past)
    def added(self):
        return self.set_current - self.intersect
    def removed(self):
        return self.set_past - self.intersect
    def changed(self):
        return set(o for o in self.intersect if self.past_dict[o] != self.current_dict[o])
    def unchanged(self):
        return set(o for o in self.intersect if self.past_dict[o] == self.current_dict[o])
Here is some sample output (Python 2):
>>> a = {'a': 1, 'b': 1, 'c': 0}
>>> b = {'a': 1, 'b': 2, 'd': 0}
>>> d = DictDiffer(b, a)
>>> print "Added:", d.added()
Added: set(['d'])
>>> print "Removed:", d.removed()
Removed: set(['c'])
>>> print "Changed:", d.changed()
Changed: set(['b'])
>>> print "Unchanged:", d.unchanged()
Unchanged: set(['a'])
Available as a github repo: https://github.com/hughdbrown/dictdiffer
In case you want the difference recursively, I have written a package for Python: https://github.com/seperman/deepdiff
Installation
Install from PyPi:
pip install deepdiff
Example usage
Importing
>>> from deepdiff import DeepDiff
>>> from pprint import pprint
>>> from __future__ import print_function # In case running on Python 2
Same object returns empty
>>> t1 = {1:1, 2:2, 3:3}
>>> t2 = t1
>>> print(DeepDiff(t1, t2))
{}
Type of an item has changed
>>> t1 = {1:1, 2:2, 3:3}
>>> t2 = {1:1, 2:"2", 3:3}
>>> pprint(DeepDiff(t1, t2), indent=2)
{ 'type_changes': { 'root[2]': { 'newtype': <class 'str'>,
                                 'newvalue': '2',
                                 'oldtype': <class 'int'>,
                                 'oldvalue': 2}}}
Value of an item has changed
>>> t1 = {1:1, 2:2, 3:3}
>>> t2 = {1:1, 2:4, 3:3}
>>> pprint(DeepDiff(t1, t2), indent=2)
{'values_changed': {'root[2]': {'newvalue': 4, 'oldvalue': 2}}}
Item added and/or removed
>>> t1 = {1:1, 2:2, 3:3, 4:4}
>>> t2 = {1:1, 2:4, 3:3, 5:5, 6:6}
>>> ddiff = DeepDiff(t1, t2)
>>> pprint (ddiff)
{'dic_item_added': ['root[5]', 'root[6]'],
 'dic_item_removed': ['root[4]'],
 'values_changed': {'root[2]': {'newvalue': 4, 'oldvalue': 2}}}
String difference
>>> t1 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":"world"}}
>>> t2 = {1:1, 2:4, 3:3, 4:{"a":"hello", "b":"world!"}}
>>> ddiff = DeepDiff(t1, t2)
>>> pprint (ddiff, indent = 2)
{ 'values_changed': { 'root[2]': {'newvalue': 4, 'oldvalue': 2},
                      "root[4]['b']": { 'newvalue': 'world!',
                                        'oldvalue': 'world'}}}
String difference 2
>>> t1 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":"world!\nGoodbye!\n1\n2\nEnd"}}
>>> t2 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":"world\n1\n2\nEnd"}}
>>> ddiff = DeepDiff(t1, t2)
>>> pprint (ddiff, indent = 2)
{ 'values_changed': { "root[4]['b']": { 'diff': '--- \n'
                                                '+++ \n'
                                                '@@ -1,5 +1,4 @@\n'
                                                '-world!\n'
                                                '-Goodbye!\n'
                                                '+world\n'
                                                ' 1\n'
                                                ' 2\n'
                                                ' End',
                                        'newvalue': 'world\n1\n2\nEnd',
                                        'oldvalue': 'world!\n'
                                                    'Goodbye!\n'
                                                    '1\n'
                                                    '2\n'
                                                    'End'}}}
>>>
>>> print (ddiff['values_changed']["root[4]['b']"]["diff"])
--- 
+++ 
@@ -1,5 +1,4 @@
-world!
-Goodbye!
+world
 1
 2
 End
Type change
>>> t1 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":[1, 2, 3]}}
>>> t2 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":"world\n\n\nEnd"}}
>>> ddiff = DeepDiff(t1, t2)
>>> pprint (ddiff, indent = 2)
{ 'type_changes': { "root[4]['b']": { 'newtype': <class 'str'>,
                                      'newvalue': 'world\n\n\nEnd',
                                      'oldtype': <class 'list'>,
                                      'oldvalue': [1, 2, 3]}}}
List difference
>>> t1 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":[1, 2, 3, 4]}}
>>> t2 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":[1, 2]}}
>>> ddiff = DeepDiff(t1, t2)
>>> pprint (ddiff, indent = 2)
{'iterable_item_removed': {"root[4]['b'][2]": 3, "root[4]['b'][3]": 4}}
List difference 2:
>>> t1 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":[1, 2, 3]}}
>>> t2 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":[1, 3, 2, 3]}}
>>> ddiff = DeepDiff(t1, t2)
>>> pprint (ddiff, indent = 2)
{ 'iterable_item_added': {"root[4]['b'][3]": 3},
  'values_changed': { "root[4]['b'][1]": {'newvalue': 3, 'oldvalue': 2},
                      "root[4]['b'][2]": {'newvalue': 2, 'oldvalue': 3}}}
List difference ignoring order or duplicates (with the same dictionaries as above):
>>> t1 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":[1, 2, 3]}}
>>> t2 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":[1, 3, 2, 3]}}
>>> ddiff = DeepDiff(t1, t2, ignore_order=True)
>>> print (ddiff)
{}
List that contains a dictionary:
>>> t1 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":[1, 2, {1:1, 2:2}]}}
>>> t2 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":[1, 2, {1:3}]}}
>>> ddiff = DeepDiff(t1, t2)
>>> pprint (ddiff, indent = 2)
{ 'dic_item_removed': ["root[4]['b'][2][2]"],
  'values_changed': {"root[4]['b'][2][1]": {'newvalue': 3, 'oldvalue': 1}}}
Sets:
>>> t1 = {1, 2, 8}
>>> t2 = {1, 2, 3, 5}
>>> ddiff = DeepDiff(t1, t2)
>>> pprint (DeepDiff(t1, t2))
{'set_item_added': ['root[3]', 'root[5]'], 'set_item_removed': ['root[8]']}
Named tuples:
>>> from collections import namedtuple
>>> Point = namedtuple('Point', ['x', 'y'])
>>> t1 = Point(x=11, y=22)
>>> t2 = Point(x=11, y=23)
>>> pprint (DeepDiff(t1, t2))
{'values_changed': {'root.y': {'newvalue': 23, 'oldvalue': 22}}}
Custom objects:
>>> class ClassA(object):
...     a = 1
...     def __init__(self, b):
...         self.b = b
...
>>> t1 = ClassA(1)
>>> t2 = ClassA(2)
>>>
>>> pprint(DeepDiff(t1, t2))
{'values_changed': {'root.b': {'newvalue': 2, 'oldvalue': 1}}}
Object attribute added:
>>> t2.c = "new attribute"
>>> pprint(DeepDiff(t1, t2))
{'attribute_added': ['root.c'],
 'values_changed': {'root.b': {'newvalue': 2, 'oldvalue': 1}}}
Not sure whether it is "fast" or not, but normally one can do this:
dicta = {"a":1,"b":2,"c":3,"d":4}
dictb = {"a":1,"d":2}
for key in dicta.keys():
    if key not in dictb:
        print(key)
As Alex Martelli wrote, if you simply want to check whether any key in B is not in A,
any(True for k in dictB if k not in dictA)
would be the way to go.
To find the keys that are missing:
diff = set(dictB)-set(dictA) #sets
C:\Dokumente und Einstellungen\thc>python -m timeit -s "dictA = dict(zip(range(1000),range(1000))); dictB = dict(zip(range(0,2000,2),range(1000)))" "diff=set(dictB)-set(dictA)"
10000 loops, best of 3: 107 usec per loop
diff = [ k for k in dictB if k not in dictA ] #lc
C:\Dokumente und Einstellungen\thc>python -m timeit -s "dictA = dict(zip(range(1000),range(1000))); dictB = dict(zip(range(0,2000,2),range(1000)))" "diff=[ k for k in dictB if k not in dictA ]"
10000 loops, best of 3: 95.9 usec per loop
So those two solutions are pretty much the same speed.
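If you mean exactly what you say (that you only need to find out IF there are any keys in B that are not in A, not WHICH ONES those might be), the fastest way should be: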
if any(True for k in dictB if k not in dictA): ...
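If you actually need to find out WHICH keys, if any, are in B and not in A, and not just IF there are such keys, then the existing answers are quite appropriate (but please be more precise in future questions if that is indeed what you mean ;-).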
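The two cases above ("is there any such key?" versus "which keys?") can be sketched side by side. The function names and sample dictionaries are hypothetical, chosen only for illustration:

```python
def has_extra_keys(dict_b, dict_a):
    """Return True as soon as one key of dict_b is found missing from dict_a."""
    # any() stops consuming the generator at the first True value,
    # so the scan ends early when an extra key shows up.
    return any(k not in dict_a for k in dict_b)


def extra_keys(dict_b, dict_a):
    """Return the set of all keys that are in dict_b but not in dict_a."""
    return set(dict_b) - set(dict_a)


# Illustrative data, not taken from the question.
dict_a = {"x": 1, "y": 2}
dict_b = {"x": 1, "y": 2, "z": 3}

print(has_extra_keys(dict_b, dict_a))  # True
print(extra_keys(dict_b, dict_a))      # {'z'}
```

The first function answers only the yes/no question and can stop early; the second always visits every key because it has to build the full result.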
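Use set():
set(dictA.keys()).intersection(dictB.keys())
Note that this yields the keys the two dictionaries share; for the keys in dictB that are missing from dictA, use set(dictB.keys()).difference(dictA.keys()).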
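There is another question on stackoverflow about this topic, and I have to admit that a simple solution is explained there: Python's datadiff library helps print the difference between two dictionaries.
Here is a way that will work, allows for keys that evaluate to False, and still uses a generator expression to bail out early where possible. It is not exceptionally pretty, though.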
any(map(lambda x: True, (k for k in b if k not in a)))
EDIT:
THC4k posted a reply to my comment on another answer. Here is a better, prettier way to do the above:
any(True for k in b if k not in a)
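Not sure how that never occurred to me…
This is an old question and asks a little less than what I needed, so this answer actually solves more than the question asks. The answers to this question helped me solve the following:
- (asked) Record the differences between two dictionaries
- Merge the diffs from #1 into a base dictionary
- (asked) Merge the differences between two dictionaries (treat dictionary #2 as if it were a diff dictionary)
- Try to detect item movements as well as changes
- (asked) Do all of this recursively
All of this combined with JSON makes for pretty powerful config storage support.
The solution (also on github):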
from collections import OrderedDict
from pprint import pprint

class izipDestinationMatching(object):
    __slots__ = ("attr", "value", "index")

    def __init__(self, attr, value, index):
        self.attr, self.value, self.index = attr, value, index

    def __repr__(self):
        return "izip_destination_matching: found match by '%s' = '%s' @ %d" % (self.attr, self.value, self.index)

def izip_destination(a, b, attrs, addMarker=True):
    """
    Returns zipped lists, but final size is equal to b with (if shorter) a padded with nulls
    Additionally also tries to find item reallocations by searching child dicts (if they are dicts) for attribute, listed in attrs)
    When addMarker == False (patching), final size will be the longer of a, b
    """
    for idx, item in enumerate(b):
        try:
            attr = next((x for x in attrs if x in item), None)  # See if the item has any of the ID attributes
            match, matchIdx = next(((orgItm, orgIdx) for orgIdx, orgItm in enumerate(a) if attr in orgItm and orgItm[attr] == item[attr]), (None, None)) if attr else (None, None)
            if match and matchIdx != idx and addMarker:
                item[izipDestinationMatching] = izipDestinationMatching(attr, item[attr], matchIdx)
        except:
            match = None
        yield (match if match else a[idx] if len(a) > idx else None), item
    if not addMarker and len(a) > len(b):
        for item in a[len(b) - len(a):]:
            yield item, item

def dictdiff(a, b, searchAttrs=[]):
    """
    returns a dictionary which represents difference from a to b
    the return dict is as short as possible:
      equal items are removed
      added / changed items are listed
      removed items are listed with value=None
    Also processes list values where the resulting list size will match that of b.
    It can also search said list items (that are dicts) for identity values to detect changed positions.
      In case such identity value is found, it is kept so that it can be re-found during the merge phase
    @param a: original dict
    @param b: new dict
    @param searchAttrs: list of strings (keys to search for in sub-dicts)
    @return: dict / list / whatever input is
    """
    if not (isinstance(a, dict) and isinstance(b, dict)):
        if isinstance(a, list) and isinstance(b, list):
            return [dictdiff(v1, v2, searchAttrs) for v1, v2 in izip_destination(a, b, searchAttrs)]
        return b
    res = OrderedDict()
    if izipDestinationMatching in b:
        keepKey = b[izipDestinationMatching].attr
        del b[izipDestinationMatching]
    else:
        keepKey = izipDestinationMatching
    # set(a) | set(b) is the union of the keys (same result as set(a.keys() + b.keys()) on Python 2)
    for key in sorted(set(a) | set(b)):
        v1 = a.get(key, None)
        v2 = b.get(key, None)
        if keepKey == key or v1 != v2:
            res[key] = dictdiff(v1, v2, searchAttrs)
    if len(res) <= 1:
        res = dict(res)  # This is only here for pretty print (OrderedDict doesn't pprint nicely)
    return res

def dictmerge(a, b, searchAttrs=[]):
    """
    Returns a dictionary which merges differences recorded in b to base dictionary a
    Also processes list values where the resulting list size will match that of a
    It can also search said list items (that are dicts) for identity values to detect changed positions
    @param a: original dict
    @param b: diff dict to patch into a
    @param searchAttrs: list of strings (keys to search for in sub-dicts)
    @return: dict / list / whatever input is
    """
    if not (isinstance(a, dict) and isinstance(b, dict)):
        if isinstance(a, list) and isinstance(b, list):
            return [dictmerge(v1, v2, searchAttrs) for v1, v2 in izip_destination(a, b, searchAttrs, False)]
        return b
    res = OrderedDict()
    for key in sorted(set(a) | set(b)):
        v1 = a.get(key, None)
        v2 = b.get(key, None)
        #print "processing", key, v1, v2, key not in b, dictmerge(v1, v2)
        if v2 is not None:
            res[key] = dictmerge(v1, v2, searchAttrs)
        elif key not in b:
            res[key] = v1
    if len(res) <= 1:
        res = dict(res)  # This is only here for pretty print (OrderedDict doesn't pprint nicely)
    return res
What about the standard way (comparing the FULL object)?
PyDev -> new PyDev Module -> Module: unittest
import unittest

class Test(unittest.TestCase):
    def testName(self):
        obj1 = {1:1, 2:2}
        obj2 = {1:1, 2:2}
        self.maxDiff = None  # sometimes is useful
        self.assertDictEqual(obj1, obj2)

if __name__ == "__main__":
    #import sys;sys.argv = ['', 'Test.testName']
    unittest.main()
If on Python ≥ 2.7:
# update different values in dictB
# I would assume only dictA should be updated,
# but the question specifies otherwise
for k in dictA.viewkeys() & dictB.viewkeys():
    if dictA[k] != dictB[k]:
        dictB[k]= dictA[k]

# add missing keys to dictA
dictA.update( (k,dictB[k]) for k in dictB.viewkeys() - dictA.viewkeys() )
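For the nested dictionaries described in the question, a recursive variant might look like the following Python 3 sketch. It assumes the second dictionary is the authoritative one (changed values are overwritten, missing keys are added, keys that exist only in the target are left alone); the function name sync_nested and the string values are hypothetical stand-ins for the question's placeholders.

```python
import copy


def sync_nested(target, source):
    """Update target in place so that every key in source is present in
    target with source's value. Keys that exist only in target are kept."""
    for key, new_value in source.items():
        old_value = target.get(key)
        if isinstance(old_value, dict) and isinstance(new_value, dict):
            # Both sides are dicts: descend instead of replacing wholesale.
            sync_nested(old_value, new_value)
        elif key not in target or old_value != new_value:
            # Missing key or changed value: take a copy of the source value.
            target[key] = copy.deepcopy(new_value)
    return target


# Values are placeholder strings standing in for a, b, cc, ... in the question.
dict_a = {"key1": "a", "key2": "b", "key3": {"key11": "cc", "key12": "dd"}}
dict_b = {"key1": "a", "key2": "newb",
          "key3": {"key11": "cc", "key12": "newdd", "key13": "ee"}}

sync_nested(dict_a, dict_b)
print(dict_a == dict_b)  # True
```

The deepcopy keeps the two dictionaries from sharing nested objects afterwards; drop it if sharing is acceptable.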
Here is a solution for deep-comparing the keys of 2 dictionaries:
def compareDictKeys(dict1, dict2):
    if type(dict1) != dict or type(dict2) != dict:
        return False
    keys1, keys2 = dict1.keys(), dict2.keys()
    diff = set(keys1) - set(keys2) or set(keys2) - set(keys1)
    if not diff:
        for key in keys1:
            if (type(dict1[key]) == dict or type(dict2[key]) == dict) and not compareDictKeys(dict1[key], dict2[key]):
                diff = True
                break
    return not diff
Here is a solution that can compare more than two dicts:
def diff_dict(dicts, default=None):
    diff_dict = {}
    # add 'list()' around 'd.keys()' for python 3 compatibility
    for k in set(sum([d.keys() for d in dicts], [])):
        # we can just use "values = [d.get(k, default) ..." below if
        # we don't care that d1[k]=default and d2[k]=missing will
        # be treated as equal
        if any(k not in d for d in dicts):
            diff_dict[k] = [d.get(k, default) for d in dicts]
        else:
            values = [d[k] for d in dicts]
            if any(v != values[0] for v in values):
                diff_dict[k] = values
    return diff_dict
Usage example:
import matplotlib.pyplot as plt
diff_dict([plt.rcParams, plt.rcParamsDefault, plt.matplotlib.rcParamsOrig])
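A Python 3 flavoured sketch of the same idea, shown on small plain dictionaries so it runs without matplotlib. The name diff_many and the sample data are hypothetical, and the key union is built with set().union() rather than sum():

```python
def diff_many(dicts, default=None):
    """Map each key that is missing somewhere or has differing values
    to the list of its values across all dicts (default where missing)."""
    result = {}
    # set().union(*dicts) collects the keys of every dict in one set.
    for key in set().union(*dicts):
        values = [d.get(key, default) for d in dicts]
        missing_somewhere = any(key not in d for d in dicts)
        if missing_somewhere or any(v != values[0] for v in values):
            result[key] = values
    return result


# Illustrative data only.
d1 = {"a": 1, "b": 2}
d2 = {"a": 1, "b": 3}
d3 = {"a": 1, "c": 4}

print(diff_many([d1, d2, d3]))
```

Here 'a' is dropped because all three dictionaries agree on it, while 'b' and 'c' are reported with None filling the positions where the key is absent.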
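Not sure if this is still relevant, but I came across this problem; in my situation I just needed to return a dictionary of the changes for all nested dictionaries and so on. I could not find a good solution out there, but I did end up writing a simple function to do this. Hope this helps,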
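The function this answer refers to is not reproduced here. As a rough illustration of the idea only (not the author's code), a recursive helper that returns just the changed parts of nested dictionaries could be sketched as follows; the name changed_items is hypothetical, and the sketch assumes that keys removed in the new dictionary do not need to be reported:

```python
def changed_items(old, new):
    """Return a nested dict holding only the keys of new whose values
    are added or differ from old. Removed keys are not reported."""
    changes = {}
    for key, new_value in new.items():
        if key not in old:
            changes[key] = new_value                 # added key
        elif isinstance(old[key], dict) and isinstance(new_value, dict):
            nested = changed_items(old[key], new_value)
            if nested:                               # keep only non-empty sub-diffs
                changes[key] = nested
        elif old[key] != new_value:
            changes[key] = new_value                 # changed value
    return changes


# Illustrative data only.
old = {"a": 1, "b": {"c": 2, "d": 3}}
new = {"a": 1, "b": {"c": 2, "d": 4}, "e": 5}

print(changed_items(old, new))  # {'b': {'d': 4}, 'e': 5}
```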
If you want a built-in solution for a full comparison of arbitrary dict structures, @Maxx's answer is a good start.
import unittest
test = unittest.TestCase()
test.assertEqual(dictA, dictB)
Based on ghostdog74's answer,
dicta = {"a":1,"d":2}
dictb = {"a":5,"d":2}
for value in dicta.values():
    if value not in dictb.values():
        print(value)
will print the values of dicta that differ.
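Below I have created two dictionaries. I need to return the key and value differences between them. I am stuck here and not sure which direction is correct. I need to know how to obtain the key-value differences. I want to first check whether they are the same and, if not, print the key-value differences. I do not want to use deepdiff. I am not sure how to do the comparison.
num_list = [1,2]
val_list = [0,1]
dict1 = dict(zip(num_list,val_list))
print(dict1)
num_list2= [1,2]
val_list2 = [0,6]
dict2 = dict(zip(num_list2,val_list2))
print(dict2)
if dict1 == dict2
Current output:
{1: 0, 2: 1}
{1: 0, 2: 6}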
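One possible way to continue from the equality check, without deepdiff, is to split the comparison into keys found only on one side and shared keys whose values differ. This is a Python 3 sketch; the helper name key_value_diffs is hypothetical:

```python
dict1 = dict(zip([1, 2], [0, 1]))
dict2 = dict(zip([1, 2], [0, 6]))


def key_value_diffs(first, second):
    """Return (only_in_first, only_in_second, changed), where changed maps
    each shared key with differing values to a (first, second) tuple."""
    only_first = {k: first[k] for k in first.keys() - second.keys()}
    only_second = {k: second[k] for k in second.keys() - first.keys()}
    changed = {k: (first[k], second[k])
               for k in first.keys() & second.keys()
               if first[k] != second[k]}
    return only_first, only_second, changed


if dict1 == dict2:
    print("dictionaries are equal")
else:
    only_first, only_second, changed = key_value_diffs(dict1, dict2)
    print(only_first, only_second, changed)  # {} {} {2: (1, 6)}
```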
My recipe for the symmetric difference between two dictionaries:
def find_dict_diffs(dict1, dict2):
    unequal_keys = []
    unequal_keys.extend(set(dict1.keys()).symmetric_difference(set(dict2.keys())))
    for k in dict1.keys():
        if dict1.get(k, 'N\A') != dict2.get(k, 'N\A'):
            unequal_keys.append(k)
    if unequal_keys:
        print 'param', 'dict1\t', 'dict2'
        for k in set(unequal_keys):
            print str(k)+'\t'+dict1.get(k, 'N\A')+'\t '+dict2.get(k, 'N\A')
    else:
        print 'Dicts are equal'

dict1 = {1:'a', 2:'b', 3:'c', 4:'d', 5:'e'}
dict2 = {1:'b', 2:'a', 3:'c', 4:'d', 6:'f'}
find_dict_diffs(dict1, dict2)
And the result is:
param   dict1   dict2
1       a       b
2       b       a
5       e       N\A
6       N\A     f
As mentioned in other answers, unittest produces some nice output for comparing dicts, but in this example we do not want to have to build a whole test first.
Scraping the unittest source, it looks like you can get a fair solution with just this:
import difflib
import pprint

def diff_dicts(a, b):
    if a == b:
        return ''
    return '\n'.join(
        difflib.ndiff(pprint.pformat(a, width=30).splitlines(),
                      pprint.pformat(b, width=30).splitlines())
    )
so
dictA = dict(zip(range(7), map(ord, 'python')))
dictB = {0: 112, 1: 'spam', 2: [1,2,3], 3: 104, 4: 111}
print(diff_dicts(dictA, dictB))
results in:
  {0: 112,
-  1: 121,
-  2: 116,
+  1: 'spam',
+  2: [1, 2, 3],
   3: 104,
-  4: 111,
?        ^
+  4: 111}
?        ^
-  5: 110}
Where:
- '-' indicates key/values in the first but not the second dict
- '+' indicates key/values in the second but not the first dict
As in unittest, the only caveat is that the final mapping can be reported as a difference because of the trailing comma/bracket.
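Try this to find the intersection, that is, the keys that are in both dictionaries; if you want the keys not found in the second dictionary, just use not in…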
intersect = filter(lambda x, dictB=dictB.keys(): x in dictB, dictA.keys())
@Maxx has an excellent answer; use the unittest tools provided by Python:
import unittest

class Test(unittest.TestCase):
    def runTest(self):
        pass

    def testDict(self, d1, d2, maxDiff=None):
        self.maxDiff = maxDiff
        self.assertDictEqual(d1, d2)
Then, anywhere in your code, you can call:
try:
    Test().testDict(dict1, dict2)
except Exception as e:
    print(e)
The resulting output looks like the output of diff, pretty-printing the dictionaries with + or - prepended to each line that differs.