
Python Data Model Study Notes

Notes from studying the Python 2.7 data model

Basic customization

object.__hash__(self)

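Every time a dict is accessed by key (for example, when adding a new key/value pair or getting a value), it calls the key's __hash__ to get the hash value used to look the key up. A test: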

class HashTest(object):
    def __init__(self, name):
        self.name = name
    def __hash__(self):
        print "__hash__ was called here!"
        return hash(self.name)

test1 = HashTest("hehe1")
test2 = HashTest("hehe2")

dict_test = dict()
dict_test[test1] = 0 # prints: __hash__ was called here!
dict_test[test2] = 1 # prints the same message again

# get the values
dict_test[test1]
# __hash__ was called here!
# 0

dict_test[test2]
# __hash__ was called here!
# 1
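The printouts above can also be checked mechanically. A minimal sketch, assuming a hypothetical CountingKey class that counts __hash__ calls instead of printing them; it uses print() with a single argument so it runs under both Python 2.7 and Python 3. Each store, [] lookup, in test and get() hashes the key exactly once.

```python
class CountingKey(object):
    """Hypothetical key type that counts how often __hash__ runs."""
    calls = 0  # class-level counter shared by all instances

    def __init__(self, name):
        self.name = name

    def __hash__(self):
        CountingKey.calls += 1
        return hash(self.name)


k1 = CountingKey("hehe1")
k2 = CountingKey("hehe2")
d = {}

d[k1] = 0          # store: one __hash__ call
d[k2] = 1          # store: one more
after_stores = CountingKey.calls

d[k1]              # lookup with []
k2 in d            # membership test
d.get(k1)          # get()
after_lookups = CountingKey.calls

print(after_stores)   # 2
print(after_lookups)  # 5
```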

Updating a value

test2.name = "hehe1" # make test2.name equal to test1.name, so test1 and test2 should now hash to the same value
dict_test[test2] = 3 # again prints: __hash__ was called here!

In theory, dict_test should now hold only one entry under test1's hash value, with the value 3. That is not what actually happens:

print dict_test
# output: {<__main__.HashTest object at 0x10fcec250>: 3, <__main__.HashTest object at 0x10fcec650>: 0, <__main__.HashTest object at 0x10fcec250>: 1}
# test2 (at 0x10fcec250) now appears twice, once with value 3 and once with value 1
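The duplicated test2 entry can be reproduced with assertions. A sketch, assuming CPython and a fresh dict (8 slots): it reuses the post's HashTest without the print, but with hypothetical small-integer names, because hash(n) == n for small ints and the slot positions are then reproducible. With string names, whether the mutated key happens to probe into its own old entry depends on the hash values, which Python 3 randomizes per process. Since no __eq__ is defined, test1 == test2 is an identity test and is False, so a second entry for test2 is created.

```python
class HashTest(object):
    def __init__(self, name):
        self.name = name

    def __hash__(self):
        return hash(self.name)   # no __eq__: equality falls back to identity


test1 = HashTest(1)   # hypothetical int names: hash(1) == 1
test2 = HashTest(5)   # hash(5) == 5, a different slot of an 8-slot table
dict_test = {}
dict_test[test1] = 0
dict_test[test2] = 1

test2.name = 1        # test2 now hashes like test1
dict_test[test2] = 3  # test1 == test2 is False, so a new entry is added

print(len(dict_test))                           # 3
print(sum(1 for k in dict_test if k is test2))  # 2
print(dict_test[test1])                         # 0
```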

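The reason is that when it stores or looks up a key, dict also compares the key already in the slot with the new key through __eq__ (when __eq__ is not defined, as I understand it, this compares the keys' ids, i.e. their memory addresses). The CPython source (lookdict in Objects/dictobject.c) and an analysis, excerpted from here: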

static dictentry *
lookdict(dictobject *mp, PyObject *key, register long hash)
{
    register size_t i;
    register size_t perturb;
    register dictentry *freeslot;
    register size_t mask = (size_t)mp->ma_mask;
    dictentry *ep0 = mp->ma_table;
    register dictentry *ep;
    register int cmp;
    PyObject *startkey;

    i = (size_t)hash & mask;
    ep = &ep0[i];
    if (ep->me_key == NULL || ep->me_key == key)
        return ep;

    if (ep->me_key == dummy)
        freeslot = ep;
    else {
        if (ep->me_hash == hash) {
            startkey = ep->me_key;
            cmp = PyObject_RichCompareBool(startkey, key, Py_EQ); // compare the key values
            if (cmp < 0)
                return NULL;
            if (ep0 == mp->ma_table && ep->me_key == startkey) {
                if (cmp > 0) // the existing slot is returned only if the keys are equal; otherwise a new slot is searched for
                    return ep;
            }
            else {
                /* The compare did major nasty stuff to the
                 * dict:  start over.
                 * XXX A clever adversary could prevent this
                 * XXX from terminating.
                 */
                return lookdict(mp, key, hash);
            }
        }
        freeslot = NULL;
    }
    ...
}
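To make the control flow of lookdict concrete, here is a toy hash table in pure Python. ToyDict, Plain and WithEq are hypothetical names, and the table is a simplified sketch rather than CPython's implementation: fixed size, linear probing instead of CPython's perturbation scheme, no deletion and no resizing. As in the C code, the identity test comes first and == is only called when the stored hash equals the lookup hash. Running the post's scenario through it with both kinds of key shows the two outcomes.

```python
class ToyDict(object):
    """Hypothetical toy hash table mirroring the checks in lookdict().

    Simplified on purpose (assumptions, not CPython's code): fixed size,
    linear probing instead of CPython's perturbation, no deletion or resize.
    """

    def __init__(self, size=8):
        self.mask = size - 1            # size must be a power of two
        self.table = [None] * size      # each slot: None or [hash, key, value]

    def _lookup(self, key, h):
        """Index of key's slot, or of the empty slot where it would go."""
        i = h & self.mask
        for _ in range(len(self.table)):
            slot = self.table[i]
            if slot is None:
                return i                # empty slot: key is absent
            if slot[1] is key:
                return i                # identity, like ep->me_key == key
            if slot[0] == h and slot[1] == key:
                return i                # same hash, and == says equal
            i = (i + 1) & self.mask     # otherwise probe the next slot
        raise RuntimeError("toy table is full")

    def __setitem__(self, key, value):
        h = hash(key)
        i = self._lookup(key, h)
        if self.table[i] is None:
            self.table[i] = [h, key, value]   # new entry
        else:
            self.table[i][2] = value          # existing entry keeps its key

    def __getitem__(self, key):
        i = self._lookup(key, hash(key))
        if self.table[i] is None:
            raise KeyError(key)
        return self.table[i][2]

    def __len__(self):
        return sum(1 for slot in self.table if slot is not None)


class Plain(object):
    """Hypothetical key with only __hash__, like the first HashTest."""
    def __init__(self, name):
        self.name = name

    def __hash__(self):
        return hash(self.name)


class WithEq(Plain):
    """Hypothetical key that also compares names, like the second HashTest."""
    def __eq__(self, other):
        return self.name == other.name

    def __hash__(self):             # restated: Python 3 would set it to None
        return hash(self.name)


results = []
for cls in (Plain, WithEq):
    k1, k2 = cls(1), cls(5)         # small ints hash to themselves: slots 1, 5
    toy = ToyDict()
    toy[k1] = 0
    toy[k2] = 1
    k2.name = 1                     # k2 now hashes like k1
    toy[k2] = 3
    results.append((cls.__name__, len(toy), toy[k1]))

print(results)   # [('Plain', 3, 0), ('WithEq', 2, 3)]
```

Because this toy probes linearly, which slot a mutated key reaches next can differ from a real dict; only the order of the checks is meant to match lookdict.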

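So Python compares keys through a rich comparison, PyObject_RichCompareBool, and only when the key in the slot is not the same object as the new key and its stored hash equals the new key's hash. __lt__, __le__, __eq__, __ne__, __gt__ and __ge__ are the rich comparison methods.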

If the rich comparison method for an operator is defined, the comparison (==, !=, >=, <=, <, >) calls it directly, and __cmp__ is not called.

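If no rich comparison method is defined, the comparison calls __cmp__. __cmp__ returns a value >0, <0 or 0, meaning greater than, less than or equal respectively, and the comparison expression turns that into its final bool result (True or False).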

If neither __cmp__ nor a rich comparison method is defined, the objects are compared by address (so == is an identity test).

__cmp__ can also be called directly through the built-in cmp(a, b).
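The dispatch order described above can be observed directly. A minimal sketch with two hypothetical classes: Both defines __eq__ plus a __cmp__ that always claims equality, and CmpOnly defines only __cmp__. The file runs under Python 2.7 and Python 3, but the __cmp__ fallback branch only executes on Python 2, since Python 3 removed __cmp__ and the built-in cmp().

```python
from __future__ import print_function
import sys


class Both(object):
    """Hypothetical class with a rich comparison and a __cmp__."""
    def __init__(self, name, log):
        self.name = name
        self.log = log                  # shared list: which hook ran

    def __eq__(self, other):
        self.log.append("__eq__")
        return self.name == other.name

    def __cmp__(self, other):           # only Python 2 ever consults this
        self.log.append("__cmp__")
        return 0                        # claims "equal" for any pair


class CmpOnly(object):
    """Hypothetical class with __cmp__ and no rich comparison."""
    def __init__(self, name):
        self.name = name

    def __cmp__(self, other):
        # negative, zero or positive, like the built-in cmp()
        return (self.name > other.name) - (self.name < other.name)


log = []
a, b = Both("x", log), Both("y", log)
result = (a == b)
print(result)   # False: __eq__ decided, __cmp__'s 0 was never used
print(log)      # ['__eq__']

if sys.version_info[0] == 2:
    # Python 2 only: without rich comparisons, == and < fall back to __cmp__
    print(CmpOnly("a") == CmpOnly("a"))   # True
    print(CmpOnly("a") < CmpOnly("b"))    # True
```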

So if we define our own __eq__, PyObject_RichCompareBool will call it, and keys are compared the way we choose:

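class HashTest(object):
    def __init__(self, name):
        self.name = name
    def __hash__(self):
        print "__hash__ was called here!"
        return hash(self.name)
    def __eq__(self, r):
        # two keys are equal when their names are equal
        return self.name == r.name

# re-create the objects and the dict so that they use the new class
test1 = HashTest("hehe1")
test2 = HashTest("hehe2")
dict_test = dict()
dict_test[test1] = 0
dict_test[test2] = 1

test2.name = "hehe1"
dict_test[test2] = 3
print dict_test
# {<__main__.HashTest object at 0x10fcec810>: 3, <__main__.HashTest object at 0x10fcec650>: 1}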

for k,v in dict_test.items():
    print k.name, v
# hehe1 3
# hehe1 1

NOTE: With this, however, the entry whose value is 1 can no longer be reached through dict_test, unless test2.name is changed again:

test2.name = "hehe2"
for k,v in dict_test.items():
    print k.name, v

# hehe1 3
# hehe2 1
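Both claims in the NOTE can be checked with assertions. A sketch reusing the post's names and the __eq__ version of HashTest, without the print in __hash__ (written to run on Python 2.7 and 3): while test2.name matches test1.name, every lookup with test2 is answered by test1's entry and the value 1 is only visible by iterating; restoring the old name makes it reachable again.

```python
class HashTest(object):
    def __init__(self, name):
        self.name = name

    def __hash__(self):
        return hash(self.name)

    def __eq__(self, other):
        return self.name == other.name


test1 = HashTest("hehe1")
test2 = HashTest("hehe2")
dict_test = dict()
dict_test[test1] = 0
dict_test[test2] = 1

test2.name = "hehe1"
dict_test[test2] = 3                            # test1 == test2: test1's value is overwritten
stale_lookup = dict_test[test2]                 # answered by test1's entry
value_1_stored = 1 in list(dict_test.values())  # still stored, reachable only by iterating

test2.name = "hehe2"                            # back to the hash its entry was stored under
restored_lookup = dict_test[test2]

print(stale_lookup)     # 3
print(value_1_stored)   # True
print(restored_lookup)  # 1
```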

© Tian Li. Built using Pelican. Theme by Giulio Fidente on github.