Notes from studying the Python 2.7 data model
Basic customization
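object.__hash__(self)
Every dict operation that needs to locate a key (inserting a new key/value pair, updating a value, or reading a value) calls the key's __hash__
to get its hash value, which is then used to look the key up.
A quick test: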
class HashTest(object):
    def __init__(self, name):
        self.name = name

    def __hash__(self):
        print "__hash__ was called here!"
        return hash(self.name)

test1 = HashTest("hehe1")
test2 = HashTest("hehe2")
dict_test = dict()
dict_test[test1] = 0  # prints: __hash__ was called here!
dict_test[test2] = 1  # likewise prints: __hash__ was called here!
# Reading a value
dict_test[test1]
# __hash__ was called here!
# 0
dict_test[test2]
# __hash__ was called here!
# 1
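To check the call pattern without reading printed output, here is a minimal sketch that counts the calls instead. The class name CountingKey and its calls counter are hypothetical names introduced only for this illustration, and the code is written so that it should run on both Python 2.7 and Python 3.

```python
class CountingKey(object):
    """Hypothetical key type that counts how often it is hashed."""
    calls = 0  # class-level counter shared by all instances

    def __init__(self, name):
        self.name = name

    def __hash__(self):
        CountingKey.calls += 1
        return hash(self.name)


key = CountingKey("hehe1")
table = {}

table[key] = 0          # insertion hashes the key
value = table[key]      # subscript lookup hashes it again
present = key in table  # a membership test hashes it too
got = table.get(key)    # and so does dict.get

print(CountingKey.calls)
```

Each of the four dict operations hashes the key once, because an instance of a user-defined class does not cache its own hash the way a str does.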
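How values get updated
test2.name = "hehe1"  # give test2 the same name as test1, so both keys now have the same hash
dict_test[test2] = 3  # again prints: __hash__ was called here!
One might expect dict_test to now hold a single key, the one hashing like test1, with value 3. That is not what happens:
print dict_test
# Output: {<__main__.HashTest object at 0x10fcec250>: 3, <__main__.HashTest object at 0x10fcec650>: 0, <__main__.HashTest object at 0x10fcec250>: 1}
The reason is that an equal hash is not enough: the dict also compares the keys themselves with __eq__. Since HashTest defines no __eq__, that comparison falls back to identity (the object's id, i.e. its memory address), so test1 and test2 are different keys and test2 gets a new entry. Its old entry, stored under the hash of "hehe2", is still there, which is why the same object shows up twice. The relevant CPython source, with comments, excerpted from another write-up: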
static dictentry *
lookdict(dictobject *mp, PyObject *key, register long hash)
{
    register size_t i;
    register size_t perturb;
    register dictentry *freeslot;
    register size_t mask = (size_t)mp->ma_mask;
    dictentry *ep0 = mp->ma_table;
    register dictentry *ep;
    register int cmp;
    PyObject *startkey;

    i = (size_t)hash & mask;
    ep = &ep0[i];
    if (ep->me_key == NULL || ep->me_key == key)
        return ep;
    if (ep->me_key == dummy)
        freeslot = ep;
    else {
        if (ep->me_hash == hash) {
            startkey = ep->me_key;
            cmp = PyObject_RichCompareBool(startkey, key, Py_EQ); // compare the keys for equality
            if (cmp < 0)
                return NULL;
            if (ep0 == mp->ma_table && ep->me_key == startkey) {
                if (cmp > 0) // only equal keys return the existing slot; otherwise probing continues to another slot
                    return ep;
            }
            else {
                /* The compare did major nasty stuff to the
                 * dict: start over.
                 * XXX A clever adversary could prevent this
                 * XXX from terminating.
                 */
                return lookdict(mp, key, hash);
            }
        }
        freeslot = NULL;
    }
    ...
}
So Python uses rich comparison (PyObject_RichCompareBool with Py_EQ)
to compare the keys once their hashes match.
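The decision made in the C function above (same object, or same hash and equal, means same key) can be modeled in a few lines of Python. This is only a rough sketch under loud assumptions: ToyDict and SameHash are hypothetical names, the table has a fixed size, never resizes or deletes, and uses plain linear probing, whereas CPython's real probe sequence is different.

```python
class ToyDict(object):
    """Hypothetical, heavily simplified model of a hash table lookup.

    Assumptions: fixed size, linear probing, no deletion, no resizing.
    """

    def __init__(self, size=8):
        self.slots = [None] * size  # each used slot holds [hash, key, value]

    def _find(self, key):
        h = hash(key)
        i = h % len(self.slots)
        for _ in range(len(self.slots)):
            slot = self.slots[i]
            if slot is None:
                return i  # empty slot: the key is not in the table
            # Same object, or same hash and equal keys: treat as the same key.
            if slot[1] is key or (slot[0] == h and slot[1] == key):
                return i
            i = (i + 1) % len(self.slots)  # collision: try the next slot
        raise RuntimeError("toy table is full")

    def set(self, key, value):
        i = self._find(key)
        if self.slots[i] is None:
            self.slots[i] = [hash(key), key, value]
        else:
            self.slots[i][2] = value

    def get(self, key):
        slot = self.slots[self._find(key)]
        if slot is None:
            raise KeyError(key)
        return slot[2]

    def __len__(self):
        return sum(1 for s in self.slots if s is not None)


class SameHash(object):
    """Every instance hashes to 1; equality is the default identity check."""

    def __hash__(self):
        return 1


a, b = SameHash(), SameHash()
toy = ToyDict()
toy.set(a, "first")
toy.set(b, "second")   # equal hash, but a == b is False, so a new slot is used
toy.set(a, "updated")  # same object, so the existing slot is overwritten
print("%d %s %s" % (len(toy), toy.get(a), toy.get(b)))
```

Two objects with identical hashes still end up as two separate entries, which is the behaviour seen with test1 and test2 earlier.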
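__lt__, __le__, __eq__, __ne__, __gt__ and __ge__
are the rich comparison methods.
If the relevant rich comparison method is defined, a comparison (==, !=, >=, <=, <, >)
calls it directly and __cmp__ is not used.
If the rich comparison method is not defined, the comparison calls __cmp__. Its return value is positive, negative or zero,
meaning greater than, less than or equal respectively, and the comparison expression turns that into the final boolean (True or False).
If neither __cmp__ nor rich comparison is defined, equality falls back to object identity and ordering falls back to comparing
object addresses.
__cmp__ can also be invoked directly through cmp(a, b).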
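The precedence rule can be observed by defining both styles on one class and recording which method runs. A small sketch, where Both and its calls list are hypothetical names made up for this illustration. On Python 3 the outcome is the same for a different reason: __cmp__ no longer exists there and is simply never consulted.

```python
class Both(object):
    """Hypothetical class that defines both comparison styles."""
    calls = []  # records which special method actually ran

    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        Both.calls.append("__eq__")
        return self.value == other.value

    def __cmp__(self, other):
        # Only meaningful on Python 2; Python 3 ignores this method.
        Both.calls.append("__cmp__")
        return (self.value > other.value) - (self.value < other.value)


result = Both(1) == Both(1)
print(result)
print(Both.calls)
```

Only __eq__ is recorded, so for == the rich comparison method wins over __cmp__.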
So all we need to do is define our own __eq__
method. PyObject_RichCompareBool
will then call this custom __eq__,
and the dict compares keys the way we want.
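class HashTest(object):
    def __init__(self, name):
        self.name = name

    def __hash__(self):
        print "__hash__ was called here!"
        return hash(self.name)

    def __eq__(self, r):
        return self.name == r.name

# Rebuild the objects and the dict with the new class
test1 = HashTest("hehe1")
test2 = HashTest("hehe2")
dict_test = dict()
dict_test[test1] = 0
dict_test[test2] = 1
test2.name = "hehe1"  # test2 now hashes and compares equal to test1
dict_test[test2] = 3  # overwrites the value stored under test1
print dict_test
# {<__main__.HashTest object at 0x10fcec810>: 3, <__main__.HashTest object at 0x10fcec650>: 1}
for k, v in dict_test.items():
    print k.name, v
# hehe1 3
# hehe1 1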
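NOTE: With this setup the entry whose value is 1 can no longer be reached through dict_test, because it is still stored under the hash of "hehe2" while its key now hashes like "hehe1". It becomes reachable again only after test2.name is changed back.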
test2.name = "hehe2"
for k, v in dict_test.items():
    print k.name, v
# hehe1 3
# hehe2 1
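The stale-entry effect from the NOTE can also be checked with plain variables rather than by reading the printed loop. A minimal sketch, assuming a silent variant of the class: NamedKey is a hypothetical name, and the isinstance guard in __eq__ is an addition not present in the original class. The code is written to run on Python 2.7 and Python 3.

```python
class NamedKey(object):
    """Hypothetical key whose hash and equality depend on a mutable name."""

    def __init__(self, name):
        self.name = name

    def __hash__(self):
        return hash(self.name)

    def __eq__(self, other):
        return isinstance(other, NamedKey) and self.name == other.name


k1, k2 = NamedKey("hehe1"), NamedKey("hehe2")
table = {k1: 0, k2: 1}

k2.name = "hehe1"           # mutate a key that is already stored
table[k2] = 3               # hashes and compares like k1: overwrites k1's value
after_mutation = table[k2]  # finds k1's entry, not the one stored under k2
still_stored = sorted(table.values())

k2.name = "hehe2"           # restore the name the entry was hashed under
after_restore = table[k2]

print("%s %s %s" % (after_mutation, still_stored, after_restore))
```

The value 1 never leaves the dict; it just cannot be found while the key's current hash differs from the hash it was stored under. This is why attributes that feed __hash__ and __eq__ are best left unchanged while the object is in use as a dict key.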