JDK1.6 Map#hash#key位运算

详解

HashMap#indexFor()

1
2
3
4
5
6
7
/**
* Returns index for hash code h.
*/
static int indexFor(int h, int length) {
// 1
return h & (length-1);
}

HashMap#hash()

1
2
3
4
5
6
7
8
9
10
11
12
13
14
/**
* Applies a supplemental hash function to a given hashCode, which
* defends against poor quality hash functions. This is critical
* because HashMap uses power-of-two length hash tables, that
* otherwise encounter collisions for hashCodes that do not differ
* in lower bits. Note: Null keys always map to hash 0, thus index 0.
*/
static int hash(int h) {
// This function ensures that hashCodes that differ only by
// constant multiples at each bit position have a bounded
// number of collisions (approximately 8 at default load factor).
h ^= (h >>> 20) ^ (h >>> 12);
return h ^ (h >>> 7) ^ (h >>> 4);
}

Redis2.4

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
n.size = realsize;  
n.sizemask = realsize-1;
....
hile(de) {
unsigned int h;

nextde = de->next;
/* Get the index in the new hash table */
h = dictHashKey(d, de->key) & d->ht[1].sizemask;
de->next = d->ht[1].table[h];
d->ht[1].table[h] = de;
d->ht[0].used--;
d->ht[1].used++;
de = nextde;
}

标注代码分析

  1. a%b取模的形式都被替换成了a&(b-1) ,当hashtable的长度是2的幂的情况下,这两者是等价的。
  2. hashtable的长度最好是2的n次方。保证:分布更均匀,碰撞几率更小。

    HashMap

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    22
    23
    24
    25
    26
    27
    28
    29
    /**
    * Constructs an empty <tt>HashMap</tt> with the specified initial
    * capacity and load factor.
    *
    * @param initialCapacity the initial capacity
    * @param loadFactor the load factor
    * @throws IllegalArgumentException if the initial capacity is negative
    * or the load factor is nonpositive
    */
    public HashMap(int initialCapacity, float loadFactor) {
    if (initialCapacity < 0)
    throw new IllegalArgumentException("Illegal initial capacity: " +
    initialCapacity);
    if (initialCapacity > MAXIMUM_CAPACITY)
    initialCapacity = MAXIMUM_CAPACITY;
    if (loadFactor <= 0 || Float.isNaN(loadFactor))
    throw new IllegalArgumentException("Illegal load factor: " +
    loadFactor);

    // Find a power of 2 >= initialCapacity
    int capacity = 1;
    while (capacity < initialCapacity)
    capacity <<= 1;

    this.loadFactor = loadFactor;
    threshold = (int)(capacity * loadFactor);
    table = new Entry[capacity];
    init();
    }

redis

1
2
3
4
5
6
7
8
9
10
11
12
/* Our hash table capability is a power of two */  
static unsigned long _dictNextPower(unsigned long size)
{
unsigned long i = DICT_HT_INITIAL_SIZE;

if (size >= LONG_MAX) return LONG_MAX;
while(1) {
if (i >= size)
return i;
i *= 2;
}
}

位运算的效率最高,这也是&取代%的原因。

1
2
3
4
5
6
7
8
9
10
11
12
int main(int argc, char* argv[])  
{
int a = 0x111;
int b = 0x222;
int c = 0;
int d = 0;

c = a & (b-1);
d = a % b;

return 0;
}

反编译

1
2
3
4
5
6
7
8
9
10
11
13:       c = a & (b-1);  
00401044 mov eax,dword ptr [ebp-8]
00401047 sub eax,1
0040104A mov ecx,dword ptr [ebp-4]
0040104D and ecx,eax
0040104F mov dword ptr [ebp-0Ch],ecx
14: d = a % b;
00401052 mov eax,dword ptr [ebp-4]
00401055 cdq
00401056 idiv eax,dword ptr [ebp-8]
00401059 mov dword ptr [ebp-10h],edx

可以看到,&操作用3mov+1and+1sub %操作用2mov+1cdp+1idiv。
可以查阅Coding_ASM_-_Intel_Instruction_Set_Codes_and_Cycles资料,发现前者只需5个CPU周期,而后者至少需要26个CPU周期(注意,是最少!!!) 效率显而易见。所以以后在写的时候,也可以使用前者的写法。
Coding_ASM_-_Intel_Instruction_Set_Codes_and_Cycles

总结

当hashtable的长度是2的幂的情况下,a%b == a&(b-1)。其他条件下,是不成立的。

评论

Your browser is out-of-date!

Update your browser to view this website correctly. Update my browser now

×