eXtra-Large/Jeff

Reviewing data structure and algorithm

I'm waiting for the notice of my interview time, and I know indeed I need to prepare a lot.

So start from several online post by interviewees talking about their past experience. I started from basics of Data Structure. When I browse through the book, I noticed that basically nothing was learned in the DS course I took in BUPT. What a great time I had then, and I got an A- out of that class. I should totally have failed!

Several things I'm taking care of:

1. Binary tree and Sorting

2. Hashing and probing

3. What's the difference between Binary tree and hash table? My current understanding is that Hash table would work better and quick in searching when the data is of finite and known size, while binary tree would be more helpful in providing storing and searching capacity when we have large data set. (why?)

4. Inheritance, what would be inherited in new instances? How does virtual function work?

2008-02-13 23:41:34 GMT

Comments (1 total)

Author:Anonymous

Problems with hash tables

Although hash table lookups use constant time on average, the time spent can be significant. Evaluating a good hash function can be a slow operation. In particular, if simple array indexing can be used instead, this is usually faster.

Hash tables in general exhibit poor locality of reference—that is, the data to be accessed is distributed seemingly at random in memory. Because hash tables cause access patterns that jump around, this can trigger microprocessor cache misses that cause long delays. Compact data structures such as arrays, searched with linear search, may be faster if the table is relatively small and keys are cheap to compare, such as with simple integer keys. According to Moore's Law, cache sizes are growing exponentially and so what is considered "small" may be increasing. The optimal performance point varies from system to system.

More significantly, hash tables are more difficult and error-prone to write and use. Hash tables require the design of an effective hash function for each key type, which in some situations is more difficult and time-consuming to design and debug than the simple comparison function required for a self-balancing binary search tree. In open-addressed hash tables it is fairly easy to create a poor hash function.

Additionally, in some applications, a black hat with knowledge of the hash function may be able to supply information to a hash which creates worst-case behavior by causing excessive collisions, resulting in very poor performance (i.e., a denial of service attack). In critical applications, either universal hashing can be used or a data structure with better worst-case guarantees may be preferable. For details, see Crosby and Wallach's Denial of Service via Algorithmic Complexity Attacks.

2008-02-14 00:09:05 GMT

Site Home