What is difference to index on integer and char?

Question:

>
> I'm counting the net wisdom here in CDOS to help settle an argument.
> There is a prevailing sense around my department that indexing character
> keys are a bad thing, and that such indexes are drastically less efficient
> than numerical ones.  This is resulting in people inventing all kinds of
> kludgy workarounds to circumvent character based indexes.
>
> Their argument, of course, is that the CPU can natively compare numbers in
> a single instruction, but there is no hardware mechanism for comparing
> strings. I believe this argument falls apart at the RDBMS level, as Oracle
> is doing millions of other operations for that query besides a tiny little
> compare.
>
> Here's my reasoning:
>
>         -  Internally, Oracle stores both characters and numbers as bytes.
> Since the Oracle NUMBER datatype isn't necessarily the CPU native integer
> format, Oracle cannot and does not combine the byte values together to
> form a larger integer. (i.e. if you have a NUMBER(4) and a CHAR(4), Oracle
> is doing 4 single byte compares either way).
>
>         -  Performance tests on 6 million row tables have yielded a very
> tiny performance difference, sometimes going either way.
>
>         -  It is the depth of the B-Tree index, the number of leaf blocks,
> and uniqueness that govern performance, not the datatype being indexed.
>
> NUMBER(10) vs. VARCHAR2(10) -- both unique indexes -- Should there be a
> difference?
>
> NUMBER(10) vs. DATE -- both non-unique indexes -- Any difference?
>
> Does anyone want to voice their opinions here? I'm in favor of numerical
> keys, but in some places it's just far more convienant to use varchar
> keys. Of course long varchar indexes will be slower, but we're talking
> 10 or 20 byte varchar's at most.
>
> Comments anyone?

Answer 1:

I suggest that you watch their chins drop when you tell them that Oracle
stores numbers in a proprietary floating point format.

When comparing Oracle will break out early if the comparison is false.
When comparing Number to Number or VARCHAR2/CHAR to VARCHAR2/CHAR for
equality the compare will end at the first unequal byte.  When
comparing NUMBER to VARCHAR2/CHAR a conversion operation is performed
before the compare is performed.

Depending on database activity, size of the SGA, etc. the index may
be RAM resident in which case the depth of the tree may not make
very much difference in terms of performance.

My rule of thumb -- admittedly a carryover from my days as a 3GL
programmer -- is to only use NUMBER when the column is used for
computations.  Otherwise use VARCHAR2 unless valid values are
fixed length and length is used in validity checking in which
case I prefer to use CHAR.

--
Jerry Gitomer
Once I learned how to spell DBA, I became one.

Answer 2:

Surely some of the 'tricks' to get numerical indexes artificially in to a
relational data model will run the risk of creating structures which are far
less efficient to access than a simple non-numerical index would produced.

If you have the artificial example of an isolated table you may be alright
but if you have tables with inter-relationships, referencial integrity
checks for instance (in the database or in the software) you could be
walking through more tables that you otherwise might need.

Regards

Kevin

Answer 3:

I'm absolutelly agree. Main reason: I prefer correct database structure to
questionable perfomance increase. There is a lot of ways for perfomance
turning (indexing, query turning, finally server-side hardware enhancement)
but there is no simple ways for workaround with incorrect datatypes. For
example, if You choosed INTEGER datatype to store document number. I't
possible You got some perfomance increase. But You could
do nothing if You need store a document with '12/7-1' number. And if You
choosed varchar2 datatype and got bad perfomance, You can create simple or
bitmap index, or make table partitioned or install additional memory to
database server mashine and so on... Similar situation is possible with any
non-additive (non-multiplicative etc) attribute. So, it's better to choose
datatypes corresponding to data nature anyway.

Answer 4:

>I'm counting the net wisdom here in CDOS to help settle an argument.
>There is a prevailing sense around my department that indexing character
>keys are a bad thing, and that such indexes are drastically less efficient
>than numerical ones.  This is resulting in people inventing all kinds of
>kludgy workarounds to circumvent character based indexes.
>
>Their argument, of course, is that the CPU can natively compare numbers in
>a single instruction, but there is no hardware mechanism for comparing
>strings. I believe this argument falls apart at the RDBMS level, as Oracle
>is doing millions of other operations for that query besides a tiny little
>compare.

and the fact that we never store numbers in a native format, we cannot.

>
>Here's my reasoning:
>
> -  Internally, Oracle stores both characters and numbers as bytes.
>Since the Oracle NUMBER datatype isn't necessarily the CPU native integer
>format, Oracle cannot and does not combine the byte values together to
>form a larger integer. (i.e. if you have a NUMBER(4) and a CHAR(4), Oracle
>is doing 4 single byte compares either way).
>

well, not really.  A CHAR(4) is 5 bytes long (leading byte length, 4 bytes of
character data).  A Number(4) is from 1 to 4 bytes in length:

ops$tkyte@8i> select vsize(0), vsize(9999), vsize(-9999) from dual;

  VSIZE(0) VSIZE(9999) VSIZE(-9999)
---------- ----------- ------------
         1           3            4

ops$tkyte@8i>

In either case though -- your assumption is correct that we never store numbers
in a table in the native format -- all data (7 bytes of a DATE, N bytes of a
varchar, N bytes of a char, n bytes of a number) are compared using the
equivalent of "memcmp()".  Char data, when using different character sets to
sort and such can take longer in certain cases but typically NOT when using an
index.

> -  Performance tests on 6 million row tables have yielded a very
>tiny performance difference, sometimes going either way.
>

exactly. (sort of fixes the misconception in paragraph 1 above)

> -  It is the depth of the B-Tree index, the number of leaf blocks,
>and uniqueness that govern performance, not the datatype being indexed.
>

pretty much. disk layout - number of 'holes' in the index and other things may
come into play but the above is correct.

>NUMBER(10) vs. VARCHAR2(10) -- both unique indexes -- Should there be a
>difference?
>

not noticable in my experience

>NUMBER(10) vs. DATE -- both non-unique indexes -- Any difference?
>

number 10, varying 1-6  bytes.
DATE fixed length field 7 bytes.

not a noticable difference when indexed.

>Does anyone want to voice their opinions here? I'm in favor of numerical
>keys, but in some places it's just far more convienant to use varchar
>keys. Of course long varchar indexes will be slower, but we're talking
>10 or 20 byte varchar's at most.
>
>Comments anyone?
 

--
See http://osi.oracle.com/~tkyte/ for my columns 'Digging-in to Oracle8i'...
Current article is "Part I of V, Autonomous Transactions" updated June 21'st
 
Thomas Kyte                   [email protected]
Oracle Service Industries     Reston, VA   USA

Opinions are mine and do not necessarily reflect those of Oracle Corporation
 
 

Hosted by www.Geocities.ws

1