Question:
>
> I'm counting the net wisdom here in CDOS to help settle an argument.
> There is a prevailing sense around my department that indexing character
> keys are a bad thing, and that such indexes are drastically less
efficient
> than numerical ones. This is resulting in people inventing
all kinds of
> kludgy workarounds to circumvent character based indexes.
>
> Their argument, of course, is that the CPU can natively compare numbers
in
> a single instruction, but there is no hardware mechanism for comparing
> strings. I believe this argument falls apart at the RDBMS level,
as Oracle
> is doing millions of other operations for that query besides a tiny
little
> compare.
>
> Here's my reasoning:
>
> - Internally,
Oracle stores both characters and numbers as bytes.
> Since the Oracle NUMBER datatype isn't necessarily the CPU native
integer
> format, Oracle cannot and does not combine the byte values together
to
> form a larger integer. (i.e. if you have a NUMBER(4) and a CHAR(4),
Oracle
> is doing 4 single byte compares either way).
>
> - Performance
tests on 6 million row tables have yielded a very
> tiny performance difference, sometimes going either way.
>
> - It is the
depth of the B-Tree index, the number of leaf blocks,
> and uniqueness that govern performance, not the datatype being indexed.
>
> NUMBER(10) vs. VARCHAR2(10) -- both unique indexes -- Should there
be a
> difference?
>
> NUMBER(10) vs. DATE -- both non-unique indexes -- Any difference?
>
> Does anyone want to voice their opinions here? I'm in favor of numerical
> keys, but in some places it's just far more convienant to use varchar
> keys. Of course long varchar indexes will be slower, but we're talking
> 10 or 20 byte varchar's at most.
>
> Comments anyone?
Answer 1:
I suggest that you watch their chins drop when you tell them that Oracle
stores numbers in a proprietary floating point format.
When comparing Oracle will break out early if the comparison is false.
When comparing Number to Number or VARCHAR2/CHAR to VARCHAR2/CHAR for
equality the compare will end at the first unequal byte. When
comparing NUMBER to VARCHAR2/CHAR a conversion operation is performed
before the compare is performed.
Depending on database activity, size of the SGA, etc. the index may
be RAM resident in which case the depth of the tree may not make
very much difference in terms of performance.
My rule of thumb -- admittedly a carryover from my days as a 3GL
programmer -- is to only use NUMBER when the column is used for
computations. Otherwise use VARCHAR2 unless valid values are
fixed length and length is used in validity checking in which
case I prefer to use CHAR.
--
Jerry Gitomer
Once I learned how to spell DBA, I became one.
Answer 2:
Surely some of the 'tricks' to get numerical indexes artificially in
to a
relational data model will run the risk of creating structures which
are far
less efficient to access than a simple non-numerical index would produced.
If you have the artificial example of an isolated table you may be alright
but if you have tables with inter-relationships, referencial integrity
checks for instance (in the database or in the software) you could
be
walking through more tables that you otherwise might need.
Regards
Kevin
Answer 3:
I'm absolutelly agree. Main reason: I prefer correct database structure
to
questionable perfomance increase. There is a lot of ways for perfomance
turning (indexing, query turning, finally server-side hardware enhancement)
but there is no simple ways for workaround with incorrect datatypes.
For
example, if You choosed INTEGER datatype to store document number.
I't
possible You got some perfomance increase. But You could
do nothing if You need store a document with '12/7-1' number. And if
You
choosed varchar2 datatype and got bad perfomance, You can create simple
or
bitmap index, or make table partitioned or install additional memory
to
database server mashine and so on... Similar situation is possible
with any
non-additive (non-multiplicative etc) attribute. So, it's better to
choose
datatypes corresponding to data nature anyway.
Answer 4:
>I'm counting the net wisdom here in CDOS to help settle an argument.
>There is a prevailing sense around my department that indexing character
>keys are a bad thing, and that such indexes are drastically less efficient
>than numerical ones. This is resulting in people inventing all
kinds of
>kludgy workarounds to circumvent character based indexes.
>
>Their argument, of course, is that the CPU can natively compare numbers
in
>a single instruction, but there is no hardware mechanism for comparing
>strings. I believe this argument falls apart at the RDBMS level, as
Oracle
>is doing millions of other operations for that query besides a tiny
little
>compare.
and the fact that we never store numbers in a native format, we cannot.
>
>Here's my reasoning:
>
> - Internally, Oracle stores both characters and numbers as
bytes.
>Since the Oracle NUMBER datatype isn't necessarily the CPU native
integer
>format, Oracle cannot and does not combine the byte values together
to
>form a larger integer. (i.e. if you have a NUMBER(4) and a CHAR(4),
Oracle
>is doing 4 single byte compares either way).
>
well, not really. A CHAR(4) is 5 bytes long (leading byte length,
4 bytes of
character data). A Number(4) is from 1 to 4 bytes in length:
ops$tkyte@8i> select vsize(0), vsize(9999), vsize(-9999) from dual;
VSIZE(0) VSIZE(9999) VSIZE(-9999)
---------- ----------- ------------
1
3 4
ops$tkyte@8i>
In either case though -- your assumption is correct that we never store
numbers
in a table in the native format -- all data (7 bytes of a DATE, N bytes
of a
varchar, N bytes of a char, n bytes of a number) are compared using
the
equivalent of "memcmp()". Char data, when using different character
sets to
sort and such can take longer in certain cases but typically NOT when
using an
index.
> - Performance tests on 6 million row tables have yielded a very
>tiny performance difference, sometimes going either way.
>
exactly. (sort of fixes the misconception in paragraph 1 above)
> - It is the depth of the B-Tree index, the number of leaf blocks,
>and uniqueness that govern performance, not the datatype being indexed.
>
pretty much. disk layout - number of 'holes' in the index and other
things may
come into play but the above is correct.
>NUMBER(10) vs. VARCHAR2(10) -- both unique indexes -- Should there
be a
>difference?
>
not noticable in my experience
>NUMBER(10) vs. DATE -- both non-unique indexes -- Any difference?
>
number 10, varying 1-6 bytes.
DATE fixed length field 7 bytes.
not a noticable difference when indexed.
>Does anyone want to voice their opinions here? I'm in favor of numerical
>keys, but in some places it's just far more convienant to use varchar
>keys. Of course long varchar indexes will be slower, but we're talking
>10 or 20 byte varchar's at most.
>
>Comments anyone?
--
See http://osi.oracle.com/~tkyte/ for my columns 'Digging-in to Oracle8i'...
Current article is "Part I of V, Autonomous Transactions" updated June
21'st
Thomas Kyte
[email protected]
Oracle Service Industries Reston, VA
USA
Opinions are mine and do not necessarily reflect those of Oracle Corporation