Author: C. Mohanasundaram
Date: Feb 16, 2005
Making Code 64 Bit Compliant
Objective
The objective of this document is to list the places where one has to be
careful when porting an existing code base written for a 32 bit platform to a
64 bit platform, and when writing new code that is both 32 bit and 64 bit
compliant.
Difference between 32 bit and 64 bit platforms
To understand what needs care while coding, one has to understand exactly how
a 32 bit platform differs from a 64 bit platform as far as C/C++ programs are
concerned. Even though the C standard places few restrictions on the sizes of
data types, the industry has settled on the following conventions for the
sizes of the C/C++ primitive data types.
C standard suggested sizes and rules
For all platforms
char is always 1 byte
short <= int <= long
short >= 2 bytes
long >= 4 bytes
Discussion of non-integral types is outside the scope of this document.
Industry accepted sizes and rules
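The following rules are commonly accepted in the industry in addition to the
C standard. (LP64 is the model used on Unix-like systems; 64 bit Windows uses
LLP64, where long remains 4 bytes and only pointers grow to 8 bytes.)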
For 32 bit platform (ILP32)
int == long == pointer == 4 bytes
For 64 bit platform (LP64)
int == 4 bytes
long == pointer == 8 bytes
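To see which of these models a given compiler follows, the sizes can simply be printed. The following is a minimal sketch; the function names print_type_sizes and int_long_pointer_same_size are made up for this illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: returns 1 when int, long and pointer all have the
   same size (as in ILP32) and 0 otherwise (as in LP64). */
static int int_long_pointer_same_size(void)
{
    return sizeof(int) == sizeof(long) && sizeof(long) == sizeof(void *);
}

/* Hypothetical helper: print the sizes this compiler uses. */
static void print_type_sizes(void)
{
    /* The rules listed for all platforms should hold everywhere. */
    assert(sizeof(char) == 1);
    assert(sizeof(short) <= sizeof(int) && sizeof(int) <= sizeof(long));

    printf("short   : %zu bytes\n", sizeof(short));
    printf("int     : %zu bytes\n", sizeof(int));
    printf("long    : %zu bytes\n", sizeof(long));
    printf("pointer : %zu bytes\n", sizeof(void *));
    printf("int == long == pointer: %s\n",
           int_long_pointer_same_size() ? "yes" : "no");
}
```

Calling print_type_sizes() from main on each target platform shows at a glance which assumptions in existing code are at risk.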
The whole trouble arises because the industry decided to keep int at 4 bytes
while growing long and pointers to 8 bytes. C/C++ programmers habitually mix
int and long because the two have the same size on 32 bit platforms, but such
code breaks on 64 bit platforms. Likewise, programmers tend to manipulate
pointers on the assumption that they are 4 bytes wide, which holds on 32 bit
platforms and fails on 64 bit platforms. The following is a list of places
where one has to be careful about 32 bit and 64 bit compliance.
While porting, look for places where the size of a data type is hard coded as
4 or a similar number instead of using the sizeof operator. This most commonly
happens in calls to standard library functions such as malloc and memcpy.
While writing new code, do not assume sizes; use the sizeof operator.
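A minimal sketch of the sizeof habit with malloc and memcpy follows. The helper copy_longs is hypothetical and exists only for this illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a heap copy of count long values, or NULL
   if the allocation fails. The caller frees the result. */
static long *copy_longs(const long *src, size_t count)
{
    long *dst;

    assert(src != NULL);
    /* Fragile: malloc(count * 4) is only right where long is 4 bytes. */
    dst = malloc(count * sizeof *dst);
    if (dst == NULL) {
        return NULL;
    }
    /* The same expression sizes the copy, so the two cannot disagree. */
    memcpy(dst, src, count * sizeof *dst);
    return dst;
}
```

Writing sizeof *dst rather than sizeof(long) keeps the size correct even if the element type of dst is changed later.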
With bitwise operators the programmer has no choice but to use integral
constants (masks and shift counts), so every such constant has to be checked
carefully while porting. While writing new code, make sure that each integral
constant behaves the same way on both 32 bit and 64 bit platforms.
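One common case is a mask written as a 32 bit hexadecimal constant. The sketch below, with a hypothetical helper clear_low_two_bits, shows a mask that adapts to the width of long instead.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical helper: clear the two lowest bits of an unsigned long. */
static unsigned long clear_low_two_bits(unsigned long value)
{
    /* Fragile: value & 0xFFFFFFFC also clears bits 32..63 where long is
       8 bytes, because that constant is only 32 bits wide. */
    return value & ~3UL;   /* ~3UL has every upper bit set, whatever the width */
}

static void demo_mask(void)
{
    assert(clear_low_two_bits(7UL) == 4UL);
    /* The upper bits survive on both 32 bit and 64 bit platforms. */
    assert(clear_low_two_bits(ULONG_MAX) == ULONG_MAX - 3UL);
}
```

The UL suffix makes the constant as wide as long before the complement is taken, which is what keeps the upper bits intact.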
Computing the offsets of structure members with hand-written code, instead of
using the standard macro offsetof, means trouble. Such code typically assumes
that the size of int == long == pointer, which fails on 64 bit platforms.
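As an illustration, offsetof from <stddef.h> lets the compiler compute the offset, padding included. The structure record and the helper payload_offset below are invented for this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record, used only for this illustration. */
struct record {
    int   id;
    long  count;
    void *payload;
};

/* Offset of payload as computed by the compiler. */
static size_t payload_offset(void)
{
    /* Fragile: sizeof(int) + sizeof(long), or a hard-coded 8, ignores
       padding and the 8 byte long of LP64. */
    return offsetof(struct record, payload);
}

static void demo_offsets(void)
{
    struct record r;

    /* The offset matches the real address of the member. */
    assert((char *)&r + payload_offset() == (char *)&r.payload);
    assert(offsetof(struct record, id) == 0);
}
```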
C/C++ programmers often break data into bytes, assuming its size, and
manipulate the bytes individually. One typical example is splitting an
integral value into individual characters and sending them across the network
instead of converting the value to a real string using sprintf or a similar
function.
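The string conversion mentioned above might look like the following sketch. It uses snprintf, the bounded relative of sprintf, and the helper names encode_long and decode_long are assumptions made for this example.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: write value as decimal text into buf.
   Returns 1 on success and 0 if buf is too small. */
static int encode_long(char *buf, size_t size, long value)
{
    int n = snprintf(buf, size, "%ld", value);
    return n >= 0 && (size_t)n < size;
}

/* Hypothetical helper: read the value back on the receiving side. */
static long decode_long(const char *text)
{
    return strtol(text, NULL, 10);
}

static void demo_text_encoding(void)
{
    char wire[32];   /* room for a 64 bit long, its sign and the terminator */
    int ok = encode_long(wire, sizeof wire, -123456789L);

    assert(ok);
    assert(strcmp(wire, "-123456789") == 0);
    assert(decode_long(wire) == -123456789L);
}
```

The text form does not depend on how many bytes a long occupies on the sender. A receiver with a 4 byte long still has to check that the value fits.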
Anything whose type is not stated explicitly is assumed by the C compiler to
be int. For example
unsigned a; /* The compiler will assume unsigned int */
fun (void); /* C89 assumes the return type is int; C99 and C++ reject this */
#define TRUE 1 /* The compiler will treat the constant as an int */
One typical place where this assumption of the compiler might create a
problem is
#include <stdio.h>

/* Bug under discussion: the constant 1 is an int, not a long */
#define TURN_SIGN_ON(x) (x |= (1 << ((sizeof(long)*8)-1)))

int main()
{
    int x = 0;
    int i = 0;

    TURN_SIGN_ON(x);
    /* Print the bits of x, lowest bit first */
    for (i = 0; i < sizeof(long)*8; i++)
    {
        if (x & 1<<i)
        {
            printf("1");
        }
        else
        {
            printf("0");
        }
    }
    return 0;
}
Output on a 32 bit platform
00000000000000000000000000000001
Output on a 64 bit platform
0000000000000000000000000000000000000000000000000000000000000000
The macro TURN_SIGN_ON is supposed to turn on the sign bit of an integral
value. This code works on 32 bit platforms irrespective of whether an int or a
long is passed as the parameter, but fails on 64 bit platforms irrespective of
whether an int or a long is passed. The constant 1 used in the macro is an
int, which is 4 bytes on both platforms, and on 64 bit platforms it is shifted
left by 63 bits, far beyond the 32 bits of the int that holds the value 1.
Shifting by the width of the type or more is undefined behavior in C, and the
worst part is that the compiler is not required to warn about it.
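One possible repair is sketched below. It is not the only fix: it assumes the macro is applied to unsigned variables, which avoids implementation-defined conversions, and it sizes the shift from the argument itself. The names TURN_TOP_BIT_ON and print_bits are made up for this example.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>

/* Assumption: x is an unsigned integer variable of at most 64 bits.
   The constant is unsigned long long, so the shift stays within its width. */
#define TURN_TOP_BIT_ON(x) ((x) |= (1ULL << (sizeof(x) * CHAR_BIT - 1)))

/* Print the bits of value, lowest bit first, as the program above does. */
static void print_bits(unsigned long value)
{
    size_t i;

    for (i = 0; i < sizeof(value) * CHAR_BIT; i++) {
        putchar(((value >> i) & 1UL) ? '1' : '0');
    }
    putchar('\n');
}

static void demo_top_bit(void)
{
    unsigned int  ui = 0;
    unsigned long ul = 0;

    TURN_TOP_BIT_ON(ui);
    TURN_TOP_BIT_ON(ul);
    /* Each variable now has exactly its own top bit set. */
    assert(ui == UINT_MAX - (UINT_MAX >> 1));
    assert(ul == ULONG_MAX - (ULONG_MAX >> 1));
    print_bits(ul);   /* only the last digit printed is 1 */
}
```

Using sizeof(x) instead of sizeof(long) ties the shift count to the variable actually passed, and the ULL suffix gives the constant enough room on both platforms.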
Along with the primitive data types, size_t has also grown: it is 8 bytes on
64 bit platforms, where it was 4 bytes on 32 bit platforms. This has to be
taken into consideration while porting existing code and while writing new
code. I hope an example is not needed, since we have seen so many cases
already and I assume the reader can imagine the problems that arise from this
change. Because the size of size_t may vary, check it in the compiler
documentation before relying on it.
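For readers who would still like something concrete, here is a small sketch of keeping lengths and indexes in size_t instead of int. The helper count_char is hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: count the occurrences of ch in text. */
static size_t count_char(const char *text, char ch)
{
    size_t len = strlen(text);   /* strlen returns size_t, not int */
    size_t count = 0;
    size_t i;

    for (i = 0; i < len; i++) {
        if (text[i] == ch) {
            count++;
        }
    }
    return count;
}

static void demo_size_t(void)
{
    size_t n = count_char("64 bit and 32 bit", 'b');

    assert(n == 2);
    /* %zu is the C99 conversion for size_t on every platform. */
    printf("found %zu, sizeof(size_t) = %zu\n", n, sizeof(size_t));
}
```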
Standard functions that return long, such as strtol, used to return a 4 byte
value, and programmers tend to store the result in an int. This creates a
problem on 64 bit platforms, where the value will be truncated. User defined
functions that return long have to be taken care of in the same way.
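A minimal sketch of handling the strtol result safely: keep it in a long and narrow it to int only after a range check. The helper parse_int is an assumption made for this example.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper: parse text as a decimal int. Returns 1 and stores
   the value on success, 0 if text is not a number or does not fit an int. */
static int parse_int(const char *text, int *out)
{
    char *end;
    long value;

    assert(text != NULL && out != NULL);
    errno = 0;
    value = strtol(text, &end, 10);   /* keep the full long result */
    if (end == text || *end != '\0' || errno == ERANGE) {
        return 0;
    }
    if (value < INT_MIN || value > INT_MAX) {
        return 0;   /* storing it in an int would truncate it */
    }
    *out = (int)value;
    return 1;
}
```

On a platform where int and long have the same size the range check never fires, so the same code serves both 32 bit and 64 bit builds.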
C/C++ programmers tend to assume that a pointer is 4 bytes and manipulate it
accordingly. The following are examples that I have come across in my
experience.
If a pointer is converted to int for some manipulation, such as using it as a
key in a map or hash table or as an index, it will work on 32 bit platforms
and fail on 64 bit platforms: the address is truncated when it is converted to
int, and converting it back yields a pointer to the wrong address.
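Where a pointer really must be treated as a number, the C99 type uintptr_t from <stdint.h> is wide enough to hold it. The type is optional in the standard, so the sketch below assumes a platform that provides it. The helper pointer_bucket is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical hash: bucket index for a pointer key. */
static size_t pointer_bucket(const void *key, size_t bucket_count)
{
    /* Fragile: (int)key or (unsigned)key drops the upper 32 bits on LP64. */
    uintptr_t bits = (uintptr_t)key;

    assert(bucket_count > 0);
    return (size_t)(bits % bucket_count);
}

static void demo_pointer_key(void)
{
    int object = 42;
    void *p = &object;
    uintptr_t bits = (uintptr_t)p;

    /* A round trip through uintptr_t gives back the same pointer. */
    assert((void *)bits == p);
    assert(*(int *)(void *)bits == 42);
    assert(pointer_bucket(p, 16) < 16);
}
```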