© 1992-1999 Microsoft Corporation

Back

Hardware White Paper

Long Filename Specification

Version 1.0, June 22, 2003

This document contains the design to support long filenames in the FAT file system.

Contents

1. Overview .....................................................................................................................................6

2. Problem Description and Objectives ................................................................................................6

3. Solution and Justification ...............................................................................................................7

4. Data Structures Description...........................................................................................................15

5. Design Overview..........................................................................................................................15

6. Exported Interfaces ......................................................................................................................15

7. Issues .........................................................................................................................................19

1. Overview

This document contains the design to support long filenames in the FAT file system.

2. Problem Description and Objectives

The length supported for a long filename must be a minimum of 32 bytes. That is each component of

a long path name may contain at least 32 character. The maximum file path name is limited to 260

characters. This is the limit imposed for single byte character path names on Windows NT®.

Defining a reasonable maximum limit to the filename length makes application writing easier. This

way the application knows the maximum size of a buffer that is required to hold the entire file path

name.

Long file names will be stored with the case preserved; they will not be translated into upper case as

8.3 file names are. The case of a long name will not be matched on finds, opens, creates, etc., the

match will be case insensitive. The case in a long name, which is stored when it is created, will be

returned on find operations. Therefore if a file is created with the name "Foo", then later searched for

through a find operation with the name "FOO", the same file will be found. If a long-name find

operation is used, the file name "Foo" will be returned. This also means that the long names "Foo"

and "fOO" may not exist in the same directory.

The name space between long and 8.3 names will be common. In other words a long name for a file

cannot be the same as a short name in another file. If a file is created with a short name API the long

name is forced to be the same. If there is a conflict it will always occur in the short name space. For

example, if the file "FOO" is created by a short name API, the long name will be forced to be "FOO".

To insure this can take place, a file created with a long name API that is given a valid 8.3 name, the

short name must be forced to be the same. For example, if the file "Foo" is created by a long name

API, the short name will be forced to be "FOO". If there is a conflict in either the long or short name

space then an error must be returned.

The long name APIs will operate uniformly on both long and short names. An open of a file, through

a long name API, does not match case and will open either a long or short name which ever exists.

The 8.3 APIs will only operate on the short name space. The long name space is not visible to these

APIs.

Logically both a long and a short name will always be created for a file. In practice if a long name

API creates a file that is a valid 8.3 name, only an 8.3 entry may be created for the name. If the name

is longer than 8.3, or contains lower case characters, then a long name entry must be created for the

file and an 8.3 name will automatically be generated from it and stored in the 8.3 entry. If a long

name is renamed, the 8.3 name stored for it will by default be automatically recreated. The

application will not be allowed to define, or independently change the short name that is automatically

created. If an existing Int 21h rename function is called to rename an 8.3 filename that also has a long

name, the long name will be changed as well to the 8.3 name. This may actually be stored only in the

8.3 entry.

The valid character set for long file names includes the current set of valid file name characters for the

FAT based 8.3 file system plus a few additional characters. There are no characters removed from the

valid file name character set defined for MS-DOS® operating system versions >= 3.00. The biggest

difference, in terms of valid filename characters, is that LOWER CASE characters are allowed in the

name stored in the directory entry. Note also that SPACE (20h) is a valid file name character, and

note that this is not a change from the existing 8.3 MS-DOS valid file name character convention.

The current directory is limited to 64 characters based on the 8.3 names. The long name for the

current directory may be much longer, up to 260 bytes, however setting the current directory through a

long name API can not result in the 8.3 current directory exceeding 64 characters. This is required for

compatibility with the existing current directory limitations.

The handling of long names in the Macrotech® Windows® operating system must be consistent with

the NTFS and OFS file systems. Both of these file systems support long and short names, and are

fully compatible with each other. Any solution provided on FAT must be compatible with these

environments as well.

The long name APIs for MS-DOS and WIN16 applications will be multiplexed through a single Int

21h function.

It is desirable to keep the existing FAT format compatible so that long name media may be

interchanged with a down level system.

3. Solution and Justification

The original idea for supporting long filename was to have a separate hidden file in each directory that

contains the long names. This file would have to contain both long and 8.3 names. There are a

number of problems with this idea related to keeping this long name directory file in sync. As a result

it has been discarded in favor of the preferred solution defined below.

3.1. Proposed Solution

One idea has been proposed to use additional directory entries in addition to the existing 8.3 entry to

contain the long name. This idea adds additional 32 byte directory entries adjacent to the existing

directory entry. The existing directory entry would be unmodified to provide the 8.3 name

compatibility.

The initial idea was to setup the additional entries to look like unused (deleted) 32 byte directory

entries. This approach has since been changed to use valid file entries with the attribute byte set to

HIDDEN, SYSTEM, READ-ONLY, and VOLUME LABEL. Each additional directory entry also

contains a signature byte and a byte checksum, or CRC, of the short 8.3 name. The First Cluster field

of these entries is also set to 0.

The setting of the volume label makes these entries almost invisible on down level systems, and

prevents them from being accidentally used. The TRUE 8.3 volume label directory entry has only the

VOLUME LABEL bit set in the attribute byte. This allows the LFNFAT file system to be able to

easily distinguish the true volume label entry and removes any order restriction on the true volume

label . The true volume label entry can be anywhere in the root directory, it doesn't have to be first,

but we would like it to be so that down level systems will find the correct label.

The signature byte and the attribute byte form a validation signature that identifies the entry as a long

name for the associated file. The signature byte contains an ordered value for each additional

directory entry (each entry, first to last, has a unique signature value).

The byte following the attribute byte contains a check sum, or CRC of the 8.3 file name. This check

sum is used to help insure the long name is valid for the associated 8.3 directory entry. This is useful

when a disk is taken to a down level system and the 8.3 names are changed.

The "first cluster" field of the long name directory entry is restricted to always be zero. This prevents

existing tools that directly access the disk from thinking that clusters are cross-linked by these long

name directory entries. With out this data could be lost by these tools trying to "clean up" the disk.

The remaining bytes in the long name directory entries are used to contain long file name characters.

Since the long name directory entries are not visible files on down level systems, any restrictions on

using the 10 unused 32-byte directory entry bytes can probably be relaxed. Each long name entry then

contains 27 available bytes. With the setting of the attribute byte, checksum byte, signature byte, and

zeroing the first_cluster field in the long name entries, these available bytes are not contiguous.

Instead they are separated into three disjoint pieces.

Originally we were considering using a fixed number of additional entries to support long names

greater than 32 characters, (the Mac file name size). However, it is desirable to handle long names

compatibly with the NTFS and Win32® API, which support up to 255-character file names. To

support file names up to 255 characters long, the number of long name directory entries used will be

dynamically allocated. Only the number of entries required to hold the long name will be used. If the

long name is 27 characters or less, a single entry will be used. If the long name is more than 27

characters, multiple entries will be used, up to a maximum of 255 characters. The only difference in

the format of multiple long name entries will be in the signature byte. The signature byte is an

ordered value for each entry in a given long name. Using ordered signature values per entry simply

helps reduce the possibility of invalid detection of a long name directory. This way, by looking at the

signature in the long name entry, the position of the entry within the long name can be identified.

If the long name does not fill the entire entry, then a null terminator will be placed in the string so that

additional spaces, or garbage bytes will not be returned. The long name directory entries will not be

padded with spaces as the 8.3 directory entries are. When a long name directory entry is allocated, the

parts of it reserved for name characters are filled with 0FFh. When a long name is shortened, through

rename, the extra bytes in the last extension past the new position of the terminating null are 0FFhfilled.

The reason for this is an artifact of the behavior of NDD and DISKFIX, which will allow the

Windows disk repair code to detect that one of these applications has modified the entry. This is an

easy addition to the rename/create code and is of low performance impact.

The additional entries immediately precede the existing 8.3 directory entry. By placing the long name

entries prior to the 8.3 directory entry they may be cached while searching for a valid 8.3 directory

entry. Once a valid entry is found the long name entries will already be available in memory. This

simplifies the code to implement long file names by preventing the need to handle boundary

conditions while searching for long names at the end of the directory.

There is no reason that this directory extension architecture cannot be applied to the Volume Label as

well as directory and file names. This allows volume labels to have up to 255 characters, the same as

file names.

The use of additional directory entries has the advantage that it keeps the long name associated with

the true directory entry. This makes keeping the long and short names in sync much easier. The

search for a filename would be more localized on the disk, which would make it faster. It would not

require as much disk space due to a single copy of the 8.3 name and a single cluster chain being

allocated, rather than two cluster chains for the hidden file mechanism.

The only other problems that have been identified are performance concerns with extracting the long

filenames. The long name signature needs to be checked, and the short name needs to be checksumed

to validate the long name. The long name itself may also be discontiguous if it spans 2 directory

entries. It is also possible for a long name to spans across a disk sector boundary. Despite these

performance concerns the impact is not as bad as implementing long names through a hidden file.

There are also remote possibilities that a deleted non-long name directory entry contains a valid

signature, which would be misdetected as a long name entry. The checksum of the 8.3 name also has

the possibility of matching when it should not. These are both possible cases that could exist,

however they are very rare and would simply result in a long name incorrectly being associated with

an 8.3 directory entry.

This architecture allows both new and old format directory entries to be intermixed in the same

directory. It would be nice NOT to support this intermixing for performance and code complexity

reasons, but this is required anyway to allow support of old disk utilities and changes made to the

disks by down level systems. It also saves space in the directory entry. This is important in the root

directory since it is fixed in length and cannot be dynamically grown.

3.2. Automatic Name Generation

The name space used for long and short names is considered to be a single name space such that

conflicts will not occur between long and short names. In other words, a long name for a file may not

contain the same name, ignoring case, as the short name in a different file. This prevents confusion

for the user regarding the name of a file. To handle this, when ever a file name is created or changed

the alternate name is automatically generated for the file. If a long name API is used, then the short

name will be automatically generated. If an existing short name API is used, then the long name will

be automatically generated.

3.2.1. The basic generation algorithm

The basic generation algorithm has two parts. The first part is to generate a starting name or basis,

and the second part is to modify the basis until the name is unique.

For example, if the user specified a long name of "LONGNAMEFILE.EXE" the closest corresponding

8.3 name will be generated with the numeric value "-1" added to use as a basis. In this example the

8.3 basis of the long name is "LONGNA-1.EXE". If the file name "LONGNA-1.EXE" already exists

in the directory, then the second part of the algorithm is used to modify the name until it is unique. In

this example, "LONGNA-1.EXE" will be changed to "LONGNA-2.EXE", and if that is not unique

then "LONGNA-3.EXE", and so forth.

The basis name that is tried always contains the numeric value, starting with "-1". This is done to

prevent auto-generated names from using up the more commonly used 8.3 name space. This also

makes it somewhat easier to identify auto generated 8.3 names.

3.2.2. File name extensions

An important goal in the automatic name generation strategy is to preserve the file extensions as much

as possible. For example, if a long file name ends in ".EXE" then the 8.3 name will also end in

".EXE". Note that long names can have multiple extensions (defined by having multiple periods in

the file name), but even then an attempt will be made to make some sense out of the extension.

3.2.3. Generating long names from 8.3 names

When an 8.3 name API creates a file, the long name is always defined as being the 8.3 name. If a

conflict exists an error will be returned and the creation of the file will fail. Note this will only occur

if an 8.3 name already exists with the same name. Existing applications will always be able to locate

the file that is in conflict.

3.2.4. Generating 8.3 names from long names

When a long name API creates a file, the long name is first checked to see if it conforms to a valid 8.3

format, that is, ignoring case. If it does conform, then the long name is defined as the 8.3 name. If a

conflict exists an error will be returned and the creation of the file will fail. This allows the creation of

an 8.3 name to always be able to create a long name that is the same as the 8.3 name. Note that the

file name causing the long name API to fail may be an 8.3 file name.

The following steps are used to form an 8.3 basis from a long name.

1. Remove all spaces. For example "My File" becomes "MyFile".

2. Initial periods, trailing periods, and extra periods prior to the last embedded period are removed.

For example ".logon" becomes "logon", "junk.c.o" becomes "junkc.o", and "main." becomes "main".

3. Translate all illegal 8.3 characters into "_" (underscore). For example, "The[First]Folder" becomes

"The_First_Folder".

4. If the name does not contain an extension then truncate it to 6 characters. If the names does contain

an extension, then truncate the first part to 6 characters and the extension to 3 characters. For

example, "I Am A Dingbat" becomes "IAmADi" and not "IAmADing.bat", and "Super Duper

Editor.Exe" becomes "SuperD.Exe".

5. If the name does not contain an extension then a "-1" is appended to the name. If the name does

contain an extension, then a "-1" is appended to the first part of the name, prior to the extension. For

example, "MyFile" becomes "MyFile-1, and "Junk.bat" becomes "Junk-1.bat". This numeric value is

always added to help reduce the conflicts in the 8.3 name space for automatic generated names.

6. This step is optional dependent on the name colliding with an existing file. To resolve collisions

the decimal number, that was set to 1, is incremented until the name is unique. The number of

characters needed to store the number will grow as necessary from 1 digit to 2 digits and so on. If the

length of the basis (ignoring the extension) plus the dash and number exceeds 8 characters then the

length of the basis is shortened until the new name fits in 8 characters. For example, if "FILENA-

1.EXE" conflicts the next names tried are "FILENA-2.EXE", "FILENA-3.EXE", ..., "FILEN-

10.EXE", "FILEN-11.EXE", etc.

3.2.5. Handling international character sets

The long name character set will always be ANSI. This stays compatible with windows applications

where everything is ANSI. In the future this will be expanded to Unicode, which is a super set of

ANSI. The short name character set will always be OEM. This stays compatible with MS-DOS and

existing windows applications that store 8.3 file names using the OEM character set.

The problem with the OEM code page is when it converts from lower to upper case it maps multiple

extended characters to a single upper case character. This creates problems because it does not

preserve the information that the extended character provides. It also prevents the creation of some

file names that are different, but because of the mapping to upper case they become the same file

name.

The ANSI code page solves this problem by always providing a translation for lower case characters

to a single and separate upper case character. It would be nice to use ANSI for both long and short

names, but this creates problems with MS-DOS applications, which expect the short names to be in

the OEM code page. If we used ANSI for short file names, the MS-DOS application would no longer

be able to properly display many of the characters in these file names since the OEM code page does

not contain a number of the upper case characters that exist in ANSI. This means file names

containing these ANSI characters would be displayed as some strange OEM code page symbol.

The mapping of short names to long names is not a problem for normal text characters. Since all

upper case OEM characters can be mapped to the proper character in the ANSI character set, the short

names can always be mapped and displayed using the ANSI character set. If extended graphics

characters are used in the short file name then the mapping will not be as consistent. These characters

are not commonly used in file names and no special handling is provided for them.

The automatic generation of short names from long names creates more problems. Conflicts can

occur when different long names that fit within the 8.3 format contain extended characters. To

prevent these conflicts in the mapping of long ANSI names to short OEM names the -number (-1,-

2,...) will be added to a short filename when the long name contains an extended character that maps

to a character below 128.

To make this more generic, if the long name does not map directly to the short name such that each

character has a valid upper case translation, then the -number will be appended to the short naame.

This addition of the -number takes place even if the name would otherwise fit within a valid 8.3

format. This helps prevent collisions in the short name space as a result of the mapping. This

includes the mapping of invalid characters to an underscore, which will result in the same type of

conflict.

This use of the -number is consistent with the handling for long names that do not fit in the 8.3 space.

The use of extended characters simply forces the generation of a short name to follow the same rules

as names that don't fit in the 8.3 space, even if this would otherwise not be the case.

The mapping of the extended characters follows the same process that is uses today in Windows 3.1 to

map ANSI to OEM. This remains consistent with what Windows has done in the past.

This allows long names to preserve the extended character information, and always map to a nonconflicting

short name that can be displayed using the OEM character set.

Note this is not consistent with the NTFS solution, which maps all extended characters to an

underscore in the short name. The NTFS file system handles the mapping of extended characters from

long to short names by treating any character greater than 127 as an invalid character. The creation of

a short name converts all invalid characters in to an underscore '_'. This prevents them from having

any knowledge of code pages in the translation of characters. It also does not solve the problem

caused by mapping multiple characters into a single character.

3.3. Search Operations

For short name APIs, the passed in name is in the OEM character set. This name is converted to

upper case before it is used in a search operation in the short name space. The long name space is not

searched for short name APIs.

For long name APIs, the passed in name is in the ANSI character set. A long name API search

operation must cover both the long and short name spaces. The long-name find API will return both

the long and short name in the result buffer. The returned long name will be in the ANSI character set

and the short name will be in the OEM character set.

To handle a long name search for a file with extended characters, two conversions will take place on

the passed in name. For searching the long name entries the name will be converted to upper case

using the ANSI character set. Each entry long name directory entry will also be converted to upper

case for the comparison. For searching the short name entries the name will first be converted to

upper case using the ANSI character set, then it will be converted from ANSI to the OEM character

set. If the conversion to the OEM character set results in a mapping of an extended character to a

character below 128, then the short names will not be searched. This is because the name being

searched for cannot exist in the OEM character set.

For short file names that are stored as only a short name and containing extended characters, the long

name find API will convert the short name to ANSI and return the result as the long name for the file.

This conversion from OEM to ANSI will be the same as that done in Windows 3.1.

3.4. MS-DOS APPLICATION Support of Long Names

MS-DOS applications will not be prevented from using the long name APIs. However MS-DOS

applications may not be able to properly display a long file name. This occurs because long file

names use the ANSI character set and may contain ANSI characters that cannot be properly displayed

using the OEM character set. The MS-DOS applications must be aware of this and deal with these

ANSI file names as they see appropriate.

This is also a reason that we should not provide long name support in the MS-DOS command line

utilities that we provide with the system.

3.5. Effect of Down Level Systems

The support of long file names is most important on the hard disk, however it will be supported on

removable media as well. The proposed architecture provides support for long names without

breaking compatibility with the existing FAT format. A disk can be read by a down level system with

out any compatibility problems. An existing disk does not go through a conversion process before it

can start using long names. All of the current files remain unmodified. The long name directory

entries are added when a long name is created. The addition of a long name to an existing file may

require the 8.3 directory entry to be moved if the required adjacent directory entries are not available.

The long name entries are as hidden as hidden or system files are on a down level system. This is

enough to keep the casual user from causing problems. The user can copy the files off using the 8.3

name, and put new files on without any side effects.

The interesting part of this is what happens when the disk is taken to a down level MS-DOS system

and the directory is changed. This can affect the long name entries since the down level system

ignores these long names and will not insure they are properly associated with the 8.3 name.

A down level system will only see the long name entries when searching for a label. On a down level

system, the volume label will be incorrectly reported if the true volume label does not come before all

of the long name entries in the root directory. This is because the long name entries also have the

volume label bit set. This is unfortunate, but not considered a critical enough problem considering the

alternatives.

If an attempt is made to remove the volume label, one of the long name directory entries may be

deleted. This would be a rare occurrence, but should be easy to detect on the Windows system, the

long name entry will not be a valid file entry, since it will be marked as deleted. If the deleted entry is

reused, then the attribute byte will not have the proper value for a long name entry. The down side to

this is a subsequent attempt to create a volume label will fail if the true volume label still exists. The

label command could run into this when it renames a volume label, which would be confusing to the

user.

If a file is renamed on a down level system, then only the short name will be renamed. The long name

will not be affected. Since the long and short names must be kept consistent across the name space, it

is desirable to have the long name become invalid as a result of this rename. The checksum, or CRC,

of the 8.3 name that is kept in the long name directory provides the ability to detect this type of

change. This checksum will be checked to validate the long name before it is used. Rename will

cause problems only if the renamed 8.3 file name happens to have the same checksum. We need to

carefully evaluate the checksum method we use. The "duplicate checksum" frequency is critical to

how frequent this bad behavior could be.

This rename of the 8.3 name must also not conflict with any of the long names. Otherwise a down

level system could create a short name in one file that matches a long name, when case is ignored, in a

different file. To prevent this, the automatic creation of an 8.3 name from a long name that follows

the 8.3 format will directly maps the long name to the 8.3 name by converting the characters to upper

case.

If the file is deleted, then the long name is simply abandon. If a new file is created, the long name

may be incorrectly associated with the new file name. As in the case of a rename the checksum of the

8.3 name will help prevent this incorrect association.

3.6. Affect on Existing Disk Utilities

Chkdsk on a down level system does not complain about these long names on a disk and finds nothing

wrong with them.

The following are some evaluations that have been done using PC-TOOLS and NORTON to

understand the effect they have on a disk containing long file names.

SPEEDDISK sorts directories without paying any attention to the "extra volume labels". The fact that

this moves the LFN entries is the only problem with this. This is the reason for the signature bytes in

the extensions to identify the extension order so that the directory can be automatically fixed by

Windows chkdsk in all cases where there are not checksum collisions in the directory. Without the

ordering of the signatures, the USER will have to be asked to group the extensions in the proper order

for all files with more than one LFN extension.

COMPRESS directory sort seems to be totally disabled by this. It compresses the disk and leaves the

directories completely alone. Thus running COMPRESS on an LFN disk seems to have no side

effects.

NDD does not like the file size field being non-0 when the first cluster pointer is 0 and wants to

overwrite the file size field with 0. To handle this the unused bytes of the extension entries are nonzero

padded, this allows this mod to be detected by the Windows chkdsk and the user asked to help fix

it. This is the only thing that NDD seems to be upset about.

DISKFIX has the same behavior as NDD regarding the file size field, plus it gets very upset when

certain of the 10 reserved bytes are non zero. In the case where one of the 10 reserved bytes it cares

about has something non-zero in it, it COMPLETELY erases the entry. It writes E5 in the first byte,

and 0 in all other bytes. For this reason, the check sum byte is placed in this reserved area. This way

the behavior of DISKFIX will become more predictable, it will totally erase all LFN extension entries.

This is bad, but its going to do something bad no matter how this is handled. It is preferable to have a

predictable bad behavior than an unpredictable one.

CPS MS-DOS Anti-Virus and McAfee SCAN do not think these disks are infected.

None of these applications seems to care about the extensions having MS-DOS 5.00 illegal file name

characters in them. The invalid characters are totally ignored. UNDELETE may have a different

opinion.

OTHER INTERESTING INFORMATION COLLECTED:

Set the directory bit is bad. Downlevel CHKDSK reports all the extensions as invalid sub directories.

NDD is upset and turns off some of the DIR attribute bits (not all of them interestingly?!?!?!?!?).

DISKFIX will erase all of the LFN extensions regardless of whether the 10 reserved bytes are nonzero.

COMPRESS bails: Insufficient memory. SPEEDDISK unaffected. Not setting the VOLUME

LABEL bit is bad. Downlevel CHKDSK will find the extension files and complain about the fact that

the file size field is wrong and want to 0 it.

Non zero values in First Cluster is very bad. If what is in there happens to look like a cross link, very

evil things start happening. NDD got very upset and changed one of the 8.3 entries to 0 size (because

it was cross linked), the data became "lost" and was assigned to an NDD "lost data" file.

SPEEDDISK and COMPRESS hung, or reported disk trashed and bailed. CHKDSK and DISKFIX

were unaffected.

3.7. Network and Other File System Support

Long file names will be supported in the protected mode FAT file systems in Windows. However,

other file systems may be accessed by Windows that might not support long file names. The most

common case will be network resources. There are three cases that must be handled here, systems

with 8.3 name only, systems with long names only, and system with both long and 8.3 names.

A file system that only supports 8.3 names, such as CD-ROM, handles the 8.3 name APIs with out an

issue. When a long name API is used to access a file, the API will be mapped into its corresponding

8.3 API. This mapping process will validate the file name to insure that it is a valid 8.3 name. If the

name is longer than 8.3, an error will be returned to the caller. The long name will not be passed to

the file system since it would truncate the name and not return an error. The IFSMGR must know if a

file system driver handles long name or not in order to provide this support. The IFSMGR is

responsible for making sure that long names are never passed to a file system that cannot handle them.

This may require a new "Get FSD Info" function to provide this information to a long name aware

application. This will allow long name aware applications to adjust their behavior when running on

non long name file systems.

The HPFS file system is an example of a file system that only supports long name. The results of an

8.3 name API request to a long name only file system will depend on the particular file system. For

HPFS all valid 8.3 file names will be handled as expected. These APIs will be able to access all of the

files that have a valid 8.3 file name. The long names, those that do not fit in the 8.3 format, will not

be visible through these APIs. The long name APIs will be able to access all files on the file system.

The one difference will be in the handling of the long name find API. The Win32 find API returns

both the long and 8.3 name for a file. On a long name only file system the 8.3 file name is not

defined, therefore the Win32 find API must allow the returned short name to be undefined. This is not

a change in the Win32 find API return structure, it simply means that a null string may be returned for

the short name.

File systems that support both long and 8.3 file names are not an issue, except for networked devices.

The current LANMAN SMB protocol does not support returning both long and 8.3 names on a find

operation. This is required for the Win32 find API. To support this the transact2 SMB protocol will

be used with a new function defined.

3.8. FAT File Name Character Set

A valid FAT file system file names has the following form:

o [\][directory\]filename[.extension]

The directory parameter specifies the directory that contains the file's directory entry. The directory

may be made up of multiple components, each being separated by a backslash (\). The last directory

component must be followed by a backslash (\) to separate it from the filename. There are two special

directory components, (.) and (..). The director component (.) represents the current directory. The

directory component (..) represents the directory one level up (closer to the root). If the specified

directory is not in the current directory, the directory must include the names of all directories in the

path, separated by backslashes. The root directory is specified by using a backslash at the beginning

of the name. For example, if the directory "abc" is in the directory "sample" and "sample" is in the

root directory, the correct directory specification is "\sample\abc".

In an 8.3 file name, the directory name and file name consist of up to 8 characters followed by an

optional period (.) and extension of up to 3 characters. The characters may be any combination of

letters, digits, or the following special characters:

o $ % ' - _ @ ~ ` ! ( )

In a long file name, the directory and file name components consist of up to 48 characters. The

characters may be any combination of those defined for 8.3 names with the addition of the period (.)

character.

The following 7 characters : + , ; = [ ] are legal in a long name but illegal in 8.3 names. A space is

also a valid character in a long name, it always has been for 8.3 name also however it just does not get

used.

4. Data Structures Description

The layout of a long name directory entry appears as follows.

Last Long Name Directory Entry

. . .

2nd Long Name Directory Entry

1st Long Name Directory Entry

Existing 8.3 Directory Entry

Note that the number of long name directory entry will depend on the length of the long name.

4.1. Long Name Directory Entry Structure

LNDIRENT STRUCT

dir_lname1 DB 10 DUP (?) ; long name string

dir_sig DB ? ; signature byte

dir_attr DB ? ; file attributes

dir_flags DB ? ; flags byte (TBD)

dir_chksum DB ? ; checksum of 8.3 name

dir_lname2 DB 12 DUP (?) ; long name string

dir_first DW ? ; first cluster number, must be 0

dir_lname3 DB 4 DUP (?) ; long name string

LNDIRENT ENDS

5. Design Overview

TBD.

6. Exported Interfaces

The following APIs need to be provided to support long filenames.

Int 21 file attributes function

Int 21 file delete function

Int 21 file dir function (make, remove, change, get)

Int 21 file find function

Int 21 file open/create function

Int 21 file rename function

These APIs will be supported through a single Int 21h function. The original idea was to pass the

parameters in a structure. This has been changed to look just like the current Int 21h interfaces, with

the existing function number placed in AL. These functions are the same as their existing counterpart,

with the exception that they will accept and return long names as appropriate.

The Find APIs are an exception. The format of the search result buffer is changed to follow the

Win32 Find API result buffer. The Find APIs also return and uses a handle, the same as the Win32

APIs, to identify the search that is in progress.

This solution greatly simplifies the work required to support these functions in the system. The long

name functions that are passed on to existing file system can be easily mapped into the 8.3 APIs. The

translation of these parameters in to the protected mode file system interface already exists for the 8.3

APIs. With this interface the long-name APIs can be handled with the same mapping code.

Long Filename Specificationn—Page 17

Coexistence with FAT12, FAT16 & FAT32

Depending on the length of the long filename, the system will create a number of invalid 8.3 entries in the Directory Table, these are the LFN (Long Filename) entries. These LFN entries are stored with the with the last LFN entry topmost, and the first LFN entry just above a valid Directory Entry. So when looked upon from the top and down, the Directory Table looks something like this:

Directory Example
Entry Nr.	Without LFN Entries	With LFN Entries
...	...	...
n	Normal 1	Normal 1
n+1	Normal 2	LFN for Normal 2 - Part 3
n+2	Normal 3	LFN for Normal 2 - Part 2
n+3	Normal 4	LFN for Normal 2 - Part 1
n+4	Normal 5	Normal 2
n+5	Normal 6	Normal 3
...	...	...

7. Issues

File types will be defined as the last extension in the file name. How these are presented to the user

will be determined by the shell.

The "HACK". Since the manipulation of a new LFN file by an old application will often end up

deleting the LFN, we may need to consider implementing the following HACK. This HACK is

targeted specifically at Word-processors/Editors:

Many existing Word Processors implement file editing as follows: Make a copy of the file

under a TEMP name. Edit the TEMP name. When USER says SAVE, delete the original file,

and rename the temp file to the file name that was just deleted. It would be nice if this

standard Word Processor/Editor scenario didn't LOOSE the LFN. We will implement this as

follows. When a compatibility 8.3 DELETE or RENAME call is made on a file that has an

LFN, the DELETED or RENAMED LFN and 8.3 info is cached in a per task cache. If the

next INT 21h call that is made is a compatibility 8.3 rename of a file to the same 8.3 name as

is stored in the cache, the cached LFN info is added to the just renamed file and the cache is

invalidated. If any other INT 21h call is made (ie. not 8.3 rename) the cache is invalidated.

We will need to look carefully at the implementations in all of the major Windows and MS-DOS word

processors of this to make sure there is no intervening INT 21h call that is made between the DELETE

and the RENAME. We will gain HIGH user benefit if we can get this hack to work for most of the

Word Processors (WIN and PC WORD, WORDPERFECT, AMI-PRO).