Technical Notes - Win32, COM

1. Essential COM - by Don Box.
2. Inside ATL COM - by Shepherd.
3. Programming Applications for Microsoft Windows - by Jeffrey Richter.

1. Essential COM - by Don Box

Chap1: COM as a better C++
The main aim is to build UDTs (User Defined Types) that can be reused.
Libraries:
Libraries were the precursors to COM. They were initially distributed as source code, later as DLLs or LIBs.
class __declspec(dllexport) MyClass { ... } exports "MyClass" and its methods through a library (e.g. MyClassLib). All the methods of "MyClass" are added to an exports list, allowing runtime resolution of each method to its address. Additionally, the linker produces an import library containing the symbols of the "MyClass" methods and the name of the DLL where the symbols are present.
The client links against the import library. "Stubs" are added to the executable binary that tell the loader at runtime to load the corresponding DLL.
Why can't libraries (DLLs) be used for perfect code reuse across all compilers?

1. C++ lacks standardization at the binary level. Because of this, a library built with one compiler will not be
    compatible with binaries built by other compilers.


Name Mangling:
To allow function and operator overloading, every entry point in the program is identified by a unique symbol derived from its name, the number and types of its arguments, etc. This makes it possible to have multiple functions with the same name. This is called "name mangling".

2. Name mangling differs between compilers.
The "name mangling" scheme differs from vendor to vendor.
One way of overcoming this problem is to use alias names for the overloaded names in the DEF (exports) file.

3. Certain language features are implemented in a vendor-specific manner and hence are incompatible, e.g. exception handling.

4. Versioning problem, since C++ does not support encapsulation at the binary level.

A solution closer to COM is to define the binary interfaces as abstract base classes. To make binary compatibility possible, the following points are considered.
1. C-style structures can be made binary compatible.
2. Argument passing can be made uniform across all compilers.
3. All compilers can implement the virtual function call mechanism equivalently (particularly for abstract base classes).

Vtables and vptrs:
The compiler generates and uses vtables and vptrs in the binary layout of the class.

- The compiler silently generates a static array of function pointers for each class that has virtual functions. This array is called the "vtable".
- Each instance (object) of such a class contains an invisible member (inserted by the compiler) called the vptr. The vptr is automatically initialized in the constructor to point to the vtable of the object's class.
  The compiler writes some hidden code in the constructor for this purpose.
- When a virtual function is called, the object's vptr is dereferenced to reach the vtable. The function's index in the vtable is then used as an offset, which gives
  the exact function pointer to invoke.

Hence we separate "interfaces" from "implementations" for binary compatibility. The next problem to solve is how clients create and delete these objects.
For creating/deleting the objects we need to export a separate global function that is not part of the interface. It has to be exposed through the library.
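The separation described above can be sketched in portable C++. This is only an illustration under assumptions: the names IFastString, FastString and CreateFastString are chosen for the example, and a real DLL would additionally mark the factory function for export.

```cpp
#include <string>

// Interface: an abstract base class with only pure virtual functions.
// Clients depend on nothing but the vtable layout.
struct IFastString {
    virtual void Delete() = 0;                        // object destroys itself
    virtual int Length() const = 0;
    virtual int Find(const char* needle) const = 0;   // -1 if not found
};

// Implementation: hidden inside the library, so its data members
// can change without breaking clients.
class FastString final : public IFastString {
    std::string text_;
public:
    explicit FastString(const char* s) : text_(s) {}
    void Delete() override { delete this; }
    int Length() const override { return static_cast<int>(text_.size()); }
    int Find(const char* needle) const override {
        std::string::size_type pos = text_.find(needle);
        return pos == std::string::npos ? -1 : static_cast<int>(pos);
    }
};

// The single global creation function. extern "C" suppresses name
// mangling, so every compiler sees the same symbol name.
extern "C" IFastString* CreateFastString(const char* s) {
    return new FastString(s);
}
```

The client never calls new or delete on the implementation class itself; it creates through the global function and destroys through the interface's Delete() method.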

An interface developed under the above assumptions is not extensible, i.e. it cannot be altered once published.
If we want to change the interface we need to provide a second version, "MyInterface2", that inherits from "MyInterface".
But this raises a problem: RTTI is not uniform across compilers, so it cannot be relied on to identify the object correctly.
We also need reference counting to keep object creation, deletion and references synchronized.
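A minimal sketch of how interface discovery without RTTI and reference counting fit together. It is a simplification under assumptions: interfaces are identified here by plain strings and all names (IBase, MyObject, CreateMyObject) are hypothetical, whereas real COM uses IUnknown with GUIDs and HRESULT return values.

```cpp
#include <cstring>

struct IBase {
    virtual void* QueryInterface(const char* name) = 0;  // nullptr if unsupported
    virtual unsigned long AddRef() = 0;
    virtual unsigned long Release() = 0;
};
struct IMyInterface  : IBase        { virtual int Version() = 0; };
struct IMyInterface2 : IMyInterface { virtual int Extra() = 0; };

class MyObject final : public IMyInterface2 {
    unsigned long refs_ = 1;          // the creator holds the first reference
public:
    static int liveObjects;           // demonstration only
    MyObject()  { ++liveObjects; }
    ~MyObject() { --liveObjects; }
    // The object itself answers "do you support this interface?",
    // so no compiler-specific RTTI is needed.
    void* QueryInterface(const char* name) override {
        void* p = nullptr;
        if (std::strcmp(name, "IMyInterface2") == 0)
            p = static_cast<IMyInterface2*>(this);
        else if (std::strcmp(name, "IMyInterface") == 0)
            p = static_cast<IMyInterface*>(this);
        if (p) AddRef();              // every pointer handed out is counted
        return p;
    }
    unsigned long AddRef() override { return ++refs_; }
    unsigned long Release() override {
        unsigned long left = --refs_;
        if (left == 0) delete this;   // the last reference destroys the object
        return left;
    }
    int Version() override { return 2; }
    int Extra() override { return 42; }
};
int MyObject::liveObjects = 0;

extern "C" IMyInterface* CreateMyObject() { return new MyObject; }
```

The counter is not thread safe in this sketch; a real implementation would update it atomically.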

Chap 2: Interfaces
A separate language for interfaces is needed to bring commonality across all platforms.
IDL - Interface Definition Language. A C-like language that allows RPC calls to be described in a language-neutral manner.
The underlying communication between COM components is MS-RPC (Microsoft RPC).

 When we compile an IDL file (say dog.idl) using MIDL.exe we get five output files: dog.h, dog_i.c, dog.tlb, dog_p.c and dlldata.c. The header file has abstract class definitions for the interfaces.
 It also has C-compatible structures defined for the same interfaces, to allow C programs to use/implement the interfaces.
         

2. Inside ATL COM - by Shepherd.

Visual Basic Control Creation Edition provides support for developing COM components.

COM interfaces are defined using IDL (Interface Definition Language) in a .idl file. Compiling this file with the MIDL compiler gives you a set of .c and .h files.

An interface is defined as an "I<Interface>" class in the .h file (e.g. ISpellChecker).

One or more such interfaces are implemented by concrete classes called "COM classes" (e.g. class CSpellChecker : public ISpellChecker, public IPersistMFU).

A "COM class object" is a meta-class that creates instances of a "COM class" and manages their lifetime. The class object is global and static (to the COM server in which it exists).
The class object does this work by implementing the "IClassFactory" interface.

Class objects are singletons. This single instance takes responsibility for creating/maintaining "COM class" instances (COM objects).

Methods for a DLL COM Server:
DllRegisterServer, DllUnregisterServer, DllGetClassObject, and DllCanUnloadNow.

A COM DLL maintains a server-wide reference count. When this count reaches zero, no COM objects from this server are in use. COM calls "DllCanUnloadNow", which checks this value, before unloading the DLL (calling FreeLibrary).
COM DLLs have an "InprocServer32" registry entry that contains the path of the DLL. The SCM looks up this registry key and loads the DLL into memory.

COM EXEs have a "LocalServer32" registry key that contains the path of the EXE. Call sequence: CoInitialize() -> CoRegisterClassObject() -> Windows message loop -> at the end CoRevokeClassObject() -> CoUninitialize().

Apartment: COM objects and threads live in apartments. An STA (Single Threaded Apartment) contains exactly one thread, and the objects living in it are called only on that thread.

         

3. Programming Applications for Microsoft Windows - by Jeffrey Richter.


1. Error Handling
Common return types of Win32 APIs:
VOID, BOOL, HANDLE, PVOID, LONG/DWORD.

When a Win32 function detects an error, the 32-bit error code is stored in thread-local storage (TLS) for the calling thread. Hence each thread's error code is maintained individually.

Error code bit layout: 31-30 - severity; 29 - (0 = Microsoft, 1 = customer); 28 - reserved; 27-16 - facility code; 15-0 - exception code.
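The bit layout above can be packed and unpacked with shifts and masks. A minimal sketch; ErrorFields, DecodeError and MakeError are hypothetical helper names, not Win32 APIs.

```cpp
#include <cstdint>

struct ErrorFields {
    unsigned severity;   // bits 31-30
    bool     customer;   // bit 29 (false = Microsoft, true = customer)
    unsigned facility;   // bits 27-16
    unsigned code;       // bits 15-0
};

// Split a 32-bit error value into its fields (bit 28 is reserved and ignored).
constexpr ErrorFields DecodeError(std::uint32_t value) {
    return ErrorFields{
        (value >> 30) & 0x3u,
        ((value >> 29) & 0x1u) != 0,
        (value >> 16) & 0xFFFu,
        value & 0xFFFFu
    };
}

// Build a 32-bit error value from its fields.
constexpr std::uint32_t MakeError(unsigned severity, bool customer,
                                  unsigned facility, unsigned code) {
    return ((severity & 0x3u) << 30)
         | (static_cast<std::uint32_t>(customer) << 29)
         | ((facility & 0xFFFu) << 16)
         | (code & 0xFFFFu);
}
```

For example, DecodeError(0xC0000005) yields severity 3, customer false, facility 0 and code 5.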

"@err,hr" in the debugger's watch window shows the current thread's last error code.

GetLastError, SetLastError, FormatMessage (gets the message string for an error code).

2. Unicode
DBCS - Double Byte Character Set. Characters are 1 or 2 bytes wide, i.e. a character may be either 1 byte or 2 bytes long.
Unicode - founded in 1988 by Apple and Xerox. All characters are 2 bytes long.
65,536 characters are possible.
- enables easy data exchange between languages.
- a single binary can support all languages.
Win2K is built on Unicode. All internal Win32 calls use/return Unicode. It also supports ANSI: if we pass an ANSI string as input to a Win32 API, it is converted to Unicode internally and then used.

Win98 supports ANSI. Writing a Unicode app is possible but difficult.

WinCE supports only Unicode.

COM uses only Unicode.

The C runtime library provides support for Unicode strings in "string.h". A Unicode character is defined as
typedef unsigned short wchar_t;

The library provides Unicode replacements for the string functions, e.g. strlen -> wcslen, with _tcslen as the generic name that maps to either.

Use the TCHAR facility (tchar.h) to support both. With #ifdef _UNICODE and the generic macros, the same source code compiles to either the Unicode or the ANSI version.
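The generic-text idea can be sketched with a few macros. The names below (MY_UNICODE, my_tchar, MY_TEXT, my_tcslen) are hypothetical stand-ins written for this example; the real definitions live in tchar.h and are keyed on _UNICODE.

```cpp
#include <cstring>
#include <cwchar>

#ifdef MY_UNICODE
typedef wchar_t my_tchar;                       // wide build
#define MY_TEXT(s) L##s                         // turns "abc" into L"abc"
inline std::size_t my_tcslen(const my_tchar* s) { return std::wcslen(s); }
#else
typedef char my_tchar;                          // ANSI build
#define MY_TEXT(s) s
inline std::size_t my_tcslen(const my_tchar* s) { return std::strlen(s); }
#endif

// Written once; compiles for either character width.
inline std::size_t GreetingBytes() {
    const my_tchar* greeting = MY_TEXT("hello");
    return my_tcslen(greeting) * sizeof(my_tchar);
}
```

Defining MY_UNICODE on the compiler command line switches every string and every call in the source to the wide version without touching the code.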

Like the C runtime library, the OS also provides string manipulation functions such as StrCat, StrChr, StrCmp and StrCpy (all support Unicode).

3. Kernel Objects
Objects created and managed by the kernel when we invoke certain Win32 APIs.
Process, thread, semaphore, mutex, I/O completion port, file, pipe, token, waitable timer, mailslot, event, job objects, etc.
The kernel allocates a block of memory (a data structure) for the object. It can only be accessed by the kernel.
We use kernel objects through their handles. Handles are process-relative, but the objects they refer to can be shared across process boundaries.
Kernel objects are protected by security descriptors (their creation APIs take a security attributes parameter).

User objects and GDI objects are menus, windows, dialogs, cursors, brushes and fonts. These objects are not protected by security descriptors.

Handle table of a process:
- each process has a handle table allocated by the system during its initialization.
- It is an array of data structures; the fields of each row are
 Index--PtrToObjMemory--AccessMask--Flags.
- whenever a kernel object is created in the process, an entry for that object is put in the table.
- the handle value returned is the index number in the table.
- a global usage count is maintained by the system for each kernel object. If the object is referenced by another process, its usage count is incremented.
- CloseHandle() decrements the usage count of the kernel object. When the count becomes 0 the object is deleted.
- Handle leaks occur if CloseHandle() is not called properly. But when the process terminates, the system cleans up the handle table.

Sharing kernel objects across process boundaries:
- file-mapping objects are used to share data across processes.
- mailslots and pipes are used to send data blocks between processes, including across a network.
- mutexes, semaphores and events allow thread synchronization within and across processes.

Three ways of sharing: 1. object handle inheritance (bInheritHandle flag in the creation API), 2. named objects and 3. DuplicateHandle().
Object handle inheritance:
- allowed between parent and child processes.
- the parent process must mark a handle as inheritable when creating the object. This sets a bit in the "Flags" field of the object's entry in the handle table.
- When the parent creates the child process (CreateProcess()), it has to set the bInheritHandles parameter to TRUE. Then, during child process initialization, the system creates the child's handle table and copies the inheritable handles (by checking the Flags field) to the same index in the child's handle table, incrementing each object's usage count.
- If a new object is created in the parent process after the child process has been created, that handle is not inherited (because handle inheritance happens only at the child's initialization time).
- The child process cannot tell whether a handle entry it has was inherited from its parent or is its own.
- Inherited handle values are normally passed to the child through command-line arguments. The same handle value is valid in the child because the handle table entry is copied to the same index in the child's handle table.
Another way is to send the value in a message (PostMessage) to a window in the child process.
A third way is through environment variables set in the parent process. Environment variables are also inherited by the child, so the child process gets the handle values.
- To alter a handle's flags use SetHandleInformation(); to read them use GetHandleInformation().

Named Objects:
Some kernel objects can be given a name at creation, e.g. the "szName" parameter of CreateMutex().
This allows the object to be shared between any two processes (not necessarily parent and child).
e.g. ProcessA creates a mutex named "mutexhcm". ProcessB calls CreateMutex() with the same name. The system finds that a kernel object with this name already exists and, instead of creating a new object, refers ProcessB to the existing object. It adds a new handle table entry in ProcessB (at an empty location) and increments the object's usage count. The two handles to the object are different and are valid only within their own process.

ProcessB can call OpenMutex() instead of CreateMutex() to get a handle to the object. CreateMutex() creates the mutex if it does not already exist, whereas OpenMutex() fails if it does not exist.
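The bookkeeping described in these notes (per-process handle tables, a system-wide name table and a usage count per object) can be modelled in a few lines. This is a toy model, not the Win32 API: Kernel, Process, CreateNamed, OpenNamed and Close are hypothetical names, and handle values are simply table index + 1.

```cpp
#include <map>
#include <memory>
#include <string>
#include <vector>

struct KernelObject { std::string name; int usageCount = 0; };

// System-wide table of named objects (the kernel's view).
struct Kernel {
    std::map<std::string, std::shared_ptr<KernelObject>> named;
};

// One process with its own handle table.
class Process {
    Kernel& kernel_;
    std::vector<std::shared_ptr<KernelObject>> table_;
    // Put the object in the first empty slot and bump its usage count.
    int Insert(std::shared_ptr<KernelObject> obj) {
        ++obj->usageCount;
        for (std::size_t i = 0; i < table_.size(); ++i)
            if (!table_[i]) { table_[i] = obj; return static_cast<int>(i) + 1; }
        table_.push_back(obj);
        return static_cast<int>(table_.size());
    }
public:
    explicit Process(Kernel& k) : kernel_(k) {}
    // Models CreateMutex(name): create, or attach to the existing object.
    int CreateNamed(const std::string& name) {
        auto& slot = kernel_.named[name];
        if (!slot) { slot = std::make_shared<KernelObject>(); slot->name = name; }
        return Insert(slot);
    }
    // Models OpenMutex(name): returns 0 if no such object exists.
    int OpenNamed(const std::string& name) {
        auto it = kernel_.named.find(name);
        return it == kernel_.named.end() ? 0 : Insert(it->second);
    }
    // Models CloseHandle(): the object goes away when the count reaches 0.
    bool Close(int handle) {
        if (handle < 1 || handle > static_cast<int>(table_.size()) || !table_[handle - 1])
            return false;
        auto obj = table_[handle - 1];
        table_[handle - 1].reset();
        if (--obj->usageCount == 0) kernel_.named.erase(obj->name);
        return true;
    }
    int UsageCount(int handle) const { return table_[handle - 1]->usageCount; }
};
```

Two Process instances that call CreateNamed("mutexhcm") end up with different handle values that refer to the same object, which is the behaviour the notes describe.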

Terminal Server Name spaces:
Terminal Server has a separate namespace for each session it handles. Named kernel objects have session-level scope, i.e. two sessions can each have a kernel object with the same name.
Service applications in a Terminal Server have global scope. We can create a named object with global scope by prefixing the name, as in "Global\\mutexhcm".

DuplicateHandle:
- Normally involves three processes: the source process (where the object was created), the target process (which receives the handle) and the catalyst process (which calls the DuplicateHandle() function).
- can be used between two processes too.

Chap4. Processes
Process components: 1. A "process" kernel object,
2. An address space that contains the code and data of the executable and its DLLs. The address space also contains dynamically allocated memory such as thread stacks and heaps. Processes are inert.

To execute anything, a process must have at least one thread, called the main (primary) thread. More than one thread can exist in a process, running simultaneously. Each thread has its own CPU registers and its own stack, allocated in the process's address space.

Types of apps: GUI (Graphical User Interface) and CUI (Console User Interface).

Linker switch - /SUBSYSTEM:CONSOLE -> CUI apps.  /SUBSYSTEM:WINDOWS -> GUI apps.

The OS loader loads the app binary. It looks at the binary's header and gets this subsystem value. If it is CONSOLE it creates a console window. If it is WINDOWS it just loads the application.

WinMains:
- each app must have an entry-point function. 4 entry-point fns are available.
1. WinMain -> for GUI, ANSI    2. wWinMain -> for GUI, Unicode.
3. main  -> for CUI, ANSI          4. wmain -> for CUI, Unicode.
The linker looks for the appropriate entry point based on the /SUBSYSTEM switch. If our code's entry point does not match the switch, the linker reports an error.
If no /SUBSYSTEM switch is given, the linker chooses the subsystem automatically from the entry point it finds in the code.

The OS does not call these "WinMain" entry points directly. It first calls a C/C++ runtime library startup fn that initializes the CRT library; this startup fn then invokes the entry point of our code.

The startup code of CRT lib is in �crt0.c�.

CRT startup fn's job:
- gets a ptr to the process's full command line.
- gets a ptr to the process's environment variables.
- initializes the C/C++ CRT global variables (use stdlib.h to access them).
- initializes the CRT's heap (malloc, calloc fns use this heap).
- calls the constructors of all global and static C++ class objects.
- finally calls our WinMain entry-point fn.

Once our entry-point fn returns, the startup fn calls the CRT "exit()".

The exit() fn does the following:
1. Calls any _onexit() callback fns that were registered.
2. Calls destructors for all global and static objects.
3. Calls the OS's ExitProcess() API to destroy the process.

Instance Handle:
Every executable/DLL loaded into a process's address space has an "instance handle".
hInstExe - the EXE's instance handle. Its value is the base address at which the binary is loaded in the process's address space.
The linker records the preferred base address in the binary. The VC++ linker uses a default base address of 0x00400000 for executables.

/BASE:address - linker switch that sets the preferred address at which the loader loads the binary in the process.

GetModuleHandle(szName):
Returns the base address (handle) of the "szName" module. If passed NULL it returns the base address of the calling process's executable (not the DLL's, even when called from a DLL).

Process's Command line:
- When a new process is created it is passed a command line.
- the command line has at least one token, the name of the app.
- when the CRT invokes WinMain it strips the executable name from the process command line and passes the rest in WinMain's pszCmdLine parameter.
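A simplified sketch of that last step, stripping the program name from a full command line. StripProgramName is a hypothetical helper written for illustration; the real CRT parsing rules handle more cases than this.

```cpp
#include <string>

// Returns the part of the command line that follows the program name.
std::string StripProgramName(const std::string& cmdLine) {
    std::size_t i = 0;
    if (i < cmdLine.size() && cmdLine[i] == '"') {
        // Quoted program name: skip to the closing quote.
        i = cmdLine.find('"', 1);
        i = (i == std::string::npos) ? cmdLine.size() : i + 1;
    } else {
        // Unquoted: the name ends at the first space or tab.
        while (i < cmdLine.size() && cmdLine[i] != ' ' && cmdLine[i] != '\t') ++i;
    }
    // Skip the whitespace separating the name from the arguments.
    while (i < cmdLine.size() && (cmdLine[i] == ' ' || cmdLine[i] == '\t')) ++i;
    return cmdLine.substr(i);
}
```

Quoting matters because a program path may itself contain spaces, as in "C:\Program Files\...".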

Process's Environment Variables:
- a block of memory allocated within the address space.
- of the form "VarName1=xxx\0VarName2=yyyy\0...\0", ending with an extra \0. The "=" sign cannot be used in a variable name.
- Win2K gets the "system" and (logged-in) "user" environment variables from the registry.
   HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Session Manager\Environment - system's.
   HKEY_CURRENT_USER\Environment - user's.
  APIs:
  GetEnvironmentVariable(), ExpandEnvironmentStrings(), SetEnvironmentVariable() - to get, add, modify, delete environment variables.
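The block format (a run of "Name=value" strings, each terminated by \0, with one more \0 closing the block) can be walked as follows. ParseEnvironmentBlock is a hypothetical helper for illustration, not a Win32 function.

```cpp
#include <map>
#include <string>

// Parses a block of the form "Name1=value1\0Name2=value2\0\0".
std::map<std::string, std::string> ParseEnvironmentBlock(const char* block) {
    std::map<std::string, std::string> vars;
    while (*block != '\0') {                    // an empty string ends the block
        std::string entry(block);               // one "Name=value" string
        // Search from position 1 so that entries whose name starts with '='
        // (the per-drive "=C:" entries) keep their leading '='.
        std::string::size_type eq = entry.find('=', 1);
        if (eq != std::string::npos)
            vars[entry.substr(0, eq)] = entry.substr(eq + 1);
        block += entry.size() + 1;              // step past the terminating \0
    }
    return vars;
}
```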

Process's current drive and directory:
- maintained process-wide.
- If a thread changes the current directory, it changes for all the threads in the process.
- fns: GetCurrentDirectory() and SetCurrentDirectory()
- the system does not itself keep a current directory for each and every drive.
- Per-drive current directories are recorded in environment variables, e.g.
   "=C:=C:\Utility\Bin" -> this environment variable sets the C drive's current directory to C:\Utility\Bin.
- to pass per-drive current directories to a child process, set these drive-letter environment variables in the parent process before creating the child. The child inherits the parent's environment block, and with it the per-drive current directories.

VersionInfo:
- GetVersion(), GetVersionEx() - uses OSVERSIONINFOEX structure, VerifyVersionInfo() - to verify the app is running on the needed OS version.

CreateProcess():
- when a thread calls this fn, the system creates a "process" kernel object.
- then the system creates a virtual address space for the process and loads the EXE and DLLs into it.
- then the system creates a "thread" kernel object. This primary thread starts in the CRT startup function, which in turn invokes our WinMain().

An issue:
CreateProcess() returns TRUE before all the DLLs have been loaded. If any DLL fails to load, the process exits. Hence the parent process cannot learn about initialization problems in the child process.
Following are parameters of �CreateProcess()� fn.

Param-pszCommandLine:
Is a PTSTR. It must not be a read-only constant string (like TEXT("NOTEPAD")). If it is read-only the fn raises an access violation. The reason is that CreateProcess modifies the command-line string internally and restores the original string before it returns.
The problem exists only with the Unicode version of the function. To fix it, copy the string into a non-const array and pass that.

Param-pszApplicationName:
If it is NULL, the first token in pszCommandLine is taken as the name of the app. The system looks for the app and tries to load it.

CloseHandle() does not terminate the process or thread concerned. It just closes the handle and decrements the kernel object's usage count by 1. The kernel object is freed once the process or thread has terminated and its usage count has reached zero.

A parent-child relationship exists between processes only at the time the child is spawned. Once the child has been initialized, there is no relationship between the two processes.

Terminating a process:
A process can be ended with ExitProcess() or TerminateProcess(). But neither guarantees the CRT cleanup that the startup fn performs when WinMain returns, so returning from the entry point is preferred.

When synchronization between two different processes is needed, use the WaitForSingleObject() and WaitForMultipleObjects() APIs, which wait on one or more kernel objects.

When we create a child process that will run on its own, the parent process should close the child's process handle and thread handle.

Chap5. Jobs
Job - a collection of processes considered and managed as a single entity.
Win2K provides a "job" kernel object.
- CreateJobObject() -> creates a job object.
- create a process in suspended mode using CreateProcess() (CREATE_SUSPENDED flag).
- place the process in the job object using AssignProcessToJobObject().
- call ResumeThread() on the process's primary thread.

Closing a job object handle does not automatically terminate the processes in the job.
Once you have closed the job handle, it is not possible to open a new handle to the job.

SetInformationJobObject() -> fn used to set restrictions on the processes inside a job object. The restrictive parameters are PerProcessUserTimeLimit (user-mode time per process), PerJobUserTimeLimit, WorkingSetSize (of a process), ActiveProcessLimit (number of concurrent processes), Affinity (which CPUs can run the job's processes), PriorityClass.  The structure used is
JOBOBJECT_BASIC_LIMIT_INFORMATION

Extended limits can also be set with this API using the structure
 JOBOBJECT_EXTENDED_LIMIT_INFORMATION.  UI restrictions (structure JOBOBJECT_BASIC_UI_RESTRICTIONS) can prevent the job's processes from performing power operations on the system, reading/writing the clipboard, changing system parameters, creating/switching desktops, and accessing USER objects created outside the job.

QueryInformationJobObject() -> a process can query the job it belongs to.

TerminateJobObject() -> terminates all processes inside a job (as if TerminateProcess() were called on each).

Job notifications:
To get notifications, create an I/O completion port and associate it with the job using the SetInformationJobObject() API. Once associated, the I/O completion port receives the events occurring in the job. Use GetQueuedCompletionStatus() to retrieve the events from the I/O completion port.

Chap6. Thread Basics:
Thread's components: a thread kernel object and a thread stack.
- threads execute within a process's context. They live within the process's address space and share the process's data.
- a thread is a path of execution within a process.

When creating a Windows application, it is good to have a single thread (the user-interface thread) create and handle all the windows in the app. This thread also runs the GetMessage() loop. Other threads are "worker threads" and do not create windows.

CreateThread() -> creates a new thread. A thread fn must be given in this call; it is the entry-point fn of the thread. When the thread fn returns, the thread exits.

- the system allocates the thread's stack from the process's address space.
- a thread has access to the process's handle table, global data and all other threads' stacks and data.

In a C/C++ app, prefer _beginthreadex() to CreateThread() for creating a thread.
- The default thread stack size is 1 MB.
- the same thread fn can be used as the entry-point fn for multiple threads.

ExitThread() - cleans up the thread. But C++ objects created by the thread are not destroyed.

TerminateThread() - is asynchronous, i.e. it tells the system that the thread should be terminated. It may take some time before the thread actually terminates.
The thread's stack is not freed until the process itself terminates.
TerminateThread() does not notify the DLLs in the process about the termination, so cleanup may be incomplete.
Windows and hooks owned by the thread are freed when the thread terminates.
The thread kernel object becomes signaled and its usage count is decremented.

Thread creation internals:


VOID BaseThreadStart(PTHREAD_START_ROUTINE pfnStartAddr, PVOID pvParam) {
   __try {
      ExitThread((pfnStartAddr)(pvParam));
   }
   __except(UnhandledExceptionFilter(GetExceptionInformation())) {
      ExitProcess(GetExceptionCode());
   }
   // NOTE: We never get here.
}

- CreateThread() creates a thread kernel object, then allocates the stack in the process's address space.
- stacks always grow from high memory -> low memory.
- the pvParam and pfnStartAddr (thread fn) values passed to CreateThread() are written at the top of the stack.
- CONTEXT, which consists of CPU register values and other properties, is part of the thread kernel object.
- usage count -> the kernel object's usage count.
- BaseThreadStart() -> fn that wraps the thread fn to provide the following:
  an SEH (Structured Exception Handling) frame around the thread fn, the call to the thread fn itself, and a call to ExitThread() when the thread fn returns.
- For a primary thread the kernel32.dll function invoked is "BaseProcessStart", which does work similar to "BaseThreadStart". The only difference is that it has no pvParam.

C/C++ CRT Libs:
CRT libs -> LibC.lib, LibCD.lib, LibCMt.lib, MSVCRt.lib, MSVCRtD.lib.
CRT variables and functions with per-thread state: errno, _doserrno, _wcstok, _strerror, tmpfile, asctime, _ecvt, _fcvt, etc.
- these are not thread-safe unless each thread has its own copy of the data.

To use the library in a thread-safe way we should call _beginthreadex() rather than CreateThread().

The CRT uses a "tiddata" structure to hold thread-specific data, including the CRT variable values.

_beginthreadex() does the following:
 - allocates a "tiddata" structure for the new thread on the CRT lib's heap.
 - stores pvParam and the thread fn address inside this "tiddata".
 - calls CreateThread() internally with "tiddata" as the param and _threadstartex() as the thread fn.

_threadstartex() does the following:
- is called by BaseThreadStart with "tiddata" as the param.
- uses TlsSetValue() to store "tiddata" in TLS (Thread Local Storage).
- wraps the thread fn in an SEH (Structured Exception Handling) frame and calls it.
- at the end calls _endthreadex().

_endthreadex() does the following:
- gets the thread's "tiddata" from TLS and frees it.
- invokes ExitThread().

If we call CreateThread() in a C/C++ app, the CRT allocates a "tiddata" structure the first time the thread calls a CRT fn that needs one. But when the thread exits this structure is not freed, because _endthreadex() was not called.
With the DLL version of the CRT, "tiddata" cleanup is done by the CRT DLL's cleanup routine during the thread/process exit sequence, so memory is not leaked.

GetCurrentThread() and GetCurrentProcess() return pseudo-handles. They do not increment the object's usage count and are meaningful only in the calling thread/process. CloseHandle() need not be called on these handles.
To get a real handle from a pseudo-handle use the DuplicateHandle() fn.

Chap7. Thread Scheduling, Priorities and Affinities
- Thread scheduling is done by the system scheduler. It loads a thread's CONTEXT structure into the processor and executes the thread for a time slice (on the order of milliseconds), then writes the current register values back to the CONTEXT structure and loads the next thread. This is called context switching.

SuspendThread() -> should be called carefully. If the thread was in the middle of a locking operation, such as allocating space from a heap, the heap stays locked and cannot be used by any other thread until the suspended thread is resumed and releases the lock.

Sleep(dwMilliseconds) -> the thread voluntarily gives up the rest of its time slice and is not scheduled for about dwMilliseconds.

SwitchToThread() -> lets another schedulable thread, even a lower-priority one that is being starved of CPU time, run in place of the current thread for one time quantum.

GetThreadTimes(), GetProcessTimes() -> get the CPU time statistics of a thread and a process respectively.

To measure at a more granular level use
QueryPerformanceFrequency and QueryPerformanceCounter.

GetThreadContext(), SetThreadContext() - get/set a thread's CONTEXT structure. Only the user-mode context is accessible (not the kernel-mode context). The thread should be suspended before we do these operations.

Thread priority numbers range from 0 (the lowest) to 31 (the highest).
- the zero page thread is the only thread running at priority 0. This thread zeroes any free pages of RAM.

Starvation occurs when a higher-priority thread (e.g. one at priority 31) is always schedulable and so never lets lower-priority threads get scheduled. It is less likely on multiprocessor machines.

A higher-priority thread always preempts a lower-priority thread (even if the lower-priority thread is in the middle of its time slice).

When a thread calls GetMessage() and the system finds no message for it, the thread remains suspended (not scheduled) until a message arrives. When a message arrives, the system makes the thread schedulable again.

Since the system exposes an abstract view of the scheduler, a few common priority classes are defined: idle, below normal, normal, above normal, high and real-time.

Set/GetProcessPriorityBoost(), Set/GetThreadPriority()

The scheduler gives a slightly longer time slice to threads of the "foreground process" (the process whose window the user is currently working with). This makes it respond promptly to user interaction. "Background processes" do not get this adjustment. This tweaking of the scheduler is configurable in Control Panel.

NUMA (Non-Uniform Memory Access) architecture: a system has more than one board. Each board has its own CPUs (e.g. four) and its own bank of memory shared by those CPUs.

Soft Affinity:
The system tries to run a thread on the same processor it ran on last (data still in that processor's cache improves performance).

Hard Affinity:
When multiple processors are present, the threads of a process can be restricted to a subset of the CPUs (e.g. those on the same NUMA board) every time they are scheduled.

Use Get/SetProcessAffinityMask(), Get/SetThreadAffinityMask().

Chap8. Thread synchronization in User-mode
Atomic access: using the InterlockedXxx() functions.
InterlockedExchangeAdd(plAddend, lIncrement) -> increments/decrements a shared variable atomically (thread safe).

How they work:
The implementation is processor dependent.
On x86, the Interlocked fns assert a hardware bus signal that prevents another CPU from accessing the same address.
On Alpha systems, a special bit flag in the CPU's flags register is set and the memory address is noted (the same flag for the same memory in any other CPU is reset) -> the CPU reads from the memory -> modifies the value -> checks again whether the flag has been reset by another CPU; if not, it writes the value back into memory, otherwise it starts over.

InterlockedExchangePointer(pTargetAddr, value) -> replaces the value at the address given in the first param with the value passed in the second param.

Spin lock: synchronized access using a global flag variable whose state is modified using InterlockedExchange().
  LONG g_fLock = FALSE;   // global state: TRUE while the resource is in use
  void Func1() {
      while (InterlockedExchange(&g_fLock, TRUE) == TRUE) Sleep(0);   // spin until the previous value was FALSE
          // Access the shared resource
      InterlockedExchange(&g_fLock, FALSE);   // release the lock
  }
The while-loop spins repeatedly until it gets access.
Disadvantage: wastes CPU time.
InterlockedIncrement()/InterlockedDecrement() -> older fns.
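The same two ideas can be sketched in portable C++ with std::atomic, as an analogue of the Interlocked functions rather than the Win32 calls themselves. SpinLock and ExchangeAdd are hypothetical names for this example; std::this_thread::yield() plays the role of Sleep(0).

```cpp
#include <atomic>
#include <thread>

class SpinLock {
    std::atomic<bool> locked_{false};
public:
    // One attempt: atomically write true and inspect the previous value.
    // The lock was obtained only if the previous value was false.
    bool TryLock() { return !locked_.exchange(true, std::memory_order_acquire); }
    void Lock() {
        while (!TryLock())
            std::this_thread::yield();   // give up the time slice, then retry
    }
    void Unlock() { locked_.store(false, std::memory_order_release); }
};

// Atomic add that returns the value held before the addition,
// the same contract as InterlockedExchangeAdd.
inline long ExchangeAdd(std::atomic<long>& target, long increment) {
    return target.fetch_add(increment);
}
```

As with the Win32 version, a thread that cannot get the lock keeps consuming CPU time, so a spin lock only suits resources that are held very briefly.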

Cache Lines:
CPU cache lines act as buffers for the CPU. The CPU reads data from RAM into its cache lines and works on it there. A single read fetches a full cache line (e.g. 32 or 64 bytes).
Hence data structures are best aligned on cache-line boundaries (32/64 bytes) to increase performance.

Cache lines on multiple CPUs face synchronization issues if the same data is processed by more than one CPU.
To avoid this and to improve performance,
- data should be grouped in cache-line-sized chunks.
- read-only and read/write data should be separated by at least one cache-line boundary.
- make sure a piece of data is always accessed by a single thread.
- make sure the data is always accessed by a single CPU (thread affinity).
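One way to apply the first two rules is to pad each independently written item out to its own cache line. A minimal sketch, assuming a 64-byte line; kCacheLine, PaddedCounter and PerThreadCounters are hypothetical names, and the real line size should be checked for the target CPU.

```cpp
#include <atomic>
#include <cstddef>

constexpr std::size_t kCacheLine = 64;   // assumed line size

// alignas rounds both the alignment and the size of the struct up to a
// full cache line, so no two counters can share one line.
struct alignas(kCacheLine) PaddedCounter {
    std::atomic<long> value{0};
};

struct PerThreadCounters {
    PaddedCounter a;   // written by one thread only
    PaddedCounter b;   // written by another thread only
};

static_assert(alignof(PaddedCounter) == kCacheLine, "counter starts on a line boundary");
static_assert(sizeof(PaddedCounter) == kCacheLine, "counter fills exactly one line");
```

The cost is memory: each counter now occupies a whole line, so this is worth doing only for data that different CPUs update frequently.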

volatile -> tells the compiler that the variable can be modified from outside the current code path, e.g. by concurrent threads. Each time it is used, it is reloaded from its memory location.

Critical Sections:
A critical section is a structure (CRITICAL_SECTION) that protects a section of code requiring exclusive access by only one thread at any point in time.

EnterCriticalSection(&critsectstruct);
/*piece of code to protect*/;
LeaveCriticalSection(&critsectstruct);

- synchronizes threads of the same process only (not threads in different processes). There is a way to make critical sections usable by multiple processes.
- the critical section structure can be a local, a global or allocated on the heap. All threads that use it must know its address somehow.

InitializeCriticalSection(&CSObj); // must be called once, before any thread calls EnterCriticalSection().
EnterCriticalSection(&CSObj); -> checks the structure's member variables to find out whether another thread has already entered and is using the resource.
If not, this thread updates the member variables and enters to access the shared resource.
If another thread has already entered, the current thread waits for the critical section to be freed (by the other thread calling LeaveCriticalSection(&CSObj)).
No CPU time is wasted, since the waiting thread is put in a wait state and hence not scheduled.

The timeout period for threads waiting on critical sections is set in the registry at
HKLM\System\CurrentControlSet\Control\Session Manager, "CriticalSectionTimeout" value.
The default value is 30 days.

TryEnterCriticalSection() -> never blocks; it just checks whether the shared resource can be accessed. If another thread owns the critical section it returns FALSE and the thread can do something else. If it returns TRUE the thread owns the critical section and must call LeaveCriticalSection() when done. Use it wherever the thread should not wait in EnterCriticalSection().

InitializeCriticalSectionAndSpinCount(&CSObj, spincount) -> if the critical section is initialized with this fn, EnterCriticalSection() spins up to "spincount" times (spin locking) before putting the thread into a wait state.
On single-CPU systems the "spincount" value is ignored (treated as 0).
Using spin counts with critical sections usually improves performance.

A critical section internally creates and uses an event kernel object when more than one thread contends for the resource. The event object is created only when a second thread comes to contend.

- use a separate critical section for each piece of data (even if they are in the same class or accessed in the same function).
- when using multiple critical sections in one function, they must be entered in the same order in all functions.

Chap9. Thread synchronization with Kernel Objects(Kernel Mode)
- switching a thread from user mode to kernel mode costs about 1000 CPU cycles, which is expensive.
- process/thread objects can also be used for synchronization. A process is created in the non-signaled state; when it terminates its state becomes signaled.

Wait fns: WaitForSingleObject(), WaitForMultipleObjects(). They wait for one or more kernel objects to become signaled. Waiting on multiple objects gives the option of waiting for either any one or all of the objects to become signaled.

Side effects: a successful wait can change the state of the object. For an auto-reset event, when the event becomes signaled the wait fn returns, and as a side effect the event is reset to non-signaled again.
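The side effect can be shown with a tiny single-threaded model of an event's state. It does no real blocking and is not the Win32 API: EventModel and TryWait are hypothetical names, and TryWait stands for a wait with a zero timeout. A manual-reset event is included for contrast.

```cpp
// Models only the signaled / non-signaled state of an event object.
class EventModel {
    bool manualReset_;
    bool signaled_;
public:
    EventModel(bool manualReset, bool initiallySignaled)
        : manualReset_(manualReset), signaled_(initiallySignaled) {}
    void Set()   { signaled_ = true; }
    void Reset() { signaled_ = false; }
    // Returns true if a wait on the event would succeed right now.
    bool TryWait() {
        if (!signaled_) return false;
        if (!manualReset_) signaled_ = false;   // side effect of a successful wait
        return true;
    }
};
```

After one Set(), an auto-reset event satisfies exactly one wait, while a manual-reset event keeps satisfying waits until Reset() is called.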
         


