Component Object Model (COM) basics
COM is tricky. It's a technology that intimidates everyone at first sight. Fortunately you don't need to know much about COM to start exploring the shell. Trust me, when I first started programming 2xExplorer back in late 1998 I didn't have the foggiest idea about the whole thing (which should help explain why it was a whole year before drag/drop etc was supported by the program).
Viewed simplistically, COM is a set of objects that do "stuff" for you. From such a "client" perspective it's not too hard to use COM. All you need to do is call CoInitialize at the beginning of the program to boot the COM subsystem, and if you should be so kind, remember to clean up the infrastructure by calling CoUninitialize just before you quit.
In between, you get access to various objects through so-called "interfaces", which are pointers that expose the functionality that you are after. After you are done with an "object" you need to Release it, returning the resources to the system. Finally, if you indirectly obtain some memory allocated during your interaction with objects, it is your responsibility to Free it.
Surely, that's not too hard, is it? Being a mere client, you may remain oblivious to the complicated trickery that the COM subsystem has to come up with so that you can use the various objects. All that remains is to know what objects exist and what sort of things they can do for you. Hey, that's what Visual Basic programmers do all the time, need I say more? <g>
Topics: Object basics | The SuperUnknown | Shell Objects
This is the object oriented era. COM objects are abstractions for chunks of self-contained code that expose some functionality. For instance, there are objects that know how to read the contents of any folder, objects that can extract thumbnail images of image files, and so on. Each object has a unique identifier called a CLSID, which presumably stands for "class identifier". Technically speaking these are 128-bit numbers that look like {00BB2763-6A77-11D0-A535-00C04FD7D062}. Thankfully, for commonly used objects there are descriptive names that you can use instead, like CLSID_AutoComplete, which arguably is much friendlier to use. You may also hear of terms like GUID, IID etc, but they all refer to the same 128-bit unique identifier principle.
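To make the "128-bit number" idea concrete, here is a minimal sketch of how such an identifier breaks down into fields and how it maps to the braced text form. Guid128, kExampleClsid, SameGuid and GuidToString are hypothetical names made up for this illustration; on Windows you would use the GUID type and helpers from the SDK headers instead.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <cstring>
#include <string>

// Hypothetical stand-in for the Windows GUID struct, so the sketch compiles anywhere.
struct Guid128 {
    std::uint32_t data1;
    std::uint16_t data2;
    std::uint16_t data3;
    std::uint8_t  data4[8];
};
static_assert(sizeof(Guid128) == 16, "a GUID is 128 bits wide");

// The identifier quoted in the text, split into its four fields.
const Guid128 kExampleClsid = {
    0x00BB2763, 0x6A77, 0x11D0,
    {0xA5, 0x35, 0x00, 0xC0, 0x4F, 0xD7, 0xD0, 0x62}};

// CLSIDs and IIDs are compared field by field; there is nothing more to them.
inline bool SameGuid(const Guid128& a, const Guid128& b) {
    return a.data1 == b.data1 && a.data2 == b.data2 && a.data3 == b.data3 &&
           std::memcmp(a.data4, b.data4, sizeof(a.data4)) == 0;
}

// Produce the braced text form seen under HKEY_CLASSES_ROOT\CLSID.
inline std::string GuidToString(const Guid128& g) {
    char buf[39];  // 38 visible characters plus the terminating NUL
    std::snprintf(buf, sizeof(buf),
                  "{%08lX-%04X-%04X-%02X%02X-%02X%02X%02X%02X%02X%02X}",
                  static_cast<unsigned long>(g.data1),
                  static_cast<unsigned>(g.data2), static_cast<unsigned>(g.data3),
                  static_cast<unsigned>(g.data4[0]), static_cast<unsigned>(g.data4[1]),
                  static_cast<unsigned>(g.data4[2]), static_cast<unsigned>(g.data4[3]),
                  static_cast<unsigned>(g.data4[4]), static_cast<unsigned>(g.data4[5]),
                  static_cast<unsigned>(g.data4[6]), static_cast<unsigned>(g.data4[7]));
    return std::string(buf);
}
```

Calling GuidToString(kExampleClsid) gives back the braced string quoted above, which is all a named constant like CLSID_AutoComplete really hides from you.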
Most COM objects are registered in your system. Anybody who has hacked about with regedit.exe, the registry editor, will have undoubtedly come across a key called HKEY_CLASSES_ROOT\CLSID, which contains about a million subkeys or so, with funny names like the {00BB2763-6A77-11D0-A535-00C04FD7D062} one mentioned above. Well, now you know what all these items stand for: they are COM objects registered by the applications installed on your computer. More often than not, there is some DLL (dynamic link library) that implements a COM object. Each DLL may actually contain more than one object.
Once you know about an object, the next thing you need is to actually take advantage of it. Objects offer access to their functionality through so-called "interfaces". You may think of interfaces as groups of methods, or function calls, that perform a certain task. Each object exports at least one interface, and usually more than that. For example, shell folder objects export a number of interfaces; one is IShellFolder, which offers methods to enumerate contents, get file names and attributes etc; another is IContextMenu, which can show a context menu for items contained in a folder; and so on. Interfaces have unique identifiers, too, like COM objects, only here they are called IIDs (interface identifiers). However, deep down they are the same 128-bit numbers as CLSIDs.
Let us consider an example. Let's assume that there is this COM object that has the identifier CLSID_MyObject, which exports an interface called IMyInterface, whose identifier is IID_IMyInterface. This interface contains a method called MethodIReallyNeed(), which for the sake of simplicity takes no arguments. Here's how you could access this method:
CoInitialize(NULL); // absolutely essential: initialize the COM subsystem
IMyInterface* pIFace = NULL;
// create the object and obtain a pointer to the sought interface
HRESULT hr = CoCreateInstance(CLSID_MyObject, NULL, CLSCTX_ALL, IID_IMyInterface, (void**)&pIFace);
if (SUCCEEDED(hr)) { // only touch the pointer if creation worked
    pIFace->MethodIReallyNeed(); // use the object
    pIFace->Release(); // free the object
}
CoUninitialize(); // cleanup COM after you're done using its services
The above code fragment demonstrates the basic steps of using COM objects. First of all you have to initialize COM using CoInitialize. Then you instantiate the object you're after and request the target interface, all in one stroke using the CoCreateInstance API. If successful, this will return a pointer to the requested interface, that will allow you to use the object. After all is said and done, it's time for cleaning up: you need to free up the interface you requested (and hence the object itself) by calling the Release method, which is supported by all COM objects. Finally, the COM subsystem itself is shut down using CoUninitialize.
If you are using MFC, it's easier to initialise COM using AfxOleInit early within your application's InitInstance. This internally energizes COM using OleInitialize, which must be used instead of CoInitialize for any applications that use the system clipboard or implement drag/drop. In such cases, COM is powered down using OleUninitialize, but MFC does this automatically so you don't have to worry.
Interfaces are instruction manuals for COM objects. They contain methods which advertise what an object can do. You can't do anything else with objects but call the methods they expose through their interfaces. Let's take a look at the definition of the most elementary COM interface called IUnknown:
interface IUnknown {
public:
virtual HRESULT QueryInterface(REFIID riid, void** ppvObject) = 0;
virtual ULONG AddRef(void) = 0;
virtual ULONG Release(void) = 0;
};
People familiar with C++ shouldn't have any problems understanding this definition. The keyword "interface" is an alias for struct, so there's nothing peculiar there. Note that all methods are public (else there would be no point in presenting them to the world) and pure virtual, making sure that objects that derive (in the usual C++ class inheritance scheme) from this interface will do the actual implementation for these methods. An interface is just the protocol; the object is free to use whatever means to achieve the result expected by some method.
The definition of IUnknown is characteristic in that it lacks member variables. Clearly, the real object that exposes IUnknown will have many member variables but it won't make them available to outsiders. All communication will have to be achieved just through interface function calls. For example, the QueryInterface method above "returns" a new interface pointer to the caller via the second argument ppvObject.
HRESULT is a common return data type from most interface functions, describing the success or failure status of the call. Although this is a basic 4-byte data type, it is broken down into subfields and needs special macros to be interpreted. Most of the time you'd be using SUCCEEDED or FAILED, which are self-explanatory.
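For the curious, here is a rough sketch of what those macros boil down to. It uses stand-in names (HResult, Succeeded, Failed, Facility, Code) so that it compiles without the Windows headers, and it assumes the documented layout of a severity flag in the top bit, an 11-bit facility field and a 16-bit code; in real code you would stick to the SDK's own HRESULT type and macros.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for the Windows HRESULT typedef (a signed 32-bit integer).
using HResult = std::int32_t;

constexpr HResult MakeHResult(std::uint32_t bits) { return static_cast<HResult>(bits); }

// Three well-known values, copied here so the sketch is self-contained.
constexpr HResult kOk          = 0;                         // S_OK
constexpr HResult kNoInterface = MakeHResult(0x80004002u);  // E_NOINTERFACE
constexpr HResult kOutOfMemory = MakeHResult(0x8007000Eu);  // E_OUTOFMEMORY

// The top bit is the severity flag, so failure simply means "negative".
constexpr bool Succeeded(HResult hr) { return hr >= 0; }
constexpr bool Failed(HResult hr)    { return hr < 0; }

// Assumed layout: bits 16..26 hold the facility, bits 0..15 the error code.
constexpr unsigned Facility(HResult hr) {
    return (static_cast<std::uint32_t>(hr) >> 16) & 0x7FFu;
}
constexpr unsigned Code(HResult hr) {
    return static_cast<std::uint32_t>(hr) & 0xFFFFu;
}
```

Note that success is "greater than or equal to zero" and not "equal to zero", which is why comparing against S_OK alone can miss other success codes.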
TIP: Developer Studio comes with a handy little tool called Error Lookup which is available from the Tools menu. Apart from regular Windows error codes it also understands HRESULTs, and will provide a textual explanation for many COM errors, helping you figure out what went wrong.
All COM objects have to expose IUnknown. This is the trademark of COM, and starting point for all operations. QueryInterface is the most useful method here, since it asks the object for other interface pointers it supports. We've already seen the use of Release that frees a COM object after you're done using it. You wouldn't be calling AddRef directly often.
IUnknown is also the base class for all other COM interfaces. Hence, if you have a pointer to any interface, you can use its QueryInterface, inherited from IUnknown, to request another interface supported by the same object. Similarly, you can use the inherited Release method for freeing up any interface. Once all the interface pointers obtained are released, the object itself can stop functioning, usually by self-destruction.
| ADVANCED: Object life-cycle |
|---|
| All COM objects are dynamically created by class factories. Each object manages its own lifetime using reference counting; the count is simply the number of outstanding interface pointers held by clients. When this number reaches 0, the object performs a "delete this;" suicide act to free up the resources. When you first obtain an interface pointer, say via CoCreateInstance, the object's reference counter is 1. Subsequently each successful call to QueryInterface, extracting more interface pointers from the object, increases the counter by 1. When you Release an interface, this counter is reduced by one. The object is not deleted yet, since it still serves clients. It is only when the last interface pointer is released that the object can unload itself. Sometimes this reference counter can be manipulated by hand, e.g. by calling AddRef to place a stranglehold on the object, ensuring that it remains alive; just don't forget to release this extra reference, too. |
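
The life-cycle described above can be traced with a toy object. This is only a sketch under simplifying assumptions: IUnknownLike, IGreeter, ICounter, Widget and the integer InterfaceId are all made up for the illustration, QueryInterface returns a plain bool where the real one returns an HRESULT, and there is no class factory or thread safety.

```cpp
#include <cassert>

using InterfaceId = int;  // hypothetical stand-in for a 128-bit IID
constexpr InterfaceId kIidUnknown = 0;
constexpr InterfaceId kIidGreeter = 1;
constexpr InterfaceId kIidCounter = 2;

struct IUnknownLike {
    virtual bool QueryInterface(InterfaceId iid, void** out) = 0;
    virtual unsigned long AddRef() = 0;
    virtual unsigned long Release() = 0;
protected:
    ~IUnknownLike() = default;  // clients must go through Release, never delete
};
struct IGreeter : IUnknownLike { virtual const char* Greeting() = 0; };
struct ICounter : IUnknownLike { virtual int Next() = 0; };

class Widget final : public IGreeter, public ICounter {
public:
    // The creator hands out the first interface pointer, so the count starts at 1.
    static IGreeter* Create(bool* destroyedFlag) { return new Widget(destroyedFlag); }

    bool QueryInterface(InterfaceId iid, void** out) override {
        if (iid == kIidUnknown || iid == kIidGreeter) *out = static_cast<IGreeter*>(this);
        else if (iid == kIidCounter)                  *out = static_cast<ICounter*>(this);
        else { *out = nullptr; return false; }
        AddRef();  // every pointer handed out is one more outstanding reference
        return true;
    }
    unsigned long AddRef() override { return ++refs_; }
    unsigned long Release() override {
        unsigned long left = --refs_;
        if (left == 0) delete this;  // last client gone: self-destruct
        return left;
    }
    const char* Greeting() override { return "hello"; }
    int Next() override { return ++count_; }

private:
    explicit Widget(bool* flag) : destroyed_(flag) {}
    ~Widget() { if (destroyed_) *destroyed_ = true; }
    bool* destroyed_;
    unsigned long refs_ = 1;
    int count_ = 0;
};

struct Trace {
    bool queryWorked;
    unsigned long afterFirstRelease;
    bool destroyedAfterFirst;
    unsigned long afterLastRelease;
    bool destroyedAfterLast;
};

// Create, query a second interface, then release both pointers in turn.
inline Trace RunLifecycle() {
    Trace t{};
    bool destroyed = false;
    IGreeter* greeter = Widget::Create(&destroyed);        // count is 1
    void* raw = nullptr;
    bool ok = greeter->QueryInterface(kIidCounter, &raw);  // count is 2
    ICounter* counter = static_cast<ICounter*>(raw);
    t.queryWorked = ok && counter->Next() == 1;
    t.afterFirstRelease = greeter->Release();              // object still serves counter
    t.destroyedAfterFirst = destroyed;
    t.afterLastRelease = counter->Release();               // last pointer gone
    t.destroyedAfterLast = destroyed;
    return t;
}
```

The thing to notice in RunLifecycle is that the object survives the first Release and only goes away after the second, regardless of which interface pointer is released last.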
Microsoft developers have written a lot of code for the Windows shell. This is organised in a number of COM objects that can do almost everything you could imagine to manipulate the filesystem and its interface to the end users. On the downside, there are tons of objects to deal with, many exposing too many interfaces. All in all there's a steep learning curve here; you'd have to comprehend many items before you attempt to do even the simplest operations. But I can assure you that once you reach such a point of maturity, satisfaction is guaranteed.
There are several ways you could get hold of some shell COM object. Most frequently an interface pointer would be returned by some Windows API call like SHGetDesktopFolder, which returns the IShellFolder object of the desktop. Once you have access to such an object, you may obtain other objects and interfaces through regular methods. For example, IShellFolder::GetUIObjectOf will give you access to a number of useful interfaces like IContextMenu, IDropTarget etc.
Rarely, you'd
instantiate objects directly through CoCreateInstance like in the sample
code above, as for example when dealing with the IShellLink interface,
that deals with creating and resolving links (shortcuts) to other shell objects.
The common issue in all shell operations is that once you have a pointer to some
interface, regardless of the exact route this was obtained, you need to call its
Release method once you're done using it, so that the COM object can be
freed up.
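Since forgetting that final Release is the classic COM bug, many people wrap interface pointers in a small guard class that releases automatically when it goes out of scope (ATL's CComPtr is a ready-made one). Below is a minimal sketch of the idea; ScopedInterface and FakeShellObject are hypothetical names invented for this example, and the fake object merely counts calls so that the effect can be observed without any Windows headers.

```cpp
#include <cassert>

// Owns one reference to an interface and releases it on scope exit.
template <typename T>
class ScopedInterface {
public:
    ScopedInterface() = default;
    explicit ScopedInterface(T* p) : p_(p) {}  // adopts an already-counted pointer
    ScopedInterface(const ScopedInterface&) = delete;
    ScopedInterface& operator=(const ScopedInterface&) = delete;
    ScopedInterface(ScopedInterface&& other) noexcept : p_(other.p_) { other.p_ = nullptr; }
    ~ScopedInterface() { if (p_) p_->Release(); }

    T* operator->() const { return p_; }
    explicit operator bool() const { return p_ != nullptr; }

private:
    T* p_ = nullptr;
};

// Stand-in for a shell object: it only records how often Release was called.
struct FakeShellObject {
    int releases = 0;
    unsigned long Release() { ++releases; return 0; }
    int Work() { return 42; }
};

// Uses the object inside a scope and reports how many releases happened.
inline int ReleasesAfterScope() {
    FakeShellObject obj;
    {
        ScopedInterface<FakeShellObject> guard(&obj);
        guard->Work();
    }  // guard's destructor calls Release exactly once here
    return obj.releases;
}
```

Copying is disabled on purpose: a copy would need an extra AddRef, and leaving that out keeps the sketch short. Moving transfers the single reference without touching the count.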
TIP: The GUIDs for all shell objects that you'd
need to access directly are already defined in header files, so instead of the
128-bit number you just need to know the equivalent constant identifier. More
often than not this is the interface name prepended by IID_; for instance
the GUID for IShellLink is the constant IID_IShellLink.
A shell object frequently used is the shell memory manager, IMalloc. The idea is that the shell should provide a way for objects to allocate and free memory in a language independent fashion. Whenever some shell object allocates some memory as a result of your calling some of its methods, it is your responsibility to free that memory. A common example is PIDLs, which are the shell equivalents of filesystem pathnames to files. Whenever you use a shell object's method that returns a PIDL, it is your responsibility to free up the allocated memory, using shell's allocator object. You can obtain a pointer to this object by calling the SHGetMalloc API. The method you'll be using most frequently is IMalloc::Free, to release memory returned by some shell interface. Note that the shell memory manager is a COM object itself, so after you're done using it you need to Release it.
| ADVANCED |
|---|
| All shell objects are supplied by in-process DLL servers. This is more efficient since they execute in the same address space as the main application that utilises them. Each object runs in its individual COM apartment, hence there are usually no problems for thread synchronisation; the COM subsystem ensures that only one client accesses these objects at each time. |
Component Object Model (COM), DCOM, and Related Capabilities
Status
Advanced
Note
We recommend Object Request Broker, Remote Procedure Call, and Component-Based Software Development/COTS Integration as prerequisite readings for this technology description.
Purpose and Origin
COM [COM 95] refers to both a specification and implementation developed by Microsoft Corporation which provides a framework for integrating components. This framework supports interoperability and reusability of distributed objects by allowing developers to build systems by assembling reusable components from different vendors which communicate via COM. By applying COM to build systems of preexisting components, developers hope to reap benefits of maintainability and adaptability. COM defines an application programming interface (API) to allow for the creation of components for use in integrating custom applications or to allow diverse components to interact. However, in order to interact, components must adhere to a binary structure specified by Microsoft. As long as components adhere to this binary structure, components written in different languages can interoperate.
Distributed COM [DCOM 97] is an extension to COM that allows network-based component interaction. While COM processes can run on the same machine but in different address spaces, the DCOM extension allows processes to be spread across a network. With DCOM, components operating on a variety of platforms can interact, as long as DCOM is available within the environment. It is best to consider COM and DCOM as a single technology that provides a range of services for component interaction, from services promoting component integration on a single platform, to component interaction across heterogeneous networks. In fact, COM and its DCOM extensions are merged into a single runtime. This single runtime provides both local and remote access.
While COM and DCOM represent "low-level" technology that allows components to interact, OLE [Brockschmidt 95], ActiveX [Active 97] and MTS [Harmon 99] represent higher-level application services that are built on top of COM and DCOM.
OLE builds on COM to provide services such as object "linking" and "embedding" that are used in the creation of compound documents (documents generated from multiple tool sources). ActiveX extends the basic capabilities to allow components to be embedded in Web sites. MTS expands COM capabilities with enterprise services such as transactions and security to allow Enterprise Information Systems (EIS) to be built using COM components. COM+ is the evolution of COM. COM+ integrates MTS services and message queuing into COM, and makes COM programming easier through a closer integration with Microsoft languages such as Visual Basic, Visual C++, and J++. COM+ will not only add MTS-like quality of service into every COM+ object, but it will hide some of the complexities in COM coding. The distinctions among various Microsoft technologies and products are sometimes blurred. Thus, one might read about "OLE technologies" which encompass COM, or "Active Platform" as a full web solution. In this technology description, we focus on the underlying technology represented by COM, DCOM, and COM+.
Technical Detail
COM is a binary compatibility specification and associated implementation that allows clients to invoke services provided by COM-compliant components (COM objects). As shown in Figure 5, services implemented by COM objects are exposed through a set of interfaces that represent the only point of contact between clients and the object.
Figure 5: Client Using COM Object Through an Interface Pointer [COM 95]
COM defines a binary structure for the interface between the client and the object. This binary structure provides the basis for interoperability between software components written in arbitrary languages. As long as a compiler can reduce language structures down to this binary representation, the implementation language for clients and COM objects does not matter - the point of contact is the run-time binary representation. Thus, COM objects and clients can be coded in any language that supports Microsoft's COM binary structure.
A COM object can support any number of interfaces. An interface provides a grouped collection of related methods. For example, Figure 6 depicts a COM object that emulates a clock. IClock, IAlarm and ITimer are the interfaces of the clock object. The IClock interface can provide the appropriate methods (not shown) to allow setting and reading the current time. The IAlarm and ITimer interfaces can supply alarm and stopwatch methods.
Figure 6: Clock COM object
COM objects and interfaces are specified using Microsoft Interface Definition Language (IDL), an extension of the DCE Interface Definition Language standard (see Distributed Computing Environment). To avoid name collisions, each object and interface must have a unique identifier.
Interfaces are considered logically immutable. Once an interface is defined, it should not be changed: new methods should not be added and existing methods should not be modified. This restriction on the interfaces is not enforced, but it is a rule that component developers should follow. Adhering to this restriction removes the potential for version incompatibility: if an interface never changes, then clients depending on the interface can rely on a consistent set of services. If new functionality has to be added to a component, it can be exposed through a different interface. For our clock example, we can design an enhanced clock COM object supporting the IClock2 interface that inherits from IClock. IClock2 may expose new functionality.
Every COM object runs inside of a server. A single server can support multiple COM objects. As shown in Figure 7, there are three ways in which a client can access COM objects provided by a server:
Figure 7: Three Methods for Accessing COM Objects [COM 95]
If the client and server are in the same process, the sharing of data between the two is simple. However, when the server process is separate from the client process, as in a local server or remote server, COM must format and bundle the data in order to share it. This process of preparing the data is called marshalling. Marshalling is accomplished through a "proxy" object and a "stub" object that handle the cross-process communication details for any particular interface (depicted in Figure 8). COM creates the "stub" in the object's server process and has the stub manage the real interface pointer. COM then creates the "proxy" in the client's process, and connects it to the stub. The proxy then supplies the interface pointer to the client.
The client calls the interfaces of the server through the proxy, which marshals the parameters and passes them to the server stub. The stub unmarshals the parameters and makes the actual call inside the server object. When the call completes, the stub marshals return values and passes them to the proxy, which in turn returns them to the client. The same proxy/stub mechanism is used when the client and server are on different machines. However, the internal implementation of marshalling and unmarshalling differs depending on whether the client and server operate on the same machine (COM) or on different machines (DCOM). Given an IDL file, the Microsoft IDL compiler can create default proxy and stub code that performs all necessary marshalling and unmarshalling.
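The proxy/stub round trip can be illustrated with a small in-memory sketch. It is not how COM implements marshalling: the names IAdder, RealAdder, AdderStub, AdderProxy and the byte-vector Packet are hypothetical, and a direct function call stands in for the cross-process transport. It only shows the division of labour described above, with the proxy packing parameters and the stub unpacking them and calling the real object.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

using Packet = std::vector<std::uint8_t>;  // the marshalled form of a call or a reply

struct IAdder {
    virtual std::int32_t Add(std::int32_t a, std::int32_t b) = 0;
    virtual ~IAdder() = default;
};

// The real object, living on the "server" side.
class RealAdder : public IAdder {
public:
    std::int32_t Add(std::int32_t a, std::int32_t b) override { return a + b; }
};

inline void PutInt32(Packet& p, std::int32_t v) {
    std::uint8_t bytes[sizeof v];
    std::memcpy(bytes, &v, sizeof v);
    p.insert(p.end(), bytes, bytes + sizeof v);
}
inline std::int32_t GetInt32(const Packet& p, std::size_t offset) {
    std::int32_t v = 0;
    std::memcpy(&v, p.data() + offset, sizeof v);
    return v;
}

// Server side: unmarshals the parameters, makes the actual call, marshals the result.
class AdderStub {
public:
    explicit AdderStub(IAdder& target) : target_(target) {}
    Packet Dispatch(const Packet& request) {
        std::int32_t a = GetInt32(request, 0);
        std::int32_t b = GetInt32(request, sizeof(std::int32_t));
        Packet reply;
        PutInt32(reply, target_.Add(a, b));
        return reply;
    }
private:
    IAdder& target_;
};

// Client side: looks like the real interface but only packs and forwards.
class AdderProxy : public IAdder {
public:
    explicit AdderProxy(AdderStub& stub) : stub_(stub) {}
    std::int32_t Add(std::int32_t a, std::int32_t b) override {
        Packet request;
        PutInt32(request, a);
        PutInt32(request, b);
        Packet reply = stub_.Dispatch(request);  // stands in for the cross-process hop
        return GetInt32(reply, 0);
    }
private:
    AdderStub& stub_;
};

// The client sees only an IAdder and cannot tell it is talking to a proxy.
inline std::int32_t AddThroughProxy(std::int32_t a, std::int32_t b) {
    RealAdder real;
    AdderStub stub(real);
    AdderProxy proxy(stub);
    IAdder& client = proxy;
    return client.Add(a, b);
}
```

Because the proxy implements the same interface as the real object, client code is identical in the in-process and out-of-process cases; only the object behind the interface pointer differs.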
Figure 8: Cross-process communication in COM [COM 95]
All COM objects are registered with a component database. As shown in Figure 9, when a client wishes to create and use a COM object:
An important aspect in COM is that objects have no identity, i.e. a client can ask for a COM object of some type, but not for a particular object. Every time that COM is asked for a COM object, a new instance is returned. The main advantage of this policy is that COM implementations can pool COM objects and return these pooled objects to requesting clients. Whenever a client has finished using an object the instance is returned to the pool. However, there are mechanisms to simulate identity in COM such as monikers (reviewed later).
Figure 9: Creating a COM object pointer [COM 95]
COM includes interfaces and API functions that expose operating system services, as well as other mechanisms necessary for a distributed environment (naming, events, etc.). These are sometimes referred to as COM technologies (or services), and are shown in Table 3.
Table 3: COM Technologies
COM has enjoyed great industrial support with thousands of ISVs developing COM components and applications. However, COM suffers from some weaknesses that have been recognized by Microsoft and addressed in Component Object Model+, which is the ongoing upgrade of COM.
Both issues were partially mitigated by add-ons to COM: complexity by integrated development environments and robustness by MTS. However, to further address those problems, the company is working to turn COM+ and the MTS (Microsoft Transaction Server) into one programming model that will simplify the lives of developers building distributed, enterprise-wide COM applications. COM+ integrates seamlessly with all COM-aware languages (basically Microsoft languages). Users write components in their favorite language. The tool chosen and the COM+ runtime take care of turning these classes into COM components [Kirtland 97].
Usage Considerations
A number of issues must be evaluated when considering COM, DCOM, and COM+. They include
Maturity
COM has its roots in OLE version 1, which was created in 1991 and was a proprietary document integration and management framework for the Microsoft Office suite. Microsoft later realized that document integration is just a special case of component integration. OLE version 2, released in 1995, was a major enhancement over its predecessor. The foundation of OLE version 2, now called COM, provided a general-purpose mechanism for component integration on Windows platforms [Brockschmidt 95]. While this early version of COM included some notions of distributed components, more complete support for distribution became available with the DCOM specifications and implementations for Windows 95 and Windows NT released in 1996. Beta versions of DCOM for Mac, Solaris and other operating systems followed shortly after.
There are many PC-based applications that take advantage of COM and DCOM technology. The basic approach has proven sound, and as previously mentioned, a large component industry has sprung up to take advantage of opportunities created by the Microsoft technology. On the other hand, DCOM has just arrived on non-Windows platforms, and there is little experience with it. DCOM for non-Windows platforms is mainly used to connect COM-based programs with legacy applications on mainframes and Unix workstations.
COM+ is much younger than COM; it was announced on Sept. 23, 1997 and shipped with Windows 2000 (a.k.a. Windows NT 5.0). COM+ can be considered the next release of COM. We are unaware of any large-scale distributed applications relying on COM+ support. The computing paradigm for distributed applications is in flux, due to the relative immaturity of the technology and recent advances in web-based computing.
The Web-centered computing industry has begun to align itself into two technology camps, with one camp centered around Microsoft's COM/DCOM/COM+, Internet Explorer, and ActiveX capabilities, and the other camp championing Netscape, CORBA, and Java/J2EE solutions. Both sides argue vociferously about the relative merits of their approach, but at this time there is no clear technology winner. Fortunately, both camps are working on mechanisms to support interplay between the technology bases. Thus, a COM/DCOM to CORBA mapping is supported by CORBA vendors [Foody 96], and Microsoft has incorporated Java into an Internet strategy. However, work on interconnection between the competing approaches is not complete, and each camp would shed few tears if the other side folded.
Costs and Limitations
Low-cost development tools from Microsoft (such as Visual C++ or Visual Basic), as well as tools from other vendors, provide the ability to build and access COM components for Windows platforms. Construction of clients and servers is straightforward on these platforms. In addition, the initial purchase price for COM and DCOM is low on Windows platforms. For other platforms the prices are considerably higher. DCOM for mainframes, for example, cost around two hundred thousand dollars as of December 1999. Beyond basic costs to procure the technology, any serious software development using COM/DCOM/COM+ requires substantial programmer expertise; the complexities of building distributed applications are not eliminated. It would be a serious mistake to assume that the advent of distributed object technologies like COM/DCOM/COM+ reduces the need for expertise in areas like distributed systems design, multi-threaded applications, and networking. However, Microsoft has a strong support organization to assist individuals developing COM/DCOM clients and objects: many sample components, books and guides on the subject of COM/DCOM development are available. Unfortunately, information on COM+ is limited at this time.
Dependencies
Dependencies include Remote Procedure Call and Distributed Computing Environment.
Alternatives
COM/DCOM/COM+ represents one of a number of alternate technologies that support distributed computing. Some technologies, such as remote procedure call, offer "low level" distribution support. Other technologies, such as message oriented middleware and transaction processing monitors, offer distribution support paradigms outside the realm of objects. The Common Object Request Broker Architecture (CORBA) and Java 2 Enterprise Edition (J2EE) can be considered direct competitors to COM/DCOM. Information about technologies supporting distributed computing is available in the following places:
Complementary Technologies
One commonly hears of COM and DCOM in conjunction with OLE, ActiveX, MTS and COM+. Indeed, these and other technologies constitute Microsoft's distributed and web-oriented strategy. This strategy is globally referred to as Distributed interNet Architecture(tm) (DNA) and it comprises a full set of products and specifications to implement net-centric applications. Technologies championed by other vendors can also be used in conjunction with COM. For example, COM objects can be created and manipulated from Java code. Tools are provided to create Java classes from COM type library information; these classes can be included in Java code. Using Internet Explorer, Java programs can also expose functionality as COM services. In general, Microsoft's approach for Java support involves tying it very closely to its existing Internet strategy (Internet Explorer, COM/DCOM, ActiveX); i.e., to provide a mechanism for interfacing to the wide range of components that already adhere to Microsoft's strategy and specifications. COM+ is a good candidate to implement the middle layer of multitier architectures. The distribution support and quality of service provided by COM+ can help to overcome some of the complexities involved in these architectures.
References and Information Sources
Current Author/Maintainer
Santiago Comella-Dorda, SEI
External Reviewers
Modifications
13 Mar 2001: Update with new developments of COM