.NET Framework For Java Programmers
Author: Ashish Banerjee
Objective
After reading this article Java programmers should be able to decipher and de-jargonize the .NET architecture and relate it with the proposed ECMA standard.
Target Audience
Java programmers and system architects.
Summary
This article outlines Microsoft's proposed standardization of the .NET framework in the ECMA forum as the CLI (Common Language Infrastructure), which the Microsoft documentation refers to as the CLR (Common Language Runtime). The CLR and the JVM are compared with respect to the market forces that shaped the CLR definition. The components of the CLR are examined, followed by details of Microsoft's implementation of the CLR as the .NET framework. Throughout, the .NET framework is compared with the Java architecture.
The material is derived from the author's own experience with Java since early 1996, Microsoft's MSDN site, and standards documents from sites such as ECMA and W3C.org.
Overview
The .NET framework is Microsoft's answer to the Java community's objections to the "Windonization" of Java. Microsoft has introduced a new language, C#, designed by the Visual J++ team. In the process it has also done away with DCOM and changed its flagship language, Visual Basic.
In a nutshell, .NET currently consists of three compiled languages (C#, VB.NET and C++), a Java-like runtime virtual machine environment, and five execution containers hosting this runtime, namely ASP.NET, the Windows Shell, the VBA scripting host for the Office suite, the Windows Forms Designer and IE (Internet Explorer). Much like Java, it contains a rich set of APIs and libraries.
Enhancements over the Java framework include the use of SOAP (Simple Object Access Protocol) for remoting, and version and security scoping using the concept of an application assembly (described later). A Common Type System is introduced to make mixed language programming easier. For example, a VB component can inherit from a C# class.
In the longer term Java and .NET will converge, and therefore an overview of the new framework is presented here from a Java programmer's perspective.
Comparing CLR with JVM
The .NET framework's Common Language Runtime (CLR) is very similar to the Java Virtual Machine (JVM) in terms of garbage collection, security and just in time (JIT) compilation. However, a fundamental difference arises from the differing perceptions of Sun's Java design team, headed by James Gosling, and Microsoft's C# designers, spearheaded by Anders Hejlsberg.
Sun viewed the Internet as a heterogeneous network consisting of multiple operating systems. Thus Sun had to design the GUI as the lowest common denominator, supportable by all such platforms. This was also the major reason for Java's failure in client side applications. Java has been successful only on the server side, where there is no great need for a GUI.
Having failed in the client side desktop application arena, Sun is now targeting Java at the server side applications market, which is dominated by Unix and Linux flavors with approximately 60% of the server market; the remaining 40% rests with Windows NT.
But this view did not suit Microsoft, which holds about 90% of the client side desktop market. Microsoft wanted to provide a Windows centric Internet development platform. Thus it added a few Windows specific features to its Java implementation, similar to what it had done with its C++ implementation. This, along with Microsoft's refusal to support Java RMI, which competed with its floundering remoting technology called DCOM, resulted in a lawsuit. The suit was settled in January 2001, with Microsoft agreeing to pay USD 20 million to Sun. This antagonism made Microsoft break away from Java and float its own language, called C#.
The C# team was carved out of the Microsoft J++ team, and its efforts finally led to the creation of the .NET framework.
Microsoft intends to leverage its desktop leadership to shape Internet application development by introducing the .NET framework. Thus the framework of the supported languages maps the Windows GUI more closely, much like C++ MFC and J++ WFC (Windows Foundation Classes). In spite of the platform independent design claims, all three supported languages produce Windows .exe code by default.
Microsoft played the standardization game better than Sun. Microsoft, though a USA based company, proposed C# and the Common Language Infrastructure (CLI), the backbone of the .NET framework, for standardization with the ECMA (European Computer Manufacturers Association) TC39 Technical Committee in October 2000. Ironically, Sun also happens to be a member of this standing committee, which looks after standardization issues related to computer languages. See http://www.ecma.ch.
Microsoft has also successfully standardized the Simple Object Access Protocol (SOAP) through the W3C (http://www.w3c.org). SOAP is an XML and HTTP based remote object access protocol. SOAP competes with Java's RMI and Microsoft's own DCOM. RMI has the limitation of being language specific, and DCOM had limited acceptance outside the Windows community, despite Microsoft's best efforts to port DCOM to Unix platforms.
CORBA, another remoting contender, which even has an Internet specific transport, namely IIOP, is more or less dead, due to the lack of interoperability between its vendors.
SOAP, by virtue of its HTTP transport, can pass through firewalls easily and can therefore cross between LAN and Internet. However, SOAP, being XML based, burdens both client and server with XML parsing, which is relatively CPU intensive compared to binary protocols like RMI and DCOM.
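To make the XML parsing burden concrete, the following is a minimal Java sketch of what each side of a SOAP exchange has to do: build a text envelope and then parse it back with a full XML parser. The class name SoapEnvelopeSketch, the GetQuote operation and the urn:example:quotes namespace are invented for illustration; only the SOAP 1.1 envelope namespace and the standard JAXP classes are real. It is an illustration of the idea, not of any particular SOAP toolkit.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical sketch: a hand-built SOAP 1.1 envelope and the XML parsing
// needed to read a single value back out of it.
final class SoapEnvelopeSketch {

    // The SOAP 1.1 envelope namespace.
    static final String SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/";

    // Builds a request envelope. GetQuote and urn:example:quotes are made-up
    // names; the symbol is assumed to contain no XML special characters.
    static String buildRequest(String symbol) {
        return "<SOAP-ENV:Envelope xmlns:SOAP-ENV=\"" + SOAP_NS + "\">"
             + "<SOAP-ENV:Body>"
             + "<m:GetQuote xmlns:m=\"urn:example:quotes\">"
             + "<symbol>" + symbol + "</symbol>"
             + "</m:GetQuote>"
             + "</SOAP-ENV:Body>"
             + "</SOAP-ENV:Envelope>";
    }

    // Parses the whole envelope into a DOM tree just to recover one string.
    // A binary protocol would read the same value from a fixed wire layout.
    static String extractSymbol(String envelope) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder().parse(
                    new ByteArrayInputStream(envelope.getBytes(StandardCharsets.UTF_8)));
            Element body = (Element) doc.getElementsByTagNameNS(SOAP_NS, "Body").item(0);
            return body.getElementsByTagName("symbol").item(0).getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("not a well-formed envelope", e);
        }
    }

    public static void main(String[] args) {
        String request = buildRequest("SUNW");
        System.out.println(request.length() + " characters on the wire");
        System.out.println("symbol = " + extractSymbol(request));
    }
}
```

The payload here is a four character string, yet the envelope around it is many times larger and needs a namespace aware parser to read, which is the cost the paragraph above refers to.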
The Java platform views the Internet world as one language running on different operating systems (OS), whereas the .NET framework views the world as running on one OS, with programmers having a choice of multiple languages. Therefore the Java platform interoperates across multiple operating systems, and the .NET framework interoperates across multiple languages.
As the above discussion shows, market forces rather than technical design considerations are largely responsible for the present state of affairs.
Inside The Common Language Runtime
The Common Language Runtime (CLR) is the runtime environment of the .NET framework, which manages the execution of code and provides services. The CLR has also been proposed as an ECMA standard. However, the ECMA documents refer to the CLR as the Common Language Infrastructure (CLI). It has five components, namely:
- CTS - Common Type System
- CLS - Common Language Specification
- CIL - Common Intermediate Language
- JIT - Just in Time Compiler
- VES - Virtual Execution System
CLI - Common Language Infrastructure
The Common Language Infrastructure (CLI) provides a language neutral platform for application development and deployment. The CLI supports both the Object Oriented Paradigm (OOP) and hooks for modeling procedural and structured languages. The CLI provides languages with a framework for security, garbage collection and exception handling, and also provides a platform for language interoperability. For example, C# objects can inherit from C++ classes, and VB procedures can use C# components.
Please note that the Microsoft documentation refers to the CLI as the CLR (Common Language Runtime).
After reading through the ECMA standard documents you will, like me, probably develop the feeling that the CLI is an attempt to standardize the next generation Java framework to accommodate older, pre-Internet era languages like VB and C++.
The five components of the CLI are briefly described below.
CTS - Common Type System
The Common Type System supports both object oriented programming languages like Java and procedural languages like C. It deals with two kinds of entities: Objects and Values. Values are the familiar atomic types like integers and chars. Objects are self defining entities containing both methods and variables. Objects and Values can be categorized into the following hierarchy.
Types can be of two kinds: Value Types and Reference Types. Value Types can be further categorized into built-in types (for example, Integer Types and Float Types) and user defined types like Enum.
Reference Type can be divided into three sub categories: Self Describing Reference Type, Pointers and Interfaces. Pointers can be sub divided into Function pointers, Managed and Unmanaged Types.
Value Types can be converted into Reference Type, and this conversion is called Boxing of Values. De-referencing the Boxed Value Types from the Referenced Type is called Un-Boxing.
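For Java readers, the nearest familiar analogue of boxing is the conversion between the primitive int and the reference type java.lang.Integer. The sketch below shows that analogue only; it illustrates Java behaviour, not the CLR's, and the class name BoxingSketch is invented for this example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration: Java's int/Integer conversion as an analogue
// of CTS boxing and unboxing. It shows Java behaviour, not CLR behaviour.
final class BoxingSketch {

    // Boxing: wrap a primitive value in a reference type.
    static Object box(int value) {
        return Integer.valueOf(value);
    }

    // Unboxing: recover the primitive from the reference. The cast fails
    // with ClassCastException if the object is not actually an Integer.
    static int unbox(Object boxed) {
        return ((Integer) boxed).intValue();
    }

    public static void main(String[] args) {
        Object boxed = box(42);
        int raw = unbox(boxed);

        // Once boxed, the value can sit in a collection of references
        // alongside ordinary objects.
        List<Object> mixed = new ArrayList<>();
        mixed.add(boxed);
        mixed.add("a string");
        System.out.println(raw + " unboxed, " + mixed.size() + " references stored");
    }
}
```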
Casting rules from one type to another, for example conversion of char to integer types are also defined within the Common Type System.
Common Type System also defines scope and assemblies. An assembly is a configured set of loadable code modules and other resources that together implement a unit of functionality. A scope is a collection of grouped names of different kinds of values or reference types.
CLS - Common Language Specification
The Common Language Specification (CLS) aids the development of mixed language programming. It defines a subset of Common Type System which all class library providers and language designers targeting CLR must adhere to.
CLS is a subset of CTS. If a component written in one language (say C#) is to be used from another language (say VB.NET), then the component writer must adhere to types and structures defined by CLS.
CIL - Common Intermediate Language
All compilers complying with CLI must generate an intermediate language representation called Common Intermediate Language (CIL). The CLI uses this intermediate language to either generate native code or use Just In Time (JIT) compilation to execute the intermediate code on the fly.
The Microsoft documents refer to this standard's implementation as MSIL (Microsoft Intermediate Language).
JIT - Just in Time Compiler
The JIT, or Just in Time Compiler, is the part of the runtime execution environment that converts the intermediate language contained in the executable files, called assemblies, into native executable code.
The security policy settings are consulted at this stage to decide whether the code being compiled needs to be type safe. If it needs to be and is not, an exception is thrown and the JIT process is aborted.
VES - Virtual Execution System
Virtual Execution System (VES), is more or less equivalent to the JVM (Java Virtual Machine).
VES loads, links and runs the programs written for Common Language Infrastructure contained in Portable Executable (PE) files.
The Virtual Execution System (VES) fulfills its loader function by using information contained in the metadata, and uses late binding (or linking) to integrate modules compiled separately, which may even be written in different languages.
The VES also provides services during execution of the code, including automatic memory management, profiling and debugging support, security sandboxes, and interoperability with unmanaged code such as COM components.
Managed code is Intermediate Language (IL) code along with metadata, contained in Portable Executable (PE) files, which may be .EXE or .DLL files. It needs a Just in Time (JIT) compiler to convert it into native executable code. There is also provision for precompiled executables, which are called unmanaged code. The advantage of unmanaged code is that it does not need JIT compilation, but it has the disadvantage of not being portable across different operating system (OS) platforms.
Microsoft's Implementation of CLI is CLR
Microsoft's implementation and adaptation of the above standard has resulted in differences in terminology; for example, the Common Intermediate Language (CIL) is called Microsoft Intermediate Language (MSIL), and the Common Language Infrastructure (CLI) is referred to as the Common Language Runtime (CLR).
These changes in naming convention are, I believe, meant to create a branding distinction while implementing the standards. This was probably intended to avoid the clash that occurred between Java the language standard, Java the island, Java the coffee brand and Java the Sun trademark! But in the long run, it will only lengthen the already long list of confusing acronyms and jargon in the programmer's dictionary.
We use CLI and CLR interchangeably, however, it will be more correct to say that CLR is the Microsoft's implementation of CLI.
Apart from scripting languages like JavaScript and VBScript, the .NET framework presently supports three compiled languages, namely VB.NET, VC++ and C# (pronounced C Sharp). These language compilers target this runtime. The type verifiable output of a compiler is called managed code.
Compilers can also generate unsafe code, which is called unmanaged code. Garbage collection is handled only for managed code.
Managed code has access to Common Language Runtime (CLR) features such as multi-language integration, exception handling across language boundaries, security, versioning and simplified deployment.
An interesting facility Microsoft is experimenting with is cross language inheritance. For example, a C# class can inherit from a VB object! Each of these features will be discussed in detail later.
The CLR provides services to the managed code. The language compilers emit metadata, that describes the types, members, and references in the code. Metadata is stored along with the code: every loadable common language runtime image contains metadata.
The metadata helps the CLR to locate and load classes, lay out instances in memory, resolve method invocations, generate native code, enforce security, and set up run time context boundaries.
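Java programmers already rely on a comparable facility: a compiled class file carries enough type information for the JVM, and for reflection, to discover members without any external description. The sketch below uses java.lang.reflect to list the method signatures of a small class. It is offered as a Java analogy for self describing code, not as a model of the CLR metadata format, and the names MetadataSketch and Account are invented for this example.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration: reading type information that travels with
// compiled Java code, as an analogy for metadata stored with managed code.
final class MetadataSketch {

    // A small example type whose members are discovered at run time.
    static final class Account {
        private long balance;
        public void deposit(long amount) { balance += amount; }
        public long getBalance() { return balance; }
    }

    // Returns the declared method signatures of a type, sorted by text,
    // using only information embedded in the compiled class.
    static List<String> describe(Class<?> type) {
        List<String> signatures = new ArrayList<>();
        for (Method method : type.getDeclaredMethods()) {
            if (method.isSynthetic()) {
                continue; // skip compiler generated helpers
            }
            StringBuilder params = new StringBuilder();
            for (Class<?> param : method.getParameterTypes()) {
                if (params.length() > 0) {
                    params.append(", ");
                }
                params.append(param.getSimpleName());
            }
            signatures.add(method.getReturnType().getSimpleName()
                    + " " + method.getName() + "(" + params + ")");
        }
        Collections.sort(signatures);
        return signatures;
    }

    public static void main(String[] args) {
        for (String signature : describe(Account.class)) {
            System.out.println(signature);
        }
    }
}
```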
The CLR, much like the Java Virtual Machine (JVM), provides automatic garbage collection for managed code; data handled this way is called managed data. But unlike the Java VM, the CLR also has a mechanism to switch off automatic garbage collection syntactically; such data is called unmanaged data, and the programmer is responsible for reclaiming it.
The CLR has been designed to facilitate cross language integration. Two kinds of integration are possible: tightly coupled, and loosely coupled, which is also called remoting. A tightly coupled inter-language method call is made within the CLR; this assumes that the two languages calling each other are both .NET framework compliant, like VC++, VB.NET or C#, or are at least COM compliant. Thus C# programs can talk to Java programs through the ActiveX Java Bean bridge! This assumes that both the C# and Java code reside on a single computer.
Remoting or loosely coupled inter language interaction is suitable when the two interacting programs written in different languages are on different operating system (OS) platforms, like C# client residing on Windows CE talking to Solaris based server side Java code. This integration is achieved through an XML based protocol called Simple Object Access Protocol (SOAP) which was proposed by Microsoft and is adopted by W3C consortium (http://www.w3c.org). An open source SOAP gateway implementation of Java is available from Apache.org at http://xml.apache.org.
SOAP content is XML formatted and independent of the transport layer. HTTP and SMTP transport implementations are currently available from Microsoft and Apache.org for the .NET framework and Java platforms respectively.
All .NET framework components carry information about the components and resources they use in an XML formatted document called metadata. The runtime uses this information to link the components dynamically, ensuring version integrity and security controls. This makes the application theoretically more resilient against version changes. Only time will tell if this innovation is successfully implemented.
Another good feature introduced in this new framework is reduction of Windows system registry dependency. Registration information and state data are no longer stored in the system registry, but inside the metadata. This should make the server side component deployment much easier.
.NET framework's Common Language Runtime (CLR) claims to have the ability to compile once and run on any CPU and operating system that supports the runtime. We will see if this becomes a real possibility in near future.
Common Intermediate Language (CIL)
The .NET framework's implementation of Common Intermediate Language (CIL) is called Microsoft Intermediate Language (MSIL). Unless specified otherwise, we will use the terms Intermediate Language (IL), MSIL and CIL interchangeably.
Managed code is produced by one of the three compilers which translate the source code into Microsoft intermediate language (MSIL).
The Common Intermediate Language (CIL), and therefore its Microsoft rendering, Microsoft Intermediate Language (MSIL), is said to be a CPU independent set of instructions that can be efficiently converted to native code.
The MSIL instruction set has instructions for loading, storing, initializing and calling methods on objects, as well as many conventional instructions for arithmetic and logical operations, control flow, direct memory access and exception handling. All three languages included in this framework have a Java-like "try catch" exception handling facility.
Just like Java, before the managed code is executed, the intermediate language is converted to CPU specific code by a just in time (JIT) compiler. The runtime supplies one or more JIT compilers for each computer architecture it supports. However, the code can be compiled into native form during installation itself.
When a Common Language Specification (CLS) compliant compiler produces Common Intermediate Language (CIL), it also produces metadata describing the Common Type System (CTS) types used in the code, including the definition of each type, the signatures of each type's members, the members that the code references, and other data that the runtime uses at execution time.
The MSIL and metadata are contained in a portable executable (PE) file which is an extension of the Microsoft Portable Executable (PE) and Unix world's Common Object File Format (COFF) used for executable content. They appear to the user as the familiar .EXE and .DLL files.
One of the fundamental differences between the Java Virtual Machine (JVM) instruction set and the Common Intermediate Language (CIL) is that the JVM is big endian (most significant byte first), while CIL uses a little endian (least significant byte first) binary representation. This difference will not be apparent to most programmers. Only system level programmers will have to deal with it.
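The two byte orders are easy to see from Java itself, since java.nio.ByteBuffer can write the same integer in either order. The sketch below is only a demonstration of what big endian and little endian mean for a 32 bit value; it does not read real class files or PE files, and the class name EndianSketch is invented for this example.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical illustration of the two byte orders mentioned in the text.
final class EndianSketch {

    // Encodes a 32 bit integer into four bytes using the given byte order.
    static byte[] encode(int value, ByteOrder order) {
        return ByteBuffer.allocate(4).order(order).putInt(value).array();
    }

    // Formats bytes as space separated two digit hexadecimal values.
    static String hex(byte[] bytes) {
        StringBuilder out = new StringBuilder();
        for (byte b : bytes) {
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(String.format("%02X", b));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        int value = 0x0A0B0C0D;
        // Big endian, most significant byte first: 0A 0B 0C 0D
        System.out.println("big endian:    " + hex(encode(value, ByteOrder.BIG_ENDIAN)));
        // Little endian, least significant byte first: 0D 0C 0B 0A
        System.out.println("little endian: " + hex(encode(value, ByteOrder.LITTLE_ENDIAN)));
    }
}
```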
The file format can accommodate either Common Intermediate Language or native code, as well as metadata. A signature pattern enables the operating system to recognize Common Language Runtime images.
The presence of metadata in the executable file enables the components to be self descriptive. This eliminates the need for additional type libraries or Interface Definition Language (IDL) used in DCOM and CORBA. The runtime locates and extracts the metadata from the file as necessary during execution.
Managed Execution
There are two kinds of code that can now exist inside executable files. The old machine dependent code, like existing ActiveX controls, is called unmanaged code; code compiled to the intermediate language and executed under the control of the runtime is called managed code.
As mentioned earlier, there are currently three compiled languages provided by Microsoft, C#, C++ and VB, which target the Common Language Runtime (CLR). This runtime is a multi-language execution environment and supports a common base of data types and language features. However, the language compiler determines what subset of the runtime's functionality is available, and the design pattern of the code is influenced by the features exposed by the compiler.
The coding syntax is determined by the compiler, not by the runtime. If the component is required to be completely usable by components written in other languages, it must use only language features that are included in the Common Language Specification (CLS) in the component's exported types.
Application Domains
Application domains are lightweight processes. They can be visualized as an extension of Java's sandbox security and thread model.
The Common Language Runtime provides a secure, lightweight unit of processing called an application domain. Application domains also enforce security policy.
"Lightweight" means that multiple application domains run in a single Win32 process, yet they provide a kind of fault isolation; that is, a fault in one application domain does not corrupt other application domains. This enhances execution security against viruses and also helps in debugging faulty code.
The Common Language Runtime relies on type safety and verifiability features of Common Type System (CTS) to provide fault isolation between application domains. Since type verification can be conducted statically before execution, it is cost efficient and needs less security support from microprocessor hardware.
Each application can have multiple application domains associated with it. And each application domain has a configuration file, containing security permissions. This configuration information is used by the Common Language Runtime to provide sandbox security similar to that of Java sandbox model.
Although multiple application domains can run within a process, no direct calls are allowed between methods of objects in different application domains. Instead, a proxy mechanism is used for code space isolation.
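The proxy idea will be familiar from Java's dynamic proxies, where a caller holds a stand-in object that forwards each call to the real target. The sketch below uses java.lang.reflect.Proxy to show only that forwarding pattern. It does not reproduce the CLR's isolation or marshalling, and the names DomainProxySketch, Greeter and RealGreeter are invented for this example.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;

// Hypothetical illustration: a caller talks to a proxy instead of holding a
// direct reference to the target, as an analogy for cross-domain calls.
final class DomainProxySketch {

    // The contract shared by both sides of the boundary.
    interface Greeter {
        String greet(String name);
    }

    // The real object, imagined as living on the far side of the boundary.
    static final class RealGreeter implements Greeter {
        public String greet(String name) {
            return "hello " + name;
        }
    }

    // Counts calls that crossed the (simulated) boundary through a proxy.
    static int forwardedCalls = 0;

    // Wraps the target so that every call passes through one handler,
    // the point where a real runtime could marshal arguments or check policy.
    static Greeter proxyFor(Greeter target) {
        InvocationHandler handler = (proxy, method, args) -> {
            forwardedCalls++;
            return method.invoke(target, args);
        };
        return (Greeter) Proxy.newProxyInstance(
                Greeter.class.getClassLoader(),
                new Class<?>[] { Greeter.class },
                handler);
    }

    public static void main(String[] args) {
        Greeter remote = proxyFor(new RealGreeter());
        System.out.println(remote.greet("java"));
        System.out.println("calls forwarded: " + forwardedCalls);
    }
}
```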
Assemblies
An assembly is the functional unit of sharing and reuse in the Common Language Runtime. It is the equivalent of JAR (Java Archive) files of Java.
An assembly is a collection of physical files packaged in the .CAB format or the newly introduced .MSI file format. The assemblies contained in .CAB or .MSI files are called static assemblies; they include .NET Framework types (interfaces and classes) as well as resources for the assembly (bitmaps, JPEG files, resource files, etc.). They also include metadata that eliminates the need for the IDL file descriptors that were required for describing COM components.
The Common Language Runtime also provides APIs that script engines use to create dynamic assemblies when executing scripts. These assemblies are run directly and are never saved to disk.
Microsoft has greatly diminished the role of the Windows Registry with the introduction of the assembly concept, which is an adaptation of Java's JAR deployment technology.
Assemblies are an adaptation, but not a copy, of Java's JAR technology. The technology has been improved upon in some ways; for example, a versioning system has been introduced. However, since the .NET framework is skewed towards the Windows architecture, some of JAR's portability features may have been sacrificed.
Again, similar to JAR files, assemblies contain an entity called a manifest. However, the manifest in the .NET framework plays a somewhat wider role. The manifest is metadata describing the inter-relationships between the entities contained in the assembly, like managed code, images and multimedia resources. The manifest also specifies versioning information.
The manifest is basically a deployment descriptor with XML syntax. Java programmers can relate it to the J2EE (Java 2 Enterprise Edition) deployment descriptors for EJB (Enterprise Java Beans) applications.
The Microsoft documentation stresses that assemblies are "logical dlls". This may be a reasonable paradigm for VB or C++ programmers, but Java programmers will find it easier to visualize assemblies as an extension of the JAR concept. However, unlike a JAR, each assembly can have only one entry point defined, which can be DllMain, WinMain or Main.
As stated earlier, assemblies have manifest metadata. This contains version information and a digital signature, which are meant to implement version control and authentication of the software developer. The version and authentication checks are carried out by the runtime while loading the assembly into the code execution area.
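On the Java side of the comparison, a JAR manifest can already carry version attributes, and a loader can read them before deciding whether to use the archive. The sketch below parses a manifest with java.util.jar.Manifest and applies a simple exact-match version check. The sample manifest text, the class name ManifestSketch and the exact-match rule are assumptions made for this example; it shows the JAR analogue, not how the .NET runtime resolves versions or verifies signatures.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.jar.Attributes;
import java.util.jar.Manifest;

// Hypothetical illustration: reading version information from a JAR style
// manifest and checking it at load time.
final class ManifestSketch {

    // A made-up manifest, as it might appear in META-INF/MANIFEST.MF.
    static final String SAMPLE =
            "Manifest-Version: 1.0\r\n"
          + "Implementation-Title: example-component\r\n"
          + "Implementation-Version: 2.1.0\r\n"
          + "\r\n";

    // Returns the Implementation-Version attribute, or null if absent.
    static String implementationVersion(String manifestText) {
        try {
            Manifest manifest = new Manifest(new ByteArrayInputStream(
                    manifestText.getBytes(StandardCharsets.UTF_8)));
            return manifest.getMainAttributes()
                    .getValue(Attributes.Name.IMPLEMENTATION_VERSION);
        } catch (IOException e) {
            throw new IllegalStateException("unreadable manifest", e);
        }
    }

    // An assumed policy for this sketch: accept only an exact version match.
    static boolean isAcceptable(String manifestText, String requiredVersion) {
        return requiredVersion.equals(implementationVersion(manifestText));
    }

    public static void main(String[] args) {
        System.out.println("version: " + implementationVersion(SAMPLE));
        System.out.println("accept 2.1.0? " + isAcceptable(SAMPLE, "2.1.0"));
        System.out.println("accept 3.0.0? " + isAcceptable(SAMPLE, "3.0.0"));
    }
}
```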
Again, much like Java's trusted library concept, .NET assemblies can be placed in a secured area called the global assembly cache. This area is equivalent to Java's trusted class path. Only system administrators can install assemblies into or remove them from the global assembly cache. There is also a place for downloaded or transient assemblies, called the downloaded assembly cache. Assemblies loaded from the global assembly cache run outside the sandbox, load faster and enjoy more freedom to access file system resources. Assemblies loaded from the downloaded cache area are subject to more security checks and are therefore slower to load; since they run inside the sandbox, they enjoy far fewer privileges.
Assembly manifests also contain information regarding the sharing of code by different applications and application domains.
To summarize, the operating system can have multiple applications running simultaneously. Each such application occupies a separate Win32 process and can contain multiple application domains. An application domain can be constructed from multiple assemblies.
Execution
The Common Language Runtime provides the infrastructure that enables execution to take place as well as a variety of services that can be used during execution. Before a method can be executed, it must be compiled to processor specific code. Each method for which MSIL has been generated is JIT compiled when it is called for the first time, then executed. The next time the method is executed, the existing JIT compiled native code is executed. The process of JIT compiling and then executing the code is repeated until execution is complete.
As mentioned earlier, the recompilation can be avoided by compiling the code during installation into native executable code.
During execution, managed code receives services such as automatic memory management, security, interoperability with unmanaged code, cross language debugging support, and enhanced deployment and versioning support.
JIT Compilation
Before Intermediate Language (IL) can be executed, it must be converted by a .NET Framework Just In Time (JIT) compiler to native code, which is CPU specific code that runs on the same computer architecture that the JIT compiler is running on.
Microsoft's designers insist that the runtime never interprets any language, it always executes native code, only conversion to native form may be deferred. Even the scripting languages like VBScript are now compiled and executed!
The idea behind JIT compilation recognizes the fact that some code may never get called during execution. Therefore, rather than using time and memory to convert all of the MSIL in a PE (portable executable) file to native code, the compiler converts the Intermediate Language as it is needed during execution and stores the resulting native code so that it is accessible for subsequent calls.
The loader creates and attaches a stub to each of the type's methods when the type is loaded; on the initial call to the method, the stub passes control to the JIT compiler, which converts the MSIL for that method into native code and modifies the stub to direct execution to the location of the native code. Subsequent calls of the JIT compiled method proceed directly to the native code that was previously generated, reducing the time it takes to JIT compile and execute the code.
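The stub mechanism described above can be imitated in a few lines of Java. In the sketch below, each method slot starts out pointing at a stub; the first call runs a pretend compiler, patches the slot, and later calls go straight to the compiled target. Everything here is a simulation under stated assumptions: the names JitStubSketch, MethodSlot and compile are invented, and the "compiler" simply hands back the function it was given instead of generating native code.

```java
import java.util.function.IntUnaryOperator;

// Hypothetical simulation of compile-on-first-call stubs. No real code
// generation takes place; compile() is a stand-in for the JIT compiler.
final class JitStubSketch {

    // Counts how many times the simulated JIT compiler has run.
    static int compileCount = 0;

    // Stand-in for JIT compilation: a real compiler would translate
    // intermediate code to native code here.
    static IntUnaryOperator compile(IntUnaryOperator intermediateCode) {
        compileCount++;
        return intermediateCode;
    }

    // A method entry that initially holds a stub, then the compiled code.
    static final class MethodSlot {
        private IntUnaryOperator target;

        MethodSlot(IntUnaryOperator intermediateCode) {
            // The stub: on the first call, compile, patch the slot so that
            // later calls bypass the stub, then run the compiled code.
            this.target = arg -> {
                IntUnaryOperator nativeCode = compile(intermediateCode);
                this.target = nativeCode;
                return nativeCode.applyAsInt(arg);
            };
        }

        int invoke(int arg) {
            return target.applyAsInt(arg);
        }
    }

    public static void main(String[] args) {
        MethodSlot square = new MethodSlot(x -> x * x);
        System.out.println(square.invoke(3)); // first call goes through the stub
        System.out.println(square.invoke(4)); // second call skips the stub
        System.out.println("compilations: " + compileCount);
    }
}
```

A method that is never invoked never reaches compile at all, which is the saving the JIT approach is after.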
The compilation process (JIT or at installation time) converts the Intermediate Language (IL) to native code. The code, however, must pass a verification process. Verification examines the Intermediate Language (IL) and metadata to see whether the code is type safe, that is, whether it accesses only authorized memory locations, identities are what they claim to be, and each reference to a type is compatible with the type referenced. These features protect the application from bugs and viruses.
During the verification process, Intermediate Language (IL) code is examined in an attempt to confirm that the code can access memory locations and call methods only through properly defined types.
Due to the design limitations of some programming languages, like C, their compilers may not be able to produce verifiably type safe code; such code can only be executed from a trusted area.
Runtime Hosts
The runtime is typically started and managed by environments like ASP.NET, IE or the Windows Shell. These hosting environments run managed code on behalf of the user and take advantage of the application isolation features provided by application domains. In fact it is the host that determines where the application domain boundaries lie and in which application domain user code is run. The Common Language Runtime provides a set of classes and interfaces used by hosts to create and manage application domains.
There are five Common Language Runtime hosts:
ASP.NET - ASP.NET creates application domains to run user code. Application domains are created per application as defined by the web server.
Internet Explorer - IE creates an application domain per site.
Windows Shell EXE - Each application that is launched from the command line runs in a separate application domain.
VBA - VBA runs the script code contained in an Office document in an application domain.
Windows Forms Designer - The Windows Forms Designer places each form the user is building in a separate application domain. When the user edits the form and rebuilds, Windows Forms shuts down the old application domain, recompiles the code and runs it in a new application domain.
Conclusion
.NET is definitely an improvement over the Java framework, but it is NOT going to displace Java any time soon, though in the coming years Java and .NET will converge.
.NET currently lacks support for other platforms. Since .NET has been architected by Microsoft, it is less likely to find the open source support base of free thinking programmers, which was one of the main reasons for Java's popularity.
Java has been around for more than five years now, and Java programmers have already survived two downturns: the first in 1998, when most web sites weeded out applets, and the second in late 2000, when all the VC fueled DOTCOM hot air balloons came down. Scott Adams' Dilbert strips at http://www.dilbert.com have their fill of VC and DOTCOM cartoons.
All remaining employed Java programmers must have a good handle on the .NET architecture to remain employable.
The party is over for DOTCOM, so let's party with DOTNET !!!