Is Your Code Safe?
VS.NET's use of Microsoft Intermediate Language creates big advantages, but exposes VB.NET on the desktop.
by Dan Fergus
URL: http://www.vbpj.com/upload/free/features/vbpj/2001/05may01/df0501/df0501-1.asp
In the Visual Studio.NET (VS.NET) Framework, compilers such as VB, Visual C++, and C# compile their source programs into Microsoft Intermediate Language (MSIL), which is subsequently Just-In-Time (JIT)-compiled into native machine code before execution. But you might not know exactly what happens in VS.NET when you click on the Build button, or whether your proprietary code and information are safe from prying eyes and tampering when you ship IL code to your customer. I'll step through the inner workings of the .NET Framework to show you what's new and explain what concerns you should have with MSIL under VB.NET.
What you need:
VB.NET beta 1
You need to be clear about several points. First, .NET is designed for client/server and Web apps. Software is moving to the Internet and to client/server-based applications, so many applications now look more like browsers than the old-style interface. .NET follows this trend.
Also note that .NET isn't suitable for desktop applications if you care about protecting your intellectual property because you can't protect it in managed MSIL code on the desktop. This is a shocker to me. Although MSIL's premise is good and the .NET Framework and Common Language Runtime (CLR) are stable, they're just not feasible or workable in a standalone desktop application from a security standpoint. As a VB programmer, or even a C# programmer, you have no way to write anything but this unprotected managed code in .NET.
Because of this limitation, you must write unmanaged C++ code if you want to protect the code in your desktop app. The only sure way to protect intellectual property: Wrap the code into an unmanaged C++ component and use the COM interoperability interface to call it from .NET managed code. Unfortunately, this isn't a workable option for VB or C# programmers.
You also should know that Active Server Pages.NET (ASP.NET) applications are safe because they run entirely on the server. This is the nirvana of .NET—running code that's buried away on a protected server, far from the reach of any party interested in looking at your code. ASP.NET makes Web development incredibly easy, and Visual Basic.NET (VB.NET) is a good tool for writing ASP.NET apps.
VB.NET has a steep learning curve, and .NET as a whole will experience slow acceptance. Migrating from VB6 to VB.NET is not easy, and you'll need to support VB6 applications for some time before porting them. VB6 might now be the Microsoft stepchild, but many developers will use it for a long time to come.
MSIL is Old—But New
See what happens when you build a project in VB.NET: Create a sample project you can use as you generate code and assemblies. Open VS.NET, create a new Visual Basic project, add a Label control to the form, and change its Text property to "Good Bye Visual Basic 6.0" (see Figure 1). Instead of the standard Hello World app, you'll write a GoodByeVB6 app.
Figure 1 | Create an App Under .NET.
You need to know some boundaries and terms before diving into .NET. First, the idea of IL is nothing new. The VB and C++ compilers have generated IL for years, but no one discussed this publicly and no one documented it. The single biggest change from the way you shipped applications in the past is in the code the compiler generates. Other than the name, the new MSIL bears little resemblance to the IL of the VB6 compilers—so if you've worked with IL in the past, prepare yourself for a whole new experience. Look at an MSIL fragment generated from the GoodByeVB6 application (see Figure 2).
This code declares an evaluation stack with room for eight entries (the limit counts stack slots, not bytes), then pushes the this pointer onto the stack and calls the get_Label1 method. Then the code pushes the desired label text onto the stack and calls the set_Text method (see the sidebar, "How to Read MSIL," for a more detailed explanation of the structure).
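To make the stack-based flow concrete, here is a minimal sketch of a toy evaluation stack. It is written in Python purely for illustration; it is not MSIL and not how the CLR works internally. The opcode names (ldarg0, ldstr, call_get, call_set), the Form and Label classes, and the max_stack parameter are all hypothetical stand-ins for the steps described above.

```python
class Label:
    """Toy stand-in for a label control with a text property."""
    def __init__(self):
        self.text = ""


class Form:
    """Toy stand-in for the form (the 'this' reference in the walkthrough)."""
    def __init__(self):
        self.label1 = Label()


def run(program, this, max_stack=8):
    """Execute (opcode, operand) pairs on an evaluation stack.

    Returns whatever is left on the stack when the program ends.
    """
    stack = []

    def push(value):
        # Enforce the declared limit on the number of stack entries.
        if len(stack) >= max_stack:
            raise OverflowError("evaluation stack limit exceeded")
        stack.append(value)

    for opcode, operand in program:
        if opcode == "ldarg0":       # push the 'this' reference
            push(this)
        elif opcode == "ldstr":      # push a string literal
            push(operand)
        elif opcode == "call_get":   # pop an object, push one of its properties
            obj = stack.pop()
            push(getattr(obj, operand))
        elif opcode == "call_set":   # pop a value and an object, store the property
            value = stack.pop()
            obj = stack.pop()
            setattr(obj, operand, value)
        else:
            raise ValueError(f"unknown opcode: {opcode}")
    return stack


# The same four steps the walkthrough describes, in toy form.
program = [
    ("ldarg0", None),
    ("call_get", "label1"),
    ("ldstr", "Good Bye Visual Basic 6.0"),
    ("call_set", "text"),
]
form = Form()
leftover = run(program, form)
```

Each call consumes its arguments from the stack, which is why the setter leaves the stack empty once the label text has been stored.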
When you run the compiler, you generate a program that's not a true executable as we know it today. Instead, it's called an assembly (see the Glossary sidebar). An assembly is a grouping of files deployed as a single unit. In today's architecture, you might think of a single executable as an assembly. More accurately, an assembly groups the executable file with any support DLLs, images, resources, or help files.
Figure 2 | Treading on Unsafe Ground.
An assembly almost always consists of at least two parts: the executable code and the manifest. The manifest is a list of all the files that make up the assembly. The executable content inside the assembly is referred to individually as a module. Conceptually, modules correspond to DLLs or EXEs; each module contains metadata, in addition to the metadata of its parent assembly. The assembly format is an enhanced version of the current Portable Executable (PE) format.
The standard PE header comes at the beginning of the file. Inside the file is the CLR header, followed by the data required to load the code into its process space, referred to as metadata (see Figure 3). Metadata describes to the execution engine how the module should be loaded, what additional files it needs, how to load those additional files, and how to interact with COM and the .NET runtime. Metadata also describes the methods, interfaces, and classes contained in the module or assembly. The information the metadata provides allows the JIT compiler to compile and run the module. The metadata section exposes much of your application's internals and eases the transition from disassembled IL to useful code.
Figure 3 | Of Modules and Metadata.
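As a rough illustration of that layout, the sketch below walks from the PE header to the data-directory slot that points at the CLR header. It is written in Python and relies on the published PE/COFF layout as it appears in released versions of .NET (slot 14 of the optional header's data directories); treat that as an assumption for beta 1. The function names clr_header_location and make_image are hypothetical, and make_image builds a synthetic, non-runnable image just to exercise the parser.

```python
import struct

CLR_DIRECTORY_INDEX = 14  # data-directory slot that points at the CLR header


def clr_header_location(image: bytes):
    """Return (rva, size) of the CLR header, or None for a native-only file."""
    if image[:2] != b"MZ":
        raise ValueError("not a PE file: missing MZ signature")
    (pe_offset,) = struct.unpack_from("<I", image, 0x3C)
    if image[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("not a PE file: missing PE signature")
    optional = pe_offset + 4 + 20            # skip signature and COFF header
    (magic,) = struct.unpack_from("<H", image, optional)
    if magic == 0x10B:                        # PE32
        directories = optional + 96
    elif magic == 0x20B:                      # PE32+
        directories = optional + 112
    else:
        raise ValueError("unknown optional-header magic")
    # The directory count sits in the four bytes just before the table.
    (count,) = struct.unpack_from("<I", image, directories - 4)
    if count <= CLR_DIRECTORY_INDEX:
        return None
    rva, size = struct.unpack_from(
        "<II", image, directories + 8 * CLR_DIRECTORY_INDEX
    )
    return (rva, size) if rva and size else None


def make_image(clr_rva: int, clr_size: int) -> bytes:
    """Build a synthetic PE32 header block for demonstration only."""
    image = bytearray(0x200)
    image[0:2] = b"MZ"
    struct.pack_into("<I", image, 0x3C, 0x80)        # offset of PE signature
    image[0x80:0x84] = b"PE\0\0"
    optional = 0x80 + 24
    struct.pack_into("<H", image, optional, 0x10B)   # PE32 magic
    struct.pack_into("<I", image, optional + 92, 16) # number of directories
    struct.pack_into(
        "<II", image, optional + 96 + 8 * CLR_DIRECTORY_INDEX, clr_rva, clr_size
    )
    return bytes(image)


managed = clr_header_location(make_image(0x2008, 72))
native = clr_header_location(make_image(0, 0))
```

A non-empty slot is the sign that the file carries a CLR header, and with it the metadata a disassembler needs. The sketch does no bounds checking, so a truncated file raises struct.error.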
At the heart of .NET's code deployment is the issue of managed code—code written exclusively to run under the CLR's control. You can create managed code from VB.NET, C#, or C++, but C++ is the only language that can create unmanaged code on the .NET platform. You can't use VB6 to create unmanaged code for the .NET platform because you ship assembled i386 instruction code rather than IL code with VB6. You can't ship anything but IL code when you use managed code, as you must with VB.NET.
Now look at the benefits of using the new MSIL code. When you leave your code at the MSIL stage, you can install and run it on any platform that supports the CLR. This might not be a big deal to you now because the list of platforms that currently support .NET is short: only 32-bit Windows. But soon that list will include 64-bit platforms and .NET for Windows CE devices (pocket PCs). Leaving your code as MSIL allows you to move seamlessly to these and other new platforms in the future.
Another advantage of MSIL: The JIT compiler converts the MSIL to native code on the target machine. So the JIT compiler can take advantage of the specific hardware and optimize the code for that specific platform. This comes in handy, for example, when optimizing the code for particular registers or op-codes found on certain hardware that has a particular processor. In VB6, you made choices like these by hand through the Compile tab's Advanced Optimizations button in Project Properties. Under .NET, the JIT compiler uses the metadata in the assembly to learn what your code does and what the platform supports, makes these optimization decisions for you on the fly, and enhances your code performance.
Yet another benefit concerns the two v's of .NET: validation and verification. Validation is a series of tests you can perform on your module to ensure the metadata, MSIL code, and file format are consistent. Code that can't pass these tests might crash the execution engine or the JIT compilers. Once you validate your module, the code is correct and ready to run.
The code is verified when the JIT compiler converts it from MSIL to native code. Verification involves checking the metadata to ascertain the program cannot access memory or other resources it doesn't have permission for. Verified code is also type-safe. This check is done even if the program is compiled directly to native code, but it's not 100-percent accurate unless the JIT compiler does it, because the test results depend on metadata from other assemblies. If you compile to native code before shipping, you run the risk of another assembly changing on the target machine, which will make your program type-unsafe.
Using the JIT compiler guarantees all related assemblies' current versions are considered when the validation and verification are done. This procedure ensures that the running program will be type-safe and that it will run with the correct security permissions. You can verify and validate your code yourself using the .NET SDK's PEVerify tool.
Reverse Engineering is Easy
Perhaps the single biggest concern with shipping your assembly as MSIL instead of compiled code is security. Remember, the assembly has a manifest of all the package's modules, and the metadata describes each of the modules in detail. The .NET SDK ships with a program called ILDASM, an IL disassembler that takes the module and dumps well-formatted IL code and metadata descriptions for all your application's modules. It can't get any easier to reverse engineer your code (see Listing 1).
One common retort to this perceived problem: A real application is big and the volume of IL dumped would be overwhelming. This might stop an amateur, but it won't bother someone who really wants to crack your code. The truth is this: The dump from ILDASM is much easier to read than the dump from a disassembly of compiled code. An interested party can learn a lot about your application from the IL dump.
Microsoft's advice is to keep company secrets secret by putting any module that contains them on a protected server. That's fine if your program is an ASP.NET client/server application, but it doesn't work well if your application is a standard desktop application. How do you protect intellectual property then? The MSIL Assembler documentation describes the command-line parameter /owner:
ilasm ... /owner
ilasm ... /owner=fergus
This option encrypts an assembly with a password to prevent it from being disassembled. The problem is that Microsoft is going to remove this option, which didn't do a good job in the first place. So you can't protect intellectual property in your desktop applications written with managed C++, C#, or VB code for .NET beta 1.
But there's hope. Before the final .NET release candidate is completed, Microsoft might introduce an obfuscator that alters your MSIL's private methods to make them unreadable to anyone except the CLR JIT compiler. However, this won't hide your application's public methods or calls it must make to an external library. Changing the names or hiding these public calls would make it impossible for the CLR to link to the external functions. So a hacker can still see any slick tricks you use when calling to the system DLLs when he or she digs through your IL code (see the sidebar, "Obfuscators Hide Vulnerable Code").
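The sketch below shows that renaming rule in miniature. It is a toy in Python, not a real obfuscator and not Microsoft's tool: the obfuscate and build_rename_map functions, the method table, and every method name in the sample are hypothetical. It renames private methods only and leaves public methods and calls into external libraries untouched.

```python
def build_rename_map(methods):
    """Map each private method name to a short, meaningless name."""
    taken = set(methods)
    rename = {}
    counter = 0
    for name in sorted(n for n, v in methods.items() if v == "private"):
        candidate = f"a{counter}"
        while candidate in taken:       # never collide with an existing name
            counter += 1
            candidate = f"a{counter}"
        rename[name] = candidate
        taken.add(candidate)
        counter += 1
    return rename


def obfuscate(methods, calls):
    """Rename private methods; public and external names pass through.

    methods: dict of method name -> "public" or "private"
    calls:   list of (caller, callee) pairs; a callee missing from
             methods is treated as an external library call
    """
    rename = build_rename_map(methods)
    new_methods = {rename.get(n, n): v for n, v in methods.items()}
    new_calls = [(rename.get(a, a), rename.get(b, b)) for a, b in calls]
    return new_methods, new_calls


# Hypothetical application: two public entry points, two private helpers.
methods = {
    "Main": "public",
    "Save": "public",
    "ComputeLicenseKey": "private",
    "ScrambleBuffer": "private",
}
calls = [
    ("Main", "ComputeLicenseKey"),
    ("ComputeLicenseKey", "ScrambleBuffer"),
    ("Save", "kernel32.CreateFile"),
]
new_methods, new_calls = obfuscate(methods, calls)
```

After the pass, the private helpers carry meaningless names, but Main, Save, and the external call remain readable. That is exactly the exposure that stays behind.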
You can use only one method now to protect your intellectual property on a desktop application. As a VB developer, you might find this difficult, but you must write your critical code in unmanaged C++ and access it from your VB.NET application using the interoperability mechanism provided for accessing unmanaged code.
You can't JIT-compile the code before shipping because all managed code must ship as MSIL. But you can compile the code to native form when you install it on the target machine. This sounds fine at first. The code on the install disk is still IL, however, so anyone can extract it from the setup file manually and disassemble it separately from the install. Once the image is installed on the user's disk, it's in native form rather than IL. Beyond that modest gain in security, this buys you a little speed when the application runs because the JIT compiler doesn't need to compile the IL code then.
Third-Party Vendors Might Hesitate
As an application vendor for desktop applications, you know what you have to do. You can write unmanaged C++ and use it from your managed VB code. You designed your application and feel safe because you know the code is good and won't corrupt process spaces it shouldn't touch. Although only managed code is guaranteed to behave itself, your unmanaged code is stable and reliable, and it's yours. However, if you're a third-party vendor and choose unmanaged code over managed code in your components, you force consumers to step back from the benefits of .NET and open themselves up to the same problems they have now. Part of .NET's beauty is the ability to write managed code and know that doing so won't corrupt the memory used by the process space of your application and other applications. Yet vendors might avoid writing controls in .NET managed code precisely to restrict consumer access to the IL, and potentially to the algorithms their applications use, and consumers would then lose that assurance.
I love VS.NET, and VB.NET in particular. The IDE's enhancements themselves provide enough of a reason to change to VB.NET (see the VBPJ .NET article in Resources). The language enhancements add to your programming tools, and the simple access to the underlying OS makes your Declare life easier. VB.NET serves as a fine tool for creating secure ASP.NET applications. However, if your main target is a thick client or desktop application, you might want to investigate the issues before shipping code with VS.NET. We'll see what Microsoft does to help developers who program desktop applications.