IL Code: What is Intermediate Language
What is intermediate language (IL code)?
As mentioned in my previous article, managed modules contain metadata and IL code. IL is a processor-independent machine language developed by Microsoft in consultation with several commercial and academic organizations specializing in languages and compilers. IL is a much higher-level language than most other machine languages. It lets you work with objects and has instructions for creating and initializing objects, calling virtual methods, and manipulating array elements directly. It even has instructions for throwing and catching exceptions to handle errors. You can think of IL as an object-oriented machine language.
Typically, developers program in high-level languages such as C#, Visual Basic, or F#. The compilers for these languages generate IL code. IL can also be written directly in assembly language, so Microsoft provides an IL assembler (ILAsm.exe) and an IL disassembler (ILDasm.exe).
Keep in mind that any high-level language will most likely expose only a subset of the capabilities offered by the CLR. IL assembly language, by contrast, gives access to all CLR features. If your chosen programming language does not expose the CLR features you need, you can write that part of the program in IL assembly language or in another programming language that does expose them.
Attention
I think the ability to switch easily between languages, with tight integration between them, is a wonderful quality of the CLR. Unfortunately, I am also fairly sure that developers will often overlook it. Languages like C# and Visual Basic are great for I/O programming. APL (A Programming Language) is a great language for engineering and financial calculations. The CLR lets you write the I/O part of an application in C# and the engineering calculations in APL. The CLR offers an unprecedented level of integration between these languages, and in many projects it is worth seriously considering the use of several languages at once.
To execute a method, its IL code must first be converted to machine instructions. This is the job of the CLR's Just-In-Time (JIT) compiler. Fig. 1 shows what happens the first time a method is called. Just before the Main method executes, the CLR finds all the types referenced by Main's code. For these, the CLR allocates internal data structures that are used to manage access to the referenced types. In Fig. 1, the Main method refers to a single type, Console, so the CLR allocates a single internal structure. This internal data structure contains one entry for each method defined by the Console type. Each entry holds the address where the method's implementation can be found. When this structure is initialized, the CLR sets each entry to the address of an internal, undocumented function contained in the CLR itself. I call this function JITCompiler.
The first time the Main method calls the WriteLine method, the JITCompiler function is invoked. It is responsible for compiling the IL code of the called method into the processor's native instructions. Because the IL code is compiled "just in time", this CLR component is often called the JIT compiler.
Note
If the application runs on an x86 version of Windows or in WoW64 mode, the JIT compiler generates x86 instructions. For applications running as 64-bit processes on an x64 version of Windows, the JIT compiler generates x64 instructions. Finally, if the application is running on an ARM version of Windows, the JIT compiler generates ARM instructions.
The JITCompiler function knows which method is being called and which type defines it. JITCompiler searches the metadata of the defining assembly for the IL code of the called method. It then verifies and compiles the IL code into machine instructions, which are stored in a dynamically allocated block of memory. Next, JITCompiler goes back to the type's internal data structure created by the CLR and replaces the entry for the called method with the address of the memory block containing the machine instructions. Finally, JITCompiler transfers control to the code in this block of memory. This code is the implementation of the WriteLine method (the overload that takes a String parameter). When the method finishes, control returns to the Main method, which continues execution as usual.
Now consider a second call from the Main method to the WriteLine method. By this point, the code of the WriteLine method has already been verified and compiled, so the call goes directly to the block of memory, skipping JITCompiler entirely. After the WriteLine method executes, it returns control to the Main method.
The performance hit is incurred only the first time a method is called. All subsequent calls run at full speed because verification and compilation are not performed again.
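The slot-patching behavior described above can be modeled in a few lines of C#. This is only a toy sketch: the names MethodTableSketch, Register, Call, and CompileCount are hypothetical, and the real CLR structures are internal and undocumented. A delegate stands in for a method address, and a counter stands in for the verification and compilation work.

```csharp
using System;
using System.Collections.Generic;

// Toy model of a per-type method table whose entries initially point at a
// "JIT" stub. All names here are hypothetical illustrations.
public static class MethodTableSketch
{
    // One slot per method; each slot holds the "address" to jump to.
    private static readonly Dictionary<string, Func<string, string>> Slots =
        new Dictionary<string, Func<string, string>>();

    // Counts how many times the simulated compilation step ran.
    public static int CompileCount { get; private set; }

    public static void Register(string name, Func<string, string> body)
    {
        // Initially the slot points at a stub that "compiles" on first use.
        Slots[name] = arg =>
        {
            CompileCount++;       // stands in for verification + compilation
            Slots[name] = body;   // patch the slot to point at the compiled code
            return body(arg);     // transfer control to the compiled code
        };
    }

    public static string Call(string name, string arg)
    {
        return Slots[name](arg);
    }

    public static void Main()
    {
        Register("WriteLine", s => "printed: " + s);
        Console.WriteLine(Call("WriteLine", "Hello"));    // goes through the stub
        Console.WriteLine(Call("WriteLine", "Goodbye"));  // goes straight to the body
        Console.WriteLine("Compilations: " + CompileCount); // Compilations: 1
    }
}
```

The second call never reaches the stub, which mirrors why only the first call to a method pays the compilation cost.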
The JIT compiler stores the machine instructions in dynamic memory. This means that the compiled code is discarded when the application exits. If you run the application again, or run a second instance of it in parallel (in another operating system process), the JIT compiler has to compile the IL code into machine instructions again. Depending on the application, this can increase memory consumption significantly compared with a native application, whose in-memory code is read-only and shared by all instances of the application.
For most applications, the performance hit associated with the JIT compiler is negligible. Most applications call the same methods over and over again, and each method incurs the cost only once during application execution. In addition, executing a method usually takes longer than calling it.
Also be aware that the CLR's JIT compiler optimizes the machine code much as an unmanaged C++ compiler does. Again, producing optimized code takes more time, but the optimized code runs much faster than unoptimized code.
Two C# compiler options affect code optimization: /optimize and /debug. The following table shows their impact on the quality of the IL code generated by the C# compiler and of the machine code generated by the JIT compiler.
Compiler options | C# compiler IL code quality | JIT compiler machine code quality |
/optimize- /debug- (default) | Unoptimized | Optimized |
/optimize- /debug(+/full/pdbonly) | Unoptimized | Unoptimized |
/optimize+ /debug(-/+/full/pdbonly) | Optimized | Optimized |
With the /optimize- option, the C# compiler generates unoptimized IL code containing many no-operation (NOP) instructions. These instructions are emitted to support the edit-and-continue feature in Visual Studio during debugging. They also make debugging easier by letting you set breakpoints on control flow statements such as for, while, do, if, else, and on try, catch, and finally blocks. When optimizing the IL code, the C# compiler removes these extraneous instructions, which makes the code harder to step through in a debugger but streamlines the program's control flow. In addition, some function evaluations may not work inside the debugger. However, the IL code is smaller, which reduces the size of the resulting EXE or DLL file, and it is easier to read for those who like to examine IL code to understand exactly what the compiler produced.
Also, the compiler produces a PDB (Program Database) file only when the /debug(+/full/pdbonly) option is specified. The PDB file helps the debugger find local variables and map IL instructions to source code. The /debug:full option tells the JIT compiler that you intend to debug the assembly, so the JIT compiler keeps track of which machine code was generated for each IL instruction.
This tracking lets you use the just-in-time debugging feature of Visual Studio to attach a debugger to an already running process and debug the code easily. Without the /debug:full option, the JIT compiler does not, by default, track the correspondence between IL and machine code, which speeds up compilation and reduces memory usage. If you start a process under the Visual Studio debugger, the JIT compiler tracks the correspondence between IL and machine code (regardless of the /debug setting) unless you turn off the Suppress JIT Optimization On Module Load (Managed Only) option in Visual Studio.
When you create a new C# project in Visual Studio, the Debug configuration of the project uses the /optimize- and /debug:full options, and the Release configuration uses the /optimize+ and /debug:pdbonly options.
Developers with experience writing unmanaged C or C++ code usually worry about how all this affects performance. After all, unmanaged code is compiled for a specific processor and, when called, can simply execute. In a managed environment, code is compiled in two phases. First, the compiler passes over the source code, doing as much work as possible in generating IL code. But to execute the code, the IL itself must be compiled into machine instructions at runtime, which requires allocating additional memory that cannot be shared and spending additional processor time.
I came to the CLR from a C/C++ background myself, and I was quite concerned about this additional overhead. Indeed, the second compilation stage, which happens at runtime, slows execution down and requires dynamic memory allocation. However, Microsoft has worked hard on optimization to keep this overhead to a minimum.
If you are also skeptical about two-phase compilation, be sure to build some applications and measure their performance yourself. Then do the same with some nontrivial managed applications created by Microsoft or other companies. You will be surprised at how efficiently they run.
Believe it or not, many people (myself included) think that managed applications can even outperform unmanaged applications. There are many reasons for this. For example, when the JIT compiler compiles IL code into machine code at runtime, it knows more about the execution environment than an unmanaged compiler can. Here are some of the ways managed code can outperform unmanaged code:
- The JIT compiler can detect that the application is running on an Intel Pentium 4 processor and generate machine code that uses special instructions supported by the Pentium 4. Unmanaged applications are typically compiled for the most general instruction set and avoid special instructions that could make the application more efficient.
- The JIT compiler can determine that some condition is always false on the machine it is running on. Suppose a method contains the following snippet:
if (numberOfCPUs > 1) {
…
}
If the computer has only one processor, the JIT compiler does not generate any machine instructions for this fragment. In this case, the machine code is tuned for the specific machine, so it takes up less space and runs faster.
- The CLR could profile the running code and recompile the IL into machine code at runtime. The recompiled code could be reorganized to reduce branch mispredictions based on the observed execution patterns. Current versions of the CLR do not support this feature, but it may appear in future versions.
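As a rough illustration of the processor-count check in the list above, the following sketch reads the count once into a static readonly field. The class name CpuBranchSketch and the method PickStrategy are hypothetical, and whether a particular JIT compiler actually drops the untaken branch is an assumption that depends on the runtime version; the sketch only shows the shape of code that makes such an optimization possible.

```csharp
using System;

public static class CpuBranchSketch
{
    // Read once when the type is initialized. Because the value cannot change
    // afterwards, a JIT compiler may (this is an assumption) treat it as a
    // constant when it compiles PickStrategy.
    private static readonly int numberOfCPUs = Environment.ProcessorCount;

    public static string PickStrategy()
    {
        if (numberOfCPUs > 1)
        {
            // On a single-processor machine this branch can never be taken.
            return "parallel";
        }
        return "sequential";
    }

    public static void Main()
    {
        Console.WriteLine(PickStrategy());
    }
}
```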
These are just a few of the reasons why future managed code may outperform unmanaged code. As I have already said, performance is quite good for most applications today, and it will improve in the future.
If your experiments show that the JIT compiler does not give your application the performance it needs, you may want to use the NGen.exe utility from the .NET Framework SDK. This utility compiles all of an assembly's IL code into machine code and saves it to a file on disk. At runtime, when the assembly is loaded, the CLR automatically checks whether a precompiled version of the assembly exists and, if so, loads the precompiled code so that no runtime compilation is required.
Also, when analyzing performance, consider the System.Runtime.ProfileOptimization class. It makes the CLR record (in a file) which methods are JIT compiled at runtime. If the machine running the application has several processors, then on future launches of the application the JIT compiler compiles these methods concurrently on other threads. As a result, the application runs faster because multiple methods are compiled in parallel, and this happens during application initialization (instead of just-in-time compilation while the user interacts with the application).
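A minimal sketch of how this class is typically enabled at startup follows. SetProfileRoot and StartProfile are the two static methods the class exposes; the directory under the temp path and the file name Startup.profile are arbitrary choices made for this example, as is the wrapper class StartupProfileSketch.

```csharp
using System;
using System.IO;
using System.Runtime;

public static class StartupProfileSketch
{
    public static string EnableProfile()
    {
        // Hypothetical location; any directory the process can write to works.
        string root = Path.Combine(Path.GetTempPath(), "jit-profiles");
        Directory.CreateDirectory(root);

        // Tell the CLR where profile files live, then start recording
        // (and, on later runs, replaying) the named profile.
        ProfileOptimization.SetProfileRoot(root);
        ProfileOptimization.StartProfile("Startup.profile");
        return root;
    }

    public static void Main()
    {
        string root = EnableProfile();
        Console.WriteLine("Profile root: " + root);
    }
}
```

Calling these methods as early as possible in Main gives the runtime the most startup work to record and replay.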
IL code and verification:
IL is a stack-based language: all of its instructions push operands onto an execution stack and pop results off the stack. IL has no instructions for working with registers, which makes it easier to create new languages and compilers that generate code for the CLR.
IL instructions are also typeless. For example, IL has an add instruction that adds the last two operands pushed onto the stack. There are no separate 32-bit and 64-bit versions of the add instruction. When it executes, the add instruction determines the types of the operands on the stack and performs the appropriate operation.
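The stack-based, typeless nature of the add instruction can be seen by emitting IL directly. The sketch below uses System.Reflection.Emit.DynamicMethod to build two methods from the same four opcodes, one over Int32 and one over Int64. The class StackAddSketch and the helper BuildAdder are hypothetical names chosen for this example.

```csharp
using System;
using System.Reflection.Emit;

public static class StackAddSketch
{
    // Emits: ldarg.0, ldarg.1, add, ret. The same opcodes are used
    // regardless of the operand type.
    private static Delegate BuildAdder(Type operandType, Type delegateType)
    {
        var method = new DynamicMethod("Add", operandType,
                                       new[] { operandType, operandType });
        ILGenerator il = method.GetILGenerator();
        il.Emit(OpCodes.Ldarg_0); // push the first argument
        il.Emit(OpCodes.Ldarg_1); // push the second argument
        il.Emit(OpCodes.Add);     // pop two operands, push their sum
        il.Emit(OpCodes.Ret);     // return the value on top of the stack
        return method.CreateDelegate(delegateType);
    }

    public static readonly Func<int, int, int> AddInt32 =
        (Func<int, int, int>)BuildAdder(typeof(int), typeof(Func<int, int, int>));

    public static readonly Func<long, long, long> AddInt64 =
        (Func<long, long, long>)BuildAdder(typeof(long), typeof(Func<long, long, long>));

    public static void Main()
    {
        Console.WriteLine(AddInt32(2, 3));            // 5
        Console.WriteLine(AddInt64(4000000000L, 1L)); // 4000000001
    }
}
```

Both delegates share a single OpCodes.Add; only the declared parameter types differ.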
However, in my opinion, the biggest advantage of IL is not that it abstracts the developer away from a particular processor. The biggest advantage is that IL makes applications more secure and more robust. While compiling IL into machine instructions, the CLR performs a procedure called verification: it examines the high-level IL code and checks that every operation is safe. For example, verification makes sure that each method is called with the right number of parameters, that every parameter passed has the right type, that the return value of each method is used correctly, that each method contains a return instruction, and so on. All the information about methods and types used during verification is stored in the managed module's metadata.
In Windows, each process has its own virtual address space. Separate address spaces are needed because application code is, in principle, untrustworthy. Nothing prevents an application from reading or writing an invalid memory address (and unfortunately, this often happens in practice). Placing Windows processes in isolated address spaces provides security and stability for the system; one process cannot harm another process.
Verification of managed code, however, ensures that the code does not access memory improperly and cannot harm the execution of another application. This means that you can run multiple managed applications in a single Windows virtual address space. Because Windows processes consume significant operating system resources, having too many of them reduces performance and limits available resources. Reducing the number of processes by running multiple applications in one operating system process improves performance, lowers resource usage, and provides the same level of protection as if each application had its own process. This is another advantage of managed code over unmanaged code.
So the CLR provides the ability to execute multiple managed applications in the same operating system process. Each managed application runs in an application domain (AppDomain). By default, each managed EXE file runs in its own separate address space containing a single AppDomain. However, a process that hosts the CLR, such as IIS (Internet Information Services) or Microsoft SQL Server, can run multiple AppDomains in one operating system process.
Unsafe code:
By default, the Microsoft C# compiler generates safe code, meaning code whose safety is confirmed by verification. However, the Microsoft C# compiler also lets developers write unsafe code that can work directly with memory addresses and manipulate the bytes at those addresses. As a rule, these extremely powerful capabilities are used to interoperate with unmanaged code or to optimize time-critical algorithms.
However, using unsafe code carries a significant risk: unsafe code can corrupt data structures and exploit (or even create) security vulnerabilities. For this reason, the C# compiler requires that all methods containing unsafe code be marked with the unsafe keyword and that the source be compiled with the /unsafe compiler option.
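For illustration, here is a minimal sketch of an unsafe method that walks a byte array through a raw pointer. The class UnsafeSketch and the method SumBytes are hypothetical names, and the file must be compiled with the /unsafe option described above.

```csharp
using System;

public static class UnsafeSketch
{
    // Sums the bytes of an array by reading memory through a pointer.
    // The unsafe keyword is required because the body uses pointer types.
    public static unsafe int SumBytes(byte[] data)
    {
        int sum = 0;
        fixed (byte* start = data)   // pin the array so the GC cannot move it
        {
            byte* p = start;
            for (int i = 0; i < data.Length; i++)
            {
                sum += *p;   // read the byte at the current address
                p++;         // advance the pointer by one byte
            }
        }
        return sum;
    }

    public static void Main()
    {
        Console.WriteLine(SumBytes(new byte[] { 1, 2, 3, 4 })); // 10
    }
}
```

Nothing stops the loop from reading past the end of the array if the bound is wrong, which is exactly the kind of error that verification would otherwise rule out.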
When the JIT compiler tries to compile an unsafe method, it first checks that the assembly containing the method has been granted System.Security.Permissions.SecurityPermission with the SkipVerification flag of the System.Security.Permissions.SecurityPermissionFlag enumeration set. If the flag is set, the JIT compiler compiles the unsafe code and allows it to execute. The CLR trusts this code and hopes that its direct memory access and byte manipulation will not cause harm. If the flag is not set, the JIT compiler throws a System.InvalidProgramException or System.Security.VerificationException, preventing the method from executing. Most likely the whole application will terminate at this point, but at least no harm will be done.
Microsoft provides a utility called PEVerify.exe that verifies all assembly methods and reports all methods containing unsafe code. You might want to run PEVerify.exe on all assemblies you reference; this will let you know about possible problems with the launch of your applications over the intranet or the Internet.
Note
By default, assemblies loaded from the local machine or from network shares are fully trusted; this means that they are allowed to do anything, including executing unsafe code. However, by default, assemblies executed over the Internet are not granted permission to execute unsafe code. If they contain unsafe code, one of the exceptions mentioned above is thrown. An administrator or end user can change these defaults, but in that case he or she takes full responsibility for the behavior of the code.
Keep in mind that verification requires access to the metadata contained in all dependent assemblies. So when you use PEVerify to check an assembly, it must be able to find and load all referenced assemblies. Because PEVerify uses the CLR to locate dependent assemblies, it applies the same binding and probing rules that are normally used when assemblies execute.
IL code and intellectual property protection:
Some developers are concerned that IL does not provide a sufficient level of intellectual property protection for their algorithms. In other words, they believe that someone else can take a managed module they built, run an IL disassembler on it, and easily reconstruct the logic of the application code.
Yes, IL code is higher-level than most other assembly languages, and in general, disassembling IL code is relatively simple. However, when you implement code that runs on the server side (a web service, web form, or stored procedure), the assembly resides on the server. Because outsiders cannot access the assembly, they cannot use any tool to view the IL, so your intellectual property is completely safe.
If you are worried about assemblies that you distribute, use an obfuscator utility from a third-party vendor. These utilities scramble the names of all private symbols in the assembly's metadata. It will be difficult for an outsider to unscramble the names and understand the purpose of each method. Note that obfuscation provides only limited protection, because the IL code must be available at some point for the CLR to JIT compile it.
If you do not believe that obfuscation provides the level of intellectual property protection you want, consider implementing your more sensitive algorithms in an unmanaged module that contains machine instructions instead of IL and metadata. You can then use the CLR's interoperability features (if you have sufficient permissions) to communicate between the managed and unmanaged parts of your application. Of course, this approach assumes that you are not worried about someone disassembling the machine instructions in your unmanaged code.
NGen.exe:
The NGen.exe program included with the .NET Framework can be used to compile IL code into machine code when an application is installed on a user's machine. Because the code is compiled at installation time, the CLR's JIT compiler does not have to compile it at runtime, which can improve application performance. NGen.exe is useful in two situations.
- Faster application startup. Running NGen.exe speeds up startup because the code is already compiled to machine code and no compilation has to be done at runtime.
- A smaller application working set. If you expect an assembly to be loaded into multiple processes at the same time, running NGen.exe on it can reduce the application's working set. NGen.exe converts the IL to machine code and saves the result in a separate file. This file can be memory-mapped into multiple process address spaces simultaneously, so the code is shared and each process does not need its own copy.