Translate to another language

Compiler

A compiler is a computer program (or set of programs) that translates text written in a computer language (the source language) into another computer language (the target language). The original sequence is usually called the source code and the output called object code. Commonly the output has a form suitable for processing by other programs (e.g., a linker), but it may be a human readable text file.

The most common reason for wanting to translate source code is to create an executable program. The name "compiler" is primarily used for programs that translate source code from a high level language to a lower level language (e.g., assembly language or machine language). A program that translates from a low level language to a higher level one is a decompiler. A program that translates between high-level languages is usually called a language translator, source to source translator, or language converter. A language rewriter is usually a program that translates the form of expressions without a change of language.

A compiler is likely to perform many or all of the following operations: lexing, preprocessing, parsing, semantic analysis, code optimizations, and code generation.



Types of compilers


There are many ways to classify compilers according to the input and output, internal structure, and runtime behavior. For example,
  • A program that translates from a low level language to a higher level one is a decompiler.
  • A program that translates between high-level languages is usually called a language translator, source to source translator, language converter, or language rewriter (this last term is usually applied to translations that do not involve a change of language)
Native versus cross compiler

Most compilers are classified as either native compilers or cross compilers.
A compiler may produce binary output intended to run on the same type of computer and operating system ("platform") as the compiler itself runs on. This is sometimes called a native-code compiler. Alternatively, it might produce binary output designed to run on a different platform. This is known as a cross compiler. Cross compilers are very useful when bringing up a new hardware platform for the first time (see bootstrapping). Cross compilers are also necessary when developing software for microcontroller systems that have barely enough storage for the final machine code, much less a compiler. Compilers which are capable of producing both native and foreign binary output may be called either a cross compiler or a native compiler depending on a specific use, although it would be more correct to classify them as cross compilers.

Interpreters are never classified as native or cross compilers, because they do not output a binary representation of their input code.

Virtual machine (VM) compilers are typically not classified as either native or cross compilers. However, if need be, they can be classified as one or the other, especially in the less usual cases where a compiler is running inside the same VM (making it a native compiler), or where a compiler is capable of producing an output for several different platforms, including a VM (making it a cross compiler).

One-pass versus multi-pass compilers

Classifying compilers by number of passes[1] has its background in the hardware resource limitations of computers. Compiling involves performing lots of work and early computers did not have enough memory to contain one program that did all of this work. So compilers were split up into smaller programs which each made a pass over the source (or some representation of it) performing some of the required analysis and translations.
The ability to compile in a single pass is often seen as a benefit because it simplifies the job of writing a compiler and one pass compilers are generally faster than multi-pass compilers. Many languages were designed so that they could be compiled in a single pass (e.g., the Pascal programming language).

In some cases the design of a language feature may require a compiler to perform more than one pass over the source. For instance, when a declaration appearing on line 20 of the source affects the translation of the statement appearing on line 10; the first pass needs to gather information about declarations appearing after statements that they affect, with the actual translation happening during a second pass.

The disadvantage of compiling in a single pass is that it is not possible to perform many of the sophisticated optimizations needed to generate high quality code. It can be difficult to count exactly how many passes an optimizing compiler makes. For instance, different phases of optimization may analyse one expression many times but only analyse another expression once.

Splitting a compiler up into small programs is a technique used by researchers interested in producing provably correct compilers. Proving the correctness of a set of small programs often requiring less effort than proving the correctness of a larger, single, equivalent program.

While the typical multi-pass compiler outputs machine code from its final pass, there are several other types:
  • A "source-to-source compiler" is a type of compiler that takes a high level language as its input and outputs a high level language. For example, an automatic parallelizing compiler will frequently take in a high level language program as an input and then transform the code and annotate it with parallel code annotations (e.g. OpenMP) or language constructs (e.g. Fortran's DOALL statements).
  • Stage compiler that compiles to assembly language of a theoretical machine, like some Prolog implementations
    • This Prolog machine is also known as the Warren Abstract Machine (or WAM). Byte-code compilers for Java, Python (and many more) are also a subtype of this.
  • Just-in-time compiler, used by Smalltalk and Java systems, and also by Microsoft .Net's Common Intermediate Language (CIL)
    • Applications are delivered in bytecode, which is compiled to native machine code just prior to execution.
Compiled versus interpreted languages

Many people divide higher-level programming languages into compiled languages and interpreted languages. However, there is rarely anything about a language that requires it to be compiled or interpreted. Compilers and interpreters are implementations of languages, not languages themselves. The categorization usually reflects the most popular or widespread implementations of a language -- for instance, BASIC is thought of as an interpreted language, and C a compiled one, despite the existence of BASIC compilers and C interpreters.

There are exceptions; some language specifications assume the use of a compiler (as with C), or spell out that implementations must include a compilation facility (as with Common Lisp). Some languages have features that are very easy to implement in an interpreter, but make writing a compiler much harder; for example, SNOBOL4, and many scripting languages are capable of constructing arbitrary source code at runtime with regular string operations, and then executing that code by passing it to a special evaluation function. To implement these features in a compiled language, programs must usually be shipped with a runtime environment that includes the compiler itself.

Hardware compilation

Just as software can be compiled for execution on a processor, hardware can now be compiled directly from a high-level program into a netlist of components suitable for loading into programmable hardware such as a Field Programmable Gate Array (FPGA), produced by manufacturers like Xilinx. This new paradigm has been dubbed computing without computers.

All text used in this article is available under the GNU Free Documentation License. It uses material from the Wikipedia article "Compiler".