
Non processor specific

The following sections describe the general optimizations done by the compiler; they are not processor specific. Some of these require a compiler switch, while others are done automatically (those which require a switch are noted as such).

Constant folding

In Free Pascal, if the operand(s) of an operator are constants, the expression is evaluated at compile time.

Example

   x:=1+2+3+6+5;
will generate the same code as
   x:=17;

Furthermore, if an array index is a constant, the offset will be evaluated at compile time. This means that accessing MyData[5] is as efficient as accessing a normal variable.

Finally, calling the Chr, Hi, Lo, Ord, Pred, or Succ functions with constant parameters generates no run-time library calls; instead, the values are evaluated at compile time.
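
The folding rules above can be illustrated with a minimal sketch. The program name and the constant Base are hypothetical and chosen only for illustration; the comments describe what the compiler is expected to fold, not the exact code it generates.

```pascal
program ConstFold;

const
  Base = 10;  { hypothetical constant, for illustration only }

var
  x: Longint;
  c: Char;

begin
  { All operands are constants, so this can be folded to x := 17. }
  x := 1 + 2 + 3 + 6 + 5;
  { Chr and Ord with constant arguments can be evaluated at compile time. }
  c := Chr(Ord('A') + 1);
  WriteLn(x);           { prints 17 }
  WriteLn(c);           { prints B }
  WriteLn(Succ(Base));  { prints 11 }
end.
```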

Constant merging

Using the same constant string two or more times generates only one copy of the string constant.

Short cut evaluation

Evaluation of a boolean expression stops as soon as the result is known, which makes code execute faster than if all boolean operands were evaluated.
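
As a small sketch of why this matters beyond speed: the test below relies on the second operand not being evaluated when the first is already False. The type and variable names are hypothetical, and the sketch assumes that short cut evaluation has not been disabled by a compiler directive.

```pascal
program ShortCut;

type
  PNode = ^TNode;
  TNode = record
    Value: Longint;
    Next: PNode;
  end;

var
  P: PNode;

begin
  P := nil;
  { P <> nil is False, so P^.Value is never read and nil is not dereferenced. }
  if (P <> nil) and (P^.Value > 0) then
    WriteLn('positive')
  else
    WriteLn('nil or not positive');
end.
```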

Constant set inlining

Using the in operator is always more efficient than using the equivalent <>, =, <=, >=, < and > operators. This is because range comparisons can be done more easily with in than with normal comparison operators.
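
The following sketch shows the same range test written both ways. The program name and the character tested are arbitrary choices made for illustration.

```pascal
program SetIn;

var
  c: Char;

begin
  c := 'q';
  { Range test written with comparison operators. }
  if ((c >= 'a') and (c <= 'z')) or ((c >= '0') and (c <= '9')) then
    WriteLn('alphanumeric (comparisons)');
  { The same test written with a constant set and the in operator. }
  if c in ['a'..'z', '0'..'9'] then
    WriteLn('alphanumeric (in)');
end.
```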

Small sets

Sets which contain fewer than 33 elements can be directly encoded using a 32-bit value; therefore no run-time library calls are required to evaluate operands on these sets, as they are directly encoded by the code generator.
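
A minimal sketch of a set that qualifies as small: the enumeration below has four elements, so a set of it fits comfortably in a 32-bit value. The type names are hypothetical.

```pascal
program SmallSet;

type
  TColor = (Red, Green, Blue, Yellow);
  TColors = set of TColor;  { 4 elements, well below the 33-element limit }

var
  S: TColors;

begin
  S := [Red, Blue];
  S := S + [Green];  { set union }
  if Green in S then
    WriteLn('Green is in the set');
  if not (Yellow in S) then
    WriteLn('Yellow is not in the set');
end.
```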

Range checking

Assignments of constants to variables are range checked at compile time, which removes the need to generate runtime range checking code.

Remark: This feature was not implemented before version 0.99.5 of Free Pascal.
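
A small sketch of what this means in practice. The out-of-range assignment is left commented out so that the program compiles; enabling it is expected to make the compiler report the problem rather than defer it to run time.

```pascal
program RangeConst;

var
  b: Byte;

begin
  b := 200;      { constant within 0..255: accepted, no run-time check needed }
  { b := 300; }  { out of range for Byte: expected to be reported at compile time }
  WriteLn(b);    { prints 200 }
end.
```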

Shifts instead of multiply or divide

When one of the operands in a multiplication is a power of two, the multiplication is encoded using arithmetic shift instructions, which generates more efficient code.

Similarly, if the divisor in a div operation is a power of two, the division is encoded using arithmetic shift instructions.

The same is true when indexing arrays whose element size is a power of two: the address is calculated using arithmetic shifts instead of the multiply instruction.
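
The equivalence the compiler exploits can be sketched with explicit shifts. A positive value is used here so that shr and div agree; the comments describe what the compiler may do, not the code it is guaranteed to emit.

```pascal
program ShiftMul;

var
  x: Longint;

begin
  x := 12;
  WriteLn(x * 8);    { may be compiled as a shift left by 3; prints 96 }
  WriteLn(x shl 3);  { explicit shift; prints 96 }
  WriteLn(x div 4);  { may be compiled as a shift right by 2; prints 3 }
  WriteLn(x shr 2);  { explicit shift; prints 3 }
end.
```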

Automatic alignment

By default, all variables larger than a byte are guaranteed to be aligned at least on a word boundary.

Furthermore, all memory allocated using the standard runtime library (New and GetMem among others) is guaranteed to be returned as a pointer aligned on a quadword boundary (64-bit alignment).

Alignment of variables on the stack depends on the target processor.

Remark: Quadword alignment of pointers is not guaranteed on systems which don't use an internal heap, such as for the Win32 target.

Remark: Alignment is also done between fields in records, objects and classes. This is not the same as in Turbo Pascal and may cause problems when using disk I/O with these types. To get no alignment between fields, use the packed directive or the {$PackRecords n} switch. For further information, take a look at the reference manual under the record heading.
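
The effect of the packed directive can be observed with SizeOf, as in the following sketch. The record names are hypothetical. The size of the unpacked record depends on the target and on the alignment settings in effect, so no value is given for it.

```pascal
program PackDemo;

type
  TAligned = record
    a: Byte;
    b: Longint;
  end;

  TPacked = packed record
    a: Byte;
    b: Longint;
  end;

begin
  { Size depends on the target and on the record alignment in effect. }
  WriteLn('SizeOf(TAligned) = ', SizeOf(TAligned));
  { 1 + 4 bytes with no padding between the fields: prints 5. }
  WriteLn('SizeOf(TPacked)  = ', SizeOf(TPacked));
end.
```

When records are written to or read from disk, the packed form is the one whose layout does not change with the alignment settings.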

Smart linking

This feature removes all unreferenced code in the final executable file, making the executable file much smaller.

Smart linking is switched on with the -Cx command-line switch, or using the {$SMARTLINK ON} global directive.

Remark: Smart linking was implemented starting with version 0.99.6 of Free Pascal.
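
As a sketch of where the directive goes, the hypothetical unit below switches smart linking on at the top of the file. The unit and function names are made up for illustration.

```pascal
{$SMARTLINK ON}
unit MathUtil;  { hypothetical unit name, for illustration only }

interface

function Twice(x: Longint): Longint;
function Thrice(x: Longint): Longint;

implementation

function Twice(x: Longint): Longint;
begin
  Twice := 2 * x;
end;

function Thrice(x: Longint): Longint;
begin
  Thrice := 3 * x;
end;

end.
```

A program that uses this unit but only calls Twice would, with smart linking in effect, be expected not to carry the code for Thrice in its executable.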

Inline routines

The following runtime library routines are coded directly into the final executable: Lo, Hi, High, Sizeof, TypeOf, Length, Pred, Succ, Inc, Dec and Assigned.

Remark: Inline Inc and Dec were not completely implemented until version 0.99.6 of Free Pascal.
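
A short sketch using some of these routines. None of the calls below is expected to result in a call into the runtime library; the variable names are arbitrary.

```pascal
program InlineDemo;

var
  w: Word;
  s: String;
  i: Longint;

begin
  w := $1234;
  s := 'hello';
  i := 10;
  Inc(i);              { i becomes 11 }
  Dec(i, 3);           { i becomes 8 }
  WriteLn(Lo(w));      { low byte $34; prints 52 }
  WriteLn(Hi(w));      { high byte $12; prints 18 }
  WriteLn(Length(s));  { prints 5 }
  WriteLn(i);          { prints 8 }
end.
```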

Case optimization

When using the -O1 switch, some case statements are implemented using a jump table, which can make the case statement execute faster.
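
A case statement over a dense range of labels, such as the one in this sketch, is the kind of candidate for which a jump table is plausible. Whether the compiler actually uses one is its own decision; the function name is hypothetical.

```pascal
program CaseDemo;

function DayName(d: Longint): String;
begin
  { Labels 1..7 are contiguous, which suits a jump table. }
  case d of
    1: DayName := 'Mon';
    2: DayName := 'Tue';
    3: DayName := 'Wed';
    4: DayName := 'Thu';
    5: DayName := 'Fri';
    6: DayName := 'Sat';
    7: DayName := 'Sun';
  else
    DayName := '?';
  end;
end;

begin
  WriteLn(DayName(3));  { prints Wed }
  WriteLn(DayName(9));  { prints ? }
end.
```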

Stack frame omission

Under certain specific conditions, the stack frame (entry and exit code for the routine, see section 3.3) will be omitted, and variables will be accessed directly via the stack pointer.

Conditions for omission of the stack frame:

- The routine has neither parameters nor local variables.
- The routine does not call other routines.
- The routine does not contain assembler statements. However, an assembler routine may omit its stack frame.
- The routine is not declared using the Interrupt directive.
- The routine is not a constructor or destructor.
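
A routine that meets all of the conditions listed above might look like the procedure in this sketch. The names are hypothetical, and the compiler decides on its own whether the frame is actually omitted.

```pascal
program LeafDemo;

var
  Counter: Longint;

{ No parameters, no local variables, no calls to other routines,
  no assembler statements: a candidate for stack frame omission. }
procedure Bump;
begin
  Counter := Counter + 1;
end;

begin
  Counter := 0;
  Bump;
  Bump;
  WriteLn(Counter);  { prints 2 }
end.
```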

Register variables

When using the -Or switch, local variables or parameters which are used very often will be moved to registers for faster access.

Remark: Register variable allocation is currently an experimental feature, and should be used with caution.
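
The loop counter and accumulator in the following sketch are the kind of frequently used local variables that could benefit from being kept in registers. The function name is hypothetical, and which variables are actually moved to registers is up to the compiler.

```pascal
program RegVarDemo;

function SumTo(n: Longint): Longint;
var
  i, total: Longint;  { both are used on every loop iteration }
begin
  total := 0;
  for i := 1 to n do
    total := total + i;
  SumTo := total;
end;

begin
  WriteLn(SumTo(100));  { prints 5050 }
end.
```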

Intel x86 specific

Here follows a listing of the optimizing techniques used in the compiler:

When optimizing for a specific processor (-Op1, -Op2, -Op3), the following is done:
- In case statements, a check is done whether a jump table or a sequence of conditional jumps should be used for optimal performance.
- A number of strategies are selected for peephole optimization. For example, movzbl (%ebp),%eax is kept on PentiumPro and PII systems, but is changed into xorl %eax,%eax; movb (%ebp),%al for lesser systems.
Cyrix 6x86 processor owners should optimize with -Op3 instead of -Op2, because -Op2 leads to larger code, and thus to lower speed, according to the Cyrix developers FAQ.
When optimizing for speed (-OG, the default) or size (-Og), a choice is made between shorter instructions such as enter $4 (for size) and longer instructions such as subl $4,%esp (for speed). When smaller size is requested, things aren't aligned on 4-byte boundaries. When speed is requested, things are aligned on 4-byte boundaries as much as possible.
Simple optimization (-O1) makes sure the peephole optimizer is used, as well as the reloading optimizer.
Uncertain optimizations (-Ou): With this switch, the reloading optimizer can be forced into making uncertain optimizations.
You can enable uncertain optimizations only in certain cases; otherwise the compiler will produce incorrect code. The following technical description tells you when to use them:
If uncertain optimizations are enabled, the reloading optimizer assumes that:
- If something is written to a local/global variable or a procedure/function parameter, this value doesn't overwrite the value to which a pointer points.
- If something is written to memory pointed to by a pointer variable, this value doesn't overwrite the value of a local/global variable or a procedure/function parameter.
The practical upshot of this is that you cannot use the uncertain optimizations if you access any local or global variables through pointers. In theory, this includes Var parameters, but it is all right if you don't both read the variable once through its Var reference and then read it using its name.
The following example will produce bad code when you switch on uncertain optimizations:

   Var temp: Longint;

   Procedure Foo(Var Bar: Longint);
   Begin
     If (Bar = temp) Then
       Begin
         Inc(Bar);
         If (Bar <> temp) then Writeln('bug!')
       End
   End;

   Begin
     Foo(Temp);
   End.

The reason it produces bad code is that you access the global variable Temp both through its name Temp and through a pointer, in this case using the Bar variable parameter, which is nothing but a pointer to Temp in the above code.
On the other hand, you can use the uncertain optimizations if you access global/local variables or parameters through pointers, and only access them through this pointer.
For example:

   Type TMyRec = Record
                   a, b: Longint;
                 End;
        PMyRec = ^TMyRec;
        TMyRecArray = Array [1..100000] of TMyRec;
        PMyRecArray = ^TMyRecArray;

   Var MyRecArrayPtr: PMyRecArray;
       MyRecPtr: PMyRec;
       Counter: Longint;

   Begin
     New(MyRecArrayPtr);
     For Counter := 1 to 100000 Do
       Begin
         MyRecPtr := @MyRecArrayPtr^[Counter];
         MyRecPtr^.a := Counter;
         MyRecPtr^.b := Counter div 2;
       End;
   End.

This will produce correct code, because the global variable MyRecArrayPtr is not accessed directly, but through a pointer (MyRecPtr in this case).
In conclusion, one could say that you can use uncertain optimizations only when you know what you're doing.

Motorola 680x0 specific

The -O2 switch enables several optimizations in the generated code, the most notable being:

- Sign extension from byte to long will use EXTB.
- Returning from functions will use RTD.
- Range checking will generate no run-time calls.
- Multiplication will use the long MULS instruction; no runtime library call will be generated.
- Division will use the long DIVS instruction; no runtime library call will be generated.


Michael Van Canneyt
Thu Sep 10 14:04:11 CEST 1998