GCC 4.1 Release Series: Changes, New Features, and Fixes
========================================================

Caveats
=======

General Optimizer Improvements
==============================

- GCC now has infrastructure for inter-procedural optimizations and the following inter-procedural optimizations are implemented:
  - Profile guided inlining. When doing profile feedback guided optimization, GCC can now use the profile to make better informed decisions on whether inlining of a function is profitable or not. This means that GCC will no longer inline functions at call sites that are not executed very often, and that functions at hot call sites are more likely to be inlined. A new parameter min-inline-recursive-probability is also now available to throttle recursive inlining of functions with small average recursive depths.
  - Discovery of pure and const functions, a form of side-effects analysis. While older GCC releases could also discover such special functions, the new IPA-based pass runs earlier so that the results are available to more optimizers. The pass is also simply more powerful than the old one.
  - Analysis of references to static variables and type escape analysis, also forms of side-effects analysis. The results of these passes allow the compiler to be less conservative about call-clobbered variables and references. This results in more redundant loads being eliminated and in making static variables candidates for register promotion.
  - Improvement of RTL-based alias analysis. The results of type escape analysis are fed to the RTL type-based alias analyzer, allowing it to disambiguate more memory references.
  - Interprocedural constant propagation and function versioning. This pass looks for functions that are always called with the same constant value for one or more of the function arguments, and propagates those constants into those functions.
- GCC will now eliminate static variables whose usage was optimized out.
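
  As a rough illustration of the static-variable elimination just described, the sketch below uses hypothetical names (`table`, `lookup`, `scale`) that do not come from GCC itself. The only use of `table` sits behind a call that is removed as dead code, so an optimizing build is free to discard both `lookup` and `table`. Whether a given GCC version and optimization level actually does so is an assumption to verify, for example by listing the object file's symbols with `nm`.

  ```c
  /* Hypothetical table: its only reader is lookup() below. */
  static const int table[4] = { 10, 20, 30, 40 };

  /* Hypothetical helper: the sole reference to table. */
  static int lookup(int i)
  {
      return table[i & 3];
  }

  /* The branch condition is a compile-time constant, so the call to
     lookup() is dead code. Once it is removed, nothing refers to
     table any more. */
  int scale(int x)
  {
      enum { USE_TABLE = 0 };

      if (USE_TABLE)
          x += lookup(x);
      return x * 2;
  }
  ```

  Calling `scale(21)` returns 42 whether or not the table is kept; the elimination only changes what ends up in the object file, not the program's behavior.
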
- -fwhole-program --combine can now be used to make all functions in a program static, allowing whole-program optimization. As an exception, the main function and all functions marked with the new externally_visible attribute are kept global so that programs can link with runtime libraries.
- GCC can now do a form of partial dead code elimination (PDCE) that allows code motion of expressions to the paths where the result of the expression is actually needed. This is not always a win, so the pass has been limited to only consider profitable cases. Here is an example:

      #include <stdio.h>

      int foo (int *, int *);

      int bar (int d)
      {
        int a, b, c;
        b = d + 1;
        c = d + 2;
        a = b + c;
        if (d)
          {
            foo (&b, &c);
            a = b + c;
          }
        printf ("%d\n", a);
        return a;
      }

  The a = b + c can be sunk to right before the printf. Normal code sinking will not do this; it will sink the first one above into the else-branch of the conditional jump, which still leaves two copies of the code.
- GCC now has a value range propagation pass. This allows the compiler to eliminate bounds checks and branches. The results of the pass can also be used to accurately compute branch probabilities.
- The pass to convert PHI nodes to straight-line code (a form of if-conversion for GIMPLE) has been improved significantly. The two most significant improvements are an improved algorithm to determine the order in which the PHI nodes are considered, and an improvement that allows the pass to consider if-conversions of basic blocks with more than two predecessors.
- Alias analysis improvements. GCC can now differentiate between different fields of structures in Tree-SSA's virtual operands form. As a result, stores to and loads from non-overlapping structure fields no longer conflict. A new algorithm to compute points-to sets was contributed that allows GCC to see that p->a and p->b, where p is a pointer to a structure, can never point to the same field.
- Various enhancements to auto-vectorization:
  - Incrementally preserve SSA form when vectorizing.
  - Incrementally preserve loop-closed form when vectorizing.
  - Improvements to peeling for alignment: generate better code when the misalignment of an access is known at compile time, or when different accesses are known to have the same misalignment, even if the misalignment amount itself is unknown.
  - Consider dependence distance in the vectorizer.
  - Externalize generic parts of data reference analysis to make this analysis available to other passes.
  - Vectorization of conditional code.
  - Reduction support.
- GCC can now partition functions into sections of hot and cold code. This can significantly improve performance due to better instruction cache locality. This feature works best together with profile feedback driven optimization.
- A new pass avoids saving unneeded arguments to the stack in vararg functions when the compiler can prove that they will not be needed.
- The transition of basic block profiling to a tree-level implementation has been completed. The new implementation should be considerably more reliable (hopefully avoiding profile mismatch errors when using -fprofile-use or -fbranch-probabilities) and can be used to drive higher-level optimizations, such as inlining. The -ftree-based-profiling command line option was removed, and -fprofile-use now implies disabling the old RTL-level loop optimizer (-fno-loop-optimize). The speculative prefetching optimization (originally enabled by -fspeculative-prefetching) was removed.

New Languages and Language specific improvements
================================================

C and Objective-C
-----------------

- The old Bison-based C and Objective-C parser has been replaced by a new, faster hand-written recursive-descent parser.

Ada
---

- The build infrastructure for the Ada runtime library and tools has been changed to be better integrated with the rest of the build infrastructure of GCC. This should make doing cross builds of Ada a bit easier.

C++
---

- ARM-style name-injection of friend declarations is no longer the default. For example:

      struct S {
        friend void f();
      };

      void g() { f(); }

  will not be accepted; instead a declaration of f will need to be present outside of the scope of S. The new -ffriend-injection option will enable the old behavior.
- The (undocumented) extension which permitted templates with default arguments to be bound to template template parameters with fewer parameters has been deprecated, and will be removed in the next major release of G++. For example:

      template