Patchwork: use gcc 4.6.0 link time optimization to reduce coreboot execution time

Submitter Scott
Date 2011-04-29 03:01:50
Message ID <4E2C6DA2F6424C5BBA589A3E5D85B414@m3a78>
Permalink /patch/2925/
State New

Comments

Scott - 2011-04-29 03:01:50
Adds a kconfig option to enable gcc link time optimization.
Link time optimization reduces both rom stage and ram stage
image size by removing unused functions and data. Reducing the
image size saves boot time by minimizing the flash memory read
and decompress time for ram stage.

The option is off by default because of side effects such as
long build times and unusable dwarf2 debug output. This
option cuts the persimmon+seabios DOS boot time (booting from SSD)
from 690 ms to 640 ms.

Signed-off-by: Scott Duplichan <scott@notabs.org>
Stefan Reinauer - 2011-04-29 03:59:46
On 4/28/11 8:01 PM, Scott Duplichan wrote:
> Adds a kconfig option to enable gcc link time optimization.
> Link time optimization reduces both rom stage and ram stage
> image size by removing unused functions and data. Reducing the
> image size saves boot time by minimizing the flash memory read
> and decompress time for ram stage.
>
> The option is off by default because of side effects such as
> long build time and unusable dwarf2 debug output. This
> option cuts persimmon+seabios DOS boot from SSD time from
> 690 ms to 640 ms.

Did you do some size tests with non-AGESA targets?

Does lto work with our "driver"s? I hoped that once we have LTO 
available we could get rid of the distinction between drivers and 
objects and handle everything the way we handle drivers now, letting gcc 
remove the functions we don't need.

> Signed-off-by: Scott Duplichan<scott@notabs.org>
>

Should we instead probe for availability of -flto in 
util/xcompile/xcompile and use it if it is there?

What's the problem with dwarf2? GCC 4.6 uses mostly dwarf4 unless you 
manually force it to dwarf2. Will this still be a problem?


> Index: Makefile
> ===================================================================
> --- Makefile	(revision 6549)
> +++ Makefile	(working copy)
> @@ -211,7 +211,7 @@
>   de$(EMPTY)fine $(1)-objs_$(2)_template
>   $(obj)/$$(1).$(1).o: src/$$(1).$(2) $(obj)/config.h $(4)
>   	@printf "    CC         $$$$(subst $$$$(obj)/,,$$$$(@))\n"
> -	$(CC) $(3) -MMD $$$$(CFLAGS) -c -o $$$$@ $$$$<
> +	$(CC) $(3) -MMD $$$$(CFLAGS) $$$$(LTO_OPTIMIZE) -c -o $$$$@ $$$$<

Hm.. I think LTO_OPTIMIZE should be added to CFLAGS instead, that would 
make the patch a whole lot less intrusive.

> Index: src/arch/x86/init/bootblock.ld
> ===================================================================
> --- src/arch/x86/init/bootblock.ld	(revision 6549)
> +++ src/arch/x86/init/bootblock.ld	(working copy)
> @@ -22,7 +22,6 @@
>   OUTPUT_FORMAT("elf32-i386", "elf32-i386", "elf32-i386")
>   OUTPUT_ARCH(i386)
>
> -TARGET(binary)
>   SECTIONS
>   {
>   	. = CONFIG_ROMBASE;
Hm, interesting... does this hurt LTO?
Scott - 2011-04-29 05:45:54
Stefan Reinauer wrote:

] Did you do some size tests with non-AGESA targets?

The improvement for non-agesa-v5 projects is smaller.
Here are a couple of examples:
AMD Mahogany F10       standard  -flto
fallback/romstage      74803     73004
fallback/coreboot_ram  55665     49928

Intel D945GCLF         standard  -flto
fallback/romstage      33144     29841
fallback/coreboot_ram  69435     65774

By the way, the attached patch needs one more change to
build some of the non-AGESA projects with -flto enabled.
From an 04/20/2011 email:
- sed -e 's/\.rodata/.rom.data/g' -e 's/\.text/.section .rom.text/g' $^ > $@.tmp
+ sed -e 's/\.rodata/.rom.data/g' -e 's/^[ \t]*\.text/.section .rom.text/g' $^ > $@.tmp

] Does lto work with our "driver"s? I hoped that once we have LTO 
] available we could get rid of the distinction between drivers and 
] objects and handle everything the way we handle drivers now, letting gcc 
] remove the functions we don't need.

I have not encountered coreboot "drivers" directly yet. But I
think the answer is yes. Link time optimization completely
eliminates code that is never called and data that is never
referenced.

] Should we instead probe for availability of -flto in 
] util/xcompile/xcompile and use it if it is there?

One problem is that while gcc 4.5.2 supports -flto, it
is experimental and crashes during the build. So probing
logic would have to exclude gcc 4.5.2. Also, there might
be some objections to the long build time (agesa projects
only), and to the lack of debug support.

] What's the problem with dwarf2? GCC 4.6 uses mostly dwarf4 unless you 
] manually force it to dwarf2. Will this still be a problem?

Testing with both dwarf2 and dwarf4 gives the same result: no line
number information for files compiled with -flto. The docs warn:
   "Link time optimization does not play well with generating
    debugging information. Combining -flto with -g is currently
    experimental and expected to produce wrong results".

> Index: Makefile
> ===================================================================
> --- Makefile	(revision 6549)
> +++ Makefile	(working copy)
> @@ -211,7 +211,7 @@
>   de$(EMPTY)fine $(1)-objs_$(2)_template
>   $(obj)/$$(1).$(1).o: src/$$(1).$(2) $(obj)/config.h $(4)
>   	@printf "    CC         $$$$(subst $$$$(obj)/,,$$$$(@))\n"
> -	$(CC) $(3) -MMD $$$$(CFLAGS) -c -o $$$$@ $$$$<
> +	$(CC) $(3) -MMD $$$$(CFLAGS) $$$$(LTO_OPTIMIZE) -c -o $$$$@ $$$$<

] Hm.. I think LTO_OPTIMIZE should be added to CFLAGS instead, that would 
] make the patch a whole lot less intrusive.

Here is why it is more complicated than it seems it should be. The idea
of -flto is that you just add it to the compiler flags and make sure to
pass the flags during the link step. When I did this, the build failed
with "cannot find entry symbol protected_start", because the entire
crt0.s was considered dead code and omitted. With C code, this problem
can be overcome using __attribute__((externally_visible)). But gas has
no equivalent, and the only solution I found was to compile part of the
code without -flto. In the patch, $(LTO_OPTIMIZE) holds the normal
optimization flags plus -flto when CONFIG_LTO_OPTIMIZE is enabled, while
$(OPTIMIZE) holds the optimization flags without -flto. There might be a
nicer way to organize this, but somehow -flto needs to be skipped in one
case.

> Index: src/arch/x86/init/bootblock.ld
> ===================================================================
> --- src/arch/x86/init/bootblock.ld	(revision 6549)
> +++ src/arch/x86/init/bootblock.ld	(working copy)
> @@ -22,7 +22,6 @@
>   OUTPUT_FORMAT("elf32-i386", "elf32-i386", "elf32-i386")
>   OUTPUT_ARCH(i386)
>
> -TARGET(binary)
>   SECTIONS
>   {
>   	. = CONFIG_ROMBASE;
] interesting... does this hurt LTO?

Yes, and I do not currently understand exactly why. TARGET(binary)
does not seem to work the way I interpret the documentation. The
difference in -flto behavior may be due to our linker script handling
of section names such as '.text.unlikely' that appear with -flto. Just
a guess. I found that omitting TARGET(binary) solved an -flto build
problem, and I stuck with it.

Thanks,
Scott
Rudolf Marek - 2011-04-29 06:46:29
> image size by removing unused functions and data. Reducing the
> image size saves boot time by minimizing the flash memory read

I have always wondered how much DMA support for copying flash memory
could improve this. The SB700/SB800 supports that: it allows copying
the flash memory using DMA.

Thanks
Rudolf
Kevin O'Connor - 2011-04-30 15:56:20
On Thu, Apr 28, 2011 at 10:01:50PM -0500, Scott Duplichan wrote:
> The option is off by default because of side effects such as
> long build time and unusable dwarf2 debug output. This
> option cuts persimmon+seabios DOS boot from SSD time from
> 690 ms to 640 ms. 

That's a great boot time!  Do you have a breakdown of where the 640ms
is spent?

-Kevin
Scott - 2011-05-01 03:37:09
Kevin O'Connor wrote:

] That's a great boot time!  Do you have a breakdown of where the 640ms
] is spent?
]
] -Kevin

Hello Kevin,

I tried adding some serial logging to get an idea about where
the time is spent. The logging adds 8 ms to the boot time:

Time in ms
0      cold reset
366    memory initialization complete
469    seabios: maininit(void)
483    seabios: vga_setup() called
604    seabios: vga_setup() returned
621    seabios: startBoot(void)
648    dos autoexec utility logs pmtimer value

It looks like the lengthy operations are memory init and VBIOS
execution, which is consistent with past experience.

UEFI BIOS on this same hardware platform is taking more than 
10 seconds.

Here are the SeaBIOS kconfig options I changed:

Build for coreboot                           y
Hardware init during option ROM execution    y
Bootmenu                                     n
ATA controllers                              n
AHCI controllers                             y
Floppy controller                            n
PS/2 port                                    n
USB UHCI controllers                         n
Parallel port                                n
PCIBIOS interface                            n
APM interface                                n
PnP BIOS interface                           n
S3 resume                                    n
SMBIOS                                       n
Serial port debugging                        y
Show screen writes on debug ports            n

Thanks,
Scott
Антон Кочков - 2011-05-01 03:45:18
Maybe add something like a profiling option, with a patch implementing
such a feature?
Best regards,
Anton Kochkov.




> --
> coreboot mailing list: coreboot@coreboot.org
> http://www.coreboot.org/mailman/listinfo/coreboot
>
Peter Stuge - 2011-05-01 04:01:01
Антон Кочков wrote:
> May be add something like profiling option with patch for implementing
> such feature?

Feel free to send a patch?


//Peter
Scott - 2011-05-01 04:11:37
Anton Kochkov wrote:

] May be add something like profiling option with patch for implementing
] such feature?
] Best regards,
] Anton Kochkov.

Hello Anton,

In the past I have seen such logging code added to a BIOS code base.
It was for Phoenix legacy, if I remember correctly. A challenge with
this method is using it to pinpoint a problem exactly. For this board
I have the luxury of a JTAG debugging setup, the Sage SmartProbe. This
arrangement is very handy for boot time reduction. I do a crude form
of profiling by breaking in randomly during POST. I still catch it
spending some time in LZMA decompression of the ramstage. That time
was reduced by the -flto compiler option.

Thanks,
Scott
Kevin O'Connor - 2011-05-01 15:15:24
On Sat, Apr 30, 2011 at 11:11:37PM -0500, Scott Duplichan wrote:
> Anton Kochkov wrote:
> 
> ] May be add something like profiling option with patch for implementing
> ] such feature?
> ] Best regards,
> ] Anton Kochkov.
> 
> Hello Anton,
> 
> In the past I have seen such logging code added to a BIOS code base.
> It was for Phoenix legacy if I remember correctly. A challenge with 
> this method is using it to exactly pinpoint a problem. For this board

There is a tool in the SeaBIOS repo, tools/readserial.py. It runs on a
separate host that reads the debug serial output; it provides timing
info for each line read and can adjust the times to eliminate the cost
of writing to the serial port.

It's not perfect, but it can provide a broad overview of where time is
spent.

If you haven't already tried it, its usage is as follows:

./tools/readserial.py /dev/ttyS0 115200

> For this board
> I have the luxury of a jtag debugging setup, the Sage SmartProbe. This
> arrangement is very handy for boot time reduction. I do a crude form
> of profiling by breaking in randomly during post. I still find it
> spending some time in lzma decode of ramstage. That time was reduced
> by the -flto compiler option. 

Thanks - I'll have to try that on my board (an old epia-cn machine).
I found lzma to be time intensive.

-Kevin
Kevin O'Connor - 2011-05-01 15:42:11
On Sat, Apr 30, 2011 at 10:37:09PM -0500, Scott Duplichan wrote:
> Kevin O'Connor wrote:
> ] That's a great boot time!  Do you have a breakdown of where the 640ms
> ] is spent?
> I tried adding some serial logging to get an idea about where
> the time is spent. The logging adds 8 ms to the boot time:
> 
> Time in ms
> 0      cold reset
> 366    memory initialization complete
> 469    seabios: maininit(void)
> 483    seabios: vga_setup() called
> 604    seabios: vga_setup() returned
> 621    seabios: startBoot(void)
> 648    dos autoexec utility logs pmtimer value

Interesting - thanks.  These numbers look similar to the times I was
getting with a different board last year:

http://www.coreboot.org/pipermail/coreboot/2009-December/054770.html

> It looks like the lengthy operations are memory init and VBIOS
> execution, which is consistent with past experience.
> 
> UEFI BIOS on this same hardware platform is taking more than 
> 10 seconds.

:-)

> Here seabios kconfig options I changed:
[...]
> ATA controllers                              n
> AHCI controllers                             y
> Floppy controller                            n
> PS/2 port                                    n
> USB UHCI controllers                         n
> Parallel port                                n
> PCIBIOS interface                            n
> APM interface                                n
> PnP BIOS interface                           n
> S3 resume                                    n
> SMBIOS                                       n

Do these options change the boot time?  Since there is already 160ms
of time spent in SeaBIOS, I would have thought the time for all of
these could have been done in parallel anyway.

-Kevin
Scott - 2011-05-02 18:30:49
Kevin O'Connor wrote:

]> Here seabios kconfig options I changed:
][...]
]> ATA controllers                              n
]> AHCI controllers                             y
]> Floppy controller                            n
]> PS/2 port                                    n
]> USB UHCI controllers                         n
]> Parallel port                                n
]> PCIBIOS interface                            n
]> APM interface                                n
]> PnP BIOS interface                           n
]> S3 resume                                    n
]> SMBIOS                                       n
]
]Do these options change the boot time?  Since there is already 160ms
]of time spent in SeaBIOS, I would have thought the time for all of
]these could have been done in parallel anyway.
]
]-Kevin

It looks like disabling unused options saves about 5 ms. Even if
the code has no significant execution time, removing it makes the
compressed payload smaller, so it takes less time to read from flash.

Thanks,
Scott

Patch

Index: Makefile
===================================================================
--- Makefile	(revision 6549)
+++ Makefile	(working copy)
@@ -211,7 +211,7 @@ 
 de$(EMPTY)fine $(1)-objs_$(2)_template
 $(obj)/$$(1).$(1).o: src/$$(1).$(2) $(obj)/config.h $(4)
 	@printf "    CC         $$$$(subst $$$$(obj)/,,$$$$(@))\n"
-	$(CC) $(3) -MMD $$$$(CFLAGS) -c -o $$$$@ $$$$<
+	$(CC) $(3) -MMD $$$$(CFLAGS) $$$$(LTO_OPTIMIZE) -c -o $$$$@ $$$$<
 en$(EMPTY)def
 end$(EMPTY)if
 endef
Index: Makefile.inc
===================================================================
--- Makefile.inc	(revision 6549)
+++ Makefile.inc	(working copy)
@@ -66,7 +66,7 @@ 
 	$(CC) -x assembler-with-cpp -E -MMD -MT $$(@) -D__ACPI__ -P -include $(abspath $(obj)/config.h) -I$(src) -I$(src)/mainboard/$(MAINBOARDDIR) $$< -o $$(basename $$@).asl
 	iasl -p $$(obj)/$(1) -tc $$(basename $$@).asl
 	mv $$(obj)/$(1).hex $$(basename $$@).c
-	$(CC) $$(CFLAGS) $$(if $$(subst dsdt,,$$(basename $$(notdir $(1)))), -DAmlCode=AmlCode_$$(basename $$(notdir $(1)))) -c -o $$@ $$(basename $$@).c
+	$(CC) $$(CFLAGS) $$(LTO_OPTIMIZE) $$(if $$(subst dsdt,,$$(basename $$(notdir $(1)))), -DAmlCode=AmlCode_$$(basename $$(notdir $(1)))) -c -o $$@ $$(basename $$@).c
 	# keep %.o: %.c rule from catching the temporary .c file after a make clean
 	mv $$(basename $$@).c $$(basename $$@).hex
 endef
@@ -101,8 +101,17 @@ 
 INCLUDES += -Isrc/devices/oprom/include
 # abspath is a workaround for romcc
 INCLUDES += -include $(abspath $(obj)/config.h)
+
+# when '-flto' is used, optimization flags must be passed to both compile and link steps
+# pass $(LTO_OPTIMIZE) to compile and link steps to support the LTO_OPTIMIZE option
+# use $(OPTIMIZE) to compile files not compatible with link time optimization
+OPTIMIZE :=-Os -fomit-frame-pointer $(CONFIG_EXTRA_OPTIMIZE)
+LTO_OPTIMIZE :=$(OPTIMIZE)
+ifeq ($(CONFIG_LTO_OPTIMIZE),y)
+LTO_OPTIMIZE :=$(LTO_OPTIMIZE) -flto
+endif
 
-CFLAGS = $(INCLUDES) -Os -pipe -g
+CFLAGS = $(INCLUDES) -pipe -g
 CFLAGS += -nostdlib -Wall -Wundef -Wstrict-prototypes -Wmissing-prototypes
 CFLAGS += -Wwrite-strings -Wredundant-decls -Wno-trigraphs
 CFLAGS += -Wstrict-aliasing -Wshadow
@@ -112,7 +121,7 @@ 
 ifneq ($(CONFIG_AMD_AGESA),y)
 CFLAGS += -nostdinc 
 endif
-CFLAGS += -fno-common -ffreestanding -fno-builtin -fomit-frame-pointer
+CFLAGS += -fno-common -ffreestanding -fno-builtin
 
 additional-dirs := $(objutil)/cbfstool $(objutil)/romcc $(objutil)/options
 
@@ -180,7 +189,7 @@ 
 
 $(obj)/%.ramstage.o: $(obj)/%.c $(obj)/config.h $(OPTION_TABLE_H)
 	@printf "    CC         $(subst $(obj)/,,$(@))\n"
-	$(CC) -MMD $(CFLAGS) -c -o $@ $<
+	$(CC) -MMD $(CFLAGS) $(LTO_OPTIMIZE) -c -o $@ $<
 
 #######################################################################
 # Clean up rules
Index: src/arch/x86/init/bootblock.ld
===================================================================
--- src/arch/x86/init/bootblock.ld	(revision 6549)
+++ src/arch/x86/init/bootblock.ld	(working copy)
@@ -22,7 +22,6 @@ 
 OUTPUT_FORMAT("elf32-i386", "elf32-i386", "elf32-i386")
 OUTPUT_ARCH(i386)
 
-TARGET(binary)
 SECTIONS
 {
 	. = CONFIG_ROMBASE;
Index: src/arch/x86/Makefile.bootblock.inc
===================================================================
--- src/arch/x86/Makefile.bootblock.inc	(revision 6549)
+++ src/arch/x86/Makefile.bootblock.inc	(working copy)
@@ -76,13 +76,13 @@ 
 $(obj)/coreboot.romstage: $(obj)/coreboot.pre1 $$(romstage-objs) $(obj)/romstage/ldscript.ld
 	@printf "    LINK       $(subst $(obj)/,,$(@))\n"
 	printf "CONFIG_ROMBASE = 0x0;\nAUTO_XIP_ROM_BASE = 0x0;\n" > $(obj)/location.ld
-	$(CC) -nostdlib -nostartfiles -static -o $(obj)/romstage.elf -L$(obj) -T $(obj)/romstage/ldscript.ld $(romstage-objs)
-	$(OBJCOPY) -O binary $(obj)/romstage.elf $(obj)/romstage.bin
+	$(CC) $(LTO_OPTIMIZE) -nostdlib -nostartfiles -static -o $(obj)/romstage.elf -L$(obj) -T $(obj)/romstage/ldscript.ld $(romstage-objs)
+	$(OBJCOPY) -O binary $(obj)/romstage.elf $(obj)/romstage.bin
 	printf "CONFIG_ROMBASE = 0x" > $(obj)/location.ld
 	$(CBFSTOOL) $(obj)/coreboot.pre1 locate $(obj)/romstage.bin $(CONFIG_CBFS_PREFIX)/romstage $(CONFIG_XIP_ROM_SIZE) > $(obj)/location.txt
 	cat $(obj)/location.txt >> $(obj)/location.ld
 	printf ';\nAUTO_XIP_ROM_BASE = CONFIG_ROMBASE & ~(CONFIG_XIP_ROM_SIZE - 1);\n' >> $(obj)/location.ld
-	$(CC) -nostdlib -nostartfiles -static -o $(obj)/romstage.elf -L$(obj) -T $(obj)/romstage/ldscript.ld $(romstage-objs)
+	$(CC) $(LTO_OPTIMIZE) -nostdlib -nostartfiles -static -o $(obj)/romstage.elf -L$(obj) -T $(obj)/romstage/ldscript.ld $(romstage-objs)
 	$(NM) -n $(obj)/romstage.elf | sort > $(obj)/romstage.map
 	$(OBJCOPY) --only-keep-debug $(obj)/romstage.elf $(obj)/romstage.debug
 	$(OBJCOPY) --strip-debug $(obj)/romstage.elf
Index: src/arch/x86/Makefile.inc
===================================================================
--- src/arch/x86/Makefile.inc	(revision 6549)
+++ src/arch/x86/Makefile.inc	(working copy)
@@ -136,7 +136,7 @@ 
 
 $(obj)/coreboot_ram: $(obj)/coreboot_ram.o $(src)/arch/x86/coreboot_ram.ld #ldoptions
 	@printf "    CC         $(subst $(obj)/,,$(@))\n"
-	$(CC) -nostdlib -nostartfiles -static -o $@ -L$(obj) -T $(src)/arch/x86/coreboot_ram.ld $(obj)/coreboot_ram.o
+	$(CC) $(LTO_OPTIMIZE) -nostdlib -nostartfiles -static -o $@ -L$(obj) -T $(src)/arch/x86/coreboot_ram.ld $(obj)/coreboot_ram.o $(obj)/coreboot.a
 	$(NM) -n $(obj)/coreboot_ram | sort > $(obj)/coreboot_ram.map
 	$(OBJCOPY) --only-keep-debug $@ $(obj)/coreboot_ram.debug
 	$(OBJCOPY) --strip-debug $@
@@ -232,11 +232,11 @@ 
 
 $(obj)/mainboard/$(MAINBOARDDIR)/ap_romstage.o: $(src)/mainboard/$(MAINBOARDDIR)/ap_romstage.c $(OPTION_TABLE_H)
 	@printf "    CC         $(subst $(obj)/,,$(@))\n"
-	$(CC) -MMD $(CFLAGS) -I$(src) -D__PRE_RAM__ -I. -I$(obj) -c $< -o $@
+	$(CC) -MMD $(CFLAGS) $(OPTIMIZE) -I$(src) -D__PRE_RAM__ -I. -I$(obj) -c $< -o $@
 
 $(obj)/mainboard/$(MAINBOARDDIR)/romstage.pre.inc: $(src)/mainboard/$(MAINBOARDDIR)/romstage.c $(OPTION_TABLE_H) $(obj)/build.h $(obj)/config.h
 	@printf "    CC         romstage.inc\n"
-	$(CC) -MMD $(CFLAGS) -D__PRE_RAM__ -I$(src) -I. -I$(obj) -c -S $< -o $@
+	$(CC) -MMD $(CFLAGS) $(OPTIMIZE) -D__PRE_RAM__ -I$(src) -I. -I$(obj) -c -S $< -o $@
 
 $(obj)/mainboard/$(MAINBOARDDIR)/romstage.inc: $(obj)/mainboard/$(MAINBOARDDIR)/romstage.pre.inc
 	@printf "    POST       romstage.inc\n"
Index: src/Kconfig
===================================================================
--- src/Kconfig	(revision 6549)
+++ src/Kconfig	(working copy)
@@ -59,6 +59,14 @@ 
 	bool "LLVM/clang"
 endchoice
 
+config LTO_OPTIMIZE
+	bool "Use gcc -flto link time optimization"
+	default n
+	depends on COMPILER_GCC
+	help
+	  Use with gcc 4.6.0 or later to reduce code size by
+	  removing unused functions and data.
+
 config SCANBUILD_ENABLE
 	bool "Build with scan-build for static analysis"
 	default n