Patchwork Relocable payloads

Submitter Rudolf Marek
Date 2010-02-25 22:36:43
Message ID <4B86FB7B.7050103@assembler.cz>
State New

Comments

Rudolf Marek - 2010-02-25 22:36:43

Hi again,

In case someone wants to look into this: the attached patch tries to make
coreboot_ram relocatable. It does not work. It looks like the dynamic linker
does not fix up the call to hardwaremain in c_start.o; the reason is unknown.

Rudolf
Myles Watson - 2010-02-25 22:44:12
> In case someone wants to look into this. The attached patch tries to do
> relocable coreboot_ram. It does not work. It looks like dynamic linker does
> not
> fix call to hardware main in the c_start.o - reason is unknown.
>
Relocating coreboot_ram seems like a great idea.  It seems like there was a
lot of discussion on the mailing list with v3 about PIC and why it couldn't
work for us.  My memory about it is fuzzy now, but a little searching might
turn something up.

Thanks,
Myles
Rudolf Marek - 2010-02-25 22:51:39
Well, it could work. I forgot to attach the actual linker ;) which is again
taken from memtest86. This does work for libpayload payloads (tint), but here
it does not.

Rudolf
Stefan Reinauer - 2010-02-26 01:23:15
On 2/25/10 11:44 PM, Myles Watson wrote:
>
>     In case someone wants to look into this. The attached patch tries
>     to do
>     relocable coreboot_ram. It does not work. It looks like dynamic
>     linker does not
>     fix call to hardware main in the c_start.o - reason is unknown.
>
> Relocating coreboot_ram seems like a great idea.  It seems like there
> was a lot of discussion on the mailing list with v3 about PIC and why
> it couldn't work for us.  My memory about it is fuzzy now, but a
> little searching might turn something up.

The idea sounds incredibly sweet.

But let's make sure we gain from it in the end...
Relocating coreboot_ram would save us two 1MB sized memcpy calls on the resume
path, so we would save at least 200 microseconds of boot time in the
case we're resuming (assuming memory is 6.4 GB/s, DDR2-800 aka PC2-6400).
That is 0.2 milliseconds out of 400+... worth the complexity?
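
For reference, the estimate can be reproduced with a short calculation. This is only a sketch: the helper name `copy_time_us` is made up for illustration, and it assumes each memcpy moves its bytes exactly once at the full 6.4 GB/s peak rate. A real copy both reads and writes memory, so it would likely be slower.

```c
#include <assert.h>

/*
 * Hypothetical helper (not part of coreboot): time in microseconds
 * needed to move `bytes` at a sustained rate of `bytes_per_sec`.
 */
static double copy_time_us(double bytes, double bytes_per_sec)
{
	assert(bytes_per_sec > 0.0);
	return bytes / bytes_per_sec * 1e6;
}

/* Cost of the two 1MB copies on the resume path, in microseconds. */
static double resume_copy_cost_us(void)
{
	const double one_mib = 1024.0 * 1024.0;	/* size of one copy */
	const double peak_bw = 6.4e9;		/* assumed PC2-6400 peak rate */

	return 2.0 * copy_time_us(one_mib, peak_bw);
}
```

Under these assumptions `resume_copy_cost_us()` comes out at roughly 0.3 milliseconds, which is consistent with the "at least 200 microseconds" figure and still tiny next to a 400+ millisecond boot.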

What other benefits are there?


Stefan
Stefan Reinauer - 2010-02-26 01:26:03
On 2/26/10 2:23 AM, Stefan Reinauer wrote:
> On 2/25/10 11:44 PM, Myles Watson wrote:
>>
>>     In case someone wants to look into this. The attached patch tries
>>     to do
>>     relocable coreboot_ram. It does not work. It looks like dynamic
>>     linker does not
>>     fix call to hardware main in the c_start.o - reason is unknown.
>>
>> Relocating coreboot_ram seems like a great idea.  It seems like there
>> was a lot of discussion on the mailing list with v3 about PIC and why
>> it couldn't work for us.  My memory about it is fuzzy now, but a
>> little searching might turn something up.
>
> The idea sounds incredibly sweet.
>
> But lets make sure we gain from it in the end...
> Relocating coreboot_ram would safe us two 1MB sized memcpy on the
> resume path, so we would safe at least 200 microseconds of boot time
> in the case we're resuming. (assuming memory is 6.4G/s, DDR2-800 aka
> PC2-6400)  .... 0.2milliseconds of 400+... worth the complexity?
Minus the time the linker needs to do the linking.

How does linking go with lzma?
- do the relocations require more RAM? How much?
- can the sections and relocations be lzma'ed together? or are they
separate files in CBFS?
Graeme Russ - 2010-02-26 01:57:54
On Fri, Feb 26, 2010 at 12:26 PM, Stefan Reinauer <stepan@coresystems.de> wrote:
> On 2/26/10 2:23 AM, Stefan Reinauer wrote:
>
> On 2/25/10 11:44 PM, Myles Watson wrote:
>>
>> In case someone wants to look into this. The attached patch tries to do
>> relocable coreboot_ram. It does not work. It looks like dynamic linker
>> does not
>> fix call to hardware main in the c_start.o - reason is unknown.
>
> Relocating coreboot_ram seems like a great idea.  It seems like there was a
> lot of discussion on the mailing list with v3 about PIC and why it couldn't
> work for us.  My memory about it is fuzzy now, but a little searching might
> turn something up.

I have recently put a lot of effort into getting the x86 port of U-Boot to
be fully relocatable. Have a look at the git tree of U-Boot. This is the
commit that does all the work:

http://git.denx.de/cgi-bin/gitweb.cgi?p=u-boot.git;a=commitdiff;h=1c409bc7101a24ecd47a13a4e851845d66dc23ce

>
> The idea sounds incredibly sweet.
>
> But lets make sure we gain from it in the end...
> Relocating coreboot_ram would safe us two 1MB sized memcpy on the resume
> path, so we would safe at least 200 microseconds of boot time in the case
> we're resuming. (assuming memory is 6.4G/s, DDR2-800 aka PC2-6400)  ....
> 0.2milliseconds of 400+... worth the complexity?
>
> minus the time added needed by the linker for the linking..
>
> How does linking go with lzma?
> - do the relocations require more RAM? How much?

Yes, but only a little. The binary is larger because it needs the relocation
information table, but this table does not need to be loaded into RAM.

> - can the sections and relocations be lzma'ed together? or are they separate
> files in CBFS?
>

Regards,

Graeme
Rudolf Marek - 2010-02-28 09:33:36
>> But lets make sure we gain from it in the end...
>> Relocating coreboot_ram would safe us two 1MB sized memcpy on the 
>> resume path, so we would safe at least 200 microseconds of boot time 
>> in the case we're resuming. (assuming memory is 6.4G/s, DDR2-800 aka 
>> PC2-6400)  .... 0.2milliseconds of 400+... worth the complexity?
> minus the time added needed by the linker for the linking..

Is this WB?

> How does linking go with lzma?

The linker is called from c_start.o; it is reloc.c, "stolen" from memtest86,
and it is part of the resulting image.

> - do the relocations require more RAM? How much?

They do; I think it is just a few kilobytes.

> - can the sections and relocations be lzma'ed together? or are they 
> separate files in CBFS?

So far I have packed the resulting image into one text section, just to be able
to load it with the stage loader.

Oh, and it does work. My mistake was not linking it at address 0: the dynamic
loader does its own relocation, so the addresses ended up twice as big.
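
To illustrate what the relocation step does, here is a minimal sketch of applying "relative" relocations to an image held in a buffer. It is not the memtest86 reloc.c code: `struct rel_entry` and `apply_relative_relocs` are simplified, hypothetical stand-ins, and only the case where a 32-bit word must be shifted by the load bias is modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-in for an ELF32 REL entry (assumption, not the real type). */
struct rel_entry {
	uint32_t offset;	/* link-time address of the 32-bit word to patch */
};

/*
 * Shift every listed word by the load bias.
 *   image     - bytes of the image as they sit in memory
 *   link_base - address the image was linked at
 *   load_base - address the image will actually run from
 */
static void apply_relative_relocs(uint8_t *image, size_t image_len,
				  uint32_t link_base, uint32_t load_base,
				  const struct rel_entry *rel, size_t nrel)
{
	uint32_t bias = load_base - link_base;

	for (size_t i = 0; i < nrel; i++) {
		uint32_t off = rel[i].offset - link_base;
		uint32_t word;

		assert((size_t)off + sizeof(word) <= image_len);
		memcpy(&word, image + off, sizeof(word));
		word += bias;	/* pointer now refers to the run-time address */
		memcpy(image + off, &word, sizeof(word));
	}
}
```

With `link_base` equal to 0 the bias is simply the load address. If the image is instead linked at its final address and the loader still adds the full load address, every patched pointer has the base counted twice, which is presumably the symptom described above.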

Now if only I could fix AMD CAR. I think Stefan is right that the dynamic loader
might add more complexity. Still, it took me less time to get it working than
fixing the AMD K8 CAR to handle the resume copy properly :)

Anyway, I proved that we can have it, nice result too (at least for me ;)

I will try to fix the AMD CAR somehow now to make resume/suspend work again.

Rudolf

Patch

Index: src/boot/selfboot.c
===================================================================
--- src/boot/selfboot.c	(revision 5134)
+++ src/boot/selfboot.c	(working copy)
@@ -327,6 +327,7 @@ 
 	return ret;
 }
 
+#define RELO 0x0000
 
 static int build_self_segment_list(
 	struct segment *head,
@@ -355,7 +356,7 @@ 
 					segment->type == PAYLOAD_SEGMENT_CODE ?  "code" : "data",
 					ntohl(segment->compression));
 			new = malloc(sizeof(*new));
-			new->s_dstaddr = ntohl((u32) segment->load_addr);
+			new->s_dstaddr = ntohl((u32) segment->load_addr) + RELO;
 			new->s_memsz = ntohl(segment->mem_len);
 			new->compression = ntohl(segment->compression);
 
@@ -376,13 +377,13 @@ 
 				 ntohl(segment->mem_len));
 			new = malloc(sizeof(*new));
 			new->s_filesz = 0;
-			new->s_dstaddr = ntohl((u32) segment->load_addr);
+			new->s_dstaddr = ntohl((u32) segment->load_addr) + RELO;
 			new->s_memsz = ntohl(segment->mem_len);
 			break;
 
 		case PAYLOAD_SEGMENT_ENTRY:
 			printk_debug("  Entry Point 0x%p\n", (void *) ntohl((u32) segment->load_addr));
-			*entry =  ntohl((u32) segment->load_addr);
+			*entry =  ntohl((u32) segment->load_addr) + RELO;
 			/* Per definition, a payload always has the entry point
 			 * as last segment. Thus, we use the occurence of the
 			 * entry point as break condition for the loop.
Index: src/cpu/amd/car/disable_cache_as_ram.c
Index: src/arch/i386/include/arch/cpu.h
===================================================================
--- src/arch/i386/include/arch/cpu.h	(revision 5134)
+++ src/arch/i386/include/arch/cpu.h	(working copy)
@@ -35,9 +35,9 @@ 
 {
 	struct cpuid_result result;
 	asm volatile(
-		"cpuid"
+		" pushl %%ebx ; cpuid ; movl %%ebx, %%esi ; pop %%ebx"
 		: "=a" (result.eax),
-		  "=b" (result.ebx),
+		  "=S" (result.ebx),
 		  "=c" (result.ecx),
 		  "=d" (result.edx)
 		: "0" (op));
@@ -52,18 +52,18 @@ 
 {
 	unsigned int eax;
 
-	__asm__("cpuid"
+	__asm__(" pushl %%ebx ; cpuid ; movl %%ebx, %%esi ; pop %%ebx"
 		: "=a" (eax)
 		: "0" (op)
-		: "ebx", "ecx", "edx");
+		: "ecx", "edx", "esi");
 	return eax;
 }
 static inline unsigned int cpuid_ebx(unsigned int op)
 {
 	unsigned int eax, ebx;
 
-	__asm__("cpuid"
-		: "=a" (eax), "=b" (ebx)
+	__asm__(" pushl %%ebx ; cpuid ; movl %%ebx, %%esi ; pop %%ebx"
+		: "=a" (eax), "=S" (ebx)
 		: "0" (op)
 		: "ecx", "edx" );
 	return ebx;
@@ -72,20 +72,20 @@ 
 {
 	unsigned int eax, ecx;
 
-	__asm__("cpuid"
+	__asm__(" pushl %%ebx ; cpuid ;  pop %%ebx"
 		: "=a" (eax), "=c" (ecx)
 		: "0" (op)
-		: "ebx", "edx" );
+		: "edx" );
 	return ecx;
 }
 static inline unsigned int cpuid_edx(unsigned int op)
 {
 	unsigned int eax, edx;
 
-	__asm__("cpuid"
+	__asm__(" pushl %%ebx ; cpuid ; pop %%ebx"
 		: "=a" (eax), "=d" (edx)
 		: "0" (op)
-		: "ebx", "ecx");
+		: "ecx");
 	return edx;
 }
 
Index: src/arch/i386/Makefile.inc
===================================================================
--- src/arch/i386/Makefile.inc	(revision 5134)
+++ src/arch/i386/Makefile.inc	(working copy)
@@ -47,12 +47,12 @@ 
 
 $(obj)/coreboot_ram: $(obj)/coreboot_ram.o $(src)/arch/i386/coreboot_ram.ld #ldoptions
 	@printf "    CC         $(subst $(obj)/,,$(@))\n"
-	$(CC) -nostdlib -nostartfiles -static -o $@ -L$(obj) -T $(src)/arch/i386/coreboot_ram.ld $(obj)/coreboot_ram.o
+	$(CC)  -nostdlib -nostartfiles -shared -o $@ -L$(obj) -T $(src)/arch/i386/coreboot_ram.ld $(obj)/coreboot_ram.o
 	$(NM) -n $(obj)/coreboot_ram | sort > $(obj)/coreboot_ram.map
 
 $(obj)/coreboot_ram.o: $(obj)/arch/i386/lib/c_start.o $(drivers) $(obj)/coreboot.a $(LIBGCC_FILE_NAME)
 	@printf "    CC         $(subst $(obj)/,,$(@))\n"
-	$(CC) -nostdlib -r -o $@ $(obj)/arch/i386/lib/c_start.o $(drivers) -Wl,-\( $(obj)/coreboot.a $(LIBGCC_FILE_NAME) -Wl,-\)
+	$(CC) -nostdlib -fPIC -r -o $@ $(obj)/arch/i386/lib/c_start.o $(drivers) -Wl,-\( $(obj)/coreboot.a $(LIBGCC_FILE_NAME) -Wl,-\)
 
 $(obj)/coreboot.a: $(objs)
 	@printf "    AR         $(subst $(obj)/,,$(@))\n"
Index: src/arch/i386/lib/Makefile.inc
===================================================================
--- src/arch/i386/lib/Makefile.inc	(revision 5134)
+++ src/arch/i386/lib/Makefile.inc	(working copy)
@@ -1,5 +1,6 @@ 
+obj-y += cpu.o
+obj-y += reloc.o
 obj-y += c_start.o
-obj-y += cpu.o
 obj-y += pci_ops_conf1.o
 obj-y += pci_ops_conf2.o
 obj-y += pci_ops_mmconf.o
Index: src/arch/i386/lib/c_start.S
===================================================================
--- src/arch/i386/lib/c_start.S	(revision 5134)
+++ src/arch/i386/lib/c_start.S	(working copy)
@@ -1,13 +1,46 @@ 
 #include <arch/asm.h>
 #include <arch/intel.h>
 
+	/* Reload all of the segment registers 
+	leal	gdt@GOTOFF(%ebx), %eax
+	movl	%eax, 2 + gdt_descr@GOTOFF(%ebx)
+	lgdt	gdt_descr@GOTOFF(%ebx)
+	leal	flush@GOTOFF(%ebx), %eax
+	pushl	$KERNEL_CS
+	pushl	%eax
+	lret
+flush:	movl	$KERNEL_DS, %eax
+	movw	%ax, %ds
+*/
+
 	.section ".text"
 	.code32
 	.globl _start
 _start:
+
+	/* Load the GOT pointer */
+	call	0f
+0:	popl	%ebx
+	addl	$_GLOBAL_OFFSET_TABLE_+[.-0b], %ebx
+
 	cli
-	lgdt	%cs:gdtaddr
+/*
+	lgdt	%cs:gdtaddr@GOTOFF(%ebx)
+*/
+	leal	gdt@GOTOFF(%ebx), %eax
+	movl	%eax, 2 + gdtaddr@GOTOFF(%ebx)
+
+	lgdt	gdtaddr@GOTOFF(%ebx)
+
+	leal	flush@GOTOFF(%ebx), %eax
+	pushl	$0x10
+	pushl	%eax
+	lret
+flush:
+
+/*
 	ljmp	$0x10, $1f
+*/
 1:	movl	$0x18, %eax
 	movl	%eax, %ds
 	movl	%eax, %es
@@ -19,8 +52,8 @@ 
 
 	/** clear stack */
 	cld
-	leal	_stack, %edi
-	movl	$_estack, %ecx
+	leal	_stack@GOTOFF(%ebx), %edi
+	leal _estack@GOTOFF(%ebx), %ecx
 	subl	%edi, %ecx
 	shrl	$2, %ecx   /* it is 32 bit align, right? */
 	xorl	%eax, %eax
@@ -28,8 +61,8 @@ 
 	stosl
 
 	/** clear bss */
-	leal	_bss, %edi
-	movl	$_ebss, %ecx
+	leal	_bss@GOTOFF(%ebx), %edi
+	leal	_ebss@GOTOFF(%ebx), %ecx
 	subl	%edi, %ecx
 	jz	.Lnobss
 	shrl	$2, %ecx  /* it is 32 bit align, right? */
@@ -39,7 +72,7 @@ 
 .Lnobss:
 
 	/* set new stack */
-	movl	$_estack, %esp
+	leal	_estack@GOTOFF(%ebx), %esp
 
 	/* Push the cpu index and struct cpu */
 	pushl	$0
@@ -49,25 +82,31 @@ 
 	pushl	%ebp
 
 	/* Save the stack location */
-	movl	%esp, %ebp
 
+/*
+	movl	%esp, %ecx
+*/
+
 	/* Initialize the Interrupt Descriptor table */
-	leal	_idt, %edi
-	leal	vec0, %ebx
+	leal	_idt@GOTOFF(%ebx), %edi
+	leal	vec0@GOTOFF(%ebx), %ebp
+	leal	_idt_end@GOTOFF(%ebx), %ecx
+	
+
 	movl	$(0x10 << 16), %eax	/* cs selector */
 
-1:	movw	%bx, %ax
-	movl	%ebx, %edx
+1:	movw	%bp, %ax
+	movl	%ebp, %edx
 	movw	$0x8E00, %dx		/* Interrupt gate - dpl=0, present */
 	movl	%eax, 0(%edi)
 	movl	%edx, 4(%edi)
-	addl	$6, %ebx
+	addl	$6, %ebp
 	addl	$8, %edi
-	cmpl	$_idt_end, %edi
+	cmpl	%ecx, %edi
 	jne	1b
 
 	/* Load the Interrupt descriptor table */
-	lidt	idtarg
+	lidt	idtarg@GOTOFF(%ebx)
 
 	/*
 	 *	Now we are finished. Memory is up, data is copied and
@@ -77,7 +116,18 @@ 
 	intel_chip_post_macro(0xfe)	/* post fe */
 
 	/* Restore the stack location */
-	movl	%ebp, %esp
+/*
+	movl	%ecx, %esp
+*/
+
+/*
+	movl %eax,loader_eax@GOTOFF(%ebx)
+	movl %ebx,loader_ebx@GOTOFF(%ebx)
+*/
+
+	leal	_dl_start@GOTOFF(%ebx), %eax
+	call	*%eax
+
 	
 	/* The boot_complete flag has already been pushed */
 	call	hardwaremain
@@ -245,9 +295,13 @@ 
 	.globl gdt, gdt_end, gdt_limit, idtarg
 
 gdt_limit = gdt_end - gdt - 1	/* compute the table limit */
+
+
 gdtaddr:
 	.word	gdt_limit
-	.long	gdt		/* we know the offset */
+	.long	0
+/*
+gdt*/		/* we know the offset */
 
 	 .data
 
Index: src/arch/i386/coreboot_ram.ld
===================================================================
--- src/arch/i386/coreboot_ram.ld	(revision 5134)
+++ src/arch/i386/coreboot_ram.ld	(working copy)
@@ -39,6 +39,19 @@ 
 		. = ALIGN(16);
 		_etext = .;
 	}
+
+	.dynsym     : { *(.dynsym) }
+	.dynstr     : { *(.dynstr) }
+	.hash       : { *(.hash) }
+	.gnu.hash   : { *(.gnu.hash) }
+	.dynamic    : { *(.dynamic) }
+
+	.rel.text    : { *(.rel.text   .rel.text.*) }
+	.rel.rodata  : { *(.rel.rodata .rel.rodata.*) }
+	.rel.data    : { *(.rel.data   .rel.data.*) }
+	.rel.got     : { *(.rel.got    .rel.got.*) }
+	.rel.plt     : { *(.rel.plt    .rel.plt.*) }
+
 	.rodata : {
 		_rodata = .;
 		. = ALIGN(4);
@@ -75,7 +88,14 @@ 
 		*(.data)
 		_edata = .;
 	}
+	. = ALIGN(4);
 
+	.got : {
+		*(.got.plt)
+		*(.got)
+		_edata = . ;
+	}
+
 	.sdata : {
 		_SDA_BASE_ = .;
 		*(.sdata)
Index: Makefile
===================================================================
--- Makefile	(revision 5134)
+++ Makefile	(working copy)
@@ -144,43 +144,43 @@ 
 	$(CPP) -D__ACPI__ -P $(CPPFLAGS) -include $(obj)/config.h -I$(src) -I$(src)/mainboard/$(MAINBOARDDIR) $$< -o $$(basename $$@).asl
 	iasl -p $$(basename $$@) -tc $$(basename $$@).asl
 	mv $$(basename $$@).hex $$(basename $$@).c
-	$(CC) -m32 $$(CFLAGS) $$(if $$(subst dsdt,,$$(basename $$(notdir $$@))), -DAmlCode=AmlCode_$$(basename $$(notdir $$@))) -c -o $$@ $$(basename $$@).c
+	$(CC) -m32 -fPIC $$(CFLAGS) $$(if $$(subst dsdt,,$$(basename $$(notdir $$@))), -DAmlCode=AmlCode_$$(basename $$(notdir $$@))) -c -o $$@ $$(basename $$@).c
 endef
 
 define objs_c_template
 $(obj)/$(1)%.o: src/$(1)%.c $(obj)/config.h
 	@printf "    CC         $$(subst $$(obj)/,,$$(@))\n"
-	$(CC) -m32 $$(CFLAGS) -c -o $$@ $$<
+	$(CC) -m32 -fPIC $$(CFLAGS) -c -o $$@ $$<
 endef
 
 define objs_S_template
 $(obj)/$(1)%.o: src/$(1)%.S $(obj)/config.h
 	@printf "    CC         $$(subst $$(obj)/,,$$(@))\n"
-	$(CC) -m32 -DASSEMBLY $$(CFLAGS) -c -o $$@ $$<
+	$(CC) -m32  -fPIC -DASSEMBLY $$(CFLAGS) -c -o $$@ $$<
 endef
 
 define initobjs_c_template
 $(obj)/$(1)%.o: src/$(1)%.c $(obj)/config.h
 	@printf "    CC         $$(subst $$(obj)/,,$$(@))\n"
-	$(CC) -m32 $$(CFLAGS) -c -o $$@ $$<
+	$(CC) -m32 -fPIC $$(CFLAGS) -c -o $$@ $$<
 endef
 
 define initobjs_S_template
 $(obj)/$(1)%.o: src/$(1)%.S $(obj)/config.h
 	@printf "    CC         $$(subst $$(obj)/,,$$(@))\n"
-	$(CC) -m32 -DASSEMBLY $$(CFLAGS) -c -o $$@ $$<
+	$(CC) -m32 -fPIC -DASSEMBLY $$(CFLAGS) -c -o $$@ $$<
 endef
 
 define drivers_c_template
 $(obj)/$(1)%.o: src/$(1)%.c $(obj)/config.h
 	@printf "    CC         $$(subst $$(obj)/,,$$(@))\n"
-	$(CC) -m32 $$(CFLAGS) -c -o $$@ $$<
+	$(CC) -m32  -fPIC $$(CFLAGS) -c -o $$@ $$<
 endef
 
 define drivers_S_template
 $(obj)/$(1)%.o: src/$(1)%.S
 	@printf "    CC         $$(subst $$(obj)/,,$$(@))\n"
-	$(CC) -m32 -DASSEMBLY $$(CFLAGS) -c -o $$@ $$<
+	$(CC) -m32 -fPIC -DASSEMBLY $$(CFLAGS) -c -o $$@ $$<
 endef
 
 define smmobjs_c_template