Patchwork Promote heap sizing to first-class Kconfig citizenship.

Submitter Joe Korty
Date 2010-05-14 19:11:35
Message ID <20100514191135.GA3418@tsunami.ccur.com>
Permalink /patch/1333/
State Rejected

Comments

Joe Korty - 2010-05-14 19:11:35
Promote heap sizing to first-class Kconfig citizenship.

Changing the heap size is something that those, like me,
with large PCI device trees need to do.  Therefore heap
size should appear as a normal, user-answerable question
within the Kconfig build system.

Also change the malloc debug message to more clearly
indicate how much memory is left.

Signed-off-by: Joe Korty <joe.korty@ccur.com>
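
For readers who have not seen the allocator, the debug-message change can be pictured with a simplified bump allocator. This is only a sketch and not coreboot's src/lib/malloc.c: the static array `heap`, the names `sketch_malloc` and `heap_bytes_left`, and the choice to return NULL when the heap is exhausted are all assumptions made for illustration. Only the 0x4000 default and the idea of printing "N of M bytes available" come from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define HEAP_SIZE 0x4000	/* default taken from the patch's Kconfig entry */

static unsigned char heap[HEAP_SIZE];	/* hypothetical stand-in for the real heap */
static size_t heap_used;		/* offset of the next free byte */

/* Bytes still available for allocation. */
static size_t heap_bytes_left(void)
{
	return HEAP_SIZE - heap_used;
}

/* Hypothetical bump allocator: never frees, no alignment handling,
 * returns NULL once the request no longer fits. */
static void *sketch_malloc(size_t size)
{
	void *p;

	assert(heap_used <= HEAP_SIZE);
	fprintf(stderr, "%s Enter, size %zu, %zu of %d bytes available.\n",
		__func__, size, heap_bytes_left(), HEAP_SIZE);

	if (size > heap_bytes_left())
		return NULL;

	p = &heap[heap_used];
	heap_used += size;
	return p;
}
```

With a message of this shape, a log from a large PCI tree shows the remaining byte count shrinking on every call, which makes it easier to see how close a given configuration comes to the limit.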
Myles Watson - 2010-05-14 19:19:45
On Fri, May 14, 2010 at 1:11 PM, Joe Korty <joe.korty@ccur.com> wrote:
> Promote heap sizing to first-class Kconfig citizenship.
>
> Changing the heap size is something that those, like me,
> with large PCI device trees need to do.  Therefore heap
> size should appear as a normal, user-answerable question
> within the Kconfig build system.

I think the difference here is that you're a developer (not a user)
once you start touching the code.  Users shouldn't have to worry about
the heap size.  It should be set larger in your mainboard Kconfig if
the mainboard needs more heap space.

Thanks,
Myles
Stefan Reinauer - 2010-05-14 19:26:29
On 5/14/10 9:19 PM, Myles Watson wrote:
> On Fri, May 14, 2010 at 1:11 PM, Joe Korty <joe.korty@ccur.com> wrote:
>   
>> Promote heap sizing to first-class Kconfig citizenship.
>>
>> Changing the heap size is something that those, like me,
>> with large PCI device trees need to do.  Therefore heap
>> size should appear as a normal, user-answerable question
>> within the Kconfig build system.
>>     
> I think the difference here is that you're a developer (not a user)
> once you start touching the code.  Users shouldn't have to worry about
> the heap size.  It should be set larger in your mainboard Kconfig if
> the mainboard needs more heap space.
>   
I agree with Myles here. If the heap size is not good enough, coreboot
is broken and needs to be fixed.

Stefan
Joe Korty - 2010-05-14 19:49:28
On Fri, May 14, 2010 at 03:19:45PM -0400, Myles Watson wrote:
> On Fri, May 14, 2010 at 1:11 PM, Joe Korty <joe.korty@ccur.com> wrote:
> > Promote heap sizing to first-class Kconfig citizenship.
> >
> > Changing the heap size is something that those, like me,
> > with large PCI device trees need to do.  Therefore heap
> > size should appear as a normal, user-answerable question
> > within the Kconfig build system.
> 
> I think the difference here is that you're a developer (not a user)
> once you start touching the code.  Users shouldn't have to worry about
> the heap size.  It should be set larger in your mainboard Kconfig if
> the mainboard needs more heap space.


Some background:
The reason I'm looking at coreboot is that standard BIOSes
(apparently) run out of memory while doing the bus walk,
when I plug a PCI-e expansion chassis into the motherboard
and populate it.  The BIOS will either lock up or the OS
will boot but what the OS sees for a PCI Bus (via lspci
-tv) is clearly corrupt.

So my job was/is to do an experiment to see if our problems
are indeed due to out-of-memory issues in standard BIOSes, and
if so, if coreboot could be a useful way around this issue.

And indeed, the first time I booted coreboot with a
populated PCI-e chassis attached, I got an out-of-memory
halt from coreboot.  Increasing CONFIG_HEAP_SIZE to
0x10000 (ie, 4x) got the system to boot, and lspci -tv
looks good also.  I have yet to try intermediate values.

Unfortunately we have an even bigger PCI-e loaded expansion
chassis (configuration #2), for which coreboot also hangs.
It's not an out-of-memory hang; it happens (apparently)
during the bus walk.  I haven't looked into this hang in
detail yet, so I don't have much to report.  But I do fear
it may be something more fundamental.

Regards,
Joe
Myles Watson - 2010-05-14 19:56:00
On Fri, May 14, 2010 at 1:49 PM, Joe Korty <joe.korty@ccur.com> wrote:
> On Fri, May 14, 2010 at 03:19:45PM -0400, Myles Watson wrote:
>> On Fri, May 14, 2010 at 1:11 PM, Joe Korty <joe.korty@ccur.com> wrote:
>> > Promote heap sizing to first-class Kconfig citizenship.
>> >
>> > Changing the heap size is something that those, like me,
>> > with large PCI device trees need to do.  Therefore heap
>> > size should appear as a normal, user-answerable question
>> > within the Kconfig build system.
>>
>> I think the difference here is that you're a developer (not a user)
>> once you start touching the code.  Users shouldn't have to worry about
>> the heap size.  It should be set larger in your mainboard Kconfig if
>> the mainboard needs more heap space.
>
>
> Some background:
> The reason I'm looking at coreboot is that standard BIOSes
> (apparently) run out of memory while doing the bus walk,
> when I plug a PCI-e expansion chassis into the motherboard
> and populate it.  The BIOS will either lock up or the OS
> will boot but what the OS sees for a PCI Bus (via lspci
> -tv) is clearly corrupt.
I wonder if that could be partly due to the ACPI implementation too.

> So my job was/is to do an experiment to see if our problems
> are indeed due to out-of-memory issues in standard BIOSes, and
> if so, if coreboot could be a useful way around this issue.
>
> And indeed, the first time I booted coreboot with a
> populated PCI-e chassis attached, I got an out-of-memory
> halt from coreboot.  Increasing CONFIG_HEAP_SIZE to
> 0x10000 (ie, 4x) got the system to boot, and lspci -tv
> looks good also.  I have yet to try intermediate values.
It seems like you have a pretty specific special case.  Maybe we
should create a CONFIG_EXTRA_HEAP that depends on CONFIG_EXPERT that
lets you add heap.

> Unfortunately we have an even bigger PCI-e loaded expansion
> chassis (configuration #2), for which coreboot also hangs.
> It's not an out-of-memory hang; it happens (apparently)
> during the bus walk.  I haven't looked into this hang in
> detail yet, so I don't have much to report.  But I do fear
> it may be something more fundamental.

Sounds like fun.

Thanks,
Myles
Joe Korty - 2010-05-14 20:02:25
On Fri, May 14, 2010 at 03:56:00PM -0400, Myles Watson wrote:
> It seems like you have a pretty specific special case.

:)  From my point of view, large systems are the standard case
and normal desktops are the oddballs.....

Regards,
Joe

> Sounds like fun.

It's been educational and mind-stretching and worth it just
for that.
ron minnich - 2010-05-14 21:38:04
Joe, we have visited this type of issue from time to time. The heap
size, if it is related to a mainboard (and it is) belongs in the
mainboard Kconfig and should not be user-visible. The reason is that
if it is visible then that visibility implies that it can be safely
changed, much as the baud rate can be safely changed. That is clearly
wrong: many values of heap size will result in a locked up platform.

Thus, heap size can be set in mainboard kconfig, but should not be
user visible.

As for your pci problem, I suspect it's not out of memory for the
original bios but a bug in the bios or the hardware itself. We've had
lots of chipset/pci card combinations over the years that confused the
bioses, badly. It just happens.

thanks

ron
Joe Korty - 2010-05-15 02:18:54
On Fri, May 14, 2010 at 05:38:04PM -0400, ron minnich wrote:
> Joe, we have visited this type of issue from time to time. The heap
> size, if it is related to a mainboard (and it is) belongs in the
> mainboard Kconfig and should not be user-visible. The reason is that
> if it is visible then that visibility implies that it can be safely
> changed, much as the baud rate can be safely changed. That is clearly
> wrong: many values of heap size will result in a locked up platform.

Hi Ron,
Thanks for the update.  I haven't had any problems
increasing heap size but that could just be my motherboard.

What failure modes become possible when the heap size
is increased?

Joe
ron minnich - 2010-05-15 02:29:53
On Fri, May 14, 2010 at 7:18 PM, Joe Korty <joe.korty@ccur.com> wrote:

> What failure modes become possible when the heap size
> is increased?

suppose someone for whatever reason sets it to a preposterous size.
Not likely but we've seen that sort of thing happen.

it's not necessary to have it user visible, and that alone is a good
reason not to put it  there.

ron
Myles Watson - 2010-05-15 04:11:56
> On Fri, May 14, 2010 at 05:38:04PM -0400, ron minnich wrote:
> > Joe, we have visited this type of issue from time to time. The heap
> > size, if it is related to a mainboard (and it is) belongs in the
> > mainboard Kconfig and should not be user-visible. The reason is that
> > if it is visible then that visibility implies that it can be safely
> > changed, much as the baud rate can be safely changed. That is clearly
> > wrong: many values of heap size will result in a locked up platform.
> 
> Hi Ron,
> Thanks for the update.  I haven't had any problems
> increasing heap size but that could just be my motherboard.
It's easy to run out of RAM if you increase it too much, especially if the
stack gets too large.  For most boards, stack*processors + heap + code = 1M.

The bigger worry is that someone will decrease the RAM size, which is a
boot-time failure.  Build-time failures are easier to handle.

Thanks,
Myles
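
The rule of thumb above (stack*processors + heap + code = 1M) can be turned into a quick sanity check. The following is a minimal sketch in C, assuming the 1 MiB ceiling applies as stated; the function names and any stack, CPU-count, or code-size figures a caller passes in are hypothetical, not coreboot defaults.

```c
#include <assert.h>
#include <stdint.h>

#define RAM_BUDGET 0x100000u	/* 1 MiB, per the rule of thumb above */

/* Bytes consumed by all stacks plus heap plus code. */
static uint64_t ram_needed(uint32_t stack_per_cpu, uint32_t cpus,
			   uint32_t heap, uint32_t code)
{
	assert(cpus > 0);	/* there is always at least the boot processor */
	return (uint64_t)stack_per_cpu * cpus + heap + code;
}

/* Non-zero if the layout fits within the 1 MiB budget. */
static int fits_in_budget(uint32_t stack_per_cpu, uint32_t cpus,
			  uint32_t heap, uint32_t code)
{
	return ram_needed(stack_per_cpu, cpus, heap, code) <= RAM_BUDGET;
}
```

For example, with a made-up 0x8000-byte stack on four processors and 0x40000 bytes of code, a 0x10000 heap fits comfortably, while a 0xC0000 heap pushes the total to 0x120000 and no longer fits. A check like this done at build time would catch the problem before it turns into a boot-time failure.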
Myles Watson - 2010-05-21 16:10:51
On Fri, May 14, 2010 at 1:49 PM, Joe Korty <joe.korty@ccur.com> wrote:
> Some background:
> The reason I'm looking at coreboot is that standard BIOSes
> (apparently) run out of memory while doing the bus walk,
> when I plug a PCI-e expansion chassis into the motherboard
> and populate it.  The BIOS will either lock up or the OS
> will boot but what the OS sees for a PCI Bus (via lspci
> -tv) is clearly corrupt.
>
> So my job was/is to do an experiment to see if our problems
> are indeed due to out-of-memory issues in standard BIOSes, and
> if so, if coreboot could be a useful way around this issue.
>
> And indeed, the first time I booted coreboot with a
> populated PCI-e chassis attached, I got an out-of-memory
> halt from coreboot.  Increasing CONFIG_HEAP_SIZE to
> 0x10000 (ie, 4x) got the system to boot, and lspci -tv
> looks good also.  I have yet to try intermediate values.

Could you try the latest?  Devices now take ~ 1/4 the space that they
used to take.

> Unfortunately we have an even bigger PCI-e loaded expansion
> chassis (configuration #2), for which coreboot also hangs.
> It's not an out-of-memory hang; it happens (apparently)
> during the bus walk.  I haven't looked into this hang in
> detail yet, so I don't have much to report.  But I do fear
> it may be something more fundamental.

If you send the log to the list we might be able to help.

Thanks,
Myles
Joe Korty - 2010-05-21 20:07:20
On Fri, May 21, 2010 at 12:10:51PM -0400, Myles Watson wrote:
> On Fri, May 14, 2010 at 1:49 PM, Joe Korty <joe.korty@ccur.com> wrote:
> > Unfortunately we have an even bigger PCI-e loaded expansion
> > chassis (configuration #2), for which coreboot also hangs.
> > It's not an out-of-memory hang; it happens (apparently)
> > during the bus walk.  I haven't looked into this hang in
> > detail yet, so I don't have much to report.  But I do fear
> > it may be something more fundamental.
> 
> If you send the log to the list we might be able to help.

Hi Myles,
I've solved this one, kind of.  It is PCI IO Space
overflow, we are going over 0xffff which apparently is
a hard limit.  I imagine this is there so that inb, outw,
etc instructions can be used to reference these devices.

But if one doesn't use such instructions (instead using
memory mapped PCI IO space), I see no reason why Linux
and coreboot couldn't work with PCI IO Space addresses
> 0xffff.

Regards,
Joe
Myles Watson - 2010-05-21 20:28:41
> > If you send the log to the list we might be able to help.
> 
> Hi Myles,
> I've solved this one, kind of.  It is PCI IO Space
> overflow, we are going over 0xffff which apparently is
> a hard limit.  I imagine this is there so that inb, outw,
> etc instructions can be used to reference these devices.
> 
> But if one doesn't use such instructions (instead using
> memory mapped PCI IO space), I see no reason why Linux
> and coreboot couldn't work with PCI IO Space addresses
> > 0xffff.

The resource allocator doesn't care.  Just find the places where the I/O
flag is checked and the limit is set to 0xffff and try setting it larger.  I
would look in src/devices/pci_device.c and
src/northbridge/your_northbridge/northbridge.c first.

I'm not sure what will break, but we should be able to fix it pretty easily.

Thanks,
Myles
Carl-Daniel Hailfinger - 2010-05-21 21:04:26
Hi Joe,

On 21.05.2010 22:07, Joe Korty wrote:
> On Fri, May 21, 2010 at 12:10:51PM -0400, Myles Watson wrote:
>   
>> On Fri, May 14, 2010 at 1:49 PM, Joe Korty <joe.korty@ccur.com> wrote:
>>     
>>> Unfortunately we have an even bigger PCI-e loaded expansion
>>> chassis (configuration #2), for which coreboot also hangs.
>>> It's not an out-of-memory hang; it happens (apparently)
>>> during the bus walk.  I haven't looked into this hang in
>>> detail yet, so I don't have much to report.  But I do fear
>>> it may be something more fundamental.
>>>       
>> If you send the log to the list we might be able to help.
>>     
>
> I've solved this one, kind of.  It is PCI IO Space
> overflow, we are going over 0xffff which apparently is
> a hard limit.  I imagine this is there so that inb, outw,
> etc instructions can be used to reference these devices.
>
> But if one doesn't use such instructions (instead using
> memory mapped PCI IO space), I see no reason why Linux
> and coreboot couldn't work with PCI IO Space addresses
> > 0xffff.

I'm interested in how you want to map port IO space to memory.
Please explain.

AFAIK PCI register space is totally independent of port IO space which
is totally independent of memory space. You can access PCI register
space via CF8/CFC port IO and via MMCONFIG memory, but I'm unaware of
any mechanisms to map IO ports to memory or the other way round.

Thanks,
Carl-Daniel
Joe Korty - 2010-05-22 00:52:43
I wrote:
>> Unfortunately, the latest coreboot still gets an out-of-mem condition
>> when the large pci-e chassis is attached.
>> 
>> I've attached two coreboot logs, both are the latest svn but the second
>> one has heap size at 0x10000 so that I can send you the log of what a
>> good boot might look like.

On Fri, May 21, 2010 at 04:33:12PM -0400, Myles Watson wrote:
> That's a lot of devices.  So maybe we need a Kconfig option called
> ADDITIONAL_HEAP that's available on the EXPERT menu.  It's possible that
> making links into lists will make you fit, but I'd expect someone to shrink
> the default heap when there's that much extra space for everyone else.

I certainly think that that would be OK.  It is impractical
to come up with a default heap size to cover the largest
possible IO configuration, since that would be very
large indeed.

Heck, even my failing large IO configuration is not really
very large.  I am putting only one expansion chassis on
the system.  I expect that we will eventually get customers
that will want two or even three expansion chassis (each
chassis holds 20 PCI-e cards).

In one sense this is kinda exciting.  Mainframes have
always been about large IO. That's what distinguishes them
from PCs.  With these tweaks, PC-like machines can start
to eat away at the bottom of that market.

Regards,
Joe

Peter Stuge - 2010-05-22 01:28:55
Joe Korty wrote:
> I've solved this one, kind of.  It is PCI IO Space
> overflow, we are going over 0xffff which apparently is
> a hard limit.

On x86 it is very much a hard limit. Not so on other architectures.


> I imagine this is there so that inb, outw,
> etc instructions can be used to reference these devices.
> 
> But if one doesn't use such instructions (instead using
> memory mapped PCI IO space),

The feasibility of that is totally device dependent. PCI devices can
expose all combinations of I/O and memory, and only the device driver
knows which one to use how.


> I see no reason why Linux and coreboot couldn't work with PCI IO
> Space addresses > 0xffff.

The I/O opcodes on x86 are limited to 16 bit addresses. Since this
is part of the architecture, both Linux and coreboot make this
assumption on x86 systems.
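
The 16-bit limit follows from the x86 IN/OUT instructions taking the port number in DX (or an 8-bit immediate). A resource allocator therefore has to reject any I/O range that ends above 0xffff. The check below is a minimal sketch of that test; the function name `io_range_addressable` is hypothetical and is not taken from coreboot or Linux.

```c
#include <assert.h>
#include <stdint.h>

#define X86_IO_LIMIT 0xffffu	/* highest port reachable with a 16-bit address */

/* Non-zero if every port in [base, base + size) is addressable on x86.
 * Written so that base + size cannot overflow. */
static int io_range_addressable(uint32_t base, uint32_t size)
{
	assert(size > 0);
	return base <= X86_IO_LIMIT && size - 1 <= X86_IO_LIMIT - base;
}
```

A 0x20-byte BAR at 0xa000 or at 0xffe0 passes this check; the same BAR at 0xfff0 or at 0x10000 does not.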


Joe Korty wrote:
> Heck, even my failing large IO configuration is not really
> very large.  I am putting only one expansion chassis on
> the system.

Either you are just totally out of luck with the I/O space situation,
or there is room for improvement in coreboot. Not at all impossible.

What cards did you have in this expansion chassis? Would it be
possible for you to provide lspci -vv output on that system?
Does the system boot if the chassis is completely empty?


> I expect that we will eventually get customers that will want two
> or even three expansion chassis (each chassis holds 20 PCI-e
> cards).

How do the chassis connect upstream, on PCI level? How does that
upstream-facing component divide address space? Does it reserve
a chunk for everything that can connect downstream? How big a chunk?


> In one sense this is kinda exciting.  Mainframes have
> always been about large IO. That's what distinguishes them
> from PCs.  With these tweaks, PC-like machines can start
> to eat away at the bottom of that market.

Only if 16 bits is enough for all I/O BARs that the plugged-in cards
need.

Maybe the allocation algorithm in coreboot can be optimized to
pack things better into those 16 bits, but worst case you've simply
hit an architecture limitation with x86. :\


//Peter
Joe Korty - 2010-05-22 15:49:40
On Fri, May 21, 2010 at 05:04:26PM -0400, Carl-Daniel Hailfinger wrote:
> > On Fri, May 21, 2010 at 12:10:51PM -0400, Myles Watson wrote:
> >> On Fri, May 14, 2010 at 1:49 PM, Joe Korty <joe.korty@ccur.com> wrote:
> > I've solved this one, kind of.  It is PCI IO Space
> > overflow, we are going over 0xffff which apparently is
> > a hard limit.  I imagine this is there so that inb, outw,
> > etc instructions can be used to reference these devices.
> >
> > But if one doesn't use such instructions (instead using
> > memory mapped PCI IO space), I see no reason why Linux
> > and coreboot couldn't work with PCI IO Space addresses
> > > 0xffff.
> 
> I'm interested in how you want to map port IO space to memory.
> Please explain.
> 
> AFAIK PCI register space is totally independent of port IO space which
> is totally independent of memory space. You can access PCI register
> space via CF8/CFC port IO and via MMCONFIG memory, but I'm unaware of
> any mechanisms to map IO ports to memory or the other way round.

Well, all I know at this point is that the Linux kernel
sources have code that maps inb etc either to the
instructions or to a memory dereference, and the .config
for that chooses memory dereference for x86.

It's gonna be fun seeing if high-IO-address space can be
made to work..

Regards,
Joe
Myles Watson - 2010-05-22 15:55:52
>> I'm interested in how you want to map port IO space to memory.
>> Please explain.
>>
>> AFAIK PCI register space is totally independent of port IO space which
>> is totally independent of memory space. You can access PCI register
>> space via CF8/CFC port IO and via MMCONFIG memory, but I'm unaware of
>> any mechanisms to map IO ports to memory or the other way round.
>
> Well, all I know at this point is that the Linux kernel
> sources have code that maps inb etc either to the
> instructions or to a memory dereference, and the .config
> for that chooses memory dereference for x86.
>
> It's gonna be fun seeing if high-IO-address space can be
> made to work..

We should be able to support any mapping that you can make work in
Linux.  It will be fun to see.

Thanks,
Myles
Joe Korty - 2010-05-22 16:11:50
On Fri, May 21, 2010 at 09:28:55PM -0400, Peter Stuge wrote:
> Joe Korty wrote:
> > I've solved this one, kind of.  It is PCI IO Space
> > overflow, we are going over 0xffff which apparently is
> > a hard limit.
>  ...
> What cards did you have in this expansion chassis? Would it be
> possible for you to provide lspci -vv output on that system?
> Does the system boot if the chassis is completely empty?

Hi Peter,
That particular expansion chassis load is no longer
accessible to me at the moment (sent elsewhere).
I'm pretty sure I can reconstruct it but I first have to
scrape up the parts.

The real problem is PCI-e bridges rounding everything up
to 4Kbyte boundaries.  Doing this for IO space is a real
pain, it doesn't take very many PCI-e bridges (and every PCI-e
card seems to have a bridge within it) to make us go
over the 0xffff IO Space limit.

I saw each Quad Ethernet taking 8K of IO space.  Thus it
doesn't take very many of these cards to fill up IO space.
The address space allocation (from /proc/ioports, from
memory) follows this pattern:

	a000-a01f
	a020-a03f
	b000-b01f
	b020-b03f

From the rounding it appears that this pci-e board
internally has two pci-e busses, each with two ethernets,
perhaps fronted with a pci-e mux giving the board its
connection to the outside world.
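
The arithmetic behind this can be sketched in C. The sketch assumes, as inferred above, that each card carries two internal bridges and that every bridge's I/O window is rounded up to a 4K multiple; the function names are hypothetical, and the per-bridge byte counts are read off the /proc/ioports pattern (two 0x20-byte ranges per bridge).

```c
#include <assert.h>
#include <stdint.h>

#define IO_SPACE_SIZE   0x10000u	/* x86 port I/O space: 16-bit addresses */
#define BRIDGE_IO_ALIGN 0x1000u		/* bridge I/O windows come in 4K multiples */

/* Round the I/O bytes needed behind one bridge up to the window granularity. */
static uint32_t bridge_io_window(uint32_t bytes_needed)
{
	return (bytes_needed + BRIDGE_IO_ALIGN - 1) & ~(BRIDGE_IO_ALIGN - 1);
}

/* I/O space consumed by `cards` identical cards, each with
 * `bridges_per_card` bridges needing `bytes_per_bridge` bytes of I/O. */
static uint32_t io_space_used(uint32_t cards, uint32_t bridges_per_card,
			      uint32_t bytes_per_bridge)
{
	return cards * bridges_per_card * bridge_io_window(bytes_per_bridge);
}
```

Under these assumptions each bridge needs only 0x40 bytes but occupies 0x1000, so one card takes 0x2000 (8K) and eight such cards would consume the entire 64K, before counting anything the mainboard's own devices need.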

> How do the chassis connect upstream, on PCI level? How does that
> upstream-facing component divide address space? Does it reserve
> a chunk for everything that can connect downstream? How big a chunk?

The PCIe bridges seem to be rounding everything to 4k
boundaries.  I haven't found any documentation on what the
PCIe standard says that limit, if any, should actually be.

Regards,
Joe
ron minnich - 2010-05-22 16:54:08
pci bridges have always rounded to 4k multiples ... it's in the very
earliest spec.

ron
Joe Korty - 2010-05-22 18:54:44
On Sat, May 22, 2010 at 12:54:08PM -0400, ron minnich wrote:
> pci bridges have always rounded to 4k multiples ... it's in the very
> earliest spec.

Thanks.  It does seem that PCI-e uses more bridges than the older PCIs;
if so that would explain the excessive spreading-out of IO port addresses
under PCI-e.

Joe
Carl-Daniel Hailfinger - 2010-05-22 20:25:21
On 22.05.2010 20:54, Joe Korty wrote:
> On Sat, May 22, 2010 at 12:54:08PM -0400, ron minnich wrote:
>   
>> pci bridges have always rounded to 4k multiples ... it's in the very
>> earliest spec.
>>     
>
> Thanks.  It does seem that PCI-e uses more bridges than the older PCIs;
> if so that would explain the excessive spreading-out of IO port addresses
> under PCI-e.
>   

I think someone mentioned that some PCIe bridges can map IO port space
to memory space to give non-x86 systems access to IO port space (many
architectures do not have a separate IO port space).

If you manage to find such a PCIe bridge chip with mem<->IOport mapping
capability and if you can hack up Linux to use the memory accessor
functions for the devices behind such a bridge, you can work around
IOport resource space constraints. It would definitely be interesting to
see if that is possible in practice without breaking lots of stuff.

Regards,
Carl-Daniel

Patch

Index: trunk/src/Kconfig
===================================================================
--- trunk.orig/src/Kconfig	2010-05-14 10:24:35.000000000 -0400
+++ trunk/src/Kconfig	2010-05-14 10:25:00.000000000 -0400
@@ -80,6 +80,17 @@ 
 	  Enables the use of ccache for faster builds.
 	  Requires ccache in path.
 
+config HEAP_SIZE
+	hex "Heap size (in bytes)"
+	default 0x4000
+	help
+	  The primary coreboot heap user is the PCI
+	  bus walk.  Therefore heap size may need to be
+	  increased on systems that have exceptionally
+	  large and/or deep PCI device trees.
+
+	  If unsure, use the default.
+
 endmenu
 
 source src/mainboard/Kconfig
@@ -124,10 +135,6 @@ 
 	bool
 	default n
 
-config HEAP_SIZE
-	hex
-	default 0x4000
-
 config DEBUG
 	bool
 	default n
Index: trunk/src/lib/malloc.c
===================================================================
--- trunk.orig/src/lib/malloc.c	2010-05-14 10:24:35.000000000 -0400
+++ trunk/src/lib/malloc.c	2010-05-14 10:25:00.000000000 -0400
@@ -14,7 +14,10 @@ 
 {
 	void *p;
 
-	MALLOCDBG("%s Enter, size %ld, free_mem_ptr %p\n", __func__, size, free_mem_ptr);
+	MALLOCDBG("%s Enter, size %ld, %d of %d bytes available.\n",
+		__func__, size,
+		(int)(free_mem_end_ptr - free_mem_ptr),
+		(int)(&_eheap - &_heap));
 
 	/* Checking arguments */
 	if (size < 0)