Patchwork MP table multicore patch

login
register
about
Submitter Stefan Reinauer
Date 2010-02-16 19:02:13
Message ID <4B7AEBB5.9070704@coresystems.de>
Download mbox | patch
Permalink /patch/931/
State New
Headers show

Comments

Stefan Reinauer - 2010-02-16 19:02:13
On 2/16/10 5:11 AM, Timothy Pearson wrote:
> Here is a cleaned up and tested version of the SMP APIC autodetect patch.
>
> Signed-off-by: Timothy Pearson <tpearson@raptorengineeringinc.com>
>
> ---
>
> It would of course be helpful to attach the patch.  My Webmail client
> keeps eating it...
>
> Timothy Pearson
> Raptor Engineering
>   


Do you have a comparison mptable as generated by the old and the new
version on your system?

And, can you describe high level, what the patch changes? It looks to me
as if you are recursing through the tree instead of just walking the
"all_devices" list.
So this implies that you don't catch all devices when running through
all_devices. This sounds like a problem in the resource allocator and
maybe it should be fixed there instead?

The code as it is runs in roughly O(n^3) and on top of that calls itself
which is a little bit scary.
But, looking at it a bit closer I see that the outer loop with p_it does
nothing. I dropped the outer loop and moved the lapic stuff to the inner
loop  where it is used, except passing it around on the stack. It will
probably mean 3 more cpuid calls and 3 more lapic reads as opposed to
heavy stack usage, but I think the stack usage has been more troublesome
in the past..

Please let me know what you think...

Stefan
Myles Watson - 2010-02-16 19:42:00
On Tue, Feb 16, 2010 at 12:02 PM, Stefan Reinauer <stepan@coresystems.de>wrote:

> On 2/16/10 5:11 AM, Timothy Pearson wrote:
> > Here is a cleaned up and tested version of the SMP APIC autodetect patch.
> >
> > Signed-off-by: Timothy Pearson <tpearson@raptorengineeringinc.com>
>


> And, can you describe high level, what the patch changes? It looks to me
> as if you are recursing through the tree instead of just walking the
> "all_devices" list.
> So this implies that you don't catch all devices when running through
> all_devices. This sounds like a problem in the resource allocator and
> maybe it should be fixed there instead?
>
I don't understand why this would be a resource allocator problem.  Aren't
we talking about the device tree?

Maybe the real problem is related to the memory corruption seen with printk?

Thanks,
Myles
Stefan Reinauer - 2010-02-16 23:40:00
On 2/16/10 8:42 PM, Myles Watson wrote:
>
>
> On Tue, Feb 16, 2010 at 12:02 PM, Stefan Reinauer
> <stepan@coresystems.de <mailto:stepan@coresystems.de>> wrote:
>
>     On 2/16/10 5:11 AM, Timothy Pearson wrote:
>     > Here is a cleaned up and tested version of the SMP APIC
>     autodetect patch.
>     >
>     > Signed-off-by: Timothy Pearson
>     <tpearson@raptorengineeringinc.com
>     <mailto:tpearson@raptorengineeringinc.com>>
>
>  
>
>     And, can you describe high level, what the patch changes? It looks
>     to me
>     as if you are recursing through the tree instead of just walking the
>     "all_devices" list.
>     So this implies that you don't catch all devices when running through
>     all_devices. This sounds like a problem in the resource allocator and
>     maybe it should be fixed there instead?
>
> I don't understand why this would be a resource allocator problem. 
> Aren't we talking about the device tree?

Oh, sorry I didn't say more...  What the old code did was the following:

    device_t cpu;
    for(cpu = all_devices; cpu; cpu = cpu->next) {
     ....
     <- check if the device is an APIC and if it is, dump an APIC entry
in the mptable ->
     ....
    }

But it seems that the code does not see all devices -- anymore -- not
sure if it ever did, but I think to remember that's what happened at
some point, or was at least intended.

So now the code looks l

scan_for_apic_devices(parent)
{
    for(c_it=0; c_it < parent->links; c_it++) {
        for (child = parent->link[c_it].children; child; child =
child->sibling) {
....
            if (child->path.type == DEVICE_PATH_APIC_CLUSTER) {
                // Found an APIC cluster, scan it for APICs
                smp_scan_for_apic_devices(child);
            }
        }
    }
}

So two more steps are necessary:
- check all the downwards links of a device instead of just walking
devices and checking their type.
- run recursively in a special case on APIC clusters.

This sounds a whole lot like something changed in the way "all_devices"
works. And if "all_devices" does not mean "all devices" I am sure there
are more places in our code that need similar fixes.

> Maybe the real problem is related to the memory corruption seen with
> printk?
Not completely impossible, but I figured it's easier to ask the guy who
rewrote the resource allocator if he knows something about how the
intended behavior of "all_devices" ;-)
- Maybe the behavior is intended, then we just need to check in
Timothy's patch.
- Maybe the behavior is not intended, but that's how the code works
right now. Then we'd rather have to look at the resource allocator and
decide if we want "all_devices" to mean "all devices", or whether we
rename the variable, or redefine its meaning.
- Maybe none of the above applies, then we need to do nasty stack
corruption debugging. In this case it would be fundamentally wrong to
touch either the device tree code or the mp table creation code until we
fix the corruption in order to make sure we don't create funny special
cases.

So, if you have hints enlightening any of the maybes, please share!


Stefan
Myles Watson - 2010-02-16 23:58:59
On Tue, Feb 16, 2010 at 4:40 PM, Stefan Reinauer <stepan@coresystems.de>wrote:

>  On 2/16/10 8:42 PM, Myles Watson wrote:
>
>
>
> On Tue, Feb 16, 2010 at 12:02 PM, Stefan Reinauer <stepan@coresystems.de>wrote:
>
>>  On 2/16/10 5:11 AM, Timothy Pearson wrote:
>> > Here is a cleaned up and tested version of the SMP APIC autodetect
>> patch.
>> >
>> > Signed-off-by: Timothy Pearson <tpearson@raptorengineeringinc.com>
>>
>
>
>> And, can you describe high level, what the patch changes? It looks to me
>> as if you are recursing through the tree instead of just walking the
>> "all_devices" list.
>> So this implies that you don't catch all devices when running through
>> all_devices. This sounds like a problem in the resource allocator and
>> maybe it should be fixed there instead?
>>
> I don't understand why this would be a resource allocator problem.  Aren't
> we talking about the device tree?
>
>
> Oh, sorry I didn't say more...  What the old code did was the following:
>
>     device_t cpu;
>     for(cpu = all_devices; cpu; cpu = cpu->next) {
>      ....
>      <- check if the device is an APIC and if it is, dump an APIC entry in
> the mptable ->
>      ....
>     }
>
> But it seems that the code does not see all devices -- anymore -- not sure
> if it ever did, but I think to remember that's what happened at some point,
> or was at least intended.
>
I'm really surprised that it didn't see all devices.


> So now the code looks l
>
> scan_for_apic_devices(parent)
> {
>     for(c_it=0; c_it < parent->links; c_it++) {
>         for (child = parent->link[c_it].children; child; child =
> child->sibling) {
> ....
>
>             if (child->path.type == DEVICE_PATH_APIC_CLUSTER) {
>                 // Found an APIC cluster, scan it for APICs
>                 smp_scan_for_apic_devices(child);
>             }
>         }
>     }
> }
>
> So two more steps are necessary:
> - check all the downwards links of a device instead of just walking devices
> and checking their type.
> - run recursively in a special case on APIC clusters.
>
> This sounds a whole lot like something changed in the way "all_devices"
> works. And if "all_devices" does not mean "all devices" I am sure there are
> more places in our code that need similar fixes.
>
I agree.  I've never seen the problem where I couldn't run through all the
devs with all_devices.

>
>  Maybe the real problem is related to the memory corruption seen with
> printk?
>
> Not completely impossible, but I figured it's easier to ask the guy who
> rewrote the resource allocator if he knows something about how the intended
> behavior of "all_devices" ;-)
>
As far as I know I didn't change that behavior.  Does anyone have a boot log
where they can see that the tree of devices has more devices than the list
(at the same stage of enumeration)?


> - Maybe the behavior is intended, then we just need to check in Timothy's
> patch.
> - Maybe the behavior is not intended, but that's how the code works right
> now. Then we'd rather have to look at the resource allocator and decide if
> we want "all_devices" to mean "all devices", or whether we rename the
> variable, or redefine its meaning.
>
I think it should mean all devices.  I changed resource allocation to go
through the tree because children of disabled devices shouldn't have
allocated resources.

- Maybe none of the above applies, then we need to do nasty stack corruption
> debugging. In this case it would be fundamentally wrong to touch either the
> device tree code or the mp table creation code until we fix the corruption
> in order to make sure we don't create funny special cases.
>
This is my vote, but I'm happy to be proven wrong.  I don't see anywhere in
the code where devices get added to the tree but not to the list of
devices.  I also don't remember a place where devices get removed from the
all_devices list.


> So, if you have hints enlightening any of the maybes, please share!
>
My suggestion would be to traverse the tree and the list and compare them.
The problem is that printk was causing a hang for him.

Thanks,
Myles
tpearson@raptorengineeringinc.com - 2010-02-24 07:26:41
> So two more steps are necessary:
> - check all the downwards links of a device instead of just walking devices
> and checking their type.
> - run recursively in a special case on APIC clusters.
>
> This sounds a whole lot like something changed in the way "all_devices"
> works. And if "all_devices" does not mean "all devices" I am sure there are
> more places in our code that need similar fixes.

This is the crux of the issue.  all_devices does NOT mean "all devices",
it means "all devices attached to the root node, which is all_devices". 
As the root node, usually the PCI bus and the APICs are visible.  On my
board, the APICs are all under an APIC cluster, so they are not
immediately visible from the root node.

Incidentally, there is code already in Coreboot (to generate the PCI
device lists) that takes all_devices and simply probes the PCI downward
link.  It (sanely, IMHO) does not expect to see PCI devices on the
all_devices root node.

I don't have access to a board that I can generate before/after tables
with at this time.  This behavior is very simple to see though; if you
turn on printk spew and look closely at the printed detected device tables
on any amdfam10 board, you will see the root node only connects to the
APIC cluster and the PCI root bus(es).

If you have any more questions feel free to ask!  Hopefully the root cause
of the problem can be located and solved.

Timothy Pearson
Raptor Engineering
Myles Watson - 2010-02-24 15:22:29
On Wed, Feb 24, 2010 at 12:26 AM, Timothy Pearson <
tpearson@raptorengineeringinc.com> wrote:

> > So two more steps are necessary:
> > - check all the downwards links of a device instead of just walking
> devices
> > and checking their type.
> > - run recursively in a special case on APIC clusters.
> >
> > This sounds a whole lot like something changed in the way "all_devices"
> > works. And if "all_devices" does not mean "all devices" I am sure there
> are
> > more places in our code that need similar fixes.
>
> This is the crux of the issue.  all_devices does NOT mean "all devices",
> it means "all devices attached to the root node, which is all_devices".
>
You're correct; this is the crux.  Here's a snippet of the fam10 boot log
for SimNOW:

Show all devs...Before Device Enumeration.
Root Device: enabled 1, 0 resources
APIC_CLUSTER: 0: enabled 1, 0 resources
APIC: 00: enabled 1, 0 resources
PCI_DOMAIN: 0000: enabled 1, 0 resources
PCI: 00:18.0: enabled 1, 0 resources
PCI: 00:00.0: enabled 1, 0 resources
PCI: 00:00.1: enabled 1, 0 resources
PCI: 00:01.0: enabled 1, 0 resources
PCI: 00:01.1: enabled 1, 0 resources
PCI: 00:00.0: enabled 1, 0 resources
PCI: 00:00.0: enabled 1, 0 resources
PCI: 00:00.1: enabled 1, 0 resources
PCI: 00:00.2: enabled 0, 0 resources
PCI: 00:01.0: enabled 0, 0 resources
PCI: 00:01.0: enabled 1, 0 resources
PNP: 002e.0: enabled 0, 3 resources
PNP: 002e.1: enabled 0, 2 resources
PNP: 002e.2: enabled 1, 2 resources
PNP: 002e.3: enabled 0, 2 resources
PNP: 002e.5: enabled 1, 4 resources
PNP: 002e.6: enabled 0, 1 resources
PNP: 002e.7: enabled 0, 3 resources
PNP: 002e.8: enabled 0, 0 resources
PNP: 002e.9: enabled 0, 0 resources
PNP: 002e.a: enabled 0, 0 resources
PNP: 002e.b: enabled 1, 2 resources
PCI: 00:01.1: enabled 1, 0 resources
PCI: 00:01.2: enabled 1, 0 resources
PCI: 00:01.3: enabled 1, 0 resources
I2C: 00:18: enabled 1, 0 resources
I2C: 00:50: enabled 1, 0 resources
I2C: 00:51: enabled 1, 0 resources
I2C: 00:52: enabled 1, 0 resources
I2C: 00:53: enabled 1, 0 resources
I2C: 00:50: enabled 1, 0 resources
I2C: 00:51: enabled 1, 0 resources
I2C: 00:52: enabled 1, 0 resources
I2C: 00:53: enabled 1, 0 resources
PCI: 00:01.5: enabled 0, 0 resources
PCI: 00:01.6: enabled 0, 0 resources
PCI: 00:18.1: enabled 1, 0 resources
PCI: 00:18.2: enabled 1, 0 resources
PCI: 00:18.3: enabled 1, 0 resources
PCI: 00:18.4: enabled 1, 0 resources


Compare with tree...
Root Device: enabled 1, 0 resources
 APIC_CLUSTER: 0: enabled 1, 0 resources
  APIC: 00: enabled 1, 0 resources
 PCI_DOMAIN: 0000: enabled 1, 0 resources
  PCI: 00:18.0: enabled 1, 0 resources
   PCI: 00:00.0: enabled 1, 0 resources
   PCI: 00:00.1: enabled 1, 0 resources
   PCI: 00:01.0: enabled 1, 0 resources
   PCI: 00:01.1: enabled 1, 0 resources
   PCI: 00:00.0: enabled 1, 0 resources
    PCI: 00:00.0: enabled 1, 0 resources
    PCI: 00:00.1: enabled 1, 0 resources
    PCI: 00:00.2: enabled 0, 0 resources
    PCI: 00:01.0: enabled 0, 0 resources
   PCI: 00:01.0: enabled 1, 0 resources
    PNP: 002e.0: enabled 0, 3 resources
    PNP: 002e.1: enabled 0, 2 resources
    PNP: 002e.2: enabled 1, 2 resources
    PNP: 002e.3: enabled 0, 2 resources
    PNP: 002e.5: enabled 1, 4 resources
    PNP: 002e.6: enabled 0, 1 resources
    PNP: 002e.7: enabled 0, 3 resources
    PNP: 002e.8: enabled 0, 0 resources
    PNP: 002e.9: enabled 0, 0 resources
    PNP: 002e.a: enabled 0, 0 resources
    PNP: 002e.b: enabled 1, 2 resources
   PCI: 00:01.1: enabled 1, 0 resources
   PCI: 00:01.2: enabled 1, 0 resources
   PCI: 00:01.3: enabled 1, 0 resources
    I2C: 00:18: enabled 1, 0 resources
     I2C: 00:50: enabled 1, 0 resources
     I2C: 00:51: enabled 1, 0 resources
     I2C: 00:52: enabled 1, 0 resources
     I2C: 00:53: enabled 1, 0 resources
     I2C: 00:50: enabled 1, 0 resources
     I2C: 00:51: enabled 1, 0 resources
     I2C: 00:52: enabled 1, 0 resources
     I2C: 00:53: enabled 1, 0 resources
   PCI: 00:01.5: enabled 0, 0 resources
   PCI: 00:01.6: enabled 0, 0 resources
  PCI: 00:18.1: enabled 1, 0 resources
  PCI: 00:18.2: enabled 1, 0 resources
  PCI: 00:18.3: enabled 1, 0 resources
  PCI: 00:18.4: enabled 1, 0 resources

There are 44 devices in the tree, and 44 in the all_devices list.  If there
is some place that the list is getting broken, we should fix it.


As the root node, usually the PCI bus and the APICs are visible.  On my
> board, the APICs are all under an APIC cluster, so they are not
> immediately visible from the root node.
>
They're not children of the root node, but they are accessible through the
next pointer:

    for (dev = all_devices; dev; dev = dev->next) {
        do_printk(debug_level,
              "%s: enabled %d, %d resources\n",
              dev_path(dev), dev->enabled,
              dev->resources);
    }




> I don't have access to a board that I can generate before/after tables
> with at this time.  This behavior is very simple to see though; if you
> turn on printk spew and look closely at the printed detected device tables
>
What are printed detected device tables?


If you have any more questions feel free to ask!  Hopefully the root cause
> of the problem can be located and solved.
>
> Sure.  Let's make sure it gets taken care of.

Thanks,
Myles

Patch

Index: src/arch/i386/smp/mpspec.c
===================================================================
--- src/arch/i386/smp/mpspec.c	(revision 5127)
+++ src/arch/i386/smp/mpspec.c	(working copy)
@@ -91,6 +91,52 @@ 
 	smp_add_mpc_entry(mc, sizeof(*mpc));
 }
 
+void smp_scan_for_apics(struct mp_config_table *mc, struct device *parent)
+{
+	int c_it;
+	struct device *child;
+	/* Scan all links of the root node for APIC clusters and APICs */
+	for(c_it=0; c_it < parent->links; c_it++) {
+		for (child = parent->link[c_it].children; child; child = child->sibling) {
+			/* Is this an APIC? */
+			if (child->path.type == DEVICE_PATH_APIC) {
+				int boot_apic_id;
+				unsigned apic_version;
+				unsigned long cpu_flag;
+				unsigned cpu_features;
+				unsigned cpu_feature_flags;
+				struct cpuid_result result;
+
+				if (!child->enabled)
+					continue;
+
+				/* Found an APIC, add it to the MP table */
+				boot_apic_id = lapicid();
+				apic_version = lapic_read(LAPIC_LVR) & 0xff;
+
+				result = cpuid(1);
+				cpu_features = result.eax;
+				cpu_feature_flags = result.edx;
+
+				cpu_flag = MPC_CPU_ENABLED;
+				if (boot_apic_id == child->path.apic.apic_id) {
+					cpu_flag = MPC_CPU_ENABLED | MPC_CPU_BOOTPROCESSOR;
+				}
+				smp_write_processor(mc, 
+					child->path.apic.apic_id, apic_version,
+					cpu_flag, cpu_features, cpu_feature_flags
+				);
+			}
+
+			// Or an APIC cluster?
+			if (child->path.type == DEVICE_PATH_APIC_CLUSTER) {
+				// Found an APIC cluster, scan it for APICs
+				smp_scan_for_apics(mc, child);
+			}
+		}
+	}
+}
+
 /* If we assume a symmetric processor configuration we can
  * get all of the information we need to write the processor
  * entry from the bootstrap processor.
@@ -100,37 +146,8 @@ 
  */
 void smp_write_processors(struct mp_config_table *mc)
 {
-	int boot_apic_id;
-	unsigned apic_version;
-	unsigned cpu_features;
-	unsigned cpu_feature_flags;
-	struct cpuid_result result;
-	device_t cpu;
-	
-	boot_apic_id = lapicid();
-	apic_version = lapic_read(LAPIC_LVR) & 0xff;
-	result = cpuid(1);
-	cpu_features = result.eax;
-	cpu_feature_flags = result.edx;
-	for(cpu = all_devices; cpu; cpu = cpu->next) {
-		unsigned long cpu_flag;
-		if ((cpu->path.type != DEVICE_PATH_APIC) || 
-			(cpu->bus->dev->path.type != DEVICE_PATH_APIC_CLUSTER))
-		{
-			continue;
-		}
-		if (!cpu->enabled) {
-			continue;
-		}
-		cpu_flag = MPC_CPU_ENABLED;
-		if (boot_apic_id == cpu->path.apic.apic_id) {
-			cpu_flag = MPC_CPU_ENABLED | MPC_CPU_BOOTPROCESSOR;
-		}
-		smp_write_processor(mc, 
-			cpu->path.apic.apic_id, apic_version,
-			cpu_flag, cpu_features, cpu_feature_flags
-		);
-	}
+	// Scan the root node for APIC clusters and APICs
+	smp_scan_for_apics(mc, all_devices);
 }
 
 void smp_write_bus(struct mp_config_table *mc,
@@ -179,7 +196,6 @@ 
 #ifdef DEBUG_MPTABLE
 	printk_debug("add intsrc srcbus 0x%x srcbusirq 0x%x, dstapic 0x%x, dstirq 0x%x\n",
 				srcbus, srcbusirq, dstapic, dstirq);
-	hexdump(__func__, mpc, sizeof(*mpc));
 #endif
 }