ASRock E350M1: Boot delay with debug enabled, system RAM reported incorrectly in Linux

Submitter Scott
Date 2011-06-19 06:38:25
Message ID <36FF88D4AED54B48BC986E137EB15B03@asusp67>
Permalink /patch/3169/
State New

Comments

Scott - 2011-06-19 06:38:25
Marshall Buschman wrote:

]Hello:
]
]With Scott's work on PCIe support for the E350M1, the NIC and USB3 are 
]now working -- Thanks Scott!

Thanks for testing it on both the boards. Good to hear it works.

]The remaining problems that I know of are:
]
]1) Enabling coreboot serial debugging slows system boot dramatically: 5min+
]Someone mentioned in IRC that this is because we are attempting to write 
]to the serial device before it is ready, which causes some kind of 
]timeout/backoff/retry sequence. How can I help with this?

That is weird. The log file you sent is 38819 bytes. I would expect the
boot time penalty to be not much more than the I/O time of 38819 bytes /
(11520 bytes/second) = 3.37 seconds. I did a test with loglevel 8. It logged
45745 bytes and the boot time from cold reset to DOS prompt was 5.56
seconds. When I watch the serial output, it spews text nearly continuously.
There is no hardware or software handshaking for the writes, so nothing
should slow it down.  
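
As a rough check of the estimate above, the following C sketch reproduces the
arithmetic. It assumes the 11520 bytes/second figure comes from a 115200 baud
8N1 link (10 bit times per byte), which the message does not state. The
function names are illustrative and are not part of coreboot.

```c
#include <stdint.h>

/* Assumption: 8N1 framing, so each byte costs 10 bit times on the wire
 * (1 start bit + 8 data bits + 1 stop bit). */
double serial_bytes_per_second(uint32_t baud)
{
    return baud / 10.0;
}

/* Lower bound on the time needed to push log_bytes out of the UART,
 * ignoring any per-character overhead in the firmware itself. */
double serial_transfer_seconds(uint32_t log_bytes, uint32_t baud)
{
    return log_bytes / serial_bytes_per_second(baud);
}
```

Calling serial_transfer_seconds(38819, 115200) gives roughly 3.37 seconds,
which matches the figure quoted above. A delay of minutes therefore cannot be
explained by raw transmission time alone.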

]2) System RAM is reported incorrectly. In linux, "free -m" reports 480mb 
]of total RAM -- the full total is 4gb.

I see, I had tested only a small memory configuration so far. It looks
like any size of 4GB or more will fail. Try the attached patch.

Thanks,
Scott
Correct memory size reporting on AMD family 14h systems for >= 4GB.
Signed-off-by: Scott Duplichan <scott@notabs.org>
mbuschman@lucidmachines.com - 2011-06-19 06:44:08
On 6/19/2011 1:38 AM, Scott Duplichan wrote:
> Marshall Buschman wrote:
>
> ]Hello:
> ]
> ]With Scott's work on PCIe support for the E350M1, the NIC and USB3 are
> ]now working -- Thanks Scott!
>
> Thanks for testing it on both the boards. Good to hear it works.
>
> ]The remaining problems that I know of are:
> ]
> ]1) Enabling coreboot serial debugging slows system boot dramatically: 5min+
> ]Someone mentioned in IRC that this is because we are attempting to write
> ]to the serial device before it is ready, which causes some kind of
> ]timeout/backoff/retry sequence. How can I help with this?
>
> That is weird. The log file you sent is 38819 bytes. I would expect the
> boot time penalty to be not much more than the I/O time of 38819 bytes /
> (11520 bytes/second) = 3.37 seconds. I did a test with loglevel 8. It logged
> 45745 bytes and the boot time from cold reset to DOS prompt was 5.56
> seconds. When I watch the serial output, it spews text nearly continuously.
> There is no hardware or software handshaking for the writes, so nothing
> should slow it down.
There's something strange afoot -- It's not that it outputs slowly, it's 
that you get literally no serial output or boot activity of any kind for 
potentially several minutes (Peter Stuge observed ~5min, I observed 
times closer to 20min).
>
>
> ]2) System RAM is reported incorrectly. In linux, "free -m" reports 480mb
> ]of total RAM -- the full total is 4gb.
>
> I see, I had tested only a small memory configuration so far. It looks
> like any size greater than 4GB will fail. Try the attached patch.
>
> Thanks,
> Scott
>
I will test this now and report back.

Thanks!
-Marshall Buschman
mbuschman@lucidmachines.com - 2011-06-19 07:17:03
On 06/19/2011 01:38 AM, Scott Duplichan wrote:
> Marshall Buschman wrote:
>
> ]Hello:
> ]
> ]With Scott's work on PCIe support for the E350M1, the NIC and USB3 are
> ]now working -- Thanks Scott!
>
> Thanks for testing it on both the boards. Good to hear it works.
>
> ]The remaining problems that I know of are:
> ]
> ]1) Enabling coreboot serial debugging slows system boot dramatically: 5min+
> ]Someone mentioned in IRC that this is because we are attempting to write
> ]to the serial device before it is ready, which causes some kind of
> ]timeout/backoff/retry sequence. How can I help with this?
>
> That is weird. The log file you sent is 38819 bytes. I would expect the
> boot time penalty to be not much more than the I/O time of 38819 bytes /
> (11520 bytes/second) = 3.37 seconds. I did a test with loglevel 8. It logged
> 45745 bytes and the boot time from cold reset to DOS prompt was 5.56
> seconds. When I watch the serial output, it spews text nearly continuously.
> There is no hardware or software handshaking for the writes, so nothing
> should slow it down.
>
> ]2) System RAM is reported incorrectly. In linux, "free -m" reports 480mb
> ]of total RAM -- the full total is 4gb.
>
> I see, I had tested only a small memory configuration so far. It looks
> like any size greater than 4GB will fail. Try the attached patch.
Works great, 3.5gb of available RAM. I've submitted it into the review 
system.
> Thanks,
> Scott
>
Thanks again!
-Marshall
mbuschman@lucidmachines.com - 2011-06-19 07:33:49
On 06/19/2011 02:17 AM, Marshall Buschman wrote:
> On 06/19/2011 01:38 AM, Scott Duplichan wrote:
>> Marshall Buschman wrote:
>>
>> ]Hello:
>> ]
>> ]With Scott's work on PCIe support for the E350M1, the NIC and USB3 are
>> ]now working -- Thanks Scott!
>>
>> Thanks for testing it on both the boards. Good to hear it works.
>>
>> ]The remaining problems that I know of are:
>> ]
>> ]1) Enabling coreboot serial debugging slows system boot 
>> dramatically: 5min+
>> ]Someone mentioned in IRC that this is because we are attempting to 
>> write
>> ]to the serial device before it is ready, which causes some kind of
>> ]timeout/backoff/retry sequence. How can I help with this?
>>
>> That is weird. The log file you sent is 38819 bytes. I would expect the
>> boot time penalty to be not much more than the I/O time of 38819 bytes /
>> (11520 bytes/second) = 3.37 seconds. I did a test with loglevel 8. It 
>> logged
>> 45745 bytes and the boot time from cold reset to DOS prompt was 5.56
>> seconds. When I watch the serial output, it spews text nearly 
>> continuously.
>> There is no hardware or software handshaking for the writes, so nothing
>> should slow it down.
>>
>> ]2) System RAM is reported incorrectly. In linux, "free -m" reports 
>> 480mb
>> ]of total RAM -- the full total is 4gb.
>>
>> I see, I had tested only a small memory configuration so far. It looks
>> like any size greater than 4GB will fail. Try the attached patch.
> Works great, 3.5gb of available RAM. I've submitted it into the review 
> system.
>> Thanks,
>> Scott
>>
> Thanks again!
> -Marshall
>
>
Okay, there's something odd going on here. Now the NIC is gone. I'm 
going to investigate in the morning and abandon the change in gerrit 
until I know what's going on.
-Marshall
mbuschman@lucidmachines.com - 2011-06-19 17:45:10
On 06/19/2011 02:33 AM, Marshall Buschman wrote:
> On 06/19/2011 02:17 AM, Marshall Buschman wrote:
>> On 06/19/2011 01:38 AM, Scott Duplichan wrote:
>>> Marshall Buschman wrote:
>>>
>>> ]Hello:
>>> ]
>>> ]With Scott's work on PCIe support for the E350M1, the NIC and USB3 are
>>> ]now working -- Thanks Scott!
>>>
>>> Thanks for testing it on both the boards. Good to hear it works.
>>>
>>> ]The remaining problems that I know of are:
>>> ]
>>> ]1) Enabling coreboot serial debugging slows system boot 
>>> dramatically: 5min+
>>> ]Someone mentioned in IRC that this is because we are attempting to 
>>> write
>>> ]to the serial device before it is ready, which causes some kind of
>>> ]timeout/backoff/retry sequence. How can I help with this?
>>>
>>> That is weird. The log file you sent is 38819 bytes. I would expect the
>>> boot time penalty to be not much more than the I/O time of 38819 
>>> bytes /
>>> (11520 bytes/second) = 3.37 seconds. I did a test with loglevel 8. 
>>> It logged
>>> 45745 bytes and the boot time from cold reset to DOS prompt was 5.56
>>> seconds. When I watch the serial output, it spews text nearly 
>>> continuously.
>>> There is no hardware or software handshaking for the writes, so nothing
>>> should slow it down.
>>>
>>> ]2) System RAM is reported incorrectly. In linux, "free -m" reports 
>>> 480mb
>>> ]of total RAM -- the full total is 4gb.
>>>
>>> I see, I had tested only a small memory configuration so far. It looks
>>> like any size greater than 4GB will fail. Try the attached patch.
>> Works great, 3.5gb of available RAM. I've submitted it into the 
>> review system.
>>> Thanks,
>>> Scott
>>>
>> Thanks again!
>> -Marshall
>>
>>
> Okay, There's something odd going on here. Now the NIC is gone. I'm 
> going to investigate in the morning and abandon the change in gerrit 
> until I know what's going on.
> -Marshall
>
Never mind, it works - apparently there are disadvantages to doing things 
that require thought in the very early hours of the morning. :|
Thanks!
-Marshall Buschman
Scott - 2011-06-19 19:54:37
Marshall Buschman wrote:

]Nevermind, it works - Apparently there are disadvantages to doing things 
]that require thought in the very early hours of the morning. :|
]Thanks!

Hello Marshall,

Thanks for the update. I tested Win7 with this change and 4GB and found
it is not happy: Win7 crashes with a BSOD. WinDbg with a checked build reports:

-------------------------------------------------
ffffffff84126053: Store(TOM1=0xaaaaaaaa,MM1B)=0xaaaaaaaa
ffffffff8412605c: ShiftLeft(0x10000000,0x4,Local0)=0x100000000
ffffffff84126065: Subtract(Local0=0x100000000,TOM1=0xaaaaaaaa,Local0)=0x55555556
ffffffff8412606c: Store(Local0=0x55555556,MM1L)=0x55555556
ffffffff84126072: Return(CRES=Buffer(0x42){
	0x47,0x01,0xf8,0x0c,0xf8,0x0c,0x01,0x08,0x88,0x0d,0x00,0x01,0x0c,0x03
	0x00,0x00,0x00,0x00,0xf7,0x0c,0x00,0x00,0xf8,0x0c,0x88,0x0d,0x00,0x01
	0x0c,0x03,0x00,0x00,0x00,0x0d,0xff,0xff,0x00,0x00,0x00,0xf3,0x86,0x09
	0x00,0x00,0x00,0x00,0x0a,0x00,0x00,0x00,0x02,0x00,0x86,0x09,0x00,0x00
	0xaa,0xaa,0xaa,0xaa,0x56,0x55,0x55,0x55,0x79,0x00})
ffffffff84126077: }
ACPI: E820 Entry 3 (type 4503599627370497) (c7fee00000000000-700000000) overlaps
ACPI: PCI  Entry -1431655766 Min:ffffffff00000000 Max:5555555600000000 Length:100000000 Align:0
ACPI:
ACPI: FATAL BIOS ERROR - Need new BIOS to fix PCI problems
-------------------------------------------------

Unfortunately the Win7 code that prints the e820 message has an error where
the argument and format string do not match. One is long and the other
is long long. That is why the numbers are garbage. The real problem is
that the ASL code for \_SB.PCI0._CRS is using the uninitialized variable
TOM1. The default value of aaaaaaaa from line 267 of
family14/ssdt.asl is being used.
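
The traced ASL steps can be replayed in C to show how the placeholder turns
into the bad PCI window length. This is only a sketch of the arithmetic seen
in the trace (ShiftLeft, then Subtract); the function name and the macro are
made up for illustration, and the "sane" TOM1 value mentioned afterwards is
hypothetical, not a value measured on this board.

```c
#include <stdint.h>

/* Default value of TOM1 seen in the trace when coreboot never fills it in. */
#define TOM1_PLACEHOLDER 0xaaaaaaaau

/* Mirrors the traced steps:
 *   ShiftLeft(0x10000000, 4, Local0)  -> 0x100000000 (the 4GB boundary)
 *   Subtract(Local0, TOM1, Local0)    -> length of the window below 4GB */
uint64_t mmio_window_length(uint32_t tom1)
{
    uint64_t four_gb = (uint64_t)0x10000000u << 4;
    return four_gb - tom1;
}
```

With TOM1_PLACEHOLDER the result is 0x55555556, the MM1L value in the trace.
With a hypothetical top of low memory such as 0xc0000000 the same arithmetic
would give 0x40000000, a window that starts where RAM actually ends.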

Somehow the OS does need to know where the PCI hole can safely start.
It can't start immediately after the end of low RAM because of UMA.
\_SB.PCI0._CRS is one way to pass this information. This method requires
passing data from coreboot to ASL, which is a pain. I wonder if just
reserving the UMA range in the e820 map is sufficient? I will try to
do some experiments tonight.

If you can send me a binary or otherwise let me recreate the serial
logging problem, I will take a look.

Thanks,
Scott
Peter Stuge - 2011-06-19 20:05:49
Scott Duplichan wrote:
> The real problem is that the asl code for \_SB.PCI0._CRS is using
> uninitialized variable TOM1. The default value of aaaaaaaa from
> line 267 of family14/ssdt.asl is being used.

Good find. Many thanks Scott.


> Somehow the OS does need to know where the PCI hole can safely start.
> It can't start immediately after the end of low ram because of uma.
> \_SB.PCI0._CRS is one way to pass this information. This method
> requires passing data from coreboot to asl, which is a pain.

Is it necessarily that bad? Rudolf has developed some functions to
build AML at coreboot run time. It sounds like they might help?


> I wonder if just reserving the uma range in the e820 map is
> sufficient? I will try to do some experiments tonight.

Maybe short term, but the only real solution is indeed to store the
correct value from coreboot processing into AML.


> If you can send me a binary or otherwise let me recreate the serial
> logging problem, I will take a look.

http://stuge.se/stuge_e350m1_47b3fb_4mb.bin


//Peter
mbuschman@lucidmachines.com - 2011-06-19 20:47:11
On 06/19/2011 03:05 PM, Peter Stuge wrote:
> Scott Duplichan wrote:
>> The real problem is that the asl code for \_SB.PCI0._CRS is using
>> uninitialized variable TOM1. The default value of aaaaaaaa from
>> line 267 of family14/ssdt.asl is being used.
> Good find. Many thanks Scott.
>
>
>> Somehow the OS does need to know where the PCI hole can safely start.
>> It can't start immediately after the end of low ram because of uma.
>> \_SB.PCI0._CRS is one way to pass this information. This method
>> requires passing data from coreboot to asl, which is a pain.
> Is it necessarily that bad? Rudolf has developed some functions to
> build AML at coreboot run time. It sounds like they might help?
>
>
>> I wonder if just reserving the uma range in the e820 map is
>> sufficient? I will try to do some experiments tonight.
> Maybe short term, but the only real solution is indeed to store the
> correct value from coreboot processing into AML.
>
>
>> If you can send me a binary or otherwise let me recreate the serial
>> logging problem, I will take a look.
> http://stuge.se/stuge_e350m1_47b3fb_4mb.bin
>
>
> //Peter
>
To add another data point, using Peter's image, it takes roughly 1 
minute and 51 seconds to boot.
Log file is at http://www.lucidmachines.com/coreboot/1min51sec

Thanks!
-Marshall

Patch

diff -r -u coreboot-8fed77a\src\northbridge\amd\agesa_wrapper\family14\northbridge.c coreboot-4gbfix\src\northbridge\amd\agesa_wrapper\family14\northbridge.c
--- coreboot-8fed77a\src\northbridge\amd\agesa_wrapper\family14\northbridge.c	Sat Jun 18 19:50:32 2011
+++ coreboot-4gbfix\src\northbridge\amd\agesa_wrapper\family14\northbridge.c	Sun Jun 19 00:50:25 2011
@@ -652,8 +652,8 @@ 
     d = get_dram_base_mask(0);
 
     if (d.mask & 1) {
-        basek = ((resource_t)(d.base)) << 8;
-        limitk = (resource_t)((d.mask << 8) | 0xFFFFFF);
+        basek = ((resource_t)((u64)d.base)) << 8;
+        limitk = (resource_t)(((u64)d.mask << 8) | 0xFFFFFF);
 printk(BIOS_DEBUG, "adsr: (before) basek = %llx, limitk = %llx.\n",basek,limitk);
 
         /* Convert these values to multiples of 1K for ease of math. */
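
The effect of the limitk change in the hunk above can be illustrated in
isolation. The sketch below assumes that d.mask is a 32-bit field and that
resource_t is 64 bits wide; neither type is shown in the hunk. The function
names and the sample mask value are hypothetical and chosen only to make the
truncation visible.

```c
#include <stdint.h>

/* Old form: the shift happens in 32 bits, so the top 8 bits of the mask
 * are lost before the value is widened. */
uint64_t limit_truncated(uint32_t mask)
{
    return (uint64_t)((mask << 8) | 0xFFFFFF);
}

/* Patched form: widen first, then shift, so bits above bit 31 survive. */
uint64_t limit_widened(uint32_t mask)
{
    return ((uint64_t)mask << 8) | 0xFFFFFF;
}
```

For a hypothetical mask of 0x01500001 the old form yields 0x50FFFFFF while
the patched form yields 0x150FFFFFF. For limits below 4GB both forms agree,
which is consistent with the bug only showing up on larger configurations.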