Patchwork Fix the delay loop

login
register
about
Submitter Carl-Daniel Hailfinger
Date 2010-03-29 04:50:20
Message ID <4BB0318C.3020401@gmx.net>
Download mbox | patch
Permalink /patch/1172/
State Accepted
Commit r990
Headers show

Comments

Carl-Daniel Hailfinger - 2010-03-29 04:50:20
[Side note: This is not the big timing code rewrite I originally wanted
to post. I decided to focus on readable code instead of clever code. My
other rewrite is available on request if anyone is interested.]

The current delay loop calculation is still from revision 1 of flashrom,
and since then it had a logic bug which caused all delays to be twice as
long as intended. Fix the delay duration.

Protect against delay loop overflows.

Detect a non-working delay loop.

Change the delay loop itself to ensure clever compiler optimizers won't
eliminate it (as happens with clang/llvm in the current code). Some
people suggested machine-specific asm, but the empty asm statement with
the loop counter as register/memory input has the benefit of being
perfectly cross-platform and working in gcc and clang.

If time goes backwards (catastrophical NTP time difference, manual time
change), timing measurements were shot because the new-old time
subtraction yielded negative numbers which weren't handled correctly
because the variable is unsigned. Simply return 1 microsecond timing in
that case.

If time goes forward too fast, pick the biggest possible timing
measurement with a guaranteed overflow avoidance for all timing
calculations.

Check four times if the calculated timing is at most 10% too fast. This
addresses OS scheduler interactions, e.g. being scheduled out during
measurement which inflates measurements.

If the timing is off, recalculate the timer values up to four times
before giving up.

Avoid division by zero in rare cases where timing measurements for a 250
ms delay return 0 us elapsed.

Signed-off-by: Carl-Daniel Hailfinger <c-d.hailfinger.devel.2006@gmx.net>
Joerg Mayer - 2010-03-29 10:44:25
On Mon, Mar 29, 2010 at 06:50:20AM +0200, Carl-Daniel Hailfinger wrote:
> +__attribute__ ((noinline)) void myusec_delay(int usecs)

Is this supported on all supported compilers? If not, maybe create a #define
for that and use this instead of the explicit __attribute__((noinline)).

 ciao
    Joerg
Carl-Daniel Hailfinger - 2010-03-29 17:41:38
On 29.03.2010 12:44, Joerg Mayer wrote:
> On Mon, Mar 29, 2010 at 06:50:20AM +0200, Carl-Daniel Hailfinger wrote:
>   
>> +__attribute__ ((noinline)) void myusec_delay(int usecs)
>>     
>
> Is this supported on all supported compilers? If not, maybe create a #define
> for that and use this instead of the explicit __attribute__((noinline)).
>   

Hm. It is compatible with clang/LLVM, gcc and IBM XL C/C++.

Apparently MSVC (which can be tested for with _MSC_VER) wants
|__declspec(noinline)

|Then again, I don't think anyone ever compiled and ran flashrom with a
non-gcc compiler. Heck, without my delay loop fix flashrom won't work on
clang.

If you have other compilers available to test compilation or even
execution of flashrom, please report your test results to the list. I'm
willing to fix anything that comes up and doesn't require major surgery.
It should be possible to compile flashrom on any platform without
hardware support with the following:
make distclean
make CONFIG_INTERNAL=no CONFIG_SERPROG=no CONFIG_SATASII=no
CONFIG_DRKAISER=no CONFIG_NIC3COM=no CONFIG_BUSPIRATESPI=no
CONFIG_FT2232SPI=no CONFIG_DEDIPROG=no CONFIG_DUMMY=yes

Regards,
Carl-Daniel
Maciej Pijanka - 2010-03-31 16:58:44
On Mon, 29 Mar 2010, Carl-Daniel Hailfinger wrote:

> [Side note: This is not the big timing code rewrite I originally wanted
> to post. I decided to focus on readable code instead of clever code. My
> other rewrite is available on request if anyone is interested.]
> 
> The current delay loop calculation is still from revision 1 of flashrom,
> and since then it had a logic bug which caused all delays to be twice as
> long as intended. Fix the delay duration.
> 
> Protect against delay loop overflows.
> 
> Detect a non-working delay loop.
> 
> Change the delay loop itself to ensure clever compiler optimizers won't
> eliminate it (as happens with clang/llvm in the current code). Some
> people suggested machine-specific asm, but the empty asm statement with
> the loop counter as register/memory input has the benefit of being
> perfectly cross-platform and working in gcc and clang.
> 
> If time goes backwards (catastrophical NTP time difference, manual time
> change), timing measurements were shot because the new-old time
> subtraction yielded negative numbers which weren't handled correctly
> because the variable is unsigned. Simply return 1 microsecond timing in
> that case.
> 
> If time goes forward too fast, pick the biggest possible timing
> measurement with a guaranteed overflow avoidance for all timing
> calculations.

Maybe in case of leap seconds or time changing simply detect, and either fail
and restart whole timing calculation, from scratch (possibly noting attempt
number). If after some fixed number of retries you can't get reliable result.
simply fail?

> Check four times if the calculated timing is at most 10% too fast. This
> addresses OS scheduler interactions, e.g. being scheduled out during
> measurement which inflates measurements.

this makes sense but if we want that good results, maybe odd number of repeats,
leave off minimum and maximum ones, then calculate over remaining (and minimum
sane number of remaining atempts is imo 5, thus seven is needed to make sufficient
number of iterations).


Anyway, imo code is good and readable and gives more acurate results on my machines.

Acked-by: Maciej Pijanka <maciej.pijanka@gmail.com>


babol:~/flashrom# ./flashrom -V 2>&1 |grep loop
Calibrating delay loop... 167M loops per second, 10 myus = 12 us, 100 myus = 103 us, 1000 myus = 1003 us, 10000 myus = 10054 us, OK.
babol:~/flashrom# ./flashrom -V 2>&1 |grep loop
Calibrating delay loop... 164M loops per second, 10 myus = 12 us, 100 myus = 101 us, 1000 myus = 1031 us, 10000 myus = 9877 us, OK.
babol:~/flashrom# ./flashrom -V -p nic3com 2>&1 |grep loop
Calibrating delay loop... 167M loops per second, 10 myus = 12 us, 100 myus = 102 us, 1000 myus = 1034 us, 10000 myus = 10052 us, OK.
babol:~/flashrom# ./flashrom -V -p nic3com 2>&1 |grep loop
Calibrating delay loop... 167M loops per second, 10 myus = 11 us, 100 myus = 103 us, 1000 myus = 1035 us, 10000 myus = 10052 us, OK.
Carl-Daniel Hailfinger - 2010-03-31 23:55:35
On 31.03.2010 18:58, Maciej Pijanka wrote:
> On Mon, 29 Mar 2010, Carl-Daniel Hailfinger wrote:
>
>   
>> The current delay loop calculation is still from revision 1 of flashrom,
>> and since then it had a logic bug which caused all delays to be twice as
>> long as intended. Fix the delay duration.
>>
>> Protect against delay loop overflows.
>>
>> Detect a non-working delay loop.
>>
>> Change the delay loop itself to ensure clever compiler optimizers won't
>> eliminate it (as happens with clang/llvm in the current code). Some
>> people suggested machine-specific asm, but the empty asm statement with
>> the loop counter as register/memory input has the benefit of being
>> perfectly cross-platform and working in gcc and clang.
>>
>> If time goes backwards (catastrophical NTP time difference, manual time
>> change), timing measurements were shot because the new-old time
>> subtraction yielded negative numbers which weren't handled correctly
>> because the variable is unsigned. Simply return 1 microsecond timing in
>> that case.
>>
>> If time goes forward too fast, pick the biggest possible timing
>> measurement with a guaranteed overflow avoidance for all timing
>> calculations.
>>     
>
> Maybe in case of leap seconds or time changing simply detect, and either fail
> and restart whole timing calculation, from scratch (possibly noting attempt
> number). If after some fixed number of retries you can't get reliable result.
> simply fail?
>   

The code does exactly what you suggested (unless I'm mistaken). The
changelog needs improvement to actually tell people about this.


>> Check four times if the calculated timing is at most 10% too fast. This
>> addresses OS scheduler interactions, e.g. being scheduled out during
>> measurement which inflates measurements.
>>     
>
> this makes sense but if we want that good results, maybe odd number of repeats,
> leave off minimum and maximum ones, then calculate over remaining (and minimum
> sane number of remaining atempts is imo 5, thus seven is needed to make sufficient
> number of iterations).
>   

Hmmm... could be done, but I'd rather tell the user that timing is
garbage than to add an elaborate algorithm to convert garbage to gold. ;-)

> Anyway, imo code is good and readable and gives more acurate results on my machines.
>
> Acked-by: Maciej Pijanka <maciej.pijanka@gmail.com>
>   

Thanks, committed in r990.

Regards,
Carl-Daniel

Patch

Index: flashrom-delayloop_portable_fix/udelay.c
===================================================================
--- flashrom-delayloop_portable_fix/udelay.c	(Revision 988)
+++ flashrom-delayloop_portable_fix/udelay.c	(Arbeitskopie)
@@ -24,13 +24,16 @@ 
 #include <limits.h>
 #include "flash.h"
 
-// count to a billion. Time it. If it's < 1 sec, count to 10B, etc.
+/* loops per microsecond */
 unsigned long micro = 1;
 
-void myusec_delay(int usecs)
+__attribute__ ((noinline)) void myusec_delay(int usecs)
 {
-	volatile unsigned long i;
-	for (i = 0; i < usecs * micro; i++) ;
+	unsigned long i;
+	for (i = 0; i < usecs * micro; i++) {
+		/* Make sure the compiler doesn't optimize the loop away. */
+		asm volatile ("" : : "rm" (i) );
+	}
 }
 
 unsigned long measure_delay(int usecs)
@@ -43,30 +46,62 @@ 
 	gettimeofday(&end, 0);
 	timeusec = 1000000 * (end.tv_sec - start.tv_sec) +
 		   (end.tv_usec - start.tv_usec);
+	/* Protect against time going forward too much. */
+	if ((end.tv_sec > start.tv_sec) &&
+	    ((end.tv_sec - start.tv_sec) >= LONG_MAX / 1000000 - 1))
+		timeusec = LONG_MAX;
+	/* Protect against time going backwards during leap seconds. */
+	if ((end.tv_sec < start.tv_sec) || (timeusec > LONG_MAX))
+		timeusec = 1;
 
 	return timeusec;
 }
 
 void myusec_calibrate_delay(void)
 {
-	int count = 1000;
+	unsigned long count = 1000;
 	unsigned long timeusec;
-	int ok = 0;
+	int i, tries = 0;
 
 	printf("Calibrating delay loop... ");
 
-	while (!ok) {
+recalibrate:
+	while (1) {
 		timeusec = measure_delay(count);
+		if (timeusec > 1000000 / 4)
+			break;
+		if (count >= ULONG_MAX / 2) {
+			msg_pinfo("timer loop overflow, reduced precision. ");
+			break;
+		}
 		count *= 2;
-		if (timeusec < 1000000 / 4)
-			continue;
-		ok = 1;
 	}
+	tries ++;
 
-	// compute one microsecond. That will be count / time
-	micro = count / timeusec;
-	msg_pdbg("%ldM loops per second, ", micro);
+	/* Avoid division by zero, but in that case the loop is shot anyway. */
+	if (!timeusec)
+		timeusec = 1;
+	
+	/* Compute rounded up number of loops per microsecond. */
+	micro = (count * micro) / timeusec + 1;
+	msg_pdbg("%luM loops per second, ", micro);
 
+	/* Did we try to recalibrate less than 5 times? */
+	if (tries < 5) {
+		/* Recheck our timing to make sure we weren't just hitting
+		 * a scheduler delay or something similar.
+		 */
+		for (i = 0; i < 4; i++) {
+			if (measure_delay(100) < 90) {
+				msg_pdbg("delay more than 10% too short, "
+					 "recalculating... ");
+				goto recalibrate;
+			}
+		}
+	} else {
+		msg_perr("delay loop is unreliable, trying to continue ");
+	}
+
 	/* We're interested in the actual precision. */
 	timeusec = measure_delay(10);
 	msg_pdbg("10 myus = %ld us, ", timeusec);