Patchwork SPI flashing speedup

login
register
about
Submitter Carl-Daniel Hailfinger
Date 2010-06-16 12:13:22
Message ID <4C18BFE2.6040908@gmx.net>
Download mbox | patch
Permalink /patch/1512/
State Bitrotted
Headers show

Comments

Carl-Daniel Hailfinger - 2010-06-16 12:13:22
I know of no supported SPI chip which would require delays for any
operation. The delays in our SPI code are only there to reduce load on
the SPI bus. As such, they might be a good idea for shared flash, but
I'm not too convinced of that either. For all external programmers,
delays between commands are 100% pointless unless required by the flash
chip (and I don't know of any such requirements).

Michael, I'd like to hear your thoughts on the impact of reading the
status register in a tight loop instead of sleeping in between for
systems which use flash translation (e.g. by EC).

There are two options how we can handle this:
- Remove the delays completely.
- Mark the delays as optional, e.g. programmer_odelay() and let it
default to a nop.

This patch removes the delays.

Signed-off-by: Carl-Daniel Hailfinger <c-d.hailfinger.devel.2006@gmx.net>
Michael Karcher - 2010-06-16 14:56:42
Am Mittwoch, den 16.06.2010, 14:13 +0200 schrieb Carl-Daniel Hailfinger:
> Michael, I'd like to hear your thoughts on the impact of reading the
> status register in a tight loop instead of sleeping in between for
> systems which use flash translation (e.g. by EC).
In shared flash environments, a short sleep between polls of the status
register might be very useful. I don't see a big timing impact of a 10us
sleep on flashrom, but it will greatly reduce SPI load. Also continous
polling is a host CPU hog.

So I would suggest the following:
 - 10us delay where a quick response is expected
 - usleep(1) where a slow response is expected (Should give up a time
slice, don't really know how it behaves on current tickless linux,
though.)

> There are two options how we can handle this:
> - Remove the delays completely.
> - Mark the delays as optional, e.g. programmer_odelay() and let it
> default to a nop.
Third option: Make the delays so short that they don't matter. I'm all
for the third option.

I would suggest to factor out the wait-WIP code and introduce three
delay levels: near-immediate (10us), short (usleep(1)) and long
(something like 50ms). Then the chip definition can state what polling
delay is recommended for chip erase, sector erase and write.

It might be an even better idea to state the expected duration of the
erase/write instruction in the chip definition. This would enable
flashrom to make a coarse guess about total time, and make choosing of a
small polling interval possible.

The reason I suggest this definitely more convoluted approach than just
killing the delays is that I dislike the CPU maxing out at 100% for
several minutes when waiting for chip-erase complete on a slow SPI chip.
Especially if this is a shared flash system, I have no idea whether you
could get thermal issues by that. One would hope that the "idle" mode of
the EC used during flashing puts the fan to maximum - but who can be
sure?

For these reasons, I am not going to ack this patch as is.

Regards,
  Michael Karcher
Andrew Goodbody - 2010-06-16 20:41:06
On 16/06/2010 15:56, Michael Karcher wrote:
> In shared flash environments, a short sleep between polls of the status
> register might be very useful. I don't see a big timing impact of a 10us
> sleep on flashrom, but it will greatly reduce SPI load.

No, I don't think so, at least not because of the shared environment. 
Generally (always?) in a shared environment you first have to tell the 
other device to completely stop accessing the flash. Once you start any 
operation other than read then you cannot have anything else access that 
flash device or else one or the other will get corrupted results eg ID 
instead of code fetch. So the best thing to do is to complete all 
operations as quickly as possible and then signal the shared accesses to 
resume.

Andrew
Michael Karcher - 2010-06-16 22:12:59
Am Mittwoch, den 16.06.2010, 21:41 +0100 schrieb Andrew Goodbody:
> On 16/06/2010 15:56, Michael Karcher wrote:
> > In shared flash environments, a short sleep between polls of the status
> > register might be very useful. I don't see a big timing impact of a 10us
> > sleep on flashrom, but it will greatly reduce SPI load.
> No, I don't think so, at least not because of the shared environment. 
> Generally (always?) in a shared environment you first have to tell the 
> other device to completely stop accessing the flash.
Hmm, you are right for raw SPI access. For the LPC->SPI translation in
many ECs (which we were not talking about here), there is hardware
arbitration that coordinates flash access from the LPC bust and flash
access from the EC core.


> Once you start any operation other than read then you cannot have anything
> else access that flash device or else one or the other will get corrupted
> results eg ID instead of code fetch.
For SPI this is not the case. A read ID command always returns the ID
and a read data command always returns data. For SPI it would be
important to not interrupt between WREN and the program instruction, but
as you already stated at the beginning: During flashing, the EC system
is shut down so no interfering action will happen.

> So the best thing to do is to complete all operations as quickly as
> possible and then signal the shared accesses to resume.
But the point on excessive CPU load still stands. A programmer_delay(10)
of cause will also cause excessive CPU load, as it is busy waiting. So
we should have the cases
 - no delay
 - short delay (give up timeslices)
 - long delay (several 10s of ms)

Regards,
  Michae Karcher

Patch

Index: flashrom-spi_nodelay/spi25.c
===================================================================
--- flashrom-spi_nodelay/spi25.c	(Revision 1048)
+++ flashrom-spi_nodelay/spi25.c	(Arbeitskopie)
@@ -455,11 +455,11 @@ 
 		return result;
 	}
 	/* Wait until the Write-In-Progress bit is cleared.
-	 * This usually takes 1-85 s, so wait in 1 s steps.
+	 * This usually takes 1-85 s.
 	 */
 	/* FIXME: We assume spi_read_status_register will never fail. */
 	while (spi_read_status_register() & JEDEC_RDSR_BIT_WIP)
-		programmer_delay(1000 * 1000);
+		/* do nothing */;
 	if (check_erased_range(flash, 0, flash->total_size * 1024)) {
 		msg_cerr("ERASE FAILED!\n");
 		return -1;
@@ -500,11 +500,11 @@ 
 		return result;
 	}
 	/* Wait until the Write-In-Progress bit is cleared.
-	 * This usually takes 1-85 s, so wait in 1 s steps.
+	 * This usually takes 1-85 s.
 	 */
 	/* FIXME: We assume spi_read_status_register will never fail. */
 	while (spi_read_status_register() & JEDEC_RDSR_BIT_WIP)
-		programmer_delay(1000 * 1000);
+		/* do nothing */;
 	if (check_erased_range(flash, 0, flash->total_size * 1024)) {
 		msg_cerr("ERASE FAILED!\n");
 		return -1;
@@ -545,10 +545,10 @@ 
 		return result;
 	}
 	/* Wait until the Write-In-Progress bit is cleared.
-	 * This usually takes 100-4000 ms, so wait in 100 ms steps.
+	 * This usually takes 100-4000 ms.
 	 */
 	while (spi_read_status_register() & JEDEC_RDSR_BIT_WIP)
-		programmer_delay(100 * 1000);
+		/* do nothing */;
 	if (check_erased_range(flash, addr, blocklen)) {
 		msg_cerr("ERASE FAILED!\n");
 		return -1;
@@ -594,10 +594,10 @@ 
 		return result;
 	}
 	/* Wait until the Write-In-Progress bit is cleared.
-	 * This usually takes 100-4000 ms, so wait in 100 ms steps.
+	 * This usually takes 100-4000 ms.
 	 */
 	while (spi_read_status_register() & JEDEC_RDSR_BIT_WIP)
-		programmer_delay(100 * 1000);
+		/* do nothing */;
 	if (check_erased_range(flash, addr, blocklen)) {
 		msg_cerr("ERASE FAILED!\n");
 		return -1;
@@ -641,10 +641,10 @@ 
 		return result;
 	}
 	/* Wait until the Write-In-Progress bit is cleared.
-	 * This usually takes 100-4000 ms, so wait in 100 ms steps.
+	 * This usually takes 100-4000 ms.
 	 */
 	while (spi_read_status_register() & JEDEC_RDSR_BIT_WIP)
-		programmer_delay(100 * 1000);
+		/* do nothing */;
 	if (check_erased_range(flash, addr, blocklen)) {
 		msg_cerr("ERASE FAILED!\n");
 		return -1;
@@ -709,10 +709,10 @@ 
 		return result;
 	}
 	/* Wait until the Write-In-Progress bit is cleared.
-	 * This usually takes 15-800 ms, so wait in 10 ms steps.
+	 * This usually takes 15-800 ms.
 	 */
 	while (spi_read_status_register() & JEDEC_RDSR_BIT_WIP)
-		programmer_delay(10 * 1000);
+		/* do nothing */;
 	if (check_erased_range(flash, addr, blocklen)) {
 		msg_cerr("ERASE FAILED!\n");
 		return -1;
@@ -977,7 +977,7 @@ 
 			if (rc)
 				break;
 			while (spi_read_status_register() & JEDEC_RDSR_BIT_WIP)
-				programmer_delay(10);
+				/* do nothing */;
 		}
 		if (rc)
 			break;
@@ -1010,7 +1010,7 @@ 
 		if (result)
 			return 1;
 		while (spi_read_status_register() & JEDEC_RDSR_BIT_WIP)
-			programmer_delay(10);
+			/* do nothing */;
 	}
 
 	return 0;
@@ -1044,13 +1044,13 @@ 
 		return result;
 	spi_send_command(6, 0, w, NULL);
 	while (spi_read_status_register() & JEDEC_RDSR_BIT_WIP)
-		programmer_delay(5); /* SST25VF040B Tbp is max 10us */
+		/* do nothing */;
 	while (pos < size) {
 		w[1] = buf[pos++];
 		w[2] = buf[pos++];
 		spi_send_command(3, 0, w, NULL);
 		while (spi_read_status_register() & JEDEC_RDSR_BIT_WIP)
-			programmer_delay(5); /* SST25VF040B Tbp is max 10us */
+			/* do nothing */;
 	}
 	spi_write_disable();
 	return 0;
Index: flashrom-spi_nodelay/it87spi.c
===================================================================
--- flashrom-spi_nodelay/it87spi.c	(Revision 1048)
+++ flashrom-spi_nodelay/it87spi.c	(Arbeitskopie)
@@ -304,7 +304,7 @@ 
 	 * This usually takes 1-10 ms, so wait in 1 ms steps.
 	 */
 	while (spi_read_status_register() & JEDEC_RDSR_BIT_WIP)
-		programmer_delay(1000);
+		/* do nothing */;
 	return 0;
 }
 
Index: flashrom-spi_nodelay/wbsio_spi.c
===================================================================
--- flashrom-spi_nodelay/wbsio_spi.c	(Revision 1048)
+++ flashrom-spi_nodelay/wbsio_spi.c	(Arbeitskopie)
@@ -163,7 +163,6 @@ 
 
 	OUTB(writearr[0], wbsio_spibase);
 	OUTB(mode, wbsio_spibase + 1);
-	programmer_delay(10);
 
 	if (!readcnt)
 		return 0;