Patchwork Add timeout check to SPI bitbang bus request / Re: Flashrom stuck using 100% cpu while probing for a chip

login
register
about
Submitter Carl-Daniel Hailfinger
Date 2014-04-29 06:32:42
Message ID <535F478A.9030904@gmx.net>
Download mbox | patch
Permalink /patch/4134/
State New
Headers show

Comments

Carl-Daniel Hailfinger - 2014-04-29 06:32:42
Hi everyone,

sorry for resurrecting this patch from the dead.

Ian: No need to test again, I believe something in your hardware may
have been fundamentally broken. I have no idea how to debug it, but
failing/aborting is better than hanging, so I think I'll merge this patch.

GatoLoko: This patch will probably change the hang into a really slow
abort. Still, aborting is better than hanging.


Am 15.10.2011 09:32 schrieb Stefan Tauner:
> On Fri, 14 Oct 2011 23:15:09 +0100
> Iain Paton <selsinork@gmail.com> wrote:
>
>> Stefan Tauner wrote:
>>> AT49BV002A(N)'s ID is 0x07 afaics and that's equal to AT49F002(N)... hm.
>> tried 7 & 8, 8 for the T version according to the datasheet, but nothing to 
>> lose trying both
> does the log show any other result than 0xff/0x00 (for any chip, if you
> probe for all of them)? i think you have not posted a log for that one,
> have you?
>
>>> what did you try? does probing with the nicintel_spi code help?
>> didn't appear to, but I'm assuming that when you did it you left it stuck in 
>> the loop while you tried the nicintel code ?  Carl-Daniel's patch means it 
>> doesn't get stuck anymore, so I need to either revert that or try something else.
> no, i aborted after entering the loop. the enable method is executed at
> startup of the programmer... hm. maybe it gets reversed since then.
> carldani added some kind of roll-back mechanism where the code
> registers certain actions in init functions etc. that are reversed on
> shutdown. but i think ctrl+c just aborts anyway...
> after looking at the code... the comments in nicintel_spic.c explain
> that the roll-back thingy is not possible, but it does it manually in
> the shutdown function, but i am quite sure that this is not called at
> all when you abort while probing... so it should be ok. *shrug*
>
>>> so it is certainly not exactly the same problem i had; and i cant
>>> help you further right now, sorry.
>> np, with these being pci cards their usefulness is limited to me nowadays, 
>> in fact I think they've sat in a box since last time :)  so it's mostly 
>> curiosity on my part, no burning need for me to solve it.
> that was my motivation too... and that's not a very strong one ;)

Updated version of the timeout patch.

Abort controller accesses which take too long instead of hanging indefinitely.

Signed-off-by: Carl-Daniel Hailfinger <c-d.hailfinger.devel.2006@gmx.net>
GatoLoko - 2014-04-29 13:09:28
El 29/04/14 08:32, Carl-Daniel Hailfinger escribió:
> Hi everyone,
>
> sorry for resurrecting this patch from the dead.
>
> Ian: No need to test again, I believe something in your hardware may
> have been fundamentally broken. I have no idea how to debug it, but
> failing/aborting is better than hanging, so I think I'll merge this patch.
>
> GatoLoko: This patch will probably change the hang into a really slow
> abort. Still, aborting is better than hanging.

This definitely works a lot faster than my attempt to adapt the old 
patch to the new tree. It takes under 5 minutes for a full run with this 
version while the old one takes over half an hour. I'd say each chip 
probes for under 2 seconds.

Still, I see three problem:

1.- If you run flashrom without verbose parameters all you get in the 
terminal is a lot of "SB600: execute_command timeout!", without context, 
without telling it's probing for chips, but then the log (with -o file) 
shows the probing before the timeout failing as "Probing for AMIC 
A25L05PT, 64 kB: SB600: execute_command timeout!".

2.- I got a lot of "FIFO pointer corruption! Pointer is 0, wanted 3
Something else is accessing the flash chip and causes random corruption.
Please stop all applications and drivers and IPMI which access the flash 
chip." but as far as I know there is no other application or driver 
accessing the chip. May it be attempting to probe the next chip before 
fully closing/stopping/timeuoting the previous attempt?

3.- My chip isn't detected and I'm unsure whether that's the correct 
behavior or not (shouldn't the Macronix MX25L4005APC-12G be suported as 
a MX25L4005(A/C)?)


Attached are two log, one with -VVV and other without.

Patch

Index: flashrom-bitbang_timeout/ogp_spi.c
===================================================================
--- flashrom-bitbang_timeout/ogp_spi.c	(Revision 1781)
+++ flashrom-bitbang_timeout/ogp_spi.c	(Arbeitskopie)
@@ -53,9 +53,13 @@ 
 	{0},
 };
 
-static void ogp_request_spibus(void)
+static int ogp_request_spibus(void)
 {
+	/* FIXME: Do we have to check if any other SPI accesses are running
+	 * right now? Can the request fail?
+	 */
 	pci_mmio_writel(1, ogp_spibar + ogp_reg_sel);
+	return 0;
 }
 
 static void ogp_release_spibus(void)
Index: flashrom-bitbang_timeout/bitbang_spi.c
===================================================================
--- flashrom-bitbang_timeout/bitbang_spi.c	(Revision 1781)
+++ flashrom-bitbang_timeout/bitbang_spi.c	(Arbeitskopie)
@@ -46,10 +46,11 @@ 
 	return master->get_miso();
 }
 
-static void bitbang_spi_request_bus(const struct bitbang_spi_master * const master)
+static int bitbang_spi_request_bus(const struct bitbang_spi_master * const master)
 {
 	if (master->request_bus)
-		master->request_bus();
+		return master->request_bus();
+	return 0;
 }
 
 static void bitbang_spi_release_bus(const struct bitbang_spi_master * const master)
@@ -136,14 +137,16 @@ 
 				    const unsigned char *writearr,
 				    unsigned char *readarr)
 {
-	int i;
+	int i, ret;
 	const struct bitbang_spi_master *master = flash->pgm->spi.data;
 
 	/* FIXME: Run bitbang_spi_request_bus here or in programmer init?
 	 * Requesting and releasing the SPI bus is handled in here to allow the
 	 * programmer to use its own SPI engine for native accesses.
 	 */
-	bitbang_spi_request_bus(master);
+	ret = bitbang_spi_request_bus(master);
+	if (ret)
+		return ret;
 	bitbang_spi_set_cs(master, 0);
 	for (i = 0; i < writecnt; i++)
 		bitbang_spi_rw_byte(master, writearr[i]);
Index: flashrom-bitbang_timeout/nicintel_spi.c
===================================================================
--- flashrom-bitbang_timeout/nicintel_spi.c	(Revision 1781)
+++ flashrom-bitbang_timeout/nicintel_spi.c	(Arbeitskopie)
@@ -71,6 +71,9 @@ 
 // #define FL_BUSY	30
 // #define FL_ER	31
 
+/* 1000000 loops are roughly 1 s. */
+#define MAX_REQUEST_LOOPS      1000000
+
 uint8_t *nicintel_spibar;
 
 const struct dev_entry nics_intel_spi[] = {
@@ -83,16 +86,27 @@ 
 	{0},
 };
 
-static void nicintel_request_spibus(void)
+static void nicintel_release_spibus(void);
+
+static int nicintel_request_spibus(void)
 {
 	uint32_t tmp;
+	int i = 0;
 
 	tmp = pci_mmio_readl(nicintel_spibar + FLA);
 	tmp |= 1 << FL_REQ;
 	pci_mmio_writel(tmp, nicintel_spibar + FLA);
 
 	/* Wait until we are allowed to use the SPI bus. */
-	while (!(pci_mmio_readl(nicintel_spibar + FLA) & (1 << FL_GNT))) ;
+	while (!(pci_mmio_readl(nicintel_spibar + FLA) & (1 << FL_GNT))) {
+		if (i++ > MAX_REQUEST_LOOPS) {
+			nicintel_release_spibus();
+			msg_perr("%s: Timeout! Aborting.\n", __func__);
+			return TIMEOUT_ERROR;
+		}
+	}
+
+	return 0;
 }
 
 static void nicintel_release_spibus(void)
Index: flashrom-bitbang_timeout/sb600spi.c
===================================================================
--- flashrom-bitbang_timeout/sb600spi.c	(Revision 1781)
+++ flashrom-bitbang_timeout/sb600spi.c	(Arbeitskopie)
@@ -159,12 +159,19 @@ 
 	return ret;
 }
 
-static void execute_command(void)
+static int execute_command(void)
 {
+	int timeout = 1000000;
+
 	mmio_writeb(mmio_readb(sb600_spibar + 2) | 1, sb600_spibar + 2);
 
-	while (mmio_readb(sb600_spibar + 2) & 1)
+	while ((mmio_readb(sb600_spibar + 2) & 1) && --timeout)
 		;
+	if (!timeout) {
+		msg_perr("SB600: execute_command timeout!\n");
+		return TIMEOUT_ERROR;
+	}
+	return 0;
 }
 
 static int sb600_spi_send_command(struct flashctx *flash, unsigned int writecnt,
@@ -177,6 +184,7 @@ 
 	unsigned char cmd = *writearr++;
 	unsigned int readoffby1;
 	unsigned char readwrite;
+	int ret;
 
 	writecnt--;
 
@@ -225,7 +233,9 @@ 
 		return SPI_PROGRAMMER_ERROR;
 
 	msg_pspew("Executing: \n");
-	execute_command();
+	ret = execute_command();
+	if (ret)
+		return ret;
 
 	/*
 	 * After the command executed, we should find out the index of the
Index: flashrom-bitbang_timeout/mcp6x_spi.c
===================================================================
--- flashrom-bitbang_timeout/mcp6x_spi.c	(Revision 1781)
+++ flashrom-bitbang_timeout/mcp6x_spi.c	(Arbeitskopie)
@@ -40,22 +40,38 @@ 
 #define MCP6X_SPI_REQUEST	0
 #define MCP6X_SPI_GRANT		8
 
+/* 1000000 loops are roughly 1 s. */
+#define MAX_REQUEST_LOOPS      1000000
+
 void *mcp6x_spibar = NULL;
 
 /* Cached value of last GPIO state. */
 static uint8_t mcp_gpiostate;
 
-static void mcp6x_request_spibus(void)
+static void mcp6x_release_spibus(void);
+
+static int mcp6x_request_spibus(void)
 {
+	int i = 0;
+
 	mcp_gpiostate = mmio_readb(mcp6x_spibar + 0x530);
 	mcp_gpiostate |= 1 << MCP6X_SPI_REQUEST;
 	mmio_writeb(mcp_gpiostate, mcp6x_spibar + 0x530);
 
 	/* Wait until we are allowed to use the SPI bus. */
-	while (!(mmio_readw(mcp6x_spibar + 0x530) & (1 << MCP6X_SPI_GRANT))) ;
+	while (!(mmio_readw(mcp6x_spibar + 0x530) & (1 << MCP6X_SPI_GRANT))) {
+		if (i++ > MAX_REQUEST_LOOPS) {
+			/* Update the cache. */
+			mcp_gpiostate = mmio_readb(mcp6x_spibar + 0x530);
+			mcp6x_release_spibus();
+			msg_perr("%s: Timeout! Aborting.\n", __func__);
+			return TIMEOUT_ERROR;
+		}
+	}
 
 	/* Update the cache. */
 	mcp_gpiostate = mmio_readb(mcp6x_spibar + 0x530);
+	return 0;
 }
 
 static void mcp6x_release_spibus(void)
Index: flashrom-bitbang_timeout/programmer.h
===================================================================
--- flashrom-bitbang_timeout/programmer.h	(Revision 1781)
+++ flashrom-bitbang_timeout/programmer.h	(Arbeitskopie)
@@ -157,7 +157,7 @@ 
 	void (*set_sck) (int val);
 	void (*set_mosi) (int val);
 	int (*get_miso) (void);
-	void (*request_bus) (void);
+	int (*request_bus) (void);
 	void (*release_bus) (void);
 	/* Length of half a clock period in usecs. */
 	unsigned int half_period;