Patchwork [sort,of] multiplex console output from each core

login
register
about
Submitter xdrudis
Date 2011-01-29 10:09:05
Message ID <20110129100905.GB8621@ideafix.casa.ct>
Download mbox | patch
Permalink /patch/2580/
State New
Headers show

Comments

xdrudis - 2011-01-29 10:09:05
Hello. 

This patch works for me but needs a small function calc_id_buffer for
each board/cpu/whatever. I only made one for amd quadcore because it
is what I have.  My board does not get to ramstage, so it might not
work there. It works for my serial console but should work for net or
usb if I'm not mistaken.  Also my tree is not up to date with svn
head.  Likely things could be placed elsewhere or done better, so it
is mostly an idea.

But I'm unlikely to have more time for it and it works for me so far,
so I'm sending it so that it does not get lost and in case somebody
likes the idea and wants to commit or do it better.

Signed-off-by: Xavi Drudis Ferran <xdrudis@tinet.cat>

My problem was with having serial output like 

[...]
init node: 00  cores: 03 
Start other core - nodeid: 00  cores: 03
start_other_cores() retuPPPrOOOnSSSeTTTd:::
   P000OxxxST3:330 00
0

x3   7cc
cosootrrraeeexxxr:::t  e    d-- --a---p--    {{a{ p  iAAAPPPcIIIiCCdCI:IID DD   ===   000231   NNNOOODDDEEEIIIDDD   ===   000000   CCCOOORRREEEIIIDDD   ===   000312}}}   ---------
[...]

So I changed it because I like it better like this:

[...]
����������������������������������������������������������������
[...]

which is ugly, slow, etc. but I can pipe it to a little script which turns it back into 
almost the original ("boooh!!!" :( )
or writes it a little nicer ( :) )

[...]
core0:init node: 00  cores: 03
core0:Start other core - nodeid: 00  cores: 03
core0:start_other_cores() returned
core1:POST:
core2:POST:
core3:POST:
core0:POST:
core1: 0x30
core2: 0x30
core3: 0x30
core0:0
core0:x37
core2: c
core3: c
core0:started ap apicid:
core1:corex:  --- { APICID = 01 NODEID = 00 COREID = 01} ---
core2:orex:  --- { APICID = 02 NODEID = 00 COREID = 02} ---
core3:orex:  --- { APICID = 03 NODEID = 00 COREID = 03} ---
[...]

or even ( :) )

[...]
init node: 00  cores: 03|                        |                        |                        
                        |                        |                        |                        
Start other core - nodei|                        |                        |                        
d: 00  cores: 03        |                        |                        |                        
start_other_cores() retu|P                       |P                       |P                       
rned                    |OST:                    |OST:                    |OST:                    
POST:                   | 0x30                   | 0x30                   | 0x30                   
0                       |                        |                        |                        
                        |                        |                        |                        
x37                     |                        | c                      | c                      
started ap apicid:      |corex:  --- { APICID = 0|orex:  --- { APICID = 02|orex:  --- { APICID = 03
                        |1 NODEID = 00 COREID = 0| NODEID = 00 COREID = 0 | NODEID = 00 COREID = 03
                        |1} ---                  |2} ---                  |} ---                   
                        |                        |                        |                        
[...]

or other options to if you fancy.

I observed characters from each core got mixed but bits from each
character not, so the superIO or someone probably did alreday
synchronize bytes correctly. So I could split each byte in two, add a
4 bit buffer id and pretend each core has a different serial port and
I'm multiplexing up to 16 channels for up to 16 cores into one
physical serial port, by sending together the channel/buffer id with
the character (4 bits buffer id, 4 bits half character).  Then a small
perl script demultiplexes each channel into a different buffer and
formats output.  If someone has more than 16 cores does she really
want to see ouput from all at a time?  Redefining the weak function
calc_id_buffer you can choose to have some of them mix into the same
buffer or just filter out the output for some of them.
 
Hope the patch is simple enough, else I can explain more later.
I must leave now.
xdrudis - 2011-01-30 14:46:01
On Sat, Jan 29, 2011 at 11:09:05AM +0100, xdrudis wrote:
> is what I have.  My board does not get to ramstage, so it might not
> work there. It works for my serial console but should work for net or

Ok, now I see it. It won't work with sprintf in ramstage. 

I shouldn't have modified vtxprintf but maybe console.c, either 
calling calc_id_buffer for each byte in __console_tx_byte (too expensive ?) or 
passing around id_buffer by more extensive modification (like 
changing tx_byte signature to add a channel parameter or something, maybe 
not worth it)

Forget it and sorry for the noise.
xdrudis - 2011-01-30 15:04:37
On Sat, Jan 29, 2011 at 11:09:05AM +0100, xdrudis wrote:

> formats output.  If someone has more than 16 cores does she really
> want to see ouput from all at a time?  Redefining the weak function
> calc_id_buffer you can choose to have some of them mix into the same
> buffer or just filter out the output for some of them.
>

Also wrong. Having several concurrent outputs with the same buffer id 
is not safe, demuxing could join the wrong half-characters.

Filtering out some cores would be an option.
Patrick Georgi - 2011-01-30 15:48:44
Am 30.01.2011 16:04, schrieb xdrudis:
> Also wrong. Having several concurrent outputs with the same buffer id 
> is not safe, demuxing could join the wrong half-characters.
> 
> Filtering out some cores would be an option.
Or just use more bits for cores once the need arises. That will give you
up to 128 cores. That should be enough for a while.


Patrick
Stefan Reinauer - 2011-01-30 19:32:58
* xdrudis <xdrudis@tinet.cat> [110130 15:46]:
> On Sat, Jan 29, 2011 at 11:09:05AM +0100, xdrudis wrote:
> > is what I have.  My board does not get to ramstage, so it might not
> > work there. It works for my serial console but should work for net or
> 
> Ok, now I see it. It won't work with sprintf in ramstage. 

It should not be needed in ram stage, as we have locking there.

Stefan
xdrudis - 2011-01-30 19:59:25
On Sun, Jan 30, 2011 at 08:32:58PM +0100, Stefan Reinauer wrote:
> * xdrudis <xdrudis@tinet.cat> [110130 15:46]:
> > On Sat, Jan 29, 2011 at 11:09:05AM +0100, xdrudis wrote:
> > > is what I have.  My board does not get to ramstage, so it might not
> > > work there. It works for my serial console but should work for net or
> > 
> > Ok, now I see it. It won't work with sprintf in ramstage. 
> 
> It should not be needed in ram stage, as we have locking there.
> 

Yes, it'd be mostly unneeded, but anyway the patch I sent does not 
disable it in ramstage. So it still causes sprintf to consume 
the double of bytes maybe beyond its buffers (and produce 
unreadable messages). The idea can work, the perl script may be
usable, but the patches to C should be redone from scratch. 

In fact I think there's some subtle breakage already in romstage.c,
since I didn't realise console_tx will check for \n to add \r and that
won't work with CONFIG_DEBUX_MUXED in that patch, although it won't be
noticed unless you have 16 cores (I think an \n from the 16th core
would produce a \n).

But continuing with the idea, the perl script might be changed to
detect some string in the input and stop demultiplexing. Sonner or
later coreboot may stop multiplexing or simply stop running and have
some other software writing to serial port, and it'd be nice if the
script would then simply copy the input unchanged (to all outputs if
it's not writing to stdout only ?).
Marc Jones - 2011-01-30 20:05:54
On Sun, Jan 30, 2011 at 12:32 PM, Stefan Reinauer
<stefan.reinauer@coreboot.org> wrote:
> * xdrudis <xdrudis@tinet.cat> [110130 15:46]:
>> On Sat, Jan 29, 2011 at 11:09:05AM +0100, xdrudis wrote:
>> > is what I have.  My board does not get to ramstage, so it might not
>> > work there. It works for my serial console but should work for net or
>>
>> Ok, now I see it. It won't work with sprintf in ramstage.
>
> It should not be needed in ram stage, as we have locking there.
>

I think that the locking can be added via the BSPs cache. All
multicore should use CAR and it is a matter of adding it where it
won't get stepped on by the normal use of CAR. For AMD fam10, the
sysinfo setup would need to be fixed up.

Marc
Stefan Reinauer - 2011-01-30 20:07:52
* xdrudis <xdrudis@tinet.cat> [110130 20:59]:
> Yes, it'd be mostly unneeded, but anyway the patch I sent does not 
> disable it in ramstage. So it still causes sprintf to consume 
> the double of bytes maybe beyond its buffers (and produce 
> unreadable messages). The idea can work, the perl script may be
> usable, but the patches to C should be redone from scratch. 

Can't you just hack up the muxing in console_tx_byte instead? That
should solvle the problem and make the patch a tad less intrusive.

Stefan
xdrudis - 2011-01-30 21:06:01
On Sun, Jan 30, 2011 at 09:07:52PM +0100, Stefan Reinauer wrote:
> * xdrudis <xdrudis@tinet.cat> [110130 20:59]:
> > Yes, it'd be mostly unneeded, but anyway the patch I sent does not 
> > disable it in ramstage. So it still causes sprintf to consume 
> > the double of bytes maybe beyond its buffers (and produce 
> > unreadable messages). The idea can work, the perl script may be
> > usable, but the patches to C should be redone from scratch. 
> 
> Can't you just hack up the muxing in console_tx_byte instead? That
> should solvle the problem and make the patch a tad less intrusive.
> 

yes, console_tx_byte or __console_tx_byte .

The only drawback is that then I have to call calc_id_buffer for 
each byte, I guess it doesn't matter, but it was less overhead 
calling it once per call to  vtxprintf (in fact there was another 
call in number(...) but still fewer than one per byte.


But I just got past fidvid now, and I should clean up the mess of code
I've been changing all these months and see what exactly fixed it, or
if I can have a decent patch.  Not sure it'll be much help for other
CPU revisions beyond C3, though.
xdrudis - 2011-01-30 21:29:55
On Sun, Jan 30, 2011 at 01:05:54PM -0700, Marc Jones wrote:
> 
> I think that the locking can be added via the BSPs cache. All
> multicore should use CAR and it is a matter of adding it where it
> won't get stepped on by the normal use of CAR. For AMD fam10, the
> sysinfo setup would need to be fixed up.
> 
> Marc

That is a little over my head, I believe. I thought in general the
cache was a different memory for each core (let alone each node).  My
phenom has L3 cache common to all cores but I thought romstage is not
using it.

And I guess the functionality is slightly different :

with muxing you get the node number in the output so you can format
things in columns, etc. And it is not dependent on the architecture
(except to find out the core number to use as buffer id).  With
locking you just separate output but don't know which core it comes
from (but it becomes feasible to include the core number in the
messages where appropiate, and you don't need demuxing, get less
serial traffic...).

locking in romstage might be useful for other things beyond console 
output, though. 

So both approaches have advantages. I'm just not sure how to implement
locking. That's the main reason I didn't, even if some comment
suggested it somewhere. Secondary excuses were that it seemed less
alteration of the runtime behaviour not to lock, and that I could also
set it up to filter out some output.

Patch

diff -u -r coreboot-nomux/src/console/Kconfig coreboot/src/console/Kconfig
--- coreboot-nomux/src/console/Kconfig	2011-01-16 15:35:29.000000000 +0100
+++ coreboot/src/console/Kconfig	2011-01-29 01:24:21.000000000 +0100
@@ -335,5 +335,18 @@ 
 	  usually displayed using a so-called "POST card" ISA/PCI/PCI-E
 	  device) on the serial console.
 
+config DEBUG_MUXED
+	bool "Multiplex debug output from each core"
+	default n
+	help
+          Transmit core number with each nibble of coreboot debug output, 
+          causing the debug output to be unintelligible at naked eye but
+          filtrable piping it to demux_debug to separate output from each 
+          core instead of watching the garbage produced by simultaneous 
+          transmission from each core. It will duplicate the size of debug 
+          output, halve the speed and require demuxing, but it felt less 
+          invasive that some form of mutex or blocking. Tested for serial,  
+          but should also work for usb or network console
+
 endmenu
 
diff -u -r coreboot-nomux/src/console/vtxprintf.c coreboot/src/console/vtxprintf.c
--- coreboot-nomux/src/console/vtxprintf.c	2010-11-14 23:21:56.000000000 +0100
+++ coreboot/src/console/vtxprintf.c	2011-01-29 01:24:21.000000000 +0100
@@ -30,6 +30,27 @@ 
 #define SPECIAL	32		/* 0x */
 #define LARGE	64		/* use 'ABCDEF' instead of 'abcdef' */
 
+unsigned char calc_id_buffer(void) {
+  return 0xf0; /* do we have some function to tell the current 
+                * core/node, implemented for each cpu ?
+                */
+}
+
+static void mux_byte(void (*tx_byte)(unsigned char byte), unsigned char idbuffer, unsigned char byte) {
+#ifdef CONFIG_DEBUG_MUXED
+
+  if (!(idbuffer & DISABLE_MUXED)) {
+    tx_byte((idbuffer & 0xf0) | (byte >> 4));
+    tx_byte((idbuffer & 0xf0) | (byte & 0x0f));
+  };
+  if (idbuffer & ENABLE_UNMUXED) {
+     tx_byte(byte);
+  };
+#else
+  tx_byte(byte);
+#endif
+}
+
 static int number(void (*tx_byte)(unsigned char byte),
 	unsigned long long num, int base, int size, int precision, int type)
 {
@@ -37,6 +58,7 @@ 
 	const char *digits="0123456789abcdefghijklmnopqrstuvwxyz";
 	int i;
 	int count = 0;
+        unsigned char idbuffer = calc_id_buffer();
 
 	if (type & LARGE)
 		digits = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ";
@@ -75,26 +97,26 @@ 
 	size -= precision;
 	if (!(type&(ZEROPAD+LEFT)))
 		while(size-->0)
-			tx_byte(' '), count++;
+		  mux_byte(tx_byte,idbuffer,' '), count++;
 	if (sign)
-		tx_byte(sign), count++;
+		mux_byte(tx_byte,idbuffer,sign), count++;
 	if (type & SPECIAL) {
 		if (base==8)
-			tx_byte('0'), count++;
+			mux_byte(tx_byte,idbuffer,'0'), count++;
 		else if (base==16) {
-			tx_byte('0'), count++;
-			tx_byte(digits[33]), count++;
+			mux_byte(tx_byte,idbuffer,'0'), count++;
+			mux_byte(tx_byte,idbuffer,digits[33]), count++;
 		}
 	}
 	if (!(type & LEFT))
 		while (size-- > 0)
-			tx_byte(c), count++;
+			mux_byte(tx_byte,idbuffer,c), count++;
 	while (i < precision--)
-		tx_byte('0'), count++;
+		mux_byte(tx_byte,idbuffer,'0'), count++;
 	while (i-- > 0)
-		tx_byte(tmp[i]), count++;
+		mux_byte(tx_byte,idbuffer,tmp[i]), count++;
 	while (size-- > 0)
-		tx_byte(' '), count++;
+		mux_byte(tx_byte,idbuffer,' '), count++;
 	return count;
 }
 
@@ -105,6 +127,7 @@ 
 	unsigned long long num;
 	int i, base;
 	const char *s;
+        unsigned char idbuffer = calc_id_buffer();
 
 	int flags;		/* flags to number() */
 
@@ -117,7 +140,7 @@ 
 
 	for (count=0; *fmt ; ++fmt) {
 		if (*fmt != '%') {
-			tx_byte(*fmt), count++;
+			mux_byte(tx_byte,idbuffer,*fmt), count++;
 			continue;
 		}
 
@@ -180,10 +203,10 @@ 
 		case 'c':
 			if (!(flags & LEFT))
 				while (--field_width > 0)
-					tx_byte(' '), count++;
-			tx_byte((unsigned char) va_arg(args, int)), count++;
+					mux_byte(tx_byte,idbuffer,' '), count++;
+			mux_byte(tx_byte,idbuffer,(unsigned char) va_arg(args, int)), count++;
 			while (--field_width > 0)
-				tx_byte(' '), count++;
+				mux_byte(tx_byte,idbuffer,' '), count++;
 			continue;
 
 		case 's':
@@ -195,11 +218,11 @@ 
 
 			if (!(flags & LEFT))
 				while (len < field_width--)
-					tx_byte(' '), count++;
+					mux_byte(tx_byte,idbuffer,' '), count++;
 			for (i = 0; i < len; ++i)
-				tx_byte(*s++), count++;
+				mux_byte(tx_byte,idbuffer,*s++), count++;
 			while (len < field_width--)
-				tx_byte(' '), count++;
+				mux_byte(tx_byte,idbuffer,' '), count++;
 			continue;
 
 		case 'p':
@@ -227,7 +250,7 @@ 
 			continue;
 
 		case '%':
-			tx_byte('%'), count++;
+			mux_byte(tx_byte,idbuffer,'%'), count++;
 			continue;
 
 		/* integer number formats - set up the flags and "break" */
@@ -248,9 +271,9 @@ 
 			break;
 
 		default:
-			tx_byte('%'), count++;
+			mux_byte(tx_byte,idbuffer,'%'), count++;
 			if (*fmt)
-				tx_byte(*fmt), count++;
+				mux_byte(tx_byte,idbuffer,*fmt), count++;
 			else
 				--fmt;
 			continue;

diff -u -r coreboot-nomux/src/cpu/amd/quadcore/quadcore_id.c coreboot/src/cpu/amd/quadcore/quadcore_id.c
--- coreboot-nomux/src/cpu/amd/quadcore/quadcore_id.c	2010-11-14 23:21:43.000000000 +0100
+++ coreboot/src/cpu/amd/quadcore/quadcore_id.c	2011-01-29 01:24:08.000000000 +0100
@@ -23,6 +23,18 @@ 
 #ifdef __PRE_RAM__
 #include <cpu/amd/model_10xxx_msr.h>
 #endif
+#include <console/vtxprintf.h>
+
+//  buffer id for console/vtxprintf
+unsigned char calc_id_buffer(void) {
+  // lower 4 bits should be 0, DISABLE_MUXED, or ENABLE_UNMUXED.
+  // I only use 4 bits, so it may not be useful for systems with many cores
+  // for systems with less cores, I prefer using th higher values to avoid 
+  // muxed bytes being interpreted as ASCII control characters or something
+  // so I negate (core_number mod 16) 
+  return ((~( (cpuid_ebx(1) >> 24) & 0xf))<<4);
+}
+
 
 //called by bus_cpu_scan too
 u32 read_nb_cfg_54(void)
diff -u -r coreboot-nomux/src/include/console/vtxprintf.h coreboot/src/include/console/vtxprintf.h
--- coreboot-nomux/src/include/console/vtxprintf.h	2010-11-14 23:21:58.000000000 +0100
+++ coreboot/src/include/console/vtxprintf.h	2011-01-29 01:24:24.000000000 +0100
@@ -34,6 +34,21 @@ 
 #include <stdarg.h>
 #endif
 
+#define DISABLE_MUXED 1
+#define ENABLE_UNMUXED 2
+
+  /* lower 4 bits should be 0, DISABLE_MUXED (1) and/or ENABLE_UNMUXED (2),
+   * so : 
+   *   0 : output muxed
+   *   1 : no output
+   *   2 : output unmuxed
+   *   3 : output muxed (2 bytes) followed by output unmuxed (1 byte) (any use ?)
+   * higher 4 bits should be the bitwise negation of the buffer id 
+   * to use in muxed output
+   */
+unsigned char __attribute__((weak)) calc_id_buffer(void) ;
+
+
 int vtxprintf(void (*tx_byte)(unsigned char byte), const char *fmt, va_list args);
 
 #endif
diff -N -u -r coreboot-nomux/util/demux_debug.pl coreboot/util/demux_debug.pl
--- coreboot-nomux/util/demux_debug.pl	1970-01-01 01:00:00.000000000 +0100
+++ coreboot/util/demux_debug.pl	2011-01-29 10:02:49.000000000 +0100
@@ -0,0 +1,593 @@ 
+#!/usr/bin/perl -w
+#    demux_debug.pl  -- demultiplex coreboot output with CONFIG_DEBUG_MUXED
+#    Copyright (C) 2011 Xavi Drudis Ferran <xdrudis@tinet.cat>
+#
+#    This program is free software; you can redistribute it and/or modify
+#    it under the terms of the GNU General Public License as published by
+#    the Free Software Foundation; either version 2 of the License, or
+#    (at your option) any later version.
+#
+#    This program is distributed in the hope that it will be useful,
+#    but WITHOUT ANY WARRANTY; without even the implied warranty of
+#    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+#    GNU General Public License for more details.
+#
+#    You should have received a copy of the GNU General Public License along
+#    with this program; if not, write to the Free Software Foundation, Inc.,
+#    51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
+#
+use Getopt::Long;
+use Pod::Usage;
+use List::Util qw[min];
+use IO::Handle;
+
+my $num_buffers = 4;
+my $width_buffer = 2000;
+my @out_buffers=[];
+my @in_buffers=[];
+#my $format='%*3$s|';
+my $format='%4$1d:%1s'."\n";
+my %formats=();
+my %files=();
+my $rtrim=1;
+my $ltrim=0;
+my $lf_does_cr=1;
+my $omit_regex = qr/^\s*$/;
+#my $omit_regex = undef;
+my $line =0;
+my %handles = ();
+
+sub parse_args {
+    use Getopt::Long;
+           GetOptions( "buffers=i"  => \$num_buffers, 
+                       "width=i"    => \$width_buffer,
+                       "format=s" => \$format,
+                       "rtrim!" => \$rtrim,
+                       "ltrim!" => \$ltrim,
+                       "lf_does_cr!" => \$lf_does_cr,
+                       "empty_skipped|omit_regex:s" => \$omit_regex,
+                       "help" => \$help,
+                       "nthformat=s%" => \%formats,
+                       "outputs=s%" => \%files,
+                       "chat" => sub {
+                                        $format='core%4$1d:%1s'."\n";
+                                        $width_buffer = 2000;
+                                        $rtrim=1;
+                                        $ltrim=0;
+                                        $omit_regex = qr/^\s*$/;
+                                     } ,
+                       "multicolumn" => sub {
+                                        $format='%*3$s|';
+                                        $width_buffer = 19;
+                                        $formats{"-1"}='%*3$s'."\n";
+                                        $rtrim=0;
+                                        $ltrim=0;
+                                        $omit_regex = undef;
+                                     } ,
+                       "classic" => sub {
+                                        $format='%s';
+                                        $width_buffer = 0;
+                                        $rtrim=0;
+                                        $ltrim=0;
+                                        $omit_regex = '^$(!:^)';
+                                     }           
+                      );
+     if (defined($omit_regex)) {
+           if ($omit_regex eq "") {
+               $omit_regex = qr/^\s*$/;
+           } else {
+               if ($omit_regex eq "none") {
+		 $omit_regex=undef;
+               } else { 
+	         $omit_regex = qr/$omit_regex/s;
+               }
+	   }
+    }
+    if ($help) {
+	pod2usage(1);
+    }
+               
+}
+
+sub init {
+    for  $k (keys( %formats)) {
+	for $r (split(/,/,$k)) {
+	    if ( @l = split /-/,$r) {
+              for $i ($l[0]..$l[$#l]) {
+                  $formats{$i}=$formats{$k};
+	      }
+	    }
+	}
+    } 
+    for $i (0..($num_buffers-1)) {
+	$out_buffers[$i] = { BUFFER=> (" " x $width_buffer) , POS => 0 , FORMAT => ($formats{$i}||$formats{$i-$num_buffers}|| $format), FILEHANDLE => \*STDOUT, RTRIM => $rtrim , LTRIM => $ltrim, OMITREGEX => $omit_regex};
+        $in_buffers[$i] = { EXPECT => 'HIGH' , BYTE => 0, LINES=> 0, COUNT => 0};
+    }
+    $/=\1; #input line = 1 byte
+    my @time = localtime();
+    my $time = localtime();
+    my $year2 = $time[5] % 100;
+    for $k (keys(%files)) {
+	for $r (split(/,/,$k)) {
+	    if (@l = split /-/,$r) {
+              for $i ($l[0]..$l[$#l]) {
+                $fileexpr = sprintf($files{$k},
+                   $width_buffer,
+                   $i,
+                   @time,
+                   $year2,
+                   $time);
+                if ($fileexpr !~ /^(\+?>|\|)/){
+	    	    $fileexpr = '>'. $fileexpr;
+	        }
+                
+	        if (!exists $handles{$fileexpr}) {
+                   open(   $handles{$fileexpr} ,$fileexpr) or die "can't open $fileexpr\n";
+                } 
+                
+	        $out_buffers[$i<0?$num_buffers-$i:$i]->{'FILEHANDLE'}=$handles{$fileexpr};
+	      }
+	    }
+        }
+    }
+}
+
+sub calc_id_buffer {
+    my ($in) = @_;
+    return ((~($in >> 4)) & 0xf);
+}
+
+sub input {
+    my ($in_char) = @_;
+    my $in = ord($in_char);
+    $idbuffer = calc_id_buffer($in);
+    if (($idbuffer < 0) || ($idbuffer >= $num_buffers)) {
+	printf STDERR 'Unknown input byte %1$x = %1$d = %1$b (%2$s) would give buffer id %3$d=%3$x=%3$b'."\n", $in, $in_char, $idbuffer;
+    } else { 
+       $nibble = ($in & 0xf);
+       if ($in_buffers[$idbuffer]->{'EXPECT'} eq "HIGH") {
+           $in_buffers[$idbuffer]->{'BYTE'}=$nibble;
+           $in_buffers[$idbuffer]->{'EXPECT'}="LOW";
+       } else {
+           $in_buffers[$idbuffer]->{'BYTE'}=(($in_buffers[$idbuffer]->{'BYTE'} << 4)|($nibble));
+           $in_buffers[$idbuffer]->{'EXPECT'}="HIGH";
+           process_byte($idbuffer, $in_buffers[$idbuffer]->{'BYTE'});
+           $in_buffers[$idbuffer]->{'COUNT'}++;
+           if ($in_buffers[$idbuffer]->{'BYTE'} eq '\n') {
+              $in_buffers[$idbuffer]->{'LINES'}++;
+       	}
+       }
+    }
+}
+
+sub process_byte {
+    my ($idbuffer,$byte) = @_;
+    my $char=chr($byte);
+    my $out=$out_buffers[$idbuffer];
+    if (($out->{'POS'} >= $width_buffer )
+	|| ($char eq "\n")){
+        my $pos=$out->{'POS'};
+	flush_output();
+        if ($char eq "\n" && !$lf_does_cr) {
+            $out->{'POS'}=$pos;
+	}
+    }
+    if ($char eq "\r") {
+       $out->{'POS'}=0;   
+    } elsif ($char eq "\t") {
+	my $chars = min(8,$width_buffer-$out->{'POS'});
+	substr($out->{'BUFFER'},$out->{'POS'},$chars)=" " x $chars;
+        $out->{'POS'} += $chars;
+    } elsif (($char ne "\n")||($width_buffer == 0)) {
+        substr($out->{'BUFFER'},$out->{'POS'},1)=$char;
+        $out->{'POS'} ++;
+    }
+
+}
+
+sub flush_output {
+    for $i (0..$num_buffers-1) {
+        my @time = localtime();
+        my $time = localtime();
+        my $year2 = $time[5] % 100;
+        my $buffer =  $out_buffers[$i]->{'BUFFER'};
+        
+
+        if ($out_buffers[$i]->{'LTRIM'} ) {
+	    $buffer =~ s/^\s*//;
+        } 
+
+        if ($out_buffers[$i]->{'RTRIM'} ) {
+	    $buffer =~ s/\s*$//;
+        } 
+
+        if (  (!defined($out_buffers[$i]->{'OMITREGEX'})) 
+               || ($buffer !~ $out_buffers[$i]->{'OMITREGEX'}) ) {
+            printf { $out_buffers[$i]->{'FILEHANDLE'} } 
+               $out_buffers[$i]->{'FORMAT'},
+             
+               $buffer,
+               $out_buffers[$i]->{'POS'},
+               $width_buffer,
+               $i,
+               $line++,
+               $in_buffers[$i]->{'COUNT'},
+               $in_buffers[$i]->{'LINES'},
+               $in_buffers[$i]->{'EXPECT'},
+               $in_buffers[$i]->{'BYTE'},
+               @time,
+               $year2,
+               $time;
+	     }
+	$out_buffers[$i]->{'BUFFER'} = (" " x $width_buffer) ;
+        $out_buffers[$i]->{'POS'} = 0;
+    };
+
+ #   print("\n");
+
+    for $fhr (values(%handles)) {
+        $fhr->flush();
+    }
+    STDOUT->flush();
+    STDERR->flush();
+}
+
+sub end() {
+    for $i (0..$num_buffers-1) {
+        if ( $out_buffers[$i]->{'POS'} > 0 ) {
+	    flush_output();
+            last;
+	}
+    }
+    for $fhr (values(%handles)) {
+	close $fhr;
+    }
+}
+
+parse_args();
+init();
+while(<>){
+    input($_);
+}
+end();
+
+
+__END__
+=head1 NAME
+
+demux_Debug.pl - demultiplexing coreboot output when CONFIF_DEBUG_MUXED to something readable
+
+=head1 SYNOPSYS
+
+sample [modes] [options] 
+
+
+modes:     
+     --chat
+     -m | --multicolumn 
+     --classic 
+
+options:
+
+     -b | --buffers Number of buffers. 
+     -w | --width Maximum size of each buffer. 
+     -f | format  perl's sprintf format expression for outputting each buffer.
+     -n | nthformat format for specific buffers. 
+     -o | --outputs filenames or perl open expressions for the especified buffers.
+     --rtrim         remove all trailing whitespace before writing an output buffer
+     --nortrim       ouput all trailing whitespace
+     --ltrim         remove all leading whitespace before writing an output buffer
+     --noltrim       output all leading whitespace
+     -e | --empty_skipped Don't write anything when an outputbuffer is empty after any trimming
+     --omit_regex    "none" or a regex to suppress matching output
+     --help          this text
+
+
+=head1 OPTIONS 
+
+=over 8
+
+
+=item B<--chat>  
+
+output messages from each core (until newline at any core) 
+on a line, prefixed with the node/core number, a little ressembling
+an IRC session where each participant presses all participants 
+enter key at once (ignores empty lines).
+
+Equivalent to (in bash) 
+
+-f $'core%4$1d:%1s\n' -w 2000 --rtrim --noltrim --omit_regex '^\s*$'
+ 
+
+
+=item B<-m | --multicolumn> 
+
+output mesages from each core side by side, roughly 
+keeping simultaneus output in the same line. 
+Reserves a part of a line for each core even if its
+empty and outputs a new line each time any core outputs 
+a new line or fills its reserved space in its column.
+
+Equivalent to (in bash)
+
+-f '%*3$s|' -n -1=$'%*3$s'\n' -w 19 --nortrim --noltrim --omit_regex none
+
+
+
+=item B<--classic>
+
+Try to recover output similar to coreboot without CONFIG_DEBUG_MUXED
+Outputs one character after another as they come mixing output from all cores.
+Might be useful with --outputs 
+
+Equivalent to 
+
+-f '%s' -w 0 --nortrim --noltrim --omit_regex '^$(!:^)'
+
+
+=item B<-b | --buffers>
+
+Number of buffers. Usually it'd be the number 
+of cores. The "protocol" maximum is 16. The 
+bitwise negation of the high 4 bits in each byte 
+indicate the destination buffer.
+
+
+=item B<-w | --width>
+
+Maximum size of each buffer. When this number of bytes
+have been demuxed without a newline, all buffers are
+flushed to output regardless. Width of each column in 
+characters when in multicolumn mode. -w 0 also turns on 
+output of newline characters as received. Other values 
+eat newlines and cause an output buffer flush instead, 
+which can output newline or not depending on formats
+
+ 
+=item B<-f | format>
+
+perl's sprintf format expression for outputting each buffer.
+See perldoc -f sprintf for details. Arguments are:
+
+=over 3             
+
+=item 1 
+
+output characters in this buffer (0-width characters)
+
+=item 2 
+
+number of putput characters in this buffer (length of 1)
+
+=item 3 
+
+width of buffer (-w value) 
+
+=item 4 
+
+buffer / core / node number (0-15)
+
+=item 5 
+
+ouput line number (roughly, number of times output 
+buffers have been flushed)
+
+=item  6 
+
+count of charaacters seen so far for this buffer.
+
+=item 7
+
+count of lines seen so far in this buffer (newline count)
+
+=item 8 
+
+state of this buffer HIGH if it is expecting a new char 
+or LOW if it is expecting the final half of a character
+when it has seen the first half already in this buffer
+
+=item 9 
+
+last byte (if 8=HIGH) or half byte (when 8=LOW) 
+seen so far in this buffer
+
+=item 10
+
+second of start of this buffer flush. See perldoc -f localtime
+
+=item 11
+
+minute of start of this buffer flush. See perldoc -f localtime
+
+=item  12
+
+hour of start of this buffer flush. See perldoc -f localtime
+
+=item 13
+
+day in month of start of this buffer flush. See perldoc -f localtime
+
+=item  14
+
+month of start of this buffer flush. See perldoc -f localtime
+
+=item  15
+
+ year (4 digits) of start of this buffer flush. See perldoc -f localtime
+
+=item 16
+
+day in week of this buffer flush. See perldoc -f localtime
+
+=item  17
+
+day in year of start of this buffer flush. See perldoc -f localtime
+
+=item  18
+
+daylight saving time of start of this buffer flush. See perldoc -f localtime
+
+=item  19
+
+year (2 digits) of start of this buffer flush. 
+
+=item 20
+
+date & time locale independent string of start of this buffer flush. 
+see man 3 ctime or perldoc -f localtime (in scalar context) 
+
+=back 
+
+=item B<-n | nthformat>
+
+format for specific buffers. It must be followed by <n>=<format>
+where <n> is a comma separated list of ranges and each range
+if either a number or two numbers separated by dash. numbers must
+be between - (number of buffers) (see -b) and  + (number of buffers -1).
+Negative numbers count from the last buffer backwards. <format> 
+if a output buffer format (see -f) that will be applied exclusively 
+to these buffers. This argument can be repeated as many times as needed
+to use different formats for different buffers.
+
+=item B<-o | --outputs>
+
+filenames or perl open expressions for the especified buffers.
+Must be followed by <n>=<output> where <n> is like for -n and 
+<output> is either a filename like "output.log" or an output
+file open expression (like ">>previous.log" or "|grep ERROR").
+In fact <output> is a format for perl sprintf that will give
+the open expression, so it can be made to include core/node number
+or date etc. See perldoc -f sprintf and perldoc -f open 
+The arguments passed to sprintf will be:
+
+=over 3 
+ 
+=item 1 
+
+width of buffer (-w value) 
+
+=item 2 
+
+number of buffer 
+
+=item 3
+
+second of program invocation (roughly). See perldoc -f localtime
+
+=item 4 
+
+minute of program invocation (roughly). See perldoc -f localtime
+
+=item 5 
+
+hour of program invocation (roughly). See perldoc -f localtime
+
+
+=item 6 
+
+day in month of program invocation (roughly). See perldoc -f localtime
+
+=item 7 
+
+month of program invocation (roughly). See perldoc -f localtime
+
+=item 8 
+
+year (4 digits) of program invocation (roughly). See perldoc -f localtime
+
+=item 9 
+
+day in week of this buffer flush. See perldoc -f localtime
+
+=item 10
+
+day in year of program invocation (roughly). See perldoc -f localtime
+
+=item 11
+
+daylight saving time of program invocation (roughly). See perldoc -f localtime
+
+=item 12
+
+year (2 digits) of program invocation (roughly). 
+
+=item 13
+
+date & time locale independent string of program invocation (roughly). 
+
+=back
+
+=item B<--rtrim>
+
+remove all trailing whitespace before writing an output buffer
+
+=item B<--nortrim>
+
+output all trailing whitespace
+
+
+=item B<--ltrim>
+
+remove all leading whitespace before writing an output buffer
+
+=item B<--noltrim>
+
+output all leading whitespace
+
+=item B<-e | --empty_skipped>
+
+Don't write anything when an outputbuffer is empty after any trimming
+Equivalent to --omit-regex '^\s*$'
+                     
+=item B<--omit_regex>
+
+If followed by "none" prints all output. Otherwise it is a perl reglar 
+expression that causes ouput to be suprresed when the contents of a buffer
+matches it.
+
+=item B<--lf_does_cr>
+
+When it gets a new line ('\n', ASCII 10, 0xA) it returns the buffer position
+to 0  even if there is no carriage return '\r'. 
+
+=item B<--nolf_does_cr>
+
+When it gets a new line ('\n', ASCII 10, 0xA) it leaves the first characters
+of next line empty to keep the current position in the buffer, unless a carriage
+return changes teh current position back to 0 
+
+=item B<--help>
+
+this text
+
+
+=back 
+
+
+=head1 DESCRIPTION
+
+B<demux_debug.pl> will read standard input and write to standard ouput (or the
+specified files) after separating characters from each core. It expects it's
+inputs to be encoded as coreboot does with COFIG_DEBUG_MUXED, that is, 
+each byte is split in two. The 4 high bits give the buffer id (core/node number, 
+for instance) but it is bitwise negated to try to avoid problems with 
+characters near 0 being interpreted as control characters. the low 4 bits are
+the high 4 bits of the transmitted char (in the first byte) or the 4 low bits
+of the transmitted char (in the second byte). 
+This way concurrent output from different cores can become readable again.
+
+It keeps a buffer for each core (or each buffer id) and outputs all
+buffers to the specified file(s) or standard output when either it
+gets a newline ('\n', ASCII 10, 0xA) from any one of the cores or any
+of the buffers fills.
+
+This allows for different formats of output, like storing the output of each 
+core in a different file, or printing in one column per core, or in separated
+lines with a prefix indicating the core "speaking".
+
+It outputs up to 8 white spaces (until end of buffer) instead of each tab ('\t'). 
+                       
+=head1 LICENSE 
+
+GPL >= 2