Mail archive of the ptxdist mailing list
* [ptxdist] sporadic crashes of shell commands
From: Arno Euteneuer @ 2014-01-17 10:16 UTC
  To: ptxdist


Hello everyone,

We are fighting sporadic segmentation faults and illegal instruction crashes of standard shell commands like lsusb, lsmod, cp, du etc. on a rootfs built with ptxdist and OSELAS-Toolchains. The problems seem to disappear when we use a different toolchain (ELDK).

Any ideas about what could be wrong would be highly appreciated.

Some details:
I'm working on a project using a TAM3517 module from Technexion, which incorporates a TI AM3517 (Cortex-A8).
For some reason, the kernel we are using is a 3.7.0-rc8.
We're using ptxdist-2012.12.0 and have tried OSELAS-Toolchains 2011.11.3 and 2012.12.1 with softfp.
Our build systems run either Ubuntu (64-bit, in a virtual machine on a Linux or Windows host) or Fedora (natively).

A while ago we noticed that commands like lsmod would sometimes crash on the target with an illegal instruction or a segfault, only to work correctly again a second later. We also sometimes got kicked out of our ssh session on the target for no obvious reason. At first this happened very rarely. However, after investigating, we are now able to provoke these faults with a simple shell script. The script executes a few, more or less arbitrarily selected, commands (du, lsmod, lsusb, cp /boot/uImage /tmp/) in an endless loop and collects their stderr output in a logfile. We usually start about 10 instances of the script in parallel (with &), and after a few minutes we find several reports of illegal instructions and/or segmentation faults in the logfile.
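The test script itself is not part of the mail. A minimal sketch of such a stress test, assuming a POSIX sh (e.g. BusyBox ash) on the target, is shown below; LOGFILE, INSTANCES, ITERATIONS and the function names are hypothetical. Rather than relying on the shell's own "Segmentation fault" message, it checks exit statuses: a command killed by a signal returns 128 plus the signal number, so on Linux SIGILL (4) shows up as 132 and SIGSEGV (11) as 139.

```sh
#!/bin/sh
# Sketch of the stress test described above (hypothetical reconstruction,
# not the original script). Knobs can be overridden from the environment.
LOGFILE=${LOGFILE:-/tmp/stress.log}
INSTANCES=${INSTANCES:-10}
ITERATIONS=${ITERATIONS:-0}   # 0 = loop forever, like the original test

# Run one command, discard stdout, append stderr to the log, and record
# deaths by signal (exit status > 128 means "killed by signal rc-128").
run_cmd() {
    "$@" >/dev/null 2>>"$LOGFILE"
    rc=$?
    if [ "$rc" -gt 128 ]; then
        echo "$(date '+%F %T') '$*' killed by signal $((rc - 128))" >>"$LOGFILE"
    fi
}

# One worker: the command mix named in the mail, repeated.
stress_worker() {
    n=0
    while [ "$ITERATIONS" -eq 0 ] || [ "$n" -lt "$ITERATIONS" ]; do
        run_cmd du
        run_cmd lsmod
        run_cmd lsusb
        run_cmd cp /boot/uImage /tmp/
        n=$((n + 1))
    done
}

# Start INSTANCES workers in the background and wait for all of them.
main() {
    i=0
    while [ "$i" -lt "$INSTANCES" ]; do
        stress_worker &
        i=$((i + 1))
    done
    wait
}

# Only generate load when called explicitly as "sh stress.sh run".
if [ "${1:-}" = run ]; then
    main
fi
```

Afterwards, "grep 'killed by signal' /tmp/stress.log" lists every crash together with the command that died.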

We ruled out hardware problems as the cause because for the tests we used Technexion Twister boards. Furthermore, we got rid of the problem by using a different, non-OSELAS toolchain.

We also ruled out the kernel because running our kernel (built with OSELAS-Toolchain) in an ELDK rootfs showed no problems.

We ruled out version-specific gcc or glibc problems because the problems appeared with both OSELAS-Toolchain 2012.12.1 (gcc-4.7.3-glibc-2.16.0-binutils-2.22-kernel-3.6) and OSELAS-Toolchain 2011.11.3 (gcc-4.6.2-glibc-2.14.1-binutils-2.21.1a-kernel-2.6.39), but NOT with the ELDK-5.4 toolchain (gcc-4.7.2-glibc-2.16.0-kernel-3.6).

The rootfs we are using now no longer shows any illegal instructions or segfaults with our test script. It was generated by the same ptxdist-2012.12.0 as before, with the same configuration for platform, kernel and so on. Only the toolchain was replaced by the one from ELDK-5.4 (ftp://ftp.denx.de/pub/eldk/5.4/targets/armv7a/eldk-eglibc-i686-arm-toolchain-gmae-5.4.sh).

I would love to understand what is causing these problems and would highly appreciate any suggestion.

Arno Euteneuer

-- 
ptxdist mailing list
ptxdist@pengutronix.de

* Re: [ptxdist] sporadic crashes of shell commands
From: Tim Niemeyer @ 2014-01-17 10:45 UTC
  To: ptxdist

Hi Arno

On 17.01.2014 11:16, Arno Euteneuer wrote:
> We are fighting sporadic segmentation faults and illegal instruction
> crashes of standard shell commands like lsusb, lsmod, cp, du etc. on
> a rootfs built with ptxdist and OSELAS-Toolchains. The problems seem
> to disappear when we use a different toolchain (ELDK).
I discovered similar problems on an AM37xx and OMAP35xx with Linux 3.4
and 3.3.

> Any ideas about what could be wrong would be highly appreciated.
Did you try enabling all the kernel ARM errata workarounds?
At least for the OMAP3503D, we found that it suffers from more errata than
the kernel documentation says: https://lkml.org/lkml/2013/2/18/349

With all errata enabled, the AM37xx was (in my opinion) more stable.

> We also ruled out the kernel because running our kernel (built with
> OSELAS-Toolchain) in an ELDK rootfs showed no problems.
[..]
>  I would love to understand what is causing these problems and would
> highly appreciate any suggestion.
Yes, but you could try the errata, and if they help, they may give you
a good hint as to what the real problem (maybe in the toolchain) is.

Tim Niemeyer

* Re: [ptxdist] sporadic crashes of shell commands
From: Jürgen Beisert @ 2014-01-17 11:15 UTC
  To: ptxdist

Hi Arno,

On Friday 17 January 2014 11:16:02 Arno Euteneuer wrote:
> [...]
> A while ago we noticed that commands like lsmod would sometimes crash
> on the target with an illegal instruction or a segfault, only to work
> correctly again a second later. We also sometimes got kicked out of our
> ssh session on the target for no obvious reason. At first this happened
> very rarely. However, after investigating, we are now able to provoke
> these faults with a simple shell script. The script executes a few, more
> or less arbitrarily selected, commands (du, lsmod, lsusb, cp /boot/uImage
> /tmp/) in an endless loop and collects their stderr output in a logfile.
> We usually start about 10 instances of the script in parallel (with &),
> and after a few minutes we find several reports of illegal instructions
> and/or segmentation faults in the logfile.

Is there a correlation between the memory size reported to the kernel and
the memory devices actually soldered onto the board? ;)

> [...]

Regards,
Juergen

-- 
Pengutronix e.K.                              | Juergen Beisert             |
Linux Solutions for Science and Industry      | http://www.pengutronix.de/  |

* Re: [ptxdist] sporadic crashes of shell commands
From: Arno Euteneuer @ 2014-01-17 12:50 UTC
  To: ptxdist

Hi Jürgen,

> [...]
> Is there a correlation between the memory size reported to the kernel
> and the memory devices actually soldered onto the board? ;)
> 
> [...]
> 
I hope there is a very strong correlation ;-)

We have 256MB DDR2 RAM and 512MB NAND flash and get the following:

root@dlcpro:~ cat /proc/meminfo
MemTotal:         235876 kB
MemFree:          169852 kB
Buffers:               0 kB
Cached:            27988 kB
SwapCached:            0 kB
Active:            15648 kB
Inactive:          22560 kB
Active(anon):      10300 kB
Inactive(anon):      104 kB
Active(file):       5348 kB
Inactive(file):    22456 kB
Unevictable:           0 kB
Mlocked:               0 kB
SwapTotal:             0 kB
SwapFree:              0 kB
Dirty:                 0 kB
Writeback:             0 kB
AnonPages:         10248 kB
Mapped:            29924 kB
Shmem:               184 kB
Slab:              11100 kB
SReclaimable:       4848 kB
SUnreclaim:         6252 kB
KernelStack:         544 kB
PageTables:          336 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:      117936 kB
Committed_AS:      73872 kB
VmallocTotal:     761856 kB
VmallocUsed:       27740 kB
VmallocChunk:     639888 kB

This looks very much the same with either kernel, regardless of the toolchain used.
It looks OK to me, doesn't it? (Although I must admit I'm not really sure what VmallocTotal tells me and whether it is correct for it to be so large. Does it correspond to RAM + flash?)
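On the VmallocTotal question: VmallocTotal is the size of the kernel's vmalloc area, which is a range of kernel virtual addresses, not an amount of RAM or flash. The following worked example reproduces the reported figure under stated assumptions: the default 3G/1G split (PAGE_OFFSET = 0xC0000000), all 256 MB of RAM mapped as lowmem, and the 32-bit ARM definitions used by kernels of this period (VMALLOC_START = end of lowmem plus an 8 MB gap, rounded to 8 MB; VMALLOC_END = 0xFF000000). The variable names mirror the kernel macros, but the script is only an illustration:

```sh
#!/bin/sh
# Worked arithmetic for VmallocTotal on 32-bit ARM (assumed constants, see text).
PAGE_OFFSET=$((0xC0000000))          # 3G/1G user/kernel split
LOWMEM=$((256 * 1024 * 1024))        # 256 MB DDR2, all of it lowmem
VMALLOC_OFFSET=$((8 * 1024 * 1024))  # 8 MB guard gap after lowmem
VMALLOC_END=$((0xFF000000))

# End of the kernel's linear RAM mapping
high_memory=$((PAGE_OFFSET + LOWMEM))
# Start of vmalloc space: end of lowmem + gap, aligned down to 8 MB
vmalloc_start=$(((high_memory + VMALLOC_OFFSET) & ~(VMALLOC_OFFSET - 1)))
vmalloc_total_kb=$(((VMALLOC_END - vmalloc_start) / 1024))

printf 'VMALLOC_START = 0x%08X\n' "$vmalloc_start"
echo "VmallocTotal  = ${vmalloc_total_kb} kB"
echo "RAM + flash   = $(((256 + 512) * 1024)) kB"
```

This prints VMALLOC_START = 0xD0800000 and VmallocTotal = 761856 kB, which matches the meminfo output above, whereas RAM + flash would be 786432 kB.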

Best regards
Arno


* Re: [ptxdist] sporadic crashes of shell commands
From: Arno Euteneuer @ 2014-01-17 13:04 UTC
  To: ptxdist

Hi Tim,

> I discovered similar problems on an AM37xx and OMAP35xx with Linux 3.4
> and 3.3.
> 
> > Any ideas about what could be wrong would be highly appreciated.
> Did you try enabling all the kernel ARM errata workarounds?
> At least for the OMAP3503D, we found that it suffers from more errata than
> the kernel documentation says: https://lkml.org/lkml/2013/2/18/349
> 
> With all errata enabled, the AM37xx was (in my opinion) more stable.
>
That's very interesting.
 
> > We also ruled out the kernel because running our kernel (built with
> > OSELAS-Toolchain) in an ELDK rootfs showed no problems.
> [..]
> >  I would love to understand what is causing these problems and would
> > highly appreciate any suggestion.
> Yes, but you could try the errata, and if they help, they may give you
> a good hint as to what the real problem (maybe in the toolchain) is.
> 
Yes, that's certainly worth a try.
I haven't played around with the errata yet, as I'm not deep enough into that topic and had hoped these settings had already been optimized by others.

Thanks for that hint!!

Best regards
Arno

* Re: [ptxdist] sporadic crashes of shell commands
From: Jürgen Beisert @ 2014-01-17 13:20 UTC
  To: ptxdist


On Friday 17 January 2014 13:50:45 Arno Euteneuer wrote:
> Hi Jürgen,
>
> > [...]
> > Is there a correlation between the memory size reported to the kernel
> > and the memory devices actually soldered onto the board? ;)
> >
> > [...]
>
> I hope there is a very strong correlation ;-)

:)

> We have 256MB DDR2 RAM and 512MB NAND flash and get the following:
>
> root@dlcpro:~ cat /proc/meminfo
> MemTotal:         235876 kB
> MemFree:          169852 kB
> Buffers:               0 kB
> Cached:            27988 kB
> SwapCached:            0 kB
> Active:            15648 kB
> Inactive:          22560 kB
> Active(anon):      10300 kB
> Inactive(anon):      104 kB
> Active(file):       5348 kB
> Inactive(file):    22456 kB
> Unevictable:           0 kB
> Mlocked:               0 kB
> SwapTotal:             0 kB
> SwapFree:              0 kB
> Dirty:                 0 kB
> Writeback:             0 kB
> AnonPages:         10248 kB
> Mapped:            29924 kB
> Shmem:               184 kB
> Slab:              11100 kB
> SReclaimable:       4848 kB
> SUnreclaim:         6252 kB
> KernelStack:         544 kB
> PageTables:          336 kB
> NFS_Unstable:          0 kB
> Bounce:                0 kB
> WritebackTmp:          0 kB
> CommitLimit:      117936 kB
> Committed_AS:      73872 kB
> VmallocTotal:     761856 kB
> VmallocUsed:       27740 kB
> VmallocChunk:     639888 kB
>
> This looks very much the same with either kernel, regardless of the
> toolchain used. It looks OK to me, doesn't it? (Although I must admit
> I'm not really sure what VmallocTotal tells me and whether it is correct
> for it to be so large. Does it correspond to RAM + flash?)

Looks okay in your case. Some time ago we faced similar failures once the
system's memory consumption started to grow. As long as it used only the memory
that was really present, everything was fine, but when it hit the border to the
non-existent memory, unpredictable things happened, similar to what you see.
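A simple way to run this sanity check is to compare MemTotal with the nominal size of the RAM that is actually fitted. According to the kernel's /proc documentation, MemTotal is the usable RAM (physical RAM minus some reserved areas and the kernel binary), so it should be somewhat below the nominal size and never above it. A minimal sketch with a hypothetical helper name, taking the nominal size in kB:

```sh
#!/bin/sh
# Hypothetical helper: compare MemTotal with the RAM actually fitted.
# Usage: check_mem <nominal_kB> [meminfo_file]
check_mem() {
    nominal_kb=$1
    meminfo=${2:-/proc/meminfo}
    total_kb=$(awk '/^MemTotal:/ { print $2 }' "$meminfo")
    echo "MemTotal ${total_kb} kB, fitted ${nominal_kb} kB, difference $((nominal_kb - total_kb)) kB"
    # More usable RAM than is soldered on means the kernel was told a wrong
    # memory size (e.g. by the bootloader, a mem= argument or the device tree).
    if [ "$total_kb" -gt "$nominal_kb" ]; then
        echo "WARNING: kernel sees more RAM than is fitted" >&2
        return 1
    fi
    return 0
}
```

For the board in this thread, check_mem $((256 * 1024)) would print "MemTotal 235876 kB, fitted 262144 kB, difference 26268 kB" with the meminfo values quoted above.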

Regards,
Juergen

* Re: [ptxdist] sporadic crashes of shell commands
From: Arno Euteneuer @ 2014-01-20  7:49 UTC
  To: ptxdist

Good morning,
As Tim suggested, I enabled all available errata in my kernel configuration and rebuilt the kernel and rootfs with my old OSELAS toolchain. I didn't get a single illegal instruction or segfault from our test scripts over the weekend, even though the CPU was running at 100% load the whole time.
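For reference, "enable all errata" can be applied to an existing kernel configuration by turning every "# CONFIG_ARM_ERRATA_<n> is not set" line into "CONFIG_ARM_ERRATA_<n>=y". This is a sketch under assumptions: the helper name is hypothetical, it relies on sed -i (GNU sed or BusyBox), it touches only CONFIG_ARM_ERRATA_* symbols (not, for example, the PL310 cache controller errata), and the path to the config file (in a ptxdist BSP, the platform's kernel config) has to be supplied:

```sh
#!/bin/sh
# Sketch: enable all ARM errata workarounds currently listed as "not set"
# in a kernel .config. Hypothetical helper; the config path is an argument.
enable_all_arm_errata() {
    config=$1
    # Show which options are about to be switched on.
    grep '^# CONFIG_ARM_ERRATA_[0-9]* is not set$' "$config"
    # "# CONFIG_ARM_ERRATA_430973 is not set" -> "CONFIG_ARM_ERRATA_430973=y"
    sed -i 's/^# \(CONFIG_ARM_ERRATA_[0-9]*\) is not set$/\1=y/' "$config"
}
```

Afterwards, let Kconfig re-validate the result (for example by opening and saving the configuration with ptxdist kernelconfig) before rebuilding, since some errata options depend on the CPU and platform selection.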

Thank you very much, Tim! This seems to be the trick.

I only have two questions now:

1. Why did the other toolchain help? What was its influence? Do toolchains have built-in errata workarounds for certain processors?

2. Do certain errata workarounds have disadvantages? Why are they selectable in the kernel config instead of being enabled by default in the respective architecture's sources?
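As a possible starting point for question 1, one could compare what the two toolchains actually produced for the same binary: the compiler identification in the .comment section and the ARM EABI build attributes (target architecture, Thumb ISA use, floating-point ABI) printed by readelf -A. This is only a diagnostic sketch; the helper name is hypothetical, and READELF defaults to the host's readelf but can point to the readelf of the respective cross toolchain (whose prefix depends on the toolchain):

```sh
#!/bin/sh
# Hypothetical helper: print the toolchain-related ELF metadata of a binary
# so that builds from two different toolchains can be diffed.
READELF=${READELF:-readelf}

elf_fingerprint() {
    echo "== $1"
    # Compiler identification strings recorded in the .comment section
    "$READELF" -p .comment "$1" 2>/dev/null
    # ARM EABI attributes, e.g. Tag_CPU_arch, Tag_THUMB_ISA_use, ...
    "$READELF" -A "$1" 2>/dev/null
}
```

For example, run it on the same binary (such as bin/busybox, if both root filesystems contain it) from each rootfs, redirect the output to two files and diff them.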

Best regards
Arno
