Hello all,

 

We are fighting sporadic segmentation faults and illegal instruction crashes in standard shell commands such as lsusb, lsmod, cp and du on a rootfs built with ptxdist and the OSELAS toolchains. The problems seem to disappear when we use a different toolchain (ELDK).

 

Any ideas about what could be wrong would be highly appreciated.

 

Some details:

I’m working on a project using a TAM3517 module from Technexion, which incorporates an AM3517 (Cortex-A8) from TI.

The kernel we are using is, for some reason, 3.7.0-rc8.

We’re using ptxdist-2012.12.0 and tried OSELAS-Toolchains 2011.11.3 and 2012.12.1 with softfp.

Our build systems run either Ubuntu (64-bit, in a virtual machine on a Linux or Windows host) or Fedora (native).

 

A while ago we noticed that commands such as lsmod would sometimes crash on the target with an illegal instruction or a segfault, only to work correctly again a second later. We also sometimes got kicked out of our ssh session on the target for no obvious reason. At first this happened very rarely. However, after investigating it, we are now able to trigger these faults with a simple shell script. The script executes a few more or less arbitrarily selected commands (du, lsmod, lsusb, cp /boot/uImage /tmp/) in an endless loop and collects the stderr output in a logfile. We usually start about 10 instances of the script in parallel (with &), and after a few minutes we find several reports of illegal instructions and/or segmentation faults in the logfile.
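For readers who want to reproduce this, here is a minimal sketch of such a loop. It is not our original script: the function name stress_loop, its arguments and the log format are made up for illustration. It relies on the usual shell convention that a command killed by signal N is reported with exit status 128+N, so on Linux 132 means SIGILL and 139 means SIGSEGV.

```shell
#!/bin/sh
# Illustrative sketch only; stress_loop and its arguments are assumed names,
# not taken from the original test script.
# usage: stress_loop LOGFILE ROUNDS CMD...   (ROUNDS=0 means loop forever)
stress_loop() {
    logfile=$1
    rounds=$2
    shift 2
    round=0
    while [ "$rounds" -eq 0 ] || [ "$round" -lt "$rounds" ]; do
        round=$((round + 1))
        for cmd in "$@"; do
            # Discard stdout, append the command's stderr to the logfile.
            sh -c "$cmd" >/dev/null 2>>"$logfile"
            status=$?
            # Death by signal N shows up as exit status 128+N:
            # 132 = SIGILL (4), 139 = SIGSEGV (11) on Linux.
            case $status in
                132) echo "round $round: '$cmd' died with SIGILL" >>"$logfile" ;;
                139) echo "round $round: '$cmd' died with SIGSEGV" >>"$logfile" ;;
            esac
        done
    done
}
```

Ten parallel instances could then be started by calling, for n from 1 to 10, something like: stress_loop /tmp/stress.$n.log 0 du lsmod lsusb "cp /boot/uImage /tmp/" & (the logfile path is again just an assumption).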

 

We ruled out hardware problems as the cause because we ran the tests on Technexion Twister boards. Furthermore, the problem went away when we switched to a different, non-OSELAS toolchain.

 

We also ruled out the kernel, because running our kernel (built with the OSELAS toolchain) with an ELDK rootfs showed no problems.

 

We ruled out the gcc and glibc versions as the cause because the problems appeared with OSELAS-Toolchain 2012.12.1 (gcc-4.7.3-glibc-2.16.0-binutils-2.22-kernel-3.6) and OSELAS-Toolchain 2011.11.3 (gcc-4.6.2-glibc-2.14.1-binutils-2.21.1a-kernel-2.6.39), but NOT with the ELDK-5.4 toolchain (gcc-4.7.2-glibc-2.16.0-kernel-3.6).

 

The rootfs we are using now no longer shows any illegal instructions or segfaults with our test script. It was generated by the same ptxdist-2012.12.0 as before, with the same configuration for platform, kernel and so on. Only the toolchain was replaced by the one from ELDK-5.4 (ftp://ftp.denx.de/pub/eldk/5.4/targets/armv7a/eldk-eglibc-i686-arm-toolchain-gmae-5.4.sh).

 

I would love to understand what is causing these problems and would highly appreciate any suggestions.

 

Arno Euteneuer