From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from gateway.tuioptics.com ([213.183.22.85])
	by metis.ext.pengutronix.de with esmtp (Exim 4.72)
	id 1W46TB-00067n-E1
	for ptxdist@pengutronix.de; Fri, 17 Jan 2014 11:16:14 +0100
From: Arno Euteneuer
Date: Fri, 17 Jan 2014 10:16:02 +0000
Message-ID: <49BD654B8E52F64BBC06DF3F86638DFAF59FF7@MSE1MUC.toptica.com>
Content-Language: de-DE
MIME-Version: 1.0
Subject: [ptxdist] sporadic crashes of shell commands
Reply-To: ptxdist@pengutronix.de
List-Id: PTXdist Development Mailing List
Content-Type: multipart/mixed; boundary="===============2140760013=="
Sender: ptxdist-bounces@pengutronix.de
Errors-To: ptxdist-bounces@pengutronix.de
To: "ptxdist@pengutronix.de"

--===============2140760013==
Content-Language: de-DE
Content-Type: multipart/alternative;
	boundary="_000_49BD654B8E52F64BBC06DF3F86638DFAF59FF7MSE1MUCtopticacom_"

--_000_49BD654B8E52F64BBC06DF3F86638DFAF59FF7MSE1MUCtopticacom_
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: quoted-printable

Hello all,

We are fighting sporadic segmentation faults and illegal instruction
crashes of standard shell commands like lsusb, lsmod, cp, du etc. on a
rootfs built with ptxdist and the OSELAS toolchains. The problems seem
to disappear when we use a different toolchain (ELDK).

Any ideas about what could be wrong will be highly appreciated.

Some details:
I'm working on a project using a TAM3517 module from Technexion, which
incorporates an AM3517 Cortex-A8 from TI.
The kernel we are using is, for some reason, 3.7.0-rc8.
We're using ptxdist-2012.12.0 and tried OSELAS-Toolchains 2011.11.3 and
2012.12.1 with softfp.
Our build systems run either Ubuntu (64-bit, virtual machine on a Linux
or Windows host) or Fedora (native).

A while ago we noticed that commands such as lsmod would sometimes
crash on the target with an illegal instruction or a segfault, only to
work correctly again a second later. We also sometimes got kicked out
of our SSH session on the target for no obvious reason. At first this
happened very rarely. However, after investigating, we are now able to
trigger these faults with a simple shell script. The script executes a
few more or less arbitrarily selected commands (du, lsmod, lsusb,
cp /boot/uImage /tmp/) in an endless loop and collects their stderr
output in a logfile. We usually start about 10 instances of the script
in parallel (with &), and after a few minutes we find several reports
of illegal instructions and/or segmentation faults in the logfile.

We ruled out our hardware as the origin of the problem because we ran
the tests on Technexion's own Twister boards. Furthermore, we got rid
of the problem by using a different, non-OSELAS toolchain.

We also ruled out the kernel, because running our kernel (built with
the OSELAS toolchain) with an ELDK rootfs showed no problems.

We ruled out version problems of gcc or glibc because the problems
appeared with OSELAS-Toolchain 2012.12.1
(gcc-4.7.3-glibc-2.16.0-binutils-2.22-kernel-3.6) and with
OSELAS-Toolchain 2011.11.3
(gcc-4.6.2-glibc-2.14.1-binutils-2.21.1a-kernel-2.6.39), but NOT with
the ELDK-5.4 toolchain (gcc-4.7.2-glibc-2.16.0-kernel-3.6).

The rootfs we are using now does not show any illegal instructions or
segfaults with our test script. It was generated by the same
ptxdist-2012.12.0 as before, with the same configuration for platform,
kernel and so on. Only the toolchain was replaced by the one from
ELDK-5.4
(ftp://ftp.denx.de/pub/eldk/5.4/targets/armv7a/eldk-eglibc-i686-arm-toolchain-gmae-5.4.sh).

I would love to understand what is causing these problems and would
highly appreciate any suggestions.
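For readers who want to try reproducing this, a stress script of the kind described above could look roughly like the sketch below. It is not the original script: the names LOG, MAX_ITER, run_logged, stress_loop and start_instances, the log path and the `du` argument are all assumptions made for illustration. Because a shell usually prints "Segmentation fault" on its own stderr rather than on the command's, the sketch also records exit statuses above 128, which normally mean the command was killed by a signal (132 for SIGILL and 139 for SIGSEGV on Linux).

```shell
#!/bin/sh
# Sketch of a crash stress loop (NOT the original script).
# LOG, MAX_ITER and all function names are hypothetical.

LOG="${LOG:-/tmp/crashtest.log}"
MAX_ITER="${MAX_ITER:-0}"    # 0 means loop forever

# Run one command, discard stdout and append its stderr to $LOG.
# An exit status above 128 usually means "killed by signal (status - 128)",
# so that case is logged explicitly as well.
run_logged() {
    "$@" >/dev/null 2>>"$LOG"
    status=$?
    if [ "$status" -gt 128 ]; then
        echo "$(date) '$*' killed by signal $((status - 128))" >>"$LOG"
    fi
    return 0
}

# One instance of the loop over the commands named in the report.
stress_loop() {
    i=0
    while [ "$MAX_ITER" -eq 0 ] || [ "$i" -lt "$MAX_ITER" ]; do
        run_logged du .                      # directory argument is an assumption
        run_logged lsmod
        run_logged lsusb
        run_logged cp /boot/uImage /tmp/
        i=$((i + 1))
    done
}

# Start $1 instances in the background and wait for them.
start_instances() {
    n=0
    while [ "$n" -lt "$1" ]; do
        stress_loop &
        n=$((n + 1))
    done
    wait
}
```

Calling `start_instances 10` at the end of the script would match the "about 10 instances in parallel" setup; afterwards, `grep "killed by signal" "$LOG"` lists the crashes, with signal 4 indicating an illegal instruction and signal 11 a segmentation fault.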
Arno Euteneuer

--_000_49BD654B8E52F64BBC06DF3F86638DFAF59FF7MSE1MUCtopticacom_--

--===============2140760013==
Content-Type: text/plain; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Content-Disposition: inline

-- 
ptxdist mailing list
ptxdist@pengutronix.de
--===============2140760013==--