mailarchive of the ptxdist mailing list
From: "Ulrich Ölmann" <u.oelmann@pengutronix.de>
To: ptxdist@pengutronix.de
Subject: Re: [ptxdist] [PATCH] rootfs: keep /var writable, even if the rootfs is read-only
Date: Wed, 05 Jun 2019 11:06:31 +0200
Message-ID: <6rh894z9ig.fsf@pengutronix.de>
In-Reply-To: <20190604160020.30764-1-jbe@pengutronix.de>

Hi Jürgen,

please find some adjustments inline.

On Tue, Jun 04 2019 at 18:00 +0200, Juergen Borleis <jbe@pengutronix.de> wrote:
> Having a read-only root filesystem is always a source of pain and trouble.
> Many applications and tools expect to be able to store their state or
> caching data or at least their logs somewhere in the filesystem.
>
> The '/var' directory tree has a well known structure according to the
> "File System Hierarchy Standard" and is used by all carefully designed
> programs. Thus, this change provides a way to have this '/var' directory
> tree writable, even if the main root filesystem is mounted read-only. It
> uses an overlay filesystem and by default a RAM disk to store changed and
> added data to this directory tree in a non persistent manner.
>
> Due to the nature of the overlay filesystem the underlaying files from the
> main root filesystem can still be accessed.
>
> This approach requires the overlay filesystem support from the Linux
> kernel. In order to use it, the feature CONFIG_OVERLAY_FS must be enabled.
>
> A BSP can change the overlaying filesystem by providing its own
> 'run-varoverlay.mount' in order to restrict the used RAM disk differently
> or switch to a different local storage.
>
> Signed-off-by: Juergen Borleis <jbe@pengutronix.de>
> ---
>  doc/daily_work.inc                            | 97 +++++++++++++++++++
>  projectroot/etc/fstab                         |  6 +-
>  .../lib/systemd/system/run-varoverlayfs.mount | 10 ++
>  projectroot/usr/lib/systemd/system/var.mount  |  9 ++
>  projectroot/usr/sbin/mount.varoverlayfs       | 11 +++
>  rules/rootfs.in                               | 15 +++
>  rules/rootfs.make                             | 23 ++++-
>  7 files changed, 164 insertions(+), 7 deletions(-)
>  create mode 100644 projectroot/usr/lib/systemd/system/run-varoverlayfs.mount
>  create mode 100644 projectroot/usr/lib/systemd/system/var.mount
>  create mode 100644 projectroot/usr/sbin/mount.varoverlayfs
>
> diff --git a/doc/daily_work.inc b/doc/daily_work.inc
> index 74da11953..093f069bf 100644
> --- a/doc/daily_work.inc
> +++ b/doc/daily_work.inc
> @@ -1371,3 +1371,100 @@ in the build machine's filesystem also for the target filesystem image. With
>  a different ``umask`` than ``0022`` at build-time this may fail badly at
>  run-time with strange erroneous behaviour (for example some daemons with
>  regular user permissions cannot acces their own configuration files).
> +
> +Read Only Filesystem
> +--------------------
> +
> +A system can run a read-only root filesystem in order to have a unit which
> +can be powered off at any time, without any previous shutting down sequence.

s/shutting/shut/

> +
> +But many applications and tools are still expecting a writable filesystem to
> +temporarely store some kind of data or logging information for example. All

s/temporarely/temporarily/

> +these write attempts will fail and thus, the applications and tools will fail,
> +too.
> +
> +According to the *Filesystem Hierarchy Standard 2.3* the directory tree in
> +'/var/' is traditionally writable and its content is persistent across system
> +restarts. Thus, this directory tree is used by most applications and tools to
> +store their data.
> +
> +The *Filesystem Hierarchy Standard 2.3* defines the following directories
> +below '/var':
> +
> +- 'cache/': Application specific cache data
> +- 'crash/': System crash dumps
> +- 'lib/':   Application specific variable state information
> +- 'lock/':  Lock files
> +- 'log/':   Log files and directories
> +- 'run/':   Data relevant to run processes

s/run/running/

> +- 'spool/': Application spool data
> +- 'tmp/':   Temporary files preserved between system reboots
> +
> +Since this writable directory tree is useful and valid for full blown host

s/Since/Although/ ?

> +machines, an embedded system can behave differently here: For example the

s/For example the/for example a/

> +requirement can drop the persistency of changed data across reboots and always

s/persistency/persistence/

> +start with empty directories.
> +
> +Partially RAM Disks
> +~~~~~~~~~~~~~~~~~~~
> +
> +This is the default behaviour of PTXdist: it mounts a couple of RAM disks over
> +directories in ``/var`` expected to be writable by various applications and
> +tools. These RAM disks start alway in an empty state and are defined as follows:

s/alway/always/

> +
> ++-------------+---------------------------------------------------------------+
> +| mount point | mount options                                                 |
> ++=============+===============================================================+
> +| /var/log    | nosuid,nodev,noexec,mode=0755,size=10%                        |
> ++-------------+---------------------------------------------------------------+
> +| /var/lock   | nosuid,nodev,noexec,mode=0755,size=1M                         |
> ++-------------+---------------------------------------------------------------+
> +| /var/tmp    | nosuid,nodev,mode=1777,size=20%                               |
> ++-------------+---------------------------------------------------------------+
> +
> +This is a very simple and optimistic approach and works for surprisingly many use
> +cases. But some applications expect a writable ``/var/lib`` and will fail due
> +to this setup. Using an additional RAM disk for ``/var/lib`` might not help in
> +this use case, because it will bury at build-time generated data already present

s/bury at/bury all/

> +in this directory tree (``opkg`` package information for example or other
> +packages pre-defined configuration files).
> +
> +Overlay RAM Disk
> +~~~~~~~~~~~~~~~~
> +
> +A different approach to have a writable ``/var`` without persistency is to use

s/persistency/persistence/

> +a so called *overlay filesystem*. This *overlay filesystem* is a transparent
> +writable layer on top of the read-only filesystem. After system's start the

s/After system's start/After the system's start/

> +*overlay filesystem layer* is empty and all reads will be satisfied by the
> +underlaying read-only filesystem. Writes (new files, directories, changes of
> +existing files) are stored in the *overlay filesystem layer* and on the
> +next read satisfied by this layer instead of the underlaying read-only
> +filesystem.
> +
> +PTXdist supports this use case, by enabling the *overlay* feature for the ``/var``
> +directory in its configuration menu:
> +
> +.. code-block:: text
> +
> +   Root Filesystem                 --->
> +      directories in rootfs           --->
> +         [*]     overlay '/var' with RAM disk
> +
> +Keep in mind: this approach just enables write support to the ``/var`` directory
> +tree, but nothing stored/changed in there at run-time will be persistent and is
> +always lost if the system restarts. And each additional RAM disk consumes
> +additional main memory, and if applications and tools will fill up the directory
> +tree in ``/var`` the machine might run short on memory and slows down
> +dramatically.
> +
> +Thus, it is a good idea to check the amount of data written by applications and
> +tools to the ``/var`` directory tree and limit it by default.
> +You can limit the size of the *overlay filesystem* RAM disk as well. For this
> +you can provide your own
> +``projectroot/usr/lib/systemd/system/run-varoverlayfs.mount`` with restrictive
> +settings. But then the used applications and tools must deal with the
> +"no space left on device" error correctly...
> +
> +This *overlay filesystem* approach requires the *overlay filesystem feature*
> +from the Linux kernel. In order to use it, the feature CONFIG_OVERLAY_FS must
> +be enabled.
> diff --git a/projectroot/etc/fstab b/projectroot/etc/fstab
> index 0121c3076..c79c8de4d 100644
> --- a/projectroot/etc/fstab
> +++ b/projectroot/etc/fstab
> @@ -11,6 +11,6 @@ debugfs	/sys/kernel/debug	debugfs	noauto					0 0
>  # ramdisks
>  tmpfs	/tmp			tmpfs	nosuid,nodev,mode=1777,size=20%		0 0
>  tmpfs	/run			tmpfs	nosuid,nodev,strictatime,mode=0755	0 0
> -tmpfs	/var/log		tmpfs	nosuid,nodev,noexec,mode=0755,size=10%	0 0
> -tmpfs	/var/lock		tmpfs	nosuid,nodev,noexec,mode=0755,size=1M	0 0
> -tmpfs	/var/tmp		tmpfs	nosuid,nodev,mode=1777,size=20%		0 0
> +#log	/var/log		tmpfs	nosuid,nodev,noexec,mode=0755,size=10%	0 0
> +#lock	/var/lock		tmpfs	nosuid,nodev,noexec,mode=0755,size=1M	0 0
> +#tmp	/var/tmp		tmpfs	nosuid,nodev,mode=1777,size=20%		0 0
> diff --git a/projectroot/usr/lib/systemd/system/run-varoverlayfs.mount b/projectroot/usr/lib/systemd/system/run-varoverlayfs.mount
> new file mode 100644
> index 000000000..034dbfee1
> --- /dev/null
> +++ b/projectroot/usr/lib/systemd/system/run-varoverlayfs.mount
> @@ -0,0 +1,10 @@
> +[Unit]
> +Description=Overlay for '/var'
> +Before=local-fs.target
> +OnFailure=rescue.service
> +
> +[Mount]
> +Where=/run/varoverlayfs
> +What=tmpfs
> +Type=tmpfs
> +Options=nosuid,nodev,noexec,mode=0755,size=10%,nr_inodes=100
> diff --git a/projectroot/usr/lib/systemd/system/var.mount b/projectroot/usr/lib/systemd/system/var.mount
> new file mode 100644
> index 000000000..65bc81470
> --- /dev/null
> +++ b/projectroot/usr/lib/systemd/system/var.mount
> @@ -0,0 +1,9 @@
> +[Unit]
> +Description=Writeable support for '/var'
> +Before=local-fs.target
> +OnFailure=rescue.service
> +
> +[Mount]
> +Where=/var
> +What=varoverlayfs
> +Type=varoverlayfs
> diff --git a/projectroot/usr/sbin/mount.varoverlayfs b/projectroot/usr/sbin/mount.varoverlayfs
> new file mode 100644
> index 000000000..f50717aa3
> --- /dev/null
> +++ b/projectroot/usr/sbin/mount.varoverlayfs
> @@ -0,0 +1,11 @@
> +#!/bin/sh
> +# Mount helper tool to mount some kind of writeable filesystem over '/var'
> +# (which might be read-only).
> +# What kind of filesystem is used to mount over '/var' can be controlled via
> +# the 'run-varoverlayfs.mount' mount unit and is usually a RAM disk.
> +
> +systemctl start run-varoverlayfs.mount
> +mkdir -p /run/varoverlayfs/upper
> +mkdir -p /run/varoverlayfs/work
> +mount -t overlay overlay -olowerdir=/var,upperdir=/run/varoverlayfs/upper,workdir=/run/varoverlayfs/work /var
> +systemctl stop run-varoverlayfs.mount

Using a mount helper here feels very elegant - good idea! :-)
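
One side note: as written, the helper carries on even if one of its steps
fails. Below is a minimal sketch of a stricter variant, assuming the same
paths and unit name as in the patch. The function names overlay_opts and
mount_var_overlay are made up for illustration and are not part of PTXdist.

```shell
#!/bin/sh
# Hypothetical stricter variant of mount.varoverlayfs (illustration only).

# Print the overlayfs option string for a lower directory ($1) and a
# backing directory ($2) that holds the 'upper' and 'work' subdirectories.
overlay_opts() {
	printf 'lowerdir=%s,upperdir=%s/upper,workdir=%s/work\n' "$1" "$2" "$2"
}

# Same steps as in the patch, but return at the first failing step.
mount_var_overlay() {
	backing=/run/varoverlayfs
	systemctl start run-varoverlayfs.mount || return 1
	mkdir -p "$backing/upper" "$backing/work" || return 1
	mount -t overlay overlay -o "$(overlay_opts /var "$backing")" /var || return 1
	# As in the patch: stop the backing mount unit again afterwards.
	systemctl stop run-varoverlayfs.mount
}
```

Calling mount_var_overlay as the last line of the helper would hand a
non-zero exit status back to mount(8) whenever a step fails, so var.mount
should then fail visibly instead of leaving '/var' read-only.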

> diff --git a/rules/rootfs.in b/rules/rootfs.in
> index 04f7a5287..4d96779fa 100644
> --- a/rules/rootfs.in
> +++ b/rules/rootfs.in
> @@ -179,6 +179,21 @@ config ROOTFS_VAR
>
>  if ROOTFS_VAR
>
> +config ROOTFS_VAR_OVERLAYFS
> +	bool
> +	prompt "overlay '/var' with RAM disk"
> +	depends on INITMETHOD_SYSTEMD && !ROOTFS_VAR_VOLATILE
> +	help
> +	  This lets the whole '/var' content be writeable transparently via an
> +	  'overlayfs'.
> +	  Reading content happens from the underlaying root filesystem, while
> +	  changed content gets stored into a RAM disk instead. This enables all
> +	  applications to read initial data (configuration files for example)
> +	  and let them change this data even if the root filesystem is read-only.
> +	  Due to these behavior all changes made at run-time aren't persistent

s/these behavior/this behavior/

> +	  by default.
> +	  Read documentation chapter 'Read Only Filesystem' for further details.
> +
>  config ROOTFS_VAR_RUN
>  	bool
>  	select ROOTFS_RUN
> diff --git a/rules/rootfs.make b/rules/rootfs.make
> index ef5bba7df..aea04a7bf 100644
> --- a/rules/rootfs.make
> +++ b/rules/rootfs.make
> @@ -30,7 +30,7 @@ $(STATEDIR)/rootfs.targetinstall:
>  	@$(call install_fixup, rootfs,PRIORITY,optional)
>  	@$(call install_fixup, rootfs,SECTION,base)
>  	@$(call install_fixup, rootfs,AUTHOR,"Robert Schwebel <r.schwebel@pengutronix.de>")
> -	@$(call install_fixup, rootfs,DESCRIPTION,missing)
> +	@$(call install_fixup, rootfs,DESCRIPTION, "Filesystem Hierarchy Standard")

Is this and...

>
>  #	#
>  #	# install directories in rootfs
> @@ -100,7 +100,7 @@ ifdef PTXCONF_ROOTFS_VAR
>  	@$(call install_copy, rootfs, 0, 0, 0755, /var)
>  endif
>  ifdef PTXCONF_ROOTFS_VAR_LOG
> -	@$(call install_copy, rootfs, 0, 0, 0755, /var/log)
> +	@$(call install_copy, rootfs, 0, 0, 01777, /var/log)

... this and...

>  endif
>  ifdef PTXCONF_ROOTFS_VAR_RUN
>  	@$(call install_link, rootfs, ../run, /var/run)
> @@ -121,9 +121,13 @@ ifdef PTXCONF_ROOTFS_VAR_SPOOL_CRON
>  	@$(call install_copy, rootfs, 0, 0, 0755, /var/spool/cron)
>  endif
>  ifdef PTXCONF_ROOTFS_VAR_TMP
> -	@$(call install_copy, rootfs, 0, 0, 0755, /var/tmp)
> +	@$(call install_copy, rootfs, 0, 0, 01777, /var/tmp)

... this hunk necessary for your overlayfs approach? If not, perhaps put
them into a separate preparatory patch.

> +endif
> +ifdef PTXCONF_ROOTFS_VAR_OVERLAYFS
> +	@$(call install_alternative, rootfs, 0, 0, 0644, /usr/lib/systemd/system/run-varoverlayfs.mount)
> +	@$(call install_alternative, rootfs, 0, 0, 0755, /usr/sbin/mount.varoverlayfs)
> +	@$(call install_alternative, rootfs, 0, 0, 0644, /usr/lib/systemd/system/var.mount)
>  endif
> -
>
>  #	#
>  #	# install files in rootfs
> @@ -142,7 +146,18 @@ ifdef PTXCONF_ROOTFS_GSHADOW
>  endif
>  ifdef PTXCONF_ROOTFS_FSTAB
>  	@$(call install_alternative, rootfs, 0, 0, 0644, /etc/fstab)
> +ifndef PTXCONF_ROOTFS_VAR_OVERLAYFS
> +ifdef PTXCONF_ROOTFS_VAR_TMP
> +	@$(call install_replace, rootfs, /etc/fstab, #tmp, "tmpfs")
> +endif
> +ifdef PTXCONF_ROOTFS_VAR_LOG
> +	@$(call install_replace, rootfs, /etc/fstab, #log, "tmpfs")
> +endif
> +ifdef PTXCONF_ROOTFS_VAR_LOCK
> +	@$(call install_replace, rootfs, /etc/fstab, #lock, "tmpfs")
>  endif
> +endif # PTXCONF_ROOTFS_VAR_OVERLAYFS
> +endif # PTXCONF_ROOTFS_FSTAB
>  ifdef PTXCONF_ROOTFS_MTAB_FILE
>  	@$(call install_alternative, rootfs, 0, 0, 0644, /etc/mtab)
>  endif

Good work!

Best regards
Ulrich
--
Pengutronix e.K.                           | Ulrich Ölmann               |
Industrial Linux Solutions                 | http://www.pengutronix.de/  |
Peiner Str. 6-8, 31137 Hildesheim, Germany | Phone: +49-5121-206917-0    |
Amtsgericht Hildesheim, HRA 2686           | Fax:   +49-5121-206917-5555 |

_______________________________________________
ptxdist mailing list
ptxdist@pengutronix.de
