mailarchive of the ptxdist mailing list
From: Juergen Borleis <jbe@pengutronix.de>
To: ptxdist@pengutronix.de
Subject: [ptxdist] [PATCH v3 01/10] rootfs: keep /var writable, even if the rootfs is read-only
Date: Fri, 28 Jun 2019 09:48:07 +0200
Message-ID: <20190628074816.10115-2-jbe@pengutronix.de>
In-Reply-To: <20190628074816.10115-1-jbe@pengutronix.de>

Having a read-only root filesystem is always a source of pain and trouble.
Many applications and tools expect to be able to store their state or
caching data or at least their logs somewhere in the filesystem.

The '/var' directory tree has a well-known structure according to the
"Filesystem Hierarchy Standard" and is used by all carefully designed
programs. Thus, this change provides a way to keep this '/var' directory
tree writable, even if the main root filesystem is mounted read-only. It
uses an overlay filesystem and, by default, a RAM disk to store changed and
added data in this directory tree in a non-persistent manner.

Due to the nature of the overlay filesystem, the underlying files from the
main root filesystem can still be accessed.

This approach requires the overlay filesystem support from the Linux
kernel. In order to use it, the feature CONFIG_OVERLAY_FS must be enabled.

The ugly details of establishing the required overlay filesystem are hidden
behind a "mount helper" for a dummy filesystem (here called 'varoverlayfs').
Thus, a BSP can change the filesystem backing the overlay by providing its
own 'run-varoverlayfs.mount' in order to restrict the default RAM disk
differently or to switch to a different local storage.

This change also touches '/etc/fstab': the RAM disks already in use for
'/var/log', '/var/lock' and '/var/tmp' are now enabled on demand only, which
keeps backward compatibility if the overlay approach is not used.

Signed-off-by: Juergen Borleis <jbe@pengutronix.de>
---
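Note on the '/etc/fstab' handling: the following is only a minimal sketch of
the intended effect of the '@VAR_OVERLAYFS@' placeholder, not the
implementation of PTXdist's 'install_replace'. The function name
'render_fstab' and its "y" argument are hypothetical and exist only for this
illustration. With the overlay enabled the placeholder becomes '#' (the tmpfs
entry is commented out), otherwise it is removed and the entry stays active.

```shell
#!/bin/sh
# Hypothetical helper, not part of PTXdist: mimics the effect of replacing
# the '@VAR_OVERLAYFS@' placeholder in the fstab template.
#   $1:     "y" if the '/var' overlay is enabled, anything else otherwise
#   stdin:  fstab template
#   stdout: resulting fstab
render_fstab() {
    if [ "$1" = "y" ]; then
        prefix='#'    # overlay in use: comment the tmpfs entry out
    else
        prefix=''     # no overlay: keep the tmpfs entry active
    fi
    sed "s|@VAR_OVERLAYFS@|${prefix}|g"
}
```

For example, 'render_fstab y < projectroot/etc/fstab' would print the template
with the three '/var' tmpfs entries commented out.
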
 doc/daily_work.inc                            | 101 ++++++++++++++++++
 projectroot/etc/fstab                         |   6 +-
 .../lib/systemd/system/run-varoverlayfs.mount |   9 ++
 projectroot/usr/lib/systemd/system/var.mount  |  11 ++
 projectroot/usr/sbin/mount.varoverlayfs       |  11 ++
 rules/rootfs.in                               |  66 +++++++-----
 rules/rootfs.make                             |  19 +++-
 7 files changed, 191 insertions(+), 32 deletions(-)
 create mode 100644 projectroot/usr/lib/systemd/system/run-varoverlayfs.mount
 create mode 100644 projectroot/usr/lib/systemd/system/var.mount
 create mode 100644 projectroot/usr/sbin/mount.varoverlayfs
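
Note on the kernel prerequisites: a BSP could check them up front roughly as
sketched below. This is an illustration under assumptions, not part of this
patch: the function names are hypothetical, the kernel configuration is
assumed to be available as a plain text file, and the 4.18 threshold is
simply the one stated in the documentation and the menu help text.

```shell
#!/bin/sh
# Hypothetical helpers, not part of PTXdist.

# Succeeds if the kernel config file "$1" enables overlayfs support,
# either built-in (=y) or as a module (=m).
has_overlayfs() {
    grep -Eq '^CONFIG_OVERLAY_FS=(y|m)$' "$1"
}

# Succeeds if the kernel release "$1" (e.g. "4.19.56-rt1") is at least
# major version "$2" and minor version "$3".
kernel_at_least() {
    major=${1%%.*}            # text before the first dot
    rest=${1#*.}              # text after the first dot
    minor=${rest%%[!0-9]*}    # leading digits of the remainder
    [ "$major" -gt "$2" ] || { [ "$major" -eq "$2" ] && [ "$minor" -ge "$3" ]; }
}
```

On a target, 'kernel_at_least "$(uname -r)" 4 18' would test the running
kernel; where the kernel configuration can be found depends on the BSP.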

diff --git a/doc/daily_work.inc b/doc/daily_work.inc
index 74da11953..6f1525aec 100644
--- a/doc/daily_work.inc
+++ b/doc/daily_work.inc
@@ -1371,3 +1371,104 @@ in the build machine's filesystem also for the target filesystem image. With
 a different ``umask`` than ``0022`` at build-time this may fail badly at
 run-time with strange erroneous behaviour (for example some daemons with
 regular user permissions cannot acces their own configuration files).
+
+Read Only Filesystem
+--------------------
+
+A system can run a read-only root filesystem in order to have a unit which
+can be powered off at any time, without any previous shut down sequence.
+
+But many applications and tools still expect a writable filesystem, for
+example to temporarily store some kind of data or logging information. On a
+read-only root filesystem all these write attempts fail, and thus the
+applications and tools fail, too.
+
+According to the *Filesystem Hierarchy Standard 2.3* the directory tree in
+``/var/`` is traditionally writable and its content is persistent across system
+restarts. Thus, this directory tree is used by most applications and tools to
+store their data.
+
+The *Filesystem Hierarchy Standard 2.3* defines, among others, the following
+directories below ``/var/``:
+
+- ``cache/``: Application specific cache data
+- ``crash/``: System crash dumps
+- ``lib/``:   Application specific variable state information
+- ``lock/``:  Lock files
+- ``log/``:   Log files and directories
+- ``run/``:   Data relevant to running processes
+- ``spool/``: Application spool data
+- ``tmp/``:   Temporary files preserved between system reboots
+
+Although this writable directory tree is useful and valid for full-blown host
+machines, an embedded system can behave differently here: for example, a
+project may not require changed data to persist across reboots and may always
+start with empty directories.
+
+Partial RAM Disks
+~~~~~~~~~~~~~~~~~
+
+This is the default behaviour of PTXdist: it mounts a couple of RAM disks over
+directories in ``/var`` that various applications and tools expect to be
+writable. These RAM disks always start empty and are defined as follows:
+
++-------------+---------------------------------------------------------------+
+| mount point | mount options                                                 |
++=============+===============================================================+
+| /var/log    | nosuid,nodev,noexec,mode=0755,size=10%                        |
++-------------+---------------------------------------------------------------+
+| /var/lock   | nosuid,nodev,noexec,mode=0755,size=1M                         |
++-------------+---------------------------------------------------------------+
+| /var/tmp    | nosuid,nodev,mode=1777,size=20%                               |
++-------------+---------------------------------------------------------------+
+
+This is a very simple and optimistic approach and works for surprisingly many use
+cases. But some applications expect a writable ``/var/lib`` and will fail with
+this setup. Using an additional RAM disk for ``/var/lib`` might not help in
+this use case, because it will hide all build-time generated data already present
+in this directory tree (package pre-defined configuration files for example).
+
+Overlay RAM Disk
+~~~~~~~~~~~~~~~~
+
+A different approach to have a writable ``/var`` without persistency is to use
+a so-called *overlay filesystem*. This *overlay filesystem* is a transparent
+writable layer on top of a read-only filesystem. After system start the
+*overlay filesystem layer* is empty and all reads are satisfied by the
+underlying read-only filesystem. Writes (new files, directories, changes of
+existing files) are stored in the *overlay filesystem layer*, and subsequent
+reads of this data are satisfied by this layer instead of the underlying
+read-only filesystem.
+
+PTXdist supports this use case, by enabling the *overlay* feature for the
+``/var`` directory in its configuration menu:
+
+.. code-block:: text
+
+   Root Filesystem                 --->
+      directories in rootfs           --->
+           /var                          --->
+              [*]     overlay '/var' with RAM disk
+
+Keep in mind: this approach just enables write support for the ``/var`` directory
+tree, but nothing stored or changed in there at run-time is persistent; it is
+always lost when the system restarts. Each additional RAM disk also consumes
+additional main memory, and if applications and tools fill up the directory
+tree in ``/var``, the machine might run short on memory and slow down
+dramatically.
+
+Thus, it is a good idea to check the amount of data written by applications and
+tools to the ``/var`` directory tree and to limit it.
+You can limit the size of the *overlay filesystem* RAM disk as well. For this
+you can provide your own
+``projectroot/usr/lib/systemd/system/run-varoverlayfs.mount`` with restrictive
+settings. But then the applications and tools in use must handle the
+"no space left on device" error correctly.
+
+This *overlay filesystem* approach requires the *overlay filesystem feature*
+from the Linux kernel. In order to use it, the feature CONFIG_OVERLAY_FS must
+be enabled. The ``metacopy=on`` mount option used in the default
+``projectroot/usr/lib/systemd/system/var.mount`` unit requires Linux 4.18 or
+newer.
+If your kernel does not meet this requirement you can provide your own local
+and adapted variant of the mentioned mount unit.
diff --git a/projectroot/etc/fstab b/projectroot/etc/fstab
index 0121c3076..364b495a9 100644
--- a/projectroot/etc/fstab
+++ b/projectroot/etc/fstab
@@ -11,6 +11,6 @@ debugfs	/sys/kernel/debug	debugfs	noauto					0 0
 # ramdisks
 tmpfs	/tmp			tmpfs	nosuid,nodev,mode=1777,size=20%		0 0
 tmpfs	/run			tmpfs	nosuid,nodev,strictatime,mode=0755	0 0
-tmpfs	/var/log		tmpfs	nosuid,nodev,noexec,mode=0755,size=10%	0 0
-tmpfs	/var/lock		tmpfs	nosuid,nodev,noexec,mode=0755,size=1M	0 0
-tmpfs	/var/tmp		tmpfs	nosuid,nodev,mode=1777,size=20%		0 0
+@VAR_OVERLAYFS@tmpfs	/var/log		tmpfs	nosuid,nodev,noexec,mode=0755,size=10%	0 0
+@VAR_OVERLAYFS@tmpfs	/var/lock		tmpfs	nosuid,nodev,noexec,mode=0755,size=1M	0 0
+@VAR_OVERLAYFS@tmpfs	/var/tmp		tmpfs	nosuid,nodev,mode=1777,size=20%		0 0
diff --git a/projectroot/usr/lib/systemd/system/run-varoverlayfs.mount b/projectroot/usr/lib/systemd/system/run-varoverlayfs.mount
new file mode 100644
index 000000000..c067b9b96
--- /dev/null
+++ b/projectroot/usr/lib/systemd/system/run-varoverlayfs.mount
@@ -0,0 +1,9 @@
+[Unit]
+Description=Overlay for '/var'
+Before=local-fs.target
+
+[Mount]
+Where=/run/varoverlayfs
+What=tmpfs
+Type=tmpfs
+Options=size=20%
diff --git a/projectroot/usr/lib/systemd/system/var.mount b/projectroot/usr/lib/systemd/system/var.mount
new file mode 100644
index 000000000..bd6350237
--- /dev/null
+++ b/projectroot/usr/lib/systemd/system/var.mount
@@ -0,0 +1,11 @@
+[Unit]
+Description=Writable support for '/var'
+After=run-varoverlayfs.mount
+Before=local-fs.target
+
+[Mount]
+Where=/var
+# note: this is a dummy filesystem only to trigger the corresponding mount helper
+What=varoverlayfs
+Type=varoverlayfs
+Options=metacopy=on
diff --git a/projectroot/usr/sbin/mount.varoverlayfs b/projectroot/usr/sbin/mount.varoverlayfs
new file mode 100644
index 000000000..f8fc8c88f
--- /dev/null
+++ b/projectroot/usr/sbin/mount.varoverlayfs
@@ -0,0 +1,11 @@
+#!/bin/sh -e
+# Mount helper tool to mount some kind of writable filesystem over '/var'
+# (which might be read-only).
+# What kind of filesystem is used to mount over '/var' can be controlled via
+# the 'run-varoverlayfs.mount' mount unit and is usually a RAM disk.
+
+mkdir -p /run/varoverlayfs/upper
+mkdir -p /run/varoverlayfs/work
+mount -t overlay -olowerdir=/var,upperdir=/run/varoverlayfs/upper,workdir=/run/varoverlayfs/work "${@}"
+systemctl stop run-varoverlayfs.mount
+rmdir /run/varoverlayfs
diff --git a/rules/rootfs.in b/rules/rootfs.in
index f105dc477..f9951ffec 100644
--- a/rules/rootfs.in
+++ b/rules/rootfs.in
@@ -171,76 +171,90 @@ config ROOTFS_TMP
 
 menu "/var                        "
 
+config ROOTFS_VAR_OVERLAYFS
+	bool
+	prompt "overlay '/var' with RAM disk"
+	depends on INITMETHOD_SYSTEMD
+	help
+	  This lets the whole '/var' content be writable transparently via an
+	  'overlayfs'.
+	  Reading content happens from the underlying root filesystem, while
+	  changed content gets stored in a RAM disk instead. This enables all
+	  applications to read initial data (configuration files for example)
+	  and lets them change this data even if the root filesystem is read-only.
+	  Due to this behavior all changes made at run-time aren't persistent
+	  by default.
+	  Read documentation chapter 'Read Only Filesystem' for further details.
+	  In order to use the default mount units and mount options, you need
+	  to enable the 'mkdir' and 'rmdir' commands (from 'coreutils' or
+	  'busybox') and use a Linux kernel 4.18 or newer. By replacing the
+	  default files in
+	  'projectroot/usr/lib/systemd/system/run-varoverlayfs.mount',
+	  'projectroot/usr/lib/systemd/system/var.mount' and
+	  'projectroot/usr/sbin/mount.varoverlayfs' by your own variants,
+	  you can adapt these requirements.
+
 config ROOTFS_VAR_RUN
 	bool
 	select ROOTFS_RUN
 	prompt "/var/run"
 	default y
 	help
-	  This will not create a directory but a symlink to /run.
-	  Unless you want to mount a tmpfs on /var you should
-	  say yes here.
+	  Ensure a '/var/run' directory is available at run-time. This will
+	  always be a symlink to '/run'.
 
 config ROOTFS_VAR_LOG
 	bool
 	prompt "/var/log"
 	default y
 	help
-	  Create a /var/log directory in the root filesystem.
-	  Unless you want to mount a tmpfs on /var you should
-	  say yes here.
+	  This directory is intended for log files and directories. Say 'y' here
+	  to ensure a '/var/log' directory is available at run-time.
 
 config ROOTFS_VAR_LOCK
 	bool
 	prompt "/var/lock"
 	default y
 	help
-	  Create a /var/lock directory in the root filesystem.
-	  Unless you want to mount a tmpfs on /var you should
-	  say yes here.
+	  This directory is intended for application lock files. Say 'y' here
+	  to ensure a '/var/lock' directory is available at run-time.
 
 config ROOTFS_VAR_LIB
 	bool
 	prompt "/var/lib"
 	help
-	  Create a /var/lib directory in the root filesystem.
-	  Unless you want to mount a tmpfs on /var you should
-	  say yes here.
-	  If you are going to run an NFS server with file locking
-	  support this folder must be persistent!
+	  This directory is intended for application variable state information.
+	  Say 'y' here to ensure a '/var/lib' directory is available at
+	  run-time.
 
 config ROOTFS_VAR_CACHE
 	bool
 	prompt "/var/cache"
 	help
-	  Create a /var/cache directory in the root filesystem.
-	  Unless you want to mount a tmpfs on /var you should
-	  say yes here.
+	  This directory is intended for application cache data. Say 'y' here
+	  to ensure a '/var/cache' directory is available at run-time.
 
 config ROOTFS_VAR_SPOOL
 	bool
 	prompt "/var/spool"
 	help
-	  Create a /var/spool directory in the root filesystem.
-	  Unless you want to mount a tmpfs on /var you should
-	  say yes here.
+	  This directory is intended for application spool data. Say 'y' here to
+	  ensure a '/var/spool' directory is available at run-time.
 
 config ROOTFS_VAR_SPOOL_CRON
 	bool
 	prompt "/var/spool/cron"
 	help
-	  Create a /var/spool/cron directory in the root filesystem.
-	  Unless you want to mount a tmpfs on /var you should
-	  say yes here.
+	  Create a '/var/spool/cron' directory in the root filesystem.
 
 config ROOTFS_VAR_TMP
 	bool
 	prompt "/var/tmp"
 	default y
 	help
-	  Create a /var/tmp directory in the root filesystem.
-	  Unless you want to mount a tmpfs on /var you should
-	  say yes here.
+	  This directory is intended for temporary files preserved between
+	  system reboots. Say 'y' here to ensure a '/var/tmp' directory is
+	  available at run-time.
 
 endmenu
 endif # ROOTFS
diff --git a/rules/rootfs.make b/rules/rootfs.make
index 7164521a8..d7b7eccdc 100644
--- a/rules/rootfs.make
+++ b/rules/rootfs.make
@@ -30,7 +30,7 @@ $(STATEDIR)/rootfs.targetinstall:
 	@$(call install_fixup, rootfs,PRIORITY,optional)
 	@$(call install_fixup, rootfs,SECTION,base)
 	@$(call install_fixup, rootfs,AUTHOR,"Robert Schwebel <r.schwebel@pengutronix.de>")
-	@$(call install_fixup, rootfs,DESCRIPTION,missing)
+	@$(call install_fixup, rootfs,DESCRIPTION, "Filesystem Hierarchy Standard")
 
 #	#
 #	# install directories in rootfs
@@ -121,7 +121,18 @@ endif
 ifdef PTXCONF_ROOTFS_VAR_TMP
 	@$(call install_copy, rootfs, 0, 0, 01777, /var/tmp)
 endif
-
+ifdef PTXCONF_ROOTFS_VAR_OVERLAYFS
+	@$(call install_alternative, rootfs, 0, 0, 0644, \
+		/usr/lib/systemd/system/run-varoverlayfs.mount)
+	@$(call install_link, rootfs, ../run-varoverlayfs.mount, \
+		/usr/lib/systemd/system/local-fs.target.requires/run-varoverlayfs.mount)
+	@$(call install_alternative, rootfs, 0, 0, 0755, \
+		/usr/sbin/mount.varoverlayfs)
+	@$(call install_alternative, rootfs, 0, 0, 0644, \
+		/usr/lib/systemd/system/var.mount)
+	@$(call install_link, rootfs, ../var.mount, \
+		/usr/lib/systemd/system/local-fs.target.requires/var.mount)
+endif
 
 #	#
 #	# install files in rootfs
@@ -140,7 +151,9 @@ ifdef PTXCONF_ROOTFS_GSHADOW
 endif
 ifdef PTXCONF_ROOTFS_FSTAB
 	@$(call install_alternative, rootfs, 0, 0, 0644, /etc/fstab)
-endif
+	@$(call install_replace, rootfs, /etc/fstab, @VAR_OVERLAYFS@, \
+		$(call ptx/ifdef,PTXCONF_ROOTFS_VAR_OVERLAYFS,#))
+endif # PTXCONF_ROOTFS_FSTAB
 ifdef PTXCONF_ROOTFS_MTAB_FILE
 	@$(call install_alternative, rootfs, 0, 0, 0644, /etc/mtab)
 endif
-- 
2.20.1
