From: Juergen Borleis <jbe@pengutronix.de>
To: ptxdist@pengutronix.de
Subject: [ptxdist] [PATCH 01/20] rootfs: keep /var writable, even if the rootfs is read-only
Date: Wed, 5 Jun 2019 14:54:02 +0200
Message-ID: <20190605125421.20087-2-jbe@pengutronix.de>
In-Reply-To: <20190605125421.20087-1-jbe@pengutronix.de>
Having a read-only root filesystem is always a source of pain and trouble.
Many applications and tools expect to be able to store their state or
caching data or at least their logs somewhere in the filesystem.
The '/var' directory tree has a well-known structure according to the
"Filesystem Hierarchy Standard" and is used by all carefully designed
programs. Thus, this change provides a way to keep this '/var' directory
tree writable, even if the main root filesystem is mounted read-only. It
uses an overlay filesystem and, by default, a RAM disk to store changed and
added data in this directory tree in a non-persistent manner.
Due to the nature of the overlay filesystem, the underlying files from the
main root filesystem can still be accessed.
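To check at run-time whether '/var' is actually served by the overlay, read the filesystem type of its topmost mount from '/proc/mounts'. This is a minimal sketch: the helper name is hypothetical, and it assumes the usual '/proc/mounts' layout (source, mount point, type, ...), where later entries for the same mount point stack on top of earlier ones.

```shell
#!/bin/sh
# Hypothetical helper: print the type of the topmost filesystem mounted at
# mount point $1, read from a mounts table ($2, default /proc/mounts).
# Exits non-zero if nothing is mounted there.
mount_fstype() {
	awk -v mp="$1" '
		$2 == mp { type = $3 }   # later entries cover earlier ones
		END { if (type == "") exit 1; print type }
	' "${2:-/proc/mounts}"
}

if [ "$(mount_fstype /var)" = "overlay" ]; then
	echo "/var is an overlay"
else
	echo "/var is not an overlay"
fi
```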
This approach requires the overlay filesystem support from the Linux
kernel. In order to use it, the feature CONFIG_OVERLAY_FS must be enabled.
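A quick target-side check for this kernel feature could look like the following sketch. The function name is made up for illustration. It assumes the usual '/proc/filesystems' format, with one filesystem per line, optionally prefixed by 'nodev'. It also assumes that a modular overlayfs shows up as 'overlay' only once the module is loaded.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the kernel lists the 'overlay' filesystem.
# The file to inspect can be passed as $1 (defaults to /proc/filesystems).
overlay_supported() {
	# Lines look like "nodev<TAB>name" or "<TAB>name": compare the last field only.
	awk '$NF == "overlay" { found = 1 } END { exit !found }' "${1:-/proc/filesystems}"
}

if overlay_supported; then
	echo "overlay filesystem available"
else
	echo "no overlay support: enable CONFIG_OVERLAY_FS (or load the 'overlay' module)" >&2
fi
```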
The ugly details of setting up the required overlay filesystem are hidden
behind a "mount helper" for a dummy filesystem type (here called
'varoverlayfs'). Thus, a BSP can change the overlaying filesystem by
providing its own 'run-varoverlayfs.mount', in order to restrict the
default RAM disk differently or to switch to a different local storage.
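The mount helper unmounts the RAM disk's own mount point '/run/varoverlayfs' again. Its fill level is therefore easiest to watch through '/var' itself, since overlayfs should report the statistics of its upper (writable) layer. The sketch below uses hypothetical function names and an arbitrary threshold, and relies on the POSIX `df -P` output format, where the fifth column is the capacity.

```shell
#!/bin/sh
# Hypothetical helper: print the fill level (in percent, without '%') of the
# filesystem backing directory $1. For an overlay-mounted '/var' this should
# be the writable upper layer, i.e. the RAM disk.
usage_percent() {
	# df -P prints one line per filesystem; the 5th field of line 2 is e.g. "42%"
	df -P "$1" | awk 'NR == 2 { sub(/%$/, "", $5); print $5 }'
}

# Hypothetical check: return 1 (and warn) if $1 is filled above $2 percent.
check_usage() {
	dir="$1"
	limit="$2"
	used=$(usage_percent "$dir")
	[ -n "$used" ] || return 2          # df failed or printed no data
	if [ "$used" -gt "$limit" ]; then
		echo "warning: $dir is ${used}% full (limit ${limit}%)" >&2
		return 1
	fi
	return 0
}

# Example: compare the '/var' usage against an 80% threshold.
if check_usage /var 80; then
	echo "/var usage is within limits"
fi
```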
This change also touches '/etc/fstab': the RAM disks already used for
'/var/log', '/var/lock' and '/var/tmp' are now enabled on demand only, to
stay backward compatible if the overlay approach is not used.
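When the overlay is not selected, the rules file re-enables these entries. It uses install_replace to replace the '#log', '#lock' and '#tmp' placeholders in '/etc/fstab' with 'tmpfs'. The sed-based sketch below only illustrates the effect on the file content. It is not how PTXdist implements install_replace, and the function name is hypothetical.

```shell
#!/bin/sh
# Illustration only: re-enable a placeholder fstab entry by replacing its
# leading '#<name>' marker with 'tmpfs'. Prints the result to stdout.
enable_fstab_entry() {
	name="$1"
	fstab="$2"
	sed "s/^#${name}\([[:space:]]\)/tmpfs\1/" "$fstab"
}
```

For example, `enable_fstab_entry log /etc/fstab` turns the line starting with `#log /var/log` into an active `tmpfs /var/log` entry and leaves all other lines untouched.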
Signed-off-by: Juergen Borleis <jbe@pengutronix.de>
---
doc/daily_work.inc | 97 +++++++++++++++++++
projectroot/etc/fstab | 6 +-
.../lib/systemd/system/run-varoverlayfs.mount | 10 ++
projectroot/usr/lib/systemd/system/var.mount | 10 ++
projectroot/usr/sbin/mount.varoverlayfs | 11 +++
rules/rootfs.in | 58 ++++++-----
rules/rootfs.make | 19 +++-
7 files changed, 180 insertions(+), 31 deletions(-)
create mode 100644 projectroot/usr/lib/systemd/system/run-varoverlayfs.mount
create mode 100644 projectroot/usr/lib/systemd/system/var.mount
create mode 100644 projectroot/usr/sbin/mount.varoverlayfs
diff --git a/doc/daily_work.inc b/doc/daily_work.inc
index 74da11953..470c14f93 100644
--- a/doc/daily_work.inc
+++ b/doc/daily_work.inc
@@ -1371,3 +1371,100 @@ in the build machine's filesystem also for the target filesystem image. With
a different ``umask`` than ``0022`` at build-time this may fail badly at
run-time with strange erroneous behaviour (for example some daemons with
regular user permissions cannot acces their own configuration files).
+
+Read Only Filesystem
+--------------------
+
+A system can run a read-only root filesystem in order to have a unit which
+can be powered off at any time, without any previous shutdown sequence.
+
+However, many applications and tools still expect a writable filesystem, for
+example to temporarily store data or logging information. On a read-only
+root filesystem all these write attempts fail and thus, the applications and
+tools fail, too.
+
+According to the *Filesystem Hierarchy Standard 2.3* the directory tree in
+``/var/`` is traditionally writable and its content is persistent across system
+restarts. Thus, this directory tree is used by most applications and tools to
+store their data.
+
+The *Filesystem Hierarchy Standard 2.3* defines the following directories
+below ``/var/``:
+
+- ``cache/``: Application specific cache data
+- ``crash/``: System crash dumps
+- ``lib/``: Application specific variable state information
+- ``lock/``: Lock files
+- ``log/``: Log files and directories
+- ``run/``: Data relevant to running processes
+- ``spool/``: Application spool data
+- ``tmp/``: Temporary files preserved between system reboots
+
+Although this writable directory tree is useful and valid for full-blown host
+machines, an embedded system can behave differently here: for example, its
+requirements may not demand persistence of changed data across reboots, and it
+may always start with empty directories.
+
+Partial RAM Disks
+~~~~~~~~~~~~~~~~~
+
+This is the default behaviour of PTXdist: it mounts a couple of RAM disks over
+directories in ``/var`` expected to be writable by various applications and
+tools. These RAM disks always start in an empty state and are defined as follows:
+
++-------------+----------------------------------------+
+| mount point | mount options                          |
++=============+========================================+
+| /var/log    | nosuid,nodev,noexec,mode=0755,size=10% |
++-------------+----------------------------------------+
+| /var/lock   | nosuid,nodev,noexec,mode=0755,size=1M  |
++-------------+----------------------------------------+
+| /var/tmp    | nosuid,nodev,mode=1777,size=20%        |
++-------------+----------------------------------------+
+
+This is a very simple and optimistic approach and works for surprisingly many use
+cases. But some applications expect a writable ``/var/lib`` and will fail with
+this setup. Using an additional RAM disk for ``/var/lib`` might not help in
+this use case, because it would hide all data already generated at build time
+in this directory tree (for example ``opkg`` package information or other
+packages' pre-defined configuration files).
+
+Overlay RAM Disk
+~~~~~~~~~~~~~~~~
+
+A different approach to have a writable ``/var`` without persistence is to use
+a so-called *overlay filesystem*. This *overlay filesystem* is a transparent
+writable layer on top of the read-only filesystem. After system start the
+*overlay filesystem layer* is empty and all reads are satisfied by the
+underlying read-only filesystem. Writes (new files, directories, changes of
+existing files) are stored in the *overlay filesystem layer*, and subsequent
+reads are satisfied by this layer instead of the underlying read-only
+filesystem.
+
+PTXdist supports this use case: enable the *overlay* feature for the ``/var``
+directory in its configuration menu:
+
+.. code-block:: text
+
+ Root Filesystem --->
+ directories in rootfs --->
+ [*] overlay '/var' with RAM disk
+
+Keep in mind: this approach only enables write support for the ``/var``
+directory tree. Nothing stored or changed there at run-time is persistent; it
+is always lost when the system restarts. Also, each additional RAM disk
+consumes main memory: if applications and tools fill up the directory tree in
+``/var``, the machine might run short of memory and slow down
+dramatically.
+
+Thus, it is a good idea to check the amount of data written by applications and
+tools to the ``/var`` directory tree and to limit it, for example via their
+configuration. The size of the *overlay filesystem* RAM disk can be limited as
+well: provide your own
+``projectroot/usr/lib/systemd/system/run-varoverlayfs.mount`` with more
+restrictive settings. But then the applications and tools in use must handle
+the "no space left on device" error correctly.
+
+This *overlay filesystem* approach requires the *overlay filesystem feature*
+from the Linux kernel. In order to use it, the feature CONFIG_OVERLAY_FS must
+be enabled.
diff --git a/projectroot/etc/fstab b/projectroot/etc/fstab
index 0121c3076..c79c8de4d 100644
--- a/projectroot/etc/fstab
+++ b/projectroot/etc/fstab
@@ -11,6 +11,6 @@ debugfs /sys/kernel/debug debugfs noauto 0 0
# ramdisks
tmpfs /tmp tmpfs nosuid,nodev,mode=1777,size=20% 0 0
tmpfs /run tmpfs nosuid,nodev,strictatime,mode=0755 0 0
-tmpfs /var/log tmpfs nosuid,nodev,noexec,mode=0755,size=10% 0 0
-tmpfs /var/lock tmpfs nosuid,nodev,noexec,mode=0755,size=1M 0 0
-tmpfs /var/tmp tmpfs nosuid,nodev,mode=1777,size=20% 0 0
+#log /var/log tmpfs nosuid,nodev,noexec,mode=0755,size=10% 0 0
+#lock /var/lock tmpfs nosuid,nodev,noexec,mode=0755,size=1M 0 0
+#tmp /var/tmp tmpfs nosuid,nodev,mode=1777,size=20% 0 0
diff --git a/projectroot/usr/lib/systemd/system/run-varoverlayfs.mount b/projectroot/usr/lib/systemd/system/run-varoverlayfs.mount
new file mode 100644
index 000000000..034dbfee1
--- /dev/null
+++ b/projectroot/usr/lib/systemd/system/run-varoverlayfs.mount
@@ -0,0 +1,10 @@
+[Unit]
+Description=Overlay for '/var'
+Before=local-fs.target
+OnFailure=rescue.service
+
+[Mount]
+Where=/run/varoverlayfs
+What=tmpfs
+Type=tmpfs
+Options=nosuid,nodev,noexec,mode=0755,size=10%,nr_inodes=100
diff --git a/projectroot/usr/lib/systemd/system/var.mount b/projectroot/usr/lib/systemd/system/var.mount
new file mode 100644
index 000000000..764108924
--- /dev/null
+++ b/projectroot/usr/lib/systemd/system/var.mount
@@ -0,0 +1,10 @@
+[Unit]
+Description=Writable support for '/var'
+Before=local-fs.target
+OnFailure=rescue.service
+
+[Mount]
+Where=/var
+# note: this is a dummy filesystem only to trigger the corresponding mount helper
+What=varoverlayfs
+Type=varoverlayfs
diff --git a/projectroot/usr/sbin/mount.varoverlayfs b/projectroot/usr/sbin/mount.varoverlayfs
new file mode 100644
index 000000000..afd5f2076
--- /dev/null
+++ b/projectroot/usr/sbin/mount.varoverlayfs
@@ -0,0 +1,11 @@
+#!/bin/sh
+# Mount helper tool to mount some kind of writable filesystem over '/var'
+# (which might be read-only).
+# What kind of filesystem is used to mount over '/var' can be controlled via
+# the 'run-varoverlayfs.mount' mount unit and is usually a RAM disk.
+
+systemctl start run-varoverlayfs.mount || exit 1
+mkdir -p /run/varoverlayfs/upper || exit 1
+mkdir -p /run/varoverlayfs/work || exit 1
+mount -t overlay overlay -olowerdir=/var,upperdir=/run/varoverlayfs/upper,workdir=/run/varoverlayfs/work /var || exit 1
+systemctl stop run-varoverlayfs.mount
diff --git a/rules/rootfs.in b/rules/rootfs.in
index 04f7a5287..d844f825e 100644
--- a/rules/rootfs.in
+++ b/rules/rootfs.in
@@ -179,76 +179,82 @@ config ROOTFS_VAR
if ROOTFS_VAR
+config ROOTFS_VAR_OVERLAYFS
+ bool
+ prompt "overlay '/var' with RAM disk"
+ depends on INITMETHOD_SYSTEMD && !ROOTFS_VAR_VOLATILE
+ help
+ This lets the whole '/var' content be writable transparently via an
+ 'overlayfs'.
+ Reading content happens from the underlying root filesystem, while
+ changed content gets stored into a RAM disk instead. This enables all
+ applications to read initial data (configuration files for example)
+ and lets them change this data even if the root filesystem is read-only.
+ Due to this behavior all changes made at run-time aren't persistent
+ by default.
+ Read the documentation chapter 'Read Only Filesystem' for further details.
+
config ROOTFS_VAR_RUN
bool
select ROOTFS_RUN
prompt "/var/run"
default y
help
- This will not create a directory but a symlink to /run.
- Unless you want to mount a tmpfs on /var you should
- say yes here.
+ Ensure a '/var/run' directory is available at run-time. This will
+ always be a symlink to '/run'.
config ROOTFS_VAR_LOG
bool
prompt "/var/log"
default y
help
- Create a /var/log directory in the root filesystem.
- Unless you want to mount a tmpfs on /var you should
- say yes here.
+ This directory is intended for log files and directories. Say 'y' here
+ to ensure a '/var/log' directory is available at run-time.
config ROOTFS_VAR_LOCK
bool
prompt "/var/lock"
default y
help
- Create a /var/lock directory in the root filesystem.
- Unless you want to mount a tmpfs on /var you should
- say yes here.
+ This directory is intended for application lock files. Say 'y' here
+ to ensure a '/var/lock' directory is available at run-time.
config ROOTFS_VAR_LIB
bool
prompt "/var/lib"
help
- Create a /var/lib directory in the root filesystem.
- Unless you want to mount a tmpfs on /var you should
- say yes here.
- If you are going to run an NFS server with file locking
- support this folder must be persistent!
+ This directory is intended for application variable state information.
+ Say 'y' here to ensure a '/var/lib' directory is available at
+ run-time.
config ROOTFS_VAR_CACHE
bool
prompt "/var/cache"
help
- Create a /var/cache directory in the root filesystem.
- Unless you want to mount a tmpfs on /var you should
- say yes here.
+ This directory is intended for application cache data. Say 'y' here
+ to ensure a '/var/cache' directory is available at run-time.
config ROOTFS_VAR_SPOOL
bool
prompt "/var/spool"
help
- Create a /var/spool directory in the root filesystem.
- Unless you want to mount a tmpfs on /var you should
- say yes here.
+ This directory is intended for application spool data. Say 'y' here to
+ ensure a '/var/spool' directory is available at run-time.
config ROOTFS_VAR_SPOOL_CRON
bool
prompt "/var/spool/cron"
help
- Create a /var/spool/cron directory in the root filesystem.
- Unless you want to mount a tmpfs on /var you should
- say yes here.
+ Create a '/var/spool/cron' directory in the root filesystem.
config ROOTFS_VAR_TMP
bool
prompt "/var/tmp"
default y
help
- Create a /var/tmp directory in the root filesystem.
- Unless you want to mount a tmpfs on /var you should
- say yes here.
+ This directory is intended for temporary files preserved between
+ system reboots. Say 'y' here to ensure a '/var/tmp' directory is
+ available at run-time.
endif # ROOTFS_VAR
endif # ROOTFS
diff --git a/rules/rootfs.make b/rules/rootfs.make
index ef5bba7df..21250e775 100644
--- a/rules/rootfs.make
+++ b/rules/rootfs.make
@@ -30,7 +30,7 @@ $(STATEDIR)/rootfs.targetinstall:
@$(call install_fixup, rootfs,PRIORITY,optional)
@$(call install_fixup, rootfs,SECTION,base)
@$(call install_fixup, rootfs,AUTHOR,"Robert Schwebel <r.schwebel@pengutronix.de>")
- @$(call install_fixup, rootfs,DESCRIPTION,missing)
+ @$(call install_fixup, rootfs,DESCRIPTION, "Filesystem Hierarchy Standard")
# #
# # install directories in rootfs
@@ -123,7 +123,11 @@ endif
ifdef PTXCONF_ROOTFS_VAR_TMP
@$(call install_copy, rootfs, 0, 0, 0755, /var/tmp)
endif
-
+ifdef PTXCONF_ROOTFS_VAR_OVERLAYFS
+ @$(call install_alternative, rootfs, 0, 0, 0644, /usr/lib/systemd/system/run-varoverlayfs.mount)
+ @$(call install_alternative, rootfs, 0, 0, 0755, /usr/sbin/mount.varoverlayfs)
+ @$(call install_alternative, rootfs, 0, 0, 0644, /usr/lib/systemd/system/var.mount)
+endif
# #
# # install files in rootfs
@@ -142,7 +146,18 @@ ifdef PTXCONF_ROOTFS_GSHADOW
endif
ifdef PTXCONF_ROOTFS_FSTAB
@$(call install_alternative, rootfs, 0, 0, 0644, /etc/fstab)
+ifndef PTXCONF_ROOTFS_VAR_OVERLAYFS
+ifdef PTXCONF_ROOTFS_VAR_TMP
+ @$(call install_replace, rootfs, /etc/fstab, #tmp, "tmpfs")
+endif
+ifdef PTXCONF_ROOTFS_VAR_LOG
+ @$(call install_replace, rootfs, /etc/fstab, #log, "tmpfs")
+endif
+ifdef PTXCONF_ROOTFS_VAR_LOCK
+ @$(call install_replace, rootfs, /etc/fstab, #lock, "tmpfs")
endif
+endif # PTXCONF_ROOTFS_VAR_OVERLAYFS
+endif # PTXCONF_ROOTFS_FSTAB
ifdef PTXCONF_ROOTFS_MTAB_FILE
@$(call install_alternative, rootfs, 0, 0, 0644, /etc/mtab)
endif
--
2.20.1