From: Juergen Borleis <jbe@pengutronix.de>
To: ptxdist@pengutronix.de
Subject: [ptxdist] [PATCH] rootfs: keep /var writable, even if the rootfs is read-only
Date: Tue, 4 Jun 2019 18:00:20 +0200
Message-ID: <20190604160020.30764-1-jbe@pengutronix.de>

Having a read-only root filesystem is always a source of pain and trouble.
Many applications and tools expect to be able to store their state or
caching data or at least their logs somewhere in the filesystem.

The '/var' directory tree has a well-known structure according to the
"Filesystem Hierarchy Standard" and is used by all carefully designed
programs. Thus, this change provides a way to have this '/var' directory
tree writable, even if the main root filesystem is mounted read-only. It
uses an overlay filesystem and by default a RAM disk to store changed and
added data to this directory tree in a non-persistent manner.

Due to the nature of the overlay filesystem the underlying files from the
main root filesystem can still be accessed.

This approach requires the overlay filesystem support from the Linux
kernel. In order to use it, the feature CONFIG_OVERLAY_FS must be enabled.

A BSP can change the overlaying filesystem by providing its own
'run-varoverlayfs.mount' in order to restrict the used RAM disk differently
or switch to a different local storage.

Signed-off-by: Juergen Borleis <jbe@pengutronix.de>
---
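Note for testers: the commit message says a BSP can provide its own
'run-varoverlayfs.mount' to restrict the RAM disk differently. The following
is only a minimal sketch of how such a BSP-local unit could be generated. The
helper name 'write_varoverlay_unit' and the values 'size=4M' and
'nr_inodes=1024' are assumptions for illustration and not part of this patch.

```shell
#!/bin/sh
# Hypothetical helper (not part of this patch): write a BSP-local
# 'run-varoverlayfs.mount' below the given projectroot directory.
#   $1: BSP projectroot directory
#   $2: tmpfs size limit (example value: 4M)
write_varoverlay_unit() {
    unit_dir="$1/usr/lib/systemd/system"
    mkdir -p "$unit_dir" || return 1
    # Same unit as shipped by the patch, only the tmpfs limits differ.
    # 'nr_inodes=1024' is an example value, adapt it to the workload.
    cat > "$unit_dir/run-varoverlayfs.mount" <<EOF
[Unit]
Description=Overlay for '/var'
Before=local-fs.target
OnFailure=rescue.service

[Mount]
Where=/run/varoverlayfs
What=tmpfs
Type=tmpfs
Options=nosuid,nodev,noexec,mode=0755,size=$2,nr_inodes=1024
EOF
}
```

Called as 'write_varoverlay_unit ./projectroot 4M' from the BSP's top level
directory, this should place the file where 'install_alternative' is expected
to prefer it over the default one shipped with PTXdist.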
doc/daily_work.inc | 97 +++++++++++++++++++
projectroot/etc/fstab | 6 +-
.../lib/systemd/system/run-varoverlayfs.mount | 10 ++
projectroot/usr/lib/systemd/system/var.mount | 9 ++
projectroot/usr/sbin/mount.varoverlayfs | 11 +++
rules/rootfs.in | 15 +++
rules/rootfs.make | 23 ++++-
7 files changed, 164 insertions(+), 7 deletions(-)
create mode 100644 projectroot/usr/lib/systemd/system/run-varoverlayfs.mount
create mode 100644 projectroot/usr/lib/systemd/system/var.mount
create mode 100644 projectroot/usr/sbin/mount.varoverlayfs
diff --git a/doc/daily_work.inc b/doc/daily_work.inc
index 74da11953..093f069bf 100644
--- a/doc/daily_work.inc
+++ b/doc/daily_work.inc
@@ -1371,3 +1371,100 @@ in the build machine's filesystem also for the target filesystem image. With
a different ``umask`` than ``0022`` at build-time this may fail badly at
run-time with strange erroneous behaviour (for example some daemons with
regular user permissions cannot acces their own configuration files).
+
+Read Only Filesystem
+--------------------
+
+A system can run a read-only root filesystem in order to have a unit which
+can be powered off at any time, without any prior shutdown sequence.
+
+But many applications and tools still expect a writable filesystem to
+temporarily store some kind of data or logging information, for example. All
+these write attempts will fail and thus, the applications and tools will fail,
+too.
+
+According to the *Filesystem Hierarchy Standard 2.3* the directory tree in
+'/var/' is traditionally writable and its content is persistent across system
+restarts. Thus, this directory tree is used by most applications and tools to
+store their data.
+
+The *Filesystem Hierarchy Standard 2.3* defines the following directories
+below '/var':
+
+- 'cache/': Application specific cache data
+- 'crash/': System crash dumps
+- 'lib/': Application specific variable state information
+- 'lock/': Lock files
+- 'log/': Log files and directories
+- 'run/': Data relevant to running processes
+- 'spool/': Application spool data
+- 'tmp/': Temporary files preserved between system reboots
+
+While this writable directory tree is useful and valid for full-blown host
+machines, an embedded system can behave differently here: for example, the
+requirements may drop the persistency of changed data across reboots and always
+start with empty directories.
+
+Partial RAM Disks
+~~~~~~~~~~~~~~~~~
+
+This is the default behaviour of PTXdist: it mounts a couple of RAM disks over
+directories in ``/var`` expected to be writable by various applications and
+tools. These RAM disks always start in an empty state and are defined as follows:
+
++-------------+---------------------------------------------------------------+
+| mount point | mount options                                                 |
++=============+===============================================================+
+| /var/log    | nosuid,nodev,noexec,mode=0755,size=10%                        |
++-------------+---------------------------------------------------------------+
+| /var/lock   | nosuid,nodev,noexec,mode=0755,size=1M                         |
++-------------+---------------------------------------------------------------+
+| /var/tmp    | nosuid,nodev,mode=1777,size=20%                               |
++-------------+---------------------------------------------------------------+
+
+This is a very simple and optimistic approach and works for surprisingly many use
+cases. But some applications expect a writable ``/var/lib`` and will fail due
+to this setup. Using an additional RAM disk for ``/var/lib`` might not help in
+this use case, because it would hide data generated at build-time and already
+present in this directory tree (``opkg`` package information for example, or
+other packages' pre-defined configuration files).
+
+Overlay RAM Disk
+~~~~~~~~~~~~~~~~
+
+A different approach to have a writable ``/var`` without persistency is to use
+a so-called *overlay filesystem*. This *overlay filesystem* is a transparent
+writable layer on top of the read-only filesystem. After system start the
+*overlay filesystem layer* is empty and all reads will be satisfied by the
+underlying read-only filesystem. Writes (new files, directories, changes of
+existing files) are stored in the *overlay filesystem layer* and on the
+next read satisfied by this layer instead of the underlying read-only
+filesystem.
+
+PTXdist supports this use case by enabling the *overlay* feature for the ``/var``
+directory in its configuration menu:
+
+.. code-block:: text
+
+ Root Filesystem --->
+ directories in rootfs --->
+ [*] overlay '/var' with RAM disk
+
+Keep in mind: this approach just enables write support to the ``/var`` directory
+tree, but nothing stored/changed in there at run-time will be persistent and is
+always lost if the system restarts. And each additional RAM disk consumes
+additional main memory, and if applications and tools fill up the directory
+tree in ``/var`` the machine might run short on memory and slow down
+dramatically.
+
+Thus, it is a good idea to check the amount of data written by applications and
+tools to the ``/var`` directory tree and limit it by default.
+You can limit the size of the *overlay filesystem* RAM disk as well. For this
+you can provide your own
+``projectroot/usr/lib/systemd/system/run-varoverlayfs.mount`` with restrictive
+settings. But then the used applications and tools must deal with the
+"no space left on device" error correctly...
+
+This *overlay filesystem* approach requires the *overlay filesystem feature*
+from the Linux kernel. In order to use it, the feature CONFIG_OVERLAY_FS must
+be enabled.
diff --git a/projectroot/etc/fstab b/projectroot/etc/fstab
index 0121c3076..c79c8de4d 100644
--- a/projectroot/etc/fstab
+++ b/projectroot/etc/fstab
@@ -11,6 +11,6 @@ debugfs /sys/kernel/debug debugfs noauto 0 0
# ramdisks
tmpfs /tmp tmpfs nosuid,nodev,mode=1777,size=20% 0 0
tmpfs /run tmpfs nosuid,nodev,strictatime,mode=0755 0 0
-tmpfs /var/log tmpfs nosuid,nodev,noexec,mode=0755,size=10% 0 0
-tmpfs /var/lock tmpfs nosuid,nodev,noexec,mode=0755,size=1M 0 0
-tmpfs /var/tmp tmpfs nosuid,nodev,mode=1777,size=20% 0 0
+#log /var/log tmpfs nosuid,nodev,noexec,mode=0755,size=10% 0 0
+#lock /var/lock tmpfs nosuid,nodev,noexec,mode=0755,size=1M 0 0
+#tmp /var/tmp tmpfs nosuid,nodev,mode=1777,size=20% 0 0
diff --git a/projectroot/usr/lib/systemd/system/run-varoverlayfs.mount b/projectroot/usr/lib/systemd/system/run-varoverlayfs.mount
new file mode 100644
index 000000000..034dbfee1
--- /dev/null
+++ b/projectroot/usr/lib/systemd/system/run-varoverlayfs.mount
@@ -0,0 +1,10 @@
+[Unit]
+Description=Overlay for '/var'
+Before=local-fs.target
+OnFailure=rescue.service
+
+[Mount]
+Where=/run/varoverlayfs
+What=tmpfs
+Type=tmpfs
+Options=nosuid,nodev,noexec,mode=0755,size=10%,nr_inodes=100
diff --git a/projectroot/usr/lib/systemd/system/var.mount b/projectroot/usr/lib/systemd/system/var.mount
new file mode 100644
index 000000000..65bc81470
--- /dev/null
+++ b/projectroot/usr/lib/systemd/system/var.mount
@@ -0,0 +1,9 @@
+[Unit]
+Description=Writeable support for '/var'
+Before=local-fs.target
+OnFailure=rescue.service
+
+[Mount]
+Where=/var
+What=varoverlayfs
+Type=varoverlayfs
diff --git a/projectroot/usr/sbin/mount.varoverlayfs b/projectroot/usr/sbin/mount.varoverlayfs
new file mode 100644
index 000000000..f50717aa3
--- /dev/null
+++ b/projectroot/usr/sbin/mount.varoverlayfs
@@ -0,0 +1,11 @@
+#!/bin/sh
+# Mount helper tool to mount some kind of writeable filesystem over '/var'
+# (which might be read-only).
+# What kind of filesystem is used to mount over '/var' can be controlled via
+# the 'run-varoverlayfs.mount' mount unit and is usually a RAM disk.
+
+systemctl start run-varoverlayfs.mount
+mkdir -p /run/varoverlayfs/upper
+mkdir -p /run/varoverlayfs/work
+mount -t overlay overlay -olowerdir=/var,upperdir=/run/varoverlayfs/upper,workdir=/run/varoverlayfs/work /var
+systemctl stop run-varoverlayfs.mount
diff --git a/rules/rootfs.in b/rules/rootfs.in
index 04f7a5287..4d96779fa 100644
--- a/rules/rootfs.in
+++ b/rules/rootfs.in
@@ -179,6 +179,21 @@ config ROOTFS_VAR
if ROOTFS_VAR
+config ROOTFS_VAR_OVERLAYFS
+ bool
+ prompt "overlay '/var' with RAM disk"
+ depends on INITMETHOD_SYSTEMD && !ROOTFS_VAR_VOLATILE
+ help
+ This lets the whole '/var' content be writeable transparently via an
+ 'overlayfs'.
+ Reading content happens from the underlying root filesystem, while
+ changed content gets stored into a RAM disk instead. This enables all
+ applications to read initial data (configuration files for example)
+ and lets them change this data even if the root filesystem is read-only.
+ Due to this behavior all changes made at run-time aren't persistent
+ by default.
+ Read documentation chapter 'Read Only Filesystem' for further details.
+
config ROOTFS_VAR_RUN
bool
select ROOTFS_RUN
diff --git a/rules/rootfs.make b/rules/rootfs.make
index ef5bba7df..aea04a7bf 100644
--- a/rules/rootfs.make
+++ b/rules/rootfs.make
@@ -30,7 +30,7 @@ $(STATEDIR)/rootfs.targetinstall:
@$(call install_fixup, rootfs,PRIORITY,optional)
@$(call install_fixup, rootfs,SECTION,base)
@$(call install_fixup, rootfs,AUTHOR,"Robert Schwebel <r.schwebel@pengutronix.de>")
- @$(call install_fixup, rootfs,DESCRIPTION,missing)
+ @$(call install_fixup, rootfs,DESCRIPTION, "Filesystem Hierarchy Standard")
# #
# # install directories in rootfs
@@ -100,7 +100,7 @@ ifdef PTXCONF_ROOTFS_VAR
@$(call install_copy, rootfs, 0, 0, 0755, /var)
endif
ifdef PTXCONF_ROOTFS_VAR_LOG
- @$(call install_copy, rootfs, 0, 0, 0755, /var/log)
+ @$(call install_copy, rootfs, 0, 0, 01777, /var/log)
endif
ifdef PTXCONF_ROOTFS_VAR_RUN
@$(call install_link, rootfs, ../run, /var/run)
@@ -121,9 +121,13 @@ ifdef PTXCONF_ROOTFS_VAR_SPOOL_CRON
@$(call install_copy, rootfs, 0, 0, 0755, /var/spool/cron)
endif
ifdef PTXCONF_ROOTFS_VAR_TMP
- @$(call install_copy, rootfs, 0, 0, 0755, /var/tmp)
+ @$(call install_copy, rootfs, 0, 0, 01777, /var/tmp)
+endif
+ifdef PTXCONF_ROOTFS_VAR_OVERLAYFS
+ @$(call install_alternative, rootfs, 0, 0, 0644, /usr/lib/systemd/system/run-varoverlayfs.mount)
+ @$(call install_alternative, rootfs, 0, 0, 0755, /usr/sbin/mount.varoverlayfs)
+ @$(call install_alternative, rootfs, 0, 0, 0644, /usr/lib/systemd/system/var.mount)
endif
-
# #
# # install files in rootfs
@@ -142,7 +146,18 @@ ifdef PTXCONF_ROOTFS_GSHADOW
endif
ifdef PTXCONF_ROOTFS_FSTAB
@$(call install_alternative, rootfs, 0, 0, 0644, /etc/fstab)
+ifndef PTXCONF_ROOTFS_VAR_OVERLAYFS
+ifdef PTXCONF_ROOTFS_VAR_TMP
+ @$(call install_replace, rootfs, /etc/fstab, #tmp, "tmpfs")
+endif
+ifdef PTXCONF_ROOTFS_VAR_LOG
+ @$(call install_replace, rootfs, /etc/fstab, #log, "tmpfs")
+endif
+ifdef PTXCONF_ROOTFS_VAR_LOCK
+ @$(call install_replace, rootfs, /etc/fstab, #lock, "tmpfs")
endif
+endif # PTXCONF_ROOTFS_VAR_OVERLAYFS
+endif # PTXCONF_ROOTFS_FSTAB
ifdef PTXCONF_ROOTFS_MTAB_FILE
@$(call install_alternative, rootfs, 0, 0, 0644, /etc/mtab)
endif
--
2.20.1
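
For testing on a target: the overlay mount in 'mount.varoverlayfs' can only
succeed if the running kernel provides the 'overlay' filesystem type
(CONFIG_OVERLAY_FS), and afterwards '/var' should show up as an 'overlay'
mount. The following is a minimal sketch to check both conditions. The
function names 'has_overlayfs' and 'var_is_overlay' are assumptions for
illustration and not part of this patch. The file arguments exist only to
make the checks testable off-target.

```shell
#!/bin/sh
# Hypothetical helpers (not part of this patch).

# Succeeds if the kernel lists the 'overlay' filesystem type.
#   $1: optional file to inspect, defaults to /proc/filesystems
has_overlayfs() {
    awk '$NF == "overlay" { found = 1 } END { exit !found }' \
        "${1:-/proc/filesystems}"
}

# Succeeds if the top-most mount on '/var' is of type 'overlay'.
#   $1: optional file to inspect, defaults to /proc/mounts
var_is_overlay() {
    # The last matching entry in the mount table is the visible one.
    awk '$2 == "/var" { fstype = $3 } END { exit (fstype != "overlay") }' \
        "${1:-/proc/mounts}"
}
```

On the target, 'has_overlayfs && var_is_overlay && echo "/var overlay active"'
should print the message once 'var.mount' has been started.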
_______________________________________________
ptxdist mailing list
ptxdist@pengutronix.de