From: "Ulrich Ölmann" <u.oelmann@pengutronix.de>
To: ptxdist@pengutronix.de
Subject: Re: [ptxdist] [PATCH] rootfs: keep /var writable, even if the rootfs is read-only
Date: Wed, 05 Jun 2019 11:06:31 +0200
Message-ID: <6rh894z9ig.fsf@pengutronix.de>
In-Reply-To: <20190604160020.30764-1-jbe@pengutronix.de>
Hi Jürgen,
please find some adjustments inline.
On Tue, Jun 04 2019 at 18:00 +0200, Juergen Borleis <jbe@pengutronix.de> wrote:
> Having a read-only root filesystem is always a source of pain and trouble.
> Many applications and tools expect to be able to store their state or
> caching data or at least their logs somewhere in the filesystem.
>
> The '/var' directory tree has a well known structure according to the
> "File System Hierarchy Standard" and is used by all carefully designed
> programs. Thus, this change provides a way to have this '/var' directory
> tree writable, even if the main root filesystem is mounted read-only. It
> uses an overlay filesystem and by default a RAM disk to store changed and
> added data to this directory tree in a non persistent manner.
>
> Due to the nature of the overlay filesystem the underlaying files from the
> main root filesystem can still be accessed.
>
> This approach requires the overlay filesystem support from the Linux
> kernel. In order to use it, the feature CONFIG_OVERLAY_FS must be enabled.
>
> A BSP can change the overlaying filesystem by providing its own
> 'run-varoverlay.mount' in order to restrict the used RAM disk differently
> or switch to a different local storage.
>
> Signed-off-by: Juergen Borleis <jbe@pengutronix.de>
> ---
> doc/daily_work.inc | 97 +++++++++++++++++++
> projectroot/etc/fstab | 6 +-
> .../lib/systemd/system/run-varoverlayfs.mount | 10 ++
> projectroot/usr/lib/systemd/system/var.mount | 9 ++
> projectroot/usr/sbin/mount.varoverlayfs | 11 +++
> rules/rootfs.in | 15 +++
> rules/rootfs.make | 23 ++++-
> 7 files changed, 164 insertions(+), 7 deletions(-)
> create mode 100644 projectroot/usr/lib/systemd/system/run-varoverlayfs.mount
> create mode 100644 projectroot/usr/lib/systemd/system/var.mount
> create mode 100644 projectroot/usr/sbin/mount.varoverlayfs
>
> diff --git a/doc/daily_work.inc b/doc/daily_work.inc
> index 74da11953..093f069bf 100644
> --- a/doc/daily_work.inc
> +++ b/doc/daily_work.inc
> @@ -1371,3 +1371,100 @@ in the build machine's filesystem also for the target filesystem image. With
> a different ``umask`` than ``0022`` at build-time this may fail badly at
> run-time with strange erroneous behaviour (for example some daemons with
> regular user permissions cannot acces their own configuration files).
> +
> +Read Only Filesystem
> +--------------------
> +
> +A system can run a read-only root filesystem in order to have a unit which
> +can be powered off at any time, without any previous shutting down sequence.
s/shutting/shut/
> +
> +But many applications and tools are still expecting a writable filesystem to
> +temporarely store some kind of data or logging information for example. All
s/temporarely/temporarily/
> +these write attempts will fail and thus, the applications and tools will fail,
> +too.
> +
> +According to the *Filesystem Hierarchy Standard 2.3* the directory tree in
> +'/var/' is traditionally writable and its content is persistent across system
> +restarts. Thus, this directory tree is used by most applications and tools to
> +store their data.
> +
> +The *Filesystem Hierarchy Standard 2.3* defines the following directories
> +below '/var':
> +
> +- 'cache/': Application specific cache data
> +- 'crash/': System crash dumps
> +- 'lib/': Application specific variable state information
> +- 'lock/': Lock files
> +- 'log/': Log files and directories
> +- 'run/': Data relevant to run processes
s/run processes/running processes/
> +- 'spool/': Application spool data
> +- 'tmp/': Temporary files preserved between system reboots
> +
> +Since this writable directory tree is useful and valid for full blown host
s/Since/Although/ ?
> +machines, an embedded system can behave differently here: For example the
s/For example the/for example a/
> +requirement can drop the persistency of changed data across reboots and always
s/persistency/persistence/
> +start with empty directories.
> +
> +Partially RAM Disks
> +~~~~~~~~~~~~~~~~~~~
> +
> +This is the default behaviour of PTXdist: it mounts a couple of RAM disks over
> +directories in ``/var`` expected to be writable by various applications and
> +tools. These RAM disks start alway in an empty state and are defined as follows:
s/alway/always/
> +
> ++-------------+---------------------------------------------------------------+
> +| mount point | mount options |
> ++=============+===============================================================+
> +| /var/log | nosuid,nodev,noexec,mode=0755,size=10% |
> ++-------------+---------------------------------------------------------------+
> +| /var/lock | nosuid,nodev,noexec,mode=0755,size=1M |
> ++-------------+---------------------------------------------------------------+
> +| /var/tmp | nosuid,nodev,mode=1777,size=20% |
> ++-------------+---------------------------------------------------------------+
> +
> +This is a very simple and optimistic approach and works for surprisingly many use
> +cases. But some applications expect a writable ``/var/lib`` and will fail due
> +to this setup. Using an additional RAM disk for ``/var/lib`` might not help in
> +this use case, because it will bury at build-time generated data already present
s/bury at/bury all/
> +in this directory tree (``opkg`` package information for example or other
> +packages pre-defined configuration files).
> +
> +Overlay RAM Disk
> +~~~~~~~~~~~~~~~~
> +
> +A different approach to have a writable ``/var`` without persistency is to use
s/persistency/persistence/
> +a so called *overlay filesystem*. This *overlay filesystem* is a transparent
> +writable layer on top of the read-only filesystem. After system's start the
s/After system's start/After the system's start/
> +*overlay filesystem layer* is empty and all reads will be satisfied by the
> +underlaying read-only filesystem. Writes (new files, directories, changes of
> +existing files) are stored in the *overlay filesystem layer* and on the
> +next read satisfied by this layer instead of the underlaying read-only
> +filesystem.
> +
> +PTXdist supports this use case, by enabling the *overlay* feature for the ``/var``
> +directory in its configuration menu:
> +
> +.. code-block:: text
> +
> + Root Filesystem --->
> + directories in rootfs --->
> + [*] overlay '/var' with RAM disk
> +
> +Keep in mind: this approach just enables write support to the ``/var`` directory
> +tree, but nothing stored/changed in there at run-time will be persistent and is
> +always lost if the system restarts. And each additional RAM disk consumes
> +additional main memory, and if applications and tools will fill up the directory
> +tree in ``/var`` the machine might run short on memory and slows down
> +dramatically.
> +
> +Thus, it is a good idea to check the amount of data written by applications and
> +tools to the ``/var`` directory tree and limit it by default.
> +You can limit the size of the *overlay filesystem* RAM disk as well. For this
> +you can provide your own
> +``projectroot/usr/lib/systemd/system/run-varoverlayfs.mount`` with restrictive
> +settings. But then the used applications and tools must deal with the
> +"no space left on device" error correctly...
> +
> +This *overlay filesystem* approach requires the *overlay filesystem feature*
> +from the Linux kernel. In order to use it, the feature CONFIG_OVERLAY_FS must
> +be enabled.
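Perhaps the documentation could also mention a quick run-time check for
the kernel feature. A minimal sketch, assuming the target provides
/proc/filesystems and that overlayfs is built in or its module is already
loaded (the function name has_overlay is made up for illustration):

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the kernel has registered the 'overlay'
# filesystem type. $1 is optional and only there to make the check testable
# against a file in /proc/filesystems format.
has_overlay() {
    # The filesystem name is the last field on each line
    # ("nodev<TAB>overlay" for overlayfs).
    awk '$NF == "overlay" { found = 1 } END { exit !found }' \
        "${1:-/proc/filesystems}"
}
```

On the target, 'has_overlay && echo "overlayfs available"' would then tell
whether mounting the overlay can work at all. If overlayfs is built as a
module that is not loaded yet, the check fails even though the feature is
enabled.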
> diff --git a/projectroot/etc/fstab b/projectroot/etc/fstab
> index 0121c3076..c79c8de4d 100644
> --- a/projectroot/etc/fstab
> +++ b/projectroot/etc/fstab
> @@ -11,6 +11,6 @@ debugfs /sys/kernel/debug debugfs noauto 0 0
> # ramdisks
> tmpfs /tmp tmpfs nosuid,nodev,mode=1777,size=20% 0 0
> tmpfs /run tmpfs nosuid,nodev,strictatime,mode=0755 0 0
> -tmpfs /var/log tmpfs nosuid,nodev,noexec,mode=0755,size=10% 0 0
> -tmpfs /var/lock tmpfs nosuid,nodev,noexec,mode=0755,size=1M 0 0
> -tmpfs /var/tmp tmpfs nosuid,nodev,mode=1777,size=20% 0 0
> +#log /var/log tmpfs nosuid,nodev,noexec,mode=0755,size=10% 0 0
> +#lock /var/lock tmpfs nosuid,nodev,noexec,mode=0755,size=1M 0 0
> +#tmp /var/tmp tmpfs nosuid,nodev,mode=1777,size=20% 0 0
> diff --git a/projectroot/usr/lib/systemd/system/run-varoverlayfs.mount b/projectroot/usr/lib/systemd/system/run-varoverlayfs.mount
> new file mode 100644
> index 000000000..034dbfee1
> --- /dev/null
> +++ b/projectroot/usr/lib/systemd/system/run-varoverlayfs.mount
> @@ -0,0 +1,10 @@
> +[Unit]
> +Description=Overlay for '/var'
> +Before=local-fs.target
> +OnFailure=rescue.service
> +
> +[Mount]
> +Where=/run/varoverlayfs
> +What=tmpfs
> +Type=tmpfs
> +Options=nosuid,nodev,noexec,mode=0755,size=10%,nr_inodes=100
> diff --git a/projectroot/usr/lib/systemd/system/var.mount b/projectroot/usr/lib/systemd/system/var.mount
> new file mode 100644
> index 000000000..65bc81470
> --- /dev/null
> +++ b/projectroot/usr/lib/systemd/system/var.mount
> @@ -0,0 +1,9 @@
> +[Unit]
> +Description=Writeable support for '/var'
> +Before=local-fs.target
> +OnFailure=rescue.service
> +
> +[Mount]
> +Where=/var
> +What=varoverlayfs
> +Type=varoverlayfs
> diff --git a/projectroot/usr/sbin/mount.varoverlayfs b/projectroot/usr/sbin/mount.varoverlayfs
> new file mode 100644
> index 000000000..f50717aa3
> --- /dev/null
> +++ b/projectroot/usr/sbin/mount.varoverlayfs
> @@ -0,0 +1,11 @@
> +#!/bin/sh
> +# Mount helper tool to mount some kind of writeable filesystem over '/var'
> +# (which might be read-only).
> +# What kind of filesystem is used to mount over '/var' can be controlled via
> +# the 'run-varoverlayfs.mount' mount unit and is usually a RAM disk.
> +
> +systemctl start run-varoverlayfs.mount
> +mkdir -p /run/varoverlayfs/upper
> +mkdir -p /run/varoverlayfs/work
> +mount -t overlay overlay -olowerdir=/var,upperdir=/run/varoverlayfs/upper,workdir=/run/varoverlayfs/work /var
> +systemctl stop run-varoverlayfs.mount
Using a mount helper here feels very elegant - good idea! :-)
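For BSPs that switch to a different backing store, the only part of the
mount call that changes is the option string. A minimal sketch of how it
is composed from the paths used above; overlay_opts is a hypothetical
function and not part of the patch:

```shell
#!/bin/sh
# Hypothetical helper: print the overlayfs option string.
# $1: read-only lower directory (here '/var')
# $2: writable backing directory (here '/run/varoverlayfs')
overlay_opts() {
    printf 'lowerdir=%s,upperdir=%s/upper,workdir=%s/work\n' "$1" "$2" "$2"
}

# Possible use inside the mount helper (needs root, shown for illustration):
#   mount -t overlay overlay -o "$(overlay_opts /var /run/varoverlayfs)" /var
```

The 'upper' and 'work' directories have to live on the same filesystem,
which is why both are derived from the same backing directory.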
> diff --git a/rules/rootfs.in b/rules/rootfs.in
> index 04f7a5287..4d96779fa 100644
> --- a/rules/rootfs.in
> +++ b/rules/rootfs.in
> @@ -179,6 +179,21 @@ config ROOTFS_VAR
>
> if ROOTFS_VAR
>
> +config ROOTFS_VAR_OVERLAYFS
> + bool
> + prompt "overlay '/var' with RAM disk"
> + depends on INITMETHOD_SYSTEMD && !ROOTFS_VAR_VOLATILE
> + help
> + This lets the whole '/var' content be writeable transparently via an
> + 'overlayfs'.
> + Reading content happens from the underlaying root filesystem, while
> + changed content gets stored into a RAM disk instead. This enables all
> + applications to read initial data (configuration files for example)
> + and let them change this data even if the root filesystem is read-only.
> + Due to these behavior all changes made at run-time aren't persistent
s/these behavior/this behavior/
> + by default.
> + Read documentation chapter 'Read Only Filesystem' for further details.
> +
> config ROOTFS_VAR_RUN
> bool
> select ROOTFS_RUN
> diff --git a/rules/rootfs.make b/rules/rootfs.make
> index ef5bba7df..aea04a7bf 100644
> --- a/rules/rootfs.make
> +++ b/rules/rootfs.make
> @@ -30,7 +30,7 @@ $(STATEDIR)/rootfs.targetinstall:
> @$(call install_fixup, rootfs,PRIORITY,optional)
> @$(call install_fixup, rootfs,SECTION,base)
> @$(call install_fixup, rootfs,AUTHOR,"Robert Schwebel <r.schwebel@pengutronix.de>")
> - @$(call install_fixup, rootfs,DESCRIPTION,missing)
> + @$(call install_fixup, rootfs,DESCRIPTION, "Filesystem Hierarchy Standard")
Is this and...
>
> # #
> # # install directories in rootfs
> @@ -100,7 +100,7 @@ ifdef PTXCONF_ROOTFS_VAR
> @$(call install_copy, rootfs, 0, 0, 0755, /var)
> endif
> ifdef PTXCONF_ROOTFS_VAR_LOG
> - @$(call install_copy, rootfs, 0, 0, 0755, /var/log)
> + @$(call install_copy, rootfs, 0, 0, 01777, /var/log)
... this and...
> endif
> ifdef PTXCONF_ROOTFS_VAR_RUN
> @$(call install_link, rootfs, ../run, /var/run)
> @@ -121,9 +121,13 @@ ifdef PTXCONF_ROOTFS_VAR_SPOOL_CRON
> @$(call install_copy, rootfs, 0, 0, 0755, /var/spool/cron)
> endif
> ifdef PTXCONF_ROOTFS_VAR_TMP
> - @$(call install_copy, rootfs, 0, 0, 0755, /var/tmp)
> + @$(call install_copy, rootfs, 0, 0, 01777, /var/tmp)
... this hunk necessary for your overlayfs approach? If not, perhaps put
it into a separate preparatory patch.
> +endif
> +ifdef PTXCONF_ROOTFS_VAR_OVERLAYFS
> + @$(call install_alternative, rootfs, 0, 0, 0644, /usr/lib/systemd/system/run-varoverlayfs.mount)
> + @$(call install_alternative, rootfs, 0, 0, 0755, /usr/sbin/mount.varoverlayfs)
> + @$(call install_alternative, rootfs, 0, 0, 0644, /usr/lib/systemd/system/var.mount)
> endif
> -
>
> # #
> # # install files in rootfs
> @@ -142,7 +146,18 @@ ifdef PTXCONF_ROOTFS_GSHADOW
> endif
> ifdef PTXCONF_ROOTFS_FSTAB
> @$(call install_alternative, rootfs, 0, 0, 0644, /etc/fstab)
> +ifndef PTXCONF_ROOTFS_VAR_OVERLAYFS
> +ifdef PTXCONF_ROOTFS_VAR_TMP
> + @$(call install_replace, rootfs, /etc/fstab, #tmp, "tmpfs")
> +endif
> +ifdef PTXCONF_ROOTFS_VAR_LOG
> + @$(call install_replace, rootfs, /etc/fstab, #log, "tmpfs")
> +endif
> +ifdef PTXCONF_ROOTFS_VAR_LOCK
> + @$(call install_replace, rootfs, /etc/fstab, #lock, "tmpfs")
> endif
> +endif # PTXCONF_ROOTFS_VAR_OVERLAYFS
> +endif # PTXCONF_ROOTFS_FSTAB
> ifdef PTXCONF_ROOTFS_MTAB_FILE
> @$(call install_alternative, rootfs, 0, 0, 0644, /etc/mtab)
> endif
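To illustrate the effect of the placeholder scheme on the installed
fstab: a minimal sketch, assuming install_replace substitutes the
placeholder textually. The function enable_fstab_entry is a hypothetical
stand-in using sed, not the PTXdist implementation:

```shell
#!/bin/sh
# Hypothetical stand-in: turn a '#<name>' placeholder at the start of an
# fstab line into 'tmpfs', reading fstab content from stdin.
# $1: placeholder name without the '#', e.g. tmp, log or lock
enable_fstab_entry() {
    # Keep the whitespace character that follows the placeholder, so that
    # '#log' does not accidentally match a longer word.
    sed "s|^#$1\([[:space:]]\)|tmpfs\1|"
}
```

Feeding the new fstab through 'enable_fstab_entry tmp' turns the '#tmp'
line into a regular tmpfs entry and leaves the '#log' and '#lock' lines
as comments.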
Good work!
Best regards
Ulrich
--
Pengutronix e.K. | Ulrich Ölmann |
Industrial Linux Solutions | http://www.pengutronix.de/ |
Peiner Str. 6-8, 31137 Hildesheim, Germany | Phone: +49-5121-206917-0 |
Amtsgericht Hildesheim, HRA 2686 | Fax: +49-5121-206917-5555 |