From: Roland Hieber <>
Cc: Roland Hieber <>,
	Felicitas Jung <>
Subject: [ptxdist] [PATCH] doc: working with licensing information in packages
Date: Tue,  8 Jun 2021 12:36:40 +0200
Message-ID: <>
In-Reply-To: <>

Co-authored-by: Felicitas Jung <>
Signed-off-by: Felicitas Jung <>
Signed-off-by: Roland Hieber <>
---
v1 -> v2:
 - rebase to current master
 - squash PATCH 1/2 ("link to the SPDX license list")
 - move from daily use into dev manual chapter
 - expand and rewrite some parts completely
 - absorb old content in doc/dev_add_new_pkgs.rst
 - address feedback from Michael Olbrich:
   - check all source files instead of "some relevant-sounding files"
   - introduce "custom" and "custom-exception" identifiers instead of
   - be restrictive and err on the side of caution when interpreting
   - shortly mention the AND, OR and bracket syntaxes

 doc/contributing.rst       |   4 +
 doc/         |   2 +
 doc/dev_add_new_pkgs.rst   |  46 +------
 doc/dev_licenses.rst       | 243 +++++++++++++++++++++++++++++++++++++
 doc/dev_manual.rst         |   1 +
 doc/ref_make_variables.rst |  20 ++-
 6 files changed, 267 insertions(+), 49 deletions(-)
 create mode 100644 doc/dev_licenses.rst

diff --git a/doc/contributing.rst b/doc/contributing.rst
index bdaddee245a9..496998c913f7 100644
--- a/doc/contributing.rst
+++ b/doc/contributing.rst
@@ -103,6 +103,10 @@ updated of removed after a version bump. Unknown PTXCONF_* variables or
 macros used in menu files. There are often typos or the variables was just
+New packages must also have licensing information in the ``<PKG>_LICENSE``
+and ``<PKG>_LICENSE_FILES`` variables.
+Refer to the section :ref:`licensing_in_packages` for more information.
 Helper Scripts
diff --git a/doc/ b/doc/
index 8fe7739aa0c8..ab901a54ee60 100644
--- a/doc/
+++ b/doc/
@@ -1480,3 +1480,5 @@ be enabled. A used mount option of the overlayfs in the default
 If your kernel does not meet this requirement you can provide your own local
 and adapted variant of the mentioned mount unit.
+.. include::
diff --git a/doc/dev_add_new_pkgs.rst b/doc/dev_add_new_pkgs.rst
index 4ae2765c2ce9..a9e8fcf236c4 100644
--- a/doc/dev_add_new_pkgs.rst
+++ b/doc/dev_add_new_pkgs.rst
@@ -248,6 +248,7 @@ PTXdist specific. What does it mean:
 -  ``*_LICENSE`` enables the user to get a list of licenses she/he is
    using in her/his project (licenses of the enabled packages).
+   See the section :ref:`licensing_in_packages` for detailed information.
 After enabling the menu entry, we can start to check the *get* and
 *extract* stages, calling them manually one after another.
@@ -604,51 +605,6 @@ This will re-start with a **clean** BSP and builds exactly the new package and
 its (known) dependencies. If this builds successfully as well we are really done
 with the new package.
-Some Notes about Licenses
-The already mentioned rule variable ``*_LICENSE`` (e.g. ``FOO_LICENSE`` in our
-example) is very important and must be filled by the developer of the package.
-Many licenses bring in obligations using the corresponding package (*attribution*
-for example). To make life easier for everybody the license for a package must
-be provided. *SPDX* license identifiers unify the license names and are used
-in PTXdist to identify license types and obligations.
-If a package comes with more than one license, all of their SPDX identifiers
-must be listed and connected with the keyword ``AND``. If your package comes
-with GPL-2.0 and LGPL-2.1 licenses, the definition should look like this:
-.. code-block:: make
-One specific obligation cannot be detected examining the SPDX license identifiers
-by PTXdist: *the license choice*. In this case all licenses of choice must be
-listed and connected by the keyword ``OR``.
-If, for example, your obligation is to select one of the licenses *GPL-2.0* **or**
-*GPL-3.0*, the ``*_LICENSE`` variable should look like this:
-.. code-block:: make
-   FOO_LICENSE := GPL-2.0 OR GPL-3.0
-SPDX License Identifiers
-A list of SPDX license identifiers can be found here:
-Help to Detect the Correct License
-License identification isn't trivial. A help in doing so can be the following
-repository and its content. It contains a list of known licenses based on their
-SPDX identifier. The content is without formatting to simplify text search.
 Advanced Rule Files
diff --git a/doc/dev_licenses.rst b/doc/dev_licenses.rst
new file mode 100644
index 000000000000..06b4decd7728
--- /dev/null
+++ b/doc/dev_licenses.rst
@@ -0,0 +1,243 @@
+.. _licensing_in_packages:
+
+Tracking licensing information in packages
+------------------------------------------
+
+PTXdist aims to track licensing information for every package.
+This includes the license(s) under which a package can be distributed,
+as well as the respective files in the package's source tree that state those terms.
+Sadly there is no widely adopted standard for machine-readable licensing
+information in source code (`yet <>`_),
+so here are a few hints where to look.
+
+In that process, we aim to collect at least the baseline set of licenses that
+apply to a package.
+Other licenses may apply too, but the complete set often cannot
+be determined without a time-consuming review.
+Still, the licensing information extracted in PTXdist can serve as a starting
+point for the full license compliance process,
+and can help to exclude software under certain licenses from the build.
+
+Many older package rules in PTXdist don't specify licensing information yet.
+If you want to help complete the database,
+you can use ``grep -L _LICENSE_FILES rules/*.make`` (in the PTXdist tree) to find those rules.
+Note, however, that this cannot find wrong or incomplete licensing information.
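+
+The ``grep -L`` call above also lists rules that intentionally have no license
+files, e.g. meta packages with ``<PKG>_LICENSE := ignore``.
+The following shell function is a minimal sketch of a slightly stricter search.
+The function name is hypothetical, and it assumes that the license variables
+are assigned with ``:=`` as in the existing rule files:
+
+.. code-block:: shell
+
+   # rules_missing_license_files DIR: print the rule files in DIR that do not
+   # mention <PKG>_LICENSE_FILES, like "grep -L _LICENSE_FILES DIR/*.make",
+   # but skip rules whose license is "ignore" (assumed to be set with ":=").
+   rules_missing_license_files() {
+       for rule in "$1"/*.make; do
+           [ -e "$rule" ] || continue   # no match: the glob stays literal
+           grep -q '_LICENSE_FILES' "$rule" && continue
+           grep -Eq '_LICENSE[[:space:]]*:=[[:space:]]*ignore' "$rule" && continue
+           printf '%s\n' "$rule"
+       done
+   }
+
+Called as ``rules_missing_license_files rules`` in the PTXdist tree, it prints
+one rule file per line.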
+
+Finding licensing information
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+First select and extract the package in question, and then have a look at the
+extracted package sources (usually something like
+``platform-nnn/build-target/mypackage-1.0`` in your BSP; if in doubt, see
+``ptxdist package-info mypackage``).
+
+* Check for files named ``COPYING``, ``COPYRIGHT``, or ``LICENSE``.
+  These often only contain the license text and, in case of the GPL, no
+  information on whether the code is available under the *-only* or the
+  *-or-later* variant.
+  Sometimes these files are in a directory ``/doc`` or ``/legal``.
+* Check the ``README``, if there is any.
+  It often contains important information, e.g. in case of the GPL, whether the
+  software is *GPL-x.x-or-later* or *GPL-x.x-only*.
+* Check source files (like ``*.c``) for license headers.
+  Often additional information can be found here.
+* If you want to be extra sure, use a license compliance toolchain (e.g.
+  `FOSSology <>`__) on the project.
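+
+The file name checks from the list above can be combined into a single search.
+This is only a sketch for a POSIX-like shell with a ``find`` that supports
+``-iname`` (GNU and BSD ``find`` do); the function name is hypothetical, and it
+additionally looks for ``SPDX-License-Identifier:`` tags in any file:
+
+.. code-block:: shell
+
+   # find_license_candidates DIR: print, sorted and without duplicates, all
+   # files below DIR that are worth a closer look when collecting licensing
+   # information.
+   find_license_candidates() {
+       {
+           # COPYING, COPYRIGHT, LICENSE and README files in any directory
+           # (e.g. also in doc/ or legal/), matched case-insensitively.
+           find "$1" -type f \( -iname 'COPYING*' -o -iname 'COPYRIGHT*' \
+               -o -iname 'LICENSE*' -o -iname 'README*' \)
+           # Files that already carry a machine-readable SPDX tag.
+           grep -rl 'SPDX-License-Identifier:' "$1"
+       } | sort -u
+   }
+
+The result is only a list of places to start; it does not replace reading the
+files themselves, nor checking source files whose license header has no SPDX tag.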
+
+Ideally you'll find two pieces of information:
+
+* A *license text* (e.g. a GNU General Public License v2.0 text)
+* A *license statement* that states that a certain license applies to (parts of) the project
+  (often also including copyright statements and a warranty disclaimer)
+
+Some licenses (e.g. BSD-style licenses) are short enough that both
+pieces are combined in a comment header in a source file or in a README.
+Strictly speaking, both the license text and the license statement must be
+present for a complete, unambiguous license, but see the next section about
+edge cases.
+
+On the other hand, there are some parts that can be ignored for our purposes:
+
+* Everything that is auto-generated, either by a script in the project source,
+  or by the build system prior to packaging.
+  The generator itself cannot hold copyright, although the authors of the
+  templates used for the generation or the authors of the generator can.
+* Most files belonging to the build system (e.g. configure scripts, Makefiles)
+  don't make it into the compiled code and can therefore be ignored.
+  These cases can sometimes be hard to detect; if unsure, include the file in
+  your research.
+
+Some projects also include a ``COPYING.LIB`` containing an LGPL text, which is
+referenced nowhere in the project.
+In that case, ignore the ``COPYING.LIB``: it probably comes from a boilerplate
+project skeleton, and the maintainer forgot to delete it.
+
+Distillation into license identifiers
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+In PTXdist, we use `SPDX license expressions <>`_.
+Either the license identifier is clear, e.g. because the README says "GPL 2.0
+or later" (check the license text to be sure), or you can use tools like
+`FOSSology <>`__,
+`licensecheck <>`_,
+or `spdx-license-match <>`_
+to match texts to SPDX license identifiers.
+
+License texts don't have to match exactly; you should apply the
+`SPDX Matching Guidelines <>`_.
+The important part here is that the project's license and the SPDX identifier
+describe the same licensing terms.
+"Rather close" or "mostly similar" statements are not enough for a match,
+but simple, unimportant changes like replacing *"The Author"* with the project
+maintainer's name, or a change in e-mail addresses, are usually okay.
+
+For software that is not open source according to the `OSI definition
+<>`_, use the identifier ``proprietary``.
+
+.. important::
+
+   If no license identifier matches, or if anything is unclear about the
+   licensing situation, use the identifier ``custom`` (for licenses)
+   or ``custom-exception`` (for license exceptions, e.g. ``GPL-2.0-only WITH
+   custom-exception``).
+
+If SPDX doesn't know about a license yet, and the project is considered open
+source or free software, you can `report the license to be added to the SPDX
+license list <>`_.
+
+Multiple licenses
+^^^^^^^^^^^^^^^^^
+
+Open-source software is re-used all the time, so it can happen that some files
+make their way into a different project.
+This is usually not a problem.
+If you encounter multiple parts of the project under different licenses, combine
+their license expressions with ``AND``.
+For example, in a project that contains both a library and command line tools,
+the license expression could be ``GPL-2.0-or-later AND LGPL-2.1-or-later``.
+
+Sometimes files are licensed under multiple licenses, and only one license is to
+be selected.
+In that case, combine the license expressions with ``OR``.
+This is often the case with Device Trees in the Linux kernel, e.g.
+``GPL-2.0-only OR BSD-2-Clause``.
+
+To avoid relying on operator precedence, use brackets ``(…)`` to group
+sub-expressions.
+
+Conflicting and ambiguous statements
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Human interpretation is needed when statements inside the project conflict with
+each other.
+Some clues that can help you decide:
+
+  If the header in the COPYING file says *"GNU General Public License"*, but
+  the license text below that is in fact a BSD license, the correct license for
+  the license identifier is the BSD license.
+
+Author Intent:
+  If the README says *"this code is LGPL 2.1"*, but COPYING contains a GPL
+  boilerplate license text, the correct license identifier is probably *"LGPL 2.1"*.
+  The README written by the author prevails over the boilerplate text.
+
+  If README and COPYING are both clearly written by the author themselves, and
+  the README says *"don't do $thing"* and COPYING says *"do $thing"*, the more
+  recent file prevails.
+
+  If no license statement can be found, but there is a COPYING file containing
+  a license text, infer that the whole project is licensed under that license.
+
+Err on the side of caution:
+  If all you can find is a GPL license text, this doesn't yet tell you whether
+  the project is licensed under the *-only* or the *-or-later* variant.
+  In that case, interpret the license restrictively and choose the *-only*
+  variant for the license identifier.
+
+Don't assume:
+  If anything is ambiguous or unclear, choose ``custom`` as the license identifier.
+
+.. note::
+
+   Any of these cases is considered a bug and should be reported to the upstream maintainers!
+
+"Public Domain" software
+^^^^^^^^^^^^^^^^^^^^^^^^
+
+For `good reasons <>`_,
+SPDX doesn't supply a license identifier for "Public Domain".
+Nevertheless, some PTXdist package rules specify ``public_domain`` as their
+respective license identifier.
+This is purely for historical reasons, and ``public_domain`` should normally
+*not* be used for new packages.
+Some of those "Public Domain" dedications in packages have since been accepted
+in SPDX, e.g. `libselinux <>`_ or
+`SQLite <>`_.
+
+No license information at all
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+No license - no usage rights!
+Definitely report this bug to the upstream maintainer.
+Maybe even point them in the direction of `machine-readability <>`_ :)
+
+Adding license files to PTXdist package rules
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The SPDX license expression of the package goes into the ``<PKG>_LICENSE``
+variable in the respective package rule file.
+All relevant files identified in the steps above are then added to the variable
+``<PKG>_LICENSE_FILES``, each with a checksum so that PTXdist complains when
+they change.
+
+.. code-block:: make
+   :caption: ddrescue.make
+
+   DDRESCUE_LICENSE	:= GPL-2.0-or-later AND BSD-2-Clause
+   DDRESCUE_LICENSE_FILES	:= \
+           file://COPYING;md5=76d6e300ffd8fb9d18bd9b136a9bba13 \
+           file://;startline=1;endline=16;md5=a01d61d3293ce28b883d8ba0c497e968 \
+           file://;startline=1;endline=18;md5=41d1341d0d733a5d24b26dc3cbc1ac42
+
+See the section :ref:`package_specific_variables` for more information about
+the syntax of those two variables.
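+
+The checksum is what lets PTXdist notice changes in the license files.
+As an illustration of the idea only (this is *not* PTXdist's own
+implementation, and the function name is hypothetical), the following sketch
+re-checks a single ``<PKG>_LICENSE_FILES`` entry from within the package
+source directory.
+It assumes that a missing ``startline`` or ``endline`` means the first or last
+line of the file, and it ignores other parameters such as ``encoding=``:
+
+.. code-block:: shell
+
+   # check_license_entry ENTRY: recompute the MD5 sum of one entry such as
+   # "file://COPYING;startline=1;endline=16;md5=..." (path relative to the
+   # current directory); return 0 if it still matches, non-zero otherwise.
+   check_license_entry() {
+       path=${1#file://}
+       path=${path%%;*}
+       start= end= want=
+       # Split the ";key=value" parameters that follow the path.
+       old_ifs=$IFS
+       IFS=';'
+       for param in ${1#*;}; do
+           case $param in
+               startline=*) start=${param#startline=} ;;
+               endline=*) end=${param#endline=} ;;
+               md5=*) want=${param#md5=} ;;
+           esac
+       done
+       IFS=$old_ifs
+       if [ -n "$start$end" ]; then
+           # Assumption: missing bounds default to the first/last line.
+           [ -n "$start" ] || start=1
+           [ -n "$end" ] || end='$'
+           have=$(sed -n "${start},${end}p" "$path" | md5sum | cut -d' ' -f1)
+       else
+           have=$(md5sum < "$path" | cut -d' ' -f1)
+       fi
+       [ -n "$want" ] && [ "$have" = "$want" ]
+   }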
+
+The MD5 sum for a block of lines can be generated with sed's ``p`` (print)
+command applied to a range of lines.
+For the example above, the checksum of lines 1 to 16 of would be computed as follows::
+
+   $ sed -n 1,16p | md5sum -
+   a01d61d3293ce28b883d8ba0c497e968  -
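+
+Instead of assembling such entries by hand, a small helper can print them.
+This is a sketch only: the function name is hypothetical, it must be run from
+the package source directory, and it assumes GNU ``md5sum`` as in the command
+above:
+
+.. code-block:: shell
+
+   # license_file_entry FILE [STARTLINE ENDLINE]: print a <PKG>_LICENSE_FILES
+   # entry for FILE, with the MD5 sum of either the whole file or only the
+   # given range of lines (computed like the sed/md5sum command above).
+   license_file_entry() {
+       if [ $# -ge 3 ]; then
+           sum=$(sed -n "$2,$3p" "$1" | md5sum | cut -d' ' -f1)
+           printf 'file://%s;startline=%s;endline=%s;md5=%s\n' "$1" "$2" "$3" "$sum"
+       else
+           sum=$(md5sum < "$1" | cut -d' ' -f1)
+           printf 'file://%s;md5=%s\n' "$1" "$sum"
+       fi
+   }
+
+For example, ``license_file_entry COPYING`` prints an entry of the same form as
+the first one in the ``ddrescue.make`` example.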
+
+If the copyright statement contains a list of years, leave those lines out of
+the checksum calculation: an added year does not change the license
+(in fact, not even a single year is needed for the license to be valid),
+but only makes package version updates more cumbersome.
+
+If you used additional information from the README or from license headers in
+source files, also include these files (for source code, one file per distinct
+license header is enough), but compute the checksum only over the relevant
+lines, so that changes in the rest of the file do not appear as license changes.
+
+For rather chaotic directories with lots of license files, definitely include at
+least one relevant source file with license headers (if there are any), as some
+developers tend to accumulate license files without adjusting them when the
+licensing of their source code changes.
+
+.. note::
+
+   For each single license identifier in the license expression, include at
+   least one file with checksum in the ``<PKG>_LICENSE_FILES`` variable.
+
+PTXdist will include all files (or their respective lines) that were referenced
+in ``<PKG>_LICENSE_FILES`` verbatim in the license report.
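+
+To follow the note above, it helps to list the individual identifiers in a
+license expression.
+A minimal sketch (hypothetical function name; it only handles the ``AND``,
+``OR`` and ``WITH`` operators and brackets, and does not validate the
+expression):
+
+.. code-block:: shell
+
+   # spdx_ids EXPR: print the license and exception identifiers used in the
+   # SPDX license expression EXPR, one per line, without duplicates.
+   spdx_ids() {
+       printf '%s\n' "$1" |
+           tr '()\t' '   ' |             # brackets and tabs become spaces
+           tr -s ' ' '\n' |              # one token per line
+           grep -Ev '^(AND|OR|WITH)?$' | # drop operators and empty lines
+           sort -u
+   }
+
+For ``GPL-2.0-only WITH custom-exception``, this prints the two identifiers
+``GPL-2.0-only`` and ``custom-exception``, one per line; each of them needs at
+least one matching entry in ``<PKG>_LICENSE_FILES``.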
diff --git a/doc/dev_manual.rst b/doc/dev_manual.rst
index c232cc91428a..0a1eaf8a1413 100644
--- a/doc/dev_manual.rst
+++ b/doc/dev_manual.rst
@@ -13,6 +13,7 @@ This chapter shows all (or most) of the details of how PTXdist works.
+   dev_licenses
diff --git a/doc/ref_make_variables.rst b/doc/ref_make_variables.rst
index 674acdcea982..2ee34856dd02 100644
--- a/doc/ref_make_variables.rst
+++ b/doc/ref_make_variables.rst
@@ -127,6 +127,8 @@ Other useful variables:
   that are built and installed during the PTXdist build run.
   There are analogous ``-y`` and ``-m`` variants of those variables too.
+.. _package_specific_variables:
+
 Package Specific Variables
@@ -223,10 +225,19 @@ Package Definition
   'gdbserver' for an example.
-  The license of the package. The SPDX license identifiers should be used
-  here. Use ``proprietary`` for proprietary packages and ``ignore`` for
-  packages without their own license, e.g. meta packages or packages that
-  only install files from ``projectroot/``.
+  The license of the package in the form of an `SPDX license expression
+  <>`_.
+  The following values have special meaning for PTXdist:
+
+  - ``custom`` and ``custom-exception``: for licenses or license exceptions
+    that are considered free software, but do not match any license or license
+    exception known to SPDX
+  - ``proprietary``: for proprietary (non-free) packages
+  - ``ignore``: for packages without their own license, e.g. meta packages or
+    packages that only install files from ``projectroot/``
+  - ``unknown``: no licensing information was extracted yet
+
+  See the section :ref:`licensing_in_packages` for more information.
   A space separated list of URLs of license text files. The URLs must be
@@ -238,6 +249,7 @@ Package Definition
   used in case the specified file contains more than just the license text,
   e.g. if the license is in the header of a source file. For non ASCII or
   UTF-8 files the encoding can be specified with ``encoding=<enc>``.
+  See the section :ref:`licensing_in_packages` for more information.
 For most packages the variables described above are undefined by default.
 However, for cross and host packages these variables default to the value

