mailarchive of the ptxdist mailing list
From: Roland Hieber <rhi@pengutronix.de>
To: ptxdist@pengutronix.de
Cc: Roland Hieber <rhi@pengutronix.de>
Subject: [ptxdist] [PATCH] doc: working with licensing information in packages
Date: Fri,  6 Aug 2021 12:44:01 +0200
Message-ID: <20210806104401.12401-1-rhi@pengutronix.de>
In-Reply-To: <20210806062948.GB1758096@pengutronix.de>

Co-authored-by: Felicitas Jung <f.jung@pengutronix.de>
Signed-off-by: Felicitas Jung <f.jung@pengutronix.de>
Signed-off-by: Roland Hieber <rhi@pengutronix.de>
---
PATCH v4:
 - remove dangling include to daily_work_licenses.inc (how did that ever
   work…?)

PATCH v3: https://lore.ptxdist.org/ptxdist/20210805091848.2855-1-rhi@pengutronix.de
 - rebase to current master
 - rewrite paragraph about always including the copyright statement
   lines in the checksum (feedback from Michael Olbrich)

PATCH v2: https://lore.ptxdist.org/ptxdist/20210608103639.24336-1-rhi@pengutronix.de
 - rebase to current master
 - squash PATCH 1/2 ("link to the SPDX license list")
 - move from daily use into dev manual chapter
 - expand and rewrite some parts completely
 - absorb old content in doc/dev_add_new_pkgs.rst
 - address feedback from Michael Olbrich:
   - check all source files instead of "some relevant-sounding files"
   - introduce "custom" and "custom-exception" identifiers instead of
     "unknown"
   - be restrictive and err on the side of caution when interpreting
     ambiguities
   - briefly mention the AND, OR and bracket syntaxes

PATCH v1: https://lore.ptxdist.org/ptxdist/20200511100306.7948-2-rhi@pengutronix.de
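
Note for reviewers (not part of the patch): the checksum workflow that
doc/dev_licenses.rst describes for <PKG>_LICENSE_FILES can be sketched as a
small shell helper. The function name license_files_entry is made up for
illustration and is not a PTXdist command. The sketch only assumes sed, md5sum
and cut from a GNU-like userland, and it prints the entry in the same
file://...;startline=...;endline=...;md5=... form used in the DDRESCUE example
of the patch.

```shell
#!/bin/sh
# Hypothetical helper (not part of PTXdist): print one <PKG>_LICENSE_FILES
# entry for lines START..END of FILE, hashing exactly those lines.
license_files_entry() {
    file=$1
    start=$2
    end=$3
    # sed -n 'START,ENDp' prints only the requested line range;
    # md5sum prints "<hash>  -", so keep only the first field.
    sum=$(sed -n "${start},${end}p" "$file" | md5sum | cut -d' ' -f1)
    printf 'file://%s;startline=%s;endline=%s;md5=%s\n' \
        "$file" "$start" "$end" "$sum"
}
```

Run from inside the extracted package sources with a relative path, e.g.
"license_files_entry main.cc 1 16", so that the printed path matches what the
rule file expects.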
---
 doc/contributing.rst       |   4 +
 doc/dev_add_new_pkgs.rst   |  46 +------
 doc/dev_licenses.rst       | 245 +++++++++++++++++++++++++++++++++++++
 doc/dev_manual.rst         |   1 +
 doc/ref_make_variables.rst |  20 ++-
 5 files changed, 267 insertions(+), 49 deletions(-)
 create mode 100644 doc/dev_licenses.rst

diff --git a/doc/contributing.rst b/doc/contributing.rst
index e7cbd90e6cc3..e4209480893d 100644
--- a/doc/contributing.rst
+++ b/doc/contributing.rst
@@ -156,6 +156,10 @@ updated of removed after a version bump. Unknown PTXCONF_* variables or
 macros used in menu files. There are often typos or the variables was just
 removed.
 
+New packages must also have licensing information in the ``<PKG>_LICENSE``
+and ``<PKG>_LICENSE_FILES`` variables.
+Refer to the section :ref:`licensing_in_packages` for more information.
+
 Helper Scripts
 --------------
 
diff --git a/doc/dev_add_new_pkgs.rst b/doc/dev_add_new_pkgs.rst
index 3506436d78ec..6b1248563e6f 100644
--- a/doc/dev_add_new_pkgs.rst
+++ b/doc/dev_add_new_pkgs.rst
@@ -248,6 +248,7 @@ PTXdist specific. What does it mean:
 
 -  ``*_LICENSE`` enables the user to get a list of licenses she/he is
    using in her/his project (licenses of the enabled packages).
+   See :ref:`licensing_in_packages` below for detailed information.
 
 After enabling the menu entry, we can start to check the *get* and
 *extract* stages, calling them manually one after another.
@@ -603,48 +604,3 @@ to do (even if its boring and takes time):
 This will re-start with a **clean** BSP and builds exactly the new package and
 its (known) dependencies. If this builds successfully as well we are really done
 with the new package.
-
-Some Notes about Licenses
-~~~~~~~~~~~~~~~~~~~~~~~~~
-
-The already mentioned rule variable ``*_LICENSE`` (e.g. ``FOO_LICENSE`` in our
-example) is very important and must be filled by the developer of the package.
-Many licenses bring in obligations using the corresponding package (*attribution*
-for example). To make life easier for everybody the license for a package must
-be provided. *SPDX* license identifiers unify the license names and are used
-in PTXdist to identify license types and obligations.
-
-If a package comes with more than one license, all of their SPDX identifiers
-must be listed and connected with the keyword ``AND``. If your package comes
-with GPL-2.0 and LGPL-2.1 licenses, the definition should look like this:
-
-.. code-block:: make
-
-   FOO_LICENSE := GPL-2.0 AND LGPL-2.1
-
-One specific obligation cannot be detected examining the SPDX license identifiers
-by PTXdist: *the license choice*. In this case all licenses of choice must be
-listed and connected by the keyword ``OR``.
-
-If, for example, your obligation is to select one of the licenses *GPL-2.0* **or**
-*GPL-3.0*, the ``*_LICENSE`` variable should look like this:
-
-.. code-block:: make
-
-   FOO_LICENSE := GPL-2.0 OR GPL-3.0
-
-SPDX License Identifiers
-^^^^^^^^^^^^^^^^^^^^^^^^
-
-A list of SPDX license identifiers can be found here:
-
-   https://spdx.org/licenses/
-
-Help to Detect the Correct License
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-
-License identification isn't trivial. A help in doing so can be the following
-repository and its content. It contains a list of known licenses based on their
-SPDX identifier. The content is without formatting to simplify text search.
-
-   https://github.com/spdx/license-list-data/tree/master/text
diff --git a/doc/dev_licenses.rst b/doc/dev_licenses.rst
new file mode 100644
index 000000000000..0bb1c8d77c5e
--- /dev/null
+++ b/doc/dev_licenses.rst
@@ -0,0 +1,245 @@
+.. _licensing_in_packages:
+
+Tracking licensing information in packages
+------------------------------------------
+
+PTXdist aims to track licensing information for every package.
+This includes the license(s) under which a package can be distributed,
+as well as the respective files in the package's source tree that state those terms.
+Sadly there is no widely adopted standard for machine-readable licensing
+information in source code (`yet <https://reuse.software>`_),
+so here are a few hints where to look.
+
+In that process, we aim to collect the baseline set of licenses
+which at least apply to a package.
+There may be other licenses which apply too, but the complete set often cannot
+be found without a time-consuming review.
+Still, the extracted license information in PTXdist can serve as a hint for
+the full license compliance process,
+and can help to exclude certain software under certain licenses from the build.
+
+There are many older package rules in PTXdist which don't specify licensing information.
+If you want to help complete the database,
+you can use ``grep -L _LICENSE_FILES rules/*.make`` (in the PTXdist tree) to find those rules.
+Note however that this cannot find wrong or incomplete licensing information.
+
+Finding licensing information
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+You should first select and extract the package in question, and then have a
+look at the extracted package sources (usually something like
+``platform-nnn/build-target/mypackage-1.0`` in your BSP, if in doubt see
+``ptxdist package-info mypackage``).
+
+* Check for files named ``COPYING``, ``COPYRIGHT``, or ``LICENSE``.
+  These often only contain the license text and, in case of GPL, no information
+  if the code is available under the *-only* or *-or-later* variant.
+  Sometimes these files are in a folder ``/doc`` or ``/legal``.
+
+* Check the ``README``, if there is any.
+  Often there is important information there, e.g. in case of GPL if the
+  software is *GPL-x.x-or-later* or *GPL-x.x-only*.
+
+* Check source files, like ``*.c``, for license headers.
+  Often additional information can be found here.
+
+* If you want to be extra sure, use a license compliance toolchain (e.g.
+  `FOSSology <https://www.fossology.org/>`__) on the project.
+
+Ideally you'll find two pieces of information:
+
+* A *license text* (e.g. a GNU General Public License v2.0 text)
+* A *license statement* that states that a certain license applies to (parts of) the project
+  (often also including copyright statements and a warranty disclaimer)
+
+Some licenses (e.g. BSD-style licenses) are also short enough so that both
+pieces are combined in a short comment header in a source file or a README.
+Strictly speaking, both the license text and the license statement must be
+present for a complete, unambiguous license, but see the next section about
+edge cases.
+
+On the other hand, there are some parts that can be ignored for our purposes:
+
+* Everything that is auto-generated, either by a script in the project source,
+  or by the build system prior to packaging.
+  The generator itself cannot hold copyright, although the authors of the
+  templates used for the generation or the authors of the generator can.
+
+* Most files belonging to the build system don't make it into the compiled code
+  and can therefore be ignored (e.g. configure scripts, Makefiles).
+  These cases sometimes can be hard to detect – if unsure, include the file in
+  your research.
+
+Some projects also include a COPYING.LIB containing an LGPL text, which is
+referenced nowhere in the project.
+In that case, ignore the COPYING.LIB – it probably comes from a boilerplate
+project skeleton and the maintainer forgot to delete it.
+
+Distillation into license identifiers
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+In PTXdist, we use `SPDX license expressions <https://spdx.org/licenses/>`_.
+
+Either the license identifier is clear, e.g. because the README says "GPL 2.0
+or later" (check the license text to be sure), or you can use tools like
+`FOSSology <https://www.fossology.org>`__,
+`licensecheck <https://wiki.debian.org/CopyrightReviewTools#Command-line_tools_in_Debian>`_,
+or `spdx-license-match <https://github.com/rohieb/spdx-license-match>`_
+to match texts to SPDX license identifiers.
+
+License texts don't have to match exactly; you should apply the
+`SPDX Matching Guidelines <https://spdx.org/spdx-license-list/matching-guidelines>`_
+accordingly.
+The important part here is that the project's license and the SPDX identifier
+describe the same licensing terms.
+"Rather close" or "mostly similar" statements are not enough for a match,
+but simple unimportant changes like replacing *"The Author"* with the project's
+maintainer's name, or a change in e-mail addresses, are usually okay.
+
+For software that is not open-source according to the `OSI definition
+<https://opensource.org/osd>`_, use the identifier ``proprietary``.
+
+.. important::
+
+   If no license identifier matches, or if anything is unclear about the
+   licensing situation, use the identifier ``custom`` (for licenses)
+   or ``custom-exception`` (for license exceptions, e.g.: ``GPL-2.0-only WITH
+   custom-exception``).
+
+If SPDX doesn't know about a license yet, and the project is considered open
+source or free software, you can `report its license to be added to the SPDX
+license list
+<https://github.com/spdx/license-list-XML/blob/master/CONTRIBUTING.md#request-a-new-license-or-exception-be-added-to-the-spdx-license-list>`_.
+
+Multiple licenses
+^^^^^^^^^^^^^^^^^
+
+Open-source software is re-used all the time, so it can happen that some files
+make their way into a different project.
+This is usually no problem.
+If you encounter multiple parts of the project under different licenses, combine
+their license expressions with ``AND``.
+For example, in a project that contains both a library and command line tools,
+the license expression could be ``GPL-2.0-or-later AND LGPL-2.1-or-later``.
+
+Sometimes files are licensed under multiple licenses, and only one license is to
+be selected.
+In that case, combine the license expressions with ``OR``.
+This is often the case with Device Trees in the Linux kernel, e.g.:
+``GPL-2.0-only OR BSD-2-Clause``.
+
+No operator precedence is defined, use brackets ``(…)`` to group sub-statements.
+
+Conflicting and ambiguous statements
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Human interpretation is needed when statements inside the project conflict with
+each other.
+Some clues that can help you decide:
+
+Detailedness:
+  If the header in the COPYING file says *"GNU General Public License"*, but
+  the license text below that is in fact a BSD license, the correct license for
+  the license identifier is the BSD license.
+
+Author Intent:
+  If the README says *"this code is LGPL 2.1"*, but COPYING contains a GPL
+  boilerplate license text, the correct license identifier is probably *"LGPL 2.1"*
+  – the README written by the author prevails over the boilerplate text.
+
+Recency:
+  If README and COPYING are both clearly written by the author themselves, and
+  the README says *"don't do $thing"* and COPYING says *"do $thing"*, the more
+  recent file prevails.
+
+Scope:
+  If no license statement can be found, but there is a COPYING file containing
+  a license text, infer that the whole project is licensed under that license.
+
+Err on the side of caution:
+  If all you can find is a GPL license text, this doesn't yet tell you whether
+  the project is licensed under the *-only* or the *-or-later* variant.
+  In that case, interpret the license restrictively and choose the *-only*
+  variant for the license identifier.
+
+Don't assume:
+  If anything is ambiguous or unclear, choose ``custom`` as a license identifier.
+
+.. note::
+
+   Any of these cases is considered a bug and should be reported to the upstream maintainers!
+
+"Public Domain" software
+^^^^^^^^^^^^^^^^^^^^^^^^
+
+For `good reasons <https://wiki.spdx.org/view/Legal_Team/Decisions/Dealing_with_Public_Domain_within_SPDX_Files>`_,
+SPDX doesn't supply a license identifier for "Public Domain".
+Nevertheless, some PTXdist package rules specify ``public_domain`` as their
+respective license identifier.
+This is purely for historical reasons, and ``public_domain`` should normally
+*not* be used for new packages.
+Some of those "Public Domain" dedications in packages have since been accepted
+in SPDX, e.g. `libselinux <https://spdx.org/licenses/libselinux-1.0.html>`_ or
+`SQLite <https://spdx.org/licenses/blessing.html>`_.
+
+No license information at all
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+No license - no usage rights!
+
+Definitely report this bug to the upstream maintainer.
+Maybe even point them in the direction of `machine-readability <https://reuse.software/>`_ :)
+
+Adding license files to PTXdist packages
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The SPDX license identifier of the package goes into the ``<PKG>_LICENSE``
+variable in the respective package rule file.
+All relevant files identified in the steps above are then added to the variable ``<PKG>_LICENSE_FILES``,
+including a checksum so that PTXdist complains when they change.
+
+Example:
+
+.. code-block:: make
+
+   DDRESCUE_LICENSE	:= GPL-2.0-or-later AND BSD-2-Clause
+   DDRESCUE_LICENSE_FILES	:= \
+           file://COPYING;md5=76d6e300ffd8fb9d18bd9b136a9bba13 \
+           file://main.cc;startline=1;endline=16;md5=a01d61d3293ce28b883d8ba0c497e968 \
+           file://arg_parser.cc;startline=1;endline=18;md5=41d1341d0d733a5d24b26dc3cbc1ac42
+
+See the section :ref:`package_specific_variables` for more information about
+the syntax of those two variables.
+
+The MD5 sum for a block of lines can be generated with sed's ``p`` (print)
+command applied to a range of lines.
+For the example above, lines 1 to 16 of main.cc would be::
+
+   $ sed -n 1,16p main.cc | md5sum -
+   a01d61d3293ce28b883d8ba0c497e968
+
+Always include the copyright statement ("Copyright YYYY (C) Some Person")
+for the calculation of the checksum, even if it means that the checksum changes
+on package updates when new years are added to the string.
+While it is not needed for most licenses to be valid, some licenses require
+that it must not be removed (e.g. see GPLv2, section 1),
+and it is proper etiquette to give attribution to the maintainers in the
+license report document.
+
+If additional information is in the README or license headers in source files
+are used, also include these files (for source code: one of each is enough),
+but use md5sum only on the relevant lines, so changes in the rest of the file
+do not appear as license changes.
+
+For rather chaotic directories with lots of license files, definitely include at
+least one relevant source file with license headers (if there are any), as some
+developers tend to accumulate license files without adjusting them to license
+changes in their source.
+
+.. note::
+
+   For each single license identifier in the license expression, include at
+   least one file with checksum in the ``<PKG>_LICENSE_FILES`` variable.
+
+PTXdist will include all files (or their respective lines) that were referenced
+in ``<PKG>_LICENSE_FILES`` as verbatim sources in the license report.
diff --git a/doc/dev_manual.rst b/doc/dev_manual.rst
index e9a88c1a97f5..fe4307a86b80 100644
--- a/doc/dev_manual.rst
+++ b/doc/dev_manual.rst
@@ -15,6 +15,7 @@ This chapter shows all (or most) of the details of how PTXdist works.
    dev_patching
    dev_add_bin_only_files
    dev_create_new_pkg_templates
+   dev_licenses
    dev_layers_in_ptxdist
    dev_kconfig_diffs
    dev_code_signing
diff --git a/doc/ref_make_variables.rst b/doc/ref_make_variables.rst
index 674acdcea982..2ee34856dd02 100644
--- a/doc/ref_make_variables.rst
+++ b/doc/ref_make_variables.rst
@@ -127,6 +127,8 @@ Other useful variables:
   that are built and installed during the PTXdist build run.
   There are analogous ``-y`` and ``-m`` variants of those variables too.
 
+.. _package_specific_variables:
+
 Package Specific Variables
 ~~~~~~~~~~~~~~~~~~~~~~~~~~
 
@@ -223,10 +225,19 @@ Package Definition
   'gdbserver' for an example.
 
 ``<PKG>_LICENSE``
-  The license of the package. The SPDX license identifiers should be used
-  here. Use ``proprietary`` for proprietary packages and ``ignore`` for
-  packages without their own license, e.g. meta packages or packages that
-  only install files from ``projectroot/``.
+  The license of the package in the form of an `SPDX license expression
+  <https://spdx.org/licenses/>`_.
+  The following values have special meaning for PTXdist:
+
+  - ``custom`` and ``custom-exception``: for licenses or license exceptions
+    that are considered free software, but do not match any license or license
+    exception known to SPDX.
+  - ``proprietary``: for proprietary (non-free) packages
+  - ``ignore``: for packages without their own license, e.g. meta packages or
+    packages that only install files from ``projectroot/``
+  - ``unknown``: no licensing information was extracted yet
+
+  See the section :ref:`licensing_in_packages` for more information.
 
 ``<PKG>_LICENSE_FILES``
   A space separated list of URLs of license text files. The URLs must be
@@ -238,6 +249,7 @@ Package Definition
   used in case the specified file contains more than just the license text,
   e.g. if the license is in the header of a source file. For non ASCII or
   UTF-8 files the encoding can be specified with ``encoding=<enc>``.
+  See the section :ref:`licensing_in_packages` for more information.
 
 For most packages the variables described above are undefined by default.
 However, for cross and host packages these variables default to the value
-- 
2.30.2


_______________________________________________
ptxdist mailing list
ptxdist@pengutronix.de
To unsubscribe, send a mail with subject "unsubscribe" to ptxdist-request@pengutronix.de

Thread overview: 19+ messages
2020-05-11 10:03 [ptxdist] [PATCH 1/2] doc: ref_make_variables: link to the SPDX license list Roland Hieber
2020-05-11 10:03 ` [ptxdist] [PATCH 2/2] doc: working with licensing information in packages Roland Hieber
2020-05-26 10:29   ` Roland Hieber
2020-05-26 11:12   ` Alexander Dahl
2020-05-29  6:23   ` Michael Olbrich
2020-05-29  8:27     ` Roland Hieber
2020-05-29  8:55       ` Michael Olbrich
2020-05-29  9:40         ` Roland Hieber
2020-05-29 12:03           ` Michael Olbrich
2020-05-31 19:56             ` Roland Hieber
2020-06-02 13:16               ` Michael Olbrich
2020-06-02 15:14                 ` Roland Hieber
2021-06-08 10:36 ` [ptxdist] [PATCH] " Roland Hieber
2021-06-16 14:19   ` Michael Olbrich
2021-06-16 14:40     ` Roland Hieber
2021-08-05  9:18     ` [ptxdist] [PATCH v3] " Roland Hieber
2021-08-06  6:29       ` Michael Olbrich
2021-08-06 10:44         ` Roland Hieber [this message]
2021-10-07 10:18           ` [ptxdist] [APPLIED] " Michael Olbrich
