From 2006dba9422cfd9043e2fece611b92ae04929375 Mon Sep 17 00:00:00 2001
From: Mario Fetka
Date: Tue, 2 Jun 2026 08:25:07 +0000
Subject: [PATCH] docs: document nwlog and zlog logging direction

---
 REDESIGN.md | 136 +++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 134 insertions(+), 2 deletions(-)

diff --git a/REDESIGN.md b/REDESIGN.md
index bb32afa..7ef471e 100644
--- a/REDESIGN.md
+++ b/REDESIGN.md
@@ -1347,6 +1347,137 @@ This keeps TCP/IP support compatible with the broader redesign: transport IO is
 separated from NCP semantics, but the existing `nwserv`/`nwconn` process model
 remains intact.
 
+## Logging subsystem and optional zlog backend
+
+The dispatch, provider, directory, and transport redesigns all need better
+logging than scattered ad-hoc debug messages. The goal is not only prettier
+logs. The important requirements are:
+
+- consistent severity levels;
+- consistent categories across processes and providers;
+- request correlation from `nwserv`/`nwconn` through provider handoff and back;
+- safe redaction of secrets before any backend sees the message;
+- configurable routing to local files, syslog, or later remote collectors;
+- auditable security events such as password recovery, TLS failures, rejected
+  provider IPC, and directory/bootstrap changes.
+
+The mars-nwe source should not call a third-party logging library directly from
+random endpoint handlers. It should grow a small internal facade first:
+
+```c
+typedef enum {
+    NWLOG_CORE,
+    NWLOG_CONFIG,
+    NWLOG_TRANSPORT,
+    NWLOG_NCP,
+    NWLOG_HANDOFF,
+    NWLOG_BINDERY,
+    NWLOG_QUEUE,
+    NWLOG_DIRECTORY,
+    NWLOG_NDS,
+    NWLOG_LDAP,
+    NWLOG_AUTH,
+    NWLOG_ACL,
+    NWLOG_RECOVERY,
+    NWLOG_SECURITY
+} NwLogCategory;
+```
+
+Conceptual call sites should look like:
+
+```c
+nwlog_info(NWLOG_HANDOFF, ctx,
+           "provider=%s request_id=%u selector=%s handoff=start",
+           provider_name, request_id, selector_path);
+
+nwlog_warn(NWLOG_RECOVERY, ctx,
+           "admin password recovery requested dn=%s uid=%lu",
+           redacted_dn, (unsigned long)uid);
+```
+
+That facade can initially keep using the existing mars-nwe logging functions,
+`stderr`, or `syslog`. Later it may use an advanced backend.
+
+`zlog` is a good candidate for that advanced backend because it is a C logging
+library with category, format, and rule based configuration. That model fits
+mars-nwe well: code can emit category-specific events such as `ncp`, `handoff`,
+`queue`, `directory`, `auth`, or `transport`, while the administrator decides in
+the logging configuration whether those categories go to a file, stdout/stderr,
+syslog-style output, a pipe, or an external log-forwarder path. The zlog
+project documentation describes these three core concepts as categories, formats,
+and rules, where rules bind a category/level to an output and format. Before
+choosing it, packaging, license compatibility, portability, and maintenance state
+still need to be verified for the target distributions.
+
+The preferred dependency shape is therefore:
+
+```text
+mars-nwe code
+  -> nwlog facade
+      -> simple built-in backend: stderr/file/syslog
+      -> optional advanced backend: zlog
+          -> admin-configured zlog rules/formats/outputs
+```
+
+Do not make endpoint code depend on `zlog_category_t` or zlog macros directly.
+Keeping `nwlog` in the middle gives mars-nwe one place to:
+
+- inject correlation fields such as `connection_id`, `request_id`, `sequence`,
+  `task_id`, provider name, and NCP selector path;
+- redact or suppress sensitive fields before formatting;
+- enforce no-secret logging rules even when logs are routed to remote systems;
+- keep a fallback backend for minimal builds or platforms without zlog;
+- change or add backends later without touching protocol handlers.
+
+Remote logging is useful, but it must be treated as a security boundary. A GELF
+or Graylog-style collector, syslog relay, pipe, or any other remote forwarding
+path must receive structured, redacted events only. It must never receive raw
+NCP request bodies, decoded handoff payloads, passwords, one-shot recovery
+tokens, private keys, or raw directory authentication material.
+
+A future documented INI could expose the logging policy without forcing admins
+to edit C-style backend internals directly:
+
+```ini
+[logging]
+backend = zlog ; builtin, syslog, file, zlog
+level = info
+redact_secrets = yes
+config = /etc/mars-nwe/zlog.conf
+
+[logging.category]
+ncp = info
+handoff = info
+auth = warn
+recovery = warn
+directory = info
+transport = info
+
+[logging.debug]
+packet_hexdump = no
+handoff_hexdump = no
+unsafe_raw_payloads = no
+```
+
+Raw packet or handoff hexdumps should be opt-in developer diagnostics, not normal
+admin logging. Even then, auth/password fields should be redacted where the
+layout is known. The safe default is length-only logging for sensitive payloads.
+
+Important audit events should be logged even at normal levels:
+
+- provider IPC connection accepted/rejected;
+- provider IPC TLS/mTLS validation failure;
+- directory store initialization and schema migration;
+- `nwsetup` password bootstrap or recovery actions;
+- bindery-to-directory migration actions;
+- failed authentication attempts with redacted identities;
+- NCP handoff timeout, dead provider, or mismatched reply correlation ID.
+
+The logging cleanup should be a separate functional change from endpoint layout
+patches. Documentation-only endpoint audit patches may add log design notes, but
+they should not introduce new logging dependencies or change runtime logging
+behavior.
+
 ## Logging connection
 
 The dispatch redesign also supports the desired log cleanup. If every request
@@ -1358,8 +1489,9 @@ INFO NCP 32/0 REPLY type=0x2222 fn=0x20 sub=0x00 result=0x00 len=4
 WARN NCP 23/130 LAYOUT-MISMATCH sdk="32-bit JobNumber" code="16-bit parser"
 ```
 
-The logging cleanup should still reuse existing mars-nwe logging functions. Do
-not add a second logging subsystem just to support the dispatch cleanup.
+Until the `nwlog` facade exists, endpoint-dispatch cleanup should still reuse
+existing mars-nwe logging functions. Do not add direct zlog calls or a parallel
+logging path just to support one endpoint family.
 
 ## Migration plan
 