Files
mars-nwe/doc/LOG_LEVEL_AUDIT.md
OpenAI c6e6b86040
All checks were successful
Source release / source-package (push) Successful in 1m21s
0449 docs: use cumulative logging level masks
2026-06-13 07:31:11 +02:00

247 lines
12 KiB
Markdown

# MARS log-level audit
This audit records the legacy debug-level scheme before the namespace work starts.
It is based on the current source tree and the developer comments in
`opt/nw.ini.cmake`, `opt/nw.ini.hook.cmake`, `include/tools.h`, `include/debmask.h`
and `src/tools.c`.
## Current mechanics
Legacy logging is driven by `nw_debug` and `XDPRINTF((level, mode, ...))`.
`xdprintf()` prints a message when `nw_debug >= level`; therefore the configured
number is a threshold, not a named severity. The shipped configuration comments
for entries `100` through `106` say:
- `0` means no debug messages;
- `1` means errors and notes are reported;
- `99` means maximum debug levels.
The module-specific INI entries are:
| INI entry | Module constant | Process / area | Default in `opt/nw.ini.cmake` and hook file |
| --- | --- | --- | --- |
| `100` | local IPX helper | IPX kernel setup/debug | `0` |
| `101` | `NWSERV` | main server / watchdog / SAP startup | `1` |
| `102` | `NCPSERV` | NCP service helper | `1` |
| `103` | `NWCONN` | connection and file/namespace NCP path | `1` |
| `104` | `NWCLIENT` | startup client helper | `0`, comment says keep it zero |
| `105` | `NWBIND` | bindery/queue/server-management helper | `1` |
| `106` | `NWROUTED` | routing helper | `1` |
`get_debug_level()` also accepts an optional second token for `debug_mask`.
That mask gates `MDEBUG(mask, ...)` blocks from `include/debmask.h`:
| Mask | Meaning in source |
| --- | --- |
| `D_FH_OPEN` / `0x0001` | file open/close |
| `D_FH_LOCK` / `0x0002` | file lock/unlock |
| `D_FH_FLUSH` / `0x0004` | file flushes |
| `D_FN_NAMES` / `0x0008` | filename/name conversion |
| `D_FN_SEARCH` / `0x0010` | file search |
| `D_ACCESS` / `0x0020` | access rights |
| `D_TRUSTEES` / `0x0040` | trustees |
| `D_BIND_REQ` / `0x8000` | bindery request tracing; currently only appears in disabled code |
`xdprintf()` mode bits are independent from the level threshold:
| Mode bit | Meaning |
| --- | --- |
| `0x01` | no beginning-of-line prefix |
| `0x02` | no trailing newline; used for multi-part lines |
| `0x10` | append saved `errno` text |
## Observed level use
The active tree contains these `XDPRINTF` / direct `xdprintf` levels:
| Level | Count | Observed meaning |
| --- | ---: | --- |
| `0` | 17 | Always-on when a module has any debug enabled; packet hex fragments, unknown-NCP notes and very early diagnostics. |
| `1` | 182 | Default operational notes, warnings, login/bindery/volume/file events; matches the INI comment for “errors and notes”. |
| `2` | 219 | Common protocol/backend tracing; many `nwconn`, `nwdbm`, namespace and volume details. |
| `3` | 103 | Focused tracing, especially directory-handle mapping, semaphores and queue/share internals. |
| `4` | 18 | Moderate failure/handle/path diagnostics; often useful when debugging path resolution without full verbosity. |
| `5` | 144 | Verbose file, namespace, search and metadata path tracing. |
| `6` | 10 | Extra-deep traces; includes sensitive examples such as password-change strings and burst/direct-response details. |
| `7` | 4 | Rare deep namespace/search handle traces. |
| `8` | 2 | Rare burst/search traces. |
| `10` | 14 | Micro-level name matching, opendir and loop internals. |
| `20` | 1 | One NCPSERV loop trace. |
| `99` | 10 | Ad-hoc maximum/debug-only traces, mostly DOS alias/inode mapping plus some polling/raw-data/password traces. |
The main source distribution is:
| File | Count | Levels seen |
| --- | ---: | --- |
| `src/nwconn.c` | 196 | `0`, `1`, `2`, `3`, `5`, `6`, `8`, `99` |
| `src/nwdbm.c` | 74 | `0`, `1`, `2`, `3`, `5` |
| `src/namspace.c` | 73 | `1`, `2`, `3`, `4`, `5`, `7`, `8`, `10`, `99` |
| `src/connect.c` | 72 | `1`, `2`, `3`, `4`, `5`, `6`, `7`, `10`, `99` |
| `src/nwbind.c` | 58 | `1`, `2`, `3`, `5`, `6`, `10`, `99` |
| `src/nwserv.c` | 45 | `0`, `1`, `2`, `3`, `99` |
| `src/nwvolume.c` | 25 | `1`, `2`, `3`, `5`, `99` |
| `src/nwclient.c` | 23 | `1` |
| `src/ncpserv.c` | 18 | `1`, `2`, `5`, `6`, `10`, `20` |
| `src/nwroute.c` | 16 | `1`, `2`, `3`, `4`, `10` |
## Interpretation
The original scheme did not consistently enforce semantic severities from `0` to
`99`. It mostly follows a threshold convention, and the old developer did keep
the practical center of gravity in the low levels: the active `XDPRINTF`/direct
`xdprintf` calls are overwhelmingly in `1..5` (about nine tenths of observed call
sites). The higher values are real, but they are exceptions for deep tracing,
raw dumps, loop internals, alias/inode diagnostics and a few sensitive historical
messages, not a separate well-defined `6..99` severity ladder.
- `1` is the intended default production debug level for errors and notes.
- `2` and `3` are common troubleshooting detail.
- `4` and `5` are still normal debugging, but much more verbose.
- `6` and above are not suitable for routine logs. They can contain raw packet,
name-search, password, alias/inode or tight-loop details.
- `6..99` should be treated as one unsafe maintainer-detail class for new
facade work, not as a semantic severity ladder.
- `99` means “print only at maximum debug” and is not a stable severity.
There are outliers. Some level `0` messages are effectively “if any debug is on”
rather than true emergency severity. Some level `1` messages are ordinary
trace/noise, and some higher-level messages can expose sensitive fields. That
means a future log cleanup should not blindly translate legacy numbers to modern
severity names without reviewing each call site.
## Recommended MARS policy going forward
New code should use only the semantic levels `1` through `5` unless temporarily
auditing a specific legacy path. The purpose is to fix the old uneven
allocation where level `1` mixed errors, warnings and notes while `2..5` were mostly
various debug depths:
| New level | Facade name | Use for new code |
| --- | --- | --- |
| `1` | `error` | failed operation, protocol error, fatal setup or data loss risk |
| `2` | `warn` | suspicious but recoverable condition or degraded compatibility path |
| `3` | `info` | normal operational state change that an administrator may care about |
| `4` | `debug` | local diagnostic detail for a subsystem or endpoint |
| `5` | `trace` | packet/message/handoff path following across process or provider boundaries |
Do not add new raw level `0`, `6`, `7`, `8`, `10`, `20` or `99` messages.
For new code, unsafe deep diagnostics must use `nwlog_detail()`, which is a
maintainer-build-only facade call and has no INI switch. In normal builds
`nwlog_detail()` must compile to a no-op or return `0`; only builds compiled
with `MAINTAINER_BUILD` may emit it. In maintainer builds it is not gated by
the ordinary runtime `1..5` threshold: `level=error` plus `MAINTAINER_BUILD` is
allowed to emit detail if a detail call site is reached, even though developers
will normally set `level=trace` while debugging to see the surrounding path.
When old call sites are touched for namespace work, review the message content
instead of translating by the old number alone:
- failed operations become `nwlog_error()` / level `1`;
- recoverable oddities, fallbacks and compatibility degradation become
`nwlog_warn()` / level `2`;
- useful operational state changes become `nwlog_info()` / level `3`;
- ordinary local endpoint/subsystem diagnostics become `nwlog_debug()` / level `4`;
- safe packet/message/handoff path-following traces become `nwlog_trace()` / level `5`;
- legacy `6..99` sensitive/raw/deep traces become `nwlog_detail()` if still
useful;
- obsolete noisy traces should be removed instead of preserved at a lower
number;
- use a debug mask for noisy categories that developers may want independently
of the numeric threshold.
The long-term `nwlog`/zlog bridge should preserve this compatibility initially:
- INI entries `100` through `106` remain accepted as legacy thresholds;
- the optional mask token remains accepted for `MDEBUG` categories;
- zlog severities should be a presentation layer, not a silent semantic rewrite
of old `XDPRINTF` call sites;
- new libnwfs/namespace code should use a small wrapper that can later map
`1..5` to presentation severities without spreading `XDPRINTF` further;
- the wrapper should treat legacy numbers as verbosity thresholds first and only
then map them to backend severities/categories.
A practical first mapping for the future facade is:
| New facade call | Numeric level | Backend presentation hint |
| --- | ---: | --- |
| `nwlog_error()` | `1` | error |
| `nwlog_warn()` | `2` | warning |
| `nwlog_info()` | `3` | info |
| `nwlog_debug()` | `4` | local debug |
| `nwlog_trace()` | `5` | packet/message/handoff trace |
| `nwlog_detail()` | `6` internally | maintainer-only unsafe detail; emitted only in `MAINTAINER_BUILD`, independent of the normal `1..5` runtime threshold |
The future config parser should accept the cumulative numeric mask and the
semantic word as the same effective threshold value. The mask is written as
the concatenation of enabled normal levels, so `1234` means
`error+warn+info+debug`:
| Accepted config values | Effective threshold | Meaning |
| --- | ---: | --- |
| `0`, `off`, `none` | `0` | no normal log output |
| `1`, `error` | `1` | errors only |
| `12`, `warn`, `warning` | `2` | errors and warnings |
| `123`, `info` | `3` | errors, warnings and informational events |
| `1234`, `debug` | `4` | plus local debug diagnostics |
| `12345`, `trace` | `5` | plus packet/message/handoff path tracing |
A process configured as `trace` or `12345` therefore also emits `error`,
`warn`, `info` and `debug`. Historical single threshold values such as `2`,
`3`, `4` or `5` may be accepted only for legacy `100..106` compatibility, but
new examples and tools should write the cumulative masks. `detail` is not an accepted administrator config value. Legacy
entries `100..106` continue to provide per-process legacy thresholds during
migration, while the normalized form should use `[logging] level` plus
`[logging.process.<name>] level` overrides and the cumulative masks above.
Legacy thresholds remain accepted during migration, but old numbers must be
reviewed by message content before conversion. In particular, legacy
`XDPRINTF(1, ...)` may become error, warning or info, and legacy `XDPRINTF(2..5,
...)` may become debug or trace depending on whether they describe local
diagnostics or cross-boundary flow. The facade should provide semantic wrappers
for new code, keep the old threshold behaviour available during migration, and
review sensitive call sites before they are converted.
## Maintainer-detail rule for legacy high levels
The future facade should intentionally collapse legacy high debug values into one
explicitly unsafe API:
```c
nwlog_detail(...)
```
`nwlog_detail()` is the migration target for old `XDPRINTF`/`xdprintf` call sites
using levels `6` through `99` when the message is still useful. The individual
legacy values such as `6`, `7`, `8`, `10`, `20` and `99` were ad-hoc thresholds;
they should not survive as separate public facade levels.
The call is allowed to log sensitive or raw maintainer diagnostics, including
cleartext password-change paths, authentication material, decoded request bodies,
raw packet payloads, internal buffers, inode/alias maps and tight-loop state.
Because of that, it must not be controlled by an INI option. A production build
must not be able to enable it accidentally. The compile-time `MAINTAINER_BUILD`
choice is the only gate; once that build is running, `nwlog_detail()` is allowed
to emit independently from the normal runtime threshold:
```c
int nwlog_detail(const char *fmt, ...)
{
#ifdef MAINTAINER_BUILD
/* emit as maintainer-detail, conceptually legacy level 6;
* do not test the normal 1..5 runtime threshold here.
*/
#else
return 0;
#endif
}
```
Normal logging policy remains semantic `1..5`: error, warn, info, debug and
trace. `nwlog_detail()` is not a sixth administrator verbosity level and not a
normal threshold child of `trace`; it is a compile-time maintainer diagnostic
escape hatch for old high-level traces that should otherwise disappear from
normal builds. Running a maintainer binary with an ordinary level such as
`error` plus detail output is allowed, although it is usually less useful than
using `trace` so the detail line has context.