Files
mars-nwe/doc/LOG_LEVEL_AUDIT.md

12 KiB

MARS log-level audit

This audit records the legacy debug-level scheme before the namespace work starts. It is based on the current source tree and the developer comments in opt/nw.ini.cmake, opt/nw.ini.hook.cmake, include/tools.h, include/debmask.h and src/tools.c.

Current mechanics

Legacy logging is driven by nw_debug and XDPRINTF((level, mode, ...)). xdprintf() prints a message when nw_debug >= level; therefore the configured number is a threshold, not a named severity. The shipped configuration comments for entries 100 through 106 say:

  • 0 means no debug messages;
  • 1 means errors and notes are reported;
  • 99 means maximum debug levels.

The module-specific INI entries are:

INI entry Module constant Process / area Default in opt/nw.ini.cmake and hook file
100 local IPX helper IPX kernel setup/debug 0
101 NWSERV main server / watchdog / SAP startup 1
102 NCPSERV NCP service helper 1
103 NWCONN connection and file/namespace NCP path 1
104 NWCLIENT startup client helper 0, comment says keep it zero
105 NWBIND bindery/queue/server-management helper 1
106 NWROUTED routing helper 1

get_debug_level() also accepts an optional second token for debug_mask. That mask gates MDEBUG(mask, ...) blocks from include/debmask.h:

Mask Meaning in source
D_FH_OPEN / 0x0001 file open/close
D_FH_LOCK / 0x0002 file lock/unlock
D_FH_FLUSH / 0x0004 file flushes
D_FN_NAMES / 0x0008 filename/name conversion
D_FN_SEARCH / 0x0010 file search
D_ACCESS / 0x0020 access rights
D_TRUSTEES / 0x0040 trustees
D_BIND_REQ / 0x8000 bindery request tracing; currently only appears in disabled code

xdprintf() mode bits are independent from the level threshold:

Mode bit Meaning
0x01 no beginning-of-line prefix
0x02 no trailing newline; used for multi-part lines
0x10 append saved errno text

Observed level use

The active tree contains these XDPRINTF / direct xdprintf levels:

Level Count Observed meaning
0 17 Always-on when a module has any debug enabled; packet hex fragments, unknown-NCP notes and very early diagnostics.
1 182 Default operational notes, warnings, login/bindery/volume/file events; matches the INI comment for “errors and notes”.
2 219 Common protocol/backend tracing; many nwconn, nwdbm, namespace and volume details.
3 103 Focused tracing, especially directory-handle mapping, semaphores and queue/share internals.
4 18 Moderate failure/handle/path diagnostics; often useful when debugging path resolution without full verbosity.
5 144 Verbose file, namespace, search and metadata path tracing.
6 10 Extra-deep traces; includes sensitive examples such as password-change strings and burst/direct-response details.
7 4 Rare deep namespace/search handle traces.
8 2 Rare burst/search traces.
10 14 Micro-level name matching, opendir and loop internals.
20 1 One NCPSERV loop trace.
99 10 Ad-hoc maximum/debug-only traces, mostly DOS alias/inode mapping plus some polling/raw-data/password traces.

The main source distribution is:

File Count Levels seen
src/nwconn.c 196 0, 1, 2, 3, 5, 6, 8, 99
src/nwdbm.c 74 0, 1, 2, 3, 5
src/namspace.c 73 1, 2, 3, 4, 5, 7, 8, 10, 99
src/connect.c 72 1, 2, 3, 4, 5, 6, 7, 10, 99
src/nwbind.c 58 1, 2, 3, 5, 6, 10, 99
src/nwserv.c 45 0, 1, 2, 3, 99
src/nwvolume.c 25 1, 2, 3, 5, 99
src/nwclient.c 23 1
src/ncpserv.c 18 1, 2, 5, 6, 10, 20
src/nwroute.c 16 1, 2, 3, 4, 10

Interpretation

The original scheme did not consistently enforce semantic severities from 0 to 99. It mostly follows a threshold convention, and the old developer did keep the practical center of gravity in the low levels: the active XDPRINTF/direct xdprintf calls are overwhelmingly in 1..5 (about nine tenths of observed call sites). The higher values are real, but they are exceptions for deep tracing, raw dumps, loop internals, alias/inode diagnostics and a few sensitive historical messages, not a separate well-defined 6..99 severity ladder.

  • 1 is the intended default production debug level for errors and notes.
  • 2 and 3 are common troubleshooting detail.
  • 4 and 5 are still normal debugging, but much more verbose.
  • 6 and above are not suitable for routine logs. They can contain raw packet, name-search, password, alias/inode or tight-loop details.
  • 6..99 should be treated as one unsafe maintainer-detail class for new facade work, not as a semantic severity ladder.
  • 99 means “print only at maximum debug” and is not a stable severity.

There are outliers. Some level 0 messages are effectively “if any debug is on” rather than true emergency severity. Some level 1 messages are ordinary trace/noise, and some higher-level messages can expose sensitive fields. That means a future log cleanup should not blindly translate legacy numbers to modern severity names without reviewing each call site.

New code should use only the semantic levels 1 through 5 unless temporarily auditing a specific legacy path. The purpose is to fix the old uneven allocation where level 1 mixed errors, warnings and notes while 2..5 were mostly various debug depths:

New level Facade name Use for new code
1 error failed operation, protocol error, fatal setup or data loss risk
2 warn suspicious but recoverable condition or degraded compatibility path
3 info normal operational state change that an administrator may care about
4 debug local diagnostic detail for a subsystem or endpoint
5 trace packet/message/handoff path following across process or provider boundaries

Do not add new raw level 0, 6, 7, 8, 10, 20 or 99 messages. For new code, unsafe deep diagnostics must use nwlog_detail(), which is a maintainer-build-only facade call and has no INI switch. In normal builds nwlog_detail() must compile to a no-op or return 0; only builds compiled with MAINTAINER_BUILD may emit it. In maintainer builds it is not gated by the ordinary runtime 1..5 threshold: level=error plus MAINTAINER_BUILD is allowed to emit detail if a detail call site is reached, even though developers will normally set level=trace while debugging to see the surrounding path.

When old call sites are touched for namespace work, review the message content instead of translating by the old number alone:

  • failed operations become nwlog_error() / level 1;
  • recoverable oddities, fallbacks and compatibility degradation become nwlog_warn() / level 2;
  • useful operational state changes become nwlog_info() / level 3;
  • ordinary local endpoint/subsystem diagnostics become nwlog_debug() / level 4;
  • safe packet/message/handoff path-following traces become nwlog_trace() / level 5;
  • legacy 6..99 sensitive/raw/deep traces become nwlog_detail() if still useful;
  • obsolete noisy traces should be removed instead of preserved at a lower number;
  • use a debug mask for noisy categories that developers may want independently of the numeric threshold.

The long-term nwlog facade should preserve this compatibility initially:

  • INI entries 100 through 106 remain accepted as legacy thresholds;
  • the optional mask token remains accepted for MDEBUG categories;
  • backend presentation severities are a formatting/routing layer, not a silent semantic rewrite of old XDPRINTF call sites;
  • new libnwfs/namespace code should use a small wrapper that can later map 1..5 to presentation severities without spreading XDPRINTF further;
  • the wrapper should treat legacy numbers as verbosity thresholds first and only then map them to backend severities/categories.

zlog is not the advanced-backend target anymore: it is Apache-2.0 and must not be vendored into this GPL-2.0-only core. Future routing should use small project-owned backends behind nwlog such as simple stderr/file, optional journald, syslog, GELF and JSON-lines output.

A practical first mapping for the future facade is:

New facade call Numeric level Backend presentation hint
nwlog_error() 1 error
nwlog_warn() 2 warning
nwlog_info() 3 info
nwlog_debug() 4 local debug
nwlog_trace() 5 packet/message/handoff trace
nwlog_detail() 6 internally maintainer-only unsafe detail; emitted only in MAINTAINER_BUILD, independent of the normal 1..5 runtime threshold

The future config parser should accept the cumulative numeric mask and the semantic word as the same effective threshold value. The mask is written as the concatenation of enabled normal levels, so 1234 means error+warn+info+debug:

Accepted config values Effective threshold Meaning
0, off, none 0 no normal log output
1, error 1 errors only
12, warn, warning 2 errors and warnings
123, info 3 errors, warnings and informational events
1234, debug 4 plus local debug diagnostics
12345, trace 5 plus packet/message/handoff path tracing

A process configured as trace or 12345 therefore also emits error, warn, info and debug. Historical single threshold values such as 2, 3, 4 or 5 may be accepted only for legacy 100..106 compatibility, but new examples and tools should write the cumulative masks. detail is not an accepted administrator config value. Legacy entries 100..106 continue to provide per-process legacy thresholds during migration, while the normalized form should use [logging] level plus [logging.process.<name>] level overrides and the cumulative masks above.

Legacy thresholds remain accepted during migration, but old numbers must be reviewed by message content before conversion. In particular, legacy XDPRINTF(1, ...) may become error, warning or info, and legacy XDPRINTF(2..5, ...) may become debug or trace depending on whether they describe local diagnostics or cross-boundary flow. The facade should provide semantic wrappers for new code, keep the old threshold behaviour available during migration, and review sensitive call sites before they are converted.

Maintainer-detail rule for legacy high levels

The future facade should intentionally collapse legacy high debug values into one explicitly unsafe API:

nwlog_detail(...)

nwlog_detail() is the migration target for old XDPRINTF/xdprintf call sites using levels 6 through 99 when the message is still useful. The individual legacy values such as 6, 7, 8, 10, 20 and 99 were ad-hoc thresholds; they should not survive as separate public facade levels.

The call is allowed to log sensitive or raw maintainer diagnostics, including cleartext password-change paths, authentication material, decoded request bodies, raw packet payloads, internal buffers, inode/alias maps and tight-loop state. Because of that, it must not be controlled by an INI option. A production build must not be able to enable it accidentally. The compile-time MAINTAINER_BUILD choice is the only gate; once that build is running, nwlog_detail() is allowed to emit independently from the normal runtime threshold:

int nwlog_detail(const char *fmt, ...)
{
#ifdef MAINTAINER_BUILD
        /* emit as maintainer-detail, conceptually legacy level 6;
         * do not test the normal 1..5 runtime threshold here.
         */
#else
        return 0;
#endif
}

Normal logging policy remains semantic 1..5: error, warn, info, debug and trace. nwlog_detail() is not a sixth administrator verbosity level and not a normal threshold child of trace; it is a compile-time maintainer diagnostic escape hatch for old high-level traces that should otherwise disappear from normal builds. Running a maintainer binary with an ordinary level such as error plus detail output is allowed, although it is usually less useful than using trace so the detail line has context.