diff --git a/REDESIGN.md b/REDESIGN.md new file mode 100644 index 0000000..3dbefcd --- /dev/null +++ b/REDESIGN.md @@ -0,0 +1,386 @@ +# mars-nwe NCP dispatch redesign notes + +This file collects design notes for a possible cleanup of the internal NCP +handoff path. It is intentionally separate from `TODO.md`: the TODO file should +track concrete bugs and endpoint audit follow-ups, while this file describes a +larger architecture direction that can be implemented gradually. + +The goal is not to rewrite MARS-NWE at once. The goal is to make the current +handoff behavior explicit, reduce ambiguity around magic return values, and make +future endpoint work easier to audit against the Novell/Micro Focus SDK, WebSDK, +and NDK Core Protocols PDF. + +## Current problem + +The current NCP path grew around several cooperating processes and handlers: + +- `nwconn.c` owns the connection/session side and receives most packets first. +- `nwbind.c` handles bindery, queue, some server-management, and some final + reply construction. +- Other modules such as semaphore, message, namespace, AFP, file, salvage, and + queue code implement individual protocol families or backend actions. +- Some calls are handled completely in `nwconn.c`. +- Some calls are forwarded to `nwbind.c` by returning `-1` from the `nwconn.c` + dispatcher. +- Some calls are forwarded with saved request state by returning `-2`, so that + `nwconn.c` can do post-processing after `nwbind.c` has replied. +- Some forwarded paths mutate request payloads before handoff. +- Some code paths build responses locally, while other paths rely on the target + process to build the final completion code and payload. + +This works, but it is hard to reason about while auditing endpoint layouts. The +same looking value can mean different things depending on which file it appears +in. For example, `return(-1)` in the relevant `nwconn.c` dispatcher path means +"forward this request to `nwbind`". A disabled `return(-1)` inside a `#if 0` +block in `nwbind.c` does not have that forwarding meaning and should not be +copied into active code. + +The visible symptoms are: + +- endpoint documentation must follow a handoff across files before it can say the + request or reply layout is known; +- missing endpoints are difficult to distinguish from forwarded endpoints; +- request parsing, backend behavior, reply encoding, and process routing are + often mixed in one switch block; +- byte order differences are easy to miss because parsing and reply writing are + open-coded in different places; +- disabled future stubs can look like active dispatch behavior; +- `TODO.md` can become a dumping ground for architectural observations that are + not immediate endpoint bugs. + +## Desired shape + +A cleaner long-term structure would have one small internal NCP dispatch layer: + +```text +wire packet + -> NCP envelope parser + -> NcpContext + -> endpoint lookup + -> endpoint handler / provider + -> reply encoder + -> central reply sender +``` + +This does not need to be a general-purpose message bus. A full message bus would +probably be too large and too abstract for this code base. A typed internal NCP +context plus explicit dispatch results would be enough. + +The important separation is: + +1. decode the packet envelope; +2. identify the endpoint; +3. decode the endpoint request body; +4. execute the backend operation; +5. encode the endpoint reply body; +6. send the response from one well-defined place. + +## Proposed NCP context + +Introduce, in a later functional cleanup, a small context object that represents +one NCP request while it moves through the server. The exact field names should +fit the existing code style, but the conceptual shape would be: + +```c +typedef struct { + int connection; + + uint16_t request_type; /* 0x2222, 0x3333, 0x5555, ... */ + uint8_t function; /* top-level NCP function */ + int has_subfunction; + uint8_t subfunction; /* grouped calls such as 23/x, 32/x, 87/x */ + + const uint8_t *request; + int request_len; + + uint8_t *reply; + int reply_cap; + int reply_len; + + uint8_t completion; + uint8_t connection_status; + + uint32_t flags; +} NcpContext; +``` + +The context should not replace all old globals in one patch. It can start as a +thin wrapper around the existing request and response buffers, then gradually +become the preferred handler interface. + +The useful property is that endpoint documentation can point to a stable model: + +- `function` and `subfunction` identify the endpoint; +- `request` and `request_len` are the bytes after the already-decoded envelope; +- `reply` and `reply_len` are the bytes before the common NCP response envelope; +- `completion` is set once by the handler or by central error handling. + +## Replace magic return values with named results + +The current `0`, `-1`, and `-2` convention should be made explicit before any +larger refactor. The first step can be documentation-only or macro-only: + +```c +#define NCP_LOCAL_DONE 0 +#define NCP_FORWARD_NWBIND -1 +#define NCP_FORWARD_NWBIND_POST -2 +``` + +A later cleanup can replace those with an enum: + +```c +typedef enum { + NCP_DISPATCH_DONE, + NCP_DISPATCH_FORWARD_BIND, + NCP_DISPATCH_FORWARD_BIND_POST, + NCP_DISPATCH_NOT_IMPLEMENTED, + NCP_DISPATCH_BAD_REQUEST, + NCP_DISPATCH_INTERNAL_ERROR +} NcpDispatchResult; +``` + +The important rule is that the meaning must be scoped. A named result returned +from a `nwconn.c` dispatcher may request process handoff. A return statement in +`nwbind.c` should not silently inherit that meaning unless the function is +explicitly part of the same dispatch interface. + +## Endpoint table as audit index first + +Before replacing switch statements, add an endpoint inventory table as a +non-invasive audit aid. It can be compiled only for debug builds or kept as a +source-level documentation table. + +Conceptual form: + +```c +typedef struct { + uint16_t request_type; + uint8_t function; + int has_subfunction; + uint8_t subfunction; + const char *name; + const char *provider; + uint32_t flags; +} NcpEndpointDoc; +``` + +Example entries: + +```c +{ 0x2222, 23, 1, 109, "Change Queue Job Entry old", "nwbind/queue", NCPDOC_FORWARDED }, +{ 0x2222, 32, 1, 0, "Open Semaphore old", "sema", NCPDOC_LOCAL }, +{ 0x2222, 33, 0, 0, "Negotiate Buffer Size", "nwconn", NCPDOC_LOCAL }, +``` + +This table would help with the ongoing endpoint audit: + +- SDK/PDF/WebSDK listed and implemented; +- SDK/PDF/WebSDK listed and forwarded; +- SDK/PDF/WebSDK listed but disabled as a future stub; +- SDK/PDF/WebSDK listed but absent from the current compatibility target; +- later NetWare 4.x/OES/MOAB endpoint, not part of the default NetWare 3.x + compatibility target. + +The first version should not drive runtime dispatch. It should only make review +and missing-endpoint checks less error-prone. + +## Handler structure + +For newly touched endpoint families, prefer the following logical split even if +it remains in one C function at first: + +```text +request decode + -> validation + -> backend operation + -> reply encode +``` + +For complex endpoints this could become explicit helper functions: + +```c +static int decode_foo(NcpContext *ctx, FooRequest *out); +static int exec_foo(NcpContext *ctx, const FooRequest *req, FooReply *reply); +static void encode_foo(NcpContext *ctx, const FooReply *reply); +``` + +This is especially useful for endpoint families where the audit has already +found old/new layout differences: + +- 16-bit old queue job numbers versus newer 32-bit job numbers; +- big-endian versus little-endian SDK notation; +- old short replies versus newer long replies; +- connection-side prehandling that inserts or rewrites fields; +- bindery or queue paths that build final replies in a different process. + +Small endpoints do not need three separate helper functions if that would make +the code noisier. The rule is that request bytes and reply bytes should be easy +to identify and compare with the SDK documents. + +## Make handoff explicit + +Forwarded calls should say exactly what is handed off. A good comment should +answer: + +- which bytes are forwarded; +- whether the subfunction byte is preserved or stripped; +- whether `nwconn.c` mutates the request before forwarding; +- whether `nwbind.c` or another provider builds the final reply; +- whether `nwconn.c` expects post-processing after the provider reply. + +Examples of handoff cases that need this clarity: + +- Queue calls where `nwconn.c` expands paths or inserts job file handles before + `nwbind.c` sees the request. +- Quota/bindery prehandling where the destination handler receives an already + transformed request. +- Semaphore and message groups that are grouped in the SDK but routed through + local helper modules. +- Direct lifecycle calls such as End Of Job and Logout where local cleanup and + final success reply are split across files. + +The preferred future style is not "`nwbind` must do the rest" but something like: + +```text +Forward to nwbind with the original subfunction byte and payload unchanged. +No nwconn post-processing is expected; nwbind builds the completion-only reply. +``` + +or: + +```text +Forward to nwbind after saving the original request. nwbind validates bindery +state and returns the bindery result; nwconn then performs the file-handle +post-processing in handle_after_bind(). +``` + +## Response building rule + +Every endpoint audit should identify the reply builder, not only the request +parser. A handler is not fully documented until the response path is known. + +For each endpoint family, record: + +- completion-only reply; +- fixed-size payload reply; +- variable-length payload reply; +- provider-built reply; +- `nwconn.c` post-processed reply; +- intentionally unsupported reply status. + +Long-term, response sending should become centralized enough that endpoint code +only encodes payload bytes and a completion code. This reduces off-by-one reply +length bugs and makes the logs easier to normalize. + +## Provider boundaries + +A clean design would treat the existing modules as providers instead of hidden +fallback paths: + +```text +nwconn connection/session, packet IO, top-level envelope +ncpdispatch endpoint lookup, handoff policy, common errors +nwbind bindery database and bindery-backed services +queue queue metadata and print/backend adapter +sema semaphore state +message station/message/broadcast state +namespace path, directory handle, name-space operations +file file handle and read/write/open/close operations +salvage deleted-file scan/recover/purge backend +AFP AFP metadata and AFP namespace adapter +``` + +This is a design target, not a demand to move files immediately. The important +part is that future code should avoid making `nwbind` a catch-all sink for +unrelated NCPs just because it already has an IPC path. + +## Logging connection + +The dispatch redesign also supports the desired log cleanup. If every request +has a context, logs can consistently include: + +```text +INFO NCP 23/109 DISPATCH type=0x2222 fn=0x17 sub=0x6d provider=nwbind/queue +INFO NCP 32/0 REPLY type=0x2222 fn=0x20 sub=0x00 result=0x00 len=4 +WARN NCP 23/130 LAYOUT-MISMATCH sdk="32-bit JobNumber" code="16-bit parser" +``` + +The logging cleanup should still reuse existing mars-nwe logging functions. Do +not add a second logging subsystem just to support the dispatch cleanup. + +## Migration plan + +### Phase 1: Name the existing conventions + +Low risk. No behavior change. + +- Add named constants or comments for the current `0`, `-1`, and `-2` + dispatcher results. +- Keep existing control flow unchanged. +- Update comments so `return(-1)` is never described ambiguously outside the + exact dispatcher where it is meaningful. + +### Phase 2: Add an endpoint audit table + +Low risk. Mostly documentation/debug. + +- Add a table of known endpoints by request type, function, and subfunction. +- Mark provider, generation bucket, and implementation state. +- Use it to compare SDK/PDF/WebSDK coverage against actual handlers. +- Do not switch runtime dispatch to the table yet. + +### Phase 3: Introduce a thin `NcpContext` + +Moderate risk if kept small. + +- Wrap existing request and reply buffers without changing ownership. +- Use the context only in newly audited or newly implemented handlers. +- Keep old handlers callable until they are touched for another reason. + +### Phase 4: Convert small endpoint families first + +Moderate risk, easy to test. + +Good candidates: + +- `0x2222/32` old Semaphore calls; +- direct calls such as End Of Job, Logout, and Negotiate Buffer Size; +- small message/station groups once their handoff has been audited. + +Avoid converting queue and bindery first because they have more process coupling +and more old/new layout variants. + +### Phase 5: Move runtime dispatch to tables gradually + +Higher risk. Do this only after enough endpoint families have stable audit +coverage and tests. + +- Keep switch wrappers during the transition. +- Convert one family at a time. +- Preserve exact completion codes and reply lengths. +- Add targeted smoke tests for any family whose dispatch path changes. + +## Non-goals + +This redesign should not: + +- change protocol behavior merely to match a cleaner abstraction; +- remove NetWare 1.x/2.x/3.x compatibility paths; +- enable NetWare 4.x/OES/MOAB-only endpoints by default; +- replace existing mars-nwe path, bindery, queue, AFP, trustee, or salvage + backends with parallel databases; +- add a large external message bus dependency; +- rewrite all handlers in one patch; +- turn documentation-only endpoint audit patches into functional refactors. + +## Practical rule for future patches + +For the ongoing endpoint documentation pass, keep doing the conservative thing: + +1. enumerate SDK/PDF/WebSDK/include endpoints for the family; +2. compare them with actual `case` labels and forwarded destination handlers; +3. document missing, disabled, implemented, and later-generation slots; +4. document request parser/handoff and response builder; +5. record real layout differences, but do not change behavior in the same patch. + +Functional cleanup should come later in small patches with tests.