docs: document normalized NCP handoff replies

This commit is contained in:
Mario Fetka
2026-06-02 07:19:36 +00:00
parent 13622de301
commit 99df739c99

View File

@@ -304,6 +304,232 @@ Long-term, response sending should become centralized enough that endpoint code
only encodes payload bytes and a completion code. This reduces off-by-one reply
length bugs and makes the logs easier to normalize.
## Normalized inter-process handoff replies
The process handoff path should be normalized before adding more provider
processes. The current `nwconn` to `nwbind` forwarding path relies on magic
return values and implicit shared-buffer conventions. That is workable for the
historic two-process case, but it will not scale cleanly to future providers such
as `nwqueue`, `nwnds`, or a directory service.
The long-term rule should be:
```text
Every provider process returns exactly one internal handoff reply for every
internal handoff request it accepts.
```
That internal reply is not the same thing as a client-visible NCP reply. A
provider may explicitly say that no client reply should be sent, but it should
still send a formal internal result back to the caller. This avoids silent
success/failure paths and makes timeout/error handling deterministic.
Conceptual reply kinds:
```c
typedef enum {
NW_HR_REPLY, /* provider produced a client NCP reply payload */
NW_HR_NO_REPLY, /* provider handled it; nwconn must not send a client reply */
NW_HR_DEFERRED, /* accepted; final reply/event will be produced later */
NW_HR_FORWARD, /* provider requests forwarding to another provider */
NW_HR_ERROR /* internal provider or handoff failure */
} NwHandoffReplyKind;
```
The normal successful completion-only case is still a reply:
```text
kind = NW_HR_REPLY
completion = 0x00
reply_len = 0
```
The true "do not answer the client" case is explicit:
```text
kind = NW_HR_NO_REPLY
reply_len = 0
```
Do not encode this as text in a payload such as `"no reply"`. It should be a
machine-readable reply kind so that `nwconn` can make one central decision.
A conceptual internal handoff reply header could look like this:
```c
typedef struct {
uint16_t version;
uint16_t kind;
uint32_t request_id;
uint32_t connection_id;
uint32_t sequence;
uint32_t task_id;
uint8_t completion;
uint8_t connection_status;
uint32_t flags;
uint32_t reply_len;
} NwHandoffReply;
```
The matching request should carry the same correlation fields plus the NCP
selector path and payload length. The exact structure can follow existing
mars-nwe style, but the contract should be stable:
```text
nwconn -> provider:
request_id, connection_id, sequence, task, selector path, request payload
provider -> nwconn:
same request_id, reply kind, completion/status, reply payload length, payload
```
This gives future provider processes a uniform contract:
```text
nwconn -> nwbind
nwconn -> nwqueue
nwconn -> nwnds
nwconn -> nwdirectory
```
all use the same shape. Provider-specific behavior belongs in the payload and
provider API, not in special process-specific return conventions.
### Reply ownership
The preferred long-term ownership rule is:
```text
Provider builds the logical reply payload and completion/status.
nwconn owns the final client NCP response envelope and sends it to the client.
```
This means providers should not directly send client packets. They return an
internal result that says what should happen. `nwconn` then applies the original
sequence, connection number, task, transport, and NCP envelope rules in one
place.
This avoids several classes of bugs:
- duplicate replies;
- wrong sequence or task in replies;
- inconsistent completion-only reply lengths;
- provider-specific send/error paths;
- unclear post-processing after `nwbind` replies;
- future provider processes needing to know transport details.
Legacy paths may continue to have provider-built replies during migration, but
that should be marked as a legacy compatibility mode, not the design target.
### Post-processing without `return(-2)`
The current `return(-2)` convention means roughly "forward to bindery, save the
original request, then let `nwconn` do more work after the provider reply". In a
normalized handoff this should become explicit state, not a magic result value.
Possible flags:
```c
#define NW_HF_SAVE_ORIGINAL_REQUEST 0x00000001
#define NW_HF_POSTPROCESS_REPLY 0x00000002
#define NW_HF_LEGACY_PROVIDER_REPLY 0x00000004
```
The request context can also name the post-processing hook, or store a small
post-processing enum, so the provider does not need to know why the caller will
continue after the reply. This keeps the handoff transport generic.
### Error mapping and dead-provider behavior
The handoff layer should define what happens when a provider fails before a
protocol handler can return a normal NetWare completion code. Otherwise every
new provider will invent a slightly different failure path.
The normalized rules should cover:
- provider process not running;
- provider closes the IPC channel;
- malformed internal reply;
- mismatched `request_id`;
- provider timeout;
- reply payload longer than negotiated capacity;
- provider returned `NW_HR_ERROR` with an internal error code;
- provider returned `NW_HR_FORWARD` to an unsupported target.
For client-visible requests, `nwconn` should map those failures through one
central function to either an NCP completion code, a connection-level failure, or
an intentional disconnect. Endpoint handlers should not open-code provider IPC
failures as arbitrary completion bytes.
### Correlation and replay safety
Every internal handoff request should carry a monotonically useful correlation
identifier, even if the first implementation is only local to one `nwconn`
process. The tuple should include enough information to catch stale or crossed
replies:
```text
request_id + connection_id + sequence + task_id
```
This matters once there are multiple provider processes or deferred replies. It
also makes logging and debugging much easier, because an endpoint audit can show
the complete path of one request through several processes.
### Payload ownership and size limits
The handoff protocol should define who owns buffers and what size limits apply.
At minimum, document these rules before functional refactoring:
- request payload is immutable after handoff unless a mutating legacy wrapper is
explicitly documented;
- provider reply payload length must be checked against caller capacity;
- variable-length replies must report exact encoded length;
- zero-length payload is valid for completion-only replies;
- `NO_REPLY` is a reply kind, not a zero-length success reply;
- byte order inside payloads remains the NCP endpoint's documented wire order,
not native host order.
This is especially important for nested selector families and old/new endpoint
variants, where a provider may need to choose different reply structures based on
a level, verb, or information type.
### Logging and audit benefit
A normalized handoff reply gives logging one consistent shape:
```text
REQ id=42 conn=7 seq=19 ncp=0x2222/23/113 provider=queue len=...
RPLY id=42 conn=7 seq=19 kind=REPLY completion=0x00 len=...
RPLY id=43 conn=7 seq=20 kind=NO_REPLY reason=...
ERR id=44 conn=7 seq=21 provider=bindery error=timeout mapped=0xfb
```
This would also make the endpoint documentation pass easier: each audited
endpoint can identify the provider, the request layout, the logical reply kind,
the reply payload layout, and any caller-side post-processing.
### Migration order for handoff normalization
The safe order is:
1. document current `nwconn`/`nwbind` handoff behavior;
2. add names for current magic values without changing behavior;
3. add a small wrapper such as `ncp_handoff_to_provider()` that still calls the
old path internally;
4. introduce a formal internal reply object in the wrapper;
5. make the wrapper always return a formal reply, including `NO_REPLY`;
6. centralize final client reply sending in `nwconn` for converted paths;
7. only then attach future providers such as `nwqueue` or `nwnds`.
The rule is: do not create a new provider process until the caller can receive a
formal reply from it and can handle provider failure centrally.
## Provider boundaries
A clean design would treat the existing modules as providers instead of hidden