docs: document normalized NCP handoff replies
This commit is contained in:
226
REDESIGN.md
226
REDESIGN.md
@@ -304,6 +304,232 @@ Long-term, response sending should become centralized enough that endpoint code
|
||||
only encodes payload bytes and a completion code. This reduces off-by-one reply
|
||||
length bugs and makes the logs easier to normalize.
|
||||
|
||||
|
||||
|
||||
## Normalized inter-process handoff replies
|
||||
|
||||
The process handoff path should be normalized before adding more provider
|
||||
processes. The current `nwconn` to `nwbind` forwarding path relies on magic
|
||||
return values and implicit shared-buffer conventions. That is workable for the
|
||||
historic two-process case, but it will not scale cleanly to future providers such
|
||||
as `nwqueue`, `nwnds`, or a directory service.
|
||||
|
||||
The long-term rule should be:
|
||||
|
||||
```text
|
||||
Every provider process returns exactly one internal handoff reply for every
|
||||
internal handoff request it accepts.
|
||||
```
|
||||
|
||||
That internal reply is not the same thing as a client-visible NCP reply. A
|
||||
provider may explicitly say that no client reply should be sent, but it should
|
||||
still send a formal internal result back to the caller. This avoids silent
|
||||
success/failure paths and makes timeout/error handling deterministic.
|
||||
|
||||
Conceptual reply kinds:
|
||||
|
||||
```c
|
||||
typedef enum {
|
||||
NW_HR_REPLY, /* provider produced a client NCP reply payload */
|
||||
NW_HR_NO_REPLY, /* provider handled it; nwconn must not send a client reply */
|
||||
NW_HR_DEFERRED, /* accepted; final reply/event will be produced later */
|
||||
NW_HR_FORWARD, /* provider requests forwarding to another provider */
|
||||
NW_HR_ERROR /* internal provider or handoff failure */
|
||||
} NwHandoffReplyKind;
|
||||
```
|
||||
|
||||
The normal successful completion-only case is still a reply:
|
||||
|
||||
```text
|
||||
kind = NW_HR_REPLY
|
||||
completion = 0x00
|
||||
reply_len = 0
|
||||
```
|
||||
|
||||
The true "do not answer the client" case is explicit:
|
||||
|
||||
```text
|
||||
kind = NW_HR_NO_REPLY
|
||||
reply_len = 0
|
||||
```
|
||||
|
||||
Do not encode this as text in a payload such as `"no reply"`. It should be a
|
||||
machine-readable reply kind so that `nwconn` can make one central decision.
|
||||
|
||||
A conceptual internal handoff reply header could look like this:
|
||||
|
||||
```c
|
||||
typedef struct {
|
||||
uint16_t version;
|
||||
uint16_t kind;
|
||||
|
||||
uint32_t request_id;
|
||||
uint32_t connection_id;
|
||||
uint32_t sequence;
|
||||
uint32_t task_id;
|
||||
|
||||
uint8_t completion;
|
||||
uint8_t connection_status;
|
||||
|
||||
uint32_t flags;
|
||||
uint32_t reply_len;
|
||||
} NwHandoffReply;
|
||||
```
|
||||
|
||||
The matching request should carry the same correlation fields plus the NCP
|
||||
selector path and payload length. The exact structure can follow existing
|
||||
mars-nwe style, but the contract should be stable:
|
||||
|
||||
```text
|
||||
nwconn -> provider:
|
||||
request_id, connection_id, sequence, task, selector path, request payload
|
||||
|
||||
provider -> nwconn:
|
||||
same request_id, reply kind, completion/status, reply payload length, payload
|
||||
```
|
||||
|
||||
This gives future provider processes a uniform contract:
|
||||
|
||||
```text
|
||||
nwconn -> nwbind
|
||||
nwconn -> nwqueue
|
||||
nwconn -> nwnds
|
||||
nwconn -> nwdirectory
|
||||
```
|
||||
|
||||
all use the same shape. Provider-specific behavior belongs in the payload and
|
||||
provider API, not in special process-specific return conventions.
|
||||
|
||||
### Reply ownership
|
||||
|
||||
The preferred long-term ownership rule is:
|
||||
|
||||
```text
|
||||
Provider builds the logical reply payload and completion/status.
|
||||
nwconn owns the final client NCP response envelope and sends it to the client.
|
||||
```
|
||||
|
||||
This means providers should not directly send client packets. They return an
|
||||
internal result that says what should happen. `nwconn` then applies the original
|
||||
sequence, connection number, task, transport, and NCP envelope rules in one
|
||||
place.
|
||||
|
||||
This avoids several classes of bugs:
|
||||
|
||||
- duplicate replies;
|
||||
- wrong sequence or task in replies;
|
||||
- inconsistent completion-only reply lengths;
|
||||
- provider-specific send/error paths;
|
||||
- unclear post-processing after `nwbind` replies;
|
||||
- future provider processes needing to know transport details.
|
||||
|
||||
Legacy paths may continue to have provider-built replies during migration, but
|
||||
that should be marked as a legacy compatibility mode, not the design target.
|
||||
|
||||
### Post-processing without `return(-2)`
|
||||
|
||||
The current `return(-2)` convention means roughly "forward to bindery, save the
|
||||
original request, then let `nwconn` do more work after the provider reply". In a
|
||||
normalized handoff this should become explicit state, not a magic result value.
|
||||
|
||||
Possible flags:
|
||||
|
||||
```c
|
||||
#define NW_HF_SAVE_ORIGINAL_REQUEST 0x00000001
|
||||
#define NW_HF_POSTPROCESS_REPLY 0x00000002
|
||||
#define NW_HF_LEGACY_PROVIDER_REPLY 0x00000004
|
||||
```
|
||||
|
||||
The request context can also name the post-processing hook, or store a small
|
||||
post-processing enum, so the provider does not need to know why the caller will
|
||||
continue after the reply. This keeps the handoff transport generic.
|
||||
|
||||
### Error mapping and dead-provider behavior
|
||||
|
||||
The handoff layer should define what happens when a provider fails before a
|
||||
protocol handler can return a normal NetWare completion code. Otherwise every
|
||||
new provider will invent a slightly different failure path.
|
||||
|
||||
The normalized rules should cover:
|
||||
|
||||
- provider process not running;
|
||||
- provider closes the IPC channel;
|
||||
- malformed internal reply;
|
||||
- mismatched `request_id`;
|
||||
- provider timeout;
|
||||
- reply payload longer than negotiated capacity;
|
||||
- provider returned `NW_HR_ERROR` with an internal error code;
|
||||
- provider returned `NW_HR_FORWARD` to an unsupported target.
|
||||
|
||||
For client-visible requests, `nwconn` should map those failures through one
|
||||
central function to either an NCP completion code, a connection-level failure, or
|
||||
an intentional disconnect. Endpoint handlers should not open-code provider IPC
|
||||
failures as arbitrary completion bytes.
|
||||
|
||||
### Correlation and replay safety
|
||||
|
||||
Every internal handoff request should carry a monotonically useful correlation
|
||||
identifier, even if the first implementation is only local to one `nwconn`
|
||||
process. The tuple should include enough information to catch stale or crossed
|
||||
replies:
|
||||
|
||||
```text
|
||||
request_id + connection_id + sequence + task_id
|
||||
```
|
||||
|
||||
This matters once there are multiple provider processes or deferred replies. It
|
||||
also makes logging and debugging much easier, because an endpoint audit can show
|
||||
the complete path of one request through several processes.
|
||||
|
||||
### Payload ownership and size limits
|
||||
|
||||
The handoff protocol should define who owns buffers and what size limits apply.
|
||||
At minimum, document these rules before functional refactoring:
|
||||
|
||||
- request payload is immutable after handoff unless a mutating legacy wrapper is
|
||||
explicitly documented;
|
||||
- provider reply payload length must be checked against caller capacity;
|
||||
- variable-length replies must report exact encoded length;
|
||||
- zero-length payload is valid for completion-only replies;
|
||||
- `NO_REPLY` is a reply kind, not a zero-length success reply;
|
||||
- byte order inside payloads remains the NCP endpoint's documented wire order,
|
||||
not native host order.
|
||||
|
||||
This is especially important for nested selector families and old/new endpoint
|
||||
variants, where a provider may need to choose different reply structures based on
|
||||
a level, verb, or information type.
|
||||
|
||||
### Logging and audit benefit
|
||||
|
||||
A normalized handoff reply gives logging one consistent shape:
|
||||
|
||||
```text
|
||||
REQ id=42 conn=7 seq=19 ncp=0x2222/23/113 provider=queue len=...
|
||||
RPLY id=42 conn=7 seq=19 kind=REPLY completion=0x00 len=...
|
||||
RPLY id=43 conn=7 seq=20 kind=NO_REPLY reason=...
|
||||
ERR id=44 conn=7 seq=21 provider=bindery error=timeout mapped=0xfb
|
||||
```
|
||||
|
||||
This would also make the endpoint documentation pass easier: each audited
|
||||
endpoint can identify the provider, the request layout, the logical reply kind,
|
||||
the reply payload layout, and any caller-side post-processing.
|
||||
|
||||
### Migration order for handoff normalization
|
||||
|
||||
The safe order is:
|
||||
|
||||
1. document current `nwconn`/`nwbind` handoff behavior;
|
||||
2. add names for current magic values without changing behavior;
|
||||
3. add a small wrapper such as `ncp_handoff_to_provider()` that still calls the
|
||||
old path internally;
|
||||
4. introduce a formal internal reply object in the wrapper;
|
||||
5. make the wrapper always return a formal reply, including `NO_REPLY`;
|
||||
6. centralize final client reply sending in `nwconn` for converted paths;
|
||||
7. only then attach future providers such as `nwqueue` or `nwnds`.
|
||||
|
||||
The rule is: do not create a new provider process until the caller can receive a
|
||||
formal reply from it and can handle provider failure centrally.
|
||||
|
||||
## Provider boundaries
|
||||
|
||||
A clean design would treat the existing modules as providers instead of hidden
|
||||
|
||||
Reference in New Issue
Block a user