diff --git a/REDESIGN.md b/REDESIGN.md
index 039bc91..8d75618 100644
--- a/REDESIGN.md
+++ b/REDESIGN.md
@@ -530,6 +530,89 @@ The safe order is:
 The rule is: do not create a new provider process until the caller can receive a formal reply from it and can handle provider failure centrally.
+## `nwserv` as control plane, not data-plane router
+
+Future provider processes need a way to find and trust each other, but normal
+request payloads should not all be routed through `nwserv`. `nwserv` should stay
+the supervisor and registry for the mars-nwe process tree. It should not become
+a central payload broker for every decoded NCP request.
+
+The preferred split is:
+
+```text
+nwserv:
+  control plane
+  process supervision
+  provider registry
+  endpoint/socket ownership
+  restart and shutdown coordination
+
+nwconn <-> provider:
+  data plane
+  direct request/reply IPC
+  normalized handoff messages
+```
+
+So a future request path should look like this:
+
+```text
+client -> nwconn -> direct provider IPC -> provider -> nwconn -> client
+```
+
+not like this:
+
+```text
+client -> nwconn -> nwserv -> provider -> nwserv -> nwconn -> client
+```
+
+`nwserv` may still create, own, or advertise IPC endpoints. For example, it can
+start `nwbind`, `nwqueue`, `nwdirectory`, or `nwnds`, create a protected runtime
+directory such as `/run/mars-nwe`, assign socket paths or inherited file
+descriptors, and record which provider is currently alive. A `nwconn` process
+can then discover provider endpoints from configuration, inherited descriptors,
+or a small `nwserv` registry query. After discovery, normal handoff traffic
+should go directly to the provider.
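As a rough illustration of the split described above, the following Python sketch models discovery through a registry followed by direct request/reply traffic. It is only a toy model under stated assumptions: `Registry`, `start_provider`, `lookup`, `alive`, `provider_serve_once`, and `handoff` are hypothetical names, not mars-nwe APIs, and a thread with a socket pair stands in for a separate provider process with an inherited descriptor.

```python
import socket
import threading


class Registry:
    """Hypothetical stand-in for the nwserv control plane.

    It creates endpoints and tracks which providers are registered, but it
    never reads or forwards request payloads.
    """

    def __init__(self):
        self._caller_ends = {}

    def start_provider(self, name):
        # A datagram socketpair preserves message boundaries, so one send
        # corresponds to one recv.
        caller_end, provider_end = socket.socketpair(
            socket.AF_UNIX, socket.SOCK_DGRAM
        )
        self._caller_ends[name] = caller_end
        return provider_end  # stands in for a descriptor the provider inherits

    def lookup(self, name):
        return self._caller_ends[name]

    def alive(self, name):
        return name in self._caller_ends


def provider_serve_once(endpoint):
    """Hypothetical provider: answer a single request on its own endpoint."""
    request = endpoint.recv(4096)
    endpoint.send(b"OK:" + request)


def handoff(registry, name, payload, timeout=2.0):
    """Caller side (the nwconn role): discover, then talk to the provider."""
    endpoint = registry.lookup(name)  # control plane: discovery only
    endpoint.settimeout(timeout)      # the timeout is owned by the caller
    endpoint.send(payload)            # data plane: bypasses the registry
    return endpoint.recv(4096)


registry = Registry()
provider_end = registry.start_provider("nwbind")
worker = threading.Thread(target=provider_serve_once, args=(provider_end,))
worker.start()
reply = handoff(registry, "nwbind", b"lookup-object")
worker.join()
```

In this sketch the registry is consulted once, for the endpoint, and the payload never passes through it, which is the property the preferred request path is meant to have.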
+
+This keeps `nwserv` small and has several benefits:
+
+- no extra copy and latency for every NCP handoff;
+- no single data-plane bottleneck;
+- no need for `nwserv` to understand every provider payload;
+- fewer decoded password/auth/directory payloads visible to the supervisor;
+- easier provider-specific timeouts and back-pressure;
+- clearer ownership: `nwconn` owns the client connection, providers own their
+  service logic, and `nwserv` owns lifecycle.
+
+The kinds of messages that should go through `nwserv` are control messages:
+
+- provider started, registered, unhealthy, or exited;
+- provider restart requested or refused;
+- global shutdown or graceful drain;
+- configuration reload notification;
+- socket/FD registration and permission setup;
+- health and version/capability queries.
+
+The kinds of messages that should not normally go through `nwserv` are data-plane
+messages:
+
+- decoded NCP request payloads;
+- Bindery object/property operations;
+- Queue job lifecycle operations;
+- Directory/NDS authentication or schema operations;
+- file/volume provider payloads;
+- provider replies carrying completion/status and reply payloads.
+
+There can be narrow exceptions during migration, especially for existing legacy
+`nwconn`/`nwbind` plumbing, but those exceptions should be documented as legacy
+wrappers. New provider processes should be designed around direct normalized IPC
+from the caller to the provider.
+
+This also fits the secure IPC policy: local direct IPC can use protected
+Unix-domain sockets, pipes, or inherited descriptors. If a future provider is
+connected over TCP instead, that specific provider IPC channel must use the
+separate wolfSSL/mTLS policy described below. `nwserv` discovery must not be used
+as an excuse to downgrade provider data-plane traffic to plaintext TCP.
+
 ## Secure internal provider IPC and client transport compatibility