2026-06-17 18:12:53 +02:00


# mars-unicode-tables

Unicode data and generated table sources for MARS-NWE/libnwcore.

This repository intentionally keeps upstream Unicode source data separate from
MARS-NWE-generated output.

## Layout

- `UCD/`
  Unicode Character Database input files, currently imported from Unicode 17.0.0.
- `MAPPINGS/`
  Unicode mapping files from `https://www.unicode.org/Public/MAPPINGS/`,
  preserving the upstream `VENDORS/...` hierarchy.
- `scripts/`
  MARS-NWE helper scripts and table generators.
- `TAB/`
  Generated output: C table sources consumed by MARS-NWE/libnwcore, and
  NSS-compatible binary unitables under `TAB/unitables/`.
- `LICENSES/`
  License notes for Unicode data and MARS-NWE-authored helper code.

## Policy

Do not copy Novell NSS `shared/sdk/unitables/*.tab` files into this repository.
They may be used only as compatibility/reference material outside the committed
source data.

Unicode case/codepage tables should be generated from Unicode.org data files.

## Codepage table generation

`MAPPINGS/` contains the Unicode.org vendor mapping files. The codepage table
generator, `scripts/gen_codepage_tables.py`, reads them and emits compact
byte/code-to-Unicode descriptors under `TAB/`:
```sh
./scripts/gen_codepage_tables.py
```

`TAB/codepageTables.c` and `TAB/codepageTables.h` are generated only from direct
mappings to a single BMP code point. Composite mappings, directional
pseudo-mappings, the historical `DatedVersions/` files, and the `WindowsBestFit/`
reverse/fallback files remain in the source tree but are not emitted into the
byte-to-Unicode tables.

MARS-NWE links these generated tables into `libnwcore`; they are not loaded as
runtime `.tab` files.

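The filtering rule above can be illustrated with a minimal sketch. It is not the code of `scripts/gen_codepage_tables.py`: `parse_direct_mappings` and `emit_c_array` are hypothetical names, the sketch assumes a two-column mapping file (`0xXX<TAB>0xYYYY<TAB># comment`) as used by the single-byte vendor files, and the 256-entry `unsigned short` array with `0xFFFD` for unmapped bytes is an assumed output shape, not necessarily the compact descriptor format in `TAB/codepageTables.c`. Lines whose Unicode field is empty, composite (`0x...+0x...`), prefixed with a directional `<LR>`/`<RL>` marker, or longer than four hex digits (outside the BMP) simply fail to match and are skipped.

```python
import re

# Hypothetical sketch; not the implementation of scripts/gen_codepage_tables.py.
# Matches a direct two-column line "0xXX<ws>0xYYYY", optionally followed by a
# "# comment". The Unicode field must be exactly four hex digits (BMP only).
_DIRECT = re.compile(r"^0x([0-9A-Fa-f]+)\s+0x([0-9A-Fa-f]{4})\s*(?:#.*)?$")


def parse_direct_mappings(lines):
    """Return {code: bmp_code_point} for direct single-BMP-code-point mappings."""
    table = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line or header/comment line
        match = _DIRECT.match(line)
        if match is None:
            # Composite (0x..+0x..), directional (<LR>/<RL>+...), unmapped,
            # or non-BMP entries do not match the pattern and are skipped.
            continue
        table[int(match.group(1), 16)] = int(match.group(2), 16)
    return table


def emit_c_array(name, table):
    """Render a single-byte table as a C array; 0xFFFD marks unmapped bytes."""
    cells = [f"0x{table.get(b, 0xFFFD):04X}" for b in range(256)]
    rows = [", ".join(cells[i:i + 8]) for i in range(0, 256, 8)]
    body = ",\n    ".join(rows)
    return f"static const unsigned short {name}[256] = {{\n    {body}\n}};\n"


if __name__ == "__main__":
    sample = [
        "# Name: sample mapping",
        "0x41\t0x0041\t# LATIN CAPITAL LETTER A",
        "0x81\t\t#UNDEFINED",
        "0xA0\t<RL>+0x0020\t# SPACE, right-left",
        "0xB0\t0x0031+0x20DE\t# composite mapping",
    ]
    table = parse_direct_mappings(sample)
    print(table)  # {65: 65}
    print(emit_c_array("cp_sample", table))
```

Only the `0x41` line survives the filter; the undefined, directional, and composite entries stay in the input but never reach the emitted array.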
## NSS-compatible unitable generation

`scripts/gen_nss_unitables.py` emits binary `UNI_*.TAB` files and Macintosh
`.TAB` files under `TAB/unitables/`, generated from the Unicode.org `MAPPINGS/`
data. The binary layout follows the shape expected by the table loader in NSS
`unilib.c`: a 256-byte `Version 1.xx` header, then the codepage-to-Unicode
lookup tables, then the Unicode-to-codepage lookup tables.

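A reader-side sketch of that layout, under stated assumptions: `split_unitable` and `invert_mapping` are hypothetical helpers, the header is assumed to begin with the ASCII text `Version 1.`, and the sizes of the table sections after the header are not modelled because they are not specified here. The second helper shows one way a Unicode-to-codepage table can be derived from a codepage-to-Unicode map; the lowest-code-wins tie-break is an assumption, not necessarily what `gen_nss_unitables.py` does.

```python
HEADER_SIZE = 256  # the README describes a 256-byte "Version 1.xx" header


def split_unitable(data):
    """Split a unitable image into (header, tables). Hypothetical helper.

    Assumes the header starts with the ASCII text "Version 1."; the layout of
    the codepage-to-Unicode and Unicode-to-codepage sections is not parsed.
    """
    if len(data) < HEADER_SIZE:
        raise ValueError("data is shorter than the 256-byte header")
    header, tables = data[:HEADER_SIZE], data[HEADER_SIZE:]
    if not header.startswith(b"Version 1."):
        raise ValueError("header does not start with 'Version 1.'")
    return header, tables


def invert_mapping(forward):
    """Derive a Unicode-to-codepage map from a codepage-to-Unicode map.

    If several codes map to the same code point, the lowest code wins
    (an assumed tie-break for this sketch).
    """
    reverse = {}
    for code in sorted(forward):
        reverse.setdefault(forward[code], code)
    return reverse


if __name__ == "__main__":
    forward = {0x41: 0x0041, 0x80: 0x20AC, 0x81: 0x20AC}
    print(invert_mapping(forward))  # {65: 65, 8364: 128}
```

Iterating codes in sorted order keeps the reverse table deterministic when a codepage maps two bytes to the same code point.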
These files are generated compatibility data. Do not replace them with Novell
`shared/sdk/unitables/*.TAB` files; those remain reference-only because their
redistribution license is unclear.
```sh
./scripts/gen_nss_unitables.py
```

`UNI_000.TAB` is intentionally not emitted by this generator; it uses a separate
collation/case-table layout. For that data, MARS-NWE currently uses the
generated C case tables in `TAB/unicodeTables.c`.
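Which script produces `TAB/unicodeTables.c` is not described here. As a hedged sketch of how simple case data can be read from the UCD, assuming `UCD/` contains the standard `UnicodeData.txt`: each record has 15 semicolon-separated fields, and fields 12 and 13 hold the simple uppercase and lowercase mappings. `simple_case_maps` is a hypothetical helper, not part of this repository.

```python
def simple_case_maps(lines):
    """Return (upper, lower) dicts from UnicodeData.txt simple case mappings.

    Hypothetical helper. UnicodeData.txt has 15 ';'-separated fields per line;
    field 12 is the simple uppercase mapping and field 13 the simple lowercase
    mapping (empty when the character maps to itself).
    """
    upper, lower = {}, {}
    for raw in lines:
        fields = raw.strip().split(";")
        if len(fields) != 15:
            continue  # blank or malformed line
        cp = int(fields[0], 16)
        if fields[12]:
            upper[cp] = int(fields[12], 16)
        if fields[13]:
            lower[cp] = int(fields[13], 16)
    return upper, lower


if __name__ == "__main__":
    sample = [
        "0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;",
        "0061;LATIN SMALL LETTER A;Ll;0;L;;;;;N;;;0041;;0041",
    ]
    upper, lower = simple_case_maps(sample)
    print(upper, lower)  # {97: 65} {65: 97}
```

For the full file, pass an open handle, e.g. inside `with open("UCD/UnicodeData.txt", encoding="utf-8") as f:`.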