mars-unicode-tables
Unicode data and generated table sources for MARS-NWE/libnwcore.
This repository intentionally keeps upstream Unicode source data separate from MARS-NWE-generated output.
Layout
- UCD/: Unicode Character Database input files, currently imported from Unicode 17.0.0.
- MAPPINGS/: Unicode mapping files from https://www.unicode.org/Public/MAPPINGS/, preserving the upstream VENDORS/... hierarchy.
- scripts/: MARS-NWE helper scripts and generators.
- TAB/: Generated C table output consumed by MARS-NWE/libnwcore. Generated NSS-compatible .TAB files live under TAB/unitables/.
- LICENSES/: License notes for Unicode data and MARS-NWE-authored helper code.
Policy
Do not copy Novell NSS shared/sdk/unitables/*.tab files into this repository.
They may be used only as compatibility/reference material outside the committed
source data.
Unicode case/codepage tables should be generated from Unicode.org data files.
Normal mars-nwe builds consume the committed files under TAB/. Regenerating
those committed files is an explicit maintainer action.
Codepage table generation
MAPPINGS/ contains the Unicode.org vendor mapping files. The codepage
helper generator emits compact byte/code-to-Unicode descriptors under TAB/:
./scripts/gen_codepage_tables.py
TAB/codepageTables.c and TAB/codepageTables.h are generated from direct
single-BMP-code-point mappings only. Composite mappings and directional pseudo
mappings are skipped. The historical DatedVersions/ files and the WindowsBestFit/
reverse/fallback files remain in the source tree, but they are not emitted into
byte-to-Unicode tables.
MARS-NWE links these generated tables into libnwcore; they are not loaded as
runtime .tab files.
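The following Python sketch shows the line-level filtering described above. It assumes the common Unicode.org mapping-file format: a code column, a Unicode column, and an optional `# comment`. The function name parse_direct_bmp_mappings is hypothetical and is not taken from scripts/gen_codepage_tables.py. Directory-level exclusions such as DatedVersions/ and WindowsBestFit/ are not shown.

```python
import re

# Hypothetical sketch; the real scripts/gen_codepage_tables.py may differ.
# Matches a plain hex field such as "0x20AC" and nothing else.
_HEX = re.compile(r"0x[0-9A-Fa-f]+")


def parse_direct_bmp_mappings(lines):
    """Return {byte_code: unicode_code_point} for direct single-BMP mappings.

    Skips comments, blank lines, undefined entries, composite mappings
    such as "0x05DC+0x05B9", directional pseudo mappings such as
    "<RL>+0x0020", and targets outside the Basic Multilingual Plane.
    """
    table = {}
    for raw in lines:
        # Drop the trailing comment, then split on any whitespace.
        fields = raw.split("#", 1)[0].split()
        if len(fields) < 2:
            continue  # comment, blank line, or undefined code
        code_field, uni_field = fields[0], fields[1]
        if not (_HEX.fullmatch(code_field) and _HEX.fullmatch(uni_field)):
            continue  # composite or directional pseudo mapping
        code_point = int(uni_field, 16)
        if code_point > 0xFFFF:
            continue  # not a single BMP code point
        table[int(code_field, 16)] = code_point
    return table
```

For example, a line such as `0x80	0x20AC	#EURO SIGN` produces the entry 0x80 -> 0x20AC. An Apple-style entry such as `0xA0	<RL>+0x0020` is dropped.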
NSS-compatible unitable generation
scripts/gen_nss_unitables.py emits binary UNI_*.TAB and Macintosh .TAB files
under TAB/unitables/ from the Unicode.org MAPPINGS/ data. The binary layout
matches what the NSS unilib.c table loader expects: a 256-byte Version 1.xx header,
then codepage-to-Unicode lookup tables, then Unicode-to-codepage lookup tables.
These files are generated compatibility data. Do not replace them with Novell
shared/sdk/unitables/*.TAB files; those remain reference-only because their
redistribution license is unclear.
./scripts/gen_nss_unitables.py
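Generated .TAB files can be sanity-checked against the layout described above with a small Python check. This is only a sketch. It assumes the 256-byte header is ASCII text that begins with `Version 1.`, followed by padding. The internal header fields and the layout of the lookup tables are not documented here, so the check does not inspect them. Verify both assumptions against NSS unilib.c before relying on the check. The function names are hypothetical.

```python
import pathlib

HEADER_SIZE = 256  # fixed header size stated in this README
# Assumption: the header is ASCII text starting with "Version 1." plus padding.
EXPECTED_PREFIX = b"Version 1."


def check_unitable_bytes(data):
    """Return a list of problems found in .TAB file contents (empty if none)."""
    if len(data) <= HEADER_SIZE:
        # No room for the lookup tables that must follow the header.
        return ["file is not larger than the 256-byte header"]
    problems = []
    if not data[:HEADER_SIZE].startswith(EXPECTED_PREFIX):
        problems.append(f"header does not start with {EXPECTED_PREFIX!r}")
    return problems


def check_unitable(path):
    """Read one generated .TAB file and check its contents."""
    return check_unitable_bytes(pathlib.Path(path).read_bytes())


def check_directory(directory="TAB/unitables"):
    """Map each *.TAB file name in directory to its list of problems."""
    return {
        tab.name: check_unitable(tab)
        for tab in sorted(pathlib.Path(directory).glob("*.TAB"))
    }
```

Keeping the byte-level check separate from file I/O means a maintainer can test the check without a populated TAB/unitables/ directory.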
UNI_000.TAB is intentionally not emitted by this generator. It uses a
separate collation/case-table layout; MARS-NWE currently uses the generated C
case tables in TAB/unicodeTables.c for that data.
Maintainer refresh script
To refresh the committed generated outputs, run the helper script in the repository root:
./regenerate-tabs.sh
For a new Unicode Character Database release, pass the release version. The
script downloads UCD.zip, replaces UCD/, regenerates TAB/codepageTables.*
and TAB/unitables/*.TAB, validates the generated .TAB files, and, when given
--commit, commits the result:
./regenerate-tabs.sh --unicode-version 17.0.0 --commit
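The download-and-replace part of a new-release refresh can be sketched in Python. This is an illustration only, and it assumes the usual Unicode.org URL layout (https://www.unicode.org/Public/<version>/ucd/UCD.zip). regenerate-tabs.sh is a shell script, and its actual download, replace, and validation logic may differ. The helper names ucd_zip_url and replace_ucd are hypothetical.

```python
import re
import shutil
import zipfile
from pathlib import Path

# Assumed version format, e.g. "17.0.0" (major.minor.update).
_VERSION = re.compile(r"\d+\.\d+\.\d+")


def ucd_zip_url(version):
    """Build the UCD.zip download URL for a Unicode release such as "17.0.0"."""
    if not _VERSION.fullmatch(version):
        raise ValueError(f"expected a version like 17.0.0, got {version!r}")
    return f"https://www.unicode.org/Public/{version}/ucd/UCD.zip"


def replace_ucd(zip_path, ucd_dir="UCD"):
    """Replace ucd_dir with the contents of an already-downloaded UCD.zip.

    Extracts into a sibling staging directory first, so a corrupt archive
    raises before the existing ucd_dir is deleted.
    """
    ucd_dir = Path(ucd_dir)
    staging = ucd_dir.with_name(ucd_dir.name + ".new")
    if staging.exists():
        shutil.rmtree(staging)  # leftover from an interrupted run
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(staging)
    if ucd_dir.exists():
        shutil.rmtree(ucd_dir)
    staging.rename(ucd_dir)
    return sorted(p.name for p in ucd_dir.iterdir())
```

The sketch extracts into a staging directory and swaps it in only afterwards, so a failed extraction leaves the committed UCD/ untouched. The regenerated TAB/ outputs should still be validated before committing.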