mars-unicode-tables

Unicode data and generated table sources for MARS-NWE/libnwcore.

This repository intentionally keeps upstream Unicode source data separate from MARS-NWE-generated output.

Layout

  • UCD/
    Unicode Character Database input files, currently imported from Unicode 17.0.0.

  • MAPPINGS/
    Unicode mapping files from https://www.unicode.org/Public/MAPPINGS/, preserving the upstream VENDORS/... hierarchy.

  • scripts/
    MARS-NWE helper scripts and generators.

  • TAB/
    Generated C table output consumed by MARS-NWE/libnwcore.

  • LICENSES/
    License notes for Unicode data and MARS-NWE-authored helper code.

Policy

Do not copy Novell NSS shared/sdk/unitables/*.tab files into this repository. They may serve only as compatibility and reference material, kept outside the committed source data.

Unicode case and codepage tables should be generated from Unicode.org data files. Normal MARS-NWE builds consume the committed files below TAB/. Regenerating those committed files is an explicit maintainer action, not a step in the normal build.

Codepage table generation

MAPPINGS/ contains the Unicode.org vendor mapping files. The codepage helper generator emits compact byte/code-to-Unicode descriptors under TAB/:

./scripts/gen_codepage_tables.py

TAB/codepageTables.c and TAB/codepageTables.h are generated from direct single-BMP-code-point mappings only. Composite mappings, directional pseudo mappings, historical DatedVersions/, and WindowsBestFit/ reverse/fallback files remain in the source tree but are not emitted into byte-to-Unicode tables.
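The filtering rule above can be illustrated with a short Python sketch. It is not the code in scripts/gen_codepage_tables.py; the function name parse_direct_bmp_mappings is hypothetical, and the sketch assumes the common two-column Unicode.org mapping format (code, Unicode value, then a # comment). Rows with a missing Unicode column, composite rows such as 0x0041+0x0300, directional rows such as <LR>+0x0028, and code points above U+FFFF are skipped.

```python
import re

# One hexadecimal field such as "0x80" or "0x00C7" (assumed column format).
_HEX_FIELD = re.compile(r"0x[0-9A-Fa-f]+")


def parse_direct_bmp_mappings(lines):
    """Return {code: code_point} for direct single-BMP-code-point rows only.

    Hypothetical helper for illustration; not part of the repository.
    """
    table = {}
    for raw in lines:
        body = raw.split("#", 1)[0].strip()  # drop the trailing comment
        if not body:
            continue  # blank line or comment-only line
        fields = body.split()
        if len(fields) != 2:
            continue  # undefined entry (no Unicode column) or other layout
        code_field, uni_field = fields
        if not _HEX_FIELD.fullmatch(code_field):
            continue
        if not _HEX_FIELD.fullmatch(uni_field):
            continue  # composite ("0x..+0x..") or directional ("<LR>+0x..")
        code_point = int(uni_field, 16)
        if code_point > 0xFFFF:
            continue  # outside the Basic Multilingual Plane
        table[int(code_field, 16)] = code_point
    return table


sample = [
    "# Name: example mapping",
    "0x41\t0x0041\t# LATIN CAPITAL LETTER A",
    "0x80\t0x00C7\t# LATIN CAPITAL LETTER C WITH CEDILLA",
    "0x81\t\t#UNDEFINED",
    "0x82\t0x0041+0x0300\t# composite mapping",
    "0x83\t<LR>+0x0028\t# directional pseudo mapping",
    "0x84\t0x1F600\t# non-BMP code point",
]
direct = parse_direct_bmp_mappings(sample)
```

With the sample rows, only the 0x41 and 0x80 entries survive; every other row falls into one of the skipped categories.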

MARS-NWE links these generated tables into libnwcore; they are not loaded as runtime .tab files.

NSS-compatible unitable generation

scripts/gen_nss_unitables.py emits binary UNI_*.TAB and Macintosh .TAB files under TAB/unitables/ from the Unicode.org MAPPINGS/ data. The binary layout follows the table loader shape used by NSS unilib.c: a 256-byte Version 1.xx header, followed by the codepage-to-Unicode lookup tables and then the Unicode-to-codepage lookup tables.
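As a rough illustration of that layout, the Python sketch below separates the fixed-size header from the lookup-table payload. Only the 256-byte header size comes from the description above. The helper names are hypothetical, and the check for an ASCII "Version 1." string inside the header is an assumption about how the version is stored, not a documented property of the format. The sketch does not attempt to decode the lookup tables.

```python
HEADER_SIZE = 256  # header size as stated in this README


def split_unitable(data: bytes):
    """Split raw .TAB bytes into (header, table_payload).

    Hypothetical helper for illustration; not part of the repository.
    """
    if len(data) < HEADER_SIZE:
        raise ValueError("file is shorter than the 256-byte header")
    return data[:HEADER_SIZE], data[HEADER_SIZE:]


def header_mentions_version_1(header: bytes) -> bool:
    # ASSUMPTION: the version is stored as ASCII text such as "Version 1.00"
    # somewhere in the header. Verify against real generated files first.
    return b"Version 1." in header


# Synthetic example data, not a real UNI_*.TAB file.
fake = b"Version 1.00".ljust(HEADER_SIZE, b"\x00") + b"\x01\x02\x03"
header, payload = split_unitable(fake)
```

A real check would read a file from TAB/unitables/ in binary mode and pass its bytes to split_unitable.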

These files are generated compatibility data. Do not replace them with Novell shared/sdk/unitables/*.TAB files; those remain reference-only because their redistribution license is unclear.

./scripts/gen_nss_unitables.py

UNI_000.TAB is intentionally not emitted by this generator. It uses a separate collation/case-table layout, and MARS-NWE currently takes that data from the generated C case tables in TAB/unicodeTables.c.

Maintainer refresh script

Use the root helper when the committed generated outputs should be refreshed:

./regenerate-tabs.sh

For a new Unicode Character Database release, pass the release version. The script downloads UCD.zip, replaces UCD/, regenerates TAB/codepageTables.* and TAB/unitables/*.TAB, validates the generated .TAB files, and, when --commit is given, commits the result:

./regenerate-tabs.sh --unicode-version 17.0.0 --commit