% Optional comments

\_def \_unibidilua_version {1.0, 2026-05-19}
\_codedecl \unibidilua {Unicode Bidi Algorithm for OpTeX <\_unibidilua_version>}

\_initunifonts

\_directlua{
   local process = require('unibidi-lua-interface').reorder
   luatexbase.add_to_callback("pre_shaping_filter",process,"unibidi-lua")
}

\_endcode

\sec Usage

To use the package, load it as you would any other package: with ¦\load[unibidi-lua]¦,
¦\usepackage{unibidi-lua}¦ or ¦\input unibidi-lua¦
if you are using \OpTeX, \LaTeX/ or Plain respectively.
The process function is added to the ¦pre_shaping_filter¦ callback when you load the package.

To change the settings, use the ¦\unibidilua¦ macro,
which accepts key-value pairs separated by a space.

\def\key #1:{{\bf #1:}\hskip1em\ignorespaces}

The macro accepts the following keywords:

\begitems \style O
* \key enable: This key accepts a boolean value.
  When true, the unibidi-lua process function is active in the ¦pre_shaping_filter¦ callback.
  When false, the callback remains registered but processing is skipped.
  Default is true when the package is loaded.
* \key fences: This key accepts a boolean value.
  It enables or disables the handling of matched bracket pairs
  when resolving directional levels (\rulelink{N0}).
  Default is true when the package is loaded.
* \key nsm: This key accepts a boolean value.
  It enables or disables the reordering of combining marks
  after the text has been reordered (\rulelink{L3}).
  This key is only relevant for the string functions,
  as the node functions do not reorder anything (dir nodes are used there instead).
  Default is true when the package is loaded.
* \key mirror: This key accepts a boolean value.
  It enables or disables the mirroring of characters (\rulelink{L4}).
  Default is true when the package is loaded.
* \key remove: This key accepts one of ¦none¦, ¦controls¦, or ¦full¦.
  This controls which characters are removed during processing.
  With ¦none¦, nothing is removed;
  with ¦controls¦, only the ¦LRE¦, ¦RLE¦, ¦LRO¦, ¦RLO¦ and ¦PDF¦ characters are removed;
  and with ¦full¦, all characters specified in \rulelink{X9} are removed.
* \key setdir: This key accepts an integer (or a range of integers)
  representing Unicode code points, and a direction value.
  It sets the directional property for the specified character(s).
  Example: ¦setdir `\A l¦ or ¦setdir `\A-`\Z al¦.
* \key setmirror: This key accepts an integer (or a range of integers)
  representing Unicode code points, and mirror data.
  It sets the mirroring property for the specified character(s),
  indicating which character they mirror to in RTL contexts.
  Example: ¦setmirror `\( `\)¦.
* \key setbracket: This key accepts an integer (or a range of integers)
  representing Unicode code points, and a bracket type value: ¦o¦ (open) or ¦c¦ (close).
  It sets the bracket type property for the specified character(s).
  Example: ¦setbracket `\( o¦.
* \key baselevel: This key accepts a Lua function
  that determines the base directionality level of the text.
  See ¦bidi.set("baselevel")¦ in \ref[lua]{section~@}.
* \key mirrorchar: This key accepts a Lua function that applies the mirroring on nodes.
  See ¦bidi.set("mirrorchar")¦ in \ref[lua]{section~@}.
* \key startlevel: This key accepts a Lua function
  that determines from which level to insert direction nodes.
  See ¦bidi.set("startlevel")¦ in \ref[lua]{section~@}.
\enditems

Note that you can also load the \TeX/ side interface without loading the package file using
\begtt
\directlua{require('unibidi-lua-interface')}
\endtt
This sets up the \TeX/ interface macros but does not add any function to a callback,
and does not ensure ¦luaotfload¦ is loaded.
The return value is a table with a ¦reorder¦ function that can be added to a callback manually:
\begtt
\directlua{
   local process = require('unibidi-lua-interface').reorder
   luatexbase.add_to_callback("pre_shaping_filter", process, "unibidi-lua")
}
\endtt

\sec[lua] Lua API

If you just want the Lua API, you can load it directly using
\begtt
local bidi = require('unibidi-lua')
\endtt
This does not define the \TeX/ interface nor add a function to a callback,
and works with ¦texlua¦ and plain ¦lua¦ as well as \LuaTeX/.
Note that the node functions are only available when running under \LuaTeX/.

\secc Options

\begitems
* ¦bidi.set(key, value)¦: Set an option. The following keys are accepted:
  \begitems \style -
  * ¦"fences"¦: boolean, enable/disable bracket pair resolution (\rulelink{N0}).
  * ¦"nsm"¦: boolean, enable/disable combining mark reordering (\rulelink{L3}).
    Only relevant for string processing functions.
  * ¦"mirror"¦: boolean, enable/disable character mirroring (\rulelink{L4}).
  * ¦"remove"¦: string, one of ¦"none"¦, ¦"controls"¦, or ¦"full"¦.
    Controls which characters are removed during processing.
  * ¦"baselevel"¦: function, a custom function to determine
    the base directionality level of the paragraph.
    See the source code for more details.
  * ¦"mirrorchar"¦: function, called for each glyph that has a mirror character.
    Receives the direct node and the mirror codepoint,
    and is responsible for applying the mirroring.
    The default implementation sets the character to its mirror for all non-HarfBuzz fonts,
    since HarfBuzz handles mirroring itself (including through OpenType features).
    Note that the default does not check OpenType mirroring features (¦rtlm¦/¦rtla¦);
    fonts that rely on those features for mirroring
    may not render correctly in node mode\fnote{If needed,
    you can implement this yourself using this hook. See for example\nl
    \url{https://github.com/latex3/luaotfload/blob/bidi-dev/src/luaotfload-mirror.lua}}.
    Only relevant for node processing functions.
  * ¦"startlevel"¦: function, called with the base level to determine
    the minimum embedding level from which direction nodes are inserted.
    The default returns the base level itself.
    For example, consider the following:
\begtt
\hbox bdir1 {קו 42}
\endtt
    The list of nodes we start with is something like
\nodesinit
    With the default ¦baselevel¦ function, the bidi algorithm is run with base level 1
    (the direction of the hbox containing the text).
    The resolved levels are {\tt\levels}.
    Direction nodes are then inserted starting from ¦startlevel¦
    to mark the boundaries of each directional run.
    With the default ¦startlevel¦ function the resulting node list is:
\nodesdefault
    The default inserts direction nodes starting from the base level.
    This preserves the logical reading order if the box is later unboxed
    (since unboxing discards the box direction),
    or if a custom ¦baselevel¦ function assigns a level
    that differs from the direction of the containing box.
    If unboxing and custom base level functions are not a concern,
    the outer direction nodes can be omitted by returning ¦baselevel+1¦:
\begtt
bidi.set("startlevel", function(baselevel) return baselevel+1 end)
\endtt
    which produces:
\nodescustom
    Only relevant for node processing functions.
  \enditems
* ¦bidi.get(key)¦: Get the current value of an option.
  Accepts the same keys as ¦bidi.set¦.
\enditems

\secc Data tables

\begitems
* ¦bidi.directions¦: Table containing character directional properties.
  Indexed by Unicode code point, returns the bidi class string
  (e.g. ¦"l"¦, ¦"r"¦, ¦"al"¦, ¦"an"¦, etc.).
* ¦bidi.mirrors¦: Table containing character mirroring mappings.
  Indexed by Unicode code point, returns the code point of the mirrored character.
* ¦bidi.brackets¦: Table containing bracket type classifications.
  Indexed by Unicode code point, returns ¦"o"¦ for opening brackets,
  ¦"c"¦ for closing brackets.
\enditems

\secc String functions

\begitems
* ¦bidi.string.reorder(str, direction, where)¦:
  Apply the Unicode Bidirectional Algorithm to the string ¦str¦
  and return the visually reordered string.
  ¦direction¦ is ¦0¦ for LTR, ¦1¦ for RTL, or ¦-1¦ (or ¦nil¦) for auto-detect.
  These meanings apply to the default base level function;
  a replacement function may interpret ¦direction¦ differently.
  ¦where¦ is passed as-is to the base level function; the default function does not use it.
* ¦bidi.string.levels(str, direction, where)¦:
  Apply the Unicode Bidirectional Algorithm to the string ¦str¦
  and return two values: a levels array and a reorder array.
  The levels array maps each character position to its resolved bidi level,
  or ¦false¦ if the character was removed by \rulelink{X9}.
  The reorder array maps each visual position to the original logical index of the character.
* ¦bidi.codepoints.reorder(codepoints, direction, where)¦:
  Same as ¦bidi.string.reorder¦ but accepts and returns
  an array of codepoints instead of a string.
* ¦bidi.codepoints.levels(codepoints, direction, where)¦:
  Same as ¦bidi.string.levels¦ but accepts an array of codepoints instead of a string.
\enditems

\secc Node functions

There are two variants of the node functions:
¦bidi.direct¦, which operates on direct node references (as used in ¦node.direct¦),
and ¦bidi.node¦, which operates on regular node references.

\begitems
* ¦bidi.direct.reorder(head, direction, where)¦:
  Apply the Unicode Bidirectional Algorithm to the node list starting at ¦head¦
  (a direct node reference).
  Direction nodes are inserted to mark directional runs.
  Returns the (possibly new) head of the node list.
* ¦bidi.direct.levels(head, direction, where)¦:
  Apply the Unicode Bidirectional Algorithm to the node list starting at ¦head¦
  (a direct node reference) and return a levels array.
  The levels array maps each node position to its resolved bidi level,
  or ¦false¦ if the node was removed by \rulelink{X9}.
* ¦bidi.node.reorder(head, direction, where)¦:
  Same as ¦bidi.direct.reorder¦ but accepts and returns regular node references.
* ¦bidi.node.levels(head, direction, where)¦:
  Same as ¦bidi.direct.levels¦ but accepts regular node references.
\enditems

\_doc
% optex -jobname unibidi-lua-doc '\docgen unibidi-lua'

\load [doc]
\verbchar¦

\directlua{
   local bidi = require('unibidi-lua')
   local function show_nodes(head,csname)
      local function nodename(n)
         if n.id == node.id("glyph") then
            return string.format('glyph  "\%s"', utf8.char(node.getchar(n)))
         elseif n.id == node.id("glue") then
            return string.format('glue   " "')
         elseif n.id == node.id("dir") then
            return string.format('dir    \%s', n.dir)
         else
            return string.format('\%s \%d', node.type(n.id), n.subtype)
         end
      end
      local lines = {"\nbb begtt"}
      local n = head
      while n do
         lines[\csstring\#lines+1] = nodename(n)
         n = n.next
      end
      lines[\csstring\#lines+1] = "\nbb endtt"
      lines[\csstring\#lines+1] = ""
      define_lua_command(csname,function() tex.print(lines) end)
   end
   callback.add_to_callback("pre_shaping_filter",function(head,_,dir)
      show_nodes(head,"nodesinit")
      local levels = bidi.node.levels(head,dir)
      define_lua_command("levels",function() tex.print(levels) end)
      show_nodes(bidi.node.reorder(head,dir),"nodesdefault")
      bidi.set("startlevel",function(baselevel) return baselevel+1 end)
      show_nodes(bidi.node.reorder(head,dir),"nodescustom")
      callback.remove_from_callback("pre_shaping_filter","foo")
   end,"foo")
}
\hbox bdir1 {קו 42}

\edef\rulelink#1{\noexpand\ulink[https://www.unicode.org/reports/tr9/\csstring\##1]{rule #1}}

\addto\_fontfeatures{fallback=miriam;}
\directlua{luaotfload.add_fallback("miriam", {"name:MiriamMonoCLM:mode=harf;"})}

\overfullrule=0pt

\tit Unicode Bidi Algorithm Implementation for \OpTeX, \LaTeX, and Plain

\hfill Version: \_unibidilua_version \par
\centerline{\it Udi Fogiel, 2025-2026}

\parindent0pt\parskip5pt\parfillskip=20ptplus1fill

The unibidi-lua LuaTeX package is an implementation of the
\ulink[https://www.unicode.org/reports/tr9/]{Unicode Bidirectional Algorithm}
(Unicode Standard Annex \#9) for \OpTeX, \LaTeX/ and Plain \LuaTeX/ formats.
The core Lua module also works with ¦texlua¦ and plain ¦lua¦.
The package allows typesetting bidirectional documents without the need for special markup.

\printdoctail unibidi-lua.opm % prints the documentation written after \_endcode
\bye

\_cod

\endinput