[texhax] Any crazy math formulas for testing a TeX language interpreter
Joseph Wright
joseph.wright at morningstar2.co.uk
Tue Jan 12 09:47:59 CET 2016
On 12/01/2016 08:01, David Carlisle wrote:
>> -------------------------
>> Joseph Wright wrote -
>>
>>> (Aside: I'd be very keen to know about
>>> primitive coverage beyond TeX90, particularly e-TeX, \pdfstrcmp or
>>> equivalent and Unicode-related primitives, in particular \Uchar and
>>> \Ucharcat. See expl3 for why these are important.)
>>
>>
>> These are not part of e-TeX, so I've not spent any time thinking about
>> them. String comparison in Unicode is a giant ball of wax, of course.
String comparisons here are pretty simple, actually :-)
Taking the Lua version needed for LuaTeX, the code is simply
function strcmp(A, B)
  if A == B then
    tex_write("0")
  elseif A < B then
    tex_write("-1")
  else
    tex_write("1")
  end
end
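To try that function outside LuaTeX, a minimal sketch along the following lines should do. It assumes tex_write is a local alias for LuaTeX's tex.write and swaps in a stub that just records what would be handed back to TeX; the table name "written" is made up for the illustration.

```lua
-- Stub for the LuaTeX output function (assumption: in LuaTeX itself,
-- tex_write would be a local alias for tex.write).
local written = {}
local function tex_write(s)
  written[#written + 1] = s
end

-- Same logic as the function quoted above.
local function strcmp(A, B)
  if A == B then
    tex_write("0")
  elseif A < B then
    tex_write("-1")
  else
    tex_write("1")
  end
end

strcmp("abc", "abc")
strcmp("abc", "abd")
strcmp("b", "abc")
print(table.concat(written, " "))  -- 0 -1 1
```

The results come back as the strings "0", "-1" and "1", which is what TeX then reads as the expansion.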
Lua isn't an all-Unicode system: Lua strings are simply bytes and so the
question is one of byte order. Similarly, the XeTeX implementation (from
http://sourceforge.net/p/xetex/code/ci/master/tree/source/texk/web2c/xetexdir/xetex.web)
is
procedure compare_strings; {to implement \.{\\strcmp}}
label done;
var s1, s2: str_number;
  i1, i2, j1, j2: pool_pointer;
begin
  call_func(scan_toks(false, true));
  s1:=tokens_to_string(def_ref);
  delete_token_ref(def_ref);
  call_func(scan_toks(false, true));
  s2:=tokens_to_string(def_ref);
  delete_token_ref(def_ref);
  i1:=str_start_macro(s1);
  j1:=str_start_macro(s1 + 1);
  i2:=str_start_macro(s2);
  j2:=str_start_macro(s2 + 1);
  while (i1 < j1) and (i2 < j2) do begin
    if str_pool[i1] < str_pool[i2] then begin
      cur_val:=-1;
      goto done;
    end;
    if str_pool[i1] > str_pool[i2] then begin
      cur_val:=1;
      goto done;
    end;
    incr(i1);
    incr(i2);
  end;
  if (i1 = j1) and (i2 = j2) then
    cur_val:=0
  else if i1 < j1 then
    cur_val:=1
  else
    cur_val:=-1;
done:
  flush_str(s2);
  flush_str(s1);
  cur_val_level:=int_val;
end;
(I presume this is not dissimilar to the pdfTeX one, though that of
course only has to deal with single bytes.)
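For anyone wanting to check an implementation against this, here is a rough Lua rendering of the same loop. It is only a sketch: it walks the bytes of two Lua strings rather than entries in XeTeX's string pool, and the name compare_units is made up for the illustration.

```lua
-- Hypothetical helper: mirrors the loop in compare_strings, but walks
-- the bytes of two Lua strings instead of entries in str_pool.
local function compare_units(a, b)
  local i, la, lb = 1, #a, #b
  while i <= la and i <= lb do
    local x, y = a:byte(i), b:byte(i)
    if x < y then return -1 end
    if x > y then return 1 end
    i = i + 1
  end
  if la == lb then return 0 end
  -- The string with units left over sorts after its own prefix.
  if la > lb then return 1 end
  return -1
end

print(compare_units("abc", "abc"))     -- 0
print(compare_units("abc", "abd"))     -- -1
print(compare_units("abcd", "abc"))    -- 1
-- "é" is the two bytes C3 A9 in UTF-8, so it sorts after "z" (7A).
print(compare_units("z", "\195\169"))  -- -1
```

The last case shows why "byte order" is the whole story here: nothing language-aware is going on, the result depends purely on the numeric values of the stored units.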
In any case, some form of expandable comparison that ignores catcodes is
very useful, and it's essential for using expl3 (we found some years ago
that this was the one post-e-TeX primitive that was vital in all cases,
though as David notes, once you get to dealing with Unicode, some
ability to generate tokens across the full range is also needed).
Joseph