[texhax] Any crazy math formulas for testing a TeX language interpreter

Tue Jan 12 09:47:59 CET 2016

On 12/01/2016 08:01, David Carlisle wrote:
>> -------------------------
>> Joseph Wright wrote -
>>
>>> (Aside: I'd be very keen to know about
>>> primitive coverage beyond TeX90, particularly e-TeX, \pdfstrcmp or
>>> equivalent and Unicode-related primitives, in particular \Uchar and
>>> \Ucharcat. See expl3 for why these are important.)
>>
>>
>> These are not part of e-TeX, so I've not spent any time thinking about
>> them.  String comparison in Unicode is a giant ball of wax, of course.

String comparisons here are pretty simple, actually :-)

Taking the Lua version needed for LuaTeX, the code is simply

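    -- tex.write sends text back to TeX (as catcode-12 "other" characters)
    local tex_write = tex.write

    function strcmp(A, B)
      -- ordinary Lua string comparison: 0 if equal, -1 if A sorts first, else 1
      if A == B then
        tex_write("0")
      elseif A < B then
        tex_write("-1")
      else
        tex_write("1")
      end
    end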
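
For anyone without LuaTeX to hand, here is a rough Python model of what that function computes. It is only a sketch: `lua_style_strcmp` is a made-up name, and it assumes Lua's `<` orders strings byte by byte (Lua's string comparison goes through the C library's strcoll, so that holds in the C locale). Both arguments are encoded as UTF-8, which is what Lua code running in LuaTeX sees, and the result is returned as the text TeX would read back.

```python
def lua_style_strcmp(a: str, b: str) -> str:
    """Model of the Lua strcmp: compare UTF-8 bytes, return "0", "-1" or "1".

    Assumes plain byte-wise ordering (Lua's string < in the C locale).
    """
    ab, bb = a.encode("utf-8"), b.encode("utf-8")
    if ab == bb:
        return "0"
    elif ab < bb:
        return "-1"
    else:
        return "1"


print(lua_style_strcmp("abc", "abd"))    # -1
print(lua_style_strcmp("Zebra", "apple"))  # -1: 'Z' (0x5A) < 'a' (0x61)
print(lua_style_strcmp("é", "z"))        # 1: 'é' is C3 A9, 'z' is 7A
```

Byte order is not dictionary order: upper case sorts before lower case, and any accented letter sorts after the whole ASCII range.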

Lua isn't an all-Unicode system: Lua strings are simply bytes and so the
question is one of byte order. Similarly, the XeTeX implementation (from
http://sourceforge.net/p/xetex/code/ci/master/tree/source/texk/web2c/xetexdir/xetex.web)
is

    procedure compare_strings; {to implement \.{\\strcmp}}
    label done;
    var s1, s2: str_number;
      i1, i2, j1, j2: pool_pointer;
    begin
      call_func(scan_toks(false, true));
      s1:=tokens_to_string(def_ref);
      delete_token_ref(def_ref);
      call_func(scan_toks(false, true));
      s2:=tokens_to_string(def_ref);
      delete_token_ref(def_ref);
      i1:=str_start_macro(s1);
      j1:=str_start_macro(s1 + 1);
      i2:=str_start_macro(s2);
      j2:=str_start_macro(s2 + 1);
      while (i1 < j1) and (i2 < j2) do begin
        if str_pool[i1] < str_pool[i2] then begin
          cur_val:=-1;
          goto done;
        end;
        if str_pool[i1] > str_pool[i2] then begin
          cur_val:=1;
          goto done;
        end;
        incr(i1);
        incr(i2);
      end;
      if (i1 = j1) and (i2 = j2) then
        cur_val:=0
      else if i1 < j1 then
        cur_val:=1
      else
        cur_val:=-1;
    done:
      flush_str(s2);
      flush_str(s1);
      cur_val_level:=int_val;
    end;
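
The loop is an ordinary lexicographic comparison in which a proper prefix sorts first. A Python transcription, as a sketch: the two str_pool slices become plain lists, `utf16_units` and `compare_units` are illustrative names, and it assumes (my reading of xetex.web, worth checking) that XeTeX's string pool holds UTF-16 code units rather than bytes.

```python
def utf16_units(s: str) -> list[int]:
    """Split a string into UTF-16 code units (hypothetical stand-in for a str_pool slice)."""
    data = s.encode("utf-16-be")
    return [int.from_bytes(data[k:k + 2], "big") for k in range(0, len(data), 2)]


def compare_units(pool1: list[int], pool2: list[int]) -> int:
    """Transcription of the compare_strings loop: returns -1, 0 or 1."""
    i1, j1 = 0, len(pool1)  # plays the role of str_start(s1) .. str_start(s1 + 1)
    i2, j2 = 0, len(pool2)
    while i1 < j1 and i2 < j2:
        if pool1[i1] < pool2[i2]:
            return -1
        if pool1[i1] > pool2[i2]:
            return 1
        i1 += 1
        i2 += 1
    # Equal so far: either both strings ended, or the longer one is larger
    if i1 == j1 and i2 == j2:
        return 0
    elif i1 < j1:
        return 1
    else:
        return -1


print(compare_units(utf16_units("abc"), utf16_units("abd")))  # -1
print(compare_units(utf16_units("abc"), utf16_units("ab")))   # 1
```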

(I presume this is not dissimilar to the pdfTeX one, but that of course
doesn't have to worry about anything other than single bytes.)
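
Byte order and 16-bit code-unit order do part company once characters outside the BMP turn up. Under the same assumption that XeTeX compares UTF-16 code units, take U+FF61 and U+1F600: by code point and by UTF-8 bytes the first sorts lower, but its single code unit 0xFF61 is larger than the leading surrogate 0xD83D of the second, so a code-unit comparison puts it higher. A small sketch (all function names are illustrative):

```python
def cmp3(x, y) -> int:
    """Three-way comparison returning -1, 0 or 1, like the strcmp results above."""
    return (x > y) - (x < y)


def code_point_order(a: str, b: str) -> int:
    return cmp3([ord(c) for c in a], [ord(c) for c in b])


def utf8_order(a: str, b: str) -> int:
    # Byte-wise, as in the Lua version
    return cmp3(a.encode("utf-8"), b.encode("utf-8"))


def utf16_order(a: str, b: str) -> int:
    # Big-endian bytes compare exactly like the 16-bit code units they encode
    return cmp3(a.encode("utf-16-be"), b.encode("utf-16-be"))


halfwidth_stop = "\uFF61"     # U+FF61, a single UTF-16 code unit
grinning_face = "\U0001F600"  # U+1F600, surrogate pair D83D DE00 in UTF-16

print(code_point_order(halfwidth_stop, grinning_face))  # -1
print(utf8_order(halfwidth_stop, grinning_face))        # -1
print(utf16_order(halfwidth_stop, grinning_face))       # 1
```

UTF-8 byte order always agrees with code-point order, which is one reason byte-wise comparison of UTF-8 is a reasonable default.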

In any case, some form of expandable comparison that ignores catcodes is
very useful, and it is essential for expl3: we found some years ago
that this was the one post-e-TeX primitive that was vital in all cases,
though, as David notes, once you have to deal with Unicode some
ability to generate tokens across the full range is also needed.
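
What "ignores catcodes" means in practice: both arguments are flattened to character strings before comparing, so a letter `a` and an other `a` give the same result. A minimal sketch with a made-up token model (`Token`, `tokens_to_string` and `strcmp` here are hypothetical). The string conversion is simplified: no \escapechar handling, no doubling of #, and "letter" is judged by Python's isalpha rather than by catcode. The real primitives also expand their arguments first, which the sketch skips.

```python
from typing import List, Optional, Tuple

# Hypothetical token model: (text, catcode); catcode None marks a control sequence.
Token = Tuple[str, Optional[int]]


def tokens_to_string(tokens: List[Token]) -> str:
    """Flatten tokens to characters, dropping catcodes (simplified detokenize-style output)."""
    out = []
    for text, catcode in tokens:
        if catcode is None:
            out.append("\\" + text)
            if text.isalpha():
                out.append(" ")  # control words are shown with a trailing space
        else:
            out.append(text)  # the catcode is simply discarded
    return "".join(out)


def strcmp(a: List[Token], b: List[Token]) -> int:
    """Compare the flattened strings byte-wise as UTF-8: -1, 0 or 1."""
    sa = tokens_to_string(a).encode("utf-8")
    sb = tokens_to_string(b).encode("utf-8")
    return (sa > sb) - (sa < sb)


letters = [("a", 11), ("b", 11)]  # "ab" typed as letters
others = [("a", 12), ("b", 12)]   # "ab" with catcode 12, as from \detokenize
print(strcmp(letters, others))    # 0
```

A control sequence still differs from its name typed as characters: `\a` flattens to a backslash, `a` and a space, not to `a`.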

Joseph