[texhax] Some puzzling TeX

Mon Feb 21 22:05:21 CET 2011

"Stephen Hicks" <sdh33 at cornell.edu> wrote 21.02.2011 02:08:10:
>On Sun, Feb 20, 2011 at 8:13 AM, Uwe Lueck wrote:
>> "Philip Taylor (Webmaster, Ret'd)" wrote 20.02.2011 01:40:24:
>>>Uwe Lueck wrote:
>>>> "Stephen Hicks" wrote 17.02.2011 01:50:44:
>>>>> catcode 16
>>>> What's that?
>>> See page 209.
>> What evidence is there besides this one of Knuth's notoriously unreliable ("incredible") claims?
>>
>> I am unable to get a catcode of \relax or \bgroup by \showthe\catcode.
>> Only with \ifcat, I can see that \relax and \bgroup are different,
>> while, e.g., after \let\BGROUP{\let\EGROUP}, \bgroup and \BGROUP
>> are the same according to \ifcat.
>>
>> My conclusion at the moment is that one might better say that
>> with \ifcat, control sequences behave "as if they had catcode 16 or ..."
>> Especially, it seems to me that instead of "catcode 16" one could
>> as well speak of "catcode -1" or anything else that is not among
>> 0, ..., 15.
>>
>> I am almost a Pascal and truly a C illiterate,
>> so I can't read the code of TeX, which might allow a more specific statement.
>
> You've provoked me to investigate.  While indeed the TeXbook claims
> that TeX treats control sequences as catcode 16 and character code
> 256, the source code suggests otherwise.  Here's a trace through what
> happens on \ifcat\noexpand\controlseq:
>
> [Test if two characters match §506] calls get_x_token_or_active_char
> twice, comparing the (possibly-modified) value of cur_cmd.  This leads
> to get_x_token (§380) which calls get_next (§341).  Here we'll assume
> the tokens are coming from a token list rather than an input stream,
> so we end up in [Input from token list ... §357], which sets cur_cmd =
> no_expand (= 103).  Back in get_x_token (§380), since max_command (=
> 100) < no_expand < call (= 111), we call expand (§366).  Since cur_cmd
> = no_expand < call, we [Expand a nonmacro §367] and ultimately
> [Suppress expansion of the next token §369], which calls get_token
> (§365), which calls get_next a second time, returning cur_cmd := call
> = 111 (assuming that \controlseq is neither \outer nor \long) and
> points cur_cs to the macro's definition.  §369 then backs up the
> packed cur_tok into t and calls back_input (§325) to "unread"
> \controlseq.  Since cur_tok was a control sequence, we insert a
> permanent \notexpanded into the front of the input stream.  We now
> return from expand (§366) back to get_x_token (§380) and goto restart.
> The second time through, get_next (§341) sees the \notexpanded, which
> sets cur_cmd to dont_expand (§210,258) and jumps to [Get the next
> token, suppressing expansion §358], which sets cur_cmd := relax (= 0,
> §207) and cur_chr := no_expand_flag because \controlseq, which was
> saved in cur_cs, had a cur_cmd > max_command.  Back in get_x_token
> (§380) we see cur_cmd = 0 < max_command so we're done.  Back to
> get_x_token_or_active_char (§506) - we now find cur_cmd = relax and
> cur_chr = no_expand_flag, so we set cur_cmd := active_char (= 13 (!))
> and cur_chr := cur_tok - cs_token_flag - active_base.  This seems
> strange, but the prose at the top of the section tips us off to the
> fact that "active characters have the smallest tokens" and therefore
> the check "if (cur_cmd > active_char) or (cur_chr > 255)" does indeed
> fire because of the second clause.  Thus, we (effectively) set cur_cmd
> = relax = 0, which is compared to the catcode of the next token.
>
> Whew...  so, it looks like this myth of "control sequences have
> catcode 16" can finally be put to rest.  The catcode of a control
> sequence is, in fact, 0.  But since there's no way to ever get a naked
> escape token (catcode 0) into the input stream, the fact that it has
> the same catcode is irrelevant.

Thanks a lot, great work!
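
The outcome of the trace can be checked from the macro side with
a small plain TeX sketch. It is only an illustration: \controlseq
is a placeholder macro, not anything from the original example,
and the file is assumed to be run with plain `tex`.

```tex
% Plain TeX sketch; \controlseq is a placeholder macro.
\def\controlseq{abc}
% An unexpanded macro compares equal to \relax under \ifcat ...
\message{\ifcat\noexpand\controlseq\relax same\else different\fi}
% ... but not to \bgroup, which is \let to a catcode-1 character.
\message{\ifcat\noexpand\controlseq\bgroup same\else different\fi}
\bye
```

If the trace is right, the first \message should report "same"
and the second "different", which also matches what I observed
for \relax versus \bgroup.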

So I am not the only one who sometimes spends much time 
writing the documentation for other people's code 
(TeX macros in my case).

Actually, I am pondering writing a TUGboat paper about 
something I don't know: when TeX expands a macro with 
parameters ... i.e., some *occurrence* of the macro with 
some ensuing tokens "at a certain stage" ... thinking of an 
"input buffer" for tokens that may be emptied from the left 
maybe by the "execution processor" and fed from the 
right by the tokenizer ... but with an \edef or a \write: 
is the expansion processor then still fed from the 
"input buffer," or in a very different way?

Cheers, 

    Uwe.