Proposal  #2


Stack Groups on the Falcon.


	The hardware stack structure on the Falcon differs from that
on the LAMBDA.  However, what the programmer wants to see (from the debugger, for
example) remains the same: a "stack machine".   This proposal details how we
reformat the various falcon stacks into stacks which look very much like the LAMBDA.
The basic idea is to do the reformatting always and at the very lowest level,
i.e. as the data is loaded from or stored to the falcon hardware.

Doing it this way has several advantages and disadvantages.

advantages:
     (1)economy of mechanism-- no need for separate "scrolling" routines, for example.
	     No matter what you are trying to do, you save out, do it, and reload.
	     Thus, the only routines that deal directly with the hardware are load and store.
     (2)minimum of code that runs in "embarrassing" circumstances
	     Somewhat a consequence of (1).  When the Falcon call hardware is "disturbed"
	     the code must be VERY careful.  (no ordinary subroutines, may not be able to
	     reference local variables in the usual way, etc.)  Ultimate reliability
	     will be served if this sort of code is minimized.
     (3)avoid proliferation of hardware frame numbers into system datastructures.
	     The HEAP numbers used by the falcon hardware may cause difficulties if
	     they are allowed to "persist" over a long period.  (When you reload they
	     might not be the same; what if you have to scroll; what if there are more than
	     256 frames; etc.)  We avoid these difficulties by discarding them
	     immediately (remapping them, if you will, into positions in a conventional
	     style stack).
     (4)closest to lambda.
	     This will allow our current error handler to operate with minimum modification.
     (5)only two stacks need be allocated (or worried about) in most contexts, as on LAMBDA.
	     (These are the MAIN and SPECIAL stacks.)  Only one EXTRANEOUS stack per entire
	     system is allocated, and it is considered to be in the same logical classification
	     as the hardware stacks.
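
A minimal sketch, in Python, of the save-out/reload idea behind advantages (1) and (3).
It is only an illustration under stated assumptions: the class and function names
(CallHardware, store_out, reload) and the toy data layout are hypothetical and do not
come from the Falcon sources.  The point it shows is that heap frame numbers are thrown
away on store, so a frame is known only by its position in the MAIN stack, and that a
later reload may well hand out different frame numbers.

```python
# Hypothetical toy model; none of these names come from the Falcon sources.

class CallHardware:
    """Stand-in for the hardware frame heap plus the hardware return stack."""

    def __init__(self, n_frames=256):
        self.free = list(range(n_frames))  # unused heap frame numbers
        self.heap = {}                     # heap frame number -> register list
        self.return_stack = []             # heap frame numbers, oldest first

    def push_frame(self, registers):
        """Open a frame on whatever heap frame number happens to be free."""
        number = self.free.pop()
        self.heap[number] = list(registers)
        self.return_stack.append(number)
        return number


def store_out(hw, main_stack):
    """Empty the call hardware into main_stack, oldest frame first.

    The heap frame numbers are discarded here; from now on a frame is
    identified only by its position in main_stack.
    """
    for number in hw.return_stack:
        main_stack.append(hw.heap.pop(number))
        hw.free.append(number)
    hw.return_stack.clear()


def reload(hw, main_stack, max_frames):
    """Move up to max_frames of the newest frames back into the hardware."""
    count = min(max_frames, len(main_stack))
    start = len(main_stack) - count
    newest = main_stack[start:]
    del main_stack[start:]
    for registers in newest:
        hw.push_frame(registers)
```

Any "stack molesting" operation would then be written as store_out, an ordinary
manipulation of main_stack, and reload, so that only these two routines touch the
(modelled) hardware.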

disadvantages:
	some loss of efficiency
	     doesn't make "pooled" use of hardware  (i.e. allowing several processes' frames
		to be in hardware at once.)  Pooling is probably not a good idea anyway for various
		reasons. However, a slight inefficiency will occur.  Process switch time will
		still be quite acceptable (by lisp machine standards) anyway.  (Due to the
		necessity to SWAP the special pdl, worst case latency is in principle unbounded
		anyway.)
	     mechanism of call hardware load-unload itself
		a small overhead may occur due to this mapping, etc, as compared to simply
		storing out the hardware in a fixed place, etc.   This is insignificantly
		small and may not ever exist since the code required is comparable.
	     catch and throw.  Storing things out and reloading to implement "stack molesting"
		operations could introduce some overhead.  The case which we might eventually
		want to fix by a "fast bypass" is short THROWs.  If you are THROWing to a place
		only a few frames up the stack, it will be quite a significant overhead to
		store out the stack group, figure out the THROW, and reload.  This can be sped
		back up by a "fast case" test which looks for the catch frame within the FALCON
		call hardware.  If found, it would do the right thing, etc.  If the catch frame
		is not found, then you store out, etc.  Note that in any case it is necessary for
		the system to be able to handle the case where you are throwing over more frames
		than will fit in the call hardware (no matter how big that might be, unless it were
		capable of being truly arbitrarily large).
	     Copying data in the EXTRANEOUS stack.  Since the EXTRANEOUS stack is considered to
		be on a level with the hardware stack(s), it is emptied and/or reloaded in parallel
		with the actual hardware.  This data might not need to be copied otherwise.  However,
		the overhead should not be significant since if you are using the extraneous stack,
		those frames are not running ultra fast anyway.  (Note that it is not necessary to reload
		the "entire" EXTRANEOUS stack (if that were defined somehow) but only that portion
		that corresponds to the portion loaded in the PHYSICAL call hardware.)
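
The short-THROW "fast case" discussed under catch and throw can be sketched as follows.
This is a hedged illustration in Python, not the proposed implementation: the function
names are hypothetical, the position of the catch tag (register 3) is an assumption made
only for the example, and unwinding of the special pdl is left out entirely.

```python
# Hypothetical sketch; only the marker-in-register-0 convention is from the text.
UNWIND_MARKER = "li:unwind-marker"  # marker found in register 0 of a catch frame
TAG_REGISTER = 3                    # ASSUMPTION: register holding the catch tag


def is_catch_frame(frame, tag):
    """True if frame is a catch frame for tag."""
    return frame[0] == UNWIND_MARKER and frame[TAG_REGISTER] == tag


def throw(tag, hw_frames, main_stack):
    """Unwind to the newest catch frame for tag.

    hw_frames  -- frames currently in the call hardware, oldest first
    main_stack -- frames already stored out, oldest first
    Returns "fast" or "slow" to report which path was taken.
    """
    # Fast case: the catch frame is still in the call hardware.
    for i in range(len(hw_frames) - 1, -1, -1):
        if is_catch_frame(hw_frames[i], tag):
            del hw_frames[i + 1:]          # drop the frames thrown over
            return "fast"

    # General case: store out, find the frame in the MAIN stack, reload.
    # This path also covers throws over more frames than the hardware holds.
    main_stack.extend(hw_frames)
    hw_frames.clear()
    for i in range(len(main_stack) - 1, -1, -1):
        if is_catch_frame(main_stack[i], tag):
            del main_stack[i + 1:]
            hw_frames.append(main_stack.pop())  # reload just the catch frame
            return "slow"
    raise LookupError("no catch frame for tag %r" % (tag,))
```

The fast path never touches main_stack, which is where the saving for short THROWs
would come from; the general path is still required for correctness.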

shortcomings:   (These don't really have to do with the scheme being proposed here, but are due
		to inherent design deficiencies in FLEABIT.  However, we might as well discuss them
		here.)
		Due to the lack of formatting information in fleabit format stacks, it may not be
		possible to put information in the FLEABIT BIND and EXTRANEOUS stacks in their
		proper frames on the "stack group" stack.  Until this is fixed, it will not be
		possible to force a return with the error handler from an arbitrary frame.

		The FLEABIT system does record all the PDL levels in CATCH frames.  The STACK
		dumper routine must detect CATCH frames (easily done with a single compare)
		and make the corresponding MAIN stack frame contain the EXTRANEOUS pdl info
		up to the level in the CATCH frame.
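
One possible reading of the dumper rule above, sketched in Python.  The function name
and the dictionary layout of a MAIN stack frame are hypothetical; the use of register 0
for the marker and register 2 for the extraneous stack pointer follows the background
section below.  Extraneous entries newer than the last CATCH frame cannot be attributed
to a frame under this rule, so the sketch simply hands them back as a leftover.

```python
# Hypothetical sketch of the stack dumper's CATCH-frame handling.
UNWIND_MARKER = "li:unwind-marker"  # marker in register 0 of a catch frame
EXTRANEOUS_LEVEL_REGISTER = 2       # register 2: extraneous stack pointer


def dump_frames(hw_frames, extraneous_pdl):
    """Dump hardware frames (oldest first) into MAIN-stack form.

    Returns (main_frames, leftover).  Each element of main_frames is a dict
    with the frame's registers and the extraneous pdl entries assigned to it.
    """
    main_frames = []
    copied = 0  # extraneous entries already assigned to some frame
    for registers in hw_frames:
        entry = {"registers": list(registers), "extraneous": []}
        if registers[0] == UNWIND_MARKER:  # the single compare
            level = registers[EXTRANEOUS_LEVEL_REGISTER]
            # Everything up to the level recorded in the catch frame,
            # and not yet assigned, goes with this frame.
            entry["extraneous"] = extraneous_pdl[copied:level]
            copied = level
        main_frames.append(entry)
    return main_frames, extraneous_pdl[copied:]
```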


-- background --

	The falcon hardware implements the following stacks: the hardware frame "heap" (and presumably a software extension
	thereof); the EXTRANEOUS PDL (used for overflow from 16 slots and for functions of >= K args;
	K = 16 now, this will have to be reduced to 14 or 15); and the special stack (a usual lisp binding
	stack).  There is also a hardware return stack (which also contains return-destination and open-and-active
	information) which corresponds logically with the frame heap.

	Catch frames in the fleabit, ultra-simplified:
	A catch frame is opened like any other.  However, it can be recognized by
	the presence of a special marker (currently li:unwind-marker) in register 0
	of that frame.   The rest of the frame is available for holding, in specific registers,
	other info necessary to unwind to that level.  For example, register 1
	has the special-pdl pointer, register 2 the extraneous stack pointer, etc.
	Although the details may change, we propose to retain this basic scheme.
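
As an illustration of the catch frame layout just described, here is a small Python
sketch that recognizes a catch frame and pulls out the two fields the text names.  The
names CatchInfo and decode_catch_frame are hypothetical, and only registers 0 through 2
are interpreted, since the text does not specify the others.

```python
from typing import NamedTuple, Optional, Sequence

UNWIND_MARKER = "li:unwind-marker"  # current marker value, per the text


class CatchInfo(NamedTuple):
    """Unwind information held in a catch frame (hypothetical structure)."""
    special_pdl_pointer: int   # register 1
    extraneous_pointer: int    # register 2


def decode_catch_frame(registers: Sequence) -> Optional[CatchInfo]:
    """Return the unwind info if registers form a catch frame, else None."""
    if not registers or registers[0] != UNWIND_MARKER:
        return None
    return CatchInfo(special_pdl_pointer=registers[1],
                     extraneous_pointer=registers[2])
```

For example, a frame whose registers begin with the marker, 40 and 7 would decode to
a special-pdl pointer of 40 and an extraneous stack pointer of 7; any frame without
the marker in register 0 decodes to None.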


	       ==================== Comments ====================

[smh 23sep88]

        "This proposal details how we reformat the various falcon stacks into stacks
        which look very much like the LAMBDA."

   It doesn't, really.  This proposal is mostly about justifications for doing
things this way rather than some other way.  It would be nice if the proposal
listed the _what_ in addition to the _why_, that is:

  - What modules/functions would have to be written?  By whom? (Presumably RG.)
  - Sketch (at least) the data structure definition for a STACK-GROUP.
  - What code would have to be modified to call the new stuff?  What are the
    implications for, say, the trap handler?
  - When should this project be done?  When is the earliest it could start because
    of dependencies on other tasks?  When is the latest it could start because of
    dependencies upon it?
  - How long will this project take to write, debug, and integrate?  While in
    process, might it potentially cause additional flakiness that would affect
    other development?


I don't see why it is ever necessary to copy the extraneous pdl.  The extraneous
pdl is accessed via a general register.  When a stack group is resumed the
extraneous pdl pointer could simply be loaded with the ``current'' pointer saved
in the stack group, with the actual vector of data staying ``in-place'' in the
stack group.  The two problems this leaves to resolve are: (1) detecting and
handling overflows so the extraneous pdl can be grown; and (2) guaranteeing the
gc doesn't move the beast, or if it does, that doing so doesn't break anything.


     mechanism of call hardware load-unload itself
	a small overhead may occur due to this mapping, etc, as compared to simply
	storing out the hardware in a fixed place, etc.   This is insignificantly
	small and may not ever exist since the code required is comparable.

I don't understand this.  It is really unclear what is being compared to what
with regard to efficiency.


     catch and throw.  Storing things out and reloading to implement "stack molesting"
	operations could introduce some overhead.

"Could" should here read "will".  It would be reasonable to claim that the
overhead will be acceptable, and that the more-efficient alternatives would be
very messy to implement, but some evidence to both points should be provided.


In summary, I don't disagree with the conclusion of this proposal, but for
planning purposes it would be better to spell out more of the implementation
details and implications before commencing the project, or even accepting the proposal.

====================