I encountered Henry G. Baker's COMFY compiler a couple of years ago. (Baker's site contains a text article, a TeX format article, and an Emacs Lisp implementation. The ACM published the two articles, COMFY theory and COMFY-65.) COMFY is not a very high-level language in the conventional sense, but an attempt to provide a clean, simple set of control structures on top of conventional machine code. Baker calls it 'medium-level.'
The resulting compiler is very small; its main task in life is to automate the generation of branch instructions, which is one of the tedious parts of tightly optimizing assembly programs. Yet it also allows arbitrary Lisp-style macros.
I found the concept intriguing, but I found the article and the compiler code itself rather obscure. Part of the obscurity comes from the unconventional names for the 6502 operations, the unconventional notation for the addressing modes, and the decimal numbers for opcodes (because Emacs Lisp does not accept other radixes). Another part is that the implementation is lean-and-mean, emitting code bytes directly into a destination vector, so the only output to look at is a vector filled with decimal numbers. The manipulations on the 6502 opcodes take advantage of the low-level bit patterns without explanation. Finally, it uses the term 'continuation', which, no matter how many times I think I understand it, scares me, probably because it introduces a highly abstract term into an area like assembly programming, which is relentlessly concrete.
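To make the bit-pattern point concrete, here is a minimal Python sketch of how one family of 6502 opcodes decomposes into fields. It covers only the eight accumulator-style instructions whose low two bits are 01, using the standard mnemonics rather than Baker's names. The function name `opcode` and the mode spellings are my own, not anything from the COMFY source.

```python
# Sketch only: the 6502 "cc = 01" opcode group, laid out as aaabbbcc.
# The names below are hypothetical, not taken from Baker's compiler.

# aaa: which operation
OPS = {"ORA": 0, "AND": 1, "EOR": 2, "ADC": 3,
       "STA": 4, "LDA": 5, "CMP": 6, "SBC": 7}

# bbb: which addressing mode
MODES = {"(zp,x)": 0b000, "zp": 0b001, "#imm": 0b010, "abs": 0b011,
         "(zp),y": 0b100, "zp,x": 0b101, "abs,y": 0b110, "abs,x": 0b111}


def opcode(op, mode):
    """Build the opcode byte from its operation and addressing-mode fields."""
    if op == "STA" and mode == "#imm":
        # Storing to an immediate makes no sense; the slot is not a valid STA.
        raise ValueError("STA has no immediate form")
    return (OPS[op] << 5) | (MODES[mode] << 2) | 0b01


lda_imm = opcode("LDA", "#imm")   # 0xA9
sta_abs = opcode("STA", "abs")    # 0x8D
```

Seen this way, switching an instruction from one addressing mode to another is a matter of changing three bits, which is presumably why the compiler can get away with arithmetic on the opcode numbers.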
I've been learning about it by going through the code, restructuring a bit as I go. I'm beginning to be impressed by the subtlety of the code. One thing that is just dawning on me is that the return values of the code-emitting functions are as important as their side effects. (To those of you chuckling, you see how far I have yet to go.) Lisp can hide that from you when it "looks imperative."
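A small Python sketch of what I mean by the return value mattering. It assumes a destination vector that is filled from the top down, and an emitting function that both stores a byte (the side effect) and returns the address it stored it at (the value). The class `Emitter` and the method `emit_byte` are hypothetical names for illustration, not the functions in Baker's code.

```python
# Sketch only: an emitter that fills memory from high addresses downward.
# Emitter and emit_byte are hypothetical names, not Baker's.

class Emitter:
    def __init__(self, size):
        self.mem = [0] * size
        self.free = size          # the next byte is stored at free - 1

    def emit_byte(self, byte):
        """Store one byte below everything emitted so far; return its address."""
        self.free -= 1
        self.mem[self.free] = byte & 0xFF
        return self.free


e = Emitter(8)
end = e.emit_byte(0x60)     # RTS goes in last, so it is emitted first
e.emit_byte(0x07)           # operand is emitted before its opcode
start = e.emit_byte(0xA9)   # LDA #7; the returned address is the entry point
```

Because code is laid down backwards, the address returned by the most recent call is always the entry point of everything emitted so far. That address is exactly what the next piece of code needs in order to branch or fall into it.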
My longer-term goal is to see if the same technique can be fruitful even within the more restrictive limits of the PIC. In order to get there, I'm going to have to gain confidence that I understand the formal concept of the 'win' and 'lose' continuations, which means, I think, coming up with legible examples that compile to 6502 code and using them to firm up my understanding of the compiler. Finally, I'll try to code the 'genbrc' routine for PIC branches.
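Here is my current mental model of 'win' and 'lose', written as a toy Python sketch rather than as real 6502 output. It is not Baker's code. The assumptions are mine: every instruction occupies one slot, branch targets are absolute slot numbers, a continuation is simply the address of code that has already been emitted, and all names (`Sketch`, `branch`, the tuple forms) are hypothetical.

```python
# Toy model of compiling with 'win' and 'lose' continuations.
# Assumptions (mine): one slot per instruction, absolute branch targets,
# code emitted downward. All names here are hypothetical.

INVERSE = {"eq": "ne", "ne": "eq", "cs": "cc", "cc": "cs"}


class Sketch:
    def __init__(self, top=100):
        self.code = {}            # address -> instruction tuple
        self.free = top           # next instruction goes at free - 1

    def emit(self, *instr):
        self.free -= 1
        self.code[self.free] = instr
        return self.free

    def jump_unless_next(self, target):
        """Emit a jump only if target is not the code we fall into anyway."""
        if target != self.free:
            self.emit("jmp", target)
        return self.free

    def branch(self, cond, win, lose):
        """Go to win if cond holds, else to lose, with as few branches as possible."""
        if win == lose:
            return self.jump_unless_next(win)
        if lose == self.free:                 # lose is the fall-through
            return self.emit("b" + cond, win)
        if win == self.free:                  # win is the fall-through
            return self.emit("b" + INVERSE[cond], lose)
        self.emit("jmp", lose)
        return self.emit("b" + cond, win)

    def compile(self, e, win, lose):
        if isinstance(e, str):                # plain instruction: always wins
            self.jump_unless_next(win)
            return self.emit(e)
        head, *args = e
        if head == "test":
            return self.branch(args[0], win, lose)
        if head == "not":                     # swap the continuations
            return self.compile(args[0], lose, win)
        if head == "seq":
            if not args:
                return win
            rest = self.compile(("seq", *args[1:]), win, lose)
            return self.compile(args[0], rest, lose)
        if head == "if":
            test, then, other = args
            else_addr = self.compile(other, win, lose)
            then_addr = self.compile(then, win, lose)
            return self.compile(test, then_addr, else_addr)
        raise ValueError(f"unknown form: {head!r}")


s = Sketch(top=100)
done = s.emit("rts")
start = s.compile(("if", ("test", "eq"), "inx", "dex"), done, done)
for addr in sorted(s.code):
    print(addr, *s.code[addr])
```

Under these assumptions the listing comes out as `bne 98`, `inx`, `jmp 99`, `dex`, `rts` at addresses 95 through 99. The compiler picked the inverted branch on its own, because the 'win' code happened to be the fall-through. That choice is the bookkeeping I would otherwise do by hand.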
For the "legible examples" part, I think I will try to break the lean-and-mean single-pass direct-to-binary compiler into a "compile to a vector or list with symbolic 6502 mnemonics" pass followed by a very simple 6502 assembly pass. A similar enhancement might accumulate a relocation table to allow more flexible linking. (I'm still not used to the idea that the code is emitted starting in high memory and working down.) I've begun abstracting out the addressing mode bit manipulations, and probably will add error-checking to make sure invalid opcodes are not generated by mistake.
One thing that I think will have to go in the PIC version is the shallow binding mentioned in the TeX column, but not in the text version; without a stack, I'm not sure where to save anything. I'm also just a bit worried that the PIC has so many idiosyncrasies (e.g., register pages controlled by special flags, instead of a uniform zero-page) that even a medium-level language doesn't help much.
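As a rough idea of what the second, "very simple" pass could look like, here is a Python sketch that turns a list of symbolic instructions into bytes by table lookup. The tuple layout, the mode names, and the function `assemble` are assumptions of mine for illustration. The table holds only a handful of standard 6502 opcodes, and a missing entry is reported as an error instead of producing a bad byte.

```python
# Sketch only: a table-driven assembly pass over symbolic instructions.
# The tuple format and names are hypothetical; the table is deliberately tiny.

# (mnemonic, mode) -> (opcode byte, operand size in bytes)
OPCODES = {
    ("LDA", "imm"): (0xA9, 1),
    ("ADC", "imm"): (0x69, 1),
    ("STA", "zp"):  (0x85, 1),
    ("STA", "abs"): (0x8D, 2),
    ("CLC", None):  (0x18, 0),
    ("INX", None):  (0xE8, 0),
    ("RTS", None):  (0x60, 0),
}


def assemble(program):
    """Translate (mnemonic, mode, operand) tuples into 6502 machine code."""
    out = bytearray()
    for mnemonic, mode, operand in program:
        if (mnemonic, mode) not in OPCODES:
            raise ValueError(f"no such instruction: {mnemonic} {mode}")
        byte, size = OPCODES[(mnemonic, mode)]
        out.append(byte)
        if size:
            # The 6502 stores multi-byte operands low byte first.
            out += operand.to_bytes(size, "little")
    return bytes(out)


program = [
    ("CLC", None, None),
    ("LDA", "imm", 2),
    ("ADC", "imm", 3),
    ("STA", "abs", 0x0200),
    ("RTS", None, None),
]
machine_code = assemble(program)
print(machine_code.hex())   # 18a90269038d000260
```

The symbolic list is the part a human can read and check. The lookup table is also the natural place for the error-checking, since an invalid combination simply has no entry.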
(I should mention in passing Frode Vatvedt Fjeld's nifty little Lisp code to generate PIC instructions from the bit chart.)
[UPDATE: I should also mention a COMFY-based assembler for x86, implemented in Scheme, called `Sassy', although I have never tried to use it.]