Current Status of my SBCL Repository


Merge Status at the End of 2012
Features Merged before SBCL-1.1.3
Probably To Be Merged Later
Probably Never To Be Merged
For Happy Users of my Fork
I have to Apologize...
Current Changes
Thread-Local Data Storage
Foreign Thread Callbacks
Stdcall Callbacks
Memory-Mapped Core on Win32
Foreign Symbol References in Cross-Compiled LISP-OBJS
Splitting GENESIS-2: Compile-Lisp-OBJS, Collect-Refs, Relink-Runtime
The Alternative: Dynamic Resolving of Runtime Symbols
SB-DYNAMIC-CORE Meets LINKAGE-TABLE
Linkage-table on Win32
Surviving Data Execution Prevention
Special Handling of Console FDs in read() and write() equivalents
Backtrace with Foreign Function Names
Most Questionable Hack: Lisp/C FPU Context Switch
GC and GetWriteWatch
32 KiB BACKEND_PAGE_BYTES
Thread Space: Lazy Commit
FD-STREAM Buffers: 64 KiB on Win32
Win32-specific Futex Support
:UCS-2 External Format For Console I/O
GC Internals: Work in Progress

Merge Status at the End of 2012

Since spring 2012 I have been unable to work on SBCL (I managed to make occasional merges, but that's all). Hence no new code, no new ideas... but it turned out to be, in a sense, beneficial: while there was no new development, David Lichteblau started to integrate the entire thing into SBCL mainline!

It was a rather non-trivial piece of work, not at all mechanical. Many times he had to refactor my code, moving cross-platform things out of platform-specific files, throwing away garbage, cleaning up the rest. Now I am positively amazed at how much code was accepted. David was very careful not to throw away anything useful.

Remaining differences may be categorized the following way:

Features Merged before SBCL-1.1.3

Probably To Be Merged Later

Probably Never To Be Merged

For Happy Users of my Fork

As you can see, Windows support in upstream SBCL has made great progress. That's especially true of the upcoming SBCL-1.1.3 release. Don't forget to give it a try when it's available. Please let me know if any differences (upstream vs. fork) affect your use cases, for better or worse, especially if the differences aren't documented here yet.

If you test for some *FEATURES* of my fork, please get ready for some changes (I won't maintain a separate code base for such a minor reason):

I have to Apologize...

Two months ago I intended to restore cross-platform SBCL buildability as soon as possible, to stash away non-critical patches (those not directly related to win32 threading), and to provide a version ready for integration. Not that I have forsaken those plans (I hope to come back to these tasks in January), but I've postponed them significantly.

The explanation is trivial: some ideas that occurred to me while I worked with SBCL code seemed so interesting that I felt a great urge to try them out.

There is one good thing about it, however: integration with SBCL upstream didn't become problematic during this time. Some code in my branch is now much cleaner than it used to be, so reintegration may be even easier now.

This text is written to describe present changes in the ``bleeding edge'' branch of my repository (tls63, which is now the default branch), relative to upstream SBCL and to Dmitry Kalyanov's code. For now I'm not yet trying to decide what I propose for integration, or to predict which patches will be useful in the long run.

Only the changes done on purpose are described here; bugs and other unintended things don't belong to this document.

Current Changes

Thread-Local Data Storage

Tls63 branch started with the idea to keep thread-specific data not in the arbitrary data slot in NT TIB, but in the `legally owned' place allocated by TlsAlloc().

Slot 63 is the highest TLS slot unconditionally available in TIB (slots above are allocated on demand when TlsSetValue is called). The offset of slot 63 in TIB is fixed (of course, modulo the `unofficial but stable' status of NT TIB layout itself). Machine code accessing slot 63 is as simple as code accessing the arbitrary data slot.

TLS slots are allocated from lower to higher ones. SBCL runtime has several DLL dependencies; DllMain() entry points of those DLLs may allocate some TLS slots before SBCL's main() starts.

Current SBCL runtime in tls63 branch relies on slot 63 being free when runtime's main() is called. It's almost guaranteed to be this way with our current DLL dependencies (among which kernel32 and ws2_32 have special areas in TIB for their use, so they don't use TLS slots at all). If slot 63 happens to be busy, SBCL runtime will refuse to start.

If slot 63 is free, SBCL runtime allocates it (there is no official way to allocate TLS slot with known index; hence SBCL allocates all free slots up to 63, then frees lower slots).
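The allocate-then-release trick described above can be sketched as follows. This is a minimal Python model, not the runtime's C code: `ToyTlsAllocator` is a hypothetical stand-in for TlsAlloc()/TlsFree() that, as the text says, hands out slots from lower to higher ones, and `claim_slot` is an illustrative name.

```python
class ToyTlsAllocator:
    """Hypothetical model of TlsAlloc/TlsFree: always returns the lowest free index."""

    def __init__(self, busy=()):
        self.busy = set(busy)

    def alloc(self):
        index = 0
        while index in self.busy:
            index += 1
        self.busy.add(index)
        return index

    def free(self, index):
        self.busy.discard(index)


def claim_slot(allocator, wanted=63):
    """Try to end up owning exactly slot `wanted`; return True on success."""
    lower_slots = []
    while True:
        index = allocator.alloc()
        if index >= wanted:
            break                    # reached (or overshot) the wanted slot
        lower_slots.append(index)    # temporary allocation, released below
    for slot in lower_slots:
        allocator.free(slot)
    if index != wanted:
        # The wanted slot was already busy: give back what we got and refuse.
        allocator.free(index)
        return False
    return True
```

In this model a failed claim leaves the allocator exactly as it was, which corresponds to the runtime refusing to start rather than running with a slot it does not own.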

Summary. Instead of the TIB arbitrary data slot, SBCL runtime in tls63 branch uses the resource with a known ownership protocol; while conflicts are theoretically possible, undetected conflicts aren't.

Foreign Thread Callbacks

What happens when SBCL runtime function is called in a thread that was not created by SBCL (and isn't the initial thread "adopted" by it)?

Real (upstream) SBCL does something smart in the special case of signal handlers: the signal is automatically delivered to one of SBCL's native threads (it has nothing to do with win32: no signals there).

As for FFI callback functions, defined using SB-ALIEN::DEFINE-ALIEN-CALLBACK or SB-ALIEN::ALIEN-LAMBDA, calling them in foreign threads is guaranteed to fail in real SBCL.

Tls63 branch on win32 is different: it provides support for calling alien callbacks in foreign threads. Unfortunately, porting the solution to other platforms is non-trivial. My (mis)design decisions are only part of the cause: there are some objective reasons why the problem should be solved differently on win32 and on other platforms.

Why bother? Well, win32 is special here — as usual. Even some trivial system programming tasks, like writing "services" (daemons), use foreign thread callbacks on this platform.

From the user's point of view, foreign thread callback support doesn't look complicated: when a foreign thread calls Lisp callback for the first time, it becomes known to SBCL; after that, it's visible in (SB-THREAD:LIST-ALL-THREADS) results (its automatically set THREAD-NAME reflects the fact that it's an autodetected foreign thread, and includes system thread identifier). JOIN-THREAD may wait for foreign thread exit, and THREAD-ALIVE-P may be used to test if a foreign thread already exited.

Internals of the foreign thread callback support are more complicated: the basis for the win32 implementation is MS Windows system support for userspace-scheduled thread-like things called fibers. I've modified the pthreads_win32 module so that it supports fibers as well as threads, assigning a distinct pthread_self() identity to both kinds of objects. When a foreign thread callback is detected, two things happen:

Further discussion of foreign thread callback implementation internals is outside the scope of this document.

Stdcall Callbacks

MS Windows interfaces use callbacks frequently, to the extent of looking bizarre to an experienced Unix programmer. That's why foreign thread callbacks, described in the section above, are important.

There is another sad story about callbacks, totally unrelated to threads. Most WinAPI functions on X86 use a calling convention named stdcall (arguments are passed on the stack, and the called function removes them before returning). The usual default convention of X86 C compilers is called cdecl (arguments are passed on the stack, too, but the called function doesn't remove them).

SBCL foreign function interface for X86 is designed to call external functions without requiring convention specification. SBCL will just do the right thing when the foreign function is either stdcall or cdecl. The method used to achieve it is simple and elegant: SBCL grants to all foreign functions an unquestionable right to clobber %ESP register. Code generated by SBCL for foreign C function calls normally saves %ESP before the call and restores it afterwards.
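Why saving and restoring %ESP makes the callout convention-agnostic can be shown with a toy model of the stack pointer. This is only an arithmetic sketch under simplifying assumptions (32-bit words, every argument one word wide); `foreign_call` is an illustrative name, not an SBCL function.

```python
WORD = 4  # bytes per stack slot on 32-bit X86


def foreign_call(esp, arg_count, convention):
    """Return (esp right after the callee returns, esp after the caller restores it)."""
    saved_esp = esp                  # caller remembers the stack pointer
    esp -= WORD * arg_count          # caller pushes the arguments
    esp -= WORD                      # CALL pushes the return address
    esp += WORD                      # RET pops the return address
    if convention == "stdcall":
        esp += WORD * arg_count      # callee also removes the arguments
    elif convention != "cdecl":
        raise ValueError(f"unknown convention: {convention}")
    after_return = esp
    esp = saved_esp                  # caller restores the saved pointer
    return after_return, esp
```

The first element of the result differs between the two conventions, while the second is always the original value. A callback is in the opposite position: it is the callee, so it has to know which of the two behaviours its caller expects.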

Unfortunately, there is still a problem with callback calling convention, and it cannot be solved without some way to specify a convention (it's impossible to determine if our caller expects us to clean up the stack).

In SBCL upstream, all alien callbacks are cdecl. Consequently, on Windows we have a problem: not only are WinAPI functions defined to be stdcall, but almost all callbacks that they invoke are required to be stdcall as well.

I decided to provide a way of specifying stdcall convention for alien callbacks.

It's interesting that the same problem was solved once by Alastair Bridgewater. He added calling convention to the callback specification syntax; two callbacks that differed only in calling convention had exactly the same alien function type.

I decided to go a bit further: in tls63 code, alien function type contains calling convention specification. A NIL convention is supposed to mean ``universal'' for callouts and ``cdecl'' for callbacks.

:STDCALL and :CDECL keywords may be used to specify callback convention explicitly; currently, they affect X86 code generation, but are silently ignored on other platforms.

The greatest problem I faced when working on this thing was mostly an aesthetic one: how to add convention spec to alien function type syntax without breaking compatibility? So far, my chosen syntax looks like this:

As you see above, calling convention is specified by replacing result type with a list of convention keyword and result type. This syntax is deceptive, in a sense: it looks as if the convention belonged to the result type, but it doesn't. There is no such alien type as (:STDCALL INT).

At least, my chosen syntax is not ambiguous (it would become ambiguous if (:STDCALL INT) were parsable into an alien type spec; but defining alien type parsers is outside public SBCL interface).

One thing that may change in the future: alien function types with the same result type and the same argument types, but different conventions, are now disjoint.

Memory-Mapped Core on Win32

Every day when I was reading SBCL sources, I pondered upon the os_map() comment in win32-os.c: we copy core file data instead of mapping because "Windows semantics completely screws this up". As it turned out after some experiments, the word completely is too harsh; I've managed to implement copy-on-write mapping of core files, and it was not the hardest of all things mentioned in this document; e.g. foreign thread callbacks were certainly harder.

MS Windows provides CreateFileMapping and MapViewOfFile functions; together, they resemble our familiar mmap(). Here is the list of problems that had to be solved for core file mapping:

I decided to simplify the implementation by restricting memory-mapping to dynamic space; other spaces are always copied. Their size is unnoticeable when compared to dynamic space size, so why bother?

User-visible results of this change:

Foreign Symbol References in Cross-Compiled LISP-OBJS

When I worked on some IO-related stuff, requiring me to modify src/code/win32.lisp frequently, one thing was annoying me, build after build: when I added foreign function reference to Lisp source, and this function wasn't used in runtime C code, SBCL failed to build.

The current method of resolving foreign references used in SBCL upstream requires any foreign symbol mentioned in cross-compiled Lisp files to be referenced by SBCL runtime too. Details are different for different platforms: Unix has ldso-stubs.S (and the tool to regenerate it), Win32 has win32-os.c:scratch() function.

After some days of that scratch()-induced torture, I decided to reshape SBCL's build process in a way that would remove any manual maintenance requirement for foreign symbol references. Of course, the first place where I looked for some guidance was the aforementioned tool for generating ldso-stubs.S. It's hard to describe my disappointment when I found the same hand-written foreign function lists in tools-for-build/ldso-stubs.lisp.

Splitting GENESIS-2: Compile-Lisp-OBJS, Collect-Refs, Relink-Runtime

My first method of avoiding win32-os.c:scratch() maintenance was rather simple: do genesis-2 in two passes. The first pass (called genesis-1a in my code) cross-compiles Lisp sources into obj/from-xc; then it cold-loads resulting objects, with fixup-resolving FOP redefined to collect :foreign fixups.

After the first pass, foreign symbol names (with FOREIGN_SYMBOL_REFERENCE() around) are put into src/runtime/gen1a-undefs. Runtime executable is linked after this file is ready; one of its source files (win32-stubs.S) includes gen1a-undefs, after defining FOREIGN_SYMBOL_REFERENCE preprocessor macro, so it expands into some stuff like ``.long ForeignFunctionName'' (assembler is preferred to C here because the latter requires some heavy GCCisms to refer to a name like CloseHandle@4).

Now runtime is ready; its symbol table is in sbcl.nm. The second half of genesis-2 (retaining the name genesis-2 in my branch) reads sbcl.nm symbol table, then cold-loads every lisp-obj again. This time, thanks to gen1a-undefs, every foreign fixup is resolved, and genesis-2 generates output/cold-sbcl.core. From this point on, build process continues in an unmodified way, like in upstream SBCL.
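The hand-off between the first pass and the runtime link can be modelled in a few lines. This is a sketch under stated assumptions: the exact layout of gen1a-undefs is not given here, so one FOREIGN_SYMBOL_REFERENCE(...) per line, sorting, and de-duplication are my guesses, and both function names are hypothetical.

```python
def render_undefs(foreign_fixups):
    """Return the text of a gen1a-undefs-like file: one macro call per symbol.

    Assumption: duplicates are dropped and names are sorted for stable output.
    """
    names = sorted(set(foreign_fixups))
    return "".join(f"FOREIGN_SYMBOL_REFERENCE({name})\n" for name in names)


def expand_for_stubs(undefs_text):
    """Mimic a macro definition that turns each reference into `.long name`."""
    prefix, suffix = "FOREIGN_SYMBOL_REFERENCE(", ")"
    lines = []
    for line in undefs_text.splitlines():
        if line.startswith(prefix) and line.endswith(suffix):
            lines.append(".long " + line[len(prefix):-len(suffix)])
    return lines
```

The point of the design is that the list of names is produced by the build itself, so adding a foreign function to a Lisp source file needs no matching manual edit on the runtime side.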

As for Unix platforms, reusing genesis-1a output described above is as easy as redefining FOREIGN_SYMBOL_REFERENCE to LDSO_STUBIFY. The only doubt I have on this approach is this:

The Alternative: Dynamic Resolving of Runtime Symbols

The method described above continues to work. The second method described below may provide some important goodies to someone who is developing SBCL runtime, continuously changing and recompiling it (that's why I prefer the second method for my own builds).

As long as foreign symbols in Lisp-OBJS are resolved using the information from src/runtime/sbcl.nm, modified runtime rebuild always requires core regeneration. Imported static symbol addresses cannot remain the same in the new core except by coincidence. What if we use some indirection method for linking against runtime — something like linkage-table used for shared objects? It turned out to be possible; win32 builds from tls63 branch with :sb-dynamic-core enabled work exactly this way.

With :sb-dynamic-core, genesis-2 doesn't use any information about real runtime symbols when cold-sbcl.core is created. Each foreign fixup in Lisp-OBJS is ``resolved'' to an address from the special memory area. Names of foreign symbols ``resolved'' this way are collected into a list; this list ends up in cold-sbcl.core as a SYMBOL-VALUE of SB-VM::*REQUIRED-RUNTIME-C-SYMBOLS*.

For :sb-dynamic-core builds, SB-VM::*REQUIRED-RUNTIME-C-SYMBOLS* is a newly-introduced static symbol; its (constant) address is available to runtime among other GENESIS data.

Here is the most unusual part of the story: real name resolving is done by the runtime itself. Upon startup, the runtime iterates over (SYMBOL-VALUE SB-VM::*REQUIRED-RUNTIME-C-SYMBOLS*); each name in this list is resolved, and the result is stored (in a manner resembling linkage-table implementation) in the special memory area mentioned above. The order of symbols in the list is the same as the order of address allocation in the special memory area; that's why each symbol address is stored in the place where the core expects to find exactly this symbol.
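The ordering contract between GENESIS-2 and the runtime can be illustrated with a small model. Everything concrete in it is an assumption made for the example: `SLOT_BASE`, `SLOT_SIZE`, the function names, and the `resolve` callback standing in for whatever lookup the runtime performs.

```python
SLOT_BASE = 0x2000_0000   # hypothetical start of the special memory area
SLOT_SIZE = 8             # hypothetical number of bytes per entry


def slot_address(index):
    return SLOT_BASE + index * SLOT_SIZE


def genesis_assign(required_symbols):
    """Build time: map each (unique) name to the slot the core will read from."""
    return {name: slot_address(i) for i, name in enumerate(required_symbols)}


def runtime_preseed(required_symbols, resolve):
    """Startup: resolve each name in list order and store it in the matching slot."""
    return {slot_address(i): resolve(name)
            for i, name in enumerate(required_symbols)}
```

Because both sides derive the slot from the position in the same list, the core never needs to know a real runtime address; only the list order has to agree.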

How is a foreign symbol name resolved by the runtime? Well, it's a banality: runtime calls GetProcAddress(). Oh, some missing details:

The detail omitted earlier for the sake of simplicity: exactly as for the real linkage-table, we have to distinguish function references from variable references; the latter kind also requires some special handling in the client code (i.e. in the core itself). These two requirements are satisfied by modified FOREIGN-SYMBOL-SAP VOP: its xc-host version (with :sb-dynamic-core enabled) now generates :foreign-dataref fixups, that are normally disallowed in cross-compiled Lisp code. SB-VM::*REQUIRED-RUNTIME-C-SYMBOLS* is not just a list of names; for each name, GENESIS-2 records an additional value to distinguish data and function references.

Summary. When sb-dynamic-core is enabled, recompiled runtime continues to work with earlier-generated cores; however, two requirements for core and runtime compatibility are still in place:

SB-DYNAMIC-CORE Meets LINKAGE-TABLE

Previous section is obsolete in one respect. There is no ``special VM area'' for dynamic linking against the runtime now: all runtime symbols are registered in the common linkage-table.

With :SB-DYNAMIC-CORE enabled, therefore, there are no ``static symbols'' that require special cases here and there; all foreign symbols are linked the same way. The difference is restricted to initialization of linkage-table entries: the Lisp code managing foreign libraries can't run before the runtime is available, so the runtime has to ``preseed'' linkage-table by itself.

Linkage-table on Win32

In SBCL upstream, linkage-table support for win32 is very close to being complete. A couple of places related to linkage-table are unnecessarily conditionalized on #!-win32: e.g. (update-linkage-table) just works on win32, if only permitted to try.

Long before my experimentation with runtime linking methods described in the previous section, I ensured that #!(and win32 linkage-table) support is on par with other platforms. Error handling for undefined alien variable references is among the things I had to add, but its reimplementation for win32 turned out to be a trivial task.

Some code related to linkage-table, e.g. (LOAD-SHARED-OBJECT), is problematic for win32, but the problems aren't win32-specific: overlooking dlopen/dlclose refcounting issues prevents library unloads, but the nature and presence of this problem doesn't change at all when LoadLibrary() and FreeLibrary() become the real functions behind dlopen/dlclose.

When I started to build runtime executable with an export table, I had to remove #!-win32 conditionalizations of *RUNTIME-DLHANDLE* operations. *RUNTIME-DLHANDLE* has a very close win32 equivalent, namely, the result of GetModuleHandle(NULL). In current tls63, initialization and usage of *RUNTIME-DLHANDLE* is done on win32 in the same way as on a typical Unix platform (well, except for a no-op instead of dlclose'ing it: GetModuleHandle doesn't affect refcount, while dlopen(NULL) does).

Surviving Data Execution Prevention

Data Execution Prevention (DEP) is an optional feature of modern MS Windows OSes, intended to protect running programs against some common methods of malicious code injection. When DEP is available, system administrator may disable it, enable it for some system services, or enable it for any executable (minus list of exceptions). Of course, any program that fails when DEP is enabled may be added to aforementioned list of exceptions; however, as a rule, application developers don't express too much enthusiasm when an executable suddenly requires system administrator intervention to start working. That's why successfully working with DEP enabled would be a good thing for SBCL.

SBCL port for Windows has only one problem with DEP (solved in tls63). Though ``Data Execution'' is exactly the thing SBCL is doing all the time (and also the thing Common Lisp is all about), DEP doesn't complain, because all those data are explicitly marked executable. However, when SBCL registers an assembly routine called UWP-SEH-HANDLER as a SEH handler, DEP presence causes SBCL to die when the handler is about to be invoked. And it is invoked on any unwinding through UNWIND-PROTECT, which is quite a common thing to happen.

Probably, once upon a time, DEP authors either anticipated or experienced some technique of DEP circumvention involving unexpected SEH handler installation. While the main functionality of DEP is easy to understand (for CPUs with NX bit: set NX bit for non-executable pages — end of story), additional layers of ``DEP circumvention prevention'' may bring some surprises, as we can see. Here is the bottom line: DEP normally requires SEH handler function to be in an executable section of an EXE or a DLL.

After some hours of struggling with RtlUnwind, I finally found some information about the aforementioned restriction. After that I added a similar SEH handler implementation to src/runtime/x86-assem.S, and made (SET-UNWIND-PROTECT) install the new handler. DEP accepts its new location as ``safe''; tls63 build of SBCL doesn't die anymore.

Special Handling of Console FDs in read() and write() equivalents

MSVCRT provides _read() and _write() functions, which are used in SBCL upstream as a drop-in replacement for Unix system calls (or libc wrappers around them). When Dmitry Kalyanov added win32 threading support to his branch of SBCL (which is the basis of my own), he discovered that MS Windows synchronous I/O has a significant restriction: outstanding _read() blocks attempted _write() (or, to be precise, outstanding ReadFile() blocks attempted WriteFile(): this restriction is not related to MSVCRT ``lowio'' layer; its ultimate cause is the NT kernel, and NtReadFile(), NtWriteFile() are affected as well).

Multiprocessing Lisp code frequently uses separate threads for reading and writing the same stream, and never expects write operation to block until read operation completes. SLIME in multithreaded mode contains such code; SLIME is also the most obvious candidate for initial testing of threading support.

Dmitry made an initial reimplementation of _read() and _write() to prevent mutual blocking of reader and writer threads. That's how win32_unix_read() and win32_unix_write() first appeared in win32-os.c.

My acquaintance with SBCL runtime started with those two functions; my first changes were to make them use OVERLAPPED I/O mode when it's possible for a given FD (i.e. for an underlying file handle).

Later I worked on those functions to make them interruptible, such that SB-THREAD:INTERRUPT-THREAD makes the I/O call in the target thread return EINTR. In a typical case the flow of control then leaves the WITHOUT-INTERRUPTS section in the guts of REFILL-INPUT-BUFFER, and pending thread interruptions have a chance to run (unless interrupts are disabled around REFILL-INPUT-BUFFER as well).

Until recently, win32_unix_read() and win32_unix_write() were interruptible only when a subject file supported OVERLAPPED I/O. One recent step further: console device reading.

On MS Windows (as of NT family, which is the only realistic target of SBCL win32 port), console device handles differ from all other file or device handles in many ways. Normal handles refer to NT kernel entities (think "NTDLL" and "system calls"); console handles are internal to KERNEL32.DLL (think "userspace", despite the misleading library name). KERNEL32.DLL tries to create an illusion that console handles aren't different:

Console handles are now handled specially by win32_unix_read and win32_unix_write:

Backtrace with Foreign Function Names

Lisp backtrace inevitably includes some foreign functions, at least when C-STACK-IS-CONTROL-STACK. Typical backtrace item corresponding to such a function sets a new standard of informativeness and beauty:

("foreign function: #x40CDD0")

Some evil people among SBCL developers have already started an effort to turn this hacker's delight into some boring thing, like a function name (search SBCL sources for SAP-FOREIGN-SYMBOL to learn further details). It's interesting that I can't imagine a single case when (upstream) SAP-FOREIGN-SYMBOL successfully finds the name of C function that called our Lisp code: it tries to find an address range enclosing the argument in the linkage table space, and if it ever succeeded with a return address from the frame pointer chain, it would mean that the CALL instruction is contained in linkage table. Well, maybe I've overlooked something here.

I added another chunk of code to SAP-FOREIGN-SYMBOL (currently conditionalized on #!+win32). Among foreign symbols exported by the runtime, it finds the highest address that is still lower than the argument address. Foreign symbol corresponding to the address found is taken as a foreign function name, and the difference between the argument and the search result is taken as an offset within that function. Name and offset are formatted into a string.

When our (ancestor) caller happens to be a C function from SBCL runtime, (backtrace) result items now look like this (last two items):

22: (SB-THREAD::INITIAL-THREAD-FUNCTION)
23: ("foreign function: call_into_lisp +#x6C")
24: ("foreign function: funcall0 +#x2D")

My code in (SAP-FOREIGN-SYMBOL) may be improved in many ways. Well, at least it finds something, and its result is correct in the common case when the caller is, indeed, an exported function of SBCL runtime.
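The nearest-lower-symbol search described in this section can be sketched as follows. It is an illustration in Python rather than the Lisp code from the branch; `describe_address` and the sample addresses are made up, and treating an exact match as offset 0 is my assumption.

```python
import bisect


def describe_address(symbols, address):
    """Name the exported symbol that most plausibly contains `address`.

    `symbols` is a list of (start_address, name) pairs sorted by address.
    Returns "name +#xOFFSET", or None if every symbol starts above `address`.
    """
    starts = [start for start, _ in symbols]
    # Index of the last symbol whose start is <= address (assumption: an
    # exact hit counts as that symbol with offset 0).
    index = bisect.bisect_right(starts, address) - 1
    if index < 0:
        return None
    start, name = symbols[index]
    return f"{name} +#x{address - start:X}"
```

As the text notes, the answer is only a guess when the caller is not an exported function: any address inside an unexported function is attributed to the closest exported symbol below it.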

Most Questionable Hack: Lisp/C FPU Context Switch

C calling conventions on X86 require FPU stack to be empty on call entry and provide 0 or 1 values on call exit (the latter is the case when the function result is returned in FPU stack). SBCL code, when it's working with X86 FPU, requires FPU stack to be full (no empty registers as described by the tag word).

SBCL uses FPU rather frequently during its normal work, even when the task running has no explicit floating-point calculations; hash table internals use FP operations, for example. On the other hand, a vast majority of runtime C code, as well as a lot of available libraries, don't use FPU at all (i.e. they are unaffected by FPU stack and don't modify it).

I had a bad feeling about those FLDZs and FSTPs on each foreign call. Assuming that Windows handles context switches like other modern OSes on X86, there are two possible scenarios: either it sees SBCL as eligible for lazy FPU context loading (thus setting TS bit in CR0 as SBCL is running), or it doesn't (thus making SBCL the `owner' of the FPU instantly). While this goes unnoticed when no other tasks are actively using FPU, it could cost a couple of context switches per C call entry/exit when there are such tasks (even SBCL's own threads).

The right thing to do is, of course, fixing SBCL compiler so it doesn't require full FPU stack. Unfortunately, back then I was more familiar with SEH and X86 FPU than with SBCL compiler internals; that's why I added some exception-handling code to do an equivalent of lazy FPU switching in userspace.

The idea is that when :INVALID trap is enabled, we catch both Lisp attempt to pop from the empty FPU stack and C attempt to push into the full one. The trickiest part of the solution is recovering after an FPU exception, as it can't be restarted by simply returning ExceptionContinueExecution: the failed FPU opcode should be retrieved and reinterpreted/restarted separately.

As long as no one disables the :INVALID trap, the solution works as intended, allowing Lisp code to run with empty FPU stack (and C code to run with full FPU stack) until CPU hits a FP instruction.

This part of code is to be dropped without regret as soon as the compiler is fixed.

GC and GetWriteWatch

Windows 2000 SP3 and above provide an interesting feature: if we reserve memory with MEM_WRITE_WATCH flag, all written-to pages in the reserved region are tracked. GetWriteWatch() function may be used to retrieve a list of dirty pages. This API is documented as intended for GCs, and is actually used, when available, e.g. in Boehm GC and in .NET.

Tls63 branch now detects GetWriteWatch presence and uses it for the same purpose, instead of write-protecting pages from the dynamic space and then trapping into SEH in order to unprotect written page. GC requests for write protection are translated into ResetWriteWatch(), clearing the list of written-to addresses; at the beginning of collect_garbage(), all written-to pages that are `write protected' this way (per GC page table) are marked as unprotected.
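The bookkeeping described above can be modelled without any Windows API. In this sketch `ToyWatchedRegion` is a hypothetical stand-in for a MEM_WRITE_WATCH region, pages are plain integers, and the set `protected` plays the role of the `write protected' flags in the GC page table; none of the names come from the branch.

```python
class ToyWatchedRegion:
    """Hypothetical model of a region reserved with write tracking."""

    def __init__(self):
        self._written = set()

    def write(self, page):
        self._written.add(page)          # tracked silently, no fault raised

    def get_write_watch(self):
        return sorted(self._written)     # pages written since the last reset

    def reset_write_watch(self, pages):
        self._written.difference_update(pages)


def request_write_protection(region, protected, pages):
    """GC asks to protect pages: record the flag and clear their write history."""
    protected.update(pages)
    region.reset_write_watch(pages)


def sync_before_gc(region, protected):
    """At GC start: every `protected' page written since then becomes unprotected."""
    dirty = [page for page in region.get_write_watch() if page in protected]
    protected.difference_update(dirty)
    return dirty
```

Compared with real write protection, nothing happens at the moment of the write; the cost is paid once, when the dirty list is collected at the beginning of a collection.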

As far as I remember, write-tracking is not just sugar on the same wp-fault-unwp sequence (done in kernel), but a CPU-level feature, modifying page table on first write without causing any exception. There are several possible strategies of using this API; the one I've selected is not necessarily the best.

Unfortunately, write tracking is unavailable for memory regions populated with MapViewOfFile(); therefore it's used for all dynamic-space pages above the memory-mapped core, while the traditional SEH (and real write-protection) is used inside the memory-mapped core space. There are some alternatives to consider here as well.

32 KiB BACKEND_PAGE_BYTES

Following SBCL upstream changes for other OSes, I switched to 32KiB page size as a default for Win32, fixing the code that assumed os_vm_page_size to be a system page size. If code simplicity were a priority, I would make it 64 KiB — the allocation unit. However, I considered debugging and testing the case of all-three-unequal BACKEND_PAGE_BYTES, dwPageSize, and dwAllocationGranularity a good thing in itself.

Thus there is no restriction on BACKEND_PAGE_BYTES by allocation granularity: everything up from 4KiB is still available.

Thread Space: Lazy Commit

Tls63 branch allocates the chunk around struct thread (i.e. dynamic values, alien space, binding stack...) using MEM_RESERVE, precommitting only a couple of pages here and there when they're known to be used. All the rest is committed by the exception handler on demand.

As for the control stack, which can't be part of struct thread on win32, this branch:

There is a problem with lazy-commit approach to alien stack: some system functions verify user buffers and reject uncommitted memory without running our exception handler. The dual workaround for it is provided (1) by the VOP used for allocation: it always `touches' the topmost byte allocated, (2) by the exception handler: for a page fault in alien stack, all intermediate pages up to the bottom are committed as well.
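The second half of that workaround (the exception handler committing every page between the faulting one and the bottom) can be sketched as follows. Pages are modelled as integer indices and `commit_on_fault` is a hypothetical name; the function deliberately does not assume which direction the alien stack grows in.

```python
def commit_on_fault(committed, fault_page, bottom_page):
    """Commit the faulting page and all pages between it and the stack bottom.

    `committed` is a set of page indices and is updated in place.
    Returns the list of pages that were newly committed.
    """
    low, high = sorted((fault_page, bottom_page))
    newly_committed = [page for page in range(low, high + 1)
                       if page not in committed]
    committed.update(newly_committed)
    return newly_committed
```

Together with the VOP touching the topmost allocated byte, this means a buffer spanning several pages is fully committed before a system function gets a chance to reject it.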

FD-STREAM Buffers: 64 KiB on Win32

File buffer allocation code (the same in tls63 and the upstream SBCL) ends up in os_validate, thus having 64 KiB allocation granularity. Until it's modified to use malloc() instead, smaller FD stream buffers on Win32 are wasting space (15/16 of it for the original default of 4096 bytes).
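The 15/16 figure follows directly from rounding each buffer up to the allocation granularity. A small check, assuming one buffer per os_validate call (`wasted_fraction` is an illustrative helper, not part of SBCL):

```python
from fractions import Fraction


def wasted_fraction(buffer_bytes, granularity=64 * 1024):
    """Fraction of the reserved space left unused by one buffer."""
    # Round the request up to a whole number of allocation units.
    reserved = -(-buffer_bytes // granularity) * granularity
    return Fraction(reserved - buffer_bytes, reserved)
```

With the original 4096-byte default this gives Fraction(15, 16); with a 64 KiB buffer nothing is wasted.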

For now, tls63 uses 64 KiB as a default buffer size. Switching to malloc() is equally easy, of course (and probably should be done anyway).

Win32-specific Futex Support

When I thought about SBCL's WAITQUEUE and MUTEX, implemented with futex_wait() and futex_wake(), implemented with pthread_mutex() and pthread_cond_timedwait(), implemented with (...) some Win32 API, there seemed to be way too many layers. Implementing Win32 futexes resulted in better performance and (more importantly) better support for interrupts: there is no `uninterruptible quantum' in the new futex implementation, and pthread_kill(), now being part of the same module, is able to do its best.

:UCS-2 External Format For Console I/O

Done exactly as planned and stated in the initial announcement page.

GC Internals: Work in Progress

The current implementation of safepoints, interrupts, and GC signalling in Tls63 has departed significantly from Dmitry Kalyanov's original code. It should be considered an experiment, with the possibility that the result will eventually be thrown away, rolling it back to the original.

However, there are some logical changes, discovered during this experiment, that we should apply to the original code in this case: