Since spring 2012 I have been unable to work on SBCL (I managed to make occasional merges, but that's all). Hence no new code, no new ideas... but in a sense this turned out to be beneficial: while there was no new development, David Lichteblau started to integrate the entire thing into SBCL mainline!
It was a rather non-trivial piece of work, not at all mechanical. Many times he had to refactor my code, moving cross-platform things out of platform-specific files, throwing away garbage, cleaning up the rest. Now I am positively amazed at how much code was accepted. David was very careful not to throw away anything useful.
Remaining differences may be categorized the following way:
As you can see, Windows support in upstream SBCL has made great progress. That's especially true of the upcoming SBCL-1.1.3 release. Don't forget to give it a try when it's available. Please let me know if any differences (upstream vs. fork) affect your use cases, for better or worse, especially if they aren't documented here yet.
If you test for some *FEATURES* of my fork, please get ready for some changes (I won't maintain a separate code base for such a minor reason):
SB-GC-SAFEPOINT becomes SB-SAFEPOINT (SF-SAFEPOINT too); migration is trivial.
FDS-ARE-WINDOWS-HANDLES disappears; use SB-WIN32:GET-OSFHANDLE and SB-WIN32:OPEN-OSFHANDLE to abstract away the difference. They became no-ops when IO migrated to kernel32.
ALIEN-CALLBACK-CONVENTIONS, ALIEN-CALLBACK-STDCALL and ALIEN-CALLBACK-CDECL are local for now, but maybe they will get merged. To work everywhere, check (FBOUNDP 'SB-ALIEN::ALIEN-FUN-TYPE-CONVENTION) instead (or use some other similar hack).
Two months ago I intended to restore cross-platform SBCL buildability as soon as possible, to stash away non-critical patches (those not directly related to win32 threading), and to provide a version ready for integration. Not that I've abandoned those plans (I hope to come back to these tasks in January), but I've postponed them significantly.
The explanation is trivial: some ideas that occurred to me while I was working with SBCL code seemed so interesting that I felt a great urge to try them out.
There is one good thing about it, however: integration with SBCL upstream didn't become problematic during this time. Some code in my branch is now much cleaner than it used to be, so reintegration may be even easier now.
This text describes the current changes in the ``bleeding edge'' branch of my repository (tls63, which is now the default branch), relative to upstream SBCL and to Dmitry Kalyanov's code. For now I'm not trying to decide what I will propose for integration, or to predict which patches will be useful in the long run.
Only the changes done on purpose are described here; bugs and other unintended things don't belong to this document.
The tls63 branch started with the idea of keeping thread-specific data not in the arbitrary data slot of the NT TIB, but in a `legally owned' place allocated by TlsAlloc().
Slot 63 is the highest TLS slot unconditionally available in the TIB (slots above it are allocated on demand when TlsSetValue is called). The offset of slot 63 in the TIB is fixed (modulo, of course, the `unofficial but stable' status of the NT TIB layout itself). Machine code accessing slot 63 is as simple as code accessing the arbitrary data slot.
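The following small C sketch makes the ``fixed offset'' concrete. It assumes the commonly cited 32-bit TEB layout, in which the TlsSlots array (64 pointer-sized entries) starts at offset 0xE10 from the FS base. That layout is undocumented, and the helper name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: 32-bit TEB layout with TlsSlots[64] at offset 0xE10,
   4 bytes per slot.  Not officially documented, but long stable. */
enum {
    TEB32_TLS_SLOTS_OFFSET = 0xE10,
    TEB32_TLS_SLOT_SIZE    = 4,
    TEB32_TLS_SLOT_COUNT   = 64
};

/* Hypothetical helper: FS-relative offset of a directly addressable slot. */
static size_t teb32_tls_slot_offset(unsigned slot)
{
    /* Slots 64 and above live in a separately allocated expansion area. */
    assert(slot < TEB32_TLS_SLOT_COUNT);
    return TEB32_TLS_SLOTS_OFFSET + (size_t)slot * TEB32_TLS_SLOT_SIZE;
}
```

Under that assumption, slot 63 sits at FS:0xF0C, so reading it takes a single MOV with a constant FS-relative address.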
TLS slots are allocated from lower indices to higher ones. The SBCL runtime has several DLL dependencies; the DllMain() entry points of those DLLs may allocate some TLS slots before SBCL's main() starts.
The current SBCL runtime in the tls63 branch relies on slot 63 being free when the runtime's main() is called. This is almost guaranteed with our current DLL dependencies: among them, kernel32 and ws2_32 have special areas in the TIB for their own use, so they don't use TLS slots at all. If slot 63 happens to be busy, the SBCL runtime refuses to start.
If slot 63 is free, the SBCL runtime allocates it. There is no official way to allocate a TLS slot with a known index, so SBCL allocates all free slots up to 63 and then frees the lower ones.
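The allocation trick can be modeled in a few lines of C. This is a sketch, not the runtime's code. mock_tls_alloc() and mock_tls_free() are hypothetical stand-ins for TlsAlloc() and TlsFree(). They are backed by a bitmap that always hands out the lowest free index, which lets the logic run on any platform.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TLS_SLOTS 64                  /* slots 0..63 sit directly in the TIB */
#define NO_SLOT   ((unsigned)-1)      /* mimics TLS_OUT_OF_INDEXES */

/* Hypothetical stand-ins for TlsAlloc()/TlsFree(): the lowest free index wins. */
static bool slot_busy[TLS_SLOTS];

static unsigned mock_tls_alloc(void)
{
    for (unsigned i = 0; i < TLS_SLOTS; i++)
        if (!slot_busy[i]) { slot_busy[i] = true; return i; }
    return NO_SLOT;   /* the real TlsAlloc would continue into expansion slots */
}

static void mock_tls_free(unsigned i) { slot_busy[i] = false; }

/* Allocate free slots one by one until index 63 comes back, then release
   every lower slot taken along the way.  Returns true if we own slot 63. */
static bool grab_slot_63(void)
{
    unsigned taken[TLS_SLOTS];
    size_t n = 0;
    bool got = false;

    for (;;) {
        unsigned i = mock_tls_alloc();
        if (i == 63) { got = true; break; }
        if (i == NO_SLOT)
            break;                    /* slot 63 was already in use */
        assert(n < TLS_SLOTS);
        taken[n++] = i;
    }
    while (n > 0)
        mock_tls_free(taken[--n]);
    return got;
}
```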
What happens when an SBCL runtime function is called in a thread that was not created by SBCL (and isn't the initial thread "adopted" by it)?
Real (upstream) SBCL does something smart in the special case of signal handlers: a signal is automatically delivered to one of SBCL's native threads. This has nothing to do with win32, where there are no signals.
As for FFI callback functions, defined using SB-ALIEN::DEFINE-ALIEN-CALLBACK or SB-ALIEN::ALIEN-LAMBDA, calling them in foreign threads is guaranteed to fail in real SBCL.
The tls63 branch on win32 is different: it supports calling alien callbacks in foreign threads. Unfortunately, porting the solution to other platforms is non-trivial. My (mis)design decisions are only part of the reason; there are also objective reasons why the problem should be solved differently on win32 and on other platforms.
Why bother? Well, win32 is special here — as usual. Even some trivial system programming tasks, like writing "services" (daemons), use foreign thread callbacks on this platform.
From the user's point of view, foreign thread callback support doesn't look complicated: when a foreign thread calls Lisp callback for the first time, it becomes known to SBCL; after that, it's visible in (SB-THREAD:LIST-ALL-THREADS) results (its automatically set THREAD-NAME reflects the fact that it's an autodetected foreign thread, and includes system thread identifier). JOIN-THREAD may wait for foreign thread exit, and THREAD-ALIVE-P may be used to test if a foreign thread already exited.
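The user-visible bookkeeping (a thread becomes known on its first callback) can be sketched with POSIX threads. This only illustrates the idea under stated assumptions. The real win32 implementation builds on fibers and on the modified pthreads_win32 module described next. Here, a C11 thread-local flag marks adopted threads instead. known_threads, callback_entry() and the explicit system_id argument are all hypothetical; on Windows, the identifier would come from the OS rather than from the caller.

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>

#define MAX_KNOWN_THREADS 16

/* Hypothetical registry playing the role of SBCL's thread list. */
static char known_threads[MAX_KNOWN_THREADS][48];
static int known_count;
static pthread_mutex_t known_lock = PTHREAD_MUTEX_INITIALIZER;

/* Per-thread flag: has this OS thread been adopted yet? */
static _Thread_local int adopted;

static void adopt_current_thread(unsigned long system_id)
{
    pthread_mutex_lock(&known_lock);
    assert(known_count < MAX_KNOWN_THREADS);
    snprintf(known_threads[known_count], sizeof known_threads[0],
             "foreign callback thread %lu", system_id);
    known_count++;
    pthread_mutex_unlock(&known_lock);
    adopted = 1;
}

/* What a foreign library actually calls.  The first call on a thread we did
   not create registers it; later calls go straight to the "Lisp" body. */
static int callback_entry(int x, unsigned long system_id)
{
    if (!adopted)
        adopt_current_thread(system_id);
    return x + 1;   /* stands in for the Lisp function */
}

/* A thread created by foreign code, which calls back into us several times. */
struct foreign_job { unsigned long system_id; int calls; int value; };

static void *foreign_thread_main(void *arg)
{
    struct foreign_job *job = arg;
    for (int i = 0; i < job->calls; i++)
        job->value = callback_entry(job->value, job->system_id);
    return NULL;
}
```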
Internals of the foreign thread callback support are more complicated. The win32 implementation is based on MS Windows system support for userspace-scheduled thread-like things called fibers. I've modified the pthreads_win32 module so that it supports fibers as well as threads, assigning a distinct pthread_self() identity to both kinds of objects. When a foreign thread callback is detected, two things happen:
Further discussion of foreign thread callback implementation internals is outside the scope of this document.
MS Windows interfaces use callbacks frequently, to the extent of looking bizarre to an experienced Unix programmer. That's why foreign thread callbacks, described in the section above, are important.
There is another sad story about callbacks, totally unrelated to threads. Most WinAPI functions on X86 use a calling convention named stdcall: arguments are passed on the stack, and the called function removes them before returning. The usual default convention of X86 C compilers is called cdecl: arguments are passed on the stack too, but the called function doesn't remove them.
The SBCL foreign function interface for X86 is designed to call external functions without requiring a convention specification. SBCL will just do the right thing when the foreign function is either stdcall or cdecl. The method is simple and elegant: SBCL grants all foreign functions an unquestionable right to clobber the %ESP register. Code generated by SBCL for foreign C function calls normally saves %ESP before the call and restores it afterwards.
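A toy model in C shows why saving and restoring %ESP makes the callee's convention irrelevant to the caller. In the model, the stack is an array and an index stands in for %ESP. Return addresses are omitted, and all names are hypothetical. It models the idea only, not SBCL's generated code.

```c
#include <assert.h>

/* Toy x86 stack: `esp` indexes into `mem` and moves down on every push.
   Return addresses are left out for brevity. */
typedef struct { int mem[64]; int esp; } toy_stack;

static void push(toy_stack *s, int v)
{
    assert(s->esp > 0);
    s->mem[--s->esp] = v;
}

/* cdecl callee: reads its two arguments and leaves them for the caller. */
static int add_cdecl(toy_stack *s)
{
    return s->mem[s->esp] + s->mem[s->esp + 1];
}

/* stdcall callee: reads its arguments and pops them itself (like "ret 8"). */
static int add_stdcall(toy_stack *s)
{
    int r = s->mem[s->esp] + s->mem[s->esp + 1];
    s->esp += 2;
    return r;
}

/* Caller in the style described above: save esp, call, restore esp.  The
   restore undoes cdecl leftovers and is a no-op after a stdcall callee, so
   the caller never needs to know which convention the callee used. */
static int call_foreign(toy_stack *s, int (*fn)(toy_stack *), int a, int b)
{
    int saved_esp = s->esp;
    push(s, b);              /* arguments are pushed right to left */
    push(s, a);
    int r = fn(s);
    s->esp = saved_esp;
    return r;
}
```

The same model hints at the callback problem. A callee cannot observe whether its caller will restore the stack pointer, so it has to be told which convention to follow.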
Unfortunately, there is still a problem with callback calling convention, and it cannot be solved without some way to specify a convention (it's impossible to determine if our caller expects us to clean up the stack).
In SBCL upstream, all alien callbacks are cdecl. Consequently, on Windows we have a problem: not only are WinAPI functions defined to be stdcall, but almost all the callbacks they invoke are required to be stdcall as well.
I decided to provide a way of specifying the stdcall convention for alien callbacks.
Interestingly, the same problem was once solved by Alastair Bridgewater. He added a calling convention to the callback specification syntax, but two callbacks differing only in calling convention had exactly the same alien function type.
I decided to go a bit further: in tls63 code, the alien function type contains the calling convention specification. The NIL convention is supposed to mean ``universal'' for callouts and ``cdecl'' for callbacks. The :STDCALL and :CDECL keywords may be used to specify the callback convention explicitly. Currently they affect X86 code generation, but they are silently ignored on other platforms.
The greatest problem I faced when working on this thing was mostly an aesthetic one: how to add a convention spec to the alien function type syntax without breaking compatibility? So far, my chosen syntax looks like this:
    (ALIEN-LAMBDA INT ((x INT)) (1+ x))              ;; traditional spec, NIL convention
    (ALIEN-LAMBDA (:STDCALL INT) ((x INT)) (1+ x))   ;; the same function, but :STDCALL
    (ALIEN-LAMBDA (:CDECL INT) ((x INT)) (1+ x))     ;; the same function, but :CDECL
    (CAST ... (FUNCTION (:STDCALL INT) INT))         ;; standalone function type spec
As you can see above, the calling convention is specified by replacing the result type with a list of a convention keyword and the result type. This syntax is deceptive in a sense: it looks as if the convention belonged to the result type, but it doesn't. There is no such alien type as (:STDCALL INT).
At least my chosen syntax is not ambiguous. It would become ambiguous if (:STDCALL INT) were parsable into an alien type spec, but defining alien type parsers is outside the public SBCL interface.
One thing that may change in the future: alien function types with the same result type and the same argument types, but different conventions, are now disjoint.
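The parsing rule and the disjointness of the resulting types can be modeled in C. This is a hypothetical sketch, not SBCL's alien type parser. Type specs are plain strings, and only the result position is parsed. Argument types are left out because they would be compared the same way as the result type.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

enum convention { CONV_NIL, CONV_CDECL, CONV_STDCALL };

/* Hypothetical C model of an alien function type (argument types omitted). */
struct fun_type {
    enum convention conv;
    char result[32];
};

/* Parse the result position of a function type spec:
   "INT" -> (NIL, "INT");  "(:STDCALL INT)" -> (STDCALL, "INT"). */
static bool parse_result_spec(const char *spec, struct fun_type *out)
{
    static const struct { const char *prefix; enum convention conv; } known[] = {
        { "(:STDCALL ", CONV_STDCALL },
        { "(:CDECL ",   CONV_CDECL   },
    };
    const char *type = spec;
    size_t len = strlen(spec);
    out->conv = CONV_NIL;

    for (size_t i = 0; i < sizeof known / sizeof known[0]; i++) {
        size_t n = strlen(known[i].prefix);
        if (strncmp(spec, known[i].prefix, n) == 0) {
            if (len <= n + 1 || spec[len - 1] != ')')
                return false;        /* needs a type and a closing paren */
            out->conv = known[i].conv;
            type = spec + n;
            len = len - n - 1;       /* drop the prefix and the ')' */
            break;
        }
    }
    if (type[0] == '(' || len >= sizeof out->result)
        return false;                /* unknown keyword, or name too long */
    memcpy(out->result, type, len);
    out->result[len] = '\0';
    return true;
}

/* Types that differ only in convention are distinct (disjoint). */
static bool fun_type_equal(const struct fun_type *a, const struct fun_type *b)
{
    return a->conv == b->conv && strcmp(a->result, b->result) == 0;
}
```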
Every time I read SBCL sources, I pondered the os_map() comment in win32-os.c: we copy core file data instead of mapping it because "Windows semantics completely screws this up". After some experiments, the word ``completely'' turned out to be too harsh. I've managed to implement copy-on-write mapping of core files, and it was not the hardest of all the things mentioned in this document (foreign thread callbacks, for example, were certainly harder).
MS Windows provides CreateFileMapping and MapViewOfFile functions; together, they resemble our familiar mmap(). Here is the list of problems that had to be solved for core file mapping:
I decided to simplify the implementation by restricting memory-mapping to dynamic space; other spaces are always copied. Their size is negligible compared to the dynamic space size, so why bother?
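For comparison, here is copy-on-write mapping of a file with POSIX mmap(). The CreateFileMapping/MapViewOfFile pair resembles it; on Windows, the copy-on-write variant is PAGE_WRITECOPY plus FILE_MAP_COPY. The sketch maps from offset 0 and ignores the page-alignment and space-layout details that a real core loader has to handle. map_core_cow() is a hypothetical name.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Map the first `len` bytes of a core file copy-on-write.  Pages are shared
   with the page cache until written, then privately copied, so writes never
   reach the file.  A read-only descriptor is enough for MAP_PRIVATE. */
static char *map_core_cow(const char *path, size_t len)
{
    assert(len > 0);
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return NULL;
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
    close(fd);                       /* the mapping keeps the file referenced */
    return p == MAP_FAILED ? NULL : (char *)p;
}
```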
User-visible results of this change:
I once worked on some IO-related stuff that required me to modify src/code/win32.lisp frequently, and one thing annoyed me build after build. When I added a foreign function reference to Lisp source, and this function wasn't used in the runtime C code, SBCL failed to build.
The current method of resolving foreign references used in SBCL upstream requires any foreign symbol mentioned in cross-compiled Lisp files to be referenced by the SBCL runtime too. Details differ between platforms: Unix has ldso-stubs.S (and the tool to regenerate it), and Win32 has the win32-os.c:scratch() function.
After some days of that scratch()-induced torture, I decided to reshape SBCL's build process so that foreign symbol references would need no manual maintenance at all. Of course, the first place I looked for guidance was the aforementioned tool for generating ldso-stubs.S. It's hard to describe my disappointment when I found the same hand-written foreign function lists in tools-for-build/ldso-stubs.lisp.
My first method of avoiding win32-os.c:scratch() maintenance was rather simple: do genesis-2 in two passes. The first pass (called genesis-1a in my code) cross-compiles Lisp sources into obj/from-xc. It then cold-loads the resulting objects, with the fixup-resolving FOP redefined to collect :foreign fixups.
After the first pass, foreign symbol names (wrapped in FOREIGN_SYMBOL_REFERENCE()) are put into src/runtime/gen1a-undefs. The runtime executable is linked after this file is ready. One of its source files (win32-stubs.S) defines the FOREIGN_SYMBOL_REFERENCE preprocessor macro and then includes gen1a-undefs, so that each entry expands into something like ``.long ForeignFunctionName''. Assembler is preferred to C here because the latter requires some heavy GCCisms to refer to a name like CloseHandle@4.
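The include-with-a-macro trick can be shown in plain C. In this sketch, the generated file is replaced by a macro (GEN1A_UNDEFS, a hypothetical name) so that the example is self-contained, and the listed symbols are ordinary libc functions. A C table like this only works for undecorated names, which is exactly why the real build uses assembler for names like CloseHandle@4.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for the generated gen1a-undefs file: one entry per foreign symbol
   referenced by the cold core.  A real build would #include a file instead. */
#define GEN1A_UNDEFS                  \
    FOREIGN_SYMBOL_REFERENCE(strlen)  \
    FOREIGN_SYMBOL_REFERENCE(memcpy)  \
    FOREIGN_SYMBOL_REFERENCE(abs)

typedef void (*any_fn)(void);
struct foreign_ref { const char *name; any_fn addr; };

/* Expansion 1: take the address of every listed symbol.  Because the table
   refers to each symbol, the linker must pull it into the executable. */
#define FOREIGN_SYMBOL_REFERENCE(sym) { #sym, (any_fn)sym },
static const struct foreign_ref foreign_refs[] = { GEN1A_UNDEFS };
#undef FOREIGN_SYMBOL_REFERENCE

/* Expansion 2: the same list, redefined for another purpose (counting). */
#define FOREIGN_SYMBOL_REFERENCE(sym) + 1
enum { N_FOREIGN_REFS = 0 GEN1A_UNDEFS };
#undef FOREIGN_SYMBOL_REFERENCE
```

The second expansion illustrates how one generated list can serve several purposes just by redefining FOREIGN_SYMBOL_REFERENCE.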
Now the runtime is ready; its symbol table is in sbcl.nm. The second half of genesis-2 (which retains the name genesis-2 in my branch) reads the sbcl.nm symbol table and then cold-loads every lisp-obj again. This time, thanks to gen1a-undefs, every foreign fixup is resolved, and genesis-2 generates output/cold-sbcl.core. From this point on, the build process continues unmodified, as in upstream SBCL.
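Reading sbcl.nm boils down to parsing lines in nm's default ``ADDRESS TYPE NAME'' format and looking names up. The C sketch below assumes that format. parse_nm() and resolve_foreign() are hypothetical helpers; the real work is done by GENESIS, in Lisp. Returning 0 for an undefined symbol is a simplification.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* One entry of an nm-style symbol table: "0040cdd0 T call_into_lisp". */
struct nm_entry { unsigned long addr; char type; char name[64]; };

/* Parse nm output held in memory; returns the number of entries read.
   Lines without the "ADDRESS TYPE NAME" shape (e.g. undefined symbols,
   which nm prints without an address) are skipped. */
static size_t parse_nm(const char *text, struct nm_entry *out, size_t max)
{
    size_t n = 0;
    while (*text && n < max) {
        const char *eol = strchr(text, '\n');
        size_t linelen = eol ? (size_t)(eol - text) : strlen(text);
        char line[128];
        if (linelen < sizeof line) {
            memcpy(line, text, linelen);
            line[linelen] = '\0';
            if (sscanf(line, "%lx %c %63s",
                       &out[n].addr, &out[n].type, out[n].name) == 3)
                n++;
        }
        text += linelen + (eol ? 1 : 0);
    }
    return n;
}

/* Resolve one :foreign fixup by name; 0 means "undefined". */
static unsigned long resolve_foreign(const struct nm_entry *tab, size_t n,
                                     const char *name)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(tab[i].name, name) == 0)
            return tab[i].addr;
    return 0;
}
```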
As for Unix platforms, reusing the genesis-1a output described above is as easy as redefining FOREIGN_SYMBOL_REFERENCE to LDSO_STUBIFY. My only doubt about this approach is this:
The method described above continues to work. The second method, described below, may provide some important goodies for someone who is developing the SBCL runtime, continuously changing and recompiling it (that's why I prefer the second method for my own builds).
As long as foreign symbols in Lisp-OBJS are resolved using the information from src/runtime/sbcl.nm, every runtime rebuild requires core regeneration: imported static symbol addresses cannot remain the same in the new core except by coincidence. What if we used some indirection method for linking against the runtime, something like the linkage-table used for shared objects? It turned out to be possible. Win32 builds from the tls63 branch with :sb-dynamic-core enabled work exactly this way.
With :sb-dynamic-core, genesis-2 doesn't use any information about real runtime symbols when cold-sbcl.core is created. Each foreign fixup in Lisp-OBJS is ``resolved'' to an address from the special memory area. Names of foreign symbols ``resolved'' this way are collected into a list, which ends up in cold-sbcl.core as the SYMBOL-VALUE of SB-VM::*REQUIRED-RUNTIME-C-SYMBOLS*.
For :sb-dynamic-core builds, SB-VM::*REQUIRED-RUNTIME-C-SYMBOLS* is a newly introduced static symbol; its (constant) address is available to the runtime along with the other GENESIS data.
Here is the most unusual part of the story: real name resolution is done by the runtime itself. Upon startup, the runtime iterates over (SYMBOL-VALUE SB-VM::*REQUIRED-RUNTIME-C-SYMBOLS*). Each name in this list is resolved, and the result is stored in the special memory area mentioned above, in a manner resembling the linkage-table implementation. The order of symbols in the list is the same as the order of address allocation in the special memory area, so each symbol address is stored exactly where the core expects to find that symbol.
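The handshake can be modeled in C. The core side hands out slot indices in first-use order and records names in the same order. The runtime side fills slot i with the address of name i. Everything here is hypothetical: core_image, core_slot_for(), runtime_fill_table(), and demo_resolve(), which stands in for GetProcAddress.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_SLOTS 32

/* Core side: the name list saved in the image (~ *REQUIRED-RUNTIME-C-SYMBOLS*). */
struct core_image {
    const char *required[MAX_SLOTS];
    size_t count;
};

/* Core side: "resolve" a fixup to a slot index, allocating one on first use. */
static size_t core_slot_for(struct core_image *core, const char *name)
{
    for (size_t i = 0; i < core->count; i++)
        if (strcmp(core->required[i], name) == 0)
            return i;
    assert(core->count < MAX_SLOTS);
    core->required[core->count] = name;
    return core->count++;
}

/* Runtime side: resolve every saved name and store it at the same index.
   Returns the number of names that could not be resolved. */
typedef void *(*resolver_fn)(const char *name);

static size_t runtime_fill_table(const struct core_image *core,
                                 resolver_fn resolve, void *table[])
{
    size_t missing = 0;
    for (size_t i = 0; i < core->count; i++) {
        table[i] = resolve(core->required[i]);
        if (table[i] == NULL)
            missing++;
    }
    return missing;
}

/* Demo resolver standing in for GetProcAddress: two known names. */
static int demo_funcall0, demo_call_into_lisp;

static void *demo_resolve(const char *name)
{
    if (strcmp(name, "funcall0") == 0) return &demo_funcall0;
    if (strcmp(name, "call_into_lisp") == 0) return &demo_call_into_lisp;
    return NULL;
}
```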
How is a foreign symbol name resolved by the runtime? Well, it's a banality: the runtime calls GetProcAddress(). Oh, some missing details:
-Wl,-export-all-symbols for MinGW's gcc).
HINSTANCE with GetModuleHandle(NULL) and uses it as an argument for GetProcAddress.
HINSTANCE value for each of its directly-imported DLLs. If GetProcAddress returns NULL with the runtime's own HINSTANCE, the runtime retries resolving with each of those additional HINSTANCEs. Static references to kernel32, MSVCRT or Winsock2 symbols are successfully resolved at this stage.
The detail omitted earlier for the sake of simplicity: exactly as for the real linkage-table, we have to distinguish function references from variable references. The latter kind also requires some special handling in the client code (i.e. in the core itself). These two requirements are satisfied by a modified FOREIGN-SYMBOL-SAP VOP: its xc-host version (with :sb-dynamic-core enabled) now generates :foreign-dataref fixups, which are normally disallowed in cross-compiled Lisp code. SB-VM::*REQUIRED-RUNTIME-C-SYMBOLS* is not just a list of names; for each name, GENESIS-2 records an additional value to distinguish data and function references.
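The following sketch shows why data references need special handling in the client. The assumption is that a function entry can be called through directly, while a data entry holds the variable's address, so the client must load that address before touching the variable. The entry layout, the kind flag and the demo symbols are hypothetical; they only mirror the distinction recorded for each name.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical linkage-table entry; `kind` plays the role of the extra value
   recorded next to each name. */
enum ref_kind { REF_FUNCTION, REF_DATA };

struct linkage_entry {
    enum ref_kind kind;
    union {
        int (*fn)(int);   /* function reference: call through the entry */
        int *var;         /* data reference: entry holds the variable's address */
    } u;
};

/* Client code for a function reference: a single indirect call. */
static int call_via_entry(const struct linkage_entry *e, int arg)
{
    assert(e->kind == REF_FUNCTION);
    return e->u.fn(arg);
}

/* Client code for a data reference: first load the address stored in the
   entry, then access the variable through it.  This extra step is the
   special handling that data references require. */
static int read_via_entry(const struct linkage_entry *e)
{
    assert(e->kind == REF_DATA);
    return *e->u.var;
}

/* Demo "runtime" symbols. */
static int runtime_counter = 41;
static int runtime_twice(int x) { return 2 * x; }
```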
GENESIS data (everything in src/runtime/genesis) shouldn't change after the core has been generated.
EXTERN-ALIEN (or something with EXTERN-ALIEN inside, like DEFINE-ALIEN-ROUTINE), you have a chance to get a diagnostic message mentioning an undefined alien; when the missing symbol is some fundamental runtime support function, like call_into_c, the most probable outcome is a crash.
The previous section is obsolete in one respect: there is no longer a ``special VM area'' for dynamic linking against the runtime. All runtime symbols are registered in the common linkage-table.
With :SB-DYNAMIC-CORE enabled, therefore, there are no ``static symbols'' requiring special cases here and there; all foreign symbols are linked the same way. The only difference is in the initialization of linkage-table entries. The Lisp code managing foreign libraries can't run before the runtime is available, so the runtime has to ``preseed'' the linkage-table by itself.
In SBCL upstream, linkage-table support for win32 is very close to being complete. A couple of places related to linkage-table are unnecessarily conditionalized on #!-win32: e.g. (update-linkage-table) just works on win32, if only permitted to try.
Long before my experimentation with runtime linking methods described in the previous section, I ensured that #!(and win32 linkage-table) support is on par with other platforms. Error handling for undefined alien variable references is among the things I had to add, but its reimplementation for win32 turned out to be a trivial task.
Some code related to linkage-table, e.g. (LOAD-SHARED-OBJECT), is problematic on win32, but the problems aren't win32-specific. Overlooked dlopen/dlclose refcounting issues prevent library unloads, but the nature and presence of this problem don't change at all when LoadLibrary() and FreeLibrary() become the real functions behind dlopen/dlclose.
When I started to build the runtime executable with an export table, I had to remove the #!-win32 conditionalizations of *RUNTIME-DLHANDLE* operations. *RUNTIME-DLHANDLE* has a very close win32 equivalent, namely, the result of GetModuleHandle(NULL). In current tls63, *RUNTIME-DLHANDLE* is initialized and used on win32 in the same way as on a typical Unix platform. The one exception is that a no-op replaces dlclose'ing it, because GetModuleHandle doesn't affect the refcount, while dlopen(NULL) does.
Data Execution Prevention (DEP) is an optional feature of modern MS Windows OSes, intended to protect running programs against some common methods of malicious code injection. When DEP is available, a system administrator may disable it, enable it for some system services, or enable it for every executable (minus a list of exceptions). Of course, any program that fails when DEP is enabled may be added to that list of exceptions. However, as a rule, application developers aren't enthusiastic when an executable suddenly requires system administrator intervention to start working. That's why working successfully with DEP enabled would be a good thing for SBCL.
The SBCL port for Windows has only one problem with DEP (solved in tls63). Though ``Data Execution'' is exactly what SBCL does all the time (and also what Common Lisp is all about), DEP doesn't complain, because all those data are explicitly marked executable. However, when SBCL registers an assembly routine called UWP-SEH-HANDLER as an SEH handler, DEP causes SBCL to die when the handler is about to be invoked. And it is invoked on any unwinding through UNWIND-PROTECT, which is quite a common event.
Probably, once upon a time, DEP authors either anticipated or experienced some technique of DEP circumvention involving unexpected SEH handler installation. While the main functionality of DEP is easy to understand (for CPUs with NX bit: set NX bit for non-executable pages — end of story), additional layers of ``DEP circumvention prevention'' may bring some surprises, as we can see. Here is the bottom line: DEP normally requires SEH handler function to be in an executable section of an EXE or a DLL.
After some hours of struggling with RtlUnwind, I finally found some information about the restriction described above. I then added a similar SEH handler implementation to src/runtime/x86-assem.S and made (SET-UNWIND-PROTECT) install the new handler. DEP accepts its new location as ``safe'', and the tls63 build of SBCL doesn't die anymore.
MSVCRT provides the _read() and _write() functions, which SBCL upstream uses as drop-in replacements for Unix system calls (or libc wrappers around them). Dmitry Kalyanov added win32 threading support to his branch of SBCL, which is the basis of my own. In doing so, he discovered that MS Windows synchronous I/O has a significant restriction: an outstanding _read() blocks an attempted _write(). To be precise, an outstanding ReadFile() blocks an attempted WriteFile(). The restriction is not related to the MSVCRT ``lowio'' layer; its ultimate cause is the NT kernel, and NtReadFile() and NtWriteFile() are affected as well.
Multiprocessing Lisp code frequently uses separate threads for reading and writing the same stream, and never expects a write operation to block until a read operation completes. SLIME in multithreaded mode contains such code, and SLIME is also the most obvious candidate for initial testing of threading support.
Dmitry made an initial reimplementation of _read() and _write() to prevent mutual blocking of reader and writer threads. That's how win32_unix_read() and win32_unix_write() first appeared in win32-os.c.
My acquaintance with SBCL runtime started with those two functions; my first changes were to make them use OVERLAPPED I/O mode when it's possible for a given FD (i.e. for an underlying file handle).
Later I worked on those functions to make them interruptible, so that SB-THREAD:INTERRUPT-THREAD makes an I/O call in the target thread return EINTR. In a typical case, the flow of control then leaves the WITHOUT-INTERRUPTS section in the guts of REFILL-INPUT-BUFFER, and pending thread interruptions have a chance to run (unless interrupts are disabled around REFILL-INPUT-BUFFER as well).
Until recently, win32_unix_read() and win32_unix_write() were interruptible only when the subject file supported OVERLAPPED I/O. The latest step extends this to console device reading.
On MS Windows (the NT family, which is the only realistic target of the SBCL win32 port), console device handles differ from all other file or device handles in many ways. Normal handles refer to NT kernel entities (think "NTDLL" and "system calls"). Console handles are internal to KERNEL32.DLL (think "userspace", despite the misleading library name). KERNEL32.DLL tries to create an illusion that console handles aren't different:
Console handles are now handled specially by win32_unix_read and win32_unix_write:
ReadFile or WriteFile, we translate read/write I/O to ReadConsoleW and WriteConsoleW. As we use Unicode functions here, the Lisp-side fd-stream initialization code doesn't have to do anything with ``console codepage'' settings: the external format for console streams is always :UCS-2.
ReadConsoleW is interrupted immediately if the handle is closed by another thread. This fact is now used to interrupt win32_unix_read (see wake_thread()) during console input.
GetConsoleMode(), etc.).
A Lisp backtrace inevitably includes some foreign functions, at least when C-STACK-IS-CONTROL-STACK. A typical backtrace item corresponding to such a function sets a new standard of informativeness and beauty:
    ("foreign function: #x40CDD0")
Some evil people among SBCL developers have already started an effort to turn this hacker's delight into something boring, like a function name (search SBCL sources for SAP-FOREIGN-SYMBOL to learn further details). Interestingly, I can't imagine a single case in which (upstream) SAP-FOREIGN-SYMBOL successfully finds the name of the C function that called our Lisp code. It tries to find an address range enclosing the argument in the linkage table space. If it ever succeeded with a return address from the frame pointer chain, that would mean the CALL instruction is contained in the linkage table. Well, maybe I've overlooked something here.
I added another chunk of code to SAP-FOREIGN-SYMBOL (currently conditionalized on #!+win32). Among the foreign symbols exported by the runtime, it finds the highest address that is still lower than the argument address. The foreign symbol at that address is taken as the foreign function name, and the difference between the argument and that address is taken as the offset within the function. Name and offset are formatted into a string.
When our (ancestor) caller happens to be a C function from the SBCL runtime, (backtrace) result items now look like this (last two items):
    22: (SB-THREAD::INITIAL-THREAD-FUNCTION)
    23: ("foreign function: call_into_lisp +#x6C")
    24: ("foreign function: funcall0 +#x2D")
My code in (SAP-FOREIGN-SYMBOL) may be improved in many ways. Well, at least it finds something, and its result is correct in the common case when the caller is, indeed, an exported function of the SBCL runtime.
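The lookup itself is simple enough to model in C. This sketch is not the Lisp code in SAP-FOREIGN-SYMBOL, and describe_foreign_pc() and the symbol table are hypothetical. It uses the same ``highest address still lower than the argument'' rule and the same output format as the backtrace items shown above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

struct exported_symbol { unsigned long addr; const char *name; };

/* Find the exported symbol with the highest address still lower than `pc`
   and format "foreign function: NAME +#xOFFSET".  `syms` need not be sorted.
   Returns 0 (writing nothing) if no symbol lies below `pc`. */
static int describe_foreign_pc(unsigned long pc,
                               const struct exported_symbol *syms, size_t n,
                               char *buf, size_t bufsize)
{
    const struct exported_symbol *best = NULL;
    for (size_t i = 0; i < n; i++)
        if (syms[i].addr < pc && (best == NULL || syms[i].addr > best->addr))
            best = &syms[i];
    if (best == NULL)
        return 0;
    snprintf(buf, bufsize, "foreign function: %s +#x%lX",
             best->name, pc - best->addr);
    return 1;
}
```

The model also exposes the weakness: a non-exported static function is attributed to whichever exported symbol precedes it, with a misleadingly large offset.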
C calling conventions on X86 require the FPU stack to be empty on call entry and provide 0 or 1 values on call exit (the latter when the function result is returned on the FPU stack). SBCL code, when it works with the X86 FPU, requires the FPU stack to be full (no empty registers, as described by the tag word).
SBCL uses FPU rather frequently during its normal work, even when the task running has no explicit floating-point calculations; hash table internals use FP operations, for example. On the other hand, a vast majority of runtime C code, as well as a lot of available libraries, don't use FPU at all (i.e. they are unaffected by FPU stack and don't modify it).
I had a bad feeling about those FLDZs and FSTPs on each foreign call. Assuming that Windows handles context switches like other modern OSes on X86, there are two possible scenarios. Either it sees SBCL as eligible for lazy FPU context loading (thus setting the TS bit in CR0 while SBCL is running), or it doesn't (thus making SBCL the `owner' of the FPU instantly). This goes unnoticed when no other tasks are actively using the FPU. When there are such tasks (even SBCL's own threads), however, it could cost a couple of context switches per C call entry/exit.
The right thing to do is, of course, fixing SBCL compiler so it doesn't require full FPU stack. Unfortunately, back then I was more familiar with SEH and X86 FPU than with SBCL compiler internals; that's why I added some exception-handling code to do an equivalent of lazy FPU switching in userspace.
The idea is that when the :INVALID trap is enabled, we catch both a Lisp attempt to pop from the empty FPU stack and a C attempt to push onto the full one. The trickiest part of the solution is recovering after an FPU exception, because execution can't be resumed by simply returning ExceptionContinueExecution. Instead, the failed FPU opcode has to be retrieved and reinterpreted/restarted separately.
As long as no one disables the :INVALID trap, the solution works as intended, allowing Lisp code to run with an empty FPU stack (and C code to run with a full FPU stack) until the CPU hits an FP instruction.
This part of code is to be dropped without regret as soon as the compiler is fixed.
Windows 2000 SP3 and above provide an interesting feature: if we reserve memory with the MEM_WRITE_WATCH flag, all written-to pages in the reserved region are tracked. The GetWriteWatch() function may be used to retrieve a list of dirty pages. This API is documented as intended for GCs, and is actually used, when available, e.g. in Boehm GC and in .NET.
The tls63 branch now detects the presence of GetWriteWatch and uses it for the same purpose. The old approach was to write-protect pages of the dynamic space and then trap into SEH to unprotect a written page. GC requests for write protection are now translated into ResetWriteWatch(), clearing the list of written-to addresses. At the beginning of collect_garbage(), all written-to pages that are `write protected' this way (per the GC page table) are marked as unprotected.
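The bookkeeping can be simulated with two flag arrays in a hypothetical C model. os_dirty stands in for the OS write-watch state, i.e. what GetWriteWatch() would report. gc_protect() corresponds to translating a write-protect request into ResetWriteWatch(), and gc_sync_write_watch() corresponds to the step at the start of collect_garbage().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NPAGES 8

/* gc_wp: the GC page table's write-protect flag.
   os_dirty: the write-watch state, set by the OS on the first write to a
   page without raising any exception. */
static bool gc_wp[NPAGES];
static bool os_dirty[NPAGES];

/* Mutator write: no fault with write watch, the OS just records it. */
static void mutator_write(size_t page)
{
    assert(page < NPAGES);
    os_dirty[page] = true;
}

/* GC asks to "write protect" a page: nothing is really protected; earlier
   writes are forgotten (ResetWriteWatch) and the flag is set. */
static void gc_protect(size_t page)
{
    assert(page < NPAGES);
    os_dirty[page] = false;
    gc_wp[page] = true;
}

/* Start of collect_garbage(): query the dirty set (GetWriteWatch) and clear
   the flag of every "protected" page written since.  Returns how many. */
static size_t gc_sync_write_watch(void)
{
    size_t n = 0;
    for (size_t p = 0; p < NPAGES; p++)
        if (os_dirty[p] && gc_wp[p]) {
            gc_wp[p] = false;
            n++;
        }
    return n;
}
```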
As far as I remember, write tracking is not just sugar on top of the same wp-fault-unwp sequence (done in the kernel), but a CPU-level feature: the page table is modified on the first write without causing any exception. There are several possible strategies for using this API, and the one I've selected is not necessarily the best.
Unfortunately, write tracking is unavailable for memory regions populated with MapViewOfFile(). Therefore it's used for all dynamic-space pages above the memory-mapped core, while traditional SEH (and real write protection) is used inside the memory-mapped core space. There are some alternatives to consider here as well.
Following SBCL upstream changes for other OSes, I switched to a 32 KiB page size as the default for Win32, fixing the code that assumed os_vm_page_size to be the system page size. If code simplicity were a priority, I would have made it 64 KiB, the allocation unit. However, I considered it a good thing in itself to debug and test the case where BACKEND_PAGE_BYTES, dwPageSize and dwAllocationGranularity are all unequal.
Thus allocation granularity imposes no restriction on BACKEND_PAGE_BYTES: every size from 4 KiB up is still available.
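With three unequal sizes, every conversion has to name its unit explicitly. The following small C sketch uses the example values from the text: 32 KiB GC pages, 4 KiB system pages and 64 KiB allocation granularity. The helper names are hypothetical, and all sizes are assumed to be powers of two.

```c
#include <assert.h>
#include <stddef.h>

/* Example sizes from the text; all assumed to be powers of two. */
enum {
    BACKEND_PAGE_BYTES     = 32 * 1024,
    SYSTEM_PAGE_SIZE       = 4 * 1024,    /* dwPageSize */
    ALLOCATION_GRANULARITY = 64 * 1024    /* dwAllocationGranularity */
};

static size_t round_down(size_t x, size_t unit) { return x & ~(unit - 1); }
static size_t round_up(size_t x, size_t unit) { return round_down(x + unit - 1, unit); }

/* Offset (from the dynamic-space start) of the allocation unit containing
   GC page `gc_page`: reservations and mappings must start at such offsets. */
static size_t granule_of_gc_page(size_t gc_page)
{
    return round_down(gc_page * (size_t)BACKEND_PAGE_BYTES, ALLOCATION_GRANULARITY);
}

/* Number of system pages a protection change on one GC page must cover. */
static size_t system_pages_per_gc_page(void)
{
    return BACKEND_PAGE_BYTES / SYSTEM_PAGE_SIZE;
}
```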
The tls63 branch allocates the chunk around struct thread (i.e. dynamic values, alien space, binding stack...) using MEM_RESERVE, precommitting only a couple of pages here and there when they're known to be used. All the rest is committed by the exception handler on demand.
As for the control stack, which can't be part of struct thread on win32, this branch:
struct thread for the control stack, as we can't move it anyway (and, considering DEP, as described in the section above, we definitely can't move it);
CreateThread() use default executable settings for thread stack, reserving 2 MiB but committing one page.
There is a problem with the lazy-commit approach to the alien stack: some system functions verify user buffers and reject uncommitted memory without running our exception handler. A twofold workaround is provided. (1) The VOP used for allocation always `touches' the topmost byte allocated. (2) On a page fault in the alien stack, the exception handler also commits all intermediate pages up to the bottom.
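Both halves of the workaround fit in a small simulation. This is a hypothetical C model rather than SBCL code. Depth is measured in bytes from the stack base, and a boolean per page stands for ``committed''. touch() plays the role of a memory access that faults on an uncommitted page. ``Topmost byte'' is interpreted here as the most recently allocated byte.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define STACK_PAGES 16
#define PAGE 4096

/* Page i covers depths [i*PAGE, (i+1)*PAGE) measured from the stack base. */
static bool committed[STACK_PAGES];
static size_t sp_depth;              /* bytes currently allocated */

/* Exception-handler half: commit the faulting page and every page between
   it and the base, so no uncommitted hole is left behind. */
static void on_page_fault(size_t page)
{
    assert(page < STACK_PAGES);
    for (size_t p = 0; p <= page; p++)
        committed[p] = true;
}

/* A memory access that faults on an uncommitted page. */
static void touch(size_t depth)
{
    size_t page = depth / PAGE;
    if (!committed[page])
        on_page_fault(page);
}

/* Allocation half: after moving the stack pointer, touch the most recently
   allocated byte, so the whole area is committed before anyone (e.g. a
   system call that validates buffers) looks at it. */
static size_t alien_stack_alloc(size_t bytes)
{
    assert(bytes > 0 && sp_depth + bytes <= (size_t)STACK_PAGES * PAGE);
    size_t start = sp_depth;
    sp_depth += bytes;
    touch(sp_depth - 1);
    return start;
}
```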
File buffer allocation code (the same in tls63 and upstream SBCL) ends up in os_validate and thus inherits the 64 KiB allocation granularity. Until it's modified to use malloc() instead, smaller FD stream buffers on Win32 waste space (15/16 of it for the original default of 4096 bytes).
For now, tls63 uses 64 KiB as the default buffer size. Switching to malloc() is equally easy, of course (and probably should be done anyway).
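The 15/16 figure is plain rounding arithmetic: a 4096-byte request rounded up to a 64 KiB allocation unit leaves 61440 of 65536 bytes unused. A trivial C helper (hypothetical name) makes this explicit.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes lost when a buffer request is rounded up to whole allocation units,
   as happens when buffers come from os_validate (64 KiB on Win32). */
static size_t wasted_bytes(size_t request, size_t granularity)
{
    assert(granularity > 0 && request > 0);
    size_t reserved = (request + granularity - 1) / granularity * granularity;
    return reserved - request;
}
```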
Consider SBCL's WAITQUEUE and MUTEX. They are implemented with futex_wait() and futex_wake(), which are implemented with pthread_mutex() and pthread_cond_timedwait(), which in turn are implemented with (...) some Win32 API. When I thought about this, there seemed to be way too many layers. Implementing Win32 futexes directly resulted in better performance and, more importantly, better support for interrupts. There is no `uninterruptible quantum' in the new futex implementation, and pthread_kill(), now part of the same module, is able to do its best.
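For reference, here are the semantics the futex layer is modeled on. They are shown with the Linux system call, not with the Win32 implementation discussed here, which is not reproduced. The wrappers are hypothetical. They are deliberately not named futex_wait()/futex_wake(), to avoid confusion with SBCL's own functions. This code is Linux-only.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <linux/futex.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/* Sleep only if *addr still equals `expected`; otherwise fail at once with
   EAGAIN.  A NULL timeout means "wait indefinitely". */
static long linux_futex_wait(int *addr, int expected, const struct timespec *timeout)
{
    return syscall(SYS_futex, addr, FUTEX_WAIT_PRIVATE, expected, timeout, NULL, 0);
}

/* Wake up to `n` threads waiting on addr; returns how many were woken. */
static long linux_futex_wake(int *addr, int n)
{
    return syscall(SYS_futex, addr, FUTEX_WAKE_PRIVATE, n, NULL, NULL, 0);
}
```

The value check inside the wait call is what makes the futex pattern race-free: a waker that changes the word before the waiter sleeps simply causes the wait to return immediately.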
Done exactly as planned and stated in the initial announcement page.
The current implementation of safepoints, interrupts, and GC signalling in tls63 has departed significantly from Dmitry Kalyanov's original code. It should be considered an experiment: the result may eventually be thrown away and rolled back to the original.
However, there are some logical changes, discovered during this experiment, that we should apply to the original code in this case:
call_into_lisp(). Calls into Lisp from the runtime are normally done using StaticSymbolFunction(). Though the symbol is static, the function may be moved by GC after StaticSymbolFunction() is called but before call_into_lisp is entered. The existing calls to fake_foreign_function_call() are good markers for the real points where the unsafe region should begin: blocking GC signals (in the Unix world) is the logical equivalent of becoming GC-unsafe (for the Win32 safepoint-based build). Calls to alloc_sap() and other pa_alloc() wrappers can't be made from a supposedly safe region either.
gc_safepoint() call waiting until the current GC ends. However, as long as nothing is done to prevent another GC from starting between gc_safepoint() and call_into_lisp(), it may happen.
gc_stop_the_world(). E.g. SuspendThread(), GetThreadContext() and ResumeThread() will cut it: not when GCing (that's too late), but when stopping the world. (I avoid SuspendThread() et al. altogether in my code. It's a debatable design decision caused by portability issues, e.g. my desire to support Wine, but then I have to use memory barriers on foreign call entry/exit.)