Re: [c++-pthreads] thread-safety definition




Mathieu Lacage wrote:

On Thu, 2004-01-08 at 12:34, Dave Butenhof wrote:

1) "inside cancelation": This is basically ExitThread (win32 name). It
exists on every platform I know of that supports some form of threads.
Unfortunately, its semantics vary a lot from one platform to the
other. On win32, it will not invoke any thread-specific cleanup
handlers (neither C++ exceptions nor SEH are involved). On BeOS
(exit_thread), it will behave just like on windows. On POSIX
(pthread_exit) systems, it will invoke the thread-specific cancelation
handlers.

The term "cancellation" seems heavy here. This is just a voluntary termination. But, yes, there are similar properties -- certainly from the point of view of the rest of the frames on the call stack at the time.
Indeed. For a C++ POSIX binding, I would assume you might want to make
such a function throw an exception caught by the thread-creation
function to unwind the stack properly. Or is this some kind of wild
stupid idea?
One example: on Tru64 UNIX and OpenVMS, pthread_exit() raises an exception, which is distinct from the exception provoked by pthread_cancel(), but with similar characteristics. Specifically, that an UNCAUGHT exception will terminate only the thread rather than the process (it's implicitly caught in the thread library's internal "thread base" routine), and that it's "generally improper" (though not impossible nor even illegal) for any other agency to finalize propagation of the exception.

It's an exception for exactly the same reason as cancel: so that each active frame on the stack has the opportunity to perform appropriate cleanup of resources before termination.

In the "pure POSIX model", without exceptions, both pthread_exit() and cancellation provoke sequential LIFO execution of a stack of "POSIX cleanup handlers" designated by the pthread_cleanup_push() operation. The intended implementation of pthread_cleanup_push() (and our actual implementation) is as a simple macro that initiates an exception scope, analogous to a C++ "try {".
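
The LIFO behaviour described above can be illustrated with a minimal sketch. It assumes a POSIX system; the names record_handler, exiting_thread, and run_exit_demo are hypothetical and exist only for this illustration. The thread pushes two cleanup handlers and then terminates voluntarily with pthread_exit(), so the handler pushed last should run first.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Order in which the cleanup handlers actually ran (illustrative state). */
static int run_order[2];
static int run_count;

/* Hypothetical cleanup handler: records the id it was registered with. */
static void record_handler(void *arg)
{
    assert(run_count < 2);               /* bounds check for the demo array */
    run_order[run_count++] = *(int *)arg;
}

static void *exiting_thread(void *unused)
{
    static int first = 1, second = 2;
    (void)unused;

    pthread_cleanup_push(record_handler, &first);   /* pushed first  */
    pthread_cleanup_push(record_handler, &second);  /* pushed second */

    pthread_exit(NULL);   /* voluntary termination: runs second, then first */

    /* Not reached, but push and pop must pair up lexically. */
    pthread_cleanup_pop(0);
    pthread_cleanup_pop(0);
    return NULL;
}

/* Returns the handler ids in execution order, encoded as two digits. */
int run_exit_demo(void)
{
    pthread_t thread;

    run_count = 0;
    if (pthread_create(&thread, NULL, exiting_thread, NULL) != 0)
        return -1;
    if (pthread_join(thread, NULL) != 0)
        return -1;
    return run_order[0] * 10 + run_order[1];
}
```

Calling run_exit_demo() should return 21, meaning the handler pushed second ran before the one pushed first. On most systems this needs to be built with -pthread.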

2) "outside cancelation": There are two kinds of "outside cancelation":

	2.1) "async cancelation": The OS removes the thread from its list of
tasks to schedule and does nothing to clean up the thread's resources.
This is the most extreme and most useless feature of a thread library.
BeOS and win32 provide it. POSIX does not provide it.
I should add: win32 (TerminateThread), BeOS (kill_thread).

POSIX already defines "async cancel", as a mode where posting a cancel to a thread will cause the cancellation to be delivered at any arbitrary time supported by the OS and hardware. (Usually on the next clock tick, though that's a "common implementation" rather than any rule or even recommendation.)
OK. I guess this definition of "POSIX async cancel" was already
explained on the list before but I missed it. I believe this POSIX async
cancel is similar enough (at least, it feels as unsafe to use) to
"abort" that we could count it in section 2.1. What do you think?
No, not really. POSIX async cancel is still an exception, allowing hierarchical isolated cleanup of each active frame on the stack. It's just that, because of the resource ownership dilemma, there's no way to safely use async-cancel in "general code". It has to be restricted to areas of code that do not acquire or release resources, including any calls to external functions that might.

Nevertheless, async cancel CAN be used safely if you're careful, without disrupting the operation of the process. This is not true of TerminateThread, or the hypothetical pthread_abort() proposal, which immediately deschedule the victim thread and abandon any resources it might own -- including heap (which can cause memory leaks) and synchronization objects (which, far worse, is almost guaranteed to cause deadlocks).

And note that it's OK to allocate heap, or lock a mutex, and then enable async cancel for some section of code, disabling async cancel before freeing the memory or releasing the mutex. In such a sequence, the cleanup handlers invoked by async cancel DO know the state of the resources (they are "acquired"), and can clean up. You simply can't enable async cancel across a call that allocates or frees heap, locks or unlocks a mutex, because the cleanup handler couldn't tell whether the operation had completed.
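
The pattern in the preceding paragraph might look like the following minimal sketch, assuming a POSIX system. All names (compute_thread, release_buffer, run_async_demo, stop_requested) are hypothetical. The buffer is acquired and its cleanup handler registered while cancellation is still deferred; async cancel is enabled only around a loop that calls nothing and touches no resources, so the handler always knows the buffer is in the "acquired" state.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdlib.h>

static atomic_int region_state;     /* 0 = not yet, 1 = in async region, -1 = failed */
static atomic_int buffer_released;  /* set by the cleanup handler */
static atomic_int stop_requested;   /* normal-exit flag, never set in this demo */

/* Cleanup handler: the buffer is known to be allocated when this runs. */
static void release_buffer(void *buffer)
{
    assert(buffer != NULL);
    free(buffer);
    atomic_store(&buffer_released, 1);
}

static void *compute_thread(void *unused)
{
    int old_type;
    volatile unsigned long spin = 0;
    char *buffer;
    (void)unused;

    buffer = malloc(4096);              /* acquire while cancel is deferred */
    if (buffer == NULL) {
        atomic_store(&region_state, -1);
        return NULL;
    }
    pthread_cleanup_push(release_buffer, buffer);

    pthread_setcanceltype(PTHREAD_CANCEL_ASYNCHRONOUS, &old_type);
    atomic_store(&region_state, 1);     /* plain store, no resource involved */
    while (!atomic_load(&stop_requested))
        spin++;                         /* pure computation, no calls */
    pthread_setcanceltype(old_type, &old_type);  /* disable before releasing */

    pthread_cleanup_pop(1);             /* normal path: release the buffer */
    return NULL;
}

/* Returns 1 if the thread was cancelled and its handler freed the buffer. */
int run_async_demo(void)
{
    pthread_t thread;
    void *result = NULL;

    atomic_store(&region_state, 0);
    atomic_store(&buffer_released, 0);
    if (pthread_create(&thread, NULL, compute_thread, NULL) != 0)
        return -1;
    while (atomic_load(&region_state) == 0)
        sched_yield();                  /* wait until the region is entered */
    if (atomic_load(&region_state) < 0) {
        pthread_join(thread, NULL);
        return -1;
    }
    if (pthread_cancel(thread) != 0)
        return -1;
    if (pthread_join(thread, &result) != 0)
        return -1;
    return result == PTHREAD_CANCELED && atomic_load(&buffer_released) == 1;
}
```

The malloc() and free() calls both sit outside the async-cancel region, which is the whole point: moving either one inside the region would leave the handler unable to tell whether the operation had completed.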

In contrast, ANY use of TerminateThread trashes the process unrecoverably, except in extremely unusual circumstances where an embedded-type application really knows precisely what the victim thread might be doing and can reliably repair any predicates and release or safely discard any resources. You can NEVER do this with a thread that might be running arbitrary library code, because you can't possibly know what resources it might own or the effect of abandoning them. (That's why pthread_abort() was rejected. While it's useful and even essential for some class of embedded system application, it's very nearly useless, and extremely dangerous, in any more general environment. Since the real value of POSIX in true embedded system design is "programmer portability", not full portability of every API, there would have been no point to including this specialized function in the general standard.)

"Cancellation" (both deferred and async) comes from the Digital "CMA" architecture (where it was called "alert"). The CMA concept derives from a less structured (but fundamentally similar) capability in the SRC research labs' Topaz thread package.
Do you know of other widely used system-level APIs which provide similar
features?
No; though that's no guarantee that some haven't cropped up somewhere.

Definition "Posix thread-safety":
---------------------------------
A library is "posix thread-safe" if it is thread-safe and
deferred-cancelation-safe.

I wouldn't tack cancel-safety onto thread-safety so intimately, although
I used the POSIX name because I thought it was the only widely deployed
system which provides this service. Maybe we should rename this to
"strong thread-safety". Maybe "deferred-cancel thread-safety"?
But my point was that it's perfectly reasonable to have POSIX thread-safety without cancel-safety. I don't see how it's relevant whether anything but POSIX also has cancel-safety.

(Async cancel is an oddity; there are, and should be, very few async-cancel-safe functions. Async-cancel regions of code cannot accommodate resource acquisition or release of any sort, as the recovery code is generally unable to determine the state of the resource.)
Yes. This is why I don't feel it's necessary to discuss it further:
so little code is concerned with it that we can ignore it altogether
for most C++ libraries.
Introducing asynchronous exceptions into C++ would be pointlessly disruptive, like introducing continuable exceptions. I'd rather not even consider it.

Even if it were supported, though, C++ is certainly free to follow the lead of POSIX. We designated only a very few functions to be async-cancel safe; and even at that I think we ended up with more than we really should have had. (I never really figured out why we ended up with pthread_cancel() being async-cancel safe, and I don't think it makes any sense. The guy who wrote the text couldn't remember either, but in the end we decided not to risk changing it.) Really, in terms of POSIX standard APIs, all you can do with async cancel enabled is to DISABLE async-cancel. I like it that way. There's no reason at all that ANY of the standard C++ runtime should be designated (or coded) to be async-cancel safe.

Nevertheless, it's quite reasonable to write a "thread-safe" special purpose application routine that doesn't deal with cancellation simply because the designer KNOWS that a thread running that code cannot be cancelled. One might even make this choice within a general purpose library in some cases -- say, for a daemon thread that could never run application code nor be identified to the application, and that therefore cannot be cancelled.
Yes. Exactly. I have written a lot of code like that. The core C++
threaded code is hidden far away from the user, who therefore cannot
"posix-defer-cancel" it. The user can't even ever see the C++ exceptions,
since they are caught with catch (...) and transformed into C error codes.
This doesn't sound like the same thing, though. Your catch(...) may prevent the cancel from doing what it SHOULD do, but it won't prevent delivery, and you've just ignored the application's cancel request. That's bad, and while it may be "cancel safe" in some trivial respect, (an unexpected cancel request won't corrupt the library state), it's not useful to anyone.

If code runs in an application thread, or a thread for which application code might have a valid handle, then that thread can be cancelled at the whim of the application. You can of course simply DOCUMENT that doing this is an error. You can say it'll be ignored, or you can say that it may arbitrarily corrupt application state; but that's not a true general purpose library.

What I'm talking about is a separate thread created within the library to which no application code could possibly have a reference. It is physically impossible for the application code to ever REQUEST cancellation. (Yeah, very little is "physically impossible", and a simple uninitialized variable could end up holding the handle of such a thread; but that's an application error against which nobody can reasonably defend.) Anyway, if the application "CAN'T" cancel the thread, and the library knows that it WON'T cancel the thread, there's no point in writing code that runs ONLY within that thread to be cancel safe.

As a conclusion to these (tentative) definitions, I believe the purpose
of this mailing list is to find a solution to develop "deferred-cancel
thread-safe" C++ libraries: simple "thread-safe" libraries do not
require special attention. If everyone could agree to the statement
above, it would probably make the discussion more productive: other
threading models which do not support async cancelation are of no
interest to the discussion and can be forgotten.
Code that cannot ever be subject to cancellation need not be cancel safe, if that's what you mean. If code was written to a thread model without cancellation, or written specifically for an environment where it would not be cancelled, that code can be brought into a new "cancellable C++" environment safely as long as that basic premise continues -- that it will not be run in a thread that's cancelled.
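
One way to keep that premise true when such code must run in a thread the application can cancel is to hold cancellation off around the call. This is a sketch of an adjacent technique, not something proposed in the thread above; it assumes a POSIX system, and legacy_work, shielded_thread, and run_shield_demo are hypothetical names. A cancel posted while cancellation is disabled stays pending and is acted on only after the code has finished.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stddef.h>

static atomic_int legacy_entered;   /* set once the legacy routine is running */
static atomic_int legacy_finished;  /* set when the legacy routine completes */
static atomic_int may_finish;       /* released by the demo driver */

/* Hypothetical routine written for a model without cancellation. */
static void legacy_work(void)
{
    atomic_store(&legacy_entered, 1);
    while (!atomic_load(&may_finish))
        sched_yield();
    atomic_store(&legacy_finished, 1);
}

static void *shielded_thread(void *unused)
{
    int old_state, ignored;
    (void)unused;

    /* Hold off cancellation so legacy_work() never sees a cancel. */
    pthread_setcancelstate(PTHREAD_CANCEL_DISABLE, &old_state);
    legacy_work();
    pthread_setcancelstate(old_state, &ignored);

    pthread_testcancel();   /* a pending cancel is delivered here */
    return NULL;            /* reached only if nobody cancelled us */
}

/* Returns 1 if the legacy code ran to completion and the cancel was honoured. */
int run_shield_demo(void)
{
    pthread_t thread;
    void *result = NULL;

    atomic_store(&legacy_entered, 0);
    atomic_store(&legacy_finished, 0);
    atomic_store(&may_finish, 0);
    if (pthread_create(&thread, NULL, shielded_thread, NULL) != 0)
        return -1;
    while (!atomic_load(&legacy_entered))
        sched_yield();
    if (pthread_cancel(thread) != 0)     /* posted while cancel is disabled */
        return -1;
    atomic_store(&may_finish, 1);
    if (pthread_join(thread, &result) != 0)
        return -1;
    assert(atomic_load(&legacy_finished) == 1);
    return result == PTHREAD_CANCELED;
}
```

Unlike swallowing the request, this only delays it: the application's cancel still takes effect once the shielded call returns.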

If people agree on this statement, the only issue I can see which
delimits the design space for the solution to this problem is whether or
not you wish to allow the C++ library calling into C code (which uses
pthreads) and/or allow C code to use the C++ library (which uses our C++
threading solution).

Maybe it would help to consider the three cases separately and try to
figure out what requirements each case creates:
	1) C++ library calls C++ code and is called by C++ code.
	2) C++ library calls into C code.
	3) C code calls C++ library.

The hard part seems to be 2) and 3) where, if you use exceptions to
propagate a cancel operation from either a cancelation point or a
pthread_exit call, you need to correctly handle the registered
cancelation handlers _and_ the C++ catch blocks in the right order. That
seems pretty hard (i.e., impossible) to me, being just a _user_ of thread
libraries.
The impact extends beyond C and C++, to every facility that deals with exceptions; Java, Ada, Modula-2+, or whatever else. The call stack must be unwound once, and all handlers, no matter how declared or in what language, called in the correct sequence. You're right -- it's nearly impossible without exceptions; yet it's trivial, natural, and all but unavoidable if everyone uses the same common exception/unwind package. (And I might point out that any "non exception" mechanism that could accomplish it would be indistinguishable from a common exception infrastructure anyway!) That's precisely why cancellation and thread exit ARE exceptions, were always intended to be exceptions, and cannot practically be anything else. ;-)

If people are not interested in 2) and 3) and just want to design a
solution for 1), then I think it will make the discussion more
productive to acknowledge it.
The ANSI C++ committee could well do that; just as POSIX and C++ have so far essentially ignored each other. However, we might look back at the recently revealed origin of the name and subject of this mailing list, which is tangled up with actual implementation on a real system, specifically gcc. THEY cannot ignore interoperability between C and C++; and nor can anyone else in the real world. So even if the committee were to decide it cannot or is unwilling to address 2 and 3, I don't think that decision would be relevant to this mailing list!

--
/--------------------[ David.Butenhof@xxxxxx ]--------------------\
| Hewlett-Packard Company       Tru64 UNIX & VMS Thread Architect |
|     My book: http://www.awl.com/cseng/titles/0-201-63392-2/     |
\----[ http://homepage.mac.com/dbutenhof/Threads/Threads.html ]---/