Declarations in compilers (feedback welcome)

Discussion:

Juan Jose Garcia-Ripoll

2011-12-29 10:24:37 UTC

After struggling mentally with this for a few weeks, I would like to have
some consultation before I introduce some changes in ECL -- not that I
expect many users here, but at least some implementor-fellows and power
users of other implementations.

My concerns right now relate to how declarations should be used by a
compiler, and in particular how declarations interact with SAFETY levels.
Please correct me if I am wrong, but I have seen more or less the following
approaches

[a]- Most implementations blindly believe declarations below a certain
safety level. Above it, they seem more or less useless.

[b]- SBCL takes declarations (and THE) as type assertions. For instance, in
(LET ((Y (FOO X))) (DECLARE (FIXNUM Y)) ...) the assignment to Y would be
checked to be a FIXNUM. This means the type declaration is actually
enforced and believed, and the checks are dropped only at SAFETY 0 (*)
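
A minimal, portable sketch of what "declaration as assertion" amounts to in
user code. The macro LET-CHECKED and the DEMO-* functions are hypothetical
names used only for illustration; they are not part of SBCL or ECL, and a real
compiler does this internally rather than through a macro.

```lisp
;; Hypothetical macro: bind VAR to VALUE, assert TYPE at binding time,
;; and only then let the body rely on the declaration.
(defmacro let-checked ((var value type) &body body)
  `(let ((,var ,value))
     (check-type ,var ,type)               ; signals TYPE-ERROR on mismatch
     (locally (declare (type ,type ,var))  ; believed only after the check
       ,@body)))

;; Stand-in for the FOO of the example above.
(defun demo-source (x) x)

(defun demo-consumer (x)
  (let-checked (y (demo-source x) fixnum)
    (1+ y)))
```

Calling (DEMO-CONSUMER 1) returns 2, while (DEMO-CONSUMER "a") signals a
TYPE-ERROR instead of running the body with a bogus value.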

In both cases one ends up with a model in which in order to truly believe a
declaration and have no extra burden (assertions), one has to drop to
SAFETY 0 in all code that is involved with it, which is a mess, because it
might inadvertently affect other parts of the code. It is for this reason
that I am considering an alternative model for ECL which would grade safety
as follows

- Type declarations are always believed
- SAFETY >= 1 adds type checks to enforce them.
- SAFETY = 0, no checks.
- SAFETY = 1, the special form THE or additional proclamations on the
functions can be used to deactivate the check. As in (LET ((Y (THE FIXNUM
(FOO X)))) ...)
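
The grading above can be read as a small decision table. The following is
only a sketch of that table, not ECL code: EMIT-TYPE-CHECK-P is a hypothetical
name, and the behaviour at SAFETY 2 and 3 (checks stay even with THE) is an
assumption, since the list only spells out levels 0 and 1.

```lisp
;; Hypothetical predicate: should a check be emitted for an assignment to a
;; declared variable? WRAPPED-IN-THE-P is true when the assigned value is
;; written as (THE type form) or comes from a proclaimed function.
(defun emit-type-check-p (safety wrapped-in-the-p)
  (cond ((zerop safety) nil)                     ; SAFETY = 0: no checks
        ((= safety 1) (not wrapped-in-the-p))    ; SAFETY = 1: THE opts out
        (t t)))                                  ; assumed: always check above 1
```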

This would allow one to keep most code safe, while deactivating some checks
when they are really known to be true (**). Do you think this is
useful/useless? The problem I see with this approach is that all code
around is written for model [a] or [b], but I could not come up with
something more sensible so far.

Juanjo

(*) Actually the checks are also deactivated when SBCL can infer the type
of the value that is assigned to Y. This is somewhat contradictory, because
(SETF Y (THE FIXNUM (FOO X))) would still generate a check, but proclaiming
FOO to return a FIXNUM would completely bypass the check.

(**) Yes, indeed I know that LOCALLY exists for a reason, but it does more
than THE. For instance, if I write (LOCALLY (DECLARE (OPTIMIZE (SAFETY 0)))
(THE FIXNUM (FOO (SLOT-ACCESSOR X)))), this also influences the safety of the
code that accesses the structure, which is not good.

P.S.: Thanks to Paul Khuong for pointing out that SBCL behaves differently
w.r.t. declarations.

--
Instituto de Física Fundamental, CSIC
c/ Serrano, 113b, Madrid 28006 (Spain)
http://juanjose.garciaripoll.googlepages.com

Nikodemus Siivola

2011-12-29 12:33:27 UTC

On 29 December 2011 12:24, Juan Jose Garcia-Ripoll

Post by Juan Jose Garcia-Ripoll
[b]- SBCL takes declarations (and THE) as type assertions. For instance, in
(LET ((Y (FOO X))) (DECLARE (FIXNUM Y)) ...) the assignment to Y would be
checked to be a FIXNUM. This means the type declaration is actually enforced
and believed and only at SAFETY 0 the checks are dropped (*)

This is correct, but incomplete. At SAFETY 1 SBCL will weaken complex
assertions, e.g.

(OR (INTEGER 0 2) (INTEGER 7 10))

will be simplified to a range check for (INTEGER 0 10). At SAFETY 2 no
types are weakened. At SAFETY 3 all the extra bells and whistles
required by ANSI come into play.
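
To make the effect of weakening concrete, here is a small portable sketch
that compares the precise type with the weakened range using TYPEP. The type
names PRECISE-RANGE and WEAKENED-RANGE and the function
CLASSIFY-RANGE-CHECKS are hypothetical; the code does not inspect what SBCL
actually emits, it only shows which values each check would let through.

```lisp
(deftype precise-range () '(or (integer 0 2) (integer 7 10)))
(deftype weakened-range () '(integer 0 10))

;; Returns a two-element list: the verdict of the precise check, then the
;; verdict of the weakened check.
(defun classify-range-checks (n)
  (list (if (typep n 'precise-range) :pass :fail)
        (if (typep n 'weakened-range) :pass :fail)))
```

A value such as 5 fails the precise check but passes the weakened one, so a
weakened assertion still catches 11 or -1 but not values in the gap 3..6.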

CMUCL's approach is very similar to SBCL's, but IIRC the policy on
weakening assertions is a bit different.

Post by Juan Jose Garcia-Ripoll
In both cases one ends up with a model in which in order to truly believe a
declaration and have no extra burden (assertions), one has to drop to SAFETY
0 in all code that is involved with it, which is a mess, because it might
inadvertently affect other parts of the code. It is for this reason that I
am considering an alternative model for ECL which would grade safety as
follows

Actual cost of assertions (for SBCL generated code at least) is fairly
small. They should for the most part be branches which the static
branch prediction model gets right every time.

Post by Juan Jose Garcia-Ripoll
- Type declarations are always believed
- SAFETY >= 1 adds type checks to enforce them.
- SAFETY = 0, no checks.
- SAFETY = 1, the special form THE or additional proclamations on the
functions can be used to deactivate the check. As in (LET ((Y (THE FIXNUM
(FOO X)))) ...)
This would allow one to keep most code safe, while deactivating some checks
when they are really known to be true (**). Do you think this is
useful/useless? The problem I see with this approach is that all code around
is written for model [a] or [b], but I could not come up with something more
sensible so far.

I somewhat dislike making THE a loophole -- IMO it complicates the
mental model necessary to understand how things work, especially if it
works differently at a specific SAFETY level. SBCL has
SB-EXT:TRULY-THE for just this purpose, which is roughly equivalent
to:

(defmacro truly-the (type values)
  `(flet ((the-values () ,values))
     (declare (optimize (safety 0)))
     (the ,type (the-values))))

CMUCL has the equivalent as EXT:TRULY-THE. You may want to consider
something like that as well.
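
A usage sketch of that idea in portable code. MY-TRULY-THE is a hypothetical
user-level macro with the same shape as the rough equivalent above (it is not
SB-EXT:TRULY-THE itself), and LAST-DIGIT is an illustrative function where the
asserted type is known to hold by construction.

```lisp
;; Hypothetical portable stand-in: FORM is compiled at the surrounding
;; safety, only the THE around its result is trusted blindly.
(defmacro my-truly-the (type form)
  `(flet ((the-value () ,form))
     (declare (optimize (safety 0)))
     (the ,type (the-value))))

(defun last-digit (n)
  (declare (type integer n))
  ;; For an integer N, (MOD N 10) is always in 0..9, so the claim is safe.
  (my-truly-the (integer 0 9) (mod n 10)))
```

The point of the FLET is that the free OPTIMIZE declaration covers only the
body, so the MOD call itself is not compiled at SAFETY 0.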

I know that when I write (UNSAFE-FUN-THAT-CHECKS-NOTHING (THE FIXNUM
X)) I intend the THE as an assertion. Granted, most of the code I
write is intended for SBCL-only consumption, so this is probably a
moot point. Still, loading a bunch of stuff from Quicklisp and
instrumenting the compiler to see how often THEs like that occur
might be instructive.

At the end, as long as SAFETY 0 = trust everything blindly and SAFETY
3 = check everything, I think you're well within the bounds of custom
and sanity if you choose to make SAFETY 1 a bit magical.

Post by Juan Jose Garcia-Ripoll
(*) Actually the checks are also deactivated when SBCL can infer the type of
the value that is assigned to Y.

This is actually a major point for us. Because SBCL open codes /
partial-evaluates things rather aggressively, and has a fairly
extensive derivation machinery, in idiomatic Lisp code with type
declarations type checks mostly occur only for function arguments,
return values, and iteration variables -- and the cost of those type
checks is trivial for the most part. What sometimes makes them look
more expensive than they actually are is the suboptimal representation
selection they cause.

Post by Juan Jose Garcia-Ripoll
This is somewhat contradictory, because (SETF Y (THE FIXNUM (FOO X))) would still
generate a check, but proclaiming FOO to return a FIXNUM would completely bypass
the check.

Yes, but if you proclaim FOO to return a fixnum before compiling it,
then FOO will take care of that assertion. (Trusting proclamations
made after a function has been compiled is considered a long-standing
bug, not a feature.)
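
A small sketch of the ordering being described: the FTYPE proclamation comes
before the definition, so the function itself is responsible for honouring it.
PARSE-COUNT and NEXT-COUNT are hypothetical names; whether the caller's check
on Y is actually elided depends on the implementation and its policy.

```lisp
;; Proclaim the return type BEFORE compiling the function.
(declaim (ftype (function (t) fixnum) parse-count))

(defun parse-count (x)
  ;; Always returns a fixnum, so the proclamation is honest.
  (if (typep x 'fixnum) x 0))

(defun next-count (x)
  (let ((y (parse-count x)))
    (declare (fixnum y))   ; a compiler may derive this from the proclamation
    (1+ y)))
```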

Cheers,

-- Nikodemus

Juan Jose Garcia-Ripoll

2011-12-29 14:04:48 UTC

On Thu, Dec 29, 2011 at 1:33 PM, Nikodemus Siivola

Post by Nikodemus Siivola
On 29 December 2011 12:24, Juan Jose Garcia-Ripoll

I somewhat dislike making THE a loophole [...]
CMUCL has the equivalent as EXT:TRULY-THE. You may want to consider
something like that as well. [...]
At the end, as long as SAFETY 0 = trust everything blindly and SAFETY
3 = check everything, I think you're well within the bounds of custom
and sanity if you choose to make SAFETY 1 a bit magical.

I believe there can be a compromise between safety and speed. Many macros
and much user code can guarantee properties of the code that the compiler
will never be able to infer, and it is in my opinion unfortunate that all
the safety checks have to be removed to take full advantage of those
guarantees.

I also understand that some of the type checks are cheap, especially if the
compiler is allowed to "simplify" them, as SBCL does for SAFETY=1, but the
result is code bloat: lots of avoidable checks, branches and error
messages that we could do without, without actually sacrificing safety.
That does not seem like a bad case for something in between both extremes.

Conceptually, in the model above, I do not see the THE as a loophole, but
rather as two different things: variable declarations = type checked
assignments, value declarations = compiler hints. For instance, if I invoke
a function with a THE argument, SBCL will not generate a check: (FOO (THE
FIXNUM X)) is just (FOO X), am I wrong? (I just checked in Ubuntu's SBCL)
Then in that sense THE does not really make much sense at all, because the
type checks are introduced by assignments to variables, not by this special
form.

Juanjo

Nikodemus Siivola

2011-12-29 15:08:20 UTC

On 29 December 2011 16:04, Juan Jose Garcia-Ripoll

Post by Juan Jose Garcia-Ripoll
Conceptually, in the model above, I do not see the THE as a loophole, but
rather as two different things: variable declarations = type checked
assignments, value declarations = compiler hints.

Fair enough. SBCL disagrees, but it and CMUCL stand apart from most
implementations when it comes to handling of types.

Post by Juan Jose Garcia-Ripoll
For instance, if I invoke
a function with a THE argument, SBCL will not generate a check: (FOO (THE
FIXNUM X)) is just (FOO X), am I wrong? (I just checked in Ubuntu's SBCL)

Actually, SBCL /should/ generate the check, unless you are using the
interpreter. If it didn't then I'm guessing the Ubuntu version is an
old one:

CL-USER> (defun foo (x) x)
CL-USER> (foo (the fixnum t))
; in: FOO (THE FIXNUM T)
; (THE FIXNUM T)
;
; caught WARNING:
; Constant T conflicts with its asserted type FIXNUM.
; See also:
; The SBCL Manual, Node "Handling of Types"
;
; compilation unit finished
; caught 1 WARNING condition
; Evaluation aborted on #<SIMPLE-TYPE-ERROR expected-type: FIXNUM datum: T>.

...plus entry to debugger is the expected behaviour.

Both THE and an assignment to a variable whose type has been declared
generate an identical cast node in SBCL's IR.

Cheers,

-- Nikodemus

Gail Zacharias

2011-12-29 16:46:49 UTC

Using declarations vs using THE is often a stylistic consideration, and
while you may be able to get ECL-only users to accept your additional
semantics, you might have trouble getting maintainers of portable libraries
to observe this arbitrary distinction.

Why not let SPEED into the mix? E.g. if SPEED > SAFETY then don't compile
typechecks.
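
A sketch of how such a rule could look in user code. Portable Common Lisp has
no standard way to query the current OPTIMIZE policy, so the hypothetical
macro POLICY-THE below takes SPEED and SAFETY as literal keyword arguments;
a compiler would of course read them from its own policy instead.

```lisp
;; Hypothetical macro: emit a real check unless SPEED > SAFETY.
;; SPEED and SAFETY must be literal numbers, because they are compared
;; at macroexpansion time.
(defmacro policy-the (type form &key (speed 1) (safety 1))
  (if (> speed safety)
      `(the ,type ,form)                 ; trusted, no check emitted
      (let ((value (gensym "VALUE")))
        `(let ((,value ,form))
           (check-type ,value ,type)     ; signals TYPE-ERROR on mismatch
           ,value))))

(defun demo-checked (x)
  (policy-the fixnum x :speed 1 :safety 1))

(defun demo-unchecked (x)
  (policy-the fixnum x :speed 3 :safety 1))
```

With these definitions (DEMO-CHECKED "a") signals a TYPE-ERROR, whereas
DEMO-UNCHECKED must only ever be called with a fixnum.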

Juan Jose Garcia-Ripoll

2011-12-29 17:50:05 UTC

Post by Gail Zacharias
Using declarations vs using THE is often a stylistic consideration, and
while you may be able to get ECL-only users to accept your additional
semantics, you might have trouble getting maintainers of portable libraries
to observe this arbitrary distinction.

What I mean is precisely that the current semantics is really inconvenient
for library writers. I also believe that this change can be introduced at
no cost for library maintainers, because it effectively does not change the
semantics at the safety levels at which code is typically compiled (0 or
the default ones). Let me try to explain it further below.

Post by Gail Zacharias
Why not let SPEED into the mix? E.g. if SPEED > SAFETY then don't compile
typechecks.

The issue is not SPEED, it is safety. Safety need not be sacrificed to gain
speed. Moreover, the problem with this SAFETY vs SPEED thing is that it has
no granularity at all. It is a simplistic view of the world which assumes
that all code is the same.

Let me explain the situation with an ordinary library, say a regular
expression parser. Somebody who writes the library has to understand that
there are various types of routines or sections of code that she is going
to write:

2- Code that handles user input (strings, lists which might be malformed,
etc)
1- Code that handles internal data (structures that will not change, sealed
classes, lists of known lengths)
0- Small sections of code that handle internal data and need speed

I would expect that only 0 should be compiled with SAFETY = 0, and
explicitly marked so. However, we also have 1 and 2, which typically
coexist and sometimes appear intermixed in the same function. Here one
must either resort to high safety levels for everything, or end up
wrapping different sections of code in (LOCALLY (... UNSAFE ...) ...)
declarations. This is not good in my opinion.

The problem is that we are implicitly advocating that SAFETY = 0 is good
for everything once the code is mature enough and you need speed, but such
a level implies much more than believing type declarations: it typically
implies that the arguments to functions are not checked at all. Take (CAR
(THE CONS X)). There are multiple ways in which this CAR call can be
inlined. To get the optimal case in this situation, where I am telling the
compiler "believe me, this is a CONS", I may be opening a can of worms by
lifting all type checks in other uses of CAR.
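
A sketch of the scoping problem, with hypothetical function names. Whether a
given compiler actually drops or keeps each check is implementation
dependent; the code only shows which forms fall inside the SAFETY 0 region.

```lisp
;; Wide region: BOTH calls to CAR are compiled at SAFETY 0, although only
;; the outer one was meant to be trusted.
(defun first-of-first-wide (x)
  (locally (declare (optimize (safety 0)))
    (car (the cons (car x)))))

;; Narrow region: the inner CAR stays at the surrounding safety level and
;; only the access to the known CONS is unchecked.
(defun first-of-first-narrow (x)
  (let ((inner (car x)))
    (locally (declare (optimize (safety 0)))
      (car (the cons inner)))))
```

Both return the same value for well-formed input such as '((1) 2); they
differ only in what may happen when X is not a list.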

Why do I believe this does not really change the semantics in a significant
way? First of all because, apart from SBCL's declaration policy, there is no
explicitly written commitment in any of the free (natively compiling)
Common Lisps out there about the meaning of optimization settings. In such
a landscape, I would guess that library maintainers currently more or less
follow the approach of lowering safety levels to 0 in speed-critical code
and leaving it at some default value that works with their favorite
implementation elsewhere. See for instance CL-PPCRE

(defvar *standard-optimize-settings*
  '(optimize speed (safety 0) (space 0) (debug 1) (compilation-speed 0)
    #+:lispworks (hcl:fixnum-safety 0))...

From the user's point of view, the approach seems to be: if the safety level
is zero, the compiler will produce fast code; with the default settings, I
will get type checking. PCL also suggests this, and it seems to be a common
entry point for many new users. Moreover, users cannot rely on CMUCL's,
SBCL's or ECL's type checking behavior for function arguments, because it is
not really standard, and manual type checking is required in most
libraries.

OTOH, if one comes up with a set of sensible settings that users may choose
from and which may be applicable throughout the library, without disrupting
the current behavior at SAFETY 0 or default, then the cost of adoption is
zero.

I am just trying to figure out a non-disruptive way of choosing those
settings, documenting them (
http://ecls.sourceforge.net/new-manual/ch02.html#ansi.declarations.optimize),
and perhaps even sparking a debate about it, so that there may be some more
uniformity throughout implementations.

Cheers,

Juanjo

Martin Simmons

2011-12-30 16:29:05 UTC

Archived-At: <http://permalink.gmane.org/gmane.lisp.cl-pro/606>

I don't like this because it contradicts the CL spec:

"The meaning of a type declaration is equivalent to changing each reference to
a variable (var) within the scope of the declaration to (the typespec var),
changing each expression assigned to the variable (new-value) within the scope
of the declaration to (the typespec new-value), and executing (the typespec
var) at the moment the scope of the declaration is entered."

(from http://www.lispworks.com/documentation/HyperSpec/Body/d_type.htm).
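
The rewriting described in that passage can be illustrated with two
functions that should mean the same thing under the spec. The names
HALVE-DECLARED and HALVE-WITH-THE are hypothetical, and this is only a
sketch of the equivalence, not a statement about what any compiler emits.

```lisp
;; Version 1: a type declaration on the variable.
(defun halve-declared (x)
  (let ((y x))
    (declare (type fixnum y))
    (setq y (ash y -1))
    y))

;; Version 2: the same code with the declaration expanded by hand:
;; THE on entry to the scope, on every assigned value, and on every reference.
(defun halve-with-the (x)
  (let ((y (the fixnum x)))
    (setq y (the fixnum (ash (the fixnum y) -1)))
    (the fixnum y)))
```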

In LispWorks, type declarations and THE forms have the same semantics and they
are checked when safety = 3 and debug = 3. The reason for involving debug is
that the checking code can be large and relatively slow.

__Martin