Discussion:
"fhash"
Daniel Weinreb
2011-06-13 19:18:51 UTC
Permalink
Friends,

I wrote a little package for "fash hash tables", which provide an
abstraction that is analogous to that of Common Lisp hash tables, but
is faster for tables with few elements, and only slightly inferior for
tables with many elements.

I did this because performance analysis showed that our system was
spending too much time in hash table operations, and using the new
package helped.

I have recently been cleaning this up, one reason being that I'd like
to open source it. The function names used to be things like getfhash
and mapfhash. Now they are like fhash:get and fhash:map-elements and
so on.

However, before I open-source it, I was to make sure it's "right". It
recently occurred to me that the package name "fhash" has problems.

Here are pros and cons of changing it that I can see.

Pro: I's not a hash table in the small-cardinality case; it's a linear
lookup. So the name is not actually accurate.

Pro: Calling such a data structure a "hash table", even as Common Lisp
does, is an abstraction violation. Whether it works by hashing is an
implementation detail. The Java collection library calls this a Map.
Python calls it a dictionary. Clojure calls it a map. Those are both
better names.

Con: Common Lisp already uses the name "hash table", so it would be
easier for existing Common Lisp programmers to see the analogy.

Advice? Thanks!

Thanks!

-- Dan
Gary King
2011-06-14 00:53:28 UTC
Permalink
FWIW, I'd call 'em maps.

I think that would be more accurate and fit better with the rest of the programming culture in the large (whatever _that_ might be!).


--
Gary Warren King, metabang.com
Cell: (413) 559 8738
Fax: (206) 338-4052
gwkkwg on Skype * garethsan on AIM * gwking on twitter
Pascal J. Bourguignon
2011-06-14 01:52:11 UTC
Permalink
Content preview: Daniel Weinreb <dlw-hpIqsD4AKlfQT0dZR+***@public.gmane.org> writes: > Friends, > > I wrote
a little package for "fash hash tables", which provide an > abstraction that
is analogous to that of Common Lisp hash tables, but > is faster for tables
with few elements, and only slightly inferior for > tables with many elements.
I did this because performance analysis showed that our system was >
spending too much time in hash table operations, and using the new > package
helped. > > I have recently been cleaning this up, one reason being that
I'd like > to open source it. The function names used to be things like getfhash
and mapfhash. Now they are like fhash:get and fhash:map-elements and >
so on. > > However, before I open-source it, I was to make sure it's "right".
It > recently occurred to me that the package name "fhash" has problems.
Here are pros and cons of changing it that I can see. > > Pro: I's not
a hash table in the small-cardinality case; it's a linear > lookup. So the
name is not actually accurate. > > Pro: Calling such a data structure a "hash
table", even as Common Lisp > does, is an abstraction violation. Whether
it works by hashing is an > implementation detail. The Java collection library
calls this a Map. > Python calls it a dictionary. Clojure calls it a map.
Those are both > better names. > > Con: Common Lisp already uses the name
"hash table", so it would be > easier for existing Common Lisp programmers
to see the analogy. > > Advice? Thanks! [...]

Content analysis details: (-2.3 points, 5.0 required)

pts rule name description
---- ---------------------- --------------------------------------------------
0.0 SINGLE_HEADER_2K A single header contains 2K-3K characters
-2.3 RCVD_IN_DNSWL_MED RBL: Sender listed at http://www.dnswl.org/, medium
trust
[80.91.229.12 listed in list.dnswl.org]
-0.0 SPF_HELO_PASS SPF: HELO matches SPF record
-0.0 T_RP_MATCHES_RCVD Envelope sender domain matches handover relay
domain
-0.0 SPF_PASS SPF: sender matches SPF record
Archived-At: <http://permalink.gmane.org/gmane.lisp.cl-pro/442>
Friends,
I wrote a little package for "fash hash tables", which provide an
abstraction that is analogous to that of Common Lisp hash tables, but
is faster for tables with few elements, and only slightly inferior for
tables with many elements.
I did this because performance analysis showed that our system was
spending too much time in hash table operations, and using the new
package helped.
I have recently been cleaning this up, one reason being that I'd like
to open source it.  The function names used to be things like getfhash
and mapfhash.  Now they are like fhash:get and fhash:map-elements and
so on.
However, before I open-source it, I was to make sure it's "right".  It
recently occurred to me that the package name "fhash" has problems.
Here are pros and cons of changing it that I can see.
Pro: I's not a hash table in the small-cardinality case; it's a linear
lookup.  So the name is not actually accurate.
Pro: Calling such a data structure a "hash table", even as Common Lisp
does, is an abstraction violation.  Whether it works by hashing is an
implementation detail.  The Java collection library calls this a Map.
Python calls it a dictionary.  Clojure calls it a map.  Those are both
better names.
Con: Common Lisp already uses the name "hash table", so it would be
easier for existing Common Lisp programmers to see the analogy.
Advice?  Thanks!
Cesarum has an ADAPTATIVE-DICTIONARY abstraction that does that:

https://gitorious.org/com-informatimago/com-informatimago/blobs/master/common-lisp/cesarum/dictionary.lisp

It uses hash-tables when it's fast, and faster data structure when
hash-tables are slow (ie. when they don't have enough elements).
--
__Pascal Bourguignon__ http://www.informatimago.com/
A bad day in () is better than a good day in {}.
Daniel Weinreb
2011-06-14 13:36:48 UTC
Permalink
Thank you for letting me know about this.

One of the things I feel strongly about, though, is avoiding names like
dictionary-get. Fhash's package is the kind where you're not supposed to do
a "use". You're supposed to get at the external symbols by using explicit
package prefixes, so the dictionary- isn't there.

Anyway, I'll take a look at what you wrote. Thanks again.

-- Dan

On Mon, Jun 13, 2011 at 9:52 PM, Pascal J. Bourguignon <
Post by Pascal J. Bourguignon
Post by Daniel Weinreb
Friends,
I wrote a little package for "fash hash tables", which provide an
abstraction that is analogous to that of Common Lisp hash tables, but
is faster for tables with few elements, and only slightly inferior for
tables with many elements.
I did this because performance analysis showed that our system was
spending too much time in hash table operations, and using the new
package helped.
I have recently been cleaning this up, one reason being that I'd like
to open source it. The function names used to be things like getfhash
and mapfhash. Now they are like fhash:get and fhash:map-elements and
so on.
However, before I open-source it, I was to make sure it's "right". It
recently occurred to me that the package name "fhash" has problems.
Here are pros and cons of changing it that I can see.
Pro: I's not a hash table in the small-cardinality case; it's a linear
lookup. So the name is not actually accurate.
Pro: Calling such a data structure a "hash table", even as Common Lisp
does, is an abstraction violation. Whether it works by hashing is an
implementation detail. The Java collection library calls this a Map.
Python calls it a dictionary. Clojure calls it a map. Those are both
better names.
Con: Common Lisp already uses the name "hash table", so it would be
easier for existing Common Lisp programmers to see the analogy.
Advice? Thanks!
https://gitorious.org/com-informatimago/com-informatimago/blobs/master/common-lisp/cesarum/dictionary.lisp
It uses hash-tables when it's fast, and faster data structure when
hash-tables are slow (ie. when they don't have enough elements).
--
__Pascal Bourguignon__ http://www.informatimago.com/
A bad day in () is better than a good day in {}.
_______________________________________________
pro mailing list
http://lists.common-lisp.net/cgi-bin/mailman/listinfo/pro
Scott L. Burson
2011-06-14 06:23:20 UTC
Permalink
Content preview: On Mon, Jun 13, 2011 at 12:18 PM, Daniel Weinreb <dlw-hpIqsD4AKlfQT0dZR+***@public.gmane.org>
wrote: > Here are pros and cons of changing it that I can see. > Pro: I's
not a hash table in the small-cardinality case; it's a linear > lookup. So
the name is not actually accurate. [...]

Content analysis details: (-100.7 points, 5.0 required)

pts rule name description
---- ---------------------- --------------------------------------------------
-0.7 RCVD_IN_DNSWL_LOW RBL: Sender listed at http://www.dnswl.org/, low
trust
[209.85.213.179 listed in list.dnswl.org]
-100 USER_IN_WHITELIST From: address is in the user's white-list
-0.0 SPF_PASS SPF: sender matches SPF record
Archived-At: <http://permalink.gmane.org/gmane.lisp.cl-pro/443>
Post by Daniel Weinreb
Here are pros and cons of changing it that I can see.
Pro: I's not a hash table in the small-cardinality case; it's a linear
lookup.  So the name is not actually accurate.
Yes it is! It's a hash table with one bucket.

But I prefer the term "map" anyway :-)

-- Scott
Steve Haflich
2011-06-14 07:34:45 UTC
Permalink
If a fhash provided exactly the same API as a hashtable (which it could)
then it is a hashtable, and this improved performance for small numbers of
entries ought be a feature of any modern quality implementation. The only
reason it needs a different name is that if the implementation doesn't
provide it, there is no portable way to modify the implementation's
implementation.

BTW, the implementation probably could do a better job than portable user
code at reducing the slight reduced performance for large fhash.

Lisp has called these things hash tables ever since they were invented -- do
we need two names for the same thing? If so, I've got a problem with this
"cons" thingus.

Or perhaps I'm just feeling grumpy tonite.

A more serious future issue is that as CL slowly moves towards SMP one of
the hairy implementation issues is providing efficient hash tables that are
hazard free. It is really hard to do this in user code without huge
efficiency loss. So fhash might be restricted to uses that cannot be shared
between threads, unless it's performance improvement were provided by the
native implementation.
Daniel Weinreb
2011-06-14 13:32:50 UTC
Permalink
Oh, yes, thank you very much for bringing up the issue of threads!

I have to mention that they are not thread-safe. The CL definition does not
say whether hash tables are thread-safe since the CL definition has no
thread concept. CCL hash tables are thread-safe.

Right now, they aren't CLOS instances at all. It was sort of a little
pedagogical point of the exercise that an abstraction doesn't have to be a
CLOS object. This is part of my whole claim that CLOS, unlike, say, CLU,
isn't trying to be an encapsulation mechanism. If you want encapsulation,
that's what packages are for. Java tries to both and ends up with baroque
rules about how "protected" interacts with "in different packages".

But Fare pointed out to me that being able to add generic functions
specialized on these would be a good thing. This would mean making them use
CLOS not for encapsulation but for genericty.

So maybe there could be a subclass that did the locking. Java's new
collection library also provides basic non-thread-safe collections, and then
thread-safe ones built upon them. Thread-safety costs. I really ought to
measure that, though.
Martin Simmons
2011-06-14 17:30:53 UTC
Permalink
But Fare pointed out to me that being able to add generic functions >
specialized on these would be a good thing. This would mean making them use
CLOS not for encapsulation but for genericty. [...]
Content analysis details: (-0.0 points, 5.0 required)

pts rule name description
---- ---------------------- --------------------------------------------------
-0.0 T_RP_MATCHES_RCVD Envelope sender domain matches handover relay
domain
Archived-At: <http://permalink.gmane.org/gmane.lisp.cl-pro/452>
But Fare pointed out to me that being able to add generic functions
specialized on these would be a good thing. This would mean making them use
CLOS not for encapsulation but for genericty.
Use can specialize methods on defstruct classes too, so they don't have to be
CLOS instances defined with defclass.
--
Martin Simmons
LispWorks Ltd
http://www.lispworks.com/
Raymond Wiker
2011-06-14 17:42:32 UTC
Permalink
Content preview: On Jun 14, 2011, at 19:30 , Martin Simmons wrote: >>>>>> On
Tue, 14 Jun 2011 09:32:50 -0400, Daniel Weinreb said: >> >> But Fare pointed
out to me that being able to add generic functions >> specialized on these
would be a good thing. This would mean making them use >> CLOS not for encapsulation
but for genericty. > > Use can specialize methods on defstruct classes too,
so they don't have to be > CLOS instances defined with defclass. [...]

Content analysis details: (-0.7 points, 5.0 required)

pts rule name description
---- ---------------------- --------------------------------------------------
0.0 FREEMAIL_FROM Sender email is freemail (rwiker[at]gmail.com)
-0.7 RCVD_IN_DNSWL_LOW RBL: Sender listed at http://www.dnswl.org/, low
trust
[209.85.215.179 listed in list.dnswl.org]
-0.0 SPF_PASS SPF: sender matches SPF record
0.0 RFC_ABUSE_POST Both abuse and postmaster missing on sender domain
0.0 T_DKIM_INVALID DKIM-Signature header exists but is not valid
0.0 T_TO_NO_BRKTS_FREEMAIL T_TO_NO_BRKTS_FREEMAIL
Archived-At: <http://permalink.gmane.org/gmane.lisp.cl-pro/453>
Post by Martin Simmons
Post by Daniel Weinreb
But Fare pointed out to me that being able to add generic functions
specialized on these would be a good thing. This would mean making them use
CLOS not for encapsulation but for genericty.
Use can specialize methods on defstruct classes too, so they don't have to be
CLOS instances defined with defclass.
Built-in classes too[1], which means that you can have methods specialized on (e.g.) single-float and double-float, fixnum and bignum, list and vector, etc.

[1] I think, but I cannot find the relevant reference just now.
Zach Beane
2011-06-14 17:48:27 UTC
Permalink
Content preview: Raymond Wiker <rwiker-***@public.gmane.org> writes: > On Jun 14, 2011,
at 19:30 , Martin Simmons wrote: > >>>>>>> On Tue, 14 Jun 2011 09:32:50 -0400,
Daniel Weinreb said: >>> >>> But Fare pointed out to me that being able to
add generic functions >>> specialized on these would be a good thing. This
would mean making them use >>> CLOS not for encapsulation but for genericty.
Post by Raymond Wiker
Post by Martin Simmons
Post by Daniel Weinreb
Post by Martin Simmons
Use can specialize methods on defstruct classes too, so they don't
have to be >> CLOS instances defined with defclass. > > Built-in classes too[1],
which means that you can have methods > specialized on (e.g.) single-float
and double-float, fixnum and > bignum, list and vector, etc. [...]

Content analysis details: (-100.7 points, 5.0 required)

pts rule name description
---- ---------------------- --------------------------------------------------
-100 USER_IN_WHITELIST From: address is in the user's white-list
-0.7 RCVD_IN_DNSWL_LOW RBL: Sender listed at http://www.dnswl.org/, low
trust
[209.85.212.51 listed in list.dnswl.org]
Archived-At: <http://permalink.gmane.org/gmane.lisp.cl-pro/454>
Post by Raymond Wiker
Post by Martin Simmons
Post by Daniel Weinreb
But Fare pointed out to me that being able to add generic functions
specialized on these would be a good thing. This would mean making them use
CLOS not for encapsulation but for genericty.
Use can specialize methods on defstruct classes too, so they don't have to be
CLOS instances defined with defclass.
Built-in classes too[1], which means that you can have methods
specialized on (e.g.) single-float and double-float, fixnum and
bignum, list and vector, etc.
Well, only if those types also happen to have a corresponding
implementation-specific class. SINGLE-FLOAT, DOUBLE-FLOAT, FIXNUM, and
BIGNUM are specified as types. LIST and VECTOR are system classes. FLOAT
is also a system class.

I recently ran into some accidentally unportable code that specialized a
method argument on DOUBLE-FLOAT. It worked in SBCL but failed in CLISP
because CLISP provides no DOUBLE-FLOAT class.

Zach
Raymond Wiker
2011-06-14 18:10:18 UTC
Permalink
Content preview: On Jun 14, 2011, at 19:48 , Zach Beane wrote: > Raymond Wiker
writes: > >> On Jun 14, 2011, at 19:30 , Martin Simmons wrote: >> >>>>>>>>
On Tue, 14 Jun 2011 09:32:50 -0400, Daniel Weinreb said: >>>> >>>> But Fare
pointed out to me that being able to add generic functions >>>> specialized
on these would be a good thing. This would mean making them use >>>> CLOS
not for encapsulation but for genericty. >>> >>> Use can specialize methods
on defstruct classes too, so they don't have to be >>> CLOS instances defined
with defclass. >> >> Built-in classes too[1], which means that you can have
methods >> specialized on (e.g.) single-float and double-float, fixnum and
Post by Zach Beane
bignum, list and vector, etc. > > Well, only if those types also happen
to have a corresponding > implementation-specific class. SINGLE-FLOAT, DOUBLE-FLOAT,
FIXNUM, and > BIGNUM are specified as types. LIST and VECTOR are system classes.
FLOAT > is also a system class. > > I recently ran into some accidentally
unportable code that specialized a > method argument on DOUBLE-FLOAT. It
worked in SBCL but failed in CLISP > because CLISP provides no DOUBLE-FLOAT
class. [...]

Content analysis details: (-0.7 points, 5.0 required)

pts rule name description
---- ---------------------- --------------------------------------------------
0.0 SINGLE_HEADER_2K A single header contains 2K-3K characters
0.0 FREEMAIL_FROM Sender email is freemail (rwiker[at]gmail.com)
-0.7 RCVD_IN_DNSWL_LOW RBL: Sender listed at http://www.dnswl.org/, low
trust
[209.85.215.179 listed in list.dnswl.org]
-0.0 SPF_PASS SPF: sender matches SPF record
0.0 RFC_ABUSE_POST Both abuse and postmaster missing on sender domain
0.0 T_DKIM_INVALID DKIM-Signature header exists but is not valid
0.0 T_TO_NO_BRKTS_FREEMAIL T_TO_NO_BRKTS_FREEMAIL
Archived-At: <http://permalink.gmane.org/gmane.lisp.cl-pro/455>
Post by Zach Beane
Post by Martin Simmons
Post by Daniel Weinreb
But Fare pointed out to me that being able to add generic functions
specialized on these would be a good thing. This would mean making them use
CLOS not for encapsulation but for genericty.
Use can specialize methods on defstruct classes too, so they don't have to be
CLOS instances defined with defclass.
Built-in classes too[1], which means that you can have methods
specialized on (e.g.) single-float and double-float, fixnum and
bignum, list and vector, etc.
Well, only if those types also happen to have a corresponding
implementation-specific class. SINGLE-FLOAT, DOUBLE-FLOAT, FIXNUM, and
BIGNUM are specified as types. LIST and VECTOR are system classes. FLOAT
is also a system class.
I recently ran into some accidentally unportable code that specialized a
method argument on DOUBLE-FLOAT. It worked in SBCL but failed in CLISP
because CLISP provides no DOUBLE-FLOAT class.
You're right - looks like I should have spent a little time on looking things up (again) :-)

CLTL2 lists (table 28-1, page 846) a number of predefined types with their class precedence lists. This table includes things like float, rational, integer and complex, but not things like single-float, double-float. Sooo, it *is* possible to specialize on built-in types, but not necessarily on *all* of them.
Erik Winkels
2011-06-14 21:34:30 UTC
Permalink
Post by Zach Beane
Post by Zach Beane
I recently ran into some accidentally unportable code that specialized
a > method argument on DOUBLE-FLOAT. It worked in SBCL but failed in CLISP
because CLISP provides no DOUBLE-FLOAT class. [...]

Content analysis details: (-100.0 points, 5.0 required)

pts rule name description
---- ---------------------- --------------------------------------------------
-100 USER_IN_WHITELIST From: address is in the user's white-list
-0.0 RCVD_IN_DNSWL_NONE RBL: Sender listed at http://www.dnswl.org/, low
trust
[194.109.24.22 listed in list.dnswl.org]
-0.0 T_RP_MATCHES_RCVD Envelope sender domain matches handover relay
domain
Archived-At: <http://permalink.gmane.org/gmane.lisp.cl-pro/456>
Post by Zach Beane
I recently ran into some accidentally unportable code that specialized a
method argument on DOUBLE-FLOAT. It worked in SBCL but failed in CLISP because CLISP provides no DOUBLE-FLOAT class.
Yes, I had the same thing once with specializing on SIMPLE-VECTOR in SBCL while it had to be a VECTOR in CLISP.
Marco Antoniotti
2011-06-14 07:37:26 UTC
Permalink
Content preview: On Jun 13, 2011, at 21:18 , Daniel Weinreb wrote: > Friends,
I wrote a little package for "fash hash tables", which provide an > abstraction
that is analogous to that of Common Lisp hash tables, but > is faster for
tables with few elements, and only slightly inferior for > tables with many
elements. > > I did this because performance analysis showed that our system
was > spending too much time in hash table operations, and using the new
package helped. > > I have recently been cleaning this up, one reason being
that I'd like > to open source it. The function names used to be things like
getfhash > and mapfhash. Now they are like fhash:get and fhash:map-elements
and > so on. > > However, before I open-source it, I was to make sure it's
"right". It > recently occurred to me that the package name "fhash" has problems.
Here are pros and cons of changing it that I can see. > > Pro: I's not
a hash table in the small-cardinality case; it's a linear > lookup. So the
name is not actually accurate. > > Pro: Calling such a data structure a "hash
table", even as Common Lisp > does, is an abstraction violation. Whether
it works by hashing is an > implementation detail. The Java collection library
calls this a Map. > Python calls it a dictionary. Clojure calls it a map.
Those are both > better names. [...]

Content analysis details: (-0.0 points, 5.0 required)

pts rule name description
---- ---------------------- --------------------------------------------------
-0.0 T_RP_MATCHES_RCVD Envelope sender domain matches handover relay
domain
-0.0 SPF_PASS SPF: sender matches SPF record
Archived-At: <http://permalink.gmane.org/gmane.lisp.cl-pro/445>
Friends,
I wrote a little package for "fash hash tables", which provide an
abstraction that is analogous to that of Common Lisp hash tables, but
is faster for tables with few elements, and only slightly inferior for
tables with many elements.
I did this because performance analysis showed that our system was
spending too much time in hash table operations, and using the new
package helped.
I have recently been cleaning this up, one reason being that I'd like
to open source it. The function names used to be things like getfhash
and mapfhash. Now they are like fhash:get and fhash:map-elements and
so on.
However, before I open-source it, I was to make sure it's "right". It
recently occurred to me that the package name "fhash" has problems.
Here are pros and cons of changing it that I can see.
Pro: I's not a hash table in the small-cardinality case; it's a linear
lookup. So the name is not actually accurate.
Pro: Calling such a data structure a "hash table", even as Common Lisp
does, is an abstraction violation. Whether it works by hashing is an
implementation detail. The Java collection library calls this a Map.
Python calls it a dictionary. Clojure calls it a map. Those are both
better names.
Actually, in Java the naming reflects the separation between abstraction and implementation. AFAIU, clojure does not quite do this and neither does Python.

I would advocate settling down on a Java-esque nomenclature with MAP or DICTIONARY as "names" for the abstraction and with different names for the implementations; e.g., DICTIONARY-TREE, DICTIONARY-FHASH, DICTIONARY-HASH-TABLE, you name it....

Cheers
--
MA




--
Marco Antoniotti, Associate Professor tel. +39 - 02 64 48 79 01
DISCo, Università Milano Bicocca U14 2043 http://bimib.disco.unimib.it
Viale Sarca 336
I-20126 Milan (MI) ITALY

Please note that I am not checking my Spam-box anymore.
Please do not forward this email without asking me first.
Alessio Stalla
2011-06-14 07:50:49 UTC
Permalink
Content preview: On Tue, Jun 14, 2011 at 9:37 AM, Marco Antoniotti wrote: >
On Jun 13, 2011, at 21:18 , Daniel Weinreb wrote: > >> Friends, >> >> I
wrote a little package for "fash hash tables", which provide an >> abstraction
that is analogous to that of Common Lisp hash tables, but >> is faster for
tables with few elements, and only slightly inferior for >> tables with many
elements. >> >> I did this because performance analysis showed that our system
was >> spending too much time in hash table operations, and using the new
package helped. >> >> I have recently been cleaning this up, one reason
being that I'd like >> to open source it. The function names used to be things
like getfhash >> and mapfhash. Now they are like fhash:get and fhash:map-elements
and >> so on. >> >> However, before I open-source it, I was to make sure
it's "right". It >> recently occurred to me that the package name "fhash"
has problems. >> >> Here are pros and cons of changing it that I can see.
Post by Daniel Weinreb
Pro: I's not a hash table in the small-cardinality case; it's a linear
lookup. So the name is not actually accurate. >> >> Pro: Calling such
a data structure a "hash table", even as Common Lisp >> does, is an abstraction
violation. Whether it works by hashing is an >> implementation detail. The
Java collection library calls this a Map. >> Python calls it a dictionary.
Clojure calls it a map. Those are both >> better names. > > Actually, in
Java the naming reflects the separation between abstraction and implementation.
AFAIU, clojure does not quite do this and neither does Python. > > I would
advocate settling down on a Java-esque nomenclature with MAP or DICTIONARY
as "names" for the abstraction and with different names for the implementations;
e.g., DICTIONARY-TREE, DICTIONARY-FHASH, DICTIONARY-HASH-TABLE, you name
it.... [...]

Content analysis details: (-100.7 points, 5.0 required)

pts rule name description
---- ---------------------- --------------------------------------------------
-100 USER_IN_WHITELIST From: address is in the user's white-list
0.0 SINGLE_HEADER_2K A single header contains 2K-3K characters
0.0 FREEMAIL_FROM Sender email is freemail (alessiostalla[at]gmail.com)
-0.7 RCVD_IN_DNSWL_LOW RBL: Sender listed at http://www.dnswl.org/, low
trust
[209.85.216.179 listed in list.dnswl.org]
-0.0 SPF_PASS SPF: sender matches SPF record
0.0 RFC_ABUSE_POST Both abuse and postmaster missing on sender domain
0.0 T_DKIM_INVALID DKIM-Signature header exists but is not valid
Archived-At: <http://permalink.gmane.org/gmane.lisp.cl-pro/446>

On Tue, Jun 14, 2011 at 9:37 AM, Marco Antoniotti
Friends,
I wrote a little package for "fash hash tables", which provide an
abstraction that is analogous to that of Common Lisp hash tables, but
is faster for tables with few elements, and only slightly inferior for
tables with many elements.
I did this because performance analysis showed that our system was
spending too much time in hash table operations, and using the new
package helped.
I have recently been cleaning this up, one reason being that I'd like
to open source it.  The function names used to be things like getfhash
and mapfhash.  Now they are like fhash:get and fhash:map-elements and
so on.
However, before I open-source it, I was to make sure it's "right".  It
recently occurred to me that the package name "fhash" has problems.
Here are pros and cons of changing it that I can see.
Pro: I's not a hash table in the small-cardinality case; it's a linear
lookup.  So the name is not actually accurate.
Pro: Calling such a data structure a "hash table", even as Common Lisp
does, is an abstraction violation.  Whether it works by hashing is an
implementation detail.  The Java collection library calls this a Map.
Python calls it a dictionary.  Clojure calls it a map.  Those are both
better names.
Actually, in Java the naming reflects the separation between abstraction and implementation.  AFAIU, clojure does not quite do this and neither does Python.
I would advocate settling down on a Java-esque nomenclature with MAP or DICTIONARY as "names" for the abstraction and with different names for the implementations; e.g.,  DICTIONARY-TREE, DICTIONARY-FHASH, DICTIONARY-HASH-TABLE, you name it....
AFAIK, Java started just like Lisp with the Hashtable class
(implementation without interface) which later was deprecated in favor
of Map (interface) + HashMap, TreeMap, ... (implementations).
There is already a somewhat official API for extensible sequences,
designed by Christophe Rhodes and implemented in SBCL and ABCL (and
maybe others, I don't know). We could similarly have extensible maps,
even though "sequence" is an already recognized abstraction in CL,
while "map" is not.

Alessio
Daniel Weinreb
2011-06-14 13:38:28 UTC
Permalink
Could you tell me where to find that? Thanks. -- Dan
Post by Alessio Stalla
On Tue, Jun 14, 2011 at 9:37 AM, Marco Antoniotti
Post by Marco Antoniotti
Post by Daniel Weinreb
Friends,
I wrote a little package for "fash hash tables", which provide an
abstraction that is analogous to that of Common Lisp hash tables, but
is faster for tables with few elements, and only slightly inferior for
tables with many elements.
I did this because performance analysis showed that our system was
spending too much time in hash table operations, and using the new
package helped.
I have recently been cleaning this up, one reason being that I'd like
to open source it. The function names used to be things like getfhash
and mapfhash. Now they are like fhash:get and fhash:map-elements and
so on.
However, before I open-source it, I was to make sure it's "right". It
recently occurred to me that the package name "fhash" has problems.
Here are pros and cons of changing it that I can see.
Pro: I's not a hash table in the small-cardinality case; it's a linear
lookup. So the name is not actually accurate.
Pro: Calling such a data structure a "hash table", even as Common Lisp
does, is an abstraction violation. Whether it works by hashing is an
implementation detail. The Java collection library calls this a Map.
Python calls it a dictionary. Clojure calls it a map. Those are both
better names.
Actually, in Java the naming reflects the separation between abstraction
and implementation. AFAIU, clojure does not quite do this and neither does
Python.
Post by Marco Antoniotti
I would advocate settling down on a Java-esque nomenclature with MAP or
DICTIONARY as "names" for the abstraction and with different names for the
implementations; e.g., DICTIONARY-TREE, DICTIONARY-FHASH,
DICTIONARY-HASH-TABLE, you name it....
AFAIK, Java started just like Lisp with the Hashtable class
(implementation without interface) which later was deprecated in favor
of Map (interface) + HashMap, TreeMap, ... (implementations).
There is already a somewhat official API for extensible sequences,
designed by Christophe Rhodes and implemented in SBCL and ABCL (and
maybe others, I don't know). We could similarly have extensible maps,
even though "sequence" is an already recognized abstraction in CL,
while "map" is not.
Alessio
_______________________________________________
pro mailing list
http://lists.common-lisp.net/cgi-bin/mailman/listinfo/pro
Alessio Stalla
2011-06-14 14:13:15 UTC
Permalink
Content preview: On Tue, Jun 14, 2011 at 3:38 PM, Daniel Weinreb wrote: > Could
you tell me where to find that? Thanks. -- Dan The paper - titled "User-extensible
sequences in Common Lisp" by C. Rhodes - can be found for example here: <http://citeseerx.ist.psu.edu/viewdoc/download?doi.1.1.65.1604&rep=rep1&type=pdf>
[...]

Content analysis details: (-100.7 points, 5.0 required)

pts rule name description
---- ---------------------- --------------------------------------------------
-100 USER_IN_WHITELIST From: address is in the user's white-list
0.0 FREEMAIL_FROM Sender email is freemail (alessiostalla[at]gmail.com)
-0.7 RCVD_IN_DNSWL_LOW RBL: Sender listed at http://www.dnswl.org/, low
trust
[209.85.216.51 listed in list.dnswl.org]
-0.0 SPF_PASS SPF: sender matches SPF record
0.0 RFC_ABUSE_POST Both abuse and postmaster missing on sender domain
0.0 T_DKIM_INVALID DKIM-Signature header exists but is not valid
Archived-At: <http://permalink.gmane.org/gmane.lisp.cl-pro/450>
Could you tell me where to find that?  Thanks. -- Dan
The paper - titled "User-extensible sequences in Common Lisp" by C.
Rhodes - can be found for example here:
<http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.65.1604&rep=rep1&type=pdf>

I don't remember how well the paper describes SBCL's implementation; I
think it's worth taking a look at it to see how it combines CLOS (for
genericity) with regular functions special-cased on CL built-in types.
ABCL's impl is almost a clone of SBCL's, with only minor adaptations.

I agree that classes in CLOS are overrated. CLOS is mainly about
generic functions and it's a pity, imho, that GFs can only be
specialized on classes. With minor changes CLOS could be more general.

Cheers,
Alessio
Daniel Weinreb
2011-06-28 22:06:58 UTC
Permalink
Hi. I read Christophe's paper on extensible sequences. I don't think
this bears on my new package, though, for two reasons:

(1) it's only about sequences; maps don't fit into its framework.

(2) He is proposing here a change that would have to be made
to every Common Lisp implementation. As may have been
apparent from other email I've sent, I am, sadly, pessimistic
that we can really get all of the implementors to make changes
in harmony. It's not that they are bad or incompetent or
anything like that. It's just that they're busy people with
other priorities. In some cases, the priorities include
"putting food on the table" (in the metaphorical sense),
i.e. it would be easier if someone could pay them to
do this, but I don't see how that would happen.

Anyway, thanks for pointing me at this very interesting
paper.

-- Dan
Post by Alessio Stalla
Post by Daniel Weinreb
Could you tell me where to find that? Thanks. -- Dan
The paper - titled "User-extensible sequences in Common Lisp" by C.
<
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.65.1604&rep=rep1&type=pdf
I don't remember how well the paper describes SBCL's implementation; I
think it's worth taking a look at it to see how it combines CLOS (for
genericity) with regular functions special-cased on CL built-in types.
ABCL's impl is almost a clone of SBCL's, with only minor adaptations.
I agree that classes in CLOS are overrated. CLOS is mainly about
generic functions and it's a pity, imho, that GFs can only be
specialized on classes. With minor changes CLOS could be more general.
Cheers,
Alessio
Pascal Costanza
2011-06-28 22:18:06 UTC
Permalink
Please don't take this personal, but: Pessimism is infectious. It would be better if you would keep your pessimism for yourself. There is some amount of good enthusiasm still in this community, and it needs to be encouraged further, not discouraged!

If there is enough interest and enthusiasm, then things do move. I have had very positive experiences myself with the Closer to MOP project and ContextL, where many - almost all - Common Lisp vendors were very open to bug fixes and suggestions for improvements, to the extent that the overall support for MOP-based extensions has been considerably improved. Another more recent example is the adoption of ASDF 2.x, which is by now included as a default system definition facility in many Common Lisp implementations. There are more such examples.

Thanks,
Pascal
Post by Daniel Weinreb
Hi. I read Christophe's paper on extensible sequences. I don't think
(1) it's only about sequences; maps don't fit into its framework.
(2) He is proposing here a change that would have to be made
to every Common Lisp implementation. As may have been
apparent from other email I've sent, I am, sadly, pessimistic
that we can really get all of the implementors to make changes
in harmony. It's not that they are bad or incompetent or
anything like that. It's just that they're busy people with
other priorities. In some cases, the priorities include
"putting food on the table" (in the metaphorical sense),
i.e. it would be easier if someone could pay them to
do this, but I don't see how that would happen.
Anyway, thanks for pointing me at this very interesting
paper.
-- Dan
Post by Daniel Weinreb
Could you tell me where to find that? Thanks. -- Dan
The paper - titled "User-extensible sequences in Common Lisp" by C.
<http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.65.1604&rep=rep1&type=pdf>
I don't remember how well the paper describes SBCL's implementation; I
think it's worth taking a look at it to see how it combines CLOS (for
genericity) with regular functions special-cased on CL built-in types.
ABCL's impl is almost a clone of SBCL's, with only minor adaptations.
I agree that classes in CLOS are overrated. CLOS is mainly about
generic functions and it's a pity, imho, that GFs can only be
specialized on classes. With minor changes CLOS could be more general.
Cheers,
Alessio
_______________________________________________
pro mailing list
http://lists.common-lisp.net/cgi-bin/mailman/listinfo/pro
--
Pascal Costanza
The views expressed in this email are my own, and not those of my employer.
Alessio Stalla
2011-06-29 21:24:59 UTC
Permalink
Content preview: On Wed, Jun 29, 2011 at 12:06 AM, Daniel Weinreb wrote: >
Hi. I read Christophe's paper on extensible sequences. I don't think > this
bears on my new package, though, for two reasons: > (1) it's only about sequences;
maps don't fit into its framework. [...]

Content analysis details: (-100.7 points, 5.0 required)

pts rule name description
---- ---------------------- --------------------------------------------------
-100 USER_IN_WHITELIST From: address is in the user's white-list
0.0 FREEMAIL_FROM Sender email is commonly abused enduser mail provider
(alessiostalla[at]gmail.com)
-0.7 RCVD_IN_DNSWL_LOW RBL: Sender listed at http://www.dnswl.org/, low
trust
[209.85.216.179 listed in list.dnswl.org]
-0.0 SPF_PASS SPF: sender matches SPF record
0.0 T_DKIM_INVALID DKIM-Signature header exists but is not valid
Archived-At: <http://permalink.gmane.org/gmane.lisp.cl-pro/480>
Hi.   I read Christophe's paper on extensible sequences.  I don't think
(1) it's only about sequences; maps don't fit into its framework.
Yes, I was aware of that; I suggested the paper because it details
what is, imho, an effort analogous to yours, and might give
inspiration wrt. the general spirit, the API and the integration with
the rest of CL.
(2) He is proposing here a change that would have to be made
to every Common Lisp implementation.
Such a change would have to be made because CL already has a sequence
abstraction, so it makes sense to extend it rather than to provide
another abstraction with the same goals. Since there's no map
abstraction in CL, a similar API for maps would not need any special
support from implementations. Extra features like LOOP integration
might be provided only on CL implementations with an extensible LOOP,
or by using Iterate instead, or, shameless plug, doplus [1].
As may have been
apparent from other email I've sent, I am, sadly, pessimistic
that we can really get all of the implementors to make changes
in harmony.  It's not that they are bad or incompetent or
anything like that.  It's just that they're busy people with
other priorities.  In some cases, the priorities include
"putting food on the table" (in the metaphorical sense),
i.e. it would be easier if someone could pay them to
do this, but I don't see how that would happen.
Anyway, thanks for pointing me at this very interesting
paper.
I'm with Pascal on pessimism: I'm sure all sane Lisp implementers will
add any feature that is reasonably easy to implement and is demanded
by sufficiently many users. Fixes to the MOP generally satisfy both
these rules. Extensible sequences do not, yet, at least because most
Lispers don't know about them or don't find them sufficiently useful
to bug their vendors about them. In my personal experience with ABCL,
where dealing with Java libraries is fairly common, having Java Lists
be natively understood as CL sequences is valuable. I imagine that if,
e.g., FSet would get more users, having it integrated with sequences
would be appealing (even if it would open another can of worms since
the CL sequence API assumes mutability - a design mistake, imho).

[1] http://code.google.com/p/tapulli/wiki/doplus

Alessio
Daniel Weinreb
2011-06-30 22:12:02 UTC
Permalink
Just for the record, I think extensible sequences would be really great!
-- Dan
Post by Alessio Stalla
Post by Daniel Weinreb
Hi. I read Christophe's paper on extensible sequences. I don't think
(1) it's only about sequences; maps don't fit into its framework.
Yes, I was aware of that; I suggested the paper because it details
what is, imho, an effort analogous to yours, and might give
inspiration wrt. the general spirit, the API and the integration with
the rest of CL.
Post by Daniel Weinreb
(2) He is proposing here a change that would have to be made
to every Common Lisp implementation.
Such a change would have to be made because CL already has a sequence
abstraction, so it makes sense to extend it rather than to provide
another abstraction with the same goals. Since there's no map
abstraction in CL, a similar API for maps would not need any special
support from implementations. Extra features like LOOP integration
might be provided only on CL implementations with an extensible LOOP,
or by using Iterate instead, or, shameless plug, doplus [1].
Post by Daniel Weinreb
As may have been
apparent from other email I've sent, I am, sadly, pessimistic
that we can really get all of the implementors to make changes
in harmony. It's not that they are bad or incompetent or
anything like that. It's just that they're busy people with
other priorities. In some cases, the priorities include
"putting food on the table" (in the metaphorical sense),
i.e. it would be easier if someone could pay them to
do this, but I don't see how that would happen.
Anyway, thanks for pointing me at this very interesting
paper.
I'm with Pascal on pessimism: I'm sure all sane Lisp implementers will
add any feature that is reasonably easy to implement and is demanded
by sufficiently many users. Fixes to the MOP generally satisfy both
these rules. Extensible sequences do not, yet, at least because most
Lispers don't know about them or don't find them sufficiently useful
to bug their vendors about them. In my personal experience with ABCL,
where dealing with Java libraries is fairly common, having Java Lists
be natively understood as CL sequences is valuable. I imagine that if,
e.g., FSet would get more users, having it integrated with sequences
would be appealing (even if it would open another can of worms since
the CL sequence API assumes mutability - a design mistake, imho).
[1] http://code.google.com/p/tapulli/wiki/doplus
Alessio
Sam Steingold
2011-06-30 23:05:33 UTC
Permalink
Post by Daniel Weinreb
Post by Daniel Weinreb
Just for the record, I think extensible sequences would be really great!
Just for the record, CLISP has had extensible sequences since forever (far
predating my involvement). See clisp/src/defseq.lisp which defines the standard
sequences (vectors, lists et al); the infrastructure is in C file clisp/src/sequence.d:
[...]

Content analysis details: (-100.7 points, 5.0 required)

pts rule name description
---- ---------------------- --------------------------------------------------
-100 USER_IN_WHITELIST From: address is in the user's white-list
0.0 FREEMAIL_FROM Sender email is commonly abused enduser mail provider
(sam.steingold[at]gmail.com)
-0.7 RCVD_IN_DNSWL_LOW RBL: Sender listed at http://www.dnswl.org/, low
trust
[209.85.216.51 listed in list.dnswl.org]
-0.0 SPF_PASS SPF: sender matches SPF record
0.0 T_DKIM_INVALID DKIM-Signature header exists but is not valid
Archived-At: <http://permalink.gmane.org/gmane.lisp.cl-pro/482>
Post by Daniel Weinreb
Just for the record, I think extensible sequences would be really great!
Just for the record, CLISP has had extensible sequences since forever
(far predating my involvement).
See clisp/src/defseq.lisp which defines the standard sequences (vectors,
lists et al); the infrastructure is in C file clisp/src/sequence.d:

/* O(seq_types) contains a list of type descriptors for sequences.
These are simple-vectors of length 16, containing:
SEQ-TYPE ; the type of the sequence, usually a symbol
Access functions:
SEQ-INIT
SEQ-UPD
SEQ-ENDTEST
SEQ-FE-INIT
SEQ-FE-UPD
SEQ-FE-ENDTEST
SEQ-ACCESS
SEQ-ACCESS-SET
SEQ-COPY
SEQ-LENGTH
SEQ-MAKE
SEQ-ELT
SEQ-SET-ELT
SEQ-INIT-START
SEQ-FE-INIT-END

Explanation of the functions SEQ-XXX:

A "pointer" is something, that can step through a sequence.
There are pointers, that move from left to right;
they are created with INIT or INIT-START, copied with COPY,
UPD to advance one step,
ENDTEST for testing, if they have reached the end of the Sequence,
ACCESS for fetching the element, which is pointed to by the pointer,
ACCESS-SET for setting the element, which is pointed to by the pointer.
There are also pointers, that move from right to left;
they are created with FE-INIT or FE-INIT-END, copied with COPY,
FE-UPD for moving them one step to the left,
FE-ENDTEST for testing, if they have reached the end of the Sequence,
ACCESS for fetching the element, which is pointed to by the pointer.
For them, ACCESS-SET does not work.

Movement operations:
INIT (lambda (seq) ...) -> pointer
returns the leftmost pointer of SEQ.
UPD (lambda (seq pointer) ...) -> pointer
returns a pointer to the adjacent neighbor at the right.
SEQ-UPD can assume, that the right border of
SEQ is not stepped over.
ENDTEST (lambda (seq pointer) ...) -> bool
tests, if this pointer is at the right end of SEQ.
The same "FROM END" :
FE-INIT (lambda (seq) ...) -> pointer
returns the rightmost pointer of SEQ.
FE-UPD (lambda (seq pointer) ...) -> pointer
returns a pointer to the adjacent neighbor at the left.
SEQ-FE-UPD can assume, that the left border of
SEQ is not stepped over.
FE-ENDTEST (lambda (seq pointer) ...) -> bool
tests, if this pointer is at the left end of SEQ.
Access via pointer:
ACCESS (lambda (seq pointer) ...) -> value
returns the element in SEQ the pointer is pointing to.
ACCESS-SET (lambda (seq pointer value) ...) ->
sets the element where the pointer is pointing to in SEQ, to the
specified value. Works only for pointers that move from left to
right!
COPY (lambda (pointer) ...) -> pointer
returns a copy of the Pointer to SEQ (because UPD and FE-UPD
can operate destructively on the pointers)
Total length:
LENGTH (lambda (seq) ...) -> size
returns the (active) length of the Sequence SEQ.
MAKE (lambda (size) ...) -> sequence
returns a newly allocated, empty sequence, that has the type
SEQ-TYPE and the specified length.
Access via index (usually more inefficient than via pointer):
ELT (lambda (seq index) ...) -> value
returns (ELT SEQ index)
SET-ELT (lambda (seq index value) ...) ->
sets (ELT SEQ index) to value.
INIT-START (lambda (seq index) ...) -> pointer
returns a pointer which moves in SEQ from left to right
from Position index. Must execute the Range-test by itself.
FE-INIT-END (lambda (seq index) ...) -> pointer
returns a pointer which moves in SEQ from right to left
from Position index. Must execute the Range-test by itself.
*/

of course, the presence of index accessors prevents things like hash
tables, trees and maps from being considered sequences...
--
Sam Steingold (http://sds.podval.org/) on CentOS release 5.6 (Final) X 11.0.60900031
http://camera.org http://truepeace.org http://dhimmi.com
http://iris.org.il http://memri.org http://mideasttruth.com
Our business is run on trust. We trust you will pay in advance.
Ala'a Mohammad
2011-06-14 14:32:34 UTC
Permalink
Content preview: [Daniel, excuse me for the double post] Forwarded message
From: Ala'a Mohammad Date: Tue, Jun 14, 2011 at 11:29 AM Subject: Re: [pro]
"fhash" To: Daniel Weinreb On Mon, Jun 13, 2011 at 11:18 PM, Daniel Weinreb
wrote: >... > I have recently been cleaning this up, one reason being that
I'd like > to open source it. The function names used to be things like getfhash
and mapfhash. Now they are like fhash:get and fhash:map-elements and >
... > Here are pros and cons of changing it that I can see. > ... > Con: Common
Lisp already uses the name "hash table", so it would be > easier for existing
Common Lisp programmers to see the analogy. [...]

Content analysis details: (-0.7 points, 5.0 required)

pts rule name description
---- ---------------------- --------------------------------------------------
-0.7 RCVD_IN_DNSWL_LOW RBL: Sender listed at http://www.dnswl.org/, low
trust
[209.85.212.51 listed in list.dnswl.org]
0.0 FREEMAIL_FROM Sender email is freemail (amalawi[at]gmail.com)
-0.0 SPF_PASS SPF: sender matches SPF record
0.0 RFC_ABUSE_POST Both abuse and postmaster missing on sender domain
0.0 T_DKIM_INVALID DKIM-Signature header exists but is not valid
Archived-At: <http://permalink.gmane.org/gmane.lisp.cl-pro/451>

[Daniel, excuse me for the double post]

---------- Forwarded message ----------
From: Ala'a Mohammad <amalawi-***@public.gmane.org>
Date: Tue, Jun 14, 2011 at 11:29 AM
Subject: Re: [pro] "fhash"
...
I have recently been cleaning this up, one reason being that I'd like
to open source it.  The function names used to be things like getfhash
and mapfhash.  Now they are like fhash:get and fhash:map-elements and
...
Here are pros and cons of changing it that I can see.
...
Con: Common Lisp already uses the name "hash table", so it would be
easier for existing Common Lisp programmers to see the analogy.
I can see this as a Con only if you will use the same API as
hash-table. However, the fhash library uses different APIs (get,
map-elements) instead of (gethash, maphash). This means the benefit of
knowing/familiarity-with CL hash-table will not help/aid while using
fhash (guessing the API), since I have to lookup (or learn) the new
API wording.

- Ala'a
Loading...