Discussion:
[Review Request] Small example of modern application of Common Lisp
Christian von Essen
2011-10-18 23:15:59 UTC
Permalink
Hi,

I use CL to scrape several comic websites and generate a website that
collects the daily strips from them. The (small) program's features:

* Easy definition of comic sources
* Uses XPath to get the comics
* Stores an archive of daily comics
* Generates web pages with the comics for each day

A feature that is IMHO still missing is an easier (maybe interactive)
way to specify comics. Currently you had better get your XPath right.
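
To give a rough idea, a comic definition could look something like the
sketch below. It is a simplified, hypothetical version (not the actual
comics.lisp code), assuming drakma, closure-html, cxml-stp and the
Plexippus xpath library from Quicklisp:

;; Hypothetical sketch, not the actual comics.lisp code.
(defvar *comics* (make-hash-table :test #'equal)
  "Maps a comic name to a function returning the current strip's image URL.")

(defmacro define-comic (name &key url xpath)
  "Register a comic: fetch URL, parse the HTML and return the image
address selected by the XPATH string."
  `(setf (gethash ,name *comics*)
         (lambda ()
           (let* ((body (drakma:http-request ,url))
                  (doc  (chtml:parse body (stp:make-builder))))
             ;; Wrapping in string(...) makes xpath return the attribute text.
             (xpath:evaluate ,(format nil "string(~A)" xpath) doc)))))

;; Example definition (the XPath expression is only illustrative):
(define-comic "xkcd"
  :url   "http://xkcd.com/"
  :xpath "//div[@id='comic']/img/@src")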

I think it would be possible to promote CL with an application like
that.

Could you have a look at the code and give me some hints on style and CL
in general, so that the code actually becomes good enough for that
purpose?

You can find it here:
https://github.com/Neronus/Lisp-Utils/blob/master/comics.lisp
And an example of the generated output here:
http://christian.ftwca.de/comics/

Thank you,

Christian
Faré
2011-10-28 15:29:38 UTC
Permalink
On Tue, Oct 18, 2011 at 19:15, Christian von Essen
Post by Christian von Essen
Hi,
I use CL to scrape several comic websites and generate a website that
collects the daily strips from them. The (small) program's features:
* Easy definition of comic sources
* Uses XPath to get the comics
* Stores an archive of daily comics
* Generates web pages with the comics for each day
A feature that is IMHO still missing is an easier (maybe interactive)
way to specify comics. Currently you had better get your XPath right.
I think it would be possible to promote CL with an application like
that.
Could you have a look at the code and give me some hints on style and CL
in general, so that the code actually becomes good enough for that
purpose?
https://github.com/Neronus/Lisp-Utils/blob/master/comics.lisp
http://christian.ftwca.de/comics/
Thank you,
Christian
Dear Christian,

I'm interested in your web scraping technology in CL.

I'd like to build a distributed web proxy that persistently records
everything one views, so that you can always read and share the pages
you like even when the author dies, the servers are taken off-line,
the domain name is bought by someone else, and the new owner puts a
new robots.txt that tells archive.org to not display the pages
anymore.
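
To make that concrete, the recording part could start as small as the
sketch below (hunchentoot plus drakma; the /archive endpoint, the storage
directory and the handler name are made up for illustration, and a real
proxy would of course need much more):

;; Illustrative sketch only, not a real proxy: a tiny "fetch and keep a
;; copy" service, assuming hunchentoot and drakma are loaded.
(defvar *archive-directory* #p"/var/archive/web/")

(hunchentoot:define-easy-handler (archive-page :uri "/archive") (url)
  "Fetch URL on behalf of the client, store a local copy, return the bytes."
  (multiple-value-bind (body status)
      (drakma:http-request url :force-binary t)
    (when (= status 200)
      (let ((file (merge-pathnames
                   (format nil "~D-~A" (get-universal-time)
                           (hunchentoot:url-encode url))
                   *archive-directory*)))
        (ensure-directories-exist file)
        (with-open-file (out file :direction :output
                                  :element-type '(unsigned-byte 8)
                                  :if-exists :supersede)
          (write-sequence body out))))
    (setf (hunchentoot:content-type*) "application/octet-stream")
    body))

;; (hunchentoot:start (make-instance 'hunchentoot:easy-acceptor :port 8080))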

I don't know if this adventure tempts you, but I think the time is
ripe for end-user-controlled peer-to-peer distributed archival and
sharing of information. Obvious application, beyond archival, is a
distributed facebook/g+ replacement.

PS: in shelisp, maybe you could use
xcvb-driver:run-program/process-output-stream instead of yet another
partial run-program interface. I really would like to see half-baked
portability layers die. If you really need input as well as output, I
could hack that into the xcvb-driver utility.

—♯ƒ • François-René ÐVB Rideau •Reflection&Cybernethics• http://fare.tunes.org
Samium Gromoff
2011-11-01 19:20:45 UTC
Permalink
Post by Faré
Dear Christian,
I'm interested in your web scraping technology in CL.
I'd like to build a distributed web proxy that persistently records
everything one views, so that you can always read and share the pages
you like even when the author dies, the servers are taken off-line,
the domain name is bought by someone else, and the new owner puts a
new robots.txt that tells archive.org to not display the pages
anymore.
I don't know if this adventure tempts you, but I think the time is
ripe for end-user-controlled peer-to-peer distributed archival and
sharing of information. Obvious application, beyond archival, is a
distributed facebook/g+ replacement.
I cannot add anything except to express emphatic agreement.

One important thing, IMO, would be mathematically sound, peer-to-peer
co-verification of archive authenticity -- perhaps in the same sense in
which git manages to do it.
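
Something in the spirit of the toy sketch below, say with ironclad (the
hash choice and the function names are only for illustration): every
stored page is named by the hash of its own bytes, so peers can
cross-check each other's copies without trusting any single archive.

;; Illustration only: git-style content addressing with ironclad.
(defun content-address (octets)
  "Return the SHA-256 hash of OCTETS as a lowercase hex string."
  (ironclad:byte-array-to-hex-string
   (ironclad:digest-sequence :sha256 octets)))

(defun store-object (octets directory)
  "Store OCTETS under their own hash, like a git blob; return the hash."
  (let* ((hash (content-address octets))
         (file (merge-pathnames hash directory)))
    (ensure-directories-exist file)
    (with-open-file (out file :direction :output
                              :element-type '(unsigned-byte 8)
                              :if-exists :supersede)
      (write-sequence octets out))
    hash))

(defun verify-object (hash directory)
  "Re-hash the stored bytes and compare with the name they are filed under."
  (with-open-file (in (merge-pathnames hash directory)
                      :element-type '(unsigned-byte 8))
    (let ((octets (make-array (file-length in)
                              :element-type '(unsigned-byte 8))))
      (read-sequence octets in)
      (string= hash (content-address octets)))))

;; (verify-object (store-object page-octets dir) dir)  ; => T unless corrupted
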
--
regards,
Samium Gromoff
--
"Actually I made up the term 'object-oriented', and I can tell you I
did not have C++ in mind." - Alan Kay (OOPSLA 1997 Keynote)
Paul Nathan
2011-11-02 01:32:01 UTC
Permalink
Post by Samium Gromoff
Post by Faré
Dear Christian,
I'm interested in your web scraping technology in CL.
I'd like to build a distributed web proxy that persistently records
everything one views, so that you can always read and share the pages
you like even when the author dies, the servers are taken off-line,
the domain name is bought by someone else, and the new owner puts a
new robots.txt that tells archive.org to not display the pages
anymore.
I don't know if this adventure tempts you, but I think the time is
ripe for end-user-controlled peer-to-peer distributed archival and
sharing of information. Obvious application, beyond archival, is a
distributed facebook/g+ replacement.
I cannot add anything except to express emphatic agreement.
One important thing, IMO, would be mathematically sound, peer-to-peer
co-verification of archive authenticity -- perhaps in the same sense in
which git manages to do it.
I agree. It's becoming pretty obvious to me that the 'web' can be
described as being in a state of constant rot and regrowth (sites go
down, other sites go up). Unfortunately, the rot takes some really
valuable pieces of information with it.

An interesting definition of a website might be that it is actually a git
repository - hyperlinks carry both a file and the changeset hash at which
the file was valid; a 'certified' website might have GPG signatures on the
commits as well.

One interesting application might be an 'archiving browser', which caches
all/most of the sites you visit. Instead of rummaging through google trying
to figure out what the search terms were to hit that one site (if it's
still indexed by google and if it's still up), you can instead run a query
on your local application.
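
Even a toy query layer over such a cache would already be useful,
something like the following sketch (plain CL plus split-sequence; a real
index would of course need stemming, ranking, incremental updates and so
on):

;; Toy sketch of the local query layer: a plain hash-table inverted index.
(defvar *index* (make-hash-table :test #'equal)
  "Maps a lowercased word to a list of URLs whose text contains it.")

(defun tokenize (text)
  "Split TEXT into lowercased words on non-alphanumeric characters."
  (split-sequence:split-sequence-if-not #'alphanumericp
                                        (string-downcase text)
                                        :remove-empty-subseqs t))

(defun index-page (url text)
  "Add every word of TEXT to the index, pointing back at URL."
  (dolist (word (remove-duplicates (tokenize text) :test #'string=))
    (pushnew url (gethash word *index*) :test #'string=)))

(defun query (word &rest more-words)
  "Return the URLs indexed under WORD and all of MORE-WORDS."
  (reduce (lambda (a b) (intersection a b :test #'string=))
          (mapcar (lambda (w) (gethash (string-downcase w) *index*))
                  (cons word more-words))))

;; (index-page "http://example.org/" "Some page text about lisp archiving")
;; (query "lisp" "archiving") => ("http://example.org/")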

As a personal project, I have been contemplating putting together a web
spider/index for better web searching; it would be nice to contribute
components from that to a larger project relating to web storage &
archiving.

Regards,
Paul
Matthew Mondor
2011-11-02 14:53:43 UTC
Permalink
On Tue, 1 Nov 2011 18:32:01 -0700
Post by Paul Nathan
One interesting application might be an 'archiving browser', which caches
all/most of the sites you visit. Instead of rummaging through google trying
to figure out what the search terms were to hit that one site (if it's
still indexed by google and if it's still up), you can instead run a query
on your local application.
As a personal project, I have been contemplating putting together a web
spider/index for better web searching; it would be nice to contribute
components from that to a larger project relating to web storage &
archiving.
I really like this idea. There exist a few distributed spider+search
engine projects which, with enough participants, could perhaps one day
replace commercial search engines while permitting unrestricted searches
(ever noticed how the public Google search interface used to be more
powerful, but has been "censored" since?). Unfortunately, those projects
are still unpopular and cannot compete at the moment.

A distributed archiving system could also embed such a distributed
search engine...
--
Matt