



© 2006-2017 Andrew Cooke (site) / post authors (content).

Workshop on Web Information Retrieval

From: "andrew cooke" <andrew@...>

Date: Fri, 12 Aug 2005 15:06:09 -0400 (CLT)

[a note originally in Spanish is further down]

A short summary of the morning talks at the Workshop on Web Information
Retrieval, hosted by the Centre for Web Research (U Chile).

Efficient and Expressively Complete XML Query Languages; XML Data
Exchange: Consistency and Query Answering:
Bleagh.  Both talks way over my head.  As far as I could work out (though
I don't think anyone said this) XPATH and XQUERY were designed by some
rather pragmatic (possibly read: ignorant of the theory) people.  As a
consequence, they have the usual problems that come with "pragmatic"
solutions - they're difficult to study analytically and behave very poorly
in certain cases.  Seems like a bit more effort could have been taken to
build on previous knowledge and design something that not only had a
friendly syntax, but was easy to map onto known systems (first order
logic, modal second order logic, whatever those are) and where the bits
that imply more expensive processing are added in such a way that a more
efficient subset of the language could be defined.

Temporal RDF:
RDF is a way of putting semantics on the web.  You define relations: "this
is XXX wrt YYY".  For example, "fred is the son of bob", or "P is a
subclass of Q" or "this relation is of type Z" (they can refer to
themselves).  So you have a bunch of triples (subject, relation, object)
and get a graph out the end.
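
A minimal sketch of the triples-to-graph idea, using the examples above.  The
relation names (son_of, subclass_of, type) and the to_graph helper are made up
for illustration; they are not real RDF vocabulary or any library's API.

```python
from collections import defaultdict

# Each triple is (subject, relation, object); names are illustrative only.
triples = [
    ("fred", "son_of", "bob"),
    ("P", "subclass_of", "Q"),
    ("son_of", "type", "Z"),  # a relation described by another triple
]


def to_graph(triples):
    """Build a labelled, directed graph: subject -> [(relation, object)]."""
    graph = defaultdict(list)
    for subject, relation, obj in triples:
        graph[subject].append((relation, obj))
    return dict(graph)


graph = to_graph(triples)
for subject, edges in graph.items():
    for relation, obj in edges:
        print(f"{subject} --{relation}--> {obj}")
```

Note that "son_of" appears both as an edge label and as a node, which is the
self-reference mentioned above.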
Turns out that you can define a normal form for these graphs, due to
recent work.
Temporal RDF, then, is a way of extending this to include time.  Which
adds more relations.  The problem is not doing this - there are an
infinite number of approaches - but finding the most useful approach.
Incidentally, I suggested using RDF to place the NSA metadata on the web.
This work would allow timeseries to be expressed in a natural manner.
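
One of the many possible approaches, sketched only to make the idea concrete:
label each triple with the interval in which it holds, then ask what the graph
looked like at a given time.  This is an assumption on my part, not
necessarily what the speaker proposed, and the data and the snapshot helper
are invented.

```python
# Each entry is (subject, relation, object, first_year, last_year).
# The facts and years are invented for illustration.
temporal_triples = [
    ("fred", "works_for", "acme", 2001, 2003),
    ("fred", "works_for", "initech", 2004, 2005),
    ("fred", "son_of", "bob", 1970, 2005),
]


def snapshot(temporal_triples, year):
    """Return the plain triples that hold in the given year."""
    return [
        (subject, relation, obj)
        for subject, relation, obj, first, last in temporal_triples
        if first <= year <= last
    ]


print(snapshot(temporal_triples, 2002))
```

A timeseries is then just the same (subject, relation) pair repeated with
different objects and intervals.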

Query Languages for Graph Databases:
Very good talk.  Most data can be nicely expressed as graphs.  That's why
pointers (and graph theory!) are so important in programming.  Yet
relational databases are horribly inefficient ways of managing such
structures (as anyone who has had to encode a tree and doesn't know
Celko's hack can testify!).
Anyway, XML data are trees (and references give DCGs).  And RDF gives
DCGs.  So these things are coming back into fashion.  Trouble is, again,
people are ignoring past results.  Turns out that none of the suggested
RDF database systems (and there's a good half dozen) answer common
questions as well as generic "graph databases" from research in the 90s.
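
For anyone wondering about the tree-encoding trick mentioned above: I am
assuming it means the nested set model that Joe Celko popularised, where each
node stores a left and right number from a depth-first walk.  A rough sketch
in Python rather than SQL (the tree and function names are made up):

```python
def nested_sets(tree, root):
    """Return {node: (left, right)} numbered by a depth-first walk."""
    bounds = {}
    counter = 0

    def visit(node):
        nonlocal counter
        counter += 1
        left = counter
        for child in tree.get(node, []):
            visit(child)
        counter += 1
        bounds[node] = (left, counter)

    visit(root)
    return bounds


def descendants(bounds, node):
    """All nodes whose interval lies strictly inside the node's interval."""
    left, right = bounds[node]
    return sorted(n for n, (l, r) in bounds.items() if left < l and r < right)


# Hypothetical tree, stored as parent -> list of children.
tree = {"root": ["a", "b"], "a": ["a1", "a2"], "b": []}
bounds = nested_sets(tree, "root")
print(bounds)
print(descendants(bounds, "a"))
```

In a relational table the two numbers become two columns, and "all
descendants" is a single range comparison instead of a recursive join.  The
price is renumbering on every insert.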

Interactive Cross Language Retrieval:
Very entertaining talk on searching documents which are written in a
language you don't understand (including the obvious point: why?!).
Key point: machine translation (at least currently) needs to be focussed
on a particular task.  How you do translation for one task (eg searching
documents) is different from another (eg presenting documents to the user
for them to assess their relevance, or doing a translation for "use").
Anyway, for cross language search they can now do as well as searching in
a single language!  Impressive result.  Done via machine learning on large
bi-lingual corpora.  Take the two translations of the same text
and see how words match up.  Typically you get word X in language A
matching to a set of words (P, Q, R, S) in language B.  For search, use
all those with their relative weighting.
Interactive tools that allow you to refine the (cross-language) search (in
various cool ways) are only worth using if you have more than 10 minutes to
spend fiddling.
Search tip: When you think you have the answer to a question, do a search
including the answer.  Large number of hits indicates success.
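
A toy sketch of the weighted-translation search described above.  The
translation table and its weights are invented (a real one would be learned
from aligned text), and the scoring is a bare weighted term count, not
whatever the speaker's system actually uses.

```python
from collections import Counter

# Hypothetical learned table: source word -> {target word: weight}.
translations = {
    "casa": {"house": 0.75, "home": 0.25},
    "perro": {"dog": 1.0},
}


def translate_query(query_terms, table):
    """Expand each query term into all its translations, keeping weights."""
    weighted = Counter()
    for term in query_terms:
        for target, weight in table.get(term, {}).items():
            weighted[target] += weight
    return weighted


def score(document_words, weighted_query):
    """Sum of (translation weight x occurrences) over the query's targets."""
    counts = Counter(document_words)
    return sum(weight * counts[target] for target, weight in weighted_query.items())


query = translate_query(["casa"], translations)
doc_a = "the house is a home and the house is old".split()
doc_b = "the dog is old".split()
print(score(doc_a, query), score(doc_b, query))
```

The point is that nothing is thrown away: every candidate translation
contributes, in proportion to how often it matched in the training text.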

Precision Recall with User Modelling Applied to XML Retrieval:
How to rate different XML searches in a fair, standard way.  A bit
technical and focussed for me, but the technique he suggested sounded good
(the user model includes the idea that you give a user a node and if the
answer is in a neighbouring node, they'll probably see it).

Efficient Searchable Natural Language Adaptive Compression:
Another interesting talk.  There are two kinds of compression.  You either
decide what you're going to do beforehand (typically by studying your
data) or optimize your encoding on the fly.  Huffman coding is the
traditional example of the first approach, Lempel-Ziv (as in gzip) the second.
Typically, adaptive coding (the second approach) is best, but it makes
searching difficult, since the encoding keeps changing.
Key point: In natural languages, word frequency is much more skewed than
letter frequency, so encode whole words.
Anyway, the speaker presented a really cool hybrid solution.  If you use
bytes to encode words then you don't care about the ordering of words
except near n/n+1 byte boundaries.  So you get almost static ordering plus
an occasional "swap encoding" when a word bumps up over a boundary.  The
number of swaps stabilizes after a megabyte or two of text, so you get very
efficient encoding, easy decoding, and the possibility of searching (as
long as you pay attention to the swaps).
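
To see why only the byte boundaries matter, here is a sketch of a word-based
byte code.  I am assuming a dense code in which the last byte of each codeword
has its high bit set; the talk may have used a different assignment, and this
is the static version only (no swaps).  Function names are my own.

```python
from collections import Counter


def encode_rank(rank):
    """Byte codeword for a 0-based frequency rank (assumed dense code).

    The final byte is 128-255 and any earlier bytes are 0-127, so the end
    of each codeword can be spotted when scanning the compressed bytes.
    """
    out = [128 + rank % 128]
    rank = rank // 128 - 1
    while rank >= 0:
        out.append(rank % 128)
        rank = rank // 128 - 1
    return bytes(reversed(out))


def build_codebook(words):
    """Static codebook: the most frequent words get the lowest ranks."""
    ranked = [word for word, _ in Counter(words).most_common()]
    return {word: encode_rank(i) for i, word in enumerate(ranked)}


text = "the cat sat on the mat and the dog sat on the cat".split()
codebook = build_codebook(text)
compressed = b"".join(codebook[word] for word in text)
print(len(compressed), "bytes for", len(text), "words")

# Ranks 0-127 cost one byte, the next 128*128 ranks cost two, and so on.
for rank in (0, 127, 128, 16511, 16512):
    print(rank, len(encode_rank(rank)))
```

Two words whose ranks fall in the same band can change places without
changing the size of the output, which is why an adaptive version only needs
to record a swap when a word crosses from one band to the next.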

[note originally in Spanish, translated]
well, when i started, i didn't know i was going to write so much.  sorry, but
you'll have to practise your english if you want to understand it...  one
thing - exequiel was asking about the difference between "database" and
"relational database".  it seems that Codd defined a database in a
paper written in 1980.  it has three components - a way of structuring
the data, a system for querying it, and a way of checking (enforcing?)
that the data are correct.  (unfortunately the whole paper can't be
read).


