



© 2006-2017 Andrew Cooke (site) / post authors (content).

Workshop on Web Information Retrieval

From: "andrew cooke" <andrew@...>

Date: Fri, 12 Aug 2005 15:06:09 -0400 (CLT)

[Spanish note further down]

A short summary of the morning talks at the Workshop on Web Information
Retrieval, hosted by the Centre for Web Research (U Chile).

Efficient and Expressively Complete XML Query Languages; XML Data
Exchange: Consistency and Query Answering:
Bleagh.  Both talks way over my head.  As far as I could work out (though
I don't think anyone said this) XPATH and XQUERY were designed by some
rather pragmatic (possibly read: ignorant of the theory) people.  As a
consequence, they have the usual problems that come with "pragmatic"
solutions - they're difficult to study analytically and behave very poorly
in certain cases.  Seems like a bit more effort could have been taken to
build on previous knowledge and design something that not only had a
friendly syntax, but was easy to map onto known systems (first order
logic, modal second order logic, whatever those are) and where the bits
that imply more expensive processing are added in such a way that a more
efficient subset of the language could be defined.

Temporal RDF:
RDF is a way of putting semantics on the web.  You define relations: "this
is XXX wrt YYY".  For example, "fred is the son of bob", or "P is a
subclass of Q" or "this relation is of type Z" (they can refer to
themselves).  So you have a bunch of triples (subject, object, relation)
and get a graph out the end.
Turns out that you can define a normal form for these graphs, due to
recent work.
Temporal RDF, then, is a way of extending this to include time.  Which
adds more relations.  The problem is not doing this - there are an
infinite number of approaches - but finding the most useful approach.
Incidentally, I suggested using RDF to place the NSA metadata on the web.
This work would allow timeseries to be expressed in a natural manner.
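
A minimal sketch of the triple idea, in Python.  Everything named here
(the tuple layout, the validity interval, the helper `at`) is made up for
illustration and is not from the talk or from any RDF library; in
particular, tagging each triple with an interval is just one of the many
possible ways of adding time.

```python
from collections import defaultdict

# Plain triples, written here as (subject, relation, object).
triples = [
    ("fred", "son_of", "bob"),
    ("P", "subclass_of", "Q"),
    ("son_of", "type", "relation"),  # a triple that talks about a relation
]

def to_graph(triples):
    """Adjacency view: subject -> list of (relation, object) edges."""
    graph = defaultdict(list)
    for s, r, o in triples:
        graph[s].append((r, o))
    return dict(graph)

# One (assumed) way to add time: tag each triple with a validity interval.
temporal = [
    (("fred", "works_at", "acme"), (2001, 2004)),
    (("fred", "works_at", "initech"), (2004, 2006)),
]

def at(temporal, year):
    """Triples valid at `year`; intervals are half-open (start <= year < end)."""
    return [t for t, (start, end) in temporal if start <= year < end]
```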

Query Languages for Graph Databases:
Very good talk.  Most data can be nicely expressed as graphs.  That's why
pointers (and graph theory!) are so important in programming.  Yet
relational databases are horribly inefficient ways of managing such
structures (as anyone who has had to encode a tree and doesn't know
Celko's hack can testify!).
Anyway, XML data are trees (and references give DCGs).  And RDF gives
DCGs.  So these things are coming back into fashion.  Trouble is, again,
people are ignoring past results.  Turns out that none of the suggested
RDF database systems (and there's a good half dozen) answer common
questions as well as generic "graph databases" from research in the 90s.
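
For anyone who hasn't met it: I'm assuming "Celko's hack" means the nested
set encoding that Joe Celko popularised, where each node stores left/right
numbers from a depth-first walk, so that a subtree becomes a range query.
A rough sketch with sqlite3 (the tree, table and column names are all
invented for the example):

```python
import sqlite3

# Child lists for a small tree; names are illustrative only.
tree = {"root": ["a", "b"], "a": ["a1", "a2"], "b": [], "a1": [], "a2": []}

def number(node, counter, rows):
    """Depth-first walk assigning (lft, rgt) numbers to each node."""
    lft = counter[0]
    counter[0] += 1
    for child in tree[node]:
        number(child, counter, rows)
    rows.append((node, lft, counter[0]))
    counter[0] += 1

rows = []
number("root", [1], rows)

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE node (name TEXT PRIMARY KEY, lft INTEGER, rgt INTEGER)")
db.executemany("INSERT INTO node VALUES (?, ?, ?)", rows)

def descendants(name):
    """All nodes strictly inside `name`'s (lft, rgt) range, in tree order."""
    cur = db.execute(
        "SELECT c.name FROM node AS c, node AS p "
        "WHERE p.name = ? AND c.lft > p.lft AND c.rgt < p.rgt "
        "ORDER BY c.lft",
        (name,),
    )
    return [row[0] for row in cur]
```

The price is that inserting a node means renumbering a lot of rows, which
is rather the point about relational databases being awkward for this.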

Interactive Cross Language Retrieval:
Very entertaining talk on searching documents which are written in a
language you don't understand (including the obvious point: why?!).
Key point: machine translation (at least currently) needs to be focussed
on a particular task.  How you do translation for one task (eg searching
documents) is different from another (eg presenting documents to the user
for them to assess their relevance, or doing a translation for "use").
Anyway, for cross language search they can now do as well as searching in
a single language!  Impressive result.  Done via machine learning on large
bi-lingual corpuses (corpi?).  Take the two translations of the same text
and see how words match up.  Typically you get word X in language A
matching to a set of words (P, Q, R, S) in language B.  For search, use
all those with their relative weighting.
Interactive tools that allow you to refine the (cross-language) search (in
various cool ways) are only worth using if you have more than 10mins to
spend fiddling.
Search tip: When you think you have the answer to a question, do a search
including the answer.  Large number of hits indicates success.
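
A toy version of the weighted-translation idea for search, as I understood
it.  The table, the weights and the documents are invented (real systems
learn the weights from aligned text), and scoring by a weighted term count
is my own simplification, not what the speaker described.

```python
from collections import Counter

# Query word (language A) -> candidate words (language B) with weights.
# These numbers are made up; in practice they would come from word alignment.
translations = {
    "casa": {"house": 0.7, "home": 0.3},
    "roja": {"red": 1.0},
}

def score(query, document):
    """Sum, over query words, of weight * count for each candidate translation."""
    counts = Counter(document.lower().split())
    total = 0.0
    for word in query:
        for candidate, weight in translations.get(word, {}).items():
            total += weight * counts[candidate]
    return total

docs = ["the red house on the hill", "there is no place like home", "a blue car"]
ranked = sorted(docs, key=lambda d: score(["casa", "roja"], d), reverse=True)
```

The document mentioning "home" still gets found, just with a lower score
than one mentioning "house", which is the point of keeping all the
candidates rather than picking a single translation.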

Precision Recall with User Modelling Applied to XML Retrieval:
How to rate different XML searches in a fair, standard way.  A bit
technical and focussed for me, but the technique he suggested sounded good
(the user model includes the idea that you give a user a node and if the
answer is in a neighbouring node, they'll probably see it).

Efficient Searchable Natural Language Adaptive Compression:
Another interesting talk.  There are two kinds of compression.  You either
decide what you're going to do beforehand (typically by studying your
data) or optimize your encoding on the fly.  Huffman coding is the
traditional example of the first approach, Lempel-Ziv (as in zip) the second.
Typically, adaptive coding (the second approach) is best, but it makes
searching difficult, since the encoding keeps changing.
Key point: In natural languages, word frequency is much more skewed than
letter frequency, so encode whole words.
Anyway, the speaker presented a really cool hybrid solution.  If you use
bytes to encode words then you don't care about the ordering of words
except near n/n+1 byte boundaries.  So you get almost static ordering plus
an occasional "swap encoding" when a word bumps up over a boundary.  The
number of swaps stabilizes after a MB or two of text, so you get very
efficient encoding, easy decoding, and the possibility of searching (as
long as you pay attention to the swaps).
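
To make the byte-boundary point concrete, here is a sketch of a dense
byte code for word ranks (the "end-tagged" kind, where the last byte of
each codeword has its high bit set).  Whether the speaker's scheme uses
exactly this code is an assumption on my part, and `needs_swap` is a
made-up helper; it only shows why ordering is irrelevant away from the
boundaries.

```python
def encode(rank):
    """Byte codeword for the word with frequency rank `rank` (0 = most common).

    The final byte is in 128-255 and earlier bytes are in 0-127, so the end
    of each codeword can be spotted without decoding.
    """
    out = [128 + rank % 128]
    rank = rank // 128 - 1
    while rank >= 0:
        out.append(rank % 128)
        rank = rank // 128 - 1
    return bytes(reversed(out))

def decode(code):
    """Inverse of encode: recover the rank from a codeword."""
    rank = 0
    for byte in code[:-1]:
        rank = (rank + byte + 1) * 128
    return rank + code[-1] - 128

def needs_swap(old_rank, new_rank):
    """A change of rank only matters if the codeword length changes."""
    return len(encode(old_rank)) != len(encode(new_rank))
```

Ranks 0-127 get one byte and the next 16384 get two, so a word moving from
rank 50 to rank 3 changes nothing, while one crossing from 128 to 127 is
where a swap would have to be signalled.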

[translated from Spanish]
well, when I started, I didn't know I was going to write so much.  sorry, but
you'll have to practise your English if you want to understand it...  one
thing - Exequiel was asking about the difference between "database" and
"relational database".  it seems that Codd defined a database in a
paper written in 1980.  it has three components - a way of structuring
the data, a system for searching it, and a way of
verifying (enforcing?) that it is correct. (unfortunately, the whole paper
can't be read).


` __ _ __ ___  ___| |_____   personal web site: ttp://
 / _` / _/ _ \/ _ \ / / -_)  list: ttp://
 \__,_\__\___/\___/_\_\___|  aim: acookeorg; skype: andrew-cooke
