C[omp]ute

© 2006-2015 Andrew Cooke (site) / post authors (content).

Talk on SDSS

From: "andrew cooke" <andrew@...>

Date: Fri, 7 Apr 2006 11:37:11 -0400 (CLT)

Listened to a talk on SDSS - http://cas.sdss.org/dr4/en/ - at work
yesterday.  These are my notes:

- 35 queries
The MS guru database chap asked them for 20 queries before designing the
database.  That grew to 35 that they now repeatedly run as benchmarks
after upgrades.  I was surprised at the number (20 seems a lot, and it got
bigger).  Good way to get non-experts to talk about the data model.
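
A minimal sketch of the benchmark idea, assuming SQLite stands in for the real server.  The table, the two queries and the helper names are hypothetical; the actual 35 queries are not listed in these notes.

```python
import sqlite3
import time

# Hypothetical stand-ins for the agreed benchmark queries.
BENCHMARK_QUERIES = {
    "count_bright": "SELECT COUNT(*) FROM objects WHERE mag < 18",
    "mean_mag": "SELECT AVG(mag) FROM objects",
}


def run_benchmarks(conn, queries):
    """Run each named query once; return {name: (rows, seconds)}."""
    results = {}
    for name, sql in queries.items():
        start = time.perf_counter()
        rows = conn.execute(sql).fetchall()
        results[name] = (rows, time.perf_counter() - start)
    return results


def make_demo_db():
    """Build a tiny in-memory database so the queries have something to hit."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE objects (id INTEGER PRIMARY KEY, mag REAL)")
    conn.executemany(
        "INSERT INTO objects (mag) VALUES (?)", [(15.0,), (17.5,), (21.0,)]
    )
    return conn


if __name__ == "__main__":
    for name, (rows, seconds) in run_benchmarks(
        make_demo_db(), BENCHMARK_QUERIES
    ).items():
        print(f"{name}: {rows} in {seconds:.6f}s")
```

Run the same set before and after an upgrade and compare both the rows and the timings.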

- keep all versions (inc bugs)
They had a fixed set of data they wanted to put on the web.  They did
that, and then started finding ways to improve things.  Great, but old
versions must stay - people are using the data in long term projects.

- raw sql
- user tables
Got burnt very early with OODB.  They do everything in SQL.  Data
remediation.  Users have their own scratch tables.  This is a big point
of conflict with opinions in our team.

- two phase loader - chunked
- first stage no indices
- verification in sql
- parallel loading
- second stage faster once data trusted
First loading stage takes raw data and builds index-free tables.  Data are
then remediated.  Second stage moves remediated data into indexed tables.
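
The two stages can be sketched roughly as below, using SQLite in place of the real database.  The column names, the validity rules and the "remediate by deleting" step are all assumptions for illustration; the talk did not give that level of detail.  Parallel loading of several chunks is not shown.

```python
import sqlite3

# Hypothetical validity rule, expressed in SQL so verification runs in the database.
BAD_ROW = (
    "obj_id IS NULL OR ra_deg IS NULL OR dec_deg IS NULL "
    "OR ra_deg NOT BETWEEN 0 AND 360 OR dec_deg NOT BETWEEN -90 AND 90"
)


def init_catalog():
    """Create the trusted, indexed destination table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE objects (obj_id INTEGER PRIMARY KEY, ra_deg REAL, dec_deg REAL)"
    )
    conn.execute("CREATE INDEX objects_pos ON objects (ra_deg, dec_deg)")
    return conn


def load_chunk(conn, rows):
    """Load one chunk of (obj_id, ra_deg, dec_deg) rows; return (loaded, rejected)."""
    # Stage 1: raw data into a staging table with no indices or constraints.
    conn.execute("DROP TABLE IF EXISTS staging")
    conn.execute("CREATE TABLE staging (obj_id INTEGER, ra_deg REAL, dec_deg REAL)")
    conn.executemany("INSERT INTO staging VALUES (?, ?, ?)", rows)

    # Verification in SQL, then a crude remediation: drop the bad rows.
    rejected = conn.execute(
        f"SELECT COUNT(*) FROM staging WHERE {BAD_ROW}"
    ).fetchone()[0]
    conn.execute(f"DELETE FROM staging WHERE {BAD_ROW}")
    loaded = conn.execute("SELECT COUNT(*) FROM staging").fetchone()[0]

    # Stage 2: move the now-trusted rows into the indexed table.
    conn.execute("INSERT INTO objects SELECT obj_id, ra_deg, dec_deg FROM staging")
    conn.execute("DROP TABLE staging")
    conn.commit()
    return loaded, rejected
```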

- sql nice for parallelization (cpus and disks)
- as many volumes as processors
- single table scans are slowest - disk read limited
Avoid big disks.  Parallelize across disks and processors.

- 2Mb crossover (file v sql)
That's pretty big for a blob, but much less than our binary data (images).
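
A sketch of how a size crossover might be applied, assuming it means that smaller objects go in the database and larger ones go to files referenced by path.  Reading "2Mb" as 2 * 1024 * 1024 bytes is my assumption, as are the table layout and function names.

```python
import os
import sqlite3
import tempfile

# Assumed reading of the "2Mb" figure in the notes.
CROSSOVER_BYTES = 2 * 1024 * 1024


def make_payload_db():
    """Create a table holding either the bytes themselves or a path to a file."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE payloads (name TEXT PRIMARY KEY, body BLOB, path TEXT)")
    return conn


def store(conn, blob_dir, name, data, crossover=CROSSOVER_BYTES):
    """Store small payloads inline in SQL and large ones as files; say which."""
    if len(data) < crossover:
        conn.execute(
            "INSERT INTO payloads (name, body, path) VALUES (?, ?, NULL)", (name, data)
        )
        return "sql"
    path = os.path.join(blob_dir, name)
    with open(path, "wb") as f:
        f.write(data)
    conn.execute(
        "INSERT INTO payloads (name, body, path) VALUES (?, NULL, ?)", (name, path)
    )
    return "file"


def fetch(conn, name):
    """Return the payload bytes, wherever they were stored."""
    body, path = conn.execute(
        "SELECT body, path FROM payloads WHERE name = ?", (name,)
    ).fetchone()
    if body is not None:
        return bytes(body)
    with open(path, "rb") as f:
        return f.read()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as blob_dir:
        conn = make_payload_db()
        print(store(conn, blob_dir, "thumbnail", b"\x00" * 1024))
        print(store(conn, blob_dir, "image", b"\x00" * (3 * 1024 * 1024)))
```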

- submission queues to channel user expectations (slow web pages bad)
Batch processing is batch processing.  Admit it and make it clear to your
users.
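
One way to make the batch nature explicit is to hand back a job id immediately and let the user poll for status.  The sketch below is only an illustration using Python's standard queue and threading modules with SQLite; the class and its method names are hypothetical and not anything described in the talk.

```python
import queue
import sqlite3
import threading


class BatchQueue:
    """Accept queries immediately; run them later on a single worker thread."""

    def __init__(self, conn_factory):
        self._jobs = queue.Queue()
        self._status = {}
        self._results = {}
        self._next_id = 0
        self._lock = threading.Lock()
        self._conn_factory = conn_factory
        threading.Thread(target=self._work, daemon=True).start()

    def submit(self, sql):
        """Queue a query and return its job id without waiting for it to run."""
        with self._lock:
            self._next_id += 1
            job_id = self._next_id
            self._status[job_id] = "queued"
        self._jobs.put((job_id, sql))
        return job_id

    def status(self, job_id):
        with self._lock:
            return self._status[job_id]

    def result(self, job_id):
        with self._lock:
            return self._results.get(job_id)

    def wait(self):
        """Block until every queued job has been processed."""
        self._jobs.join()

    def _work(self):
        # sqlite3 connections belong to the thread that created them,
        # so the worker opens its own.
        conn = self._conn_factory()
        while True:
            job_id, sql = self._jobs.get()
            with self._lock:
                self._status[job_id] = "running"
            try:
                rows = conn.execute(sql).fetchall()
                with self._lock:
                    self._results[job_id] = rows
                    self._status[job_id] = "done"
            except sqlite3.Error:
                with self._lock:
                    self._status[job_id] = "failed"
            finally:
                self._jobs.task_done()
```

A web page built on this would show "queued" or "running" rather than hanging while the query executes.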

- spatial features surprisingly popular
- "ferris wheel" scan - sequence of filters; scan database again and again
(for cross-matching)
Lots of details about spatial indexing and searching.  2D indexing is
hard.  If you often end up doing a scan, optimize for scans.  Ferris wheel
model is repeated scanning, in chunks.  Scan a bunch of queries together.
Queue queries for a chunk; go through the database scanning one chunk at a
time.
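
My reading of the ferris wheel model, as a toy in-memory sketch: the data are split into chunks, the scanner reads one chunk at a time in a loop, and every waiting query is applied to the chunk currently loaded.  A query can board at any position and is finished after one full turn.  The class, its methods and the use of plain Python predicates instead of SQL filters are all assumptions for illustration.

```python
class FerrisWheel:
    """Cycle over chunks, sharing each chunk read between all active queries."""

    def __init__(self, chunks):
        self.chunks = chunks      # list of lists of rows
        self.position = 0         # index of the next chunk to read
        self.active = []          # [name, predicate, chunks_remaining, matches]
        self.finished = {}        # name -> matching rows, in scan order

    def submit(self, name, predicate):
        """Board a query at the current position; it needs one full turn."""
        self.active.append([name, predicate, len(self.chunks), []])

    def step(self):
        """Read one chunk once and apply every active query to it."""
        chunk = self.chunks[self.position]
        still_active = []
        for job in self.active:
            name, predicate, remaining, matches = job
            matches.extend(row for row in chunk if predicate(row))
            job[2] = remaining - 1
            if job[2] == 0:
                self.finished[name] = matches   # seen every chunk exactly once
            else:
                still_active.append(job)
        self.active = still_active
        self.position = (self.position + 1) % len(self.chunks)


if __name__ == "__main__":
    wheel = FerrisWheel([[1, 2], [3, 4], [5, 6]])
    wheel.submit("even", lambda row: row % 2 == 0)
    wheel.step()
    wheel.submit("big", lambda row: row > 3)    # boards one chunk later
    while wheel.active:
        wheel.step()
    print(wheel.finished)
```

The cost of reading each chunk is paid once per turn no matter how many queries are riding along, which is the point when scans are disk-read limited.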

Andrew
