


© 2006-2015 Andrew Cooke (site) / post authors (content).

Talk on SDSS

From: "andrew cooke" <andrew@...>

Date: Fri, 7 Apr 2006 11:37:11 -0400 (CLT)

Listened to a talk on SDSS - http://cas.sdss.org/dr4/en/ - at work
yesterday.  These are my notes:

- 35 queries
The MS guru database chap asked them for 20 queries before designing the
database.  That grew to 35 that they now repeatedly run as benchmarks
after upgrades.  I was surprised at the number (20 seems a lot, and it got
bigger).  Good way to get non-experts to talk about the data model.
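A minimal sketch of what "repeatedly run as benchmarks" could look like, assuming nothing about the real SDSS setup: the table, the two queries and the helper names below are all made up for illustration, and sqlite3 stands in for the real database.

```python
import sqlite3
import time

# Hypothetical benchmark set: names and SQL are illustrative only,
# not the actual SDSS queries.
BENCHMARK_QUERIES = {
    "count_bright": "SELECT COUNT(*) FROM objects WHERE mag < 18",
    "mean_mag_by_type": (
        "SELECT type, AVG(mag) FROM objects GROUP BY type ORDER BY type"
    ),
}


def make_demo_db():
    """Build a tiny in-memory table so the queries have something to hit."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE objects (id INTEGER PRIMARY KEY, type TEXT, mag REAL)"
    )
    conn.executemany(
        "INSERT INTO objects (type, mag) VALUES (?, ?)",
        [("star", 17.0), ("star", 19.0), ("galaxy", 16.0), ("galaxy", 20.0)],
    )
    return conn


def run_benchmarks(conn, queries):
    """Run each named query once; return {name: (elapsed_seconds, rows)}."""
    results = {}
    for name, sql in queries.items():
        start = time.perf_counter()
        rows = conn.execute(sql).fetchall()
        results[name] = (time.perf_counter() - start, rows)
    return results
```

Keeping the rows as well as the timings means the same run checks that an upgrade did not change the answers, not just the speed.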

- keep all versions (inc bugs)
They had a fixed set of data they wanted to put on the web.  They did
that, and then started finding ways to improve things.  Great, but old
versions must stay - people are using the data in long term projects.

- raw sql
- user tables
Got burnt very early with OODB.  They do everything in SQL.  Data
remediation.  Users have their own scratch tables.  This is a big point
of conflict with opinions in our team.

- two phase loader - chunked
- first stage no indices
- verification in sql
- parallel loading
- second stage faster once data trusted
First loading stage takes raw data and builds index-free tables.  Data are
then remediated.  Second stage moves remediated data into indexed tables.
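The two stages can be sketched roughly as below.  This is only an illustration under my own assumptions: the table and column names, the range checks, and the "remediate by deleting bad rows" policy are invented, and sqlite3 stands in for whatever they actually run.

```python
import sqlite3

# Hypothetical validity rule, written in SQL so that verification
# happens inside the database rather than in the loader.
BAD_ROW = (
    "ra IS NULL OR dec IS NULL "
    "OR ra < 0 OR ra >= 360 OR dec < -90 OR dec > 90"
)


def create_schema(conn):
    # Stage 1 target: no indices or constraints, so bulk inserts are cheap.
    conn.execute("CREATE TABLE staging (obj_id INTEGER, ra REAL, dec REAL)")
    # Stage 2 target: the indexed table that queries run against.
    conn.execute(
        "CREATE TABLE objects (obj_id INTEGER PRIMARY KEY, ra REAL, dec REAL)"
    )
    conn.execute("CREATE INDEX objects_ra_dec ON objects (ra, dec)")


def load_chunk(conn, rows):
    """Stage 1: append one chunk of raw rows to the index-free table."""
    conn.executemany("INSERT INTO staging VALUES (?, ?, ?)", rows)


def count_bad(conn):
    """Verification in SQL: how many staged rows fail the checks?"""
    sql = f"SELECT COUNT(*) FROM staging WHERE {BAD_ROW}"
    return conn.execute(sql).fetchone()[0]


def remediate(conn):
    """Simplest possible remediation (an assumption): drop the bad rows."""
    conn.execute(f"DELETE FROM staging WHERE {BAD_ROW}")


def publish(conn):
    """Stage 2: move trusted rows into the indexed table."""
    if count_bad(conn):
        raise ValueError("staging still contains unverified rows")
    with conn:  # commit both statements together, or roll both back
        conn.execute("INSERT INTO objects SELECT obj_id, ra, dec FROM staging")
        conn.execute("DELETE FROM staging")
```

Each chunk goes through load_chunk independently, which is what would let the first stage run in parallel; only publish touches the indexed table.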

- sql nice for parallelization (cpus and disks)
- as many volumes as processors
- single table scans are slowest - disk read limited
Avoid big disks.  Parallelize across disks and processors.

- 2Mb crossover (file v sql)
That's pretty big for a blob, but much less than our binary data (images).

- submission queues to channel user expectations (slow web pages bad)
Batch processing is batch processing.  Admit it and make it clear to your

- spatial features surprisingly popular
- "ferris wheel" scan - sequence of filters; scan database again and again
(for cross-matching)
Lots of details about spatial indexing and searching.  2D indexing is
hard.  If you often end up doing a scan, optimize for scans.  Ferris wheel
model is repeated scanning, in chunks.  Scan a bunch of queries together.
Queue queries for a chunk; go through the database scanning one chunk at a
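
My reading of the ferris wheel idea, as a toy sketch rather than anything from the talk: the scanner cycles over the chunks forever, a query can board at whatever chunk is next, and it gets off after one full revolution.  The class name, the dict fields and the plain Python predicates are all my own assumptions; real queries would be filter sequences over the database.

```python
class FerrisWheelScanner:
    """Cycle over data chunks, applying every queued query to each chunk."""

    def __init__(self, chunks):
        self.chunks = chunks    # list of chunks; each chunk is a list of rows
        self.position = 0       # index of the next chunk to scan
        self.active = []        # queries that have not yet seen every chunk
        self.finished = {}      # query name -> matching rows

    def submit(self, name, predicate):
        """Queue a query; it boards at whichever chunk is scanned next."""
        self.active.append({
            "name": name,
            "predicate": predicate,
            "remaining": len(self.chunks),  # chunks still to be seen
            "matches": [],
        })

    def step(self):
        """Read one chunk once and run all active queries against it."""
        chunk = self.chunks[self.position]
        self.position = (self.position + 1) % len(self.chunks)
        for row in chunk:
            for query in self.active:
                if query["predicate"](row):
                    query["matches"].append(row)
        still_active = []
        for query in self.active:
            query["remaining"] -= 1
            if query["remaining"] == 0:
                # Full revolution completed: the query has seen all chunks.
                self.finished[query["name"]] = query["matches"]
            else:
                still_active.append(query)
        self.active = still_active
```

Each chunk is read once per revolution however many queries are queued, so the disk cost is shared.  Results come back in scan order, starting from the chunk where the query boarded, not in table order.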

