Andrew Cooke | Contents | Latest | RSS | Twitter | Previous | Next

C[omp]ute

Welcome to my blog, which was once a mailing list of the same name and is still generated by mail. Please reply via the "comment" links.

Always interested in offers/projects/new ideas. Eclectic experience in fields like: numerical computing; Python web; Java enterprise; functional languages; GPGPU; SQL databases; etc. Based in Santiago, Chile; telecommute worldwide. CV; email.

Personal Projects

Lepl parser for Python.

Colorless Green.

Photography around Santiago.

SVG experiment.

Professional Portfolio

Calibration of seismometers.

Data access via web services.

Cache rewrite.

Extending OpenSSH.

C-ORM: docs, API.

Last 100 entries

[Link, Future] Simulated Brain Drives Robot; [Link, Computing] Learned Index Structures; Solo Air Equalization; Update: Higher Pressures; Psychology; [Bike] Exercise And Fuel; Continental Race King 2.2; Removing Lowers; Mnesiacs; [Maths, Link] Dividing By Zero; [Book, Review] Ray Monk - Ludwig Wittgenstein: The Duty Of Genius; [Link, Bike, Computing] Evolving Lacing Patterns; [Jam] Strawberry and Orange Jam; [Chile, Privacy] Biometric Check During Mail Delivery; [Link, Chile, Spanish] Article on the Chilean Drought; [Bike] Extended Gear Ratios, Shimano XT M8000 (24/36 Chainring); [Link, Politics, USA] The Future Of American Democracy; Mass Hysteria; [Review, Books, Links] Kazuo Ishiguro - Never Let Me Go; [Link, Books] David Mitchell's Favourite Japanese Fiction; [Link, Bike] Rear Suspension Geometry; [Link, Cycling, Art] Strava Artwork; [Link, Computing] Useful gcc flags; [Link] Voynich Manuscript Decoded; [Bike] Notes on Servicing Suspension Forks; [Links, Computing] Snap, Flatpack, Appimage; [Link, Computing] Oracle is leaving Java (to die); [Link, Politics] Cubans + Ultrasonics; [Book, Link] Laurent Binet; VirtualBox; [Book, Link] No One's Ways; [Link] The Biggest Problem For Cyclists Is Bad Driving; [Computing] Doxygen, Sphinx, Breathe; [Admin] Brokw Recent Permalinks; [Bike, Chile] Buying Bearings in Santiago; [Computing, Opensuse] Upgrading to 42.3; [Link, Physics] First Support for a Physics Theory of Life; [Link, Bike] Peruvian Frame Maker; [Link] Awesome Game Theory Tit-For-Tat Thing; [Food, Review] La Fabbrica - Good Italian Food In Santiago; [Link, Programming] MySQL UTF8 Broken; [Link, Books] Latin American Authors; [Link, Computing] Optimizatin Puzzle; [Link, Books, Politics] Orwell Prize; [Link] What the Hell Is Happening With Qatar?; [Link] Deep Learning + Virtual Tensor Machines; [Link] Scaled Composites: Largest Wingspan Ever; [Link] SCP Foundation; [Bike] Lessons From 2 Leading 2 Trailing; [Link] Veg Restaurants in Santiago; [Link] List of Contemporary Latin American Authors; [Bike] FTHR; [Link] Whoa - NSA Reduces Collection (of US Residents); [Link] Red Bull's Breitbart; [Link] Linux Threads; [Link] Punycode; [Link] Bull / Girl Statues on Wall Street; [Link] Beautiful Chair Video; Update: Lower Pressures; [Link] Neat Python Exceptions; [Link] Fix for Windows 10 to Avoid Ads; [Link] Attacks on ZRTP; [Link] UK Jazz Invasion; [Review] Cuba; [Link] Aricle on Gender Reversal of US Presidential Debate; {OpenSuse] Fix for Network Offline in Updater Applet; [Link] Parkinson's Related to Gut Flora; Farellones Bike Park; [Meta] Tags; Update: Second Ride; Schwalbe Thunder Burt 2.1 v Continental X-King 2.4; Mountain Biking in Santiago; Books on Ethics; Security Fail from Command Driven Interface; Everything Old is New Again; Interesting Take on Trump's Lies; Chutney v6; References on Entropy; Amusing "Alexa.." broadcast; The Shame of Chile's Education System; Playing mp4 gifs in Firefox on Opensuses Leap 42.2; Concurrency at Microsoft; Globalisation: Uk -> Chile; OpenSuse 42.2 and Synaptics Touch-Pads; Even; Cherry Jam; Lebanese Writer Amin Maalouf; C++ - it's the language of the future; Learning From Trump; Chinese Writer Hu Fayun; And; Apricot Jam; Also; Excellent Article on USA Politics; Oh Metafilter; Prejudice Against The Rurals; Also, Zizek; Trump; Why Trump Won; Doxygen + Latex on CentOS 6; SMASH - Solve 5 Biggest Problems in Physics

© 2006-2017 Andrew Cooke (site) / post authors (content).

Optimising Clojure

From: andrew cooke <andrew@...>

Date: Wed, 31 Aug 2011 08:54:22 -0300

I had some Clojure code that needed optimising.  Eventually I achieved a 5x
speedup.  This is how.


Finding Something to Measure
----------------------------

First, I needed something to measure, so that I could tell how effective my
changes were.  This was complicated by three things: the need to generate test
data to process; the delayed action of the JIT; garbage collection.

Generating test data is quite expensive, so I needed to carefully tune
parameters so that my processing used more time than the test data generation
(not necessary for the simple timings, but important for profiling, since I
could find no simple way to delay profiling to start after the data was
ready).  Then I placed the processing in a loop, repeating it multiple times
(re-using the same test data).  Each pass through the loop was timed - the
successive times let me see when the JIT engaged (typically the first
calculation was 50% slower) and also reduce the effect of GC by selecting the
lowest value.


Profiling
---------

Once I had a good test running in my IDE, I needed to make standalone Java
classes to profile.  The simplest way to do this seems to be to use a build
tool called leiningen - https://github.com/technomancy/leiningen - which was
very easy to use.  You just download and run it, create an empty project, and
modify the project descriptor file (which I then copied into my existing
project).

My file contents are:

  (defproject compressive "0.0.0"
    :description "set-related experiments"
    :aot [#"com.isti.compset.*"]
    :main com.isti.compset.stack
    :dependencies [[org.clojure/clojure "1.3.0-beta1"]
		   [org.clojure/clojure-contrib "1.2.0"]])

where:

  - aot indicates which classes need compiling
  - main indicates the location of the "main" file
  - dependencies are downloaded automatically

I also added the (-main) method and put (:gen-class) in my ns at the top of
the file.

With all that, "lein compile" generated a set of classes that could be run
with java:

  java -cp classes:lib/clojure-1.3.0-beta1.jar \
    -agentlib:hprof=cpu=samples,depth=10,file=hprof com.isti.compset.stack

giving a profile in the file "hprof".

This profile was dominated by expected methods related to generating
and consuming sequences, and adding and multiplying generic numbers.


Vector of Doubles
-----------------

The central loop in my code sums various spectra and the square of their
values.  The spectra were stored as "vec" instances.  I hope to remove the
use of generic numbers by replacing this with a vector of doubles and, if
necessary, adding some type annotations to various functions.

So I replaced "vec" with

  (defn d-vec [collection]
    (apply conj (vector-of :double) collection))

However, this did not have the desired effect - the code was
significantly slower and the trace was dominated by calls to "Vec.count".

I asked for help on stack overflow http://stackoverflow.com/q/7223297/181772
but didn't get any very useful response, so I removed these changes.


Array of Doubles
----------------

At this point I was a little disheartened.  The traces didn't have much useful
information (that I could see, even using a nice little tool called PerfAnal
http://java.sun.com/developer/technicalArticles/Programming/perfanal/).

So I looked at my inner loop and decided that I would simply rewrite it as I
thought it should be written, relying on experience rather than profiling.  I
also found this page http://clojure.org/java_interop on the use of arrays
helpful.

I made the following changes:

 - replace the vector in which I was accumulating results with a native array
   of doubles (using double-array)

 - mutate the accumulator rather than creating a new instance each loop

 - use additional indices to identify the "active range" of the accumulator
   (which may get smaller over time, due to restricted data ranges in
   spectra).    This had not been necessary when I was generating new
   immutable instances - the instances simply decreased in size.

 - change the spectra from vectors to arrays.

 - replace some maps and reduces with explicit loops that mutate data.

This was, initially a complete disaster.  Running time increased by roughly a
hundredfold.


Type Annotations
----------------

Rather than reverting these changes, which seemed justified from experience, I
looked again at profiling.  The profile was now dominated by methods related
to java introspection.

This is a known issue with Clojure and has a simple solution - add

  (set! *warn-on-reflection* true)

and remove all warnings by adding type hints.  I did not enable this for all
code - only for the central loop and related functions.  And I used the
"^doubles" hint which indicates a native array of doubles.

This, finally, gave a significant speedup - about 5x faster than the initial
code.


Conclusions
-----------

 - Clojure code is simplest, and reasonably efficient, when you stay within
   Clojure's own types (vec, seq etc).

 - Profiling with hprof can give broad information, but I, at least, couldn't
   understand it in detail (I normally can read profiling data!) - there
   seemed to be information missing, or I simply do not understand how Clojure
   works.

 - It was fairly simple to change the code to use typed, mutable data in the
   inner loop, which gave significant speedups.

 - Dynamic interop with Java is apallingly slow, but easy to fix once you see
   that it is happening.

 - More generally, Clojure works fairly well for exploratory data processing.
   It's nicely high-level and easy to work with, and can be optimised where
   necessary.  I have also implemented this particular algorithm in C and on
   GPUs, and I am sure it is faster in both cases, but (without doing any
   formal measurements) the code feels close enough to C for me to experiment
   with new ideas quickly.  I doubt this would have been possible in Python,
   for example (even with numpy - the inner loop is annoyingly hard to
   vectorize)

More on (vector-of :double)

From: andrew cooke <andrew@...>

Date: Wed, 31 Aug 2011 22:50:48 -0300

Justin Kramer at http://stackoverflow.com/q/7223297/181772 sugegsted using 

  (into (vector-of :double) collection)

to create vectors of doubles.  This seemed like a good idea for the immutable
spectra (it's probably not clear above but I introduced arrays of soubles in
two places - the mutable accumulator used to sum spectra, and the indvidual
spectra themselves).  However, when I tried it, I found that there was no way
to add type hints for these vectors.  That led to warnings of dynamic code
when the contents were used.

So using arrays of doubles is heping in two ways - it's supporting fast,
mutable access and also helping the compiler infer (from the ^doubles hint for
the array) the tyoe of the content.

Andrew

PS I posted this on HN where it gained some view, but no useful comments.

Comment on this post