C[omp]ute

© 2006-2013 Andrew Cooke (site) / post authors (content).

Why and How Writing Crypto is Hard

From: andrew cooke <andrew@...>

Date: Tue, 25 Dec 2012 18:56:30 -0300

Over the last few days I wrote a simple library to encrypt data in Python.
This blog post describes my experience writing that code.  I focus on the
various mistakes I made, and try to understand the underlying causes.


But first a little context.  I'm aware of the phrase (exhortation? slogan?)
"Typing The Letters A-E-S Into Your Code? You’re Doing It Wrong"
http://news.ycombinator.com/item?id=639647
but I couldn't find a Python 3 library that let me encrypt a string using a
simple password.

So I decided to go ahead, write the code, and then solicit feedback.  If I
had made any mistakes then perhaps someone else would correct me, and the
result would be something other people could use.


To be honest, when I started, I thought I could do a pretty good job.  I've
worked with security-related code several times (a JNI wrapper for OpenSSL
back in the day; more recently, for example, making OpenSSH talk to hardware
key stores) and I thought a fair amount of crypto knowledge had "rubbed off" -
I can explain what CTR mode is, for example, and why you should never use the
same key+IV twice.  And also, I am not so dumb; how hard can this stuff be?

Even so, I searched around for some guidance on best practice.  And I was
lucky enough to stumble across
http://www.daemonology.net/blog/2009-06-11-cryptographic-right-answers.html
which I decided to follow.


My first attempt was broken (although I eventually found the mistake myself).
It had exactly the vulnerability I said I could explain above: messages with
the same key used the same counter sequence.  This was because the "iv"
parameter in the pycrypto Cipher API is ignored in CTR mode.  Instead, you
need to provide the data to the Counter object.

I don't know if I am being muddle-headed in thinking of the initial counter
value as an IV, but I was a little annoyed with pycrypto.  Couldn't it throw
an error if it's given an IV in CTR mode, instead of simply ignoring it?  On
the other hand it doesn't seem fair to expect a library of crypto primitives
to educate users - it's intended for experts, who should know what they are
doing.

Anyway, that was my first mistake.  The root cause being, I think, that crypto
APIs are complex because they provide access to powerful primitives that can
be combined in many ways, but which, at the same time, must also be efficient
(the need for efficiency affects the design of Counter, for example, which is
why the IV is ignored).  A box of sharp tools.
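
To make the first mistake concrete, here is a minimal sketch of why a repeated
counter sequence is fatal.  It assumes a toy stream cipher built from
HMAC-SHA256 in the standard library; it is not AES, and it is not the pycrypto
API.  The names toy_keystream, toy_encrypt and xor are hypothetical helpers
invented for this illustration.

```python
import hashlib
import hmac


def toy_keystream(key, nonce, length):
    """Toy CTR-style keystream: one HMAC-SHA256 block per counter value."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hmac.new(key, nonce + counter.to_bytes(8, "big"),
                        hashlib.sha256).digest()
        counter += 1
    return out[:length]


def xor(a, b):
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def toy_encrypt(key, nonce, plaintext):
    """XOR the plaintext with the keystream (decryption is the same call)."""
    return xor(plaintext, toy_keystream(key, nonce, len(plaintext)))


key = b"k" * 32
nonce = b"\x00" * 8            # the bug: every message starts at the same counter
p1 = b"attack at dawn!!"
p2 = b"retreat at noon!"
c1 = toy_encrypt(key, nonce, p1)
c2 = toy_encrypt(key, nonce, p2)

# The shared keystream cancels, leaving the XOR of the two plaintexts.
leak = xor(c1, c2)
# Anyone who knows (or guesses) p1 can now read p2 without the key.
recovered = xor(leak, p1)
```

With a fresh nonce per message the two keystreams differ and the cancellation
no longer happens.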


Next, I started to worry about the API for *my* users.  I couldn't really
expect them to provide a 256 bit key; this was a library for "anyone".  So
it had to take something more like a password.

Unfortunately, although I knew about key derivation functions, which is what
you need to go from password to key, I thought they were used only for
storing passwords.  I have no idea why I thought this, but as a consequence
I started to cobble together my own hand-rolled attempt at key strengthening.

Thankfully, as my code got more complex, I realised I must be reinventing an
already-existing technique.  Once I was convinced of that, finding PBKDF2 (it
was mentioned in the link I said I would follow - although nowhere near the
paragraph on symmetric ciphers) was easy.

So mistake 2 (which I eventually avoided) was not knowing about an existing
solution to a common problem.  Or rather, not knowing that it could be
applied in a more general sense than I had understood.
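
As a rough sketch of going from password to key with PBKDF2, the following
uses hashlib.pbkdf2_hmac from the Python standard library (available since
Python 3.4).  The function name derive_key, the iteration count and the salt
length are assumptions chosen for illustration, not the values used by the
library discussed here.

```python
import hashlib
import os


def derive_key(password, salt, iterations=100_000, length=32):
    """Stretch a text password into `length` bytes of key material."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"),
                               salt, iterations, dklen=length)


salt = os.urandom(16)          # random per message, stored next to the ciphertext
key = derive_key("correct horse battery staple", salt)
```

The same password and salt always give the same key, so the salt must travel
with the message; a different salt gives an unrelated key.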


At this point I believed my code was pretty solid so I posted it to HN at
http://news.ycombinator.com/item?id=4962983

It took a while to get useful feedback, but when I did, it was awesome.  So
awesome it identified FIVE more problems.  Ouch.

  1.  Don't expose salt in the API.
  2.  Use separate keys for cipher and HMAC.
  3.  Avoid a possible timing attack when comparing HMACs.
  4.  Manage the counter in a standard (NIST) way.
  5.  PBKDF2 was using a weaker hash than expected.


The first (user gives salt) is plain embarrassing - it's just bad API design.
If I can blame anything other than incompetence, salt appeared in the original
API because it "seemed odd" to generate data and then append it to the
message.  I felt that even though it is how you handle the IV (and, in fact,
the final code uses the same data for both salt and IV).  So it's not a
particularly logical explanation for my mistake, but it's all I have.

The second (separate keys) was an open question - I just didn't know what
best practice was.  So lack of experience there.
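
One common way to get separate keys is to derive more key material than one
key needs and split it.  This is a sketch under assumptions: the name
derive_keys, the 32-byte key sizes and the iteration count are hypothetical,
and whether the final library does exactly this is not stated here.  SHA-512
is used so that 64 bytes come out of a single PBKDF2 block.

```python
import hashlib


def derive_keys(password, salt, iterations=100_000):
    """Return (cipher_key, mac_key), each 32 bytes, from one PBKDF2 call."""
    material = hashlib.pbkdf2_hmac("sha512", password, salt,
                                   iterations, dklen=64)
    return material[:32], material[32:]


cipher_key, mac_key = derive_keys(b"my password", b"\x00" * 16)
```

The fixed salt above is only there to keep the example short; in real use it
would be random.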

The third (timing attack) was a subtle implementation detail I would never
have noticed.  A lack of knowledge of the current literature.
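
A sketch of the timing-safe comparison, using hmac.compare_digest from the
standard library (Python 3.3+).  An ordinary == on byte strings can return as
soon as it finds a differing byte, which may leak how much of a forged tag was
correct.  The names make_tag and verify_tag are hypothetical.

```python
import hashlib
import hmac


def make_tag(mac_key, data):
    """HMAC-SHA256 tag over the data."""
    return hmac.new(mac_key, data, hashlib.sha256).digest()


def verify_tag(mac_key, data, received_tag):
    """Compare tags without an early exit on the first differing byte."""
    expected = make_tag(mac_key, data)
    return hmac.compare_digest(expected, received_tag)


mac_key = b"m" * 32
good_tag = make_tag(mac_key, b"ciphertext bytes")
```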

Fourth (counter management) was more damning.  I already knew the
normal way to handle counters, from using CTR mode to generate a stream
of random data in another project.  I thought I was being smart and improving
things by using a different approach (yes, I know that sounds like the kind
of thing a newbie would say, but I thought it *despite knowing that*).
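
For reference, one counter layout described in NIST SP 800-38A (Appendix B) is
a per-message nonce in the leading bits of the block followed by a block
counter that starts at zero.  The sketch below only builds the counter blocks
for a 16-byte block size; the 8/8 byte split and the name counter_blocks are
assumptions for illustration, and the exact layout used by the final library
is not stated here.

```python
import os

BLOCK_LEN = 16   # AES block size in bytes
NONCE_LEN = 8    # assumed split: 8-byte nonce, 8-byte counter


def counter_blocks(nonce, n_blocks):
    """Yield nonce || counter for counter = 0 .. n_blocks - 1."""
    if len(nonce) != NONCE_LEN:
        raise ValueError("nonce must be %d bytes" % NONCE_LEN)
    counter_len = BLOCK_LEN - NONCE_LEN
    if n_blocks > 1 << (8 * counter_len):
        raise ValueError("message too long: counter would wrap")
    for i in range(n_blocks):
        yield nonce + i.to_bytes(counter_len, "big")


message_nonce = os.urandom(NONCE_LEN)   # fresh for every message
first_three = list(counter_blocks(message_nonce, 3))
```

Each block would then be encrypted with the cipher key and XORed with the
corresponding plaintext block; uniqueness of nonce || counter under a given
key is what the scheme relies on.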

Fifth (weaker hash) I blame partly on the pycrypto API (again) (the way that
the hash is exposed is rather obscure), but also on a lack of familiarity
with key derivation standards - I didn't know that the MAC was a likely
parameter.
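
To illustrate that the hash is a parameter of the derivation rather than a
fixed property of PBKDF2, here is a short sketch with the standard library's
hashlib.pbkdf2_hmac, where the hash must be named explicitly.  The helper name
derive and the fixed salt are assumptions made only for the demonstration.

```python
import hashlib


def derive(hash_name, password=b"my password", salt=b"\x00" * 16):
    """Same password, salt and iteration count; only the hash differs."""
    return hashlib.pbkdf2_hmac(hash_name, password, salt, 10_000, dklen=32)


key_sha1 = derive("sha1")
key_sha256 = derive("sha256")
```

The two keys are unrelated, so a library that silently falls back to a default
hash produces output that looks fine but is not what the caller intended.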


So, in one simple piece of crypto code I had a total of seven errors (so far).
The sources of error were:

  *  Being unaware of existing solutions to common problems.
  *  Being unaware of existing best practices.
  *  Misunderstanding the complex API of a crypto toolkit.
  *  Bad API design.
  *  Ignoring existing solutions and "improving" things.

The last of these I can't do much about.  In theory I should be smart enough
to not do that.  I guess the lesson there is that sometimes you make even
dumber mistakes than you expect.

The rest divide nicely into two groups: experience and API design.

I was surprised how important experience was.  Despite having some experience
with security-related code.  Despite having a good set of guidelines on what
to do.  Despite being able to search the Internet.  Despite all that, I still
made mistakes that only experience could spot.

As for API design.  Well, I think that just confirms how important (and hard,
and overlooked) API design is.


So, what are the conclusions?  Experience and API design matter.  And even
when you are aware of the kind of pitfalls that face people that write crypto
code, you can still make dumb mistakes.

Andrew

PS The current library is at https://github.com/andrewcooke/simple-crypt

I can relate to that ...

From: Michiel Buddingh' <michiel@...>

Date: Thu, 27 Dec 2012 07:16:34 +0100

. . . I recently wrote some cryptographic code that encrypted some
very short (10-20 byte) messages.  There was a requirement that we'd
be able to decrypt any of these messages individually, without having
access to the other messages.

And so, I recycled the iv, and I didn't even bother with key
strengthening, knowing well that whoever reads this code in ten years
is going to think me an idiot.  But of course, 1) I really couldn't
justify the time to do it properly 2) we were just trying to
discourage onlookers, not thwart the NSA.

What still bothers me about that situation, though, is that, for all I
know, recycling the iv is the worst compromise to make; there might be
cleverer ways to accomplish what I was trying to do.

 . . . the thing is, the cryptography sector doesn't "do" trade-offs;
your security is either resilient to a government agency running a
chosen-plaintext attack on their FPGA cluster, or it's considered
embarrassingly broken.

The very people who do have the capability to write high-level APIs,
to make sensible trade offs in designing algorithms and approaches to
security problems also have a, seemingly cultural, inhibition against
simplification.

--
Michiel

Re: I can relate to that ...

From: andrew cooke <andrew@...>

Date: Thu, 27 Dec 2012 08:55:14 -0300

Space constraints are difficult.  At work they were trying to encrypt the body
of SMS.  I am not sure what happened in the end, but it wasn't looking good.

When it comes to "make it hard, but don't worry if it's not impossible" I feel
like there should be some kind of standard.  Perhaps there is, and it is
ROT13.  And maybe just suggesting that can help, because when people start to
object to ROT13 the same arguments typically apply to anything else that isn't
"proof against government".

Anyway, I just want to emphasise that I fixed all the bugs I discussed, and
simple-crypt, which is now on PyPI http://pypi.python.org/pypi/simple-crypt is
supposed to be able to "thwart the NSA".  Of course, it may still contain bugs
(which is why it is (1) in beta and (2) includes a header in the encrypted
data that will allow a fixed version to be deployed and work even when people
have used a previous, buggy version, should it be needed).
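
The idea of a version header can be sketched as follows.  The two-byte tag,
the single version byte and the function names are hypothetical, invented for
this example; they are not the actual simple-crypt wire format.

```python
MAGIC = b"ex"          # hypothetical tag identifying this message format
FORMAT_VERSION = 1     # bumped whenever the scheme changes


def add_header(payload):
    """Prefix the encrypted payload with tag and format version."""
    return MAGIC + bytes([FORMAT_VERSION]) + payload


def split_header(message):
    """Return (version, payload), rejecting data without the expected tag."""
    if len(message) < 3 or message[:2] != MAGIC:
        raise ValueError("not a recognised message")
    return message[2], message[3:]


message = add_header(b"salt+ciphertext+hmac")
version, payload = split_header(message)
```

A later release can then read the version byte and choose the matching
decryption routine, so data written by an older, buggy version remains
readable.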

Andrew

Fixing this

From: Laurens Van Houtven <_@...>

Date: Sun, 11 Aug 2013 10:32:51 +0200

Hi Andrew,


Excellent points, and I agree wholeheartedly.

For the library situation, I've joined some people in writing a library:
https://github.com/alex/cryptography

Right now, it's mostly just primitives, but the end goal is an API that you
simply couldn't get wrong, which sounds to me like what you wanted in the
first place.

Additionally, I agree that education is lacking. Hence, I'm busy turning my
talk from last year, Crypto 101 (http://pyvideo.org/video/1778/crypto-101)
into a book. Hopefully this will make the journey for future programmers a
little easier :)

I elucidated further in a HN comment:
https://news.ycombinator.com/item?id=6194332

HTH,
lvh

Re: Why and How Writing Crypto is Hard

From: Teddy Hogeborn <teddy@...>

Date: Sun, 11 Aug 2013 16:24:52 +0200

> but I couldn't find a Python 3 library that let me encrypt a string
> using a simple password.

Well, use GPG for data at rest. You could just simply call GPG on the
command line.  Here's a class I wrote to do just that:

import os
import subprocess
import binascii
import tempfile

class PGPError(Exception):
    """Exception if encryption/decryption fails"""
    pass

class PGPEngine(object):
    """A simple class for OpenPGP symmetric encryption & decryption

    with PGPEngine() as pgp:
        password = b"password"
        data = b"plaintext data"
        crypto = pgp.encrypt(data, password)
        decrypted = pgp.decrypt(crypto, password)
    """
    def __init__(self):
        self.tempdir = tempfile.mkdtemp()
        self.gnupgargs = ['--batch',
                          '--home', self.tempdir,
                          '--force-mdc',
                          '--quiet',
                          '--no-use-agent']
    
    def __enter__(self):
        return self
    
    def __exit__(self, exc_type, exc_value, traceback):
        self._cleanup()
        return False
    
    def __del__(self):
        self._cleanup()
    
    def _cleanup(self):
        if self.tempdir is not None:
            # Delete contents of tempdir
            for root, dirs, files in os.walk(self.tempdir,
                                             topdown = False):
                for filename in files:
                    os.remove(os.path.join(root, filename))
                for dirname in dirs:
                    os.rmdir(os.path.join(root, dirname))
            # Remove tempdir
            os.rmdir(self.tempdir)
            self.tempdir = None
    
    def password_encode(self, password):
        # Passphrase can not be empty and can not contain newlines or
        # NUL bytes.  So we prefix it and hex encode it.
        return b"foo" + binascii.hexlify(password)
    
    def encrypt(self, data, password):
        passphrase = self.password_encode(password)
        with tempfile.NamedTemporaryFile(dir=self.tempdir
                                         ) as passfile:
            passfile.write(passphrase)
            passfile.flush()
            proc = subprocess.Popen(['gpg', '--symmetric',
                                     '--passphrase-file',
                                     passfile.name]
                                    + self.gnupgargs,
                                    stdin = subprocess.PIPE,
                                    stdout = subprocess.PIPE,
                                    stderr = subprocess.PIPE)
            ciphertext, err = proc.communicate(input = data)
        if proc.returncode != 0:
            raise PGPError(err)
        return ciphertext
    
    def decrypt(self, data, password):
        passphrase = self.password_encode(password)
        with tempfile.NamedTemporaryFile(dir = self.tempdir
                                         ) as passfile:
            passfile.write(passphrase)
            passfile.flush()
            proc = subprocess.Popen(['gpg', '--decrypt',
                                     '--passphrase-file',
                                     passfile.name]
                                    + self.gnupgargs,
                                    stdin = subprocess.PIPE,
                                    stdout = subprocess.PIPE,
                                    stderr = subprocess.PIPE)
            decrypted_plaintext, err = proc.communicate(input
                                                        = data)
        if proc.returncode != 0:
            raise PGPError(err)
        return decrypted_plaintext

/Teddy Hogeborn

-- 
The Mandos Project
http://www.recompile.se/mandos
