
Platypus Innovation Blog

17 February 2014

There are no AAA databases

It's a mistake to believe absolutely in uncertain things. That's one of the lessons of the financial crisis. Uncertain loans were dressed up as triple-A reliable assets, but it turned out to be wishful thinking.

Dice bag (cc) KaptainKobold@Flickr

I see similar practices in databases and business intelligence.

We all know that databases contain errors. The errors come from many sources: data is mis-entered; or it was accurate once, but people move on; or the database schema was changed, but not all the data was correctly updated; or two databases are merged, but the join is dodgy (the same name doesn't always mean the same person). I've yet to encounter a database that didn't contain errors.

Everyone knows this. And yet people build business processes that assume the database is 100% correct. Even best practice in data analysis is only to try and limit errors entering the system -- but once they're in, the mistakes can run free.

In business intelligence, we see claims that everything can be measured. These claims are plausible, and we'd like to believe them. All too often, though, it's over-confidence and over-selling.

Accepting uncertainty does not mean giving up on measurement. It just means accepting that errors are part of measurement. Once we accept that, we can deal with it. We should estimate the things we cannot directly & accurately measure. But remember that it is an estimate. Know how good that estimate is, and how much its error affects your decisions. There are cases where the-right-order-of-magnitude is fine, and others where even 99% accuracy isn't good enough.
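As a rough illustration of "know how good that estimate is", here is a minimal sketch that estimates a database's error rate from a hand-checked sample of records and reports a range rather than a single number. The function name, the sample figures, and the choice of the Wilson score interval (at roughly 95% confidence) are all assumptions made for this example, not something prescribed above.

```javascript
// Hypothetical helper: estimate an error rate from an audited sample of records.
// Returns the point estimate plus a Wilson score interval (z = 1.96 is ~95% confidence).
function estimateErrorRate(errorsFound, sampleSize, z = 1.96) {
  if (sampleSize <= 0) throw new Error("sampleSize must be positive");
  const p = errorsFound / sampleSize;          // observed error rate in the sample
  const z2 = z * z;
  const denom = 1 + z2 / sampleSize;
  const centre = (p + z2 / (2 * sampleSize)) / denom;
  const halfWidth = (z / denom) *
    Math.sqrt((p * (1 - p)) / sampleSize + z2 / (4 * sampleSize * sampleSize));
  return {
    estimate: p,
    low: Math.max(0, centre - halfWidth),
    high: Math.min(1, centre + halfWidth),
  };
}

// Illustrative numbers only: 12 bad records found in a sample of 400.
const audit = estimateErrorRate(12, 400);
console.log(audit);
```

With these made-up numbers the point estimate is 3%, but the interval runs from roughly 1.7% to 5.2%. Whether that spread matters depends on the decision being made with the data.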

It's especially important to know the blind-spots in your KPIs -- the things you can't properly measure. And there are always blind spots.

Anyone who promotes KPIs and ROIs without talking about errors is selling something unreliable. It's easy enough to hide uncertainty & inaccuracy -- but you pay the cost down the line, with interest. Remember the AAA sub-prime loans -- all that glitters is not gold. Ignore uncertainty at your peril.

The salesmen of over-confidence cannot have it both ways: if data is important, you'd better be honest about its quality.

10 February 2014

Geocoding Twitter: Who cares about New Zealand?

Geo-coding is where you take descriptions of a place -- such as the location people give out on Twitter -- and work out where on Earth it actually is.

Geo-coding is not an exact science. E.g. "Cambridge" could refer to a city in the UK, or one next to Boston in the USA (and oddly, both cities are home to world-class universities). And that's the easy stuff. Twitter locations can be... interesting -- such as "wherever there is dancing", or "city of purple".
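The "Cambridge" ambiguity can be sketched in a few lines. This is a toy example, not how any of the real geocoders work: the gazetteer data, the `geocode` function, and the even split of confidence between candidates are all assumptions for illustration, and the coordinates are approximate.

```javascript
// Toy gazetteer (hypothetical data; coordinates are approximate).
const GAZETTEER = new Map([
  ["cambridge", [
    { name: "Cambridge", country: "UK", lat: 52.21, lon: 0.12 },
    { name: "Cambridge", country: "USA", lat: 42.37, lon: -71.11 },
  ]],
]);

// Return every candidate place for a location string, with a naive confidence
// score that is simply split evenly between the candidates.
function geocode(text) {
  const key = text.trim().toLowerCase();
  const matches = GAZETTEER.get(key) || [];
  return matches.map((place) => ({ ...place, confidence: 1 / matches.length }));
}

console.log(geocode("Cambridge"));               // two candidates, 0.5 confidence each
console.log(geocode("wherever there is dancing")); // no candidates: []
```

Returning all the candidates with a confidence, rather than silently picking one, keeps the uncertainty visible to whatever uses the result.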

So geocoding software can be forgiven for making occasional mistakes & odd choices. Here are some we've found:

Heaven is in Iran, but Paradise is in the USA.
Iran also counts as far far away.
Reality is in India
Gun Shaped State is Oman, as is Somewhere Yu Aint! (I suppose there's a kind of logic here -- for most of us, Oman is somewhere we aren't).
Wonderful Island is Taiwan...
...but Whore Island is somewhat cruelly identified as Iceland
Atop of a Whovian Bum is in Azerbaijan

My favourite malapropism:
Who cares? and Who knows? mean you're in New Zealand

NB: We currently use a mix of Google, Yahoo & Twitter geocoders (each of which has its own strengths and weaknesses). Each of the examples above comes from one of those three. The random ones usually come from Google, who have the largest, most varied coverage. We are developing our own in-house geocoder based on OpenStreetMap data.

Spam, spam, lovely spam

This comment was such a great piece of spam, we had to publish it (minus the URL).
Do you have а spam isѕue оn thіs ѕite;
I also аm a blogger, аnd Ι wаs wanting to knoω your ѕіtuation;
many of us hаve developeԁ some nicе mеthods
and wе агe looking to trade strаtegіes
ωith other folks, bе suгe to shoot me an emаil
if іntereѕtеd.

Нere iѕ my homеpagе viagra

Javascript Enums

Enums are a useful way to handle a set of constants. They protect against typos and bad-values, help to spot missed cases, and make refactoring a lot safer.

Javascript does not have enums. So how can we get the same benefits?

Enum.js is a simple class which gives you enum-like behaviour.

Example: Instead of writing e.g. if (sibling == 'BROTHER') ... throughout your code, write Sibling = new Enum('BROTHER SISTER'); then use if (Sibling.isBROTHER(x)) ....
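To give a feel for how such a class can work, here is a minimal sketch. It is not the actual Enum.js code from the gist: the implementation details, including the `values` property and the `has` method, are assumptions made for illustration. Only the `new Enum('BROTHER SISTER')` constructor call and the `isBROTHER(x)` style of test come from the example above.

```javascript
// A minimal sketch of an enum-like class (NOT the real Enum.js from the gist).
class Enum {
  // names: a space-separated list of constants, e.g. 'BROTHER SISTER'
  constructor(names) {
    this.values = Object.freeze(names.trim().split(/\s+/));
    for (const name of this.values) {
      this[name] = name;                      // Sibling.BROTHER === 'BROTHER'
      this["is" + name] = (x) => x === name;  // Sibling.isBROTHER(x)
    }
    Object.freeze(this); // no constants can be added or changed later
  }

  // Hypothetical extra: check whether x is one of the enum's values.
  has(x) {
    return this.values.includes(x);
  }
}

const Sibling = new Enum("BROTHER SISTER");
console.log(Sibling.isBROTHER("BROTHER")); // true
console.log(Sibling.isBROTHER("SISTER"));  // false
console.log(Sibling.has("COUSIN"));        // false
```

The typo protection comes from the generated methods: a misspelt string such as 'BROTHR' compares as false without complaint, whereas calling a misspelt method such as Sibling.isBROTHR(x) throws a TypeError straight away.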

Here's the Enum.js code as a gist
