
Platypus Innovation Blog

1 March 2019

Why I'm Giving This Talk (And not a Bot)

These are the talk notes and slides from a talk I gave at a Scotland Internet of Things workshop. My apologies for where the notes are incomplete.

Hello
Thank You

Let's start with me.

I'm Daniel Winterstein. I came to Edinburgh in 1999 to study Artificial Intelligence. It's a good city. It's a good subject.

I'm the Founder and CTO at Winterwell, we're a machine-learning consultancy. We make a product called SoDash, which is a social media tool, used by Harrods, Selfridges, Network Rail, and others.

We're pivoting to become Good-Loop, which is an ethical advertising and data-management platform.



Conversational UI - or "bots"


Why?


What if we're successful?



Someday, you're going to be sacked by a computer.

Which is convenient, as you'll presumably be able to get your P45 at the same time. The joined-up process will be so smooth, it will be a bureaucrat's wet dream. With cross-channel conversational follow-through and automated data entry, it will make grown men weep.

Solution: Citizen's Wage / Basic Income



It's understandable to find this scary.

However, it's a sad reflection on the human condition that a life without hard or menial work scares us. Imagine a life of pleasant, contented happiness: what a scourge on the face of the earth it would truly be... Douglas Adams' writing on the dolphins springs to mind.

Bots should deliver freedom from drudge work



Let's talk a bit about how today's bots go wrong, or make things worse.

Insincerity, Poor Etiquette, and Being Useless

These sins are not inherent to bots. 
Pushy sales-people and useless customer service are not new inventions. 
But bots allow companies to be insincere, annoying, and useless at scale. 




I tried getting a bot to do the talk.   
Me: Hey Cortana, Could you help with my talk?
Cortana:
Me: Thank you Cortana
Cortana:

So that wasn't a success.



Let's look at another example. There's an anti-pattern emerging here: Bots shouldn't pretend to be human.

x.ai - brilliant idea: you want to schedule a meeting, you cc their bot, and it arranges the meeting.

Simple and focused. So where does it go wrong?

It turns out even this really focused problem is surprisingly hard. They've been going 3 years, and they haven't cracked it yet. Right now, x.ai is only part AI; they also have teams of people processing messages. So in order for the bot to pretend to be human, they have people pretending to be bots.
This is not living the dream.

And the kicker: Doodle is a better service, despite being much simpler.
Because Doodle isn't confined by pretending to be human, it can offer a user-interface that fits the problem.


Example emails

“Daniel, open this email for 12 people you should meet :)”
spam

“Re: Making Great Customer Experiences”
spam

If it's a sales message - don't pretend to be friends. If it's a cold email,
don't pretend we're having a conversation.

A simple test if you want to deploy a chat bot: how would you feel as the recipient?

If the person you're talking to knew the full picture -- what's automated and what the goals are -- what would they think?
Would they be happy to receive fast service? Or annoyed at a pretence at caring?

We need a New Etiquette for Bots




Clippy was intrusive. Though the modern web has bots that are worse.


Etiquette and Sincerity are about how we as companies use bots. The solution is not technical - it's caring for our public.

Being Useless -- that is a technical problem.



Fear: That the bot will do more harm than good.

Quality: The bots can't deliver (yet).

Time/Cost: To learn a system, work out the common conversations,
and code them up.





Instead of programming the bot, what if the bot learns from you?

We want bots to do repetitive tasks. If it's repetitive - there will be lots of examples for
them to learn from.

In general -- machine-learning bots is hard, because communication is hard and needs a full
human understanding, and there is never enough data. But that's in general.

If you frame the task -- something specific, structured, and where failing to understand is OK
(bots should know when to stop and hand over gracefully) -- then it becomes possible.
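The framed-task approach above can be sketched as a simple pattern: recognise a handful of specific, structured intents, and hand over to a human whenever confidence is low. This is an illustrative sketch only; the intent names and the crude keyword scorer are my own assumptions, not any particular product's method.

```python
# A minimal sketch of the framed-task pattern: handle a few specific,
# structured intents only, and hand over to a human when confidence is low.
# The intent names and crude keyword scorer are illustrative assumptions.
INTENTS = {
    "book_meeting": ["meeting", "schedule", "calendar"],
    "reset_password": ["password", "reset", "login"],
}

def classify(message, threshold=0.5):
    # normalise: lowercase, and strip punctuation from each word
    words = {w.strip("?,.!") for w in message.lower().split()}
    best, best_score = None, 0.0
    for intent, keywords in INTENTS.items():
        score = sum(1 for k in keywords if k in words) / len(keywords)
        if score > best_score:
            best, best_score = intent, score
    # below the confidence threshold: stop and hand over gracefully
    if best_score < threshold:
        return "hand_over_to_human"
    return best

print(classify("Can we schedule a meeting next week?"))  # book_meeting
print(classify("What is the meaning of life?"))          # hand_over_to_human
```

A real system would use a trained classifier rather than keyword counts, but the shape is the same: the explicit threshold is what lets the bot know when to stop.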

15 February 2019

GPT-2: A new AI with amazing creative writing skills

This language AI is really amazing. Let's cut to the unicorns:

The AI was given a creative writing task: given a prompt, continue writing the story. The example reported here has been cherry-picked (i.e. the researchers filtered out the duds and are showing off the best output), but it is still very, very impressive.

Prompt:
In a shocking finding, scientist discovered a herd of unicorns living in a remote, previously unexplored valley, in the Andes Mountains. Even more surprising to the researchers was the fact that the unicorns spoke perfect English.
Model completion (machine written!)
The scientist named the population, after their distinctive horn, Ovid’s Unicorn. These four-horned, silver-white unicorns were previously unknown to science. 
Now, after almost two centuries, the mystery of what sparked this odd phenomenon is finally solved. 
Dr. Jorge Pérez, an evolutionary biologist from the University of La Paz, and several companions, were exploring the Andes Mountains when they found a small valley, with no other animals or humans. Pérez noticed that the valley had what appeared to be a natural fountain, surrounded by two peaks of rock and silver snow.
Pérez and the others then ventured further into the valley. “By the time we reached the top of one peak, the water looked blue, with some crystals on top,” said Pérez. 
Pérez and his friends were astonished to see the unicorn herd. These creatures could be seen from the air without having to move too much to see them – they were so close they could touch their horns. 
While examining these bizarre creatures the scientists discovered that the creatures also spoke some fairly regular English. Pérez stated, “We can see, for example, that they have a common ‘language,’ something like a dialect or dialectic.” 
Dr. Pérez believes that the unicorns may have originated in Argentina, where the animals were believed to be descendants of a lost race of people who lived there before the arrival of humans in those parts of South America. 
While their origins are still unclear, some believe that perhaps the creatures were created when a human and a unicorn met each other in a time before human civilization. According to Pérez, “In South America, such incidents seem to be quite common.” 
However, Pérez also pointed out that it is likely that the only way of knowing for sure if unicorns are indeed the descendants of a lost alien race is through DNA. “But they seem to be able to communicate in English quite well, which I believe is a sign of evolution, or at least a change in social organization,” said the scientist.
Wow.

And: the AI was entirely self-taught. It has a structure designed for sequence learning, which is then trained on the text from millions of web pages. It learns to predict the next word when reading. The impressive knowledge of language patterns and story structures shown above is all learned from the data.
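The training objective really is just next-word prediction. As a toy illustration of that idea (a bigram count model over a tiny made-up corpus, nothing like GPT-2's neural network or scale):

```python
from collections import Counter, defaultdict

# Toy illustration of the training objective: predict the next word.
# GPT-2 does this with a large neural network over millions of web pages;
# a simple bigram count model over a tiny corpus shows the basic idea.
corpus = "the unicorn spoke english and the unicorn fled".split()

# Count which words follow which
next_word_counts = defaultdict(Counter)
for current, following in zip(corpus, corpus[1:]):
    next_word_counts[current][following] += 1

def predict_next(word):
    counts = next_word_counts[word]
    if not counts:
        return None  # word never seen: no prediction
    return counts.most_common(1)[0][0]

print(predict_next("the"))  # "unicorn" - seen twice after "the"
```

GPT-2 replaces the count table with a large neural network, and the tiny corpus with millions of web pages; but the objective - predict the next word - is the same.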

As the researchers note, this level of AI has a lot of applications - good and bad. So they are not releasing the full model yet, asking the AI community and wider society to consider how we manage this technology.

I read this yesterday. Still processing it with my jaw on the floor.

Naming things is an important part of humanising them, so the researchers have called this system GPT-2. See https://blog.openai.com/better-language-models/ for a summary of GPT-2 and a link to the technical paper. The full trained model is not given, but the paper and partial code suggest the architecture may be surprisingly simple and generic, though large and expensive to train. Spoiler alert: it's not an LSTM (long short-term memory), the neural net architecture which has ruled NLP work for the last few years. It uses an attention-based short-term memory in an architecture called a Transformer. Though attention functions do have some common ground with the memory-gates of an LSTM. So it's evolution not revolution. Except there's a point where evolution becomes revolutionary.
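For the curious, the core attention operation of a Transformer can be written in a few lines. This is a minimal single-head, scaled dot-product sketch with no learned projection matrices, so it illustrates only the mechanism, not a trainable model:

```python
import math

# Minimal scaled dot-product attention: each query attends over all keys,
# and the output is a softmax-weighted average of the value vectors.
def attention(queries, keys, values):
    d = len(keys[0])  # key dimension, used for scaling
    out = []
    for q in queries:
        # similarity of this query to every key, scaled by sqrt(d)
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys]
        # softmax over the scores (subtract the max for numerical stability)
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # weighted average of the value vectors
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out

# With two identical keys, the two values are averaged 50/50:
print(attention([[1.0, 0.0]], [[1.0, 0.0], [1.0, 0.0]],
                [[1.0, 0.0], [3.0, 0.0]]))  # [[2.0, 0.0]]
```

In a real Transformer the queries, keys, and values are learned linear projections of the input, there are many heads, and the whole thing is stacked in layers; this sketch is just the attention kernel at the centre of it.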

By Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, Ilya Sutskever from OpenAI.com

21 November 2018

Redundant Onion or Vital Old Pipe?

I was talking the other day with a fellow tech entrepreneur (Matthew Davis from Wittin[1]) about the complexities of some public sector processes -- the red tape which creates extra work, and worse, which acts as an impedance that stops people using the services. It is tempting to blame the bureaucrats - to think government process is complex because it is made so by bureaucrats who like complexity. But this is not the case (well, with a couple of exceptions - immigration policy perhaps). Most government process is made by civil servants who are doing their best, who have the right intentions, and who want to serve the public.

That is not to say that things cannot be improved.
Undoubtedly there are places where it can be made better, where the people making the process do not have all the answers - how could they? - or where the process is out of date.

The first challenge is understanding: before we can improve we must understand.

Business processes are rarely well documented.
Sometimes the steps are documented -- often there is documentation for the public, less often for the internal steps. However the rationale behind the process, the reasons that drove the process to be the way it is -- that crucial information is almost never documented.

It is all too easy for a process to become cumbersome, where every step was done for a good reason at the time. The road to over-complexity is paved with good intentions.

I am reminded of a story told by Primo Levi in The Periodic Table, the story of the redundant onion in the oil.

Levi was an industrial chemist by trade, and worked at one stage in varnish production. In a textbook on the topic he had found the strange advice, when making varnish, to introduce two slices of onion into the principal ingredient of linseed oil. No comment was given on the purpose of this curious additive. Levi spoke about it with Signor Giacomasso Olindo, his predecessor and teacher:
Smiling benevolently behind his thick white moustache, he explained to me that in actual fact, when he was young and boiled the oil personally, thermometers had not yet come into use: one judged the temperature of the batch by observing the smoke, or spitting into it, or, more efficiently, immersing a slice of onion in the oil on the point of a skewer; when the onion began to fry, the boiling was finished. Evidently, with the passing of the years, what had been a crude measuring operation had lost its significance and was transformed into a mysterious and magical practice. [2]
Processes accumulate, like sediment into stone.

But the counterpoint to the Redundant Onion, is the Vital Old Pipe.

This story is my own from a few years ago, when I was living on the top floor of a block of tenement flats in Edinburgh. At some unknown time earlier, the flat below had redone their kitchen. Whilst clearing out the old kitchen, the builders found an antiquated pipe for which they could see no purpose. It was in the way of what they wanted to do. The block of flats was 200 years old, and they surmised that this pipe had no purpose, that it was some obsolete piece of junk, and so they simply cut it out. Nothing went wrong, validating their decision.

Or rather, nothing went wrong until later, when our boiler broke. We arranged for a plumber to install a new boiler. First, the plumber flushed out the water from the radiators - an easy task, as old Edinburgh tenement buildings have a drainage pipe to do exactly this. This specialised drain pipe is rarely used. It was the drainage pipe which had been cut sometime before in the flat below.

We and the other residents in the block learned all about these drainage pipes when water started coming through the walls of the flat below us. And through the ceiling of the flat below them. Finally it took out the ceiling of the shop on the ground floor.

What had seemed to be a useless old pipe was in fact an important part of the system - just poorly known and not properly understood.

Therein lies the conundrum. In reforming government process, we start out unable to tell which bits are Redundant Onions, which can safely be reformed, and which bits are Vital Old Pipes that still serve a purpose, even though it is not instantly obvious. What's lacking is a manual, or more precisely, the equivalent of good code documentation.[3] If our block of flats had come with a manual, the builders could have checked it, and would have known what the pipe was and why it was there. Business processes need to come with such a manual, something that says this is how we do it, and this is why we do it -- because that understanding is the key to being able to change it. So that understanding is empowering.

That manual lets you reorganize and evolve your processes. It's also really useful for normal operations, e.g. when a new staff member comes in. They need to know how to do things. They usually learn that from their colleagues - but this is a slow and patchy way to manage knowledge. The learning is gradual and piecemeal, and they must hold it all in their head.

But if there's a manual, they can look it up as and when they need it.   
And if the process changes, they know where to look for up-to-date answers.

More than that, the manual is key to improving process.

Here at Good-Loop we use an internal wiki which anyone in the company can edit, and it documents our processes: how to book a holiday, how to onboard a new colleague (and also, since we are an adtech company, technical material like how to spin up a new server). This has been invaluable in both avoiding mistakes and efficient working.

We're a small company. Does a wiki approach (by which I mean, having a shared central knowledge-base, maintained by the team) extend to large organisations? I believe it can. Wikipedia is a truly inspiring example here - a high-quality knowledge-base built by the many, for the many. Let's do something similar for government processes.


[1] Wittin - https://www.wittin.co.uk
[2] Primo Levi - Opening extract of Chromium from The Periodic Table
[3] Documentation for the Educated Stranger 

How could it be otherwise? Talking with Matthew from Wittin

I was talking the other day with Matthew Davis from Wittin, a Dundee-based start-up. He's doing some work with a council, and we were talking about government processes -- how government works, and doesn't always work, and often government isn't really sure itself how it works. By government here I mean the range of public sector institutions that make up the modern UK state. It starts with Westminster and its departments, but includes the regional parliaments of Holyrood / Cardiff / London, the councils, the big public sector services like the NHS, and the many smaller organisations who together carry out the sprawling complex business of running a modern country.

Whilst the modern UK government (in the broad sense described above; I don't mean the Prime Minister) sincerely wants to be open -- Matt made a good point that it is, and always has been, the preserve of a very select group of people. To take the case of a council he's working with, the council has been in existence in its current form for over 200 years, and during that time the people who ran the council -- sitting in its chambers as elected officials, or in its offices as civil servants -- have been largely from a limited demographic. If we look at the people who have been making policy, and extend this to include the newspaper journalists and others who get involved in policy making -- even extending to this wider group, it's still quite a narrow set of people. Let's characterise that as educated middle-class busy-bodies (amongst which I proudly count myself). Even amongst this group, the vast majority are not involved in making policy or carrying out policy. Running the country is left to the few who are willing to do it. How could it be otherwise?

How could it be otherwise? A small change in emphasis, but a big change in direction.

The answer that Matt is exploring is around transparency, and using software to make policy more accessible to the many. Ideally it should be easy for people to contribute, and in a way that those contributions are genuinely useful to policy makers.

It's a big challenge. Matt has two prongs to his approach. One is around more open data, and this is something that the British government is overall very good at. The public sector has been implementing more open data for some years. It's not easy, because government IT is not easy, and there are also privacy issues that limit how data can be opened up. Year by year, progress is being made.

24 October 2018

EgBot Maths Q&A Dataset

(If you want to skip ahead: our dataset is on our Zenodo community page, and our code is in our Github repository.)

Guest post by Irina Preda from the EgBot project, a collaboration between the University of Dundee and Good-Loop

Our research focuses on the role of examples in mathematical discourse. One of the ways in which we examine this is through the construction of an autonomous example generator. The generator would be able to contribute to mathematical discussions by proposing appropriate examples, in a socially-appropriate way. This example generator would be a first step towards a machine-enhanced mathematics, where humans and machines collaborate to produce novel mathematics research.
To be able to build a model of how humans use examples, we would need a large dataset of examples and the context in which they are provided. Unfortunately such a dataset does not exist, but there is a lot of potential for generating one. First we must find a source that would allow us to collect all of this data. Online collaborative mathematics platforms (such as Polymath and MathOverflow) provide a remarkable opportunity to explore mathematical discourse and elements of the mathematical process. They are also high quality data-rich sources that provide the perfect resources to analyse discourse as well as train models.

As StackExchange is a platform with an abundance of data (MathStackExchange has approximately 1 million questions) and a well-documented API, we decided to use the StackExchange API to extract their Math Q&A data and thus generate our dataset. From the start of the project we focused on making our work accessible, which is why we decided to publish our code openly on Github, as well as to publish our dataset online. We considered this to be very important, as a good dataset is an extremely valuable resource for the data science and machine learning community and can provide a significant boost to further research efforts.

Making a dataset truly accessible requires it to be well-constructed and documented, so that it is easily understood, but it also needs to exist on a platform that allows it to be easily found. So we turned to Zenodo, which is an open-access research publication framework. It assigns all publicly available uploads a DOI (Digital Object Identifier) to make the data uniquely citeable, and also supports harvesting of all content via well-known open web standards (the OAI-PMH protocol), which makes the meta-data easily searchable. The only limitation we found with Zenodo is that it doesn't allow the uploading of JSON files via the online upload tool; however, this was easily fixed by archiving the files (which conveniently reduces the size of the download as well).
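For readers wanting to reproduce this kind of harvest, a minimal sketch of paging through the StackExchange API might look like the following. The endpoint and the site/page/pagesize parameters follow the public API documentation; the sort order and page count here are illustrative choices, not the EgBot project's actual settings.

```python
import gzip
import json
import urllib.request

API = "https://api.stackexchange.com/2.3/questions"

def build_url(page, pagesize=100):
    # site=math selects MathStackExchange; see the StackExchange API docs
    return f"{API}?site=math&page={page}&pagesize={pagesize}&order=desc&sort=activity"

def fetch_page(page):
    with urllib.request.urlopen(build_url(page)) as resp:
        raw = resp.read()
    # The StackExchange API compresses its responses (gzip)
    return json.loads(gzip.decompress(raw))

def harvest(max_pages=3):
    questions = []
    for page in range(1, max_pages + 1):
        data = fetch_page(page)
        questions.extend(data["items"])
        if not data.get("has_more"):
            break  # no more pages to fetch
    return questions
```

A full harvest would also respect the API's quota and backoff fields, and use an API key to raise the request limit.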

Data collection was only the first stage in our project: there is also data analysis, building a conversational model, and an interactive web application. Our intention to use deep learning to build the conversational model meant that this first stage was very important, as deep-learning neural networks require an immense amount of data to train. Thankfully the approach we sketched above was successful, and we were able to harvest 6GB of mathematical question-answering and discourse data. If you would like to take a look at our dataset, visit our Zenodo community page; for our code, there is our Github repository.

17 August 2018

Kurt Vonnegut on Being Obsolete

Thinking on AI and the effects it will have on unemployment -- Kurt Vonnegut, the great American sci-fi writer, wrote back in 1965 on how we should adjust our attitude to unemployment:

"Americans have long been taught to hate all people who will not or cannot work, to hate even themselves for that. We can thank the vanquished frontier for that piece of common-sense cruelty.  
The time is coming, if it isn’t here now, when it will no longer be common sense. It will simply be cruel."

From the excellent, funny, angry, and hopeful book: God Bless You, Mr. Rosewater.

31 July 2018

My company is over 50% female - Why I'm not happy about that


There are far more men in the tech sector than women. So it is surely an achievement to be celebrated that my company, Good-Loop, is over half (55%) female. Draw up humble speeches about our greatness, put us on posters and give us an award!

Well yes and no.

Firstly, that 55% depends on how you count it. If we adjust for full-time vs part-time, the balance would be 1/3 female to 2/3 male -- still a lot more balanced than the sector average, but less worthy of a poster. Stats can often be chosen to suit the message ("cherry picked" to use the technical term). Statistics should be compiled by someone independent, with no axe in the race or horse to grind. Or produced to a fixed formula, which will no doubt be inappropriate in many cases, but has the benefit of being consistent and comprehensible.
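To make the counting point concrete, here is a small sketch with hypothetical staff numbers (chosen to mirror the figures above; the real breakdown isn't published) showing how headcount and full-time-equivalent weighting give different answers from the same team:

```python
# Hypothetical staff list of (gender, full-time-equivalent), chosen to
# mirror the numbers in the text; the real figures are not published.
staff = [("F", 1.0), ("F", 0.25), ("F", 0.25), ("F", 0.25), ("F", 0.25),
         ("M", 1.0), ("M", 1.0), ("M", 1.0), ("M", 1.0)]

# By headcount: each person counts as 1
headcount_female = sum(1 for g, _ in staff if g == "F") / len(staff)

# By FTE: part-time staff count as a fraction
fte_female = (sum(fte for g, fte in staff if g == "F")
              / sum(fte for _, fte in staff))

print(f"headcount: {headcount_female:.0%}, FTE-weighted: {fte_female:.0%}")
# → headcount: 56%, FTE-weighted: 33%
```

Same team, two defensible statistics; which one gets quoted is a choice.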

Moreover, 50/50 is not the goal of diversity. The gender % is a symptom, and should be treated as such -- it is not the disease itself. If you visit a doctor with a fever, you expect to be tested for the underlying cause, not put in a fridge. We should treat gender % statistics, and indeed, all statistics, in the same way. 

All things being equal we expect 50/50 male/female, 2% black, 6% red-heads etc.[1] If there's an imbalance, then prejudice at work might be a problem - but there are other possible causes. For example, when I collect my child from nursery, most of the staff are female -- but I doubt the nursery is sexist; the cause is at the society level, where men aren't encouraged towards care roles. Also, nice stats can merely mask problems, and treating the stats would certainly mask problems. For example, if a workplace is biased against mothers, who hit a glass ceiling, then promoting non-mothers won't fix that. 

Statistics (and more generally: data) is a powerful lens for examining the world. But like any lens, it distorts. The useful thing about a statistic is it simplifies the world and allows for easy comparisons. This is also the dangerous thing -- the world is rarely simple. The data scientist must be alert to the wider picture, the complex causes behind the circumstantial summary. And we all use statistics, so we are all data scientists now.

The goal of diversity is a workplace free from prejudice, where all kinds of people can achieve their potential and contribute. I believe that is our company, and that is worth being proud of.
