
Platypus Innovation Blog

24 October 2018

EgBot Maths Q&A Dataset

(To skip ahead: our dataset is on our Zenodo community page, and our code is in our GitHub repository.)

Guest post by Irina Preda from the EgBot project, a collaboration between the University of Dundee and Good-Loop

Our research focuses on the role of examples in mathematical discourse. One of the ways in which we examine this is through the construction of an autonomous example generator. The generator would be able to contribute to mathematical discussions by proposing appropriate examples, in a socially-appropriate way. This example generator would be a first step towards a machine-enhanced mathematics, where humans and machines collaborate to produce novel mathematics research.
To be able to build a model of how humans use examples, we would need a large dataset of examples and the context in which they are provided. Unfortunately such a dataset does not exist, but there is a lot of potential for generating one. First we must find a source that would allow us to collect all of this data. Online collaborative mathematics platforms (such as Polymath and MathOverflow) provide a remarkable opportunity to explore mathematical discourse and elements of the mathematical process. They are also high quality data-rich sources that provide the perfect resources to analyse discourse as well as train models.

As StackExchange is a platform with abundant data (MathStackExchange has approximately 1 million questions) and a well-documented API, we decided to use the StackExchange API to extract its maths Q&A data and so generate our dataset. From the start of the project we focused on making our work accessible, which is why we decided to publish our code openly on GitHub and our dataset online. We considered this very important, as a good dataset is an extremely valuable resource for the data science and machine learning community and can significantly boost further research. Making a dataset truly accessible requires that it be well constructed and documented, so that it is easily understood, and that it live on a platform where it can easily be found. So we turned to Zenodo, an open access research publication framework. It assigns every publicly available upload a DOI (Digital Object Identifier), which makes the data uniquely citable, and it supports harvesting of all content via a well-known open web standard (the OAI-PMH protocol), which makes the metadata easily searchable. The only limitation we found is that Zenodo's online upload tool doesn't accept JSON files, but this was easily worked around by archiving the files (which conveniently reduces the download size as well).
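As a rough illustration of the extraction step, the sketch below pages through the questions endpoint of the StackExchange API for the maths site. It is not taken from the EgBot repository: the function names, API version, page size and sort order are all assumptions chosen for the example, and a real harvest would also need an API key and quota handling.

```javascript
// Minimal sketch, NOT the EgBot code. Names and parameter choices are assumptions.
const API_ROOT = 'https://api.stackexchange.com/2.2';

// Build the URL for one page of questions from the maths site.
function buildQuestionsUrl(page, pageSize = 100) {
  const url = new URL(API_ROOT + '/questions');
  url.searchParams.set('site', 'math');               // Mathematics Stack Exchange
  url.searchParams.set('page', String(page));
  url.searchParams.set('pagesize', String(pageSize)); // the API caps this at 100
  url.searchParams.set('order', 'asc');
  url.searchParams.set('sort', 'creation');
  return url.toString();
}

// Fetch up to maxPages pages, stopping early when the API says there are no more.
async function fetchQuestions(maxPages) {
  const questions = [];
  for (let page = 1; page <= maxPages; page++) {
    const response = await fetch(buildQuestionsUrl(page)); // global fetch needs Node 18+
    const body = await response.json();
    questions.push(...body.items);
    if (!body.has_more) break;
    // The API can ask clients to wait via a "backoff" field (in seconds).
    if (body.backoff) {
      await new Promise((resolve) => setTimeout(resolve, body.backoff * 1000));
    }
  }
  return questions;
}
```

Calling `fetchQuestions(2)` would request the first two pages; the results could then be written out as JSON files and archived for upload.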

Data collection was only the first stage in our project; it also involves data analysis, building a conversational model and an interactive web application. Our intention to use deep learning to build the conversational model meant that this first stage was very important, as deep neural networks require an immense amount of data to train. Thankfully the approach we sketched above was successful and we were able to harvest 6 GB of mathematical question-answering and discourse data. If you would like to take a look, our dataset is on our Zenodo community page and our code is in our GitHub repository.

17 August 2018

Kurt Vonnegut on Being Obsolete

Thinking on AI and the effects it will have on unemployment -- Kurt Vonnegut, the great American sci-fi writer, wrote back in 1965 on how we should adjust our attitude to unemployment:

"Americans have long been taught to hate all people who will not or cannot work, to hate even themselves for that. We can thank the vanquished frontier for that piece of common-sense cruelty.  
The time is coming, if it isn’t here now, when it will no longer be common sense. It will simply be cruel."

From the excellent, funny, angry, and hopeful book: God Bless You, Mr. Rosewater.

31 July 2018

My company is over 50% female - Why I'm not happy about that

There are far more men in the tech sector than women. So it is surely an achievement to be celebrated that my company, Good-Loop, is over half (55%) female. Draw up humble speeches about our greatness, put us on posters and give us an award!

Well yes and no.

Firstly, that 55% depends on how you count it. If we adjust for full-time vs part-time, the balance would be 1/3 female to 2/3 male -- still a lot more balanced than the sector average, but less worthy of a poster. Stats can often be chosen to suit the message ("cherry picked" to use the technical term). Statistics should be compiled by someone independent, with no axe in the race or horse to grind. Or produced to a fixed formula, which will no doubt be inappropriate in many cases, but has the benefit of being consistent and comprehensible.

Moreover, 50/50 is not the goal of diversity. The gender % is a symptom, and should be treated as such -- it is not the disease itself. If you visit a doctor with a fever, you expect to be tested for the underlying cause, not put in a fridge. We should treat gender % statistics, and indeed, all statistics, in the same way. 

All things being equal we expect 50/50 male/female, 2% black, 6% red-heads etc.[1] If there's an imbalance, then prejudice at work might be a problem - but there are other possible causes. For example, when I collect my child from nursery today, most of the staff are female -- but I doubt the nursery is sexist; the cause is at the society level, where men aren't encouraged towards care roles. Also, nice stats could merely mask problems, and treating the stats would certainly mask problems. For example, if a workplace is biased against mothers, who hit a glass ceiling, then promoting non-mothers won't fix that.

Statistics (and more generally: data) is a powerful lens for examining the world. But like any lens, it distorts. The useful thing about a statistic is it simplifies the world and allows for easy comparisons. This is also the dangerous thing -- the world is rarely simple. The data scientist must be alert to the wider picture, the complex causes behind the circumstantial summary. And we all use statistics, so we are all data scientists now.

The goal of diversity is a workplace free from prejudice, where all kinds of people can achieve their potential and contribute. I believe that is our company, and that is worth being proud of.

28 May 2018

Splitting out a React js project for code reuse - The easy way

We like React, and we run a few projects on it: Good-Loop's portal, SoGive, and some internal tools. So we wanted to reuse code between these projects.
One way would be to make separate npm packages. But this is a painful solution: The compilation setup is painful, and you lose source-code maps (so you end up debugging from babelled code).
Instead, we found a simple solution using symlinks:
  1. Just sym-link the "packages" in, using ln -s (for Linux or Mac; I don't know what the equivalent is for Windows).
  2. Set webpack's resolve.symlinks property to false. In our setup that meant editing webpack.config.js to have resolve: { symlinks: false }.
Then your code is split out -- but as far as the build process is concerned, or your code editor, nothing has changed.
You can also unit test with tests in each "package".
E.g. your project folders might end up looking something like this:
  • myapp
    • src
      • subpackage: a symlink to subpackage/src
    • webpack.config.js, package.json, etc.
  • subpackage
    • src
    • test: unit tests for src files
    • webpack.config.js for the unit tests
This is easy and it works.

12 April 2018

Invest in Good-Loop

This is an unusual post for me. After 10 years as an entrepreneur, I am asking for money.

Amy and I set out in 2016 with a mission: to make something positive and good in the often sordid world of online advertising. Good-Loop is an ad network, but different from any other:

  • 50% of the money goes to charity. To be clear: that's not 50% of profits, or 50% of our commission. That's 50% of the total revenue. The rest is split, approximately 30-40% to the publisher and 10-20% to us.
  • The user is in control: of whether we show them an ad, and what we do with their data. This is the "ad-choice" alternative to adblockers, which avoids annoying ads whilst still supporting publishers to create.

We believe that if you treat people with respect and aim to do good, then you can create both a positive impact (money raised for charity), and a better space for advertising (happy viewers are more likely to listen than annoyed ones).

The results so far back this up: because we engage users in a positive way, Good-Loop adverts are watched rather than skipped (benchmarked against YouTube and Facebook video ads), and generate more actual customer activity (measured with one of our clients, Lifecake by Canon). Unilever's chief marketing officer, Keith Weed, describes our approach as a "win-win!" (namedropping, because, wow, not only are we working with Unilever, who have one of the world's largest ad budgets and an ethical ethos, but their CMO tweeted about it).

We are not a charity: we plan to make profits and deliver a return. Our aim is to be both profitable and good for the community. I think this investment is a good bet: we have a reasonable chance of delivering an unreasonable return.

Warning: Any investment in a start-up is risky - do not invest money unless you're OK losing it. I cannot provide financial advice here.

If you want to be part of what we're doing - now you can! Over April 2018, we are running an open investment round:

This is open to anyone. The bulk of the round will likely be established investors who we've pitched to. However if you know and like what we're doing, then you can invest, regardless of cheque size -- the investments so far range from £15 to £30,000.

I hope you'll join us.
If you have any questions, feel free to contact me:

10 January 2018

Upgrading from Jetty 8 to 9

I'm upgrading from Jetty 8.1 to Jetty 9.4. Here are the changes I've encountered so far:

Several classes renamed / replaced:

* SelectChannelConnector has become ServerConnector
* ServletContextHandler has become ServletHandler

Server.setThreadPool() has gone; in Jetty 9 the thread pool is passed to the Server constructor instead.

Lots of jar changes.

The JSON class (org.eclipse.jetty.util.ajax.JSON) has moved out into a separate Maven project, which gives you jetty-util-ajax.jar

We also needed the Maven packages: jetty-server, jetty-servlet

You must upgrade to servlet-api 3.1 (so change anything which pulls in the servlet-api-3.0.jar or earlier).
