Creating a Simple Website for Free

It’s really easy now to create a simple website from scratch!

A few months ago I was looking for a directory of ongoing machine learning competitions (like the kind you get on Kaggle), but I couldn’t find one.

So I decided to build a simple web page that just listed them. I was lucky that the domain mlcontests.com was available and my hosting provider LCN were having a sale on domains, so I got that domain for free for the first year. (their customer service is great and they’re currently doing free .co.uk domains!)

Since I didn’t need anything fancier than a static page, I went with GitHub Pages for hosting, which is free and fast. The initial version of the page was just a big grid listing the competitions.

I was tempted to go with a database back-end, but ended up just keeping the list of competitions in a JSON file. This meant anyone could propose changes through pull requests on the project’s GitHub repository, and I was surprised that within a few weeks several people had added competitions I didn’t yet know about!
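
Since the data lives in a plain JSON file, it’s easy to sanity-check contributions automatically. Here’s the kind of check that could run on each pull request (a sketch – the file name and field names here are hypothetical, not necessarily what mlcontests.com uses):

import json
from datetime import date

REQUIRED_FIELDS = {"name", "url", "deadline"}  # hypothetical schema

def validate(path="competitions.json"):
    with open(path) as f:
        competitions = json.load(f)
    for comp in competitions:
        missing = REQUIRED_FIELDS - comp.keys()
        assert not missing, f"{comp.get('name', '?')} is missing {missing}"
        date.fromisoformat(comp["deadline"])  # deadlines must be ISO dates
    print(f"{len(competitions)} competitions OK")

if __name__ == "__main__":
    validate()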

After I got a few visitors I realised a big grid really didn’t work on mobile, so I spent a bit of time trying to improve that. I hadn’t done any real web development in almost a decade, but with a little help (thanks Natasha!) and a bit of trial and error I got it to be much more usable on mobile.

From the start I’d had a form on the page so visitors could join the mailing list. It’s really easy (and free) to set this up through Mailchimp, and I feel like it’s worth doing even if you’re not sure you’ll send many emails. So far around 500 people have joined the mailing list, and I’ve sent a handful of updates.

I also added Google Analytics early on, and it’s been nice to see the traffic growth. There was a temporary spike at launch, when I posted the site on Reddit and Hacker News, followed by a brief plateau; since then traffic has grown slowly but steadily.

Most of the traffic comes through search, and I’ve found Google Search Console super valuable for seeing which phrases people are searching for when they end up on my site, and which phrases I could be ranking higher for. It also pointed out a few site usability issues which could have been hurting my ranking.

I used the free Moz Pro trial to figure out which sites I should be trying to get links from in order to rank higher for relevant keywords. This led me to a few Medium posts, and when I contacted the authors through LinkedIn they were usually more than happy to include a link to my site in their articles.

I’m hoping to see continued growth to the point where I can monetise the site a bit, but so far it’s been a good learning experience and it hasn’t cost me anything but time.

There’s one link on there advertising Genesis Cloud – a cloud provider I’ve been using for training some machine learning models (their GPUs are very cheap!). I contacted them as a customer looking to promote their service, and they gave me a link I could put on my site. If they get any new paying customers through my link I get some credits to use on their cloud compute service.

I hope my experience is helpful to others trying to set up a simple site.

Here’s a recap of the tools I mentioned:

  • GitHub Pages – free static site hosting
  • LCN – domain registration
  • Mailchimp – free mailing list sign-up forms
  • Google Analytics – traffic monitoring
  • Google Search Console – search queries and usability reports
  • Moz Pro – link research (free trial)
  • Genesis Cloud – cheap GPU compute (referral link)

And here’s my site, listing ongoing machine learning/data science competitions: mlcontests.com. Sign up to the mailing list for occasional (~once/month) updates on new competitions. All the code for the site is here.

For a more detailed guide on how to set up your own blog with GitHub Pages, I’d recommend this fast.ai article.

Douglas Hofstadter on Love and Death

This weekend I read a short fragment from Douglas Hofstadter’s book I Am a Strange Loop at my friends James and Lucy’s wedding.

James is the person who initially introduced me to Douglas Hofstadter’s earlier book – Gödel, Escher, Bach: an Eternal Golden Braid – first published 40 years ago this year. GEB quickly became my favourite book: an irresistibly playful tapestry of mathematics, art, music, programming, artificial intelligence, language, logic, and philosophy.

I Am a Strange Loop, written almost 30 years after GEB, was a sort of reinterpretation of that book. Hofstadter felt that GEB’s technical and subtle approach led many readers to miss its point, so he decided to write a less technical, more accessible, and more explicit treatment of the nature of self and machine consciousness.

Strange Loop is a deeply personal and emotional book, covering the death of Hofstadter’s wife Carol, and exploring Hofstadter’s moral basis for his vegetarian diet, among many other things. It speaks of love and death in intimate detail, in Hofstadter’s signature style. I’d highly recommend reading this book if you’re not attracted by the idea of the maths-dense GEB.

For the reading at James and Lucy’s wedding I edited a few paragraphs for brevity and context (and to remove references to death!). These are those paragraphs – I think they give a good introduction to Hofstadter’s views on human relationships:

What is really going on when you dream or think more than fleetingly about someone you love? In the terminology of Strange Loops, there is no ambiguity about what is going on.

The symbol for that person has been activated inside your skull, lurched out of dormancy, as surely as if it had an icon that someone had double-clicked. And the moment this happens, much as with a game that has opened up on your screen, your mind starts acting differently from how it acts in a “normal” context. You have allowed yourself to be invaded by an “alien universal being”, and to some extent the alien takes charge inside your skull, starts pushing things around in its own fashion, making words, ideas, memories, and associations bubble up inside your brain that ordinarily would not do so. The activation of the symbol for the loved person swivels into action whole sets of coordinated tendencies that represent that person’s cherished style, their idiosyncratic way of being embedded in the world and looking out at it.

As a consequence, during this visitation of your cranium, you will surprise yourself by coming out with different jokes from those you would normally make, seeing things in a different emotional light, making different value judgments, and so forth. Each one of us has a brain inhabited to varying extents by other I’s, other souls, the extent of each one depending on the degree to which you faithfully represent, and resonate with, the individual in question. But one can’t just slip into any old soul, no more than one can slip into any old piece of clothing; some souls and some suits simply “fit” better than others do.

Douglas Hofstadter, I Am a Strange Loop (edited)

In discussing an appropriate portion of the book, James and I also agreed to read the following at each other’s funeral, if either of us were ever to die:

In the wake of a human being’s death, what survives is a set of afterglows, some brighter and some dimmer, in the collective brains of all those who were dearest to them. And when those people in turn pass on, the afterglow becomes extremely faint. And when that outer layer in turn passes into oblivion, then the afterglow is feebler still, and after a while there is nothing left.

This slow process of extinction I’ve just described, though gloomy, is a little less gloomy than the standard view. Because bodily death is so clear, so sharp, and so dramatic, and because we tend to cling to the caged-bird view, death strikes us as instantaneous and absolute, as sharp as a guillotine blade. Our instinct is to believe that the light has all at once gone out altogether. I suggest that this is not the case for human souls, because the essence of a human being — truly unlike the essence of a mosquito or a snake or a bird or a pig — is distributed over many a brain. It takes a couple of generations for a soul to subside, for the flickering to cease, for all the embers to burn out. Although “ashes to ashes, dust to dust” may in the end be true, the transition it describes is not so sharp as we tend to think.

It seems to me, therefore, that the instinctive although seldom articulated purpose of holding a funeral or memorial service is to reunite the people most intimate with the deceased, and to collectively rekindle in them all, for one last time, the special living flame that represents the essence of that beloved person, profiting directly or indirectly from the presence of one another, feeling the shared presence of that person in the brains that remain, and thus solidifying to the maximal extent possible those secondary personal gemmae that remain aflicker in all these different brains. Though the primary brain has been eclipsed, there is, in those who remain and who are gathered to remember and reactivate the spirit of the departed, a collective corona that still glows. This is what human love means. The word “love” cannot, thus, be separated from the word “I”; the more deeply rooted the symbol for someone inside you, the greater the love, the brighter the light that remains behind.

Douglas Hofstadter, I Am a Strange Loop

Reproducibility issues using OpenAI Gym

Reproducibility is hard.

Last week I wrote a simple Reinforcement Learning agent, and I ran into some reproducibility problems while testing it on CartPole. This should be one of the simplest tests of an RL agent, and even here I found it took me a while to get repeatable results.

I was trying to follow Andrej Karpathy’s and Matthew Rahtz’s recommendations to focus on reproducibility and set up random seeds early, but this was taking much longer than expected. Despite adding seeds everywhere I thought necessary, sometimes my agent would learn a perfect policy within a few hundred episodes, while other times it failed to find a useful policy even after a thousand.

I checked the obvious – setting seeds for PyTorch, NumPy, and the OpenAI Gym environment I was using. I even added a seed for Python’s random module, even though I was pretty sure I didn’t use that anywhere.

import random

import gym
import numpy as np
import torch

RANDOM_SEED = 0
env = gym.make("CartPole-v0")

torch.manual_seed(RANDOM_SEED)  # PyTorch's RNG (e.g. network weight init)
env.seed(RANDOM_SEED)           # the Gym environment's RNG
np.random.seed(RANDOM_SEED)     # NumPy's global RNG
random.seed(RANDOM_SEED)        # Python's built-in RNG

Still, I got different results on each run. I found a few resources pointing me to other things to check:

  • Consistency in data preparation and processing (not really relevant here – all the data I’m processing comes from the Gym environment)
  • CuDNN-specific seeding in PyTorch (my network is small enough to run quickly on CPU, so I’m not using CuDNN – but for completeness, the relevant flags are shown below)
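
For reference, here’s what that CuDNN seeding looks like in PyTorch (only relevant when training on GPU):

import torch

# Ask CuDNN to always pick deterministic convolution algorithms
torch.backends.cudnn.deterministic = True
# Disable the auto-tuner, which can pick different algorithms from run to run
torch.backends.cudnn.benchmark = False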

Out of ideas, I returned to debugging. My initial policy and target network weights were the same each run. Good. The first environment observation was the same too. Also good. But then, when I came to selecting a random action, I noticed env.action_space.sample() sometimes gave different results. Bad.
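
Here’s a minimal way to see the problem in isolation (a sketch – the exact actions sampled will vary):

import gym

env = gym.make("CartPole-v0")
env.seed(0)
# The environment's own RNG is now seeded, but the action space's is not:
# this prints a different sequence of actions on each run.
print([env.action_space.sample() for _ in range(5)])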

I looked through the OpenAI Gym code for random seeds, and couldn’t find any seeding being done on the action space, even when the environment is passed a specific seed! I then found this commit, where Greg Brockman and others discuss how seeding should be done in OpenAI Gym environments. It looks like they initially wanted to seed action spaces as well as environments, but decided not to, because they see action space sampling as belonging to the agent rather than the environment.

So here’s the solution, in one extra line:

env.action_space.seed(RANDOM_SEED)

I’d love to know why this isn’t called from env.seed()!

Anyway, now I’m getting reproducible results. To get an idea of how significant the difference between seeds is, even on a problem as simple as CartPole, I plotted five runs with different seeds.

Andrej Karpathy’s and Matthew Rahtz’s write-ups, linked above, include other useful reproducibility suggestions.

Pommerman: post mortem

Last month some friends and I spent a few days trying to put together an entry for the Pommerman competition at this year’s NeurIPS.

While we learnt a huge amount, we didn’t manage to submit an entry in time for the conference.

All of us were pretty new to reinforcement learning, so maybe it’s not hugely surprising that we didn’t succeed. Still, I think if we’d done things differently we might have got there in time.

Some things we managed to achieve:

  • Get the game running, and set up some basic reinforcement learning agents (specifically, DQN and PPO) that could play the game.
  • Set up a training environment on a cloud server, to which we could deploy any number of training configs and have them run 16 at a time.
  • Set up TensorBoard logging for rewards, wins, and various intermediate metrics (% frequency of each action, # of survivors in each game, etc.).
  • Train hundreds of PPO and DQN agents with different hyperparameters and network architectures.
  • Set up a validation environment that outputs performance stats for trained agents acting deterministically.
  • Experiment with experience replay, different types of exploration, CNNs, dropout, and simplified features.
  • Create different reward models intended to guide the agent towards various strategies (see the sketch after this list).
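
As a rough illustration of what one of these reward models looked like (a sketch with hypothetical event names and constants, not our exact code):

# Hypothetical shaped reward for a Pommerman-style agent; the `events`
# dict would be derived by comparing consecutive game states.
def shaped_reward(events):
    reward = 0.0
    if events["died"]:
        reward -= 1.0  # strongly discourage dying
    if events["won"]:
        reward += 1.0  # terminal bonus for winning
    reward += 0.1 * events["walls_destroyed"]      # encourage bombing walls
    reward += 0.05 * events["powerups_collected"]  # reward collecting power-ups
    reward -= 0.001  # small per-step penalty to discourage sitting still
    return reward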

Despite all this, we didn’t manage to train an agent which figured out how to bomb through walls and find opponents. Our most successful agents would (mostly) avoid bombs near them, but otherwise be static.

What mistakes did we make?

  • We underestimated the difficulty of the problem. We figured we could just set some stock algorithms running on the environment and they’d figure out a basic strategy which we could then iterate on, but this wasn’t the case.
  • We committed fairly early on to a library (TensorForce) that we hadn’t used before, without checking how good it was or how easy it would be to change things. So when we realised, more than halfway into the project, that we really needed to get our agents to explore more, it was really hard to debug exploration and implement new techniques.
  • We spent a lot of time setting up a cloud GPU environment, which we ended up not needing! The networks we were training were so small that it was faster to just run parallel CPU threads.
  • We didn’t try to reduce complexity or stochasticity early enough, so we didn’t really know why our agents weren’t learning.
  • We (I) introduced a few very frustrating bugs! The highlight was a bug where, in featurising the board for our agent, I accidentally modified the board array that all agents (and the display engine) shared. This bug manifested itself as our agent’s icon suddenly changing, and took me hours to debug (see the sketch below).
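
Roughly what that bug looked like (illustrative code with hypothetical names, not our actual implementation):

import numpy as np

ENEMY_ID = 11  # hypothetical tile id

def featurise(obs):
    board = obs["board"]          # NOT a copy – this array is shared by all
                                  # agents and by the display engine
    board[board == ENEMY_ID] = 1  # BUG: mutates the shared game state
    return board

def featurise_fixed(obs):
    board = np.copy(obs["board"])  # fix: work on a private copy
    board[board == ENEMY_ID] = 1
    return board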

Knowing what we know now, how would we have approached this problem?

  • Simplify the environment – start with a smaller version of the problem (e.g. 4×4 static grid, one other agent) with deterministic rules. If we can’t learn this then there’s probably no point continuing!
  • Simplify the agent to the extent where we fully understand everything that’s happening – for example, write a basic DQN agent from scratch. This would’ve made it easier to add different exploration strategies.
  • Gradually increase complexity, by increasing the grid size or stochasticity.
  • Add unit tests!

Despite our lack of success, we all learnt a lot and we’ll hopefully be back for another competition!

NeurIPS Day 3: Reproducibility, Robustness, and Robot Racism

Some brief notes from day 3 of NeurIPS 2018. Previous notes here: Expo, Tutorials, Day 2.

Reproducible, Reusable, and Robust Reinforcement Learning (Professor Joelle Pineau)

I was sad to miss this talk, but lots of people told me it was great so I’ve bookmarked it to watch later.

Investigations into the Human-AI Trust Phenomenon (Professor Ayanna Howard)

A few interesting points in this talk:

  • Children are unpredictable, and working with data generated from experiments with children can be hard (for example, children will often try to win games in unexpected ways).
  • Automated vision isn’t perfect, but in many cases it’s better than the existing baseline (having a human record certain measurements) and can be very useful.
  • Having robots show sadness or disappointment turns out to be much more effective than anger for changing a child’s behaviour.
  • Humans seem to inherently trust robots!

Two cool experiments:

To what extent would people trust a robot in a high-stakes emergency situation? 

A research subject is led into a room by a robot for a yet-to-be-specified experiment. The room fills with smoke, and the subject goes to leave the building. On the way out, they pass the same robot indicating a direction. Will the subject follow the robot?

It turns out that, yes, almost all of the time, the subject follows the robot.

What about if the robot, when initially leading the subject to the room, makes a mistake and has to be corrected by another human?

Again, surprisingly, the subject still follows the robot in the emergency situation that follows.

The point at which this stopped being true was when they had the robot point in obviously wrong directions (e.g. asking the subject to climb over some furniture).

This research has some interesting conclusions, but I’m not completely convinced. For one, based on the videos of the ‘emergency situation’, it seems unlikely that any of the subjects would have believed the emergency to be genuine. The smoke is extremely localised, and the ‘exit’ just leads into another room. It seems far more likely to me that the subjects were trying to infer the researchers’ intentions for the study, and decided to follow the robot since that was probably what they were meant to do.

Unfortunately this turns out to be because their ethics committee wouldn’t give them approval to run a more realistic version of the study in a separate building, which is a real shame. More on this study in the paper, or in this longer write-up.

Do racism and sexism apply to people’s perception of robots?

Say we program a robot to perform a certain sequence of behaviours, and then ask someone to interpret the robot’s intention behind that behaviour. Will their interpretation be affected by a robot’s ‘race’ or ‘gender’?

It turns out that, yes, it will. For example, when primed to be aware of the ‘race’ of a robot, subjects are more likely to interpret its behaviour as angry when the robot is black than when the robot is white.


But when humans are not primed to pay attention to race (and just shown robots with different colours), the effect disappears. Paper here.

The full video of the talk is available online.

NeurIPS Day 2: Cronenbergs

Just a brief highlight from day two: Professor Michael Levin’s incredible talk on What Bodies Think About, summarising 15 years of research exploring the hardware/software distinction in biology. Hello, Cronenberg World.

The talk opened with a brief introduction to how the brain is far from the only place where computation happens in biology. Experiments with regenerative flatworms show that memories persist even when their heads are removed and grow back. Butterflies can remember experiences they had when they were caterpillars.


Then the key bit of the talk: reverse-engineering bioelectric signals to trigger high-level anatomical subroutines – aka “reprogramming organs”. For example, telling a normal frog to “regrow a leg”, or convincing a flatworm to grow a differently shaped head belonging to a much older species of flatworm. No genomic editing or stem cells – just control of bioelectric signals, by changing ion channels directly or through drugs.


The ultimate goal: a biological compiler. This could lead to amazing new regenerative medicine, though there are clearly a lot of ethical issues that need to be thought through!


Do watch the full talk when you get a chance. It’s the only talk I ever remember attending where my jaw literally dropped, several times.

Previous NeurIPS 2018 posts: Expo and Tutorials. Next: Day 2… I mean Day 3.

NeurIPS Day 1: Tutorials

Monday was the first day of the core conference, following the Expo on Sunday. There were a number of really interesting tutorials. Here’s a brief summary of the three I managed to attend.

Scalable Bayesian Inference (Professor David Dunson)

This tutorial explored one main question: how do we go beyond point estimates to robust uncertainty quantification when we have lots of samples (i.e. ‘n’ is large) or lots of dimensions (i.e. ‘p’ is large)?


The introduction was Professor Dunson’s personal ode to Markov Chain Monte Carlo (MCMC) methods, and in particular the Metropolis-Hastings algorithm. In his words, “Metropolis-Hastings is one of the most beautiful algorithms ever devised”.
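
For anyone who hasn’t come across it, here’s a minimal random-walk Metropolis-Hastings sampler (my own toy sketch, not from the talk):

import numpy as np

def metropolis_hastings(log_prob, x0, n_samples=10_000, step=0.5, seed=0):
    # Random-walk Metropolis: samples a density known up to a constant.
    rng = np.random.default_rng(seed)
    x, samples = x0, []
    for _ in range(n_samples):
        proposal = x + step * rng.standard_normal()
        # Accept with probability min(1, p(proposal)/p(x)); the symmetric
        # proposal means the Hastings correction term cancels.
        if np.log(rng.random()) < log_prob(proposal) - log_prob(x):
            x = proposal
        samples.append(x)
    return np.array(samples)

# Example: sample a standard normal from its unnormalised log-density
samples = metropolis_hastings(lambda x: -0.5 * x ** 2, x0=0.0)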

He tackled some of the reasons for the (in his view incorrect) belief that MCMC doesn’t scale, and showed how MCMC methods can now be used to perform Bayesian inference even on very large data sets. Some key approaches involve clever parallelisation (WASP/PIE) and approximating transition kernels (aMCMC). Interestingly, some of these techniques have the combined advantage of improving computational efficiency and mixing (an analogue of exploration in Bayesian inference).

A recurring theme throughout the talk was Professor Dunson’s call for more people to work in the field of uncertainty quantification: “There are open problems and a huge potential for new research”.

His recent work on coarsening in Bayesian inference – a specific way of regularising the posterior – allows inference to be more robust to noise and helps manage the bias/variance trade-off when optimising for interpretability (i.e. if a relatively simple model is only slightly worse than a very complex model, we probably want to go with the simple model). This is useful for example in medicine, where doctors want to be able to understand and critique predictions rather than using black-box point estimates.


The second part of the talk went on to explore high-dimensional data sets, particularly those with small sample size: “you’ve given us a billion predictors and you’ve run this study on ten mice.”

Naive approaches in this area can have serious issues with multiple hypothesis testing or requiring an unjustifiably strong prior to get a reasonable uncertainty estimate. Point estimates can be even worse – or, in Professor Dunson’s words, “scientifically dangerous”. Accurate uncertainty quantification can allow us to say “We don’t have enough information to answer these questions.”


The hope is that over time, we can extend these methods to help scientists by saying “No, we can’t answer <this specific question that you’ve asked>. But given the data we have, <here’s something else we can do>.”


Unsupervised Deep Learning (Alex Graves from DeepMind, Marc’Aurelio Ranzato from Facebook)

This talk started with a reclassification of ML techniques: rather than thinking of three categories (supervised learning/unsupervised learning/reinforcement learning), it can be more useful to think of four categories across two dimensions.

This talk focused on the two quadrants on the right.

The key idea I took from this talk was that we can apply unsupervised learning to problems we’ve previously thought of as supervised learning, if we’re smart about how we do it.

For example, the classic approach to machine translation is to take a large set of sentence pairs across two languages, and then train a neural net to learn the mapping between the two. While this can work well, it relies on a lot of labelled data, which isn’t always available. Since there’s far more monolingual data out there, another approach is to get a model to learn the structure of each language on its own, for example by embedding words or phrases in some space which can capture the relationships between them.
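
One classic trick along these lines (my illustration of the general idea, not the speakers’ code) is to learn a linear map between two monolingual embedding spaces from a small seed dictionary of word pairs, via orthogonal Procrustes:

import numpy as np

def align_embeddings(X, Y):
    # Orthogonal Procrustes: find the rotation W minimising ||X @ W - Y||.
    # X and Y are (n, d) arrays of embeddings for n seed-dictionary pairs,
    # where row i of X and row i of Y are translations of each other.
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt

# Usage: map a source-language vector across with `vec @ W`, then translate
# by nearest neighbour in the target language's embedding space.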

 

Since languages correspond to things in the real world, if we can learn an accurate enough mapping for two separate languages, we can then find a way to translate between them by exploiting the shared word embedding space. Doing this for phrases or sentences is harder, but can already improve on supervised learning in certain special cases – for example English-Urdu.

The available English-Urdu labelled data is restricted to very specific domains (genomics data/subtitles), and doesn’t allow supervised models to generalise well to other domains. In this case unsupervised models trained on monolingual examples can do a better job (albeit still not necessarily good enough to be useful in practice).

Also interesting were the timelines of popularity of unsupervised feature learning in vision and natural language processing, showing how this type of approach goes in and out of fashion over time.


Both speakers here were quite optimistic about how much content they’d get through in an hour, so they didn’t quite manage to cover everything. I’d highly recommend checking out the slides though, since there are lots of references to interesting papers: part 1 and part 2.

Counterfactual Inference (Professor Susan Athey)

Like a few others I spoke to, I had high hopes for this talk but was a little disappointed. A lot of time was spent covering basic stats concepts with text-heavy slides, and even though Professor Athey’s quite an engaging speaker I didn’t feel like I learnt very much or even gained a good intuition for the types of problems her framework of counterfactual inference can solve.

It felt like much of the work was a slight reframing of supervised learning to account for unobserved counterfactuals, where the probability of observation (the propensity score) is correlated with some of the underlying covariates.
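
To make the propensity score idea concrete (my own toy illustration, not from the talk), the classic inverse-propensity-weighted estimate of an average treatment effect looks like this:

import numpy as np

def ipw_ate(y, t, e):
    # y: observed outcomes; t: 0/1 treatment indicators;
    # e: estimated propensity scores P(t = 1 | covariates) for each unit.
    # Each outcome is reweighted by the inverse probability of having
    # observed that treatment arm, correcting for non-random assignment.
    return np.mean(t * y / e - (1 - t) * y / (1 - e))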

Having said that, it was nice to get a different perspective from someone working in economics, where the standards for publication and expectations of interpretability can be very different. Some of the notation was also interesting and new to me, and might be useful to anyone wanting to do a better job of considering counterfactuals in their work. It was noticeable that she compared her work to Deep Learning/AlphaGo quite a few times, even though her tools for counterfactual inference seem to operate in quite a different problem domain. In the vein of Marc’Aurelio Ranzato’s popularity charts above, I wonder if there’s an expectation that people will find work more interesting if it’s framed in terms of today’s Deep Learning, which would be a shame.


Some of the themes from the Scalable Bayesian Inference workshop came up again, such as the idea that modern ML techniques haven’t been used much in economics since it’s hard to get things like confidence intervals. Towards the end Professor Athey presented some recent contributions to the field, such as Generalised Random Forests.

For anyone interested in learning more about the intersection of machine learning and econometrics, AEA has a longer lecture series featuring Professor Athey which goes into much more depth on the topic.

Next: Day 2.