NeurIPS Day 0: Expo

Today was NeurIPS Expo, the zeroth day of this year’s Neural Information Processing Systems conference in Montréal. The Expo is a day of industry-run content held right before the main conference. Below are some highlights from a few of the sessions I managed to attend.

The Montréal Declaration

An initiative of the University of Montréal, the Declaration “aims to spark public debate and encourage a progressive and inclusive orientation to the development of AI”.

Unlike the Asilomar AI Principles, which were set by experts in the field, the principles of the Montréal Declaration are being set by consultation with the public and take a local (Québec-centric) view rather than trying to solve global issues.

Notably, the Declaration will remain open for revision to account for the fact that societal norms and our understanding of AI will adapt. The next draft of the Declaration, with updated principles, will be published on the 4th of December. [now published: English/French]

Despite an attempt to take opinions from a broad cross-section of the population, there was a significant skew towards highly educated people in the ~500 participant group, as well as towards those working in tech, and towards men.

While the content was interesting, the talk was a little unfocused – very little time was spent on context/setup (whose initiative is this? why?) and a lot of time on niche issues/tangents (what preferences might people subscribing to various moral frameworks express about certain trolley problem scenarios?).

One of the speakers suggested that rather than spending time considering moral dilemmas, more time should be spent planning societal/structural changes that would remove or reduce the need for machines to face those dilemmas.

More concretely, rather than optimising for thousands of different trolley problems, we could figure out ways to arrange our roads so that autonomous vehicles are significantly less likely to come across any pedestrians or unexpected objects. We could do this by, for example, investing more in pedestrian infrastructure (e.g. segregated sidewalks and raised crossing points), and rolling out autonomous vehicles only in areas with sufficient such infrastructure.

NetEase FUXI – Reinforcement Learning in Industry

Despite a lot of mentions of ‘AI’ and ‘Big Data’ in the first few minutes of this session, it actually turned out to be fairly interesting.

I didn’t manage to stay long due to a clash with the HRT talk, but here are some interesting points from the first and second parts of the workshop:

  • Game companies don’t want their bots to be too good, because humans want to have a chance of winning! So the problem here is different from e.g. DeepMind’s Atari bots. (not that there’s too much danger of unintentionally creating excessively strong strategies with today’s techniques)
  • FUXI are trying to create a meta chatbot design engine that can work across games, and a high-level character personality design engine.
  • Interesting quote: “Our ultimate goal is to build a virtual human.”
  • They framed supervised learning as being about ‘predictions’, and reinforcement learning as being about ‘decisions’, and claimed that recommendation tasks can be better framed in an RL context.
  • There was some discussion of RL issues with sample efficiency and exploration leading to limited current real-world use cases (with references to “RL never worked” and “Deep RL Doesn’t Work Yet”)
  • “Humans are not believed to be very easily simulated” (!)
  • Dogs are better at reinforcement learning than DQN (though maybe not as good at Atari games)
  • When building ‘customer simulators’ to train RL-based recommendation engines, they found value in trying to simulate intention rather than behaviour (through techniques like Inverse RL and GAIL)
  • They’re planning on releasing “VirtualTaobao” simulators, essentially gym environments for recommendation engines.
FUXI clearly didn’t get the diversity memo! Six men running a workshop could’ve picked a better image than this one to showcase one of their games.

Hudson River Trading

Everyone attending this panel in the hope of learning the secret Deep Learning techniques that could make them millions in trading was immediately disappointed by the introduction – “Due to the competitive nature of our business we can only talk about problems, not solutions…”

Fortunately for those who stayed anyway, the speakers were all great and the content was interesting.

HRT spent some time at the beginning of their talk framing their firm (and prop trading firms more generally) as beneficial to society by showing a reduction in US equity spreads over the past few decades, and linking this to lower trading costs for investors.

As would be expected for a prop trading firm, most of the Q&As were fairly uninformative though at least slightly amusing.
Audience member 1: “What types of models do you use?”
HRT employee: “We use a variety of different models.”
Audience member 2: “What is your average holding period?”
HRT employee: “Our strategies have a variety of different holding periods.”
Audience member 3: “Are you actually using Deep RL in production trading?”
HRT employee: “I’m afraid I can’t answer that. Come work for us and you’ll find out.”
Audience member 4: “What are the annual returns and Sharpe ratios for your RL-based strategies?”
HRT employee: “I cannot answer that question.”

One of the speakers previously worked at DeepMind, and it was interesting to hear him contrast different ‘families’ of RL and which might map most closely to the problem of trading.

The families in his classification were:

  • DQN: possibly sample-efficient enough (using Rainbow), but the action space must be discrete, and these algorithms are not that great at exploration (though that’s changing). What’s the trading equivalent to a frame in Atari? Is it a tick? Or multiple ticks? How do we set constraints in a way that allows our model to optimise around them?
  • AlphaGo: adversarial and with the set of valid actions dependent on the state, but these strategies rely on an accurate world model and require a lot of compute.
  • Robotics: continuous N-dimensional actions, similar safety concerns/constraints, shared difficulty of translating model from simulator to reality. Maybe a trading algo dealing with market changes is analogous to robotics algo being robust with respect to lighting changes.
  • “Modern Games” (Dota, StarCraft, Capture The Flag): adversarial, simulations are expensive, big networks are required, some of the inputs are “a narrow window into a wider world”. (in the sense that they capture the current state perfectly but don’t tell you about the longer term consequences of your actions)

One audience question which did get a meaningful answer was whether they were using RL for portfolio optimisation. The response was that RL isn’t data efficient enough yet for it to be used for multi-day portfolio optimisation, since “the number of days that have happened since the start of electronic trading is not that many”.

A slide from the portfolio optimisation part of the talk, showing how optimising around position/margin constraints can be preferable to the naive myopic strategy of trading as you normally would until you hit the constraints.

Intel – nGraph

Some of this talk went comfortably over my head, but I was pleased to find that I understood more than I expected to.

My understanding is that Intel’s nGraph library is a software optimisation component which sits between the machine learning libraries used to construct computational graphs (TensorFlow, PyTorch, Chainer, PaddlePaddle, MXNet, …) and the kernel libraries used to run those graphs on specific hardware (cuDNN, MKL-DNN, …).


Having one shared library sitting between these two means that you need to consider m + n different integrations rather than m × n integrations. (where m is the number of ML libraries, and n the number of kernel libraries)
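With illustrative counts (the real numbers of supported libraries will differ):

```python
m = 6  # ML frameworks (TensorFlow, PyTorch, Chainer, PaddlePaddle, MXNet, ...)
n = 3  # kernel libraries (cuDNN, MKL-DNN, ...); counts are illustrative

pairwise = m * n          # every framework integrated directly with every backend
via_shared_layer = m + n  # each library integrates once, with the shared layer

print(pairwise, via_shared_layer)  # prints: 18 9
```

The gap widens quickly as new frameworks and new hardware backends appear, which is the whole pitch for a shared intermediate layer.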


I didn’t understand much of the second part of the talk, but the optimisation examples given in the first part were based around removing redundant or inefficient operations – for example, two consecutive transpose operations (which cancel each other out) would simply be removed. Similarly, if two parts of a graph are doing exactly the same thing, they can be condensed into one. Adding or multiplying a tensor by zero can be dealt with at compile time.
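As a toy illustration of this kind of pass – a linear list of op names standing in for a real graph IR, and definitely not nGraph’s actual implementation:

```python
def optimise(ops):
    """Toy peephole pass over a linear sequence of named ops
    (a stand-in for a real computational-graph IR)."""
    out = []
    for op in ops:
        if op == "transpose" and out and out[-1] == "transpose":
            out.pop()   # two consecutive transposes cancel out
        elif op in ("add_zero", "mul_one"):
            continue    # identity operations folded away at compile time
        else:
            out.append(op)
    return out

print(optimise(["matmul", "transpose", "transpose", "add_zero", "relu"]))
# ['matmul', 'relu']
```

A real compiler works on a dataflow graph rather than a list, which is what makes things like common-subexpression elimination (the “two parts of a graph doing the same thing” case) possible.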


All this (and much more that I didn’t understand) can supposedly lead to significant performance improvements in training neural nets, particularly for lower batch sizes.

Next: Day 1.

Pommerman: getting started

Last weekend we spent a day taking our first steps towards building a Pommerman agent.

In addition to a full game simulation environment, the team running the competition were kind enough to provide helpful documentation and some great examples to help people get started.

There are a few particularly useful things included:

  • A few example implementations of agents. One just takes random actions, another is heuristic based, and a third uses a tensorforce implementation of PPO to learn to play the game.
  • A Jupyter notebook with a few examples including a step-by-step explanation of the tensorforce PPO agent implementation. (this is probably the best place to start)
  • A visual rendering of each game simulation.

Before we got anywhere, we hit a few small stumbling blocks.

  • It took us a few attempts, installing different versions of Python, before we got TensorFlow running. Now we know that TensorFlow doesn’t support Python 3.7, or any 32-bit versions of Python.
  • The tensorforce library, which the included PPO example is based on, has been changing rapidly. Some of the calls to this library no longer worked. While the code change required was minimal, it took at least an hour of digging through tensorforce code before we knew what exactly needed to be changed. We committed a small fix to the notebook here, which now works with version 0.4.3 of tensorforce, available through pip. (I wouldn’t recommend using the latest version of tensorforce on GitHub as we encountered a few bugs when trying that)

I was hoping we’d get to an agent which could beat the heuristics-based SimpleAgent at FFA, but we didn’t manage to get there. In the end, we managed to:

  • Get the Jupyter notebook with examples running
  • Understand how the basic tensorforce PPO agent works
  • Set up a validation mechanism for running multiple episodes with different agents, and save each game so we can replay it for debugging purposes.
  • Train a tensorforce PPO agent (it was technically training, but we didn’t manage to get it to beat the SimpleAgent in any games yet)
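A validation harness of that kind can be sketched as follows. This is a simplified, hypothetical gym-style interface, not our actual code – the real Pommerman environment returns a standard (state, reward, done, info) tuple and manages several agents at once:

```python
import json
import random

def run_episodes(env, agent, n_episodes, save_path=None):
    """Run several episodes, recording every action so a game can be
    replayed later for debugging."""
    histories = []
    for episode in range(n_episodes):
        state = env.reset()
        done, reward, actions = False, 0.0, []
        while not done:
            action = agent(state)
            state, reward, done = env.step(action)
            actions.append(action)
        histories.append({"episode": episode, "actions": actions, "reward": reward})
    if save_path is not None:
        with open(save_path, "w") as f:
            json.dump(histories, f)
    return histories

# Stand-in environment: six discrete actions (as in Pommerman); the episode
# ends, with reward 1, whenever action 0 happens to be chosen.
class DummyEnv:
    def reset(self):
        return 0
    def step(self, action):
        done = action == 0
        return 0, (1.0 if done else 0.0), done

histories = run_episodes(DummyEnv(), agent=lambda s: random.randrange(6), n_episodes=3)
```

Serialising the action history (rather than, say, video) is what makes deterministic replay for debugging cheap.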

To be continued…

Pommerman: relevant research

As part of the NIPS 2018 Pommerman challenge, we’ll have to build bots that are able to plan and cooperate against a common enemy. The challenge docs include some links to relevant research, which I’m aiming to summarise here.

I’ve broken the papers into three sections:

  1. Planning – the fundamental skill of coming up with a strategy and choosing actions that maximise the probability of winning. The field of reinforcement learning has a wealth of approaches for this.
  2. Cooperation – planning in the presence of other agents with the same goal and possibly known architecture/behaviour.
  3. Opponent modelling – planning in the presence of other agents with opposing goals and unknown behaviour.

Planning/reinforcement learning

Proximal Policy Optimisation (PPO) (2017) is a reinforcement learning technique developed by OpenAI that appears to generalise to new tasks better than older reinforcement learning techniques, and requires less hyperparameter tuning. (in contrast, techniques like DQN can perform very well once adapted to a problem, but will be useless unless the right hyperparameters are chosen)
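For reference, the clipped surrogate objective from the PPO paper – the clipping is what keeps each policy update small without delicate step-size tuning:

```latex
L^{CLIP}(\theta) = \hat{\mathbb{E}}_t\!\left[\min\!\left(r_t(\theta)\,\hat{A}_t,\;
\operatorname{clip}\!\left(r_t(\theta),\,1-\epsilon,\,1+\epsilon\right)\hat{A}_t\right)\right],
\qquad r_t(\theta) = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\text{old}}}(a_t \mid s_t)}
```

Here \(\hat{A}_t\) is the advantage estimate and \(\epsilon\) (typically around 0.2) bounds how far the new policy’s probability ratio \(r_t\) can move in a single update.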

Monte Carlo Tree Search (2012) gives an extensive overview of Monte Carlo Tree Search (MCTS) methods in various domains, as well as describing extensions for multi-player scenarios. MCTS is a method for building a reduced decision tree, selectively looking multiple moves ahead before deciding on an action.
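To make “selectively looking ahead” concrete, here’s a minimal single-player UCT sketch with a toy game attached. The helper names are hypothetical, and a real Pommerman agent would also need to handle opponents and simultaneous moves:

```python
import math
import random

class Node:
    def __init__(self, state, parent=None):
        self.state = state
        self.parent = parent
        self.children = {}  # move -> child Node
        self.visits = 0
        self.value = 0.0

def ucb1(parent, child, c=1.4):
    # Unvisited children are tried first; otherwise balance the child's
    # average value against how rarely it has been visited.
    if child.visits == 0:
        return float("inf")
    return (child.value / child.visits
            + c * math.sqrt(math.log(parent.visits) / child.visits))

def mcts(root_state, legal_moves, apply_move, rollout, n_iter=500):
    root = Node(root_state)
    for _ in range(n_iter):
        node = root
        # 1. Selection: descend while the current node is fully expanded.
        while node.children and len(node.children) == len(legal_moves(node.state)):
            node = max(node.children.values(), key=lambda ch: ucb1(node, ch))
        # 2. Expansion: add one untried move, if any remain.
        untried = [m for m in legal_moves(node.state) if m not in node.children]
        if untried:
            move = random.choice(untried)
            node.children[move] = Node(apply_move(node.state, move), parent=node)
            node = node.children[move]
        # 3. Simulation: random playout from the new state.
        reward = rollout(node.state)
        # 4. Backpropagation: update statistics back up to the root.
        while node is not None:
            node.visits += 1
            node.value += reward
            node = node.parent
    # Recommend the most-visited move at the root.
    return max(root.children, key=lambda m: root.children[m].visits)

# Toy game: start at 0, add 1 or 2 per move; reward 1.0 for landing exactly on 5.
legal = lambda s: [] if s >= 5 else [1, 2]
apply_move = lambda s, m: s + m
def playout(s):
    while s < 5:
        s += random.choice([1, 2])
    return 1.0 if s == 5 else 0.0

best_move = mcts(0, legal, apply_move, playout)
```

The four phases (selection, expansion, simulation, backpropagation) are the skeleton shared by all the MCTS variants the survey covers; the variants mostly differ in how each phase is implemented.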

Monte Carlo Tree Search and Reinforcement Learning (2017) reviews methods combining MCTS with other reinforcement learning techniques. The biggest success story so far is DeepMind’s AlphaGo, which combined MCTS with deep neural networks to beat all previous Go-playing algorithms and, for the first time ever, the best human players.

Deep Reinforcement Learning from Self-Play in Imperfect-Information Games (2016) builds on earlier Fictitious Self-Play strategies, and introduces Neural Fictitious Self-Play for learning competitive strategies in imperfect-information games such as poker, where DQN does not reliably converge.

Cooperation/multi-agent learning

Multi-Agent DDPG is a technique developed by OpenAI, based on the Deep Deterministic Policy Gradient technique, where agents learn a centralised critic based on the observations and actions of all agents. The researchers found this technique to outperform traditional RL algorithms (DQN/DDPG/TRPO) on various multi-agent environments.

Cooperative Multi-Agent Learning (2005) is an overview of multi-agent learning approaches. At the highest level, it distinguishes between team learning (one learning process for the entire team) and concurrent learning (multiple concurrent learning processes).

Opponent modelling

Opponent Modeling in Deep Reinforcement Learning (2016) builds on DQN to model opponents through a Deep Reinforcement Opponent Network (DRON).

Machine Theory of Mind (2018) is a recent paper developing a system for learning to model other agents in gridworld environments, by predicting their behaviour through observation.

Coordinated Multi-Agent Imitation Learning (2018) looks at inferring the roles of other players in environments such as team sports to improve prediction of their behaviour.

Autonomous Agents Modelling Other Agents (2018) is a comprehensive survey of methods used across the machine learning literature for modelling other agents’ actions, goals, and beliefs.

Multi-agent learning with Pommerman

Together with James and Henry, I’m going to try to build two bots and enter them in the team Pommerman competition, which takes place at the beginning of December.

In a test of multi-agent learning, the two bots will face off against other bots, who they’ll try to blow up with bombs while avoiding being blown up themselves.

Our plan is:

  1. Get the basic Pommerman environment running on our laptops.
  2. Understand how the game and example agents work.
  3. Set up a way to run lots of iterations of competitions between various agents.
  4. Improve the example agents with more advanced heuristics-based play.
  5. Try out some techniques from the multi-agent learning literature, and see if we can systematically beat our heuristics-based agents.
  6. ???
  7. Submit our best team of two agents, and compete against other teams live at NIPS 2018.

Progress so far: environment installed. Example agents running. Next up: understand how they work.

Will we manage to build any agents that beat the example agents? Will our agents perform as expected on match day, or crash and freeze in live play? Will we win enough games to make it on to the leaderboard and win one of the prizes? To be continued…

Reading and remembering

It’s easier to read than to remember what you’ve read. I used to struggle to remember what a book was about, even just a few years after reading it.

I don’t have this problem anymore. In a few recent conversations about books, people have asked me how I manage to remember so much about books I read a long time ago.

I don’t think it’s because my brain has got better at remembering things. I think I’ve picked up habits from various places that put the information from books into my brain so that I remember it better. I now have a simple but powerful approach I use when reading most non-fiction books.

The basic structure of this approach is:

  1. Get a broad overview of the book.
  2. Read the book, slowly, noting key ideas and passages.
  3. Summarise the book.
  4. Occasionally review the summary.

Continue reading “Reading and remembering”

Segmenting communication

I’ve been called old-fashioned when it comes to communication. Some people don’t understand why I like email so much. (I love email. It’s the best. Please email me.)

One reason is that it’s the only good way to send messages which are clearly non-urgent, and can be easily tracked. Why is this important? I think it saves everyone a lot of time and attention, by removing unnecessary interruptions.

Message urgency

Communications can be segmented in various ways, one of which is by urgency. Most people would agree that there’s a relatively clear one-dimensional spectrum of message urgency, from “I found a cool cat picture” to “I’m outside your door” to “My house is on fire”. Broadly we could refer to these as low-urgency (whenever), medium-urgency (as soon as convenient), and high-urgency (right now).

A hundred years ago, when our choices of communication were limited, things were pretty straightforward. Low urgency? Send a letter. Medium urgency? A telegram might do. High urgency? Phone call.

Twenty years ago, things were similar. You might send a letter or email for a low-urgency message. Medium urgency could be a text, or a message through an IM platform such as MSN, AIM, or IRC. High urgency would be a phone call, ideally to someone’s mobile if they had one.

Now we’re in a world where a significant amount of communication goes through platforms like Facebook Messenger, WhatsApp, WeChat, Snapchat, and Instagram, where messages occupy an unusually large portion of the urgency spectrum. While there are other reasons these platforms may not be ideal, I think there’s a high productivity cost associated with the lack of urgency segmentation in messages sent through these platforms.

Cost of interruptions

There might be benefits to multi-tasking in some situations, but generally it’s considered bad for productivity. Taking this idea further, the concept of Deep Work – long periods of uninterrupted focus – has become popular in recent years, and associated with success in challenging fields.

The interruption cost is worth paying in the case of an urgent message. But because of the lack of segmentation, a large number of non-urgent messages also incur this cost. The average WhatsApp user receives around 50 messages per day. How many of those are urgent to the extent of needing to be seen within a few minutes? Almost certainly no more than a handful. Yet for most people those will all trigger a sound or vibration alert, distracting them from what they’re doing at the time.

Tools for dealing with this are fairly limited.

Two things I’d really like to see (please let me know if you know of a way to do this!):

  • Opt-in, rather than opt-out, to notifications: this should really be a standard feature on mobile operating systems. Currently whenever I install a new app I then have to go and disable notifications separately.
  • Prod: disable notifications by default for a messaging app, but give users the option to notify others of urgent messages on a per-message basis.


It might seem a bit misguided to talk about social networking apps in terms of productivity, since productivity might not be something you particularly care about. However, increased productivity will tend to mean you spend less time doing things you don’t like and more time doing things you enjoy, so even if you’re not interested in increasing your output, you’ll probably gain from investing in increased productivity.

Equally, interruptions detract from experiences where productivity isn’t involved. Watching a film, going for a walk, or just having dinner with someone are all generally better without unnecessary notifications.

In certain areas productivity can be a final goal – such as at work. Now that IM (Skype, Slack, Bloomberg…) is a staple in the office, it’s a huge attention drain. I think in the vast majority of cases it probably does more harm than good, especially in a culture that overly values quick responses. Note that here, email is only a good solution if it’s used properly. As always, Cal Newport has some suggestions.

Donations and tax


Over the past few years I’ve been increasing the amount I give to charity*.

As giving has taken up a larger proportion of my income, I’ve done some digging into the rules around donating to charity – specifically those on tax – to see how I might be able to give more with the amount I have.

While the obvious things are easy, finding and getting my head around the relevant tax rules took longer than I expected. In the hope of saving someone else (and future me!) some time, here’s my current understanding.

Note: I’m not an accountant, and I’m definitely not qualified to give tax advice. Almost everything here comes from the tax relief section on the GOV.UK website. Before making any decisions, check that what I’ve written is correct and applies to your situation!

Summary – key things to know about UK income tax and giving to charity:

  • UK charitable donations are fully tax-deductible.**
  • Some of the tax relief can go to your chosen charity automatically (i.e. ‘Gift Aid’).
  • You might be eligible for extra tax relief, which you can claim back by asking HMRC to reduce your tax bill.
  • Extra tax relief can be significant! Depending on your tax rate, tax relief can give you a 1.25x-2.5x donation multiplier.
  • There’s some flexibility around which tax year you account for donations in.  If you’re earning less this year than last year you might be able to significantly reduce your tax bill this way!

How UK Income Tax Works

Before we go into donations and claiming tax back, a brief summary of UK income tax.

UK income tax is progressive, i.e. increases with income.

UK income tax brackets 2018/2019 (source)

The chart below shows what total income tax looks like for various income levels, and how that breaks down into the various bands. I’ve included data up to £200k since beyond that it just goes up linearly, and if you’re in that band you should probably consider investing in proper tax advice!


Click here for an interactive version of this chart.

A few things you’ll notice about the chart:

  • If you earn less than £11.85k per year, you pay no tax.
  • As you go further to the right from there, you always pay more tax overall.
  • The top line generally gets steeper as you go right, but this isn’t true everywhere.
  • The steepest part of the chart is between £100k and £123.7k, where you gradually lose your personal allowance.
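The banded calculation behind the chart can be sketched in a few lines – a simplification that ignores National Insurance, Scottish rates, and other adjustments, and definitely not tax advice:

```python
def uk_income_tax_2018_19(income):
    """Approximate 2018/19 UK income tax (England; ignores National
    Insurance and other adjustments) - a sketch, not tax advice."""
    allowance = 11_850
    # The personal allowance tapers by £1 for every £2 earned over £100k,
    # which is what produces the steep £100k-£123.7k region.
    if income > 100_000:
        allowance = max(0, allowance - (income - 100_000) / 2)
    taxable = max(0, income - allowance)
    tax, lower = 0.0, 0
    # (upper bound of band in *taxable* income, rate)
    for upper, rate in [(34_500, 0.20), (150_000, 0.40), (float("inf"), 0.45)]:
        if taxable > lower:
            tax += (min(taxable, upper) - lower) * rate
        lower = upper
    return tax
```

For example, this gives roughly £28.4k of tax at £100k of income and £42.6k at £123.7k – i.e. £14.2k of extra tax on £23.7k of extra income, the 60% marginal rate discussed below.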

The steepness of that top line represents your marginal tax rate – i.e. how much tax you’ll pay on every extra £1 you earn at that level. This is a useful thing to look at, because it affects the ‘donation multiplier’ you’ll get at that level – i.e. how much your chosen charity will get for every £1 in net income you give up.
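Assuming Gift Aid plus any extra relief is claimed in full, the relationship between marginal rate and donation multiplier is just:

```python
def donation_multiplier(marginal_rate):
    """Pounds received by the charity per £1 of net income given up,
    assuming Gift Aid plus any extra tax relief is claimed in full."""
    return 1 / (1 - marginal_rate)

print(round(donation_multiplier(0.20), 2))  # 1.25  (basic rate)
print(round(donation_multiplier(0.40), 2))  # 1.67  (higher rate)
print(round(donation_multiplier(0.60), 2))  # 2.5   (£100k-£123.7k band)
```
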

Here’s another chart which shows that relationship more clearly:

UK Income Tax - Rate & Donation Multiplier
Click here for an interactive version of this chart.

While your average tax rate might be interesting to you, the marginal tax rate is generally more useful (unless you’re planning on donating 100% of your income!).

What does this second chart show? Pay attention to the yellow line, showing the donation multiplier for £1 at each level:

  • When you earn below £11.85k, your donation multiplier is x1. This makes sense, since there’s no tax to deduct. For every £1 you give to charity, you lose £1.
  • When you earn £11.85k-£46.35k, your donation multiplier is x1.25. If you’re in this bracket, you’re in luck – your tax deduction is fully taken care of by Gift Aid, so all you have to do is remember to tick that box when you donate and the charity gets an extra 25% directly from the government.
  • In the bracket £46.35k-£100k, and again from £123.7k-£150k, your donation multiplier is x1.67. At this point the 25% in Gift Aid doesn’t fully cover your tax deduction, so you get to claim back some extra tax from HMRC (see below for more on how to do this).
  • At £100k-£123.7k, not only are you in the top 0.1% of global earners, but you’ve hit the donation multiplier sweet spot of x2.5. You can more than double your*** money with every donation! You can give £2.5 to charity and lose only £1. This is because you’d be paying 40% in tax while your personal allowance would be reduced by 50p for every £1 increase in your salary, resulting in an effective 60% marginal tax rate. Again, you’ll get to claim back a lot of tax on any donations.
  • Beyond £150k – congrats! You’re comfortably in the top 0.1% of the global population, earning almost 150x the global average salary. Not only that, your donation multiplier is x1.8, so you only give up 55p for every £1 you give to charity. And giving to charity can raise your tax-free pension allowance.

How to claim tax back

So how does this claiming tax back thing work?

As I’ve covered above, if you’re a basic rate taxpayer (i.e. your total taxable income is up to £46.35k) then you don’t need to worry about claiming tax back – Gift Aid takes care of it.

Beyond that there are three options I’m aware of: Payroll Giving, doing a tax return, or asking HMRC to change your tax code.

Payroll Giving is great, but your employer needs to be set up for it. If they are, then all you need to do is tell your employer your intended monthly donation. They’ll take it straight out of your gross salary and give it to your charity of choice, without any tax being deducted.

If you fill in a Self Assessment tax return, there’s a section on charitable donations. Doing one isn’t exactly fun, but it’s not as difficult as it sounds (and I’ve heard it’s much easier than the US system!). All your employer’s data will be imported already, so you only need to fill in additional details on your donations and any other relevant sections. If you’re doing regular donations then the next option is probably better for you, but if you want to be able to do things like optimising the tax year of your donations then you’ll need to fill in a Self Assessment tax return. And if you earn over £100k you’ll have to do one anyway.

Until fairly recently, I thought those were the only two options. It turns out there’s a third one! If you give regularly and don’t fancy filling in a tax return, you can just ask HMRC to change your tax code. All you need to do is tell them how much you’re donating every month, and they’ll change your tax code to increase your personal allowance – thereby reducing the amount of tax you’ll pay. I think you can probably do this over the phone, but I found their online chat function easy enough. (obviously always make sure you keep a record of all your donations)

When to claim tax back

This might sound niche at first, but it can be very useful and is not so well known.

When you fill in a Self Assessment tax return, you do that for the previous tax year (April-April). And you have until January 31 in the following year to do this.

Now it turns out that you’re allowed to account for donations made in the current year as if they happened last year. Specifically: “you can also claim tax relief on donations you make in the current tax year (up to the date you send your return) if you either: want tax relief sooner, or won’t pay higher rate tax in current year, but you did in the previous year”.

What does this mean? Well, consider an extreme case where last tax year you earned £123k and this year you think you’ll earn £10k. Without this rule you’d get no tax relief on donations made now, but with it you can still get the 2.5x multiplier on your donation by submitting it in your tax return for last year!
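In code, the arithmetic for that extreme case looks like this (illustrative figures – check your own numbers):

```python
# Carry-back example: you earned £123k last year (in the 60% effective
# marginal band) and expect ~£10k this year (below the tax threshold).
donation_net = 1_000                         # what you hand over now
gift_aid = donation_net // 4                 # charity reclaims 25% on top
charity_receives = donation_net + gift_aid   # £1,250 gross donation

# Counted this year, there's no extra relief to claim (you pay no tax).
# Carried back to last year, you reclaim the gap between your 60% marginal
# rate and the 20% basic rate already covered by Gift Aid: 40% of gross.
reclaim = charity_receives * 40 // 100       # £500 back from HMRC
net_cost = donation_net - reclaim            # £500

print(charity_receives / net_cost)           # 2.5
```

So carrying the donation back preserves the 2.5x multiplier that would otherwise be lost.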

Another scenario where this is useful is if it’s coming up to the end of the tax year and you haven’t decided where to donate to yet. As long as you make the donation before you submit your tax return, you’ll be able to count it as taking place this year for tax reasons.

Other things to consider

That’s probably enough on tax for one post. Here are a few other things to consider:

  • Which charities will maximise the impact of your donations?
  • Should you give now or give later?
  • How much should you give right now? Hopefully this post helps you think about the tax aspects of that. You can find the spreadsheet behind each of the charts here (and 2017/2018 version here), including a calculator sheet for any given income/donation amount. Many people have signed the Giving What We Can pledge to donate 10% or more of their income for the rest of their lives.
  • Could you donate appreciated assets (like property, shares, or bitcoin) instead of income? In this case you can get Gift Aid on both income tax and capital gains tax.
  • Some employers will match your donations, doubling your donations again with no extra effort involved.
  • If you don’t live in the UK, you’ll obviously have to follow different rules. Ben Kuhn has a great post on giving in the US.
  • If you’re earning a lot but aren’t sure where to give yet, consider setting up a donor advised fund.
  • Is earning to give (i.e. maximising your income and resulting donations) the most promising career path for you? Are there other things you should be considering if you want to maximise your impact on the world?

*See this post or Effective Altruism

** In this post I’ve focused on income tax. I haven’t taken into account National Insurance payments in any of the calculations, as these aren’t deductible. I also haven’t modelled the impact on other things like student loan repayments or pension allowance increases. As for income tax, there are some limits to the amount you can claim back, but they’re quite high – “Your donations will qualify as long as they’re not more than 4 times what you have paid in tax in that tax year”. 

*** Obviously the recipient charity’s money rather than yours. Still, pretty cool!