How to install Ubuntu 20.04 on a Dell XPS 8930

Last week I bought a refurbished Dell XPS 8930 desktop. It’s got an Nvidia RTX 2060 graphics card (arguably the most cost-effective GPU for deep learning right now) and a 9th gen i7 CPU, and I was excited to get stuck in.

The downside is that it comes with Windows and isn’t officially certified for Ubuntu… and this turned out to be a much bigger issue than expected! After a few days of trial-and-error and trawling support threads I did manage to get it working though! I’m putting these instructions up here for anyone who finds themselves in the same situation I was in last week, hope someone ends up finding it useful 🙂

I’ll be assuming you want to dual-boot Ubuntu and Windows. If you just want Ubuntu, you can skip some of these steps.

Prepare/back up data

Before doing anything, back up any data on your system that you wouldn’t want to lose. I skipped this since it was a new PC and I didn’t have any data on it yet, but I did create a Windows Recovery Drive in case that would turn out to be useful later.

You’ll also want to make a note of the current partition structure on the disk Windows is running on, as that might be useful for troubleshooting later on.

Once that’s done, create a bootable USB stick with Ubuntu on it using this guide.

Follow the Ubuntu guide for booting and installing from this USB stick. If you’re lucky, maybe this just works for you!

Disable TPM and Secure Boot

You might get one of these two errors before Ubuntu manages to boot: [Firmware Bug]: Failed to parse event in TPM Final Events Log, or Couldn't get size: 0x800000000000000e. If so, you need to turn off TPM and Secure Boot in the BIOS*:

  1. Reboot
  2. Hold F12 while booting to enter the boot menu
  3. Disable Firmware TPM and TPM under the security tab, and Secure Boot under the boot tab (see images below)

* I don’t know what TPM or Secure Boot do or what the implications of turning it off are, other than that I can now install Ubuntu. You might want to do some of your own research here.

Change Storage Controller to AHCI

Continue with the guide. You should now be able to boot into Ubuntu, but it might not let you install it. It’s likely that at this point, the installer pops up an error message and directs you to a help thread. If you read through that thread, you’ll see a long set of instructions along with several Dell users saying that they didn’t work. Instead of following these instructions, do the following (this worked for me and several other Dell users who commented on that page):

  1. Run bcdedit /set {current} safeboot minimal in Windows as administrator
  2. Reboot into BIOS setup and change storage controller to AHCI (i.e., change ‘SATA Operation’ from ‘RAID on’ to ‘AHCI’ – see image below)
  3. Continue to boot in Windows Safe Mode and run bcdedit /deletevalue {current} safeboot as administrator
  4. Reboot back into Windows.

This changes your storage controller from RST to AHCI, and allows Ubuntu to understand the disk layouts. Before this change, your device manager shows these storage controllers:

And after the change, you should see this:

Make space for Ubuntu

My XPS 8930 came with an SSD and a normal hard disk. You’ll want Ubuntu running on the SSD, as that’s much faster. In order to do this you need to shrink the Windows partition in Disk Management:

I allocated 300 GB for my Ubuntu partition, and ended up with this:

Finish installing Ubuntu

Now that this is all done, you can boot off the flash disk again and install Ubuntu on the new partition you created. Make sure you specify that partition as ‘Ext4 journaling file system’ and set the mount point as ‘/’.

…and we’re done! This was a lot more involved than expected, but past this point Ubuntu 20.04 has been great – the Nvidia drivers were already there, so installing CUDA was just a one-liner.

Creating a Simple Website for Free

It’s really easy now to create a simple website from scratch!

A few months ago I was looking for a directory of ongoing machine learning competitions (like the kind you get on Kaggle), but I couldn’t find one.

So I decided to build a simple web page that just listed them. I was lucky that the domain was available and my hosting provider LCN were having a sale on domains, so I got that domain for free for the first year. (their customer service is great and they’re currently doing free domains!)

Since I didn’t need anything more fancy than a static page, I went with GitHub Pages for hosting, which is free and fast. The initial version of the page looked like this:

I was tempted to go with a database back-end, but ended up just keeping the list of competitions in a JSON file. This meant anyone could propose changes through pull requests on the project’s GitHub repository, and I was surprised that within a few weeks several people had added competitions I didn’t yet know about!
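As a rough illustration of the JSON approach, here is a minimal Python sketch. The field names (`name`, `platform`, `deadline`) and the entries are hypothetical, not the site’s actual schema; the point is only that a flat file is enough to list and filter competitions.

```python
import json
from datetime import date

# Hypothetical contents of a competitions JSON file; the real file's
# fields and entries are not shown in this post.
COMPETITIONS_JSON = """
[
  {"name": "Example Challenge A", "platform": "ExamplePlatform", "deadline": "2030-01-31"},
  {"name": "Example Challenge B", "platform": "ExamplePlatform", "deadline": "2000-06-30"},
  {"name": "Example Challenge C", "platform": "OtherPlatform", "deadline": "2029-12-01"}
]
"""


def ongoing_competitions(raw_json, today):
    """Return competitions whose deadline has not passed, soonest first."""
    competitions = json.loads(raw_json)
    still_open = [
        c for c in competitions
        if date.fromisoformat(c["deadline"]) >= today
    ]
    # ISO dates sort correctly as plain strings.
    return sorted(still_open, key=lambda c: c["deadline"])


if __name__ == "__main__":
    for comp in ongoing_competitions(COMPETITIONS_JSON, date(2025, 1, 1)):
        print(f'{comp["deadline"]}  {comp["name"]} ({comp["platform"]})')
```

Because the data is just text, a pull request that adds a competition is a small diff to one file.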

After I got a few visitors I realised a big grid really didn’t work on mobile, so I spent a bit of time trying to improve that. I hadn’t done any real web development in almost a decade, but with a little help (thanks Natasha!) and a bit of trial and error I got it to be much more usable on mobile.

From the start I’d had a form on the page so visitors could join the mailing list. It’s really easy (and free) to set this up through mailchimp, and I feel like it’s worth doing even if you’re not sure you’ll send many emails. So far around 500 people have joined the mailing list, and I’ve sent a handful of updates.

I also added Google Analytics early on, and it’s been nice to see the traffic growth. There was a temporary spike at launch when I posted the site on Reddit and Hacker News, followed by a brief plateau, after which the traffic increase has been slow but steady.

Most of the traffic comes through search, and I’ve found Google Search Console super valuable in trying to see which phrases people are searching for when they end up on my site, and for which phrases I could be ranking higher. It also pointed out a few site usability issues which could have been hurting my ranking.

I used the free Moz Pro trial to figure out which sites I should be trying to get links from in order to rank higher for relevant keywords. This led me to a few Medium posts, and after contacting the authors through LinkedIn they were usually more than happy to include a link to my site in their articles.

I’m hoping to see continued growth to the point where I can monetise the site a bit, but so far it’s been a good learning experience and it hasn’t cost me anything but time.

There’s one link on there advertising Genesis Cloud – a cloud provider I’ve been using for training some machine learning models (their GPUs are very cheap!). I contacted them as a customer looking to promote their service, and they gave me a link I could put on my site. If they get any new paying customers through my link I get some credits to use on their cloud compute service.

I hope my experience is helpful to others trying to set up a simple site.

Here’s a recap of the tools I mentioned:

And here’s my site, listing ongoing machine learning/data science competitions: Sign up to the mailing list for occasional (~once/month) updates on new competitions. All the code for the site is here.

For a more detailed guide on how to set up your own blog with GitHub Pages, I’d recommend this article.

Douglas Hofstadter on Love and Death

This weekend I read a short fragment from Douglas Hofstadter’s book I Am a Strange Loop at my friends James and Lucy’s wedding.

James is the person who initially introduced me to Douglas Hofstadter’s earlier book – Gödel, Escher, Bach: an Eternal Golden Braid – first published 40 years ago this year. GEB quickly became my favourite book; an irresistibly playful tapestry of mathematics, art, music, programming, artificial intelligence, language, logic, and philosophy.

I Am a Strange Loop, written almost 30 years after GEB, was a sort of reinterpretation of that book. Hofstadter felt that the technical and subtle approach led many readers to miss the point of GEB, so he decided to write a less technical, more accessible, and more explicit treatment of the nature of self and machine consciousness.

Strange Loop is a deeply personal and emotional book, covering the death of Hofstadter’s wife Carol, and exploring Hofstadter’s moral basis for his vegetarian diet, among many other things. It speaks of love and death in intimate detail, in Hofstadter’s signature style. I’d highly recommend reading this book if you’re not attracted by the idea of the maths-dense GEB.

For the reading at James and Lucy’s wedding I edited a few paragraphs for brevity and context (and to remove references to death!). These are those paragraphs – I think they give a good introduction to Hofstadter’s views on human relationships:

What is really going on when you dream or think more than fleetingly about someone you love? In the terminology of Strange Loops, there is no ambiguity about what is going on.

The symbol for that person has been activated inside your skull, lurched out of dormancy, as surely as if it had an icon that someone had double-clicked. And the moment this happens, much as with a game that has opened up on your screen, your mind starts acting differently from how it acts in a “normal” context. You have allowed yourself to be invaded by an “alien universal being”, and to some extent the alien takes charge inside your skull, starts pushing things around in its own fashion, making words, ideas, memories, and associations bubble up inside your brain that ordinarily would not do so. The activation of the symbol for the loved person swivels into action whole sets of coordinated tendencies that represent that person’s cherished style, their idiosyncratic way of being embedded in the world and looking out at it.

As a consequence, during this visitation of your cranium, you will surprise yourself by coming out with different jokes from those you would normally make, seeing things in a different emotional light, making different value judgments, and so forth. Each one of us has a brain inhabited to varying extents by other I’s, other souls, the extent of each one depending on the degree to which you faithfully represent, and resonate with, the individual in question. But one can’t just slip into any old soul, no more than one can slip into any old piece of clothing; some souls and some suits simply “fit” better than others do.

Douglas Hofstadter, I Am a Strange Loop (edited)

In discussing an appropriate portion of the book, James and I also agreed to read the following at each other’s funeral, if either of us were ever to die:

In the wake of a human being’s death, what survives is a set of afterglows, some brighter and some dimmer, in the collective brains of all those who were dearest to them. And when those people in turn pass on, the afterglow becomes extremely faint. And when that outer layer in turn passes into oblivion, then the afterglow is feebler still, and after a while there is nothing left.

This slow process of extinction I’ve just described, though gloomy, is a little less gloomy than the standard view. Because bodily death is so clear, so sharp, and so dramatic, and because we tend to cling to the caged-bird view, death strikes us as instantaneous and absolute, as sharp as a guillotine blade. Our instinct is to believe that the light has all at once gone out altogether. I suggest that this is not the case for human souls, because the essence of a human being — truly unlike the essence of a mosquito or a snake or a bird or a pig — is distributed over many a brain. It takes a couple of generations for a soul to subside, for the flickering to cease, for all the embers to burn out. Although “ashes to ashes, dust to dust” may in the end be true, the transition it describes is not so sharp as we tend to think.

It seems to me, therefore, that the instinctive although seldom articulated purpose of holding a funeral or memorial service is to reunite the people most intimate with the deceased, and to collectively rekindle in them all, for one last time, the special living flame that represents the essence of that beloved person, profiting directly or indirectly from the presence of one another, feeling the shared presence of that person in the brains that remain, and thus solidifying to the maximal extent possible those secondary personal gemmae that remain aflicker in all these different brains. Though the primary brain has been eclipsed, there is, in those who remain and who are gathered to remember and reactivate the spirit of the departed, a collective corona that still glows. This is what human love means. The word “love” cannot, thus, be separated from the word “I”; the more deeply rooted the symbol for someone inside you, the greater the love, the brighter the light that remains behind.

Douglas Hofstadter, I Am a Strange Loop

Reproducibility issues using OpenAI Gym

Reproducibility is hard.

Last week I wrote a simple Reinforcement Learning agent, and I ran into some reproducibility problems while testing it on CartPole. This should be one of the simplest tests of an RL agent, and even here I found it took me a while to get repeatable results.

I was trying to follow Andrej Karpathy and Matthew Rahtz’s recommendations to focus on reproducibility and set up random seeds early, but this was taking me much longer than expected – despite adding seeds everywhere I thought necessary, sometimes my agent would learn a perfect policy in a few hundred episodes, whereas other times it didn’t find a useful policy even after a thousand episodes.

I checked the obvious – setting seeds for PyTorch, NumPy, and the OpenAI gym environment I was using. I even added a seed for Python’s random module, even though I was pretty sure I didn’t use that anywhere.
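A minimal sketch of what this kind of seeding can look like. `seed_everything` is a hypothetical helper name, NumPy and PyTorch are only seeded if they happen to be installed, and the environment itself would still need seeding separately (for example through `env.seed(seed)` in Gym versions that provide it).

```python
import random


def seed_everything(seed):
    """Seed the random number generators an RL script typically touches.

    Hypothetical helper: NumPy and PyTorch are optional so the sketch
    still runs when they are not installed.
    """
    random.seed(seed)  # Python's built-in random module

    try:
        import numpy as np
        np.random.seed(seed)  # NumPy's global generator
    except ImportError:
        pass

    try:
        import torch
        torch.manual_seed(seed)  # PyTorch's default generator
    except ImportError:
        pass


if __name__ == "__main__":
    seed_everything(42)
    first = random.random()
    seed_everything(42)
    second = random.random()
    print(first == second)  # same seed, same draw
```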


Still I got different results on each run. I found a few resources pointing me to other things to check:

  • Consistency in data preparation and processing (not really relevant here – all the data I’m processing comes from the gym environment)
  • CuDNN specific seeding in PyTorch (my network is small enough to run quickly on CPU, so I’m not using CuDNN)

Out of ideas, I returned to debugging. My initial policy and target network weights were the same each run. Good. The first environment observation was the same too. Also good. But then, when I came to selecting a random action, I noticed env.action_space.sample() sometimes gave different results. Bad.

I looked through the OpenAI gym code for random seeds, and couldn’t find any seeding being done on the action space, even when the environment is passed a specific seed! I then found this commit, where Greg Brockman and others discuss how seeding should be done in OpenAI Gym environments. It looks like they initially wanted to seed action spaces as well as environments, but decided not to because they see action space sampling as belonging to the agent rather than the environment.

So here’s the solution, in one extra line:
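The fix amounts to seeding the action space separately from the environment. In recent Gym releases that call is `env.action_space.seed(seed)`; treat that as an assumption about your version, since older releases handled space seeding differently. The toy classes below are not Gym code. They are a stdlib-only sketch, with hypothetical names, of why seeding the environment alone is not enough when the action space keeps its own random generator.

```python
import random


class ToyActionSpace:
    """Stand-in for a discrete action space with its own generator."""

    def __init__(self, n):
        self.n = n
        self._rng = random.Random()  # seeded from OS entropy by default

    def seed(self, seed):
        self._rng.seed(seed)

    def sample(self):
        return self._rng.randrange(self.n)


class ToyEnv:
    """Stand-in environment: seeding it does not touch the action space."""

    def __init__(self):
        self.action_space = ToyActionSpace(2)
        self._rng = random.Random()

    def seed(self, seed):
        self._rng.seed(seed)  # only the environment's own generator


def sample_actions(seed, seed_action_space, count=20):
    """Draw `count` random actions from a freshly seeded toy environment."""
    env = ToyEnv()
    env.seed(seed)
    if seed_action_space:
        env.action_space.seed(seed)  # the one extra line
    return [env.action_space.sample() for _ in range(count)]


if __name__ == "__main__":
    # With the extra line, two runs draw identical action sequences.
    print(sample_actions(0, True) == sample_actions(0, True))
    # Without it, the sequences usually differ between runs.
    print(sample_actions(0, False) == sample_actions(0, False))
```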


I’d love to know why this isn’t called from env.seed()!

Anyway, now I’m getting reproducible results. To get an idea of how significant the difference is between different seeds even on a problem as simple as CartPole, here are five runs with different seeds:

Some resources with other useful reproducibility suggestions:

Pommerman: post mortem

Last month I spent a few days, together with some friends, trying to get an entry together for the Pommerman competition at this year’s NeurIPS.

While we learnt a huge amount, we didn’t manage to get an entry together in time for the conference.

All of us were pretty new to reinforcement learning, so maybe it’s not hugely surprising that we didn’t succeed. Still, I think if we’d done things differently we may have got there in time.

Some things we managed to achieve:

  • Get the game running, and set up some basic reinforcement learning agents (specifically, DQN and PPO) that could play the game.
  • Set up a training environment on a cloud server, to which we could deploy any number of training configs and have them run 16 at a time.
  • Set up Tensorboard logging for rewards, wins, and various intermediate metrics (% frequency for each action, # of survivors in each game, etc)
  • Train hundreds of PPO and DQN agents with different hyperparameters and network architectures.
  • Set up a validation environment that outputs performance stats for trained agents acting deterministically.
  • Experiment with experience replay, different types of exploration, CNNs, dropout, and simplified features.
  • Create different reward models intended to guide the agent to various strategies.

Despite all this, we didn’t manage to train an agent which figured out how to bomb through walls and find opponents. Our most successful agents would (mostly) avoid bombs near them, but otherwise be static.

What mistakes did we make?

  • We underestimated the difficulty of the problem. We figured we could just set some stock algorithms running on the environment and they’d figure out a basic strategy which we could then iterate on, but this wasn’t the case.
  • We committed fairly early on to a library (TensorForce) that we hadn’t used before without checking how good it was, or how easy it would be to change things. So when we realised, more than halfway into the project, that we really needed to get our agents to explore more, it was really hard for us to try to debug exploration and implement new techniques.
  • We spent a lot of time setting up a cloud GPU environment, which we ended up not needing! The networks we were training were so small that it was faster to just run parallel CPU threads.
  • We didn’t try to reduce complexity or stochasticity early enough, so we didn’t really know why our agents weren’t learning.
  • We (I) introduced a few very frustrating bugs! The highlight was a bug where I featurised the board for our agent, and accidentally changed the board array that all agents (and the display engine) shared. This bug manifested itself as our agent’s icon suddenly changing, and took me hours to debug.
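The shared-board bug in the last point is easy to reproduce. Here is a minimal sketch, assuming the board is a nested list and using hypothetical `featurise_*` functions and cell values (the project’s real board representation and feature code are not shown here). Copying before modifying keeps the shared state intact.

```python
import copy

AGENT_ID = 10    # hypothetical value marking our agent on the board
SELF_MARKER = 1  # hypothetical value used in the agent's own features


def featurise_buggy(board):
    """Modifies the caller's board in place (the kind of bug described)."""
    for row in board:
        for i, cell in enumerate(row):
            if cell == AGENT_ID:
                row[i] = SELF_MARKER
    return board


def featurise_safe(board):
    """Works on a deep copy so the shared board is left untouched."""
    features = copy.deepcopy(board)
    for row in features:
        for i, cell in enumerate(row):
            if cell == AGENT_ID:
                row[i] = SELF_MARKER
    return features


if __name__ == "__main__":
    shared = [[0, 10], [0, 0]]
    featurise_safe(shared)
    print(shared)  # [[0, 10], [0, 0]] : unchanged
    featurise_buggy(shared)
    print(shared)  # [[0, 1], [0, 0]] : the shared board was altered
```

If the board is a NumPy array instead, `board.copy()` would play the same role as `copy.deepcopy` here.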

Knowing what we know now, how would we have approached this problem?

  • Simplify the environment – start with a smaller version of the problem (e.g. 4×4 static grid, one other agent) with deterministic rules. If we can’t learn this then there’s probably no point continuing!
  • Simplify the agent to the extent where we fully understand everything that’s happening – for example, write a basic DQN agent from scratch. This would’ve made it easier to add different exploration strategies.
  • Gradually increase complexity, by increasing the grid size or stochasticity.
  • Add unit tests!
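As one example of an exploration strategy that is simple to add to a hand-written agent, here is a sketch of epsilon-greedy action selection with a linearly decaying epsilon. This is a standard technique rather than something we actually ran; the function names and schedule values are illustrative assumptions.

```python
import random


def epsilon_at(step, start=1.0, end=0.05, decay_steps=10_000):
    """Linearly anneal epsilon from `start` to `end` over `decay_steps`."""
    fraction = min(step / decay_steps, 1.0)
    return start + fraction * (end - start)


def select_action(q_values, step, rng=random):
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon_at(step):
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


if __name__ == "__main__":
    q = [0.1, 0.7, 0.2]
    # Epsilon shrinks as training progresses, so exploration tapers off.
    print(epsilon_at(0), epsilon_at(5_000), epsilon_at(20_000))
    print(select_action(q, step=20_000))
```

Keeping the schedule in its own function makes it straightforward to unit test and to swap for a different strategy later.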

Despite our lack of success, we all learnt a lot and we’ll hopefully be back for another competition!

NeurIPS Day 3: Reproducibility, Robustness, and Robot Racism

Some brief notes from day 3 of NeurIPS 2018. Previous notes here: Expo, Tutorials, Day 2.

Reproducible, Reusable, and Robust Reinforcement Learning (Professor Joelle Pineau)

I was sad to miss this talk, but lots of people told me it was great so I’ve bookmarked it to watch later.

Investigations into the Human-AI Trust Phenomenon (Professor Ayanna Howard)

A few interesting points in this talk:

  • Children are unpredictable, and working with data generated from experiments with children can be hard. (for example, children will often try to win at games in unexpected ways)
  • Automated vision isn’t perfect, but in many cases it’s better than the existing baseline (having a human record certain measurements) and can be very useful.
  • Having robots show sadness or disappointment turns out to be much more effective than anger for changing a child’s behaviour.
  • Humans seem to inherently trust robots!

Two cool experiments:

To what extent would people trust a robot in a high-stakes emergency situation? 

A research subject is led into a room by a robot, for a yet-to-be-defined experiment. The room fills with smoke, and the subject goes to leave the building. On the way out, the same robot is indicating a direction. Will the subject follow the robot?

It turns out that, yes, almost all of the time, the subject follows the robot.

What about if the robot, when initially leading the subject to the room, makes a mistake and has to be corrected by another human?

Again, surprisingly, the subject still follows the robot in the emergency situation that follows.

The point at which this stopped being true was when they had the robot point to obviously wrong directions (e.g. asking the subject to climb over some furniture).

This research has some interesting conclusions, but I’m not completely convinced. For one, based on the videos of the ‘emergency situation’, it seems unlikely that any of the subjects would have believed the emergency situation to be genuine. The smoke is extremely localised, and the ‘exit’ just leads into another room. It seems far more likely to me that the subjects were trying to infer the researchers’ intentions for the study and decided to follow the robot since that was probably what they were meant to do.

Unfortunately this turns out to be because their ethics committee wouldn’t give them approval to run a more realistic version of the study in a separate building, which is a real shame. More on this study in the paper, or in this longer write-up.

Do racism and sexism apply to people’s perception of robots?

Say we program a robot to perform a certain sequence of behaviours, and then ask someone to interpret the robot’s intention behind that behaviour. Will their interpretation be affected by a robot’s ‘race’ or ‘gender’?

It turns out that, yes, it is. For example, when primed to be aware of the ‘race’ of a robot, subjects are more likely to interpret the behaviour as angry when the robot is black than when the robot is white.


But when humans are not primed to pay attention to race (and just shown robots with different colours), the effect disappears. Paper here.

Full video of the talk below.

NeurIPS Day 2: Cronenbergs

Just a brief highlight from day two: Professor Michael Levin’s incredible talk on What Bodies Think About, summarising 15 years of research in exploring the hardware/software distinction in biology. Hello, Cronenberg World.

A brief introduction on how the brain is far from the only place where computation happens in biology. Experiments with regenerative flatworms show that memories persist even when their heads are removed and grow back. Butterflies can remember experiences they had when they were caterpillars.


Then the key bit of the talk: reverse engineering bioelectric signals to trigger high-level anatomical subroutines – aka “reprogramming organs”. For example, telling a normal frog to “regrow a leg”, or convincing a flatworm to grow a different shape head that belongs to a much older species of flatworm. No genomic editing or stem cells – but controlling bioelectric signals by changing ion channels directly or through drugs.


The ultimate goal: a biological compiler. This could lead to amazing new regenerative medicine, though there are clearly a lot of ethical issues that need to be thought through!


Do watch the full talk when you get a chance. It’s the only talk I ever remember attending where my jaw literally dropped, several times.

Previous NeurIPS 2018 posts: Expo and Tutorials. Next: Day 3.

NeurIPS Day 1: Tutorials

Monday was the first day of the core conference, following the Expo on Sunday. There were a number of really interesting tutorials. Here’s a brief summary of the three I managed to attend.

Scalable Bayesian Inference (Professor David Dunson)

This tutorial explored one main question: how do we go beyond point estimates to robust uncertainty quantification when we have lots of samples (i.e. ‘n’ is large) or lots of dimensions (i.e. ‘p’ is large)?


The introduction was Professor Dunson’s personal ode to Markov Chain Monte Carlo (MCMC) methods, and in particular the Metropolis-Hastings algorithm. In his words, “Metropolis-Hastings is one of the most beautiful algorithms ever devised”.
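For readers who have not seen it, here is a minimal random-walk Metropolis sketch (the symmetric-proposal special case of Metropolis-Hastings) sampling from a standard normal. It is a textbook illustration, not something shown in the tutorial; the target, step size, and function names are assumptions chosen for brevity.

```python
import math
import random


def log_target(x):
    """Log-density of a standard normal, up to an additive constant."""
    return -0.5 * x * x


def metropolis(n_samples, step=1.0, x0=0.0, seed=0):
    """Random-walk Metropolis: propose x' = x + noise and accept with
    probability min(1, p(x') / p(x)); otherwise keep the current x."""
    rng = random.Random(seed)
    x = x0
    samples = []
    for _ in range(n_samples):
        proposal = x + rng.gauss(0.0, step)
        log_ratio = log_target(proposal) - log_target(x)
        # 1 - random() lies in (0, 1], so the logarithm is always defined.
        if math.log(1.0 - rng.random()) < log_ratio:
            x = proposal
        samples.append(x)
    return samples


if __name__ == "__main__":
    draws = metropolis(20_000)
    mean = sum(draws) / len(draws)
    var = sum((d - mean) ** 2 for d in draws) / len(draws)
    print(f"mean={mean:.2f} var={var:.2f}")  # should be near 0 and 1
```

Each step only needs the ratio of target densities, which is why the normalising constant never has to be computed.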

He tackled some of the reasons for the (in his view incorrect) belief that MCMC doesn’t scale, and showed how MCMC methods can now be used to perform Bayesian inference even on very large data sets. Some key approaches involve clever parallelisation (WASP/PIE) and approximating transition kernels (aMCMC). Interestingly, some of these techniques have the combined advantage of improving computational efficiency and mixing (an analogue of exploration in Bayesian inference).

A recurring theme throughout the talk was Professor Dunson’s call for more people to work in the field of uncertainty quantification: “There are open problems and a huge potential for new research”.

His recent work on coarsening in Bayesian inference – a specific way of regularising the posterior – allows inference to be more robust to noise and helps manage the bias/variance trade-off when optimising for interpretability (i.e. if a relatively simple model is only slightly worse than a very complex model, we probably want to go with the simple model). This is useful for example in medicine, where doctors want to be able to understand and critique predictions rather than using black-box point estimates.

The second part of the talk went on to explore high-dimensional data sets, particularly those with small sample size: “you’ve given us a billion predictors and you’ve run this study on ten mice.”

Naive approaches in this area can have serious issues with multiple hypothesis testing or requiring an unjustifiably strong prior to get a reasonable uncertainty estimate. Point estimates can be even worse – or, in Professor Dunson’s words, “scientifically dangerous”. Accurate uncertainty quantification can allow us to say “We don’t have enough information to answer these questions.”


The hope is that over time, we can extend these methods to help scientists by saying “No, we can’t answer <this specific question that you’ve asked>. But given the data we have, <here’s something else we can do>.”


Unsupervised Deep Learning (Alex Graves from DeepMind, Marc’Aurelio Ranzato from Facebook)

This talk started with a reclassification of ML techniques: rather than thinking of three categories (supervised learning/unsupervised learning/reinforcement learning), it can be more useful to think of four categories across two dimensions.

This talk focused on the two quadrants on the right.

The key idea I took from this talk was that we can apply unsupervised learning to problems we’ve previously thought of as supervised learning, if we’re smart about how we do it.

For example, the classic approach to machine translation is to take a large set of sentence pairs across two languages, and then train a neural net to learn the mapping between the two. While this can work well, it relies on a lot of labelled data, which isn’t always available. Since there’s far more single-language data available, another approach would be to get a model to learn the structure of the data, for example by embedding words or phrases in some space which can capture relationships between them.


Since languages correspond to things in the real world, if we can learn an accurate enough mapping for two separate languages we can then find a way to go between languages by exploiting the shared word embedding space. Doing this for phrases or sentences is harder, but can already improve on supervised learning in certain special cases – for example in English-Urdu translation.

The available English-Urdu labelled data is restricted to very specific domains (genomics data/subtitles), and doesn’t allow supervised models to generalise well to other domains. In this case unsupervised models trained on monolingual examples can do a better job (albeit still not necessarily good enough to be useful in practice).
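To make the shared-embedding idea concrete, here is a toy sketch: if words from two languages live in one vector space, word translation becomes a nearest-neighbour lookup. The 2-D vectors and tiny English/French vocabularies are invented for illustration, and the hard part (learning and aligning the embeddings from monolingual data) is assumed to have been done already.

```python
import math

# Invented 2-D embeddings in a space assumed to be shared by both languages.
ENGLISH = {"cat": (0.9, 0.1), "dog": (0.1, 0.9), "house": (0.7, 0.7)}
FRENCH = {"chat": (0.88, 0.12), "chien": (0.15, 0.85), "maison": (0.72, 0.68)}


def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def translate(word, source, target):
    """Return the target-language word closest to `word` in the shared space."""
    vector = source[word]
    return max(target, key=lambda candidate: cosine(vector, target[candidate]))


if __name__ == "__main__":
    for word in ENGLISH:
        print(word, "->", translate(word, ENGLISH, FRENCH))
```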



Also interesting were the timelines of popularity of unsupervised feature learning in vision and natural language processing, showing how this type of approach goes in and out of fashion over time.



Both speakers here were quite optimistic about how much content they’d get through in an hour, so they didn’t quite manage to cover everything. I’d highly recommend checking out the slides though, since there are lots of references to interesting papers: part 1 and part 2.

Counterfactual Inference (Professor Susan Athey)

Like a few others I spoke to, I had high hopes for this talk but was a little disappointed. A lot of time was spent covering basic stats concepts with text-heavy slides, and even though Professor Athey’s quite an engaging speaker I didn’t feel like I learnt very much or even gained a good intuition for the types of problems her framework of counterfactual inference can solve.

It felt like much of the work was a slight reframing of supervised learning to account for unobserved counterfactuals where the probability of observation (propensity score) is correlated with some of the underlying covariates.

Having said that, it was nice to get a different perspective from someone who’s working in economics where the standards for publication and expectations of interpretability can be very different. Some of the notation was also interesting and new to me, and might be useful to anyone wanting to do a better job of considering counterfactuals in their work. It was noticeable that she compared her work to Deep Learning/AlphaGo quite a few times, even though it felt like her tools for counterfactual inference operate in quite a different problem domain. In the vein of Marc’Aurelio Ranzato’s popularity charts from above, I wonder if there’s an expectation that people would find the work more interesting if framed in terms of today’s Deep Learning, which would be a shame.


Some of the themes from the Scalable Bayesian Inference workshop came up again, such as the idea that modern ML techniques haven’t been used much in economics since it’s hard to get things like confidence intervals. Towards the end Professor Athey presented some recent contributions to the field, such as Generalised Random Forests.

For anyone interested in learning more about the intersection of machine learning and econometrics, AEA has a longer lecture series featuring Professor Athey which goes into much more depth on the topic.

Next: Day 2.

NeurIPS Day 0: Expo

Today was NeurIPS Expo, the zeroth day of this year’s Neural Information Processing Systems conference in Montréal. The Expo is a day with content from industry right before the rest of the conference. Below are some highlights from a few of the sessions I managed to attend.

The Montréal Declaration

An initiative of the University of Montréal, the Declaration “aims to spark public debate and encourage a progressive and inclusive orientation to the development of AI”.

Unlike the Asilomar AI Principles, which were set by experts in the field, the principles of the Montréal Declaration are being set by consultation with the public and take a local (Québec-centric) view rather than trying to solve global issues.

Notably, the Declaration will remain open for revision to account for the fact that societal norms and our understanding of AI will adapt. The next draft of the Declaration, with updated principles, will be published on the 4th of December. [now published: English/French]

Despite an attempt to take opinions from a broad cross-section of the population, there was a significant skew towards highly educated people in the ~500 participant group, as well as towards those working in tech, and towards men.

While the content was interesting, the talk was a little unfocused – very little time was spent on context/setup (whose initiative is this? why?) and a lot of time on niche issues/tangents (what preferences might people subscribing to various moral frameworks express about certain trolley problem scenarios?).

One of the speakers suggested that rather than spending time considering moral dilemmas, more time should be spent planning societal/structural changes that would remove or reduce the need for machines to face those dilemmas.

More concretely, rather than optimising for thousands of different trolley problems, we could figure out ways to arrange our roads so that autonomous vehicles are significantly less likely to come across any pedestrians or unexpected objects. We could do this by, for example, investing more in pedestrian infrastructure (e.g. segregated sidewalks and raised crossing points), and rolling out autonomous vehicles only in areas with sufficient such infrastructure.

NetEase FUXI – Reinforcement Learning in Industry

Despite a lot of mentions of ‘AI’ and ‘Big Data’ in the first few minutes of this session, it actually turned out to be fairly interesting.

I didn’t manage to stay long due to a clash with the HRT talk, but here are some interesting points from the first and second parts of the workshop:

  • Game companies don’t want their bots to be too good, because humans want to have a chance of winning! So the problem here is different from e.g. DeepMind’s Atari bots. (not that there’s too much danger of unintentionally creating excessively strong strategies with today’s techniques)
  • FUXI are trying to create a meta chatbot design engine that can work across games, and a high-level character personality design engine.
  • Interesting quote: “Our ultimate goal is to build a virtual human.”
  • They framed supervised learning as being about ‘predictions’, and reinforcement learning as being about ‘decisions’, and claimed that recommendation tasks can be better framed in an RL context.
  • There was some discussion of RL issues with sample efficiency and exploration leading to limited current real-world use cases (with references to RL never worked and Deep RL Doesn’t Work Yet)
  • “Humans are not believed to be very easily simulated” (!)
  • Dogs are better at reinforcement learning than DQN (though maybe not as good at Atari games)
  • When building ‘customer simulators’ to train RL-based recommendation engines, they found value in trying to simulate intention rather than behaviour (through techniques like Inverse RL and GAIL)
  • They’re planning on releasing “VirtualTaobao” simulators, essentially gym environments for recommendation engines.

FUXI clearly didn’t get the diversity memo! Six men running a workshop could’ve picked a better image than this one to showcase one of their games.

Hudson River Trading

Everyone attending this panel in the hope of learning the secret Deep Learning techniques that could make them millions in trading was immediately disappointed by the introduction – “Due to the competitive nature of our business we can only talk about problems, not solutions…”

Fortunately for those who stayed anyway, the speakers were all great and the content was interesting.

HRT spent some time at the beginning of their talk framing their firm (and prop trading firms more generally) as beneficial to society by showing a reduction in US equity spreads over the past few decades, and linking this to lower trading costs for investors.

As would be expected for a prop trading firm, most of the Q&As were fairly uninformative, though at least slightly amusing.
Audience member 1: “What types of models do you use?”
HRT employee: “We use a variety of different models.”
Audience member 2: “What is your average holding period?”
HRT employee: “Our strategies have a variety of different holding periods.”
Audience member 3: “Are you actually using Deep RL in production trading?”
HRT employee: “I’m afraid I can’t answer that. Come work for us and you’ll find out.”
Audience member 4: “What are the annual returns and Sharpe ratios for your RL-based strategies?”
HRT employee: “I cannot answer that question.”

One of the speakers previously worked at DeepMind, and it was interesting to hear him contrast different ‘families’ of RL and which might map most closely to the problem of trading.

The families in his classification were:

  • DQN: possibly sample-efficient enough (using Rainbow), but the state space is discrete, and these algorithms are not that great at exploration (though that’s changing). What’s the trading equivalent to a frame in Atari? Is it a tick? Or multiple ticks? How do we set constraints in a way that allows our model to optimise around them?
  • AlphaGo: adversarial and with the set of valid actions dependent on the state, but these strategies rely on an accurate world model and require a lot of compute.
  • Robotics: continuous N-dimensional actions, similar safety concerns/constraints, shared difficulty of translating a model from simulator to reality. Maybe a trading algo dealing with market changes is analogous to a robotics algo being robust with respect to lighting changes.
  • “Modern Games” (Dota, StarCraft, Capture The Flag): adversarial, simulations are expensive, big networks are required, some of the inputs are “a narrow window into a wider world”. (in the sense that they capture the current state perfectly but don’t tell you about the longer term consequences of your actions)

One audience question which did get a meaningful answer was whether they were using RL for portfolio optimisation. The response was that RL isn’t data efficient enough yet for it to be used for multi-day portfolio optimisation, since “the number of days that have happened since the start of electronic trading is not that many”.

A slide from the portfolio optimisation part of the talk, showing how optimising around position/margin constraints can be preferable to the naive myopic strategy of trading as you normally would until you hit the constraints.

Intel – nGraph

Some of this talk went comfortably over my head, but I was pleased to find that I understood more than I expected to.

My understanding is that Intel’s nGraph library is a software optimisation component which sits between different machine learning libraries used to construct computational graphs (TensorFlow, PyTorch, Chainer, PaddlePaddle, MXNet, …) and kernel libraries used to run those graphs on specific hardware (cuDNN, MKL-DNN, …).


Having one shared library sitting between these two means that you need to consider m+n different integrations rather than mn integrations (where m is the number of ML libraries, and n the number of kernel libraries).
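To make the m+n versus mn point concrete, here is a tiny sketch of the arithmetic. The function name is mine, and the counts simply use the five ML libraries and two kernel libraries named above as an example, not a claim about what nGraph actually supports.

```python
from typing import Tuple


def integration_counts(num_ml_libraries: int, num_kernel_libraries: int) -> Tuple[int, int]:
    """Return (direct, via_shared_layer) integration counts."""
    # Without a shared layer, every ML library is wired to every kernel library.
    direct = num_ml_libraries * num_kernel_libraries
    # With a shared layer, each library on either side integrates exactly once.
    via_shared_layer = num_ml_libraries + num_kernel_libraries
    return direct, via_shared_layer


# m = 5 ML libraries, n = 2 kernel libraries (the ones listed in this post)
print(integration_counts(5, 2))  # (10, 7)
```

The gap widens quickly as either side grows, which is presumably the motivation for having a shared layer at all.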


I didn’t understand much of the second part of the talk, but the optimisation examples they gave in the first part of the talk were based around removing redundant or inefficient operations – for example, two consecutive transpose operations (which would cancel each other out) would just be removed. Similarly, if two parts of a graph are doing exactly the same thing they could be condensed into one. Adding zero to a tensor, or multiplying a tensor by zero, can be dealt with at compile time.
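As a rough illustration of this kind of graph rewriting, here is a toy simplifier I put together. It is not nGraph's API or its actual algorithm: the tuple-based expression format, the `ZERO` marker and the `simplify` function are all made up for this sketch, it ignores tensor shapes entirely, and it assumes a plain 2-D transpose (so that two in a row cancel). It covers the double-transpose and zero cases, but not the merging of duplicate subgraphs.

```python
ZERO = ("zero",)  # hypothetical marker for an all-zeros tensor


def simplify(node):
    """Recursively simplify a tiny expression tree of tuples and string leaves."""
    if isinstance(node, str) or node == ZERO:
        return node  # leaves (named tensors, the zero constant) are already simple
    op, *args = node
    args = [simplify(arg) for arg in args]  # simplify children first
    # transpose(transpose(x)) -> x
    if op == "transpose" and isinstance(args[0], tuple) and args[0][0] == "transpose":
        return args[0][1]
    # x + 0 -> x and 0 + x -> x
    if op == "add":
        if args[0] == ZERO:
            return args[1]
        if args[1] == ZERO:
            return args[0]
    # x * 0 -> 0 and 0 * x -> 0
    if op == "mul" and ZERO in args:
        return ZERO
    return (op, *args)


graph = ("add", ("transpose", ("transpose", "x")), ("mul", "y", ZERO))
print(simplify(graph))  # x
```

The whole expression collapses to the single input `x` before anything is executed, which is the sense in which these cases can be handled at compile time.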


All this (and much more that I didn’t understand) can supposedly lead to significant performance improvements in training neural nets, particularly for lower batch sizes.

Next: Day 1.

Pommerman: getting started

Last weekend we spent a day taking our first steps towards building a Pommerman agent.

In addition to a full game simulation environment, the team running the competition were kind enough to provide helpful documentation and some great examples to help people get started.

There are a few particularly useful things included:

  • A few example implementations of agents. One just takes random actions, another is heuristic based, and a third uses a tensorforce implementation of PPO to learn to play the game.
  • A Jupyter notebook with a few examples including a step-by-step explanation of the tensorforce PPO agent implementation. (this is probably the best place to start)
  • A visual rendering of each game simulation.

Before we got anywhere, we hit a few small stumbling blocks.

  • It took us a few attempts, installing different versions of Python, before we got TensorFlow running. Now we know that TensorFlow doesn’t support Python 3.7, or any 32-bit versions of Python.
  • The tensorforce library, which the included PPO example is based on, has been changing rapidly. Some of the calls to this library no longer worked. While the code change required was minimal, it took at least an hour of digging through tensorforce code before we knew what exactly needed to be changed. We committed a small fix to the notebook here, which now works with version 0.4.3 of tensorforce, available through pip. (I wouldn’t recommend using the latest version of tensorforce on GitHub as we encountered a few bugs when trying that)
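A quick check like the one below would have saved us a few reinstalls. It is only a sketch: the function name is hypothetical, and it encodes the two constraints as we found them at the time (no Python 3.7, no 32-bit Python), which may not hold for other TensorFlow releases.

```python
import struct
import sys


def tensorflow_compat_issues(version=None, pointer_bits=None):
    """List the problems described above that apply to an interpreter.

    Defaults to inspecting the interpreter running this script.
    """
    if version is None:
        version = tuple(sys.version_info[:2])
    if pointer_bits is None:
        pointer_bits = struct.calcsize("P") * 8  # 32 or 64 on common platforms
    issues = []
    # Assumption: mirrors what we observed at the time of writing.
    if tuple(version[:2]) == (3, 7):
        issues.append("Python 3.7 was not supported by TensorFlow")
    if pointer_bits != 64:
        issues.append("TensorFlow needs a 64-bit Python build")
    return issues


print(tensorflow_compat_issues() or "no known issues")
```

Once the interpreter is sorted out, pinning the library with `pip install tensorforce==0.4.3` keeps the notebook on the version we tested against.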

I was hoping we’d get to an agent which could beat the heuristics-based SimpleAgent at FFA, but we didn’t manage to get there. In the end, we managed to:

  • Get the Jupyter notebook with examples running
  • Understand how the basic tensorforce PPO agent works
  • Set up a validation mechanism for running multiple episodes with different agents, and save each game so we can replay it for debugging purposes.
  • Train a tensorforce PPO agent (though it was technically training, we didn’t actually manage to get it to beat the SimpleAgent in any games yet)
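For anyone wanting to build something similar, the validation loop looks roughly like the sketch below. This is not our actual code: `StubEnv` is a hypothetical stand-in so the example runs without any dependencies, and `run_episodes` and `win_rate` are names I made up. It assumes an environment with the same `reset` / `act` / `step` loop that the Pommerman examples use, so in principle a real environment could be passed in through `make_env` instead.

```python
import random
from typing import Any, Callable, Dict, List


class StubEnv:
    """Hypothetical two-agent environment; NOT the Pommerman implementation."""

    def __init__(self, seed: int, max_steps: int = 5):
        self._rng = random.Random(seed)
        self._max_steps = max_steps
        self._steps = 0

    def reset(self) -> List[int]:
        self._steps = 0
        return [0, 0]  # one dummy observation per agent

    def act(self, state: List[int]) -> List[int]:
        # One arbitrary discrete action per agent.
        return [self._rng.randrange(6) for _ in state]

    def step(self, actions: List[int]):
        self._steps += 1
        done = self._steps >= self._max_steps
        if done:
            # On the final step exactly one agent wins (+1), the other loses (-1).
            winner = self._rng.randrange(2)
            reward = [1 if i == winner else -1 for i in range(2)]
        else:
            reward = [0, 0]
        return [self._steps, self._steps], reward, done, {}


def run_episodes(make_env: Callable[[int], Any], num_episodes: int) -> List[Dict[str, Any]]:
    """Play several games, keeping each game's action log so it can be replayed."""
    results = []
    for episode in range(num_episodes):
        env = make_env(episode)  # seed by episode index so a game is reproducible
        state, done, reward, history = env.reset(), False, None, []
        while not done:
            actions = env.act(state)
            state, reward, done, _info = env.step(actions)
            history.append(actions)
        results.append({"episode": episode, "reward": reward, "actions": history})
    return results


def win_rate(results: List[Dict[str, Any]], agent_index: int) -> float:
    """Fraction of recorded games in which the given agent ended with a positive reward."""
    wins = sum(1 for r in results if r["reward"][agent_index] > 0)
    return wins / len(results)


results = run_episodes(StubEnv, num_episodes=10)
print(win_rate(results, agent_index=0))
```

Keeping the per-step actions alongside the final reward is what makes replaying a specific game possible; the results list could just as easily be dumped to JSON and loaded again later.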

To be continued…