Book review: Deep Learning by Goodfellow, Bengio and Courville

January 5, 2017

The big success story of this past year (several decades in the making, though) was deep learning, a machine learning method that has allowed researchers and practitioners to successfully tackle some of the hardest problems in AI. The only problem was that, for newcomers to the field, no canonical reference for deep learning existed. Until now, that is. Deep Learning is a new book, part of MIT Press’s Adaptive Computation and Machine Learning series and written by deep learning specialists I. Goodfellow, Y. Bengio and A. Courville, that fills this gap with much success.

It is the first book to cover the field of deep learning in depth while at the same time remaining accessible to those being introduced to the subject for the first time.

The book is divided into three parts.

The first part covers applied math and machine learning basics, both for those who are new to machine learning and for those who need a refresher. The reader receives a basic but good introduction to linear algebra, probability theory, and optimization (mostly gradient descent, since it is the dominant optimization algorithm in deep learning), along with a short tour of machine learning concepts. Note that the machine learning chapter covers machine learning algorithms and related concepts in general, not just those relating to deep learning. It adequately surveys a (very) large research field, explaining the most basic concepts one needs to build machines that can learn from data. It includes discussions of overfitting, underfitting, and the capacity of machine learning algorithms. Building on these basic concepts, the chapter motivates the need for deep learning as an effort to address some of the shortcomings of traditional machine learning algorithms. It is an important chapter because it provides context for deep learning within the broader machine learning field. Even if you are already familiar with machine learning concepts, I still recommend that you spend the time to read this chapter.
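For readers who have not met gradient descent before, the idea can be sketched in a few lines. This is my own minimal illustration, not an example taken from the book: the function names, the quadratic being minimized, the learning rate, and the number of steps are all assumptions chosen to keep the sketch small.

```python
def gradient_descent(grad, x0, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient, starting from x0.

    grad, x0, learning_rate and steps are illustrative choices,
    not values prescribed by the book.
    """
    x = x0
    for _ in range(steps):
        x = x - learning_rate * grad(x)
    return x


def grad_f(x):
    # Gradient of the toy function f(x) = (x - 3)^2, whose minimum is at x = 3.
    return 2.0 * (x - 3.0)


x_min = gradient_descent(grad_f, x0=0.0)
print(x_min)  # a value very close to 3
```

Each step moves the estimate a fraction of the way toward the minimum. Training a neural network follows the same pattern, only with many parameters and a gradient obtained through back-propagation.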

The second part of the book introduces deep learning with a focus on feedforward neural networks, their construction, and their training. This part of the book is written for a target audience of deep learning practitioners. It clearly explains the choices a practitioner has to make in order to use deep learning to build accurate and useful predictive systems, which is not an easy task.

I think the book is very clear about the amount of effort that goes into using deep learning effectively to solve machine learning problems. Deep learning is not a black-box method that just works. It takes effort to get good results, not to mention lots of data (appropriate for the problem to be solved) and lots of computational horsepower. That said, in this second part of the book, the authors introduce the different types of hidden and output units and discuss the equally important aspect of architecture design; many published applied deep learning research papers focus on discovering an architecture that is suitable for the problem at hand. The back-propagation algorithm used to train a network is also introduced early on. Regularization methods such as dataset augmentation, early stopping, dropout, and the recently popular adversarial training are introduced, as all of them are important weapons in the successful practitioner’s arsenal. The authors also provide a good introduction to convolutional networks, popular in computer vision, and to recurrent and recursive neural networks, popular in speech recognition and, more generally, in modelling sequential data. Part 2 of the book concludes with a chapter on applications of deep learning, which is especially useful for those looking for inspiration.
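To give a flavour of one of these regularization methods, here is a minimal sketch of dropout applied to a list of hidden-unit activations. It is my own illustration and uses the "inverted" variant that is common in practice, which rescales the surviving units during training; I am not claiming this is how the book formulates it. The function name, the drop probability, and the toy activations are all assumptions.

```python
import random


def dropout(activations, drop_prob, rng):
    """Inverted dropout (illustrative sketch, not the book's formulation).

    Each unit is zeroed with probability drop_prob; survivors are divided
    by the keep probability so the expected activation is unchanged.
    """
    if not 0.0 <= drop_prob < 1.0:
        raise ValueError("drop_prob must be in [0, 1)")
    keep_prob = 1.0 - drop_prob
    return [a / keep_prob if rng.random() < keep_prob else 0.0
            for a in activations]


rng = random.Random(0)          # seeded so the run is repeatable
hidden = [1.0, 2.0, 3.0, 4.0]   # hypothetical hidden-unit activations
dropped = dropout(hidden, drop_prob=0.5, rng=rng)
print(dropped)
```

With a drop probability of 0.5, every entry of the result is either zero or twice the original activation. At test time the function is simply not applied.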

I have only one complaint about part 2 of this book. Since it is targeted at practitioners, I think the authors should have included more examples to demonstrate the concepts they introduce. I don’t mean examples given in a specific programming language or in one of the many freely available deep learning frameworks. I find the lack of dependence on a specific computational framework to be one of the book’s best attributes. Describing how to use a framework distracts the reader from the core concepts, which, at the end of the day, can be implemented using any available framework. I mean that the book could be enhanced with some numerical examples to illustrate some of the core ideas.

The third and last part of the book is dedicated to deep learning research concepts. It is a collection of topics mostly relevant to researchers who want to push the field further ahead. It covers topics such as autoencoders, Monte Carlo methods, approximate inference, and generative models, just to name a few. People already familiar with basic machine learning concepts who have had an introduction to deep learning, either through experience or a university course, could safely jump to this last part of the book.

If you are interested in pushing the boundaries of machine learning, then these last 8 chapters will provide you with ample fodder for your cannon. That said, the last part of the book may be mostly of interest to academics, especially PhD students and post-docs. The takeaway message for the rest of us is that deep learning, no matter its popularity in the popular media, is still in its infancy and hardly a one-size-fits-all method straight out of the box. Many challenges remain to be solved. These last 200 or so pages of this well-written book provide evidence as to why those who really understand deep learning methods have been trying to quell the hype surrounding it this past year. There is still much work to be done, so let’s be careful with the hype. We don’t want to have to go through another AI winter. Finally, the large number of different subjects covered in this section tells me that this book will become obsolete rather fast as researchers discover new deep learning methods and improve upon existing ones. I hope the authors will continue to update this book as our knowledge of this powerful machine learning technique increases.

In conclusion, Deep Learning is a good book for beginner and advanced machine learning practitioners as well as academics. The authors have done a wonderful job of bringing a difficult subject within reach of a wide audience at different levels of expertise.

My strong recommendation: buy it now!