The myth of using Scala as a better Java

When people talk about their experience with Scala, they often say that it is possible to use Scala as a better Java. And indeed, many companies, especially the ones that adopted Scala around 2008-2009, didn’t want to give up the familiar tooling and simply integrated Scala into existing workflows based on Maven. At that time, calling Scala an improved version of Java was questionable but at least justifiable. However, that is no longer the case. For the most part, contemporary Scala shops don’t use Maven as a build tool, don’t use Spring as a DI container and rarely, if ever, resort to classical design patterns. What do they use then?

MXNet – Deep Learning Framework of Choice at AWS

Machine learning is playing an increasingly important role in many areas of our businesses and our lives and is being employed in a range of computing tasks where programming explicit algorithms is infeasible.

At Amazon, machine learning has been key to many of our business processes, from recommendations to fraud detection, from inventory levels to book classification to abusive review detection. And there are many more application areas where we use machine learning extensively: search, autonomous drones, robotics in fulfillment centers, text and speech recognition, etc.

Among machine learning algorithms, a class of algorithms called deep learning has come to represent those algorithms that can absorb huge volumes of data and learn elegant and useful patterns within that data: faces inside photos, the meaning of a text, or the intent of a spoken word. A set of programming models has emerged to help developers define and train AI models with deep learning, along with open source frameworks that put deep learning in the hands of mere mortals. Some examples of popular deep learning frameworks that we support on AWS include Caffe, CNTK, MXNet, TensorFlow, Theano, and Torch.

Among all these popular frameworks, we have concluded that MXNet is the most scalable framework. We believe that the AI community would benefit from putting more effort behind MXNet. Today, we are announcing that MXNet will be our deep learning framework of choice. AWS will contribute code and improved documentation as well as invest in the ecosystem around MXNet. We will partner with other organizations to further advance MXNet.

The Rise and Fall of Scala

Five years ago, Scala seemed like the next big thing in programming languages because it elegantly enabled functional programming within an object-oriented paradigm. Today, Scala’s popularity seems to be fading, with companies like LinkedIn and Yammer moving away from it. The TIOBE index of software language popularity ranked Scala at #13 in 2012; by August 2016 it had fallen to #32, being used by less than 0.6% of the programming community.

Here’s another ominous sign for Scala: Lightbend, its parent company, is now releasing new frameworks with a Java API before the Scala version. Anecdotally, as CTO of a leading software product engineering company, I meet many software development managers, and I know of at least two who have made the painful decision to abandon Scala after more than a year of adoption. What happened? What gave Scala its initial popularity boost, and what caused its decline? Are there any use cases for which Scala is still the best choice?

Sarcasm Detection with Machine Learning in Spark

This post is inspired by a site I found whilst searching for a way to detect sarcasm within sentences. As humans, we sometimes struggle to detect sarcasm even though we have a lot more contextual information available to us. People are emotive when they speak; they use certain tones, and these traits can help us understand when someone is being sarcastic. However, we don’t always catch it! So how the hell could a computer detect this, when all it has is text?

Well, one way is to use machine learning. I wondered if I could set up a machine learning model that could accurately predict sarcasm (or accurately enough for it to be effective). This search led me to the site mentioned above, where the author, Mathieu Cliche, cleverly came up with the idea of using tweets as the training set.

As with any machine learning algorithm, its level of accuracy is only as good as the training data it’s provided. Creating a large catalog of sarcastic sentences could be rather challenging. However, searching for tweets that contain the hashtag #sarcasm or #sarcastic would provide me with a vast amount of training data (provided a good percentage of those tweets are actually sarcastic).
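A minimal sketch of this hashtag-labeling idea in Java follows. It assumes tweets arrive as plain strings. A tweet counts as sarcastic if it carries #sarcasm or #sarcastic. That hashtag is then stripped, so a classifier cannot simply learn the label from it. The class names and the stripping step are illustrative assumptions, not the author's code.

```java
import java.util.regex.Pattern;

/** A tweet's cleaned text plus a label derived from its hashtags (hypothetical helper). */
final class LabeledTweet {
    final String text;
    final boolean sarcastic;

    LabeledTweet(String text, boolean sarcastic) {
        this.text = text;
        this.sarcastic = sarcastic;
    }
}

/** Hypothetical labeler: hashtags act as (noisy) labels for the training set. */
final class HashtagLabeler {
    // Matches the whole hashtag #sarcasm or #sarcastic, ignoring case.
    private static final Pattern SARCASM_TAG =
        Pattern.compile("#sarcas(?:m|tic)\\b", Pattern.CASE_INSENSITIVE);

    /** Labels a tweet sarcastic if it carries a sarcasm hashtag, then removes that hashtag. */
    static LabeledTweet label(String tweet) {
        boolean sarcastic = SARCASM_TAG.matcher(tweet).find();
        // Assumption: strip the label-bearing hashtags so the model cannot learn the label
        // from them, then collapse the leftover whitespace.
        String cleaned = SARCASM_TAG.matcher(tweet).replaceAll("")
            .replaceAll("\\s+", " ")
            .trim();
        return new LabeledTweet(cleaned, sarcastic);
    }
}
```

For example, `HashtagLabeler.label("Great, another Monday #sarcasm")` yields the text "Great, another Monday" with the sarcastic flag set. Tweets without either hashtag come back unchanged and unflagged.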

Using that approach as the basis, I developed a Spark application using the MLlib API that would use the Naive Bayes classifier to detect sarcasm in sentences. This post covers the basics; next time I will expand on it to utilise sarcastic tweets to train my model.
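Spark MLlib ships its own Naive Bayes implementation. The sketch below deliberately avoids Spark so it runs stand-alone, and it is not MLlib's API. It is a minimal, hypothetical two-class multinomial Naive Bayes with add-one (Laplace) smoothing, meant to show what such a classifier computes from word counts. The tokenizer is a naive assumption: it lowercases the text and splits on non-letters.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Minimal two-class multinomial Naive Bayes (hypothetical sketch, not Spark MLlib's API). */
final class NaiveBayesSketch {
    // Per-class word counts, total word counts and document counts; key true = sarcastic.
    private final Map<Boolean, Map<String, Integer>> wordCounts = new HashMap<>();
    private final Map<Boolean, Integer> totalWords = new HashMap<>();
    private final Map<Boolean, Integer> docCounts = new HashMap<>();
    private final Set<String> vocabulary = new HashSet<>();

    // Naive tokenizer (assumption): lowercase, split on anything not a letter or apostrophe.
    private static String[] tokenize(String text) {
        return text.toLowerCase().split("[^a-z']+");
    }

    /** Adds one labelled sentence to the counts. */
    void train(String text, boolean sarcastic) {
        docCounts.merge(sarcastic, 1, Integer::sum);
        Map<String, Integer> counts = wordCounts.computeIfAbsent(sarcastic, k -> new HashMap<>());
        for (String word : tokenize(text)) {
            if (word.isEmpty()) continue; // split() can yield a leading empty token
            counts.merge(word, 1, Integer::sum);
            totalWords.merge(sarcastic, 1, Integer::sum);
            vocabulary.add(word);
        }
    }

    /** log P(class) + sum of log P(word | class), both with add-one smoothing. */
    double logScore(String text, boolean sarcastic) {
        int totalDocs = docCounts.values().stream().mapToInt(Integer::intValue).sum();
        double score = Math.log((docCounts.getOrDefault(sarcastic, 0) + 1.0) / (totalDocs + 2.0));
        Map<String, Integer> counts = wordCounts.getOrDefault(sarcastic, Map.of());
        double denominator = totalWords.getOrDefault(sarcastic, 0) + (double) vocabulary.size();
        for (String word : tokenize(text)) {
            if (word.isEmpty()) continue;
            score += Math.log((counts.getOrDefault(word, 0) + 1.0) / denominator);
        }
        return score;
    }

    /** Predicts "sarcastic" when that class has the higher log score. */
    boolean predict(String text) {
        return logScore(text, true) > logScore(text, false);
    }
}
```

Working in log space avoids underflow when many small word probabilities are multiplied. In Spark the same counting is distributed across a cluster, and MLlib's Naive Bayes expects numeric feature vectors (for example, term frequencies) rather than raw strings.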

Machine Learning for Developers

“Most developers these days have heard of machine learning, but when trying to find an ‘easy’ way into this technique, most people find themselves scared off by the abstractness of the concept of Machine Learning and by terms such as regression, unsupervised learning, Probability Density Function and many other definitions. Turning to books, there are titles such as An Introduction to Statistical Learning with Applications in R and Machine Learning for Hackers, which use the programming language R for their examples.

However, R is not really a programming language in which one writes programs for everyday use, as one does with, for example, Java, C# or Scala. This is why in this blog machine learning will be introduced using Smile, a machine learning library that can be used from both Java and Scala. These are languages that most developers have seen at least once during their studies or career.

The first section, ‘The global idea of machine learning’, contains all the important concepts and notions you need to know to get started with the practical examples described in the section ‘Practical Examples’. That section is inspired by the examples from the book Machine Learning for Hackers. Additionally, the book Machine Learning in Action was used for validation purposes.

The second section, ‘Practical Examples’, contains examples of various machine learning (ML) applications, using Smile as the ML library.

Note that in this blog, ‘new’ definitions are hyperlinked such that if you want, you can read more regarding that specific topic, but you are not obliged to do this in order to be able to work through the examples.

As a final note, I’d like to thank the following people:

  • Haifeng Li for his support and writing the awesome and free to use library Smile.
  • Erik Meijer for all suggestions and supervision of the process of writing this blog.
  • Richard van Heest for his feedback and co-reading the blog.
  • Lars Willems for his feedback and co-reading the blog…”

How we ended up with microservices

“When I was at SoundCloud, I was responsible for the migration from a monolithic Ruby on Rails application to a constellation of microservices. I’ve told the technical side of this story multiple times, both in presentations, and as a multi-part series for SoundCloud’s engineering blog. These engineering bits are what people are most interested in hearing about, but recently I realised I never explained to a wider audience how we ended up using microservices to begin with.

I am sorry to disappoint my fellow techies, but the reason we migrated to microservices had much more to do with productivity than with pure technical matters. I’ll explain.

Note: This post definitely has a lot of revisionism, and, in trying to make it easier to understand, oversimplifies a fairly chaotic chain of events into a linear timeline. Nevertheless, I believe it paints a pretty good picture of my first couple of years at SoundCloud…”

Predict Social Network Influence with R and H2O Ensemble Learning

“H2O is an awesome machine learning framework. It is really great for data scientists and business analysts “who need scalable and fast machine learning”. H2O is completely open source, and what makes it important is that it works right out of the box. There seems to be no easier way to start with scalable machine learning. It has support for R, Python, Scala and Java, and also has a REST API and its own Web UI. So you can use it for research as well as in production environments.

H2O is based on Apache Hadoop and Apache Spark which gives it enormous power with in-memory parallel processing…”

Functional Programming in the Real World

“Here is a list of functional programs applied to real-world tasks. The main criterion for being real-world is that the program was written primarily to perform some task, not primarily to experiment with functional programming. Functional is used in the broad sense that includes both ‘pure’ programs (no side effects) and ‘impure’ (some use of side effects). Languages covered include CAML, Clean, Erlang, Haskell, Miranda, Scheme, SML, and others…”
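The pure/impure distinction can be made concrete with a tiny Java illustration; the class, field and method names are made up for this example. A pure function's result depends only on its input and changes nothing else. An impure one also touches state outside itself.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of pure vs impure functions. */
final class PurityExample {
    // Shared mutable state that the impure function writes to.
    static final List<Integer> log = new ArrayList<>();

    // Pure: same input always gives the same output, and nothing outside is modified.
    static int square(int x) {
        return x * x;
    }

    // Impure: returns the same value as square, but also appends to shared state (a side effect).
    static int squareAndLog(int x) {
        log.add(x);
        return x * x;
    }
}
```

Both methods return 9 for an input of 3. Only `squareAndLog` leaves a trace behind, in `PurityExample.log`, and that trace is what makes it impure.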

Sodium – Functional Reactive Programming (FRP) Library for Java, Haskell, C++, C# and Scala

“Sodium – Functional Reactive Programming in C#, C++, Java, Haskell and Scala (other languages to be added). This is based on Flapjax, Yampa, scala.React and a number of other Functional Reactive Programming efforts, as well as a lot of personal experience. Enjoy.

Status:

  • Haskell – complete
  • Java – complete
  • C++ – complete
  • C# – complete
  • Embedded-C – just an experiment
  • Rust – got nowhere with it yet
  • Scala – complete…”
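Sodium models FRP with streams (discrete events) and cells (values that change over time). The sketch below is not Sodium's API. It is a hypothetical, minimal, single-threaded stream/cell pair in plain Java. It shows the core idea: a stream can be mapped into a new stream, and a stream can be "held" so that a cell always reflects its latest event. The real library does considerably more than this.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;

/** A push-based stream of discrete events (hypothetical, not Sodium's Stream). */
final class MiniStream<A> {
    private final List<Consumer<A>> listeners = new ArrayList<>();

    /** Registers a callback that receives every future event. */
    void listen(Consumer<A> listener) {
        listeners.add(listener);
    }

    /** Pushes an event to all listeners; in this sketch any stream can be fired directly. */
    void send(A value) {
        for (Consumer<A> listener : listeners) listener.accept(value);
    }

    /** Derives a stream whose events are f applied to this stream's events. */
    <B> MiniStream<B> map(Function<A, B> f) {
        MiniStream<B> out = new MiniStream<>();
        listen(a -> out.send(f.apply(a)));
        return out;
    }

    /** Turns the stream into a cell holding the most recent event, or the initial value. */
    MiniCell<A> hold(A initial) {
        MiniCell<A> cell = new MiniCell<>(initial);
        listen(cell::set);
        return cell;
    }
}

/** A value that changes over time (hypothetical, not Sodium's Cell). */
final class MiniCell<A> {
    private A value;

    MiniCell(A initial) {
        value = initial;
    }

    // Package-private: only the stream that feeds this cell is meant to update it.
    void set(A newValue) {
        value = newValue;
    }

    /** Reads the current value. */
    A sample() {
        return value;
    }
}
```

For instance, `clicks.map(x -> x * 2).hold(0)` gives a cell that starts at 0 and becomes 10 after `clicks.send(5)`. The program describes how values relate, and updates propagate automatically; that declarative wiring is the essence of FRP.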