Hashing for large scale similarity

Similarity computation is a very common task in real-world machine learning and data mining problems such as recommender systems, spam detection, online advertising etc. Consider a tweet recommendation problem where one has to find tweets similar to the tweet user previously clicked. This problem becomes extremely challenging when there are billions of tweets created each day.

In this post, we will discuss the two most common similarity metric, namely Jaccard similarity and Cosine similarity; and Locality Sensitive Hashing based approximation of those metrics.



HFT-like Trading Algorithm in 300 Lines of Code You Can Run Now

Commission Free API Trading Can Open Up Many Possibilities

Alpaca provides commission-free stock trading API for individual algo traders and developers, and now almost 1,000 people hang around in our community Slack talking about many different use cases. Among other things, like automated long-term value investing and Google Spreadsheet trading, high-frequency trading (“HFT”) often came up as a discussion topic among our users.

Is High-Frequency Trading (“HFT”) That Special?

Maybe because I don’t come from a finance background, I’ve wondered what’s so special about hedge funds and HFTs that those “Wallstreet” guys talk about. Since I am a developer who always looks for ways to make things work, I decided to do research and to figure out myself on how I could build similar things to what HFTs do.

I am fortunate to work with colleagues who used to build strategies and trade at HFTs, so I learned some basic know-how from them and went ahead to code a working example that trades somewhat like an HFT style (please note that my example does not act like the ultra-high speed professional trading algorithms that collocate with exchanges and fight for nanoseconds latency). Also, because this working example uses real-time data streaming, it can act as a good starting point for users who want to understand how to use real-time data streaming.

The code of this HFT-ish example algorithm is here, and you can immediately run it with your favorite stock symbol. Just clone the repository from GitHub, set the API key, and go!


Throttling Third-Party API calls with AWS Lambda

In the serverless world, we often get the impression that our applications can scale without limits. With the right design (and enough money), this is theoretically possible. But in reality, many components of our serverless applications DO have limits. Whether these are physical limits, like network throughput or CPU capacity, or soft limits, like AWS Account Limits or third-party API quotas, our serverless applications still need to be able to handle periods of high load. And more importantly, our end users should experience minimal, if any, negative effects when we reach these thresholds.

There are many ways to add resiliency to our serverless applications, but this post is going to focus on dealing specifically with quotas in third-party APIs. We’ll look at how we can use a combination of SQS, CloudWatch Events, and Lambda functions to implement a precisely controlled throttling system. We’ll also discuss how you can implement (almost) guaranteed ordering, state management (for multi-tiered quotas), and how to plan for failure. Let’s get started!


Multi-region serverless backend 

In 2018, I wrote a series of blog posts on building a multi-region, active-active, serverless architecture on AWS [1, 2, 3 and 4]. The solution was built using DynamoDB Global Tables, Lambda, the regional API Gateway feature, and Route 53 routing policies. It worked well as a resiliency pattern and as a disaster recovery (DR) strategy . But there was an issue.



The Game Engine Black Book: DOOM features a whole chapter about DOOM console ports and the challenges they encountered. The utter failure of the 3DO, the difficulties of the Saturn due to its affine texture mapping, and the amazing “reverse-engineering-from- scratch” by Randy Linden on Super Nintendo all have rich stories to tell.

Once heading towards disaster[1], the Playstation 1 (PSX) devteam managed to rectify course to produce a critically and commercially acclaimed conversion. Final DOOM was the most faithful port when compared to the PC version. The alpha blended colored sectors not only improved visual quality, they also made gameplay better by indicating the required key color. Sound was also improved via reverberation effects taking advantage of the PSX’s Audio Processing Unit.

The devteam did such a good job that they found themselves with a few extra CPU cycles they decided to use to generate animated fire both during both the intro and the gameplay. Mesmerized, I tried to find out how it was done. After an initial calling found no answer, I was about to dust off my MIPS book to rip open the PSX executable when Samuel Villarreal replied on Twitter to tell me he had already reverse-engineered the Nintendo 64 version[2]. I only had to clean, simplify, and optimize it a little bit.

It was interesting to re-discover this classic demoscene effect; the underlying idea is similar to the first water ripple many developers implemented as a programming kata in the 90’s. The fire effect is a vibrant testimony to a time when judiciously picked palette colors combined with a simple trick were the only way to get things done.