Machine Learning Cheat Sheet (for scikit-learn)

“As you hopefully have heard, we at scikit-learn are doing a user survey (which is still open by the way).
One of the requests there was to provide some sort of flow chart on how to do machine learning. As this is clearly impossible, I went to work straight away…”

MongoDB: How to limit results and how to page through results

“In this post we are going to take a look at how to limit results in MongoDB as well as how to page through results. MongoDB uses limit to cap the number of results returned and skip to skip a number of records from the result set. Using limit in conjunction with skip enables you to do paging in MongoDB…”
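The skip/limit arithmetic described in that excerpt can be sketched in Python. This is a minimal illustration rather than the linked post's own code: `page_window` and `fetch_page` are hypothetical helper names, and `collection` is assumed to be a pymongo-style collection whose cursor supports `sort`, `skip` and `limit`.

```python
def page_window(page, page_size):
    """Translate a 1-based page number into (skip, limit) values."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be >= 1")
    return (page - 1) * page_size, page_size


def fetch_page(collection, page, page_size, sort_key="_id"):
    """Return one page of documents from a pymongo-style collection.

    Assumption: `collection.find()` returns a cursor with chainable
    sort/skip/limit methods. Sorting first keeps page boundaries stable
    from one call to the next.
    """
    skip, limit = page_window(page, page_size)
    cursor = collection.find().sort(sort_key, 1).skip(skip).limit(limit)
    return list(cursor)
```

Because skip still walks past the skipped documents on the server, this pattern tends to slow down on deep pages; filtering on the last seen sort key is a common alternative for large collections.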

Machine Learning, Big Data, Deep Learning, Data Mining… FAQ

“What’s the difference between machine learning, deep learning, big data, statistics, decision & risk analysis, probability, fuzzy logic, and all the rest?…”

“In mathematics there are many “logic” theories that have more than one truth value, and not just one universal “logic.” What’s up with that?…”

“What’s the difference between probability and decision analysis?…”



Py2.6+ and Py3.0+ backport of Python 3.3’s LRU Cache (Python recipe)

“Full-featured O(1) LRU cache backported from Python3.3. The full Py3.3 API is supported (thread safety, maxsize, keyword args, type checking, __wrapped__, and cache_info). Includes Py3.3 optimizations for better memory utilization, fewer dependencies, and fewer dict lookups…”
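For reference, the Python 3.3 API that the recipe backports lives in the standard library as `functools.lru_cache`. The sketch below uses the stdlib decorator on Python 3 rather than the recipe itself; `fib` is just an illustrative function.

```python
from functools import lru_cache


@lru_cache(maxsize=128)
def fib(n):
    """Naive Fibonacci; the cache turns the exponential recursion linear."""
    return n if n < 2 else fib(n - 1) + fib(n - 2)


result = fib(30)

# cache_info() reports hits, misses, maxsize and currsize as a named tuple.
info = fib.cache_info()
print(result, info)

# __wrapped__ exposes the undecorated function; cache_clear() empties the cache.
original = fib.__wrapped__
fib.cache_clear()
```

Choosing `maxsize` is a trade-off between memory and hit rate; `maxsize=None` disables eviction entirely.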

Optimising NginX, Node.JS and networking for heavy workloads

“Used in conjunction, NginX and Node.JS are the perfect partnership for high-throughput web applications. They’re both built using event-driven design principles and are able to scale to levels far beyond the classic C10K limitations afflicting more archaic web servers such as Apache. Out-of-the-box configuration will get you pretty far, but when you need to start serving upwards of thousands of requests per second on commodity hardware, there’s some extra tweaking you must perform to squeeze every ounce of performance out of your servers.

This article assumes you’re using NginX’s HttpProxyModule to proxy your traffic to one or more upstream node.js servers. We’ll cover tuning sysctl settings in Ubuntu 10.04 and above, as well as node.js application and NginX tuning. You may be able to achieve similar results if you’re using a Debian Linux distribution, but YMMV if you’re using something else…”

Fn.py: enjoy FP in Python

“Despite the fact that Python is not a pure functional programming language, it is a multi-paradigm language and gives you enough freedom to borrow from the functional programming approach. There are theoretical and practical advantages to the functional style:

  • Formal provability
  • Modularity
  • Composability
  • Ease of debugging and testing

Fn.py library provides you with the missing “batteries” to get the most from the functional approach, even in a mostly imperative program…”
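To make the modularity and composability points concrete, here is a small sketch that uses only the standard library. It deliberately does not use the library's own API; `compose`, `square` and `increment` are hypothetical names chosen for illustration.

```python
from functools import reduce


def compose(*funcs):
    """Compose right to left: compose(f, g)(x) == f(g(x))."""
    return reduce(lambda f, g: lambda x: f(g(x)), funcs, lambda x: x)


def square(x):
    return x * x


def increment(x):
    return x + 1


# Small pure functions combine into a pipeline without shared state,
# so each piece can be tested in isolation.
square_then_increment = compose(increment, square)
total = sum(map(square_then_increment, range(4)))
```

Each building block here is a pure function, which is what makes the pieces easy to test and recombine.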

Follow up to “The Unreasonable Effectiveness of C”

“…Higher level languages, like Python and Ruby, are extremely useful and should definitely be used where appropriate. Java has a lot of advantages, C++ does too. Erlang is amazing. Most every popular language has uses where it’s a better choice…”

“But when both raw performance and reliability are critical, C is very very hard to beat…”

“Don’t just blindly use C, understand its own tradeoffs and if it makes sense in your situation. Erlang is quite good for us, but to stay competitive we need to move on to something faster and industrial grade for our performance oriented code. And Erlang itself is written in C…”

“C++ is also a complicated mess, so when you adopt C++ for its libraries and community, you have to take the good with the bad and weird to get the benefits. And there is a lot of disagreement about what constitutes bad or weird. Your sane subset of the language is very likely to be at odds with others’ ideas of a sane subset. C has this problem to a much much smaller degree…”