Month: March 2014
10 Things You Should Know About Running MongoDB at Scale
“This post outlines ten things you need to know for operating MongoDB at scale based on my experience working with MongoDB customers and open source users:…”
Install Perl modules without root rights on Linux
“If you have root rights there might be other, easier ways to install Perl modules than the following.
After an initial configuration, many Perl modules from CPAN can be easily installed, but there are quite a few that require some additional tools. In this article I’ll assume that either you already have those installed, or that you can at least install them as root.
If you don’t have those prerequisites then you will need to build the modules on another, similar machine where you do have root rights and then transfer the whole directory tree. That’s another story that will be covered in a separate article. In that situation you’d probably be better off downloading and using DWIM Perl…”
http://perlmaven.com/install-perl-modules-without-root-rights-on-linux-ubuntu-13-10
Coping with the TCP TIME-WAIT state on busy Linux servers
“The Linux kernel documentation is not very helpful about what net.ipv4.tcp_tw_recycle does:…”
http://vincent.bernat.im/en/blog/2014-tcp-time-wait-state-linux.html
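Before reading up on what the setting does, it can help to see whether it is even set on a given machine. The following is a minimal sketch in Python, assuming a Linux host where sysctl keys are exposed as files under /proc/sys (so net.ipv4.tcp_tw_recycle maps to /proc/sys/net/ipv4/tcp_tw_recycle). The helper name `read_sysctl` and its `proc_sys` parameter are made up for this illustration. As far as I know, kernels from 4.12 onward no longer provide this key, so the sketch treats a missing file as a normal case.

```python
from pathlib import Path


def read_sysctl(name, proc_sys="/proc/sys"):
    """Return the value of a sysctl key such as 'net.ipv4.tcp_tw_recycle'.

    Hypothetical helper: maps the dotted key to a path under proc_sys and
    returns the stripped file content, or None if it cannot be read.
    """
    path = Path(proc_sys).joinpath(*name.split("."))
    try:
        return path.read_text().strip()
    except OSError:
        # Missing key (non-Linux host, or a kernel without this setting).
        return None


key = "net.ipv4.tcp_tw_recycle"
value = read_sysctl(key)
if value is None:
    print(key, "is not available on this system")
else:
    print(key, "=", value)
```

The sketch only reads the value; changing it is a separate decision that the linked article discusses in detail.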
Simplified Template Directory (Great for PHP/Python/Perl users)
“We’ve made it even easier to migrate your PHP, Python, and Perl apps to OpenShift. With this latest release, I’m happy to announce that now all your PHP/Python/Perl code can be put in the root directory of your Git repo instead of a PHP/ or WSGI/ directory. To show you just how this works, I’ll start off with PHP…”
https://www.openshift.com/blogs/openshift-online-march-2014-release-blog
The tech behind our time series graphs – 2bn docs per day, 30TB per month
“Server Density processes over 30TB/month of incoming data points from the servers and web checks we monitor for our customers, ranging from simple Linux system load average to website response times from 18 different countries. All of this data goes into MongoDB in real time and is pulled out when customers need to view graphs, update dashboards and generate reports…”
https://blog.serverdensity.com/tech-behind-time-series-graphs-2bn-docs-per-day-30tb-per-month/
Redis as the primary data store? WTF?!
“Redis is an in-memory key-value data store typically used for caches and other such mechanisms to speed up web applications. We, however, store all our data in Redis as our primary database.
The web abounds with warnings and cautionary tales about going this route. There are horror stories about lost data, hitting memory limits, or people unable to effectively manage the data within Redis, so you might be wondering “What on earth were you thinking?!” So here is our story, why we decided to use Redis anyway, and how we overcame those issues…”
https://moot.it/blog/technology/redis-as-primary-datastore-wtf.html
Ricky Jay plays Poker
Machine Learning With Python
“Machine learning (ML) teaches machines how to carry out tasks by themselves. It is that simple. This article will give you a broad overview of the types of learning algorithms that are currently used in the diverse fields of machine learning and what to watch out for when applying them.
The goal of machine learning is to teach machines (software) to carry out tasks by providing them with a couple of examples (how to do or not do a task). Let us assume that each morning when you turn on your computer, you perform the same task of moving e-mails around so that only those e-mails belonging to a particular topic end up in the same folder. After some time, you feel bored and think of automating this chore. One way would be to start analyzing your brain and writing down all the rules your brain processes while you are shuffling your e-mails. However, this will be quite cumbersome and always imperfect. While you will miss some rules, you will over-specify others. A better and more future-proof way would be to automate this process by choosing a set of e-mail meta information and body/folder name pairs and let an algorithm come up with the best rule set. The pairs would be your training data, and the resulting rule set (also called model) could then be applied to future e-mails that we have not yet seen. This is machine learning in its simplest form…”
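The e-mail sorting idea in the quote can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the article's method: the training pairs are invented, the "meta information" is reduced to the words of a subject line, and the algorithm is a small naive Bayes word counter picked for brevity. The function names `train` and `predict` are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train(pairs):
    """Build a model from (email_text, folder) training pairs."""
    word_counts = defaultdict(Counter)  # folder -> word -> occurrences
    folder_counts = Counter()           # folder -> number of training e-mails
    for text, folder in pairs:
        folder_counts[folder] += 1
        word_counts[folder].update(text.lower().split())
    return word_counts, folder_counts


def predict(model, text):
    """Return the most likely folder for an unseen e-mail (naive Bayes)."""
    word_counts, folder_counts = model
    vocabulary = {w for counts in word_counts.values() for w in counts}
    total_mails = sum(folder_counts.values())
    best_folder, best_score = None, -math.inf
    for folder, n_mails in folder_counts.items():
        # Log prior: how common the folder is in the training data.
        score = math.log(n_mails / total_mails)
        # Add-one smoothing so unseen words do not zero out a folder.
        denominator = sum(word_counts[folder].values()) + len(vocabulary)
        for word in text.lower().split():
            score += math.log((word_counts[folder][word] + 1) / denominator)
        if score > best_score:
            best_folder, best_score = folder, score
    return best_folder


# Invented training data: (subject line, folder) pairs.
training_data = [
    ("invoice payment due", "finance"),
    ("quarterly budget invoice", "finance"),
    ("team lunch friday", "social"),
    ("friday party invitation", "social"),
]

model = train(training_data)
print(predict(model, "new invoice for the budget"))  # prints "finance"
```

The learned word counts play the role of the "rule set" from the quote: nobody wrote the rules by hand, and the same model can be applied to e-mails it has never seen.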