thoughts…

rants and bookmarks about programming stuff…


SHARK – C++ machine learning library

“SHARK is a fast, modular, feature-rich open-source C++ machine learning library. It provides methods for linear and nonlinear optimization, kernel-based learning algorithms, neural networks, and various other machine learning techniques (see the feature list below). It serves as a powerful toolbox for real world applications as well as research. Shark depends on Boost and CMake. It is compatible with Windows, Solaris, MacOS X, and Linux. Shark is licensed under GPLv3…”

http://image.diku.dk/shark/sphinx_pages/build/html/index.html

 


List of Machine Learning APIs

“Wikipedia defines Machine Learning as ‘a branch of artificial intelligence that deals with the construction and study of systems that can learn from data.’ Below is a compilation of APIs that have benefited from Machine Learning in one way or another, we truly are living in the future so strap into your rocketship and prepare for blastoff…”

http://blog.mashape.com/post/48074869493/list-of-machine-learning-apis


Getting Started with Python for Data Scientists

“With the R Users DC Meetup broadening its topic base to include other statistical programming tools, it seemed only reasonable to write a meta post highlighting some of the best Python tutorials and resources available for data science and statistics. What you don’t know is often the hardest part of picking up a new skill, so hopefully these resources will help make learning Python a little easier. Prepare yourself for code indentation heaven.

Python is such an incredible language because it can do practically anything, from high performance scientific computing to web frameworks such as Django or Flask.  Python is heavily used at Google so the language must be doing something right. And, similar to R, Python has a fantastic community around it and, luckily for you, this community can write. Don’t just take my word for it, watch the following video to fully understand…”

http://datacommunitydc.org/blog/2013/03/getting-started-with-python-for-data-scientists/


10 R packages I wish I knew about earlier

“R can be more prickly and obscure than other languages like Python or Java. The good news is that there are tons of packages which provide simple and familiar interfaces on top of Base R. This post is about ten packages I love and use everyday and ones I wish I knew about earlier…”

http://blog.yhathq.com/posts/10-R-packages-I-wish-I-knew-about-earlier.html


Machine Learning Cheat Sheet (for scikit-learn)

“As you hopefully have heard, we at scikit-learn are doing a user survey (which is still open by the way).
One of the requests there was to provide some sort of flow chart on how to do machine learning. As this is clearly impossible, I went to work straight away…”

http://peekaboo-vision.blogspot.de/2013/01/machine-learning-cheat-sheet-for-scikit.html


Machine Learning, Big Data, Deep Learning, Data Mining… FAQ

“What’s the difference between machine learning, deep learning, big data, statistics, decision & risk analysis, probability, fuzzy logic, and all the rest?…”

“In mathematics there are many “logic” theories that have more than one truth value, and not just one universal “logic.” What’s up with that?…”

“What’s the difference between probability and decision analysis?…”

http://wmbriggs.com/blog/?p=6465

 

 


Why Probabilistic Programming Matters

“Probabilistic programming is a newer way of posing machine learning problems. As the models we want to create become more complex it will be necessary to embrace more generic tools for capturing dependencies. I wish to argue that probabilistic programming languages should be the dominant way we perform this modeling, and will demonstrate it by showing the variety of problems that can be trivially modeled with such a language…”

http://zinkov.com/posts/2012-06-27-why-prob-programming-matters/


Probability Tutorials

“These tutorials are designed as a set of simple exercises, leading gradually to the establishment of deeper results. Proved Theorems, as well as clear Definitions are spelt out for future reference. (An alphabetical index A|B|C|D … should also be helpful.) Contrary to standard university lectures or textbooks, these tutorials do not contain any formal proof: instead, they will offer you the means of proving everything yourself. However, for those who need more help, Solutions to exercises are provided, and can be downloaded in A4 paper format from the Printing page…”

http://www.probability.net/

 


Practical machine learning tricks from the KDD 2011 best industry paper

“A machine learning research paper tends to present a newly proposed method or algorithm in relative isolation. Problem context, data preparation, and feature engineering are hopefully discussed to the extent required for reader understanding and scientific reproducibility, but are usually not the primary focus. Given the goals and constraints of the format, this can be seen as a reasonable trade-off: the authors opt to spend scarce “ink” on only the most essential (often abstract) ideas.

As a consequence, implementation details relevant to the use of the proposed technique in an actual production system are often not mentioned whatsoever. This aspect of machine learning is often left as “folk wisdom” to be picked up from colleagues, blog posts, discussion boards, snarky tweets, open-source libraries, or more often than not, first-hand experience.

Papers from conference “industry tracks” often deviate from this template, yielding valuable insights about what it takes to make machine learning effective in practice. This paper from Google on detecting “malicious” (i.e., scam/spam) advertisements won best industry paper at KDD 2011 and is a particularly interesting example…”

http://blog.david-andrzejewski.com/machine-learning/practical-machine-learning-tricks-from-the-kdd-2011-best-industry-paper/
