Tag Archives: machine learning
Data analysis in Python with pandas
Tutorial: scikit-learn
Practical Machine Learning in Python
What is Bayesian/Frequentist Inference?
“When I started this blog, I said I wouldn’t write about the Bayes versus Frequentist thing. I thought that was old news.
But many things have changed my mind. Nate Silver’s book, various comments on my blog, comments on other blogs, Sharon McGrayne’s book, etc. have made it clear to me that there is still a lot of confusion about what Bayesian inference is and what Frequentist inference is.
I believe that many of the arguments about Bayes versus Frequentist are really about: what is the definition of Bayesian inference?…”
https://normaldeviate.wordpress.com/2012/11/17/what-is-bayesianfrequentist-inference/
Basic Sentiment Analysis with Python
“In this post I will try to give a very introductory view of some techniques that could be useful when you want to perform a basic analysis of opinions written in English.
These techniques come 100% from experience in real-life projects. Don’t expect a theoretical introduction to Sentiment Analysis and the multiple strategies out there to achieve opinion mining; this is only a practical example of applying some basic rules to extract the polarity (positive or negative) of a text.
Let’s start looking at an example opinion:
“What can I say about this place. The staff of the restaurant is nice and the eggplant is not bad. Apart from that, very uninspired food, lack of atmosphere and too expensive. I am a staunch vegetarian and was sorely dissapointed with the veggie options on the menu. Will be the last time I visit, I recommend others to avoid.”
As you can see, this is a mainly negative review about a restaurant…”
http://fjavieralba.com/basic-sentiment-analysis-with-python.html
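To make the idea of "basic rules" for polarity concrete, here is a minimal sketch. It is not the code from the linked post: the word lists, the `polarity` function, and the single negation rule are all assumptions chosen for illustration.

```python
import re

# Hypothetical toy lexicons; a real project would use much larger lists.
POSITIVE = {"nice", "good", "great"}
NEGATIVE = {"bad", "uninspired", "expensive", "avoid"}
NEGATIONS = {"not", "never", "no"}


def polarity(text):
    """Return a signed score: above zero leans positive, below zero negative."""
    score = 0
    negate = False
    for word in re.findall(r"[a-z']+", text.lower()):
        if word in NEGATIONS:
            negate = True  # flip the polarity of the next word only
            continue
        value = (word in POSITIVE) - (word in NEGATIVE)
        if negate:
            value = -value
        negate = False
        score += value
    return score
```

Under these assumptions, "not bad" counts as mildly positive because the negation flips the sign of the word that follows it, and a phrase such as "very uninspired food and too expensive" scores below zero.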
RankLib is a library of learning to rank algorithms
“RankLib is a library of learning to rank algorithms. Currently eight popular algorithms have been implemented:
- MART (Multiple Additive Regression Trees, a.k.a. Gradient boosted regression tree) [6]
- RankNet [1]
- RankBoost [2]
- AdaRank [3]
- Coordinate Ascent [4]
- LambdaMART [5]
- ListNet [7]
- Random Forests [8]
- With appropriate parameters for Random Forests, it can also bag several MART/LambdaMART rankers.
It also implements many retrieval metrics as well as provides many ways to carry out evaluation…”
How I made $500k with machine learning and HFT (high frequency trading)
“This post will detail what I did to make approx. 500k from high frequency trading from 2009 to 2010. Since I was trading completely independently and am no longer running my program I’m happy to tell all. My trading was mostly in Russell 2000 and DAX futures contracts.
The key to my success, I believe, was not in a sophisticated financial equation but rather in the overall algorithm design which tied together many simple components and used machine learning to optimize for maximum profitability. You won’t need to know any sophisticated terminology here because when I set up my program it was all based on intuition. (Andrew Ng’s amazing machine learning course was not yet available – btw if you click that link you’ll be taken to my current project: CourseTalk, a review site for MOOCs)
First, I just want to demonstrate that my success was not simply the result of luck. My program made 1000-4000 trades per day (half long, half short) and never got into positions of more than a few contracts at a time. This meant the random luck from any one particular trade averaged out pretty fast. The result was I never lost more than $2000 in one day and never had a losing month…”
http://jspauld.com/post/35126549635/how-i-made-500k-with-machine-learning-and-hft
Using python and k-means to find the dominant colors in images
“I’m working on a little photography website for my Dad and thought it would be neat to extract color information from photographs. I tried a couple of different approaches before finding one that works pretty well. This approach uses k-means clustering to cluster the pixels into groups based on their color. The centers of those resulting clusters are then the “dominant” colors. k-means is a great fit for this problem because it is (usually) fast. It has the caveat of requiring you to specify up-front how many clusters you want; I found that it works well when I specified around 3…”
http://charlesleifer.com/blog/using-python-and-k-means-to-find-the-dominant-colors-in-images/
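The approach described in the excerpt can be sketched in a few lines of plain Python. This is an illustrative sketch, not the linked post's implementation: the `dominant_colors` function, its parameters, and the choice to seed the centers from randomly sampled pixels are assumptions, and the pixels are assumed to arrive as a list of `(r, g, b)` tuples.

```python
import random


def _nearest(pixel, centers):
    """Index of the center closest to pixel (squared Euclidean distance in RGB)."""
    return min(
        range(len(centers)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(pixel, centers[i])),
    )


def dominant_colors(pixels, k=3, iterations=20, seed=0):
    """Run plain k-means on RGB tuples; return the centers, largest cluster first."""
    rng = random.Random(seed)
    centers = rng.sample(pixels, k)  # assumption: start from k sampled pixels
    for _ in range(iterations):
        # Assignment step: group every pixel with its nearest center.
        clusters = [[] for _ in range(k)]
        for p in pixels:
            clusters[_nearest(p, centers)].append(p)
        # Update step: move each center to the mean of its cluster
        # (an empty cluster keeps its previous center).
        new_centers = [
            tuple(sum(ch) / len(members) for ch in zip(*members)) if members else centers[i]
            for i, members in enumerate(clusters)
        ]
        if new_centers == centers:
            break  # converged
        centers = new_centers
    # Rank the centers by how many pixels each one attracts.
    sizes = [0] * k
    for p in pixels:
        sizes[_nearest(p, centers)] += 1
    order = sorted(range(k), key=lambda i: sizes[i], reverse=True)
    return [centers[i] for i in order]
```

For a real photograph the pixel list would come from an image library, and it usually pays to shrink the image first, since this pure-Python loop visits every pixel on every iteration.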
Bayesian Reasoning and Machine Learning
“The book is now available in hardcopy from Cambridge University Press”