- Speaker: Rich Hickey
- Conference: Clojure/Conj 2017 – Oct 2017
- Video: https://www.youtube.com/watch?v=2V1FtfBDsLU
Bounter is a Python library, written in C, for extremely fast probabilistic counting of item frequencies in massive datasets, using only a small fixed memory footprint.
Bounter lets you count how many times an item appears, similar to Python’s built-in Counter:

```python
from bounter import bounter

counts = bounter(size_mb=1024)  # use at most 1 GB of RAM
counts.update([u'a', 'few', u'words', u'a', u'few', u'times'])  # count item frequencies
print(counts[u'few'])  # query the counts: prints 2
```

Unlike Counter, Bounter can process huge collections where the items would not even fit in RAM. This commonly happens in Machine Learning and NLP, with tasks like dictionary building or collocation detection that need to estimate counts of billions of items (token ngrams) for their statistical scoring and subsequent filtering.
Bounter implements approximative algorithms using optimized low-level C structures to avoid the overhead of Python objects. It lets you specify the maximum amount of RAM you want to use. In the Wikipedia example below, Bounter uses 31x less memory compared to Counter.
Bounter is also marginally faster than the built-in Counter, so wherever you can represent your items as strings (both byte-strings and unicode are fine, and Bounter works in both Python 2 and Python 3), there’s no reason not to use Bounter instead.
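To see why a fixed memory budget can still yield useful counts, here is a toy pure-Python illustration of the count-min sketch idea that this style of probabilistic counter is built on (Bounter’s real implementation is optimized C and differs in detail; the class and parameters below are made up for illustration):

```python
import hashlib

class TinyCountMin:
    """A toy count-min sketch: fixed memory, counts are never under-estimated."""

    def __init__(self, depth=4, width=1000):
        self.depth = depth          # number of independent hash rows
        self.width = width          # cells per row (fixed memory footprint)
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, item, row):
        # Derive a per-row hash by salting the item with the row number.
        h = hashlib.md5(f"{row}:{item}".encode("utf8")).hexdigest()
        return int(h, 16) % self.width

    def update(self, items):
        for item in items:
            for row in range(self.depth):
                self.table[row][self._index(item, row)] += 1

    def __getitem__(self, item):
        # Taking the minimum across rows bounds the over-count from collisions.
        return min(self.table[row][self._index(item, row)]
                   for row in range(self.depth))

cms = TinyCountMin()
cms.update(['a', 'few', 'words', 'a', 'few', 'times'])
print(cms['few'])  # 2 on this tiny input (collisions only ever over-count)
```

The trade-off is exactly the one Bounter exposes: a bigger table means fewer collisions and tighter estimates, but the memory cost is fixed up front regardless of how many items stream through.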
Ethereum is a decentralized platform that runs smart contracts: applications that run exactly as programmed, without the possibility of downtime, censorship, fraud or third-party interference. In this blog post I will take you through all the steps required to set up a fully functioning private Ethereum blockchain inside your local network, which includes:
- Setting up a private blockchain with Ethereum using geth.
- Setting up the MetaMask Ethereum wallet to work with the private blockchain.
- Transferring funds between multiple accounts.
- Creating, deploying and invoking a smart contract on the private blockchain using Remix.
- Setting up an Ethereum block explorer over the private blockchain.
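The first step in that list typically starts from a custom genesis file. A minimal sketch of what such a file can look like (the chainId value is a placeholder, and the exact set of fields varies by geth version):

```json
{
  "config": {
    "chainId": 4321,
    "homesteadBlock": 0,
    "eip155Block": 0,
    "eip158Block": 0
  },
  "difficulty": "0x400",
  "gasLimit": "0x8000000",
  "alloc": {}
}
```

A chain would then be initialized with something like `geth --datadir ./chaindata init genesis.json` and started with `geth --datadir ./chaindata --networkid 4321 console`, keeping the network id distinct from the public networks so nodes don’t accidentally peer with mainnet.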
Many companies are suffering data breaches because attackers gain access to data in AWS S3 buckets. I don’t want to repeat all the news articles outlining all the S3 data breaches. A Google search will give many examples, and it seems that by the time I finish writing this, another one will be in the news. Instead, I’d like to jump to why these S3 bucket breaches are happening and how to securely store data in an S3 bucket.
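As one concrete hardening measure (an illustrative sketch, not taken from the article itself): a bucket policy that denies any request made over plain HTTP, so data can only move over TLS. The bucket name is a placeholder:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "DenyInsecureTransport",
      "Effect": "Deny",
      "Principal": "*",
      "Action": "s3:*",
      "Resource": [
        "arn:aws:s3:::my-example-bucket",
        "arn:aws:s3:::my-example-bucket/*"
      ],
      "Condition": {
        "Bool": { "aws:SecureTransport": "false" }
      }
    }
  ]
}
```

Policies like this are defense in depth; the breaches in the news are mostly about buckets left publicly readable, which is a separate setting (ACLs and public-access configuration) worth auditing on its own.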
Blockchains: kind of a big deal. You’d have to be a total square not to have heard about them. Me? I’ve got eight.
Often over-complicated, over-mysticised, over-singularised (I don’t even know what the right word for it is, but people say The Blockchain a lot). What are they? Join me for a rough tour from the ground up and I’ll try to make sure you leave here knowing the answer to one question:
What are people talking about when they talk about blockchains?
There’s a lot to cover, so it’s actually going to come in two parts. This, the first, will look at the data structures known as blockchains and their properties, along with any other bits and pieces you need to make sense of them.
The second part will apply what you’ve learnt to the practical and widespread applications of blockchains to power distributed ledgers, cryptocurrencies such as Bitcoin and Litecoin, and smart-contract-based chains like Ethereum.
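The data-structure half of the story can be sketched in a few lines. This is a toy model assuming nothing beyond hash-linking (real chains add consensus, signatures, Merkle trees and much more), but it shows the one property everything else builds on: altering any block invalidates every block after it.

```python
import hashlib
import json

def block_hash(block):
    # Hash the block's contents deterministically.
    payload = json.dumps(block, sort_keys=True).encode("utf8")
    return hashlib.sha256(payload).hexdigest()

def make_block(data, prev):
    # Each block stores the hash of its predecessor (genesis stores zeros).
    return {"data": data, "prev_hash": block_hash(prev) if prev else "0" * 64}

def verify(chain):
    # Valid iff every block references the hash of the block before it.
    return all(chain[i]["prev_hash"] == block_hash(chain[i - 1])
               for i in range(1, len(chain)))

chain = [make_block("genesis", None)]
chain.append(make_block("alice pays bob 5", chain[-1]))
chain.append(make_block("bob pays carol 2", chain[-1]))

print(verify(chain))                       # True
chain[1]["data"] = "alice pays bob 500"    # tamper with history
print(verify(chain))                       # False: block 2's stored hash no longer matches
```

Nothing here is distributed yet; distribution, and how a network agrees on which chain is the real one, is what the second part’s ledger and cryptocurrency material is about.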
In machine learning, computers apply statistical learning techniques to automatically identify patterns in data. These techniques can be used to make highly accurate predictions. Using a data set about homes, we will create a machine learning model to distinguish homes in New York from homes in San Francisco.
First, some intuition
Let’s say you had to determine whether a home is in San Francisco or in New York. In machine learning terms, categorizing data points is a classification task. Since San Francisco is relatively hilly, the elevation of a home may be a good way to distinguish the two cities. Based on the home-elevation data to the right, you could argue that a home above 240 ft should be classified as one in San Francisco.
Adding another dimension allows for more nuance. For example, New York apartments can be extremely expensive per square foot. So visualizing elevation and price per square foot in a scatterplot helps us distinguish lower-elevation homes. The data suggests that, among homes at or below 240 ft, those that cost more than $1776 per square foot are in New York City. Dimensions in a data set are called features, predictors, or variables.
You can visualize your elevation (>240 ft) and price per square foot (>$1776) observations as the boundaries of regions in your scatterplot. Homes plotted in the green and blue regions would be in San Francisco and New York, respectively.
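Those two hand-picked boundaries can be written down as a tiny rule-based classifier. The thresholds come straight from the text; the function itself is a hypothetical helper for illustration, not the learned model the article goes on to build from all seven features:

```python
def classify(elevation_ft, price_per_sqft):
    """Classify a home using the two boundaries described above."""
    if elevation_ft > 240:
        return "San Francisco"   # hilly SF: high elevation is a strong signal
    if price_per_sqft > 1776:
        return "New York"        # low elevation, but very expensive per sq ft
    return "unknown"             # needs more features to decide

print(classify(300, 900))    # San Francisco
print(classify(10, 2500))    # New York
print(classify(10, 800))     # unknown
```

The "unknown" branch is exactly the gap the next section points at: low-elevation, moderately priced homes can’t be separated by these two features alone.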
Identifying boundaries in data using math is the essence of statistical learning. Of course, you’ll need additional information to distinguish homes with lower elevations and lower per-square-foot prices. The dataset we are using to create the model has 7 different dimensions. Creating a model is also known as training a model. On the right, we are visualizing the variables in a scatterplot matrix to show the relationships between each pair of dimensions.
There are clearly patterns in the data, but the boundaries for delineating them are not obvious.
And now, machine learning
Finding patterns in data is where machine learning comes in. Machine learning methods use statistical learning to identify boundaries. One example of a machine learning method is a decision tree. Decision trees look at one variable at a time and are a reasonably accessible (though rudimentary) machine learning method.
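The one-variable-at-a-time splitting a decision tree performs can be sketched directly: try each candidate threshold and keep the one with the lowest weighted Gini impurity. This is a simplified illustration of the technique, not the article’s code, and the toy elevations below are made up:

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels (0.0 means a pure group)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)   # fraction of class 1
    return 2 * p * (1 - p)

def best_split(values, labels):
    """Return (threshold, weighted impurity) of the best split on one variable."""
    best = (None, float("inf"))
    for threshold in sorted(set(values)):
        left = [l for v, l in zip(values, labels) if v <= threshold]
        right = [l for v, l in zip(values, labels) if v > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best[1]:
            best = (threshold, score)
    return best

# toy elevations (ft); 1 = San Francisco, 0 = New York
elevations = [10, 25, 40, 250, 300, 410]
labels     = [0,  0,  0,  1,   1,   1]
threshold, impurity = best_split(elevations, labels)
print(threshold, impurity)   # 40 0.0 — a clean split on this toy data
```

A real tree repeats this search across every feature, picks the overall best split, and then recurses into each side, which is where the "growing a tree" and "best split" sections of the full article pick up.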
What you will find in the full article:
- Finding better boundaries
- Your first fork
- The best split
- Growing a tree
- Making predictions
- Reality check
To check out all this information, and play with a few cool interactive visualizations, see the full article.
The Netflix API is based on a dynamic scripting platform that handles thousands of changes per day. This platform allows our client developers to create a customized API experience on over a thousand device types by executing server-side adapter code in response to HTTP requests. Developers are only responsible for the adapter code they write; they do not have to worry about infrastructure concerns related to server management and operations. To these developers, the scripting platform, in effect, provides an experience similar to that offered by serverless or FaaS platforms. It is important to note that the similarities are limited to the developer experience (DevEx); the runtime is a custom implementation that is not designed to support general-purpose serverless use cases. A few years of developing and operating this platform for a diverse set of developers have yielded several DevEx learnings for us…
In Part 1 of this series, we outlined key learnings the Edge Developer Experience team gained from operating the API dynamic scripting platform, which provides a serverless- or FaaS-like experience for client application developers. We addressed the concerns around getting code ready for production deployment. Here, we look at what it takes to deploy it safely and operate it on an ongoing basis…