A grammar of graphics for Python

plotnine is an implementation of a grammar of graphics in Python, it is based on ggplot2. The grammar allows users to compose plots by explicitly mapping data to the visual objects that make up the plot.

Plotting with a grammar is powerful, it makes custom (and otherwise complex) plots are easy to think about and then create, while the simple plots remain simple.

To find out about all building blocks that you can use to create a plot, check out the documentation. Since plotnine has an API similar to ggplot2, where we lack in coverage theggplot2 documentation may be of some help.

https://github.com/has2k1/plotnine

Advertisements

Titan – Distributed Graph Database

Titan is a scalable graph database optimized for storing and querying graphs containing hundreds of billions of vertices and edges distributed across a multi-machine cluster. Titan is a transactional database that can support thousands of concurrent users executing complex graph traversalsin real time.

In addition, Titan provides the following features:

Download Titan or clone from GitHub. Read the Titan documentation and join the mailing list.

http://titan.thinkaurelius.com/

Sigma is a JavaScript library dedicated to graph drawing

It makes easy to publish networks on Web pages, and allows developers to integrate network exploration in rich Web applications.

The default configuration of sigma deals with mouse and touch support, refreshing and rescaling when the container’s size changes, rendering on WebGL if the browser supports it and Canvas else, recentering the graph and adapting the nodes and edges sizes to the screen…

Sigma provides a lot of different settings to make it easy to customize how to draw and interact with networks. And you can also directly add your own functions to your scripts to render nodes and edges the exact way you want.

Sigma is a rendering engine, and it is up to you to add all the interactivity you want. The public API makes it possible to modify the data, move the camera, refresh the rendering, listen to events…

http://sigmajs.org/

A* search algorithm

“A* search is an informed search algorithm used for path-finding and graph traversal. It combines the advantages of both Dijkstra’s algorithm (in that it can find a shortest path) and Greedy Best-First-Search (in that it can use a heuristic to guide search). It combines the information that Dijkstra’s algorithm uses (favoring vertices that are close to the starting point) and information that Best-First-Search uses (favoring vertices that are closer to the goal).

Let g(n) represent the exact cost of the path from the starting point to any vertex n, and h(n) represent the heuristic estimated cost from vertex n to the goal. Dijkstra’s algorithms builds a priority queue of nodes ordered on their g(n) values. Best-First-Search employs a priority queue of nodes ordered on h(n) values. A* balances the two as it moves from the starting point to the goal. It builds a priority queue of nodes ordered on f(n) = g(n) + h(n) which is the total estimated path cost through the node n…”

https://kartikkukreja.wordpress.com/2013/12/28/a-star-search/

One Trillion Edges: Graph Processing at Facebook-Scale

“Analyzing large graphs provides valuable insights for social networking and web companies in content ranking and recommendations. While numerous graph processing systems have been developed and evaluated on available benchmark graphs of up to 6.6B edges, they often face significant dif- ficulties in scaling to much larger graphs. Industry graphs can be two orders of magnitude larger – hundreds of billions or up to one trillion edges. In addition to scalability challenges, real world applications often require much more complex graph processing workflows than previously evaluated. In this paper, we describe the usability, performance, and scalability improvements we made to Apache Giraph, an open-source graph processing system, in order to use it on Facebook-scale graphs of up to one trillion edges. We also describe several key extensions to the original Pregel model that make it possible to develop a broader range of production graph applications and workflows as well as improve code reuse. Finally, we report on real-world operations as well as performance characteristics of several large-scale production applications…”

http://www.vldb.org/pvldb/vol8/p1804-ching.pdf

file: p1804-ching

The Traveling Tesla Salesman

“The Traveling Salesman Problem (TSP) is quite an interesting math problem. It simply asks: Given a list of cities and the distances between them, what is the shortest possible path that visits each city exactly once and returns to the origin city?

It is a very simple problem to describe and yet very difficult to solve. TSP is known to be NP-hard and a brute-force solution can be incredibly expensive computationally. Even with just 200 cities, with the brute-force method you have this many possible permutations to check:…”

http://mortada.net/drafts/the-traveling-tesla-salesman.html