Search

thoughts…

https://about.me/irocha

Microservices at Netflix Scale: Principles, Tradeoffs & Lessons Learned

Open Sourcing a Deep Learning Solution for Detecting NSFW Images

Automatically identifying that an image is not suitable/safe for work (NSFW), including offensive and adult images, is an important problem which researchers have been trying to tackle for decades. Since images and user-generated content dominate the Internet today, filtering NSFW images becomes an essential component of Web and mobile applications. With the evolution of computer vision, improved training data, and deep learning algorithms, computers are now able to automatically classify NSFW image content with greater precision.

Defining NSFW material is subjective and the task of identifying these images is non-trivial. Moreover, what may be objectionable in one context can be suitable in another. For this reason, the model we describe below focuses only on one type of NSFW content: pornographic images. The identification of NSFW sketches, cartoons, text, images of graphic violence, or other types of unsuitable content is not addressed with this model.

To the best of our knowledge, there is no open source model or algorithm for identifying NSFW images. In the spirit of collaboration and with the hope of advancing this endeavor, we are releasing our deep learning model that will allow developers to experiment with a classifier for NSFW detection, and provide feedback to us on ways to improve the classifier.

Our general purpose Caffe deep neural network model (Github code) takes an image as input and outputs a probability (i.e a score between 0-1) which can be used to detect and filter NSFW images. Developers can use this score to filter images below a certain suitable threshold based on a ROC curve for specific use-cases, or use this signal to rank images in search results.

https://yahooeng.tumblr.com/post/151148689421/open-sourcing-a-deep-learning-solution-for

https://github.com/yahoo/open_nsfw

Audio Fingerprinting with Python and Numpy

The first day I tried out Shazam, I was blown away. Next to GPS and surviving the fall down a flight of stairs, being able to recognize a song from a vast corpus of audio was the most incredible thing I’d ever seen my phone do. This recognition works though a process called audio fingerprinting. Examples include:

After a few weekends of puzzling through academic papers and writing code, I came up with the Dejavu Project, an open-source audio fingerprinting project in Python. You can see it here on Github:

https://github.com/worldveil/dejavu

On my testing dataset, Dejavu exhibits 100% recall when reading an unknown wave file from disk or listening to a recording for at least 5 seconds.

Following is all the knowledge you need to understand audio fingerprinting and recognition, starting from the basics. Those with signals experience should skip to “Peak Finding”.

http://willdrevo.com/fingerprinting-and-audio-recognition-with-python/

What I Wish I Had Known Before Scaling Uber to 1000 Services

Kubernetes 1.4: Making it easy to run on Kubernetes anywhere

What’s new:

Cluster creation with two commands – To get started with Kubernetes a user must provision nodes, install Kubernetes and bootstrap the cluster. A common request from users is to have an easy, portable way to do this on any cloud (public, private, or bare metal).

  • Kubernetes 1.4 introduces ‘kubeadm’ which reduces bootstrapping to two commands, with no complex scripts involved. Once kubernetes is installed, kubeadm init starts the master while kubeadm join joins the nodes to the cluster.
  • Installation is also streamlined by packaging Kubernetes with its dependencies, for most major Linux distributions including Red Hat and Ubuntu Xenial. This means users can now install Kubernetes using familiar tools such as apt-get and yum.
  • Add-on deployments, such as for an overlay network, can be reduced to one command by using a DaemonSet.
  • Enabling this simplicity is a new certificates API and its use for kubelet TLS bootstrap, as well as a new discovery API.

Expanded stateful application support – While cloud-native applications are built to run in containers, many existing applications need additional features to make it easy to adopt containers. Most commonly, these include stateful applications such as batch processing, databases and key-value stores. In Kubernetes 1.4, we have introduced a number of features simplifying the deployment of such applications, including:

  • ScheduledJob is introduced as Alpha so users can run batch jobs at regular intervals.
  • Init-containers are Beta, addressing the need to run one or more containers before starting the main application, for example to sequence dependencies when starting a database or multi-tier app.
  • Dynamic PVC Provisioning moved to Beta. This feature now enables cluster administrators to expose multiple storage provisioners and allows users to select them using a new Storage Class API object.
  • Curated and pre-tested Helm charts for common stateful applications such as MariaDB, MySQL and Jenkins will be available for one-command launches using version 2 of the Helm Package Manager.

Cluster federation API additions – One of the most requested capabilities from our global customers has been the ability to build applications with clusters that span regions and clouds.

  • Federated Replica Sets Beta – replicas can now span some or all clusters enabling cross region or cross cloud replication. The total federated replica count and relative cluster weights / replica counts are continually reconciled by a federated replica-set controller to ensure you have the pods you need in each region / cloud.
  • Federated Services are now Beta, and secrets, events and namespaces have also been added to the federation API.
  • Federated Ingress Alpha – starting with Google Cloud Platform (GCP), users can create a single L7 globally load balanced VIP that spans services deployed across a federation of clusters within GCP. With Federated Ingress in GCP, external clients point to a single IP address and are sent to the closest cluster with usable capacity in any region or zone of the federation in GCP.

Container security support – Administrators of multi-tenant clusters require the ability to provide varying sets of permissions among tenants, infrastructure components, and end users of the system.

  • Pod Security Policy is a new object that enables cluster administrators to control the creation and validation of security contexts for pods/containers. Admins can associate service accounts, groups, and users with a set of constraints to define a security context.
  • AppArmor support is added, enabling admins to run a more secure deployment, and provide better auditing and monitoring of their systems. Users can configure a container to run in an AppArmor profile by setting a single field.

Infrastructure enhancements – We continue adding to the scheduler, storage and client capabilities in Kubernetes based on user and ecosystem needs.

  • Scheduler – introducing inter-pod affinity and anti-affinity Alpha for users who want to customize how Kubernetes co-locates or spreads their pods. Also priority scheduling capability for cluster add-ons such as DNS, Heapster, and the Kube Dashboard.
  • Disruption SLOs – Pod Disruption Budget is introduced to limit impact of pods deleted by cluster management operations (such as node upgrade) at any one time.
  • Storage – New volume plugins for Quobyte and Azure Data Disk have been added.
  • Clients – Swagger 2.0 support is added, enabling non-Go clients.

Kubernetes Dashboard UI – lastly, a great looking Kubernetes Dashboard UI with 90% CLI parity for at-a-glance management.

For a complete list of updates see the release notes on GitHub. Apart from features the most impressive aspect of Kubernetes development is the community of contributors. This is particularly true of the 1.4 release, the full breadth of which will unfold in upcoming weeks.

http://blog.kubernetes.io/2016/09/kubernetes-1.4-making-it-easy-to-run-on-kuberentes-anywhere.html

Ultra fast implementation of asyncio event loop on top of libuv

uvloop is a fast, drop-in replacement of the built-in asyncio event loop. uvloop is implemented in Cython and uses libuv under the hood.

The project documentation can be found here. Please also check out the wiki.

Performance

uvloop makes asyncio 2-4x faster.

performance.png

The above chart shows the performance of an echo server with different message sizes. The sockets benchmark usesloop.sock_recv() and loop.sock_sendall() methods; the streams benchmark uses asyncio high-level streams, created by the asyncio.start_server() function; and the protocol benchmark uses loop.create_server() with a simple echo protocol. Read more about uvloop performance.

https://github.com/magicstack/uvloop

Python by the C side

All the world is legacy code, and there is always another, lower layer to peel away. These realities cause developers around the world to go on regular pilgrimage, from the terra firma of Python to the coasts of C. From zlib to SQLite to OpenSSL, whether pursuing speed, efficiency, or features, the waters are powerful, and often choppy. The good news is, when you’re writing Python, C interactions can be a day at the beach.

Python by the C side

Realtime notification delivery using rabbitmq, Tornado and websocket

For this hack-off, my goal is to write a RabbitMQ consumer that reads the messages off the message queue, and deliver (notify) them to the front-end using websocket.

I’ve heard good things about Tornado. Having read their docs on websocket request handler, I felt it’s straightforward enough for me, so I chose Tornado as my backend.

https://reminiscential.wordpress.com/2012/04/07/realtime-notification-delivery-using-rabbitmq-tornado-and-websocket/

curl statistics made simple

httpstat is a single file Python script that has no dependency and is compatible with Python 3.

https://github.com/reorx/httpstat

 

Blog at WordPress.com.

Up ↑