tiny-dnn is a C++11 implementation of deep learning

  • reasonably fast, without GPU
    • with TBB threading and SSE/AVX vectorization
    • 98.8% accuracy on MNIST in 13 minutes training (@Core i7-3520M)
  • portable & header-only
    • Run anywhere as long as you have a compiler which supports C++11
    • Just include tiny_dnn.h and write your model in C++. There is nothing to install. (A minimal sketch follows this list.)
  • easy to integrate with real applications
    • no output to stdout/stderr
    • a constant throughput (simple parallelization model, no garbage collection)
    • works without throwing exceptions
    • can import Caffe models
  • simply implemented
    • a good library for learning neural networks
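
For a concrete picture of the "just include tiny_dnn.h" workflow, here is a minimal, hedged sketch of a small multilayer perceptron. The layer aliases (fc), activations (tan_h, softmax), optimizer (adagrad), and the train<cross_entropy>() call follow the project's README at the time; treat it as an illustration and check the examples link below for the current API.

    // Minimal tiny-dnn sketch: a 2-32-2 MLP trained on a toy XOR-style dataset.
    #include "tiny_dnn/tiny_dnn.h"
    #include <vector>

    int main() {
      using namespace tiny_dnn;
      using namespace tiny_dnn::activation;
      using namespace tiny_dnn::layers;

      // Build the network by streaming layers into it.
      network<sequential> net;
      net << fc<tan_h>(2, 32) << fc<softmax>(32, 2);

      // Toy dataset: inputs (vec_t) and integer class labels (label_t).
      std::vector<vec_t>   x = {{0, 0}, {0, 1}, {1, 0}, {1, 1}};
      std::vector<label_t> y = {0, 1, 1, 0};

      adagrad opt;
      net.train<cross_entropy>(opt, x, y, /*batch_size=*/4, /*epochs=*/100);

      vec_t out = net.predict({1, 0});  // forward pass on one sample
      return out.empty();               // 0 on success
    }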

https://github.com/tiny-dnn/tiny-dnn#examples

AWS Storage Gateway provides a file interface to objects in your Amazon S3 buckets

AWS Storage Gateway now provides a virtual on-premises file server, which enables you to store and retrieve Amazon S3 objects through standard file storage protocols. With file gateway, existing applications or devices can use secure and durable cloud storage without needing to be modified. File gateway simplifies moving data into S3 for in-cloud workloads, provides cost-effective storage for backup and archive, or expands your on-premises storage into the cloud.

File gateway is available as a virtual machine image which you download from the AWS Management Console. Once deployed in your data center and associated with your AWS account, your configured S3 buckets will be available as Network File System (NFS) mount points. Your applications read and write files and directories over NFS, interfacing to the gateway as a file server. In turn, the gateway translates these file operations into object requests on your S3 buckets. Like existing volume and tape gateways, your most recently used data is cached on the gateway for low-latency access, and data transfer between your data center and AWS is fully managed and optimized by the gateway. Once in S3, you can access the objects directly or manage them using features such as S3 Lifecycle Policies, object versioning, and cross-region replication.
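
Because the gateway exposes a bucket as an ordinary NFS mount, applications need no S3-specific code. The fragment below is a hypothetical illustration (the mount point /mnt/s3-backup and the file path are assumptions, not part of the announcement): a plain file write that the gateway then uploads as an S3 object with a matching key.

    // Hypothetical example: writing through a file-gateway NFS mount.
    #include <fstream>

    int main() {
      // /mnt/s3-backup is an assumed mount point for a bucket exported by the gateway.
      std::ofstream report("/mnt/s3-backup/reports/2016-11-30.csv");
      report << "region,revenue\nus-east-1,1234\n";
      // Once the gateway uploads it, the data is addressable as the object
      // reports/2016-11-30.csv in the backing bucket, where lifecycle policies,
      // versioning, and cross-region replication apply.
      return report.good() ? 0 : 1;
    }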

There are no up-front commitments required, and you pay only for what you use. To learn more, see the announcement linked below.

https://aws.amazon.com/about-aws/whats-new/2016/11/aws-storage-gateway-provides-a-file-interface-to-objects-in-your-amazon-s3-buckets/

Dockerizing MySQL at Uber Engineering

Uber Engineering’s Schemaless storage system powers some of the biggest services at Uber, such as Mezzanine. Schemaless is a scalable and highly available datastore on top of MySQL clusters. Managing these clusters was fairly easy when we had 16 clusters. These days, we have more than 1,000 clusters containing more than 4,000 database servers, and that requires a different class of tooling.

Initially, all our clusters were managed by Puppet, a lot of ad hoc scripts, and manual operations that couldn’t scale at Uber’s pace. When we began looking for a better way to manage our increasing number of MySQL clusters, we had a few basic requirements:

  • Run multiple database processes on each host
  • Automate everything
  • Have a single entry point to manage and monitor all clusters across all data centers

The solution we came up with is a design called Schemadock. We run MySQL in Docker containers, which are managed by goal states that define cluster topologies in configuration files. Cluster topologies specify how MySQL clusters should look; for example, that there should be a Cluster A with 3 databases, and which one should be the master. Agents then apply those topologies on the individual databases. A centralized service maintains and monitors the goal state for each instance and reacts to any deviations.
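
The article does not show what a goal-state topology looks like, but the idea is easy to sketch. The struct below is purely illustrative (the type and field names are assumptions, and Uber's actual configuration format is not described here): it captures the kind of desired state an agent would converge each host toward.

    // Illustrative only: the kind of data a goal-state cluster topology carries.
    #include <string>
    #include <vector>

    struct DatabaseGoal {
      std::string host;
      int         port;
      bool        is_master;  // exactly one master per cluster
    };

    struct ClusterTopology {
      std::string               name;
      std::vector<DatabaseGoal> databases;  // desired state, not observed state
    };

    int main() {
      // "Cluster A with 3 databases, and which one should be the master":
      ClusterTopology cluster_a{"cluster-a",
                                {{"db1.example", 3306, true},
                                 {"db2.example", 3306, false},
                                 {"db3.example", 3306, false}}};
      // An agent would compare the running MySQL containers on each host against
      // this goal state and start, stop, or reconfigure containers to match it.
      return cluster_a.databases.size() == 3 ? 0 : 1;
    }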

Schemadock has many components, and Docker is a small but significant one. Switching to a more scalable solution has been a momentous effort, and this article explains how Docker helped us get here.

https://eng.uber.com/dockerizing-mysql/

How do I develop Trading Systems?

Often, trading model developers “spoil” the eventual results of their model by making errors early in the process, such as using poorly collected data, not accounting for survivorship bias, or testing too many specifications of a similar model. Data snooping of this kind is particularly costly because it is an error that cannot be reversed. Therefore, if you have not yet begun the active work of acquiring data, specifying a model, or backtesting, you still have the opportunity to conduct the testing and development process optimally.

https://www.linkedin.com/pulse/how-do-i-develop-trading-systems-ariel

Binary Data Now Supported by API Gateway

You can now send and receive binary data through API endpoints hosted on Amazon API Gateway. Binary data can either pass through directly or be converted to Base64-encoded text.

You can specify media types (e.g., image/png, application/octet-stream) to treat as binary through the API Gateway console or APIs. Web or mobile clients can then use standard HTTP headers (Content-Type and Accept) to declare what they are sending or expecting to receive. You can also configure API Gateway to pass the API integration request and response body through unchanged, convert it to text (Base64 encoding), or convert it to binary (Base64 decoding).
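
As a hedged sketch of the client side, the fragment below uses libcurl to POST raw PNG bytes and ask for binary back, declaring both with the standard Content-Type and Accept headers. The endpoint URL and payload are placeholders, and the API is assumed to have image/png registered as a binary media type.

    // Hypothetical client: send binary (image/png) to an API Gateway endpoint.
    #include <curl/curl.h>
    #include <vector>

    int main() {
      curl_global_init(CURL_GLOBAL_DEFAULT);
      CURL* curl = curl_easy_init();
      if (!curl) return 1;

      // Placeholder payload; in practice this would be a real PNG file's bytes.
      std::vector<unsigned char> body = {0x89, 'P', 'N', 'G'};

      curl_slist* headers = nullptr;
      headers = curl_slist_append(headers, "Content-Type: image/png");  // what we send
      headers = curl_slist_append(headers, "Accept: image/png");        // what we expect back

      curl_easy_setopt(curl, CURLOPT_URL,
                       "https://example.execute-api.us-east-1.amazonaws.com/prod/thumbnail");
      curl_easy_setopt(curl, CURLOPT_HTTPHEADER, headers);
      curl_easy_setopt(curl, CURLOPT_POSTFIELDS,
                       reinterpret_cast<const char*>(body.data()));
      curl_easy_setopt(curl, CURLOPT_POSTFIELDSIZE, static_cast<long>(body.size()));

      CURLcode res = curl_easy_perform(curl);

      curl_slist_free_all(headers);
      curl_easy_cleanup(curl);
      curl_global_cleanup();
      return res == CURLE_OK ? 0 : 1;
    }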

MXNet – Deep Learning Framework of Choice at AWS

Machine learning is playing an increasingly important role in many areas of our businesses and our lives and is being employed in a range of computing tasks where programming explicit algorithms is infeasible.

At Amazon, machine learning has been key to many of our business processes, from recommendations to fraud detection, from inventory levels to book classification to abusive review detection. And there are many more application areas where we use machine learning extensively: search, autonomous drones, robotics in fulfillment centers, text and speech recognition, etc.

Among machine learning algorithms, a class of algorithms called deep learning has come to represent those algorithms that can absorb huge volumes of data and learn elegant and useful patterns within that data: faces inside photos, the meaning of a text, or the intent of a spoken word. A set of programming models has emerged to help developers define and train AI models with deep learning, along with open source frameworks that put deep learning in the hands of mere mortals. Some examples of popular deep learning frameworks that we support on AWS include Caffe, CNTK, MXNet, TensorFlow, Theano, and Torch.

Among all these popular frameworks, we have concluded that MXNet is the most scalable framework. We believe that the AI community would benefit from putting more effort behind MXNet. Today, we are announcing that MXNet will be our deep learning framework of choice. AWS will contribute code and improved documentation as well as invest in the ecosystem around MXNet. We will partner with other organizations to further advance MXNet.

http://www.allthingsdistributed.com/2016/11/mxnet-default-framework-deep-learning-aws.html

Tutorial – Write a System Call

A while back, I wrote about writing a shell in C, a task which lets you peek under the covers of a tool you use daily. Underneath even a simple shell are many operating system calls, like read, fork, exec, wait, write, and chdir (to name a few). Now, it’s time to continue this journey down another level, and learn just how these system calls are implemented in Linux.

What is a system call?

Before we start implementing system calls, we’d better make sure we understand exactly what they are. A naive programmer—like me not that long ago—might define a system call as any function provided by the C library. But this isn’t quite true. Although many functions in the C library align nicely with system calls (like chdir), other ones do quite a bit more than simply ask the operating system to do something (such as fork or fprintf). Still others simply provide programming functionality without using the operating system, such as qsort and strtok.

In fact, a system call has a very specific definition. It is a way of requesting that the operating system kernel do something on your behalf. Operations like tokenizing a string don’t require interacting with the kernel, but anything involving devices, files, or processes definitely does.

System calls also behave differently under the hood than a normal function. Rather than simply jumping to some code from your program or a library, your program has to ask the CPU to switch into kernel mode, and then go to a predefined location within the kernel to handle your system call. This can be done in a few different ways, such as a processor interrupt, or special instructions such as syscall or sysenter. In fact, the modern way of making a system call in Linux is to let the kernel provide some code (called the VDSO) which does the right thing to make a system call. Here’s an interesting SO question on the topic.

Thankfully, all that complexity is handled for us. No matter how a system call is made, it all comes down to looking up the particular system call number in a table to find the correct kernel function to call. Since all you need is a table entry and a function, it’s actually very easy to implement your own system call. So let’s give it a shot!
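
To make the table lookup concrete, here is a small userspace illustration (Linux-only, compiles as C++): calling getpid() through the C library and invoking the same kernel function directly by its number with syscall(2) return the same value.

    // The C library wrapper and a raw syscall by number hit the same table entry.
    #include <sys/syscall.h>  // SYS_getpid: the index into the kernel's syscall table
    #include <unistd.h>       // getpid(), syscall()
    #include <cstdio>

    int main() {
      pid_t via_libc = getpid();             // glibc wrapper around the system call
      long  via_raw  = syscall(SYS_getpid);  // trap into the kernel by number
      std::printf("getpid() = %d, syscall(SYS_getpid) = %ld\n",
                  static_cast<int>(via_libc), via_raw);
      return via_libc == via_raw ? 0 : 1;
    }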

https://brennan.io/2016/11/14/kernel-dev-ep3/

Computer Science video courses

List of Computer Science courses with video lectures.

  • Please note:
    • The focus is to keep the list to the point so that it is readable and usable. To access syllabus/notes/assignments, please visit the link to the course or use Google search with the course number/name.
    • Only MOOCs with comprehensive lecture material that may be equivalent to a standard university course will be added.
    • NPTEL contains a large number of good Computer Science courses. To check courses by the Indian IITs, please refer to the NPTEL site.

https://github.com/Developer-Y/cs-video-courses