This is an implementation of Mask R-CNN on Python 3, Keras, and TensorFlow. The model generates bounding boxes and segmentation masks for each instance of an object in the image. It’s based on Feature Pyramid Network (FPN) and a ResNet101 backbone.
The repository includes:
- Source code of Mask R-CNN built on FPN and ResNet101.
- Training code for MS COCO
- Pre-trained weights for MS COCO
- Jupyter notebooks to visualize the detection pipeline at every step
- ParallelModel class for multi-GPU training
- Evaluation on MS COCO metrics (AP)
- Example of training on your own dataset
The code is documented and designed to be easy to extend. If you use it in your research, please consider referencing this repository. If you work on 3D vision, you might find our recently released Matterport3D dataset useful as well. This dataset was created from 3D-reconstructed spaces captured by our customers who agreed to make them publicly available for academic use. You can see more examples here.
Four years ago, Google started to see the real potential for deploying neural networks to support a large number of new services. During that time it was also clear that, given the existing hardware, if people did voice searches for three minutes per day or dictated to their phone for short periods, Google would have to double the number of datacenters just to run machine learning models.
The need for a new architectural approach was clear, Google distinguished hardware engineer, Norman Jouppi, tells The Next Platform, but it required some radical thinking. As it turns out, that’s exactly what he is known for. One of the chief architects of the MIPS processor, Jouppi has pioneered new technologies in memory systems and is one of the most recognized names in microprocessor design. When he joined Google over three years ago, there were several options on the table for an inference chip to churn services out from models trained on Google’s CPU and GPU hybrid machines for deep learning but ultimately Jouppi says he never excepted to return back to what is essentially a CISC device.
We are, of course, talking about Google’s Tensor Processing Unit (TPU), which has not been described in much detail or benchmarked thoroughly until this week. Today, Google released an exhaustive comparison of the TPU’s performance and efficiencies compared with Haswell CPUs and Nvidia Tesla K80 GPUs. We will cover that in more detail in a separate article so we can devote time to an in-depth exploration of just what’s inside the Google TPU to give it such a leg up on other hardware for deep learning inference. You can take a look at the full paper, which was just released, and read on for what we were able to glean from Jouppi that the paper doesn’t reveal.
Jouppi says that the hardware engineering team did look to FPGAs to solve the problem of cheap, efficient, and high performance inference early on before shifting to a custom ASIC. “The fundamental bargain people make with FPGAs is that they want something that is easy to change but because of the programmability and other hurdles, compared with an ASIC, there ends up being a big difference in performance and performance per watt,” he explains. “The TPU is programmable like a CPU or GPU. It isn’t designed for just one neural network model; it executes CISC instructions on many networks (convolutional, LSTM models, and large, fully connected models). So it is still programmable, but uses a matrix as a primitive instead of a vector or scalar.”
At Google, we continuously improve the language understanding capabilities used in applications ranging from generation of email responses to translation. Last summer, we open-sourced SyntaxNet, a neural-network framework for analyzing and understanding the grammatical structure of sentences. Included in our release was Parsey McParseface, a state-of-the-art model that we had trained for analyzing English, followed quickly by a collection of pre-trained models for 40 additional languages, which we dubbed Parsey’s Cousins. While we were excited to share our research and to provide these resources to the broader community, building machine learning systems that work well for languages other than English remains an ongoing challenge. We are excited to announce a few new research resources, available now, that address this problem.
Tensorflow is a powerful open-source software library for machine learning developed by researchers at Google Brain. It has many pre-built functions to ease the task of building different neural networks. Tensorflow allows distribution of computation across different computers, as well as multiple CPUs and GPUs within a single machine. TensorFlow provides a Python API, as well as a less documented C++ API. For this course, we will be using Python.
This course will cover the fundamentals and contemporary usage of the Tensorflow library for deep learning research. We aim to help students understand the graphical computational model of Tensorflow, explore the functions it has to offer, and learn how to build and structure models best suited for a deep learning project. Through the course, students will use Tensorflow to build models of different complexity, from simple linear/logistic regression to convolutional neural network and recurrent neural networks with LSTM to solve tasks such as word embeddings, translation, optical character recognition. Students will also learn best practices to structure a model and manage research experiments.
Add famous painting styles to any photo in a fraction of a second! You can even style videos!
It takes 100ms on a 2015 Titan X to style the MIT Stata Center (1024×680) like Udnie, by Francis Picabia.
Our implementation is based off of a combination of Gatys’ A Neural Algorithm of Artistic Style, Johnson’s Perceptual Losses for Real-Time Style Transfer and Super-Resolution, and Ulyanov’s Instance Normalization.