Million WebSockets and Go

This article is about how we developed a high-load WebSocket server with Go.

If you are familiar with WebSocket, but know little about Go, I hope you will still find this article interesting in terms of ideas and techniques for performance optimization.

https://gbws.io/articles/million-websocket-and-go/

Using API Gateway WebSockets with the Serverless Framework

As we approach the end of 2018, I’m incredibly excited to announce that we at Serverless have a small gift for you: You can work with Amazon API Gateway WebSockets in your Serverless Framework applications starting right now.

But before we dive into the how-to, there are some interesting caveats that I want you to be aware of.

First, this is not supported in AWS CloudFormation just yet, though AWS has publicly stated it will be early next year! As such, we decided to implement our initial support as a plugin and keep it out of core until the official AWS CloudFormation support is added.

Second, the configuration syntax should be pretty close, but we make no promises that anything implemented with this will carry forward after core support. And once core support is added with AWS CloudFormation, you will need to recreate your API Gateway resources managed by CloudFormation. This means that any clients using your WebSocket application will need to be repointed, or DNS will need to be in place beforehand, to facilitate the cutover.

I recommend you check out my original post for a basic understanding of how WebSockets work at a technical level via connections and callbacks to the Amazon API Gateway connections management API.

With all that out of the way, play with our new presents!

https://serverless.com/blog/api-gateway-websockets-example/

AWS Lambda in a VPC Will Soon be Much Faster

One of the most common pains for users of AWS Lambda is cold starts. Cold starts add unwanted delays to Lambda invocations, and in cases where a Lambda is used inside of a Virtual Private Cloud (VPC), the latency can be as high as several seconds. This practically negates the speed benefits of Lambda functions.

Fortunately, the Lambda team announced at AWS re:Invent 2018 that they are changing the architecture of Lambdas running in a VPC in order to reduce this latency and make Lambdas start much faster.

https://www.nuweba.com/AWS-Lambda-in-a-VPC-will-soon-be-faster

Making an Unlimited Number of Requests with Python aiohttp + pypeln

This post is a continuation of the work in Paweł Miech’s Making 1 million requests with python-aiohttp and Andy Balaam’s Making 100 million requests with Python aiohttp. I will be trying to reproduce the setup on Andy’s blog with some minor modifications due to API changes in the aiohttp library. You should definitely read his blog, but I’ll give a recap.

UPDATE: Since Andy’s original post, aiohttp introduced another API change which limited the total number of simultaneous requests to 100 by default. I’ve updated the code shown here to remove this limit and increased the number of total requests to compensate. Apart from that, the analysis remains the same.

https://medium.com/@cgarciae/making-an-infinite-number-of-requests-with-python-aiohttp-pypeln-3a552b97dc95

EC2 Network Performance Cheat Sheet

What is the maximum network throughput of your EC2 instance? The answer to this question is key to choosing an instance type or defining monitoring alerts on network throughput. Unfortunately, you will only find very vague information about the networking capabilities of EC2 instances within AWS’s service description and documentation. That is why I ran a network performance benchmark for almost all EC2 instance types over the last few days. The results are compiled into the following cheat sheet.

https://cloudonaut.io/ec2-network-performance-cheat-sheet/

On Incomplete HTTP Reads and the Requests Library In Python

The requests library is arguably the most widely used HTTP library for Python. However, what I believe most of its users are not aware of is that its current stable version happily accepts responses whose length is less than what is given in the Content-Length header. If you are not careful enough to check this yourself, you may end up using corrupted data without even noticing. I have witnessed this first-hand, which is the reason for the present blog post. Let’s see why the current requests version does not do this checking (spoiler: it is a feature, not a bug) and how to check this manually in your scripts.

https://blog.petrzemek.net/2018/04/22/on-incomplete-http-reads-and-the-requests-library-in-python/

Integration layer between Requests and Selenium for automation of web actions

Requestium is a Python library that merges the power of Requests, Selenium, and Parsel into a single integrated tool for automating web actions.

The library was created for writing web automation scripts that use mostly Requests but can seamlessly switch to Selenium for the JavaScript-heavy parts of the website, while maintaining the session.

Requestium adds independent improvements to both Requests and Selenium, and every new feature is lazily evaluated, so it’s useful even when writing scripts that use only Requests or Selenium.

Features

  • Enables switching between a Requests’ Session and a Selenium webdriver while maintaining the current web session.
  • Integrates Parsel’s parser into the library, making xpath, css, and regex much cleaner to write.
  • Improves Selenium’s handling of dynamically loading elements.
  • Makes cookie handling more flexible in Selenium.
  • Makes clicking elements in Selenium more reliable.
  • Supports Chrome and PhantomJS.

https://github.com/tryolabs/requestium

Node.js Express API Development Security Checklist

The folks at RisingStack have published a really good article on security in Node.js applications, and this checklist is meant to complement it with specifics for API development using the Express framework.

  • [ ] Secure headers: use helmet, especially to set the Strict Transport Security header which will keep all your connections on HTTPS. Also see here on how to set up HTTPS using a free certificate from Let’s Encrypt.
  • [ ] Log all errors but don’t expose stacktraces to the client.
  • [ ] Rate limit API calls to protect against DoS attacks. You can use express-rate-limit.
  • Sanitize all user input
    • [ ] SQL injection: use prepared statements instead of concatenating user input. For example, the following endpoint
      app.get('/', function(req, res) {
        Promise.using(getSqlConnection(), function(connection) {
          // Vulnerable: user input is concatenated directly into the query text.
          var sql = 'SELECT * from users where id = "' + req.query.username + '"';
          return connection.queryAsync(sql)
            .then(function(rows, cols) {
              return rows;
            });
        });
      });

      can be hijacked with /?username=anything%22%20OR%20%22x%22%3D%22x, which results in the following SQL query being executed: select * from users where id = "anything" OR "x"="x". The condition is always true, so the query returns data for all the users in the system. This can be further extended to cause a lot more damage.

    • [ ] XSS: prevent an attacker from injecting arbitrary code into your application by sanitizing user input. For example, the following endpoint, which accepts user input,
      app.get('/', function(req, res) {
        var html = 'Hello ' + req.query.username;
        res.send(html);
      });

      can then be hijacked by crafting a URL such as /?username=%3Cbody%20onload%3Dalert(%27test1%27)%3E. This link can then be sent to unsuspecting users of your website, causing arbitrary code to be executed in their browser. See here for more types of XSS attacks and examples.

    • [ ] Command injection: for example, a url like https://example.com/downloads?file=user1.txt could be turned into https://example.com/downloads?file=%3Bcat%20/etc/passwd.
    • [ ] MongoDB query injection: similar to SQL injection but using MongoDB’s special operators instead. As an example, consider the following endpoint
      app.post('/', function (req, res) {
        db.users.find({username: req.body.username, password: req.body.password}, function (err, users) {
            // TODO: handle the rest
        });
      });

      where sending in

      POST http://target/ HTTP/1.1
      Content-Type: application/json
      
      {
          "username": "vic@smalldata.tech",
          "password": {"$gt": ""}
      }
      

      will result in a successful match, because the $gt operator matches any non-empty password. Use express-mongo-sanitize to sanitize all user input.

    • [ ] Regex Denial of Service: a situation where a user-supplied regex can block the event loop and hang the application. See here for examples.
  • [ ] Use TLS for all connections. Also see here on how to set up HTTPS using a free certificate from Let’s Encrypt.
  • [ ] Keep dependencies updated to stay ahead of any security issues. Use nsp to check dependencies for security vulnerabilities. Another great platform for open source projects is snyk.io.
  • [ ] Check for permissions at every step of the API chain: for example, GET /users/:userId/contacts/:contactId should not assume that the userId authenticated for the request is also authorized to make this call. Check that request.params.userId === request.authenticatedUserId or isAuthorized(authenticatedUserId, {userId: authenticatedUserId, resource: 'CONTACTS'}).
  • [ ] Don’t block the event loop: for example, parsing JSON is not a free operation and can block the event loop for large JSON payloads (> 1 MB). Note that using the body-parser module globally gives you a default maximum of 100kb for JSON payloads. It is more efficient to use it only for routes that require it.
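
The three injection cases in the checklist above (SQL, XSS, and MongoDB operators) could be guarded against roughly as follows. This is a minimal sketch, not code from the original article: the helper names buildUserQuery, escapeHtml, and containsMongoOperator are hypothetical, and the '?' placeholder style is an assumption that depends on the SQL driver in use.

```javascript
'use strict';

// Hypothetical helper: keeps the SQL text and the user-supplied value
// separate, so the driver can bind the value as a parameter instead of
// splicing it into the query. The '?' placeholder is an assumption;
// some drivers use $1-style placeholders instead.
function buildUserQuery(username) {
  return {
    sql: 'SELECT * FROM users WHERE id = ?',
    params: [String(username)],
  };
}

// Hypothetical helper: escapes the five characters that are significant in
// HTML text and attribute values, so input is rendered as text, not markup.
function escapeHtml(value) {
  const replacements = {
    '&': '&amp;',
    '<': '&lt;',
    '>': '&gt;',
    '"': '&quot;',
    "'": '&#39;',
  };
  return String(value).replace(/[&<>"']/g, (ch) => replacements[ch]);
}

// Hypothetical helper: returns true if any key, at any depth, starts with
// '$'. Such keys are MongoDB query operators and should not appear in a
// request body that is passed straight into a query.
function containsMongoOperator(value) {
  if (value === null || typeof value !== 'object') {
    return false;
  }
  return Object.keys(value).some(
    (key) => key.startsWith('$') || containsMongoOperator(value[key])
  );
}
```

In a handler like the ones shown in the checklist, the query object would be passed on as connection.queryAsync(q.sql, q.params), the greeting would be built from escapeHtml(req.query.username), and a request body for which containsMongoOperator(req.body) is true would be rejected before it reaches db.users.find. A maintained library is usually a better choice than hand-rolled helpers; the sketch only illustrates what such sanitization does.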

Please note that this checklist is meant to be used as a reference for further study. It is by no means an exhaustive list of all potential security issues. See also the web developer security checklist. Additions and comments are welcome.

https://smalldata.tech/blog/2017/05/19/nodejs-express-api-development-security-checklist

Building Business Systems with Domain-Specific Languages for NGINX & OpenResty

This post is adapted from a presentation at nginx.conf 2016 by Yichun Zhang, Founder and CEO of OpenResty, Inc. This is the first of two parts of the adaptation. In this part, Yichun describes OpenResty’s capabilities and goes over web application use cases built atop OpenResty. In Part 2, Yichun looks at what a domain-specific language is in more detail.

You can view the complete presentation on YouTube.

https://www.nginx.com/blog/building-business-systems-with-domain-specific-languages-for-nginx-openresty-part-1/
https://www.nginx.com/blog/building-business-systems-with-domain-specific-languages-for-nginx-openresty-part-2/