Overview
Every successful programming language has some killer feature that drove its success. Go's forte is concurrent programming. It was designed around a strong theoretical model (CSP) and provides language-level syntax in the form of the "go" keyword, which starts an asynchronous task (yeah, the language is named after the keyword), as well as a built-in way to communicate between concurrent tasks.
In this article (part one), I'll introduce the CSP model that Go's concurrency implements, goroutines, and how to synchronize the operation of multiple cooperating goroutines. In a future article (part two), I'll write about Go's channels and how to coordinate between goroutines without synchronized data structures.
CSP
CSP stands for Communicating Sequential Processes. It was first introduced by Tony (C. A. R.) Hoare in 1978. CSP is a high-level framework for describing concurrent systems. It is much easier to program correct concurrent programs when operating at the CSP abstraction level than at the typical threads and locks abstraction level.
Goroutines
Goroutines are a play on coroutines. However, they are not exactly the same. A goroutine is a function that executes concurrently with the code that launched it, so it doesn't block it. Multiple goroutines can share the same OS thread. Unlike coroutines, goroutines can't explicitly yield control to another goroutine. Go's runtime takes care of implicitly transferring control when a particular goroutine blocks, for example on I/O.
Let's see some code. The Go program below defines a function, creatively called "f", which sleeps randomly for up to half a second and then prints its argument. The main() function calls the f() function in a loop of four iterations, where in each iteration it calls f() three times with "1", "2" and "3" in a row. As you would expect, the output is:
--- Run sequentially as normal functions
1
2
3
1
2
3
1
2
3
1
2
3
Then main() invokes f() as a goroutine in a similar loop. Now the results are different, because Go's runtime runs the f() goroutines concurrently, and since each goroutine sleeps for a different random interval, the values are not printed in the order in which f() was invoked. Here is the output:
--- Run concurrently as goroutines
2
2
3
1
3
2
1
3
1
1
3
2
2
1
3
The program itself uses the "time" and "math/rand" standard library packages to implement the random sleeping and, at the end, to wait for all goroutines to complete. This is important because when the main() function returns, the program exits, even if there are outstanding goroutines still running.
package main

import (
    "fmt"
    "math/rand"
    "time"
)

var r = rand.New(rand.NewSource(time.Now().UnixNano()))

func f(s string) {
    // Sleep up to half a second
    delay := time.Duration(r.Int()%500) * time.Millisecond
    time.Sleep(delay)
    fmt.Println(s)
}

func main() {
    fmt.Println("--- Run sequentially as normal functions")
    for i := 0; i < 4; i++ {
        f("1")
        f("2")
        f("3")
    }

    fmt.Println("--- Run concurrently as goroutines")
    for i := 0; i < 5; i++ {
        go f("1")
        go f("2")
        go f("3")
    }

    // Wait for 6 more seconds to let all goroutines finish
    time.Sleep(6 * time.Second)
    fmt.Println("--- Done.")
}
WaitGroup
When you have a bunch of wild goroutines running all over the place, you often want to know when they are all done.
There are different ways to do that, but one of the best approaches is to use a WaitGroup. A WaitGroup is a type defined in the "sync" package that provides the Add(), Done() and Wait() operations. It works like a counter that tracks how many goroutines are still active and lets you wait until they're all done. Whenever you start a new goroutine, you call Add(1) (you can add more than one if you launch multiple goroutines). When a goroutine is done, it calls Done(), which decrements the count by one, and Wait() blocks until the count reaches zero.
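To make the pattern concrete, here is a minimal, self-contained sketch of the Add/Done/Wait dance. The worker loop and its messages are just for illustration and are not part of the program above:

package main

import (
    "fmt"
    "sync"
)

func main() {
    var wg sync.WaitGroup

    for i := 1; i <= 3; i++ {
        wg.Add(1) // one Add per goroutine we are about to launch
        go func(n int) {
            defer wg.Done() // always mark this worker as finished
            fmt.Println("worker", n, "done")
        }(i)
    }

    // Block until all three workers have called Done().
    wg.Wait()
    fmt.Println("all workers finished")
}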
Let's convert the previous program to use a WaitGroup instead of sleeping for six seconds at the end just in case. Note that the f() function uses defer wg.Done() instead of calling wg.Done() directly. This ensures wg.Done() is always called, even if there is a problem and the goroutine terminates early. Otherwise, the count would never reach zero, and wg.Wait() could block forever.
Another little trick is that I call wg.Add(3) just once before invoking f() three times. Note that I call wg.Add() even when invoking f() as a regular function. This is necessary because f() calls wg.Done() regardless of whether it runs as a function or a goroutine.
package main

import (
    "fmt"
    "math/rand"
    "sync"
    "time"
)

var r = rand.New(rand.NewSource(time.Now().UnixNano()))
var wg sync.WaitGroup

func f(s string) {
    defer wg.Done()
    // Sleep up to half a second
    delay := time.Duration(r.Int()%500) * time.Millisecond
    time.Sleep(delay)
    fmt.Println(s)
}

func main() {
    fmt.Println("--- Run sequentially as normal functions")
    for i := 0; i < 4; i++ {
        wg.Add(3)
        f("1")
        f("2")
        f("3")
    }

    fmt.Println("--- Run concurrently as goroutines")
    for i := 0; i < 5; i++ {
        wg.Add(3)
        go f("1")
        go f("2")
        go f("3")
    }

    wg.Wait()
}
Synchronized Data Structures
The goroutines in the 1, 2, 3 program don't communicate with each other or operate on shared data structures. In the real world, this is often necessary. The "sync" package provides the Mutex type, with Lock() and Unlock() methods, which provides mutual exclusion. A great example is the standard Go map. It isn't synchronized by design. That means that if multiple goroutines access the same map concurrently without external synchronization, the results are unpredictable (the Go runtime may even detect the race and crash the program). But if all goroutines agree to acquire a shared mutex before every access and release it afterward, access is serialized.
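As a minimal sketch of that agreement (the counts map and the "hits" key are made up for illustration), every goroutine locks a shared sync.Mutex around each map access:

package main

import (
    "fmt"
    "sync"
)

func main() {
    var mu sync.Mutex
    counts := make(map[string]int)
    var wg sync.WaitGroup

    // Ten goroutines update the same map; the mutex serializes every access.
    for i := 0; i < 10; i++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            mu.Lock()
            counts["hits"]++ // safe: only one goroutine is in here at a time
            mu.Unlock()
        }()
    }

    wg.Wait()
    fmt.Println(counts["hits"]) // always prints 10
}

Without the Lock()/Unlock() pair, this program would be a data race; with it, the final count is always 10.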
Putting It All Together
Let's put it all together. The famous Tour of Go has an exercise to build a web crawler. They provide a great framework with a mock Fetcher and results that let you focus on the problem at hand. I highly recommend that you try solving it by yourself.
I wrote a complete solution using two approaches: a synchronized map and channels. The complete source code is available here.
Here are the relevant parts of the "sync" solution. First, let's define a struct that bundles a map of fetched URLs with a mutex. Note the interesting syntax where an anonymous struct type is defined, initialized and assigned to a variable in one statement.
var fetchedUrls = struct {
    urls map[string]bool
    m    sync.Mutex
}{urls: make(map[string]bool)}
Now, the code can lock the m mutex before accessing the map of URLs and unlock it when it's done.
// Check if this url has already been fetched (or is being fetched)
fetchedUrls.m.Lock()
if fetchedUrls.urls[url] {
    fetchedUrls.m.Unlock()
    return
}

// OK. Let's fetch this url
fetchedUrls.urls[url] = true
fetchedUrls.m.Unlock()
This is not completely safe, because any other code can still access the fetchedUrls variable directly and forget to lock or unlock. A more robust design would provide a data structure that supports safe operations by doing the locking and unlocking automatically.
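For example, here is one possible sketch of such a wrapper. The UrlSet type and its TryAdd method are my own illustration, not part of the Tour exercise or its solution; the idea is simply that the map and its mutex are unexported, so callers can only reach them through methods that lock correctly:

package main

import (
    "fmt"
    "sync"
)

// UrlSet is a hypothetical wrapper that hides the map and its mutex so
// that every access goes through methods that lock and unlock properly.
type UrlSet struct {
    m    sync.Mutex
    urls map[string]bool
}

func NewUrlSet() *UrlSet {
    return &UrlSet{urls: make(map[string]bool)}
}

// TryAdd marks url as fetched and reports whether it was newly added.
// It returns false if the url was already in the set.
func (s *UrlSet) TryAdd(url string) bool {
    s.m.Lock()
    defer s.m.Unlock()
    if s.urls[url] {
        return false
    }
    s.urls[url] = true
    return true
}

func main() {
    set := NewUrlSet()
    fmt.Println(set.TryAdd("https://golang.org/")) // true: first time we see it
    fmt.Println(set.TryAdd("https://golang.org/")) // false: already fetched
}

With a type like this, the check-and-mark step in the crawler collapses into a single call such as if !fetchedUrls.TryAdd(url) { return }, and forgetting to lock is no longer possible.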
Conclusion
Go has superb support for concurrency using lightweight goroutines. They are much easier to use than traditional threads. When you need to synchronize access to shared data structures, Go has your back with sync.Mutex.
There is a lot more to tell about Go's concurrency. Stay tuned...