
Async I/O and Python

When you're working on OpenStack, you'll probably hear a lot of references to 'async I/O' and how eventlet is the library we use for this in OpenStack.

But, well ... what exactly is this mysterious 'asynchronous I/O' thing?

The first thing to think about is what happens when a process calls a system call like write(). If there's room in the write buffer, then the data gets copied into kernel space and the system call returns immediately.

But if there isn't room in the write buffer, what happens then? The default behaviour is that the kernel will put the process to sleep until there is room available. In the case of sockets and pipes, space in the buffer usually becomes available when the other side reads the data you've sent.

The trouble with this is that we usually would prefer the process to be doing something useful while waiting for space to become available, rather than just sleeping. Maybe this is an API server and there are new connections waiting to be accepted. How can we process those new connections rather than sleeping?

One answer is to use multiple threads or processes - maybe it doesn't matter if a single thread or process is blocked on some I/O if you have lots of other threads or processes doing work in parallel.

But, actually, the most common answer is to use non-blocking I/O operations. The idea is that rather than having the kernel put the process to sleep when no space is available in the write buffer, the kernel should just return a "try again later" error. We then use the select() system call to find out when space has become available and the file is writable again.
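To make the "try again later" error concrete, here's a minimal sketch, assuming Python 3 and using a local socketpair() as a stand-in for a real network connection. The fill_until_eagain() helper is a made-up name for illustration only. It fills the kernel buffer until send() fails, then uses a zero-timeout select() to check writability before and after the other side reads.

```python
import errno
import select
import socket


def fill_until_eagain(sock, chunk=b"x" * 65536):
    """Send on a non-blocking socket until the kernel refuses more data.

    Returns (bytes_queued, errno_of_the_refusal). Hypothetical helper.
    """
    total = 0
    while True:
        try:
            total += sock.send(chunk)
        except BlockingIOError as e:  # raised for EAGAIN / EWOULDBLOCK
            return total, e.errno


# A connected pair of local sockets; nothing reads from `b` at first.
a, b = socket.socketpair()
a.setblocking(False)
b.setblocking(False)

queued, err = fill_until_eagain(a)
print("queued", queued, "bytes, then got", errno.errorcode[err])

# Zero timeout: ask "is it writable right now?" without sleeping.
writable_before = bool(select.select([], [a], [], 0)[1])

# Drain everything from the other side, which frees up buffer space.
drained = 0
try:
    while True:
        drained += len(b.recv(65536))
except BlockingIOError:
    pass  # nothing left to read

writable_after = bool(select.select([], [a], [], 0)[1])
print("writable before reading:", writable_before)
print("writable after reading:", writable_after)

a.close()
b.close()
```

The exact number of bytes queued depends on the platform's socket buffer sizes, so treat it as indicative only.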

Below are a number of examples of how to implement a non-blocking write. For each example, you can run a simple socket server on a remote machine to test against:

$> ssh -L 1234:localhost:1234 some.remote.host 'ncat -l 1234 | dd of=/dev/null'

The way this works is that the client connects to port 1234 on the local machine, the connection is forwarded over SSH to port 1234 on some.remote.host where ncat reads the input, writes the output over a pipe to dd which, in turn, writes the output to /dev/null. I use dd to give us some information about how much data was received when the connection closes. Using a distant some.remote.host will help illustrate the blocking behaviour because data clearly can't be transferred as quickly as the client can copy it into the kernel.

Blocking I/O

To start with, let's look at the example of using straightforward blocking I/O:

import socket

sock = socket.socket()
sock.connect(('localhost', 1234))
sock.send('foo\n' * 10 * 1024 * 1024)

This is really nice and straightforward, but the point is that this process will spend a tonne of time sleeping while the send() method completes transferring all of the data.

Non-Blocking I/O

In order to avoid this blocking behaviour, we can set the socket to non-blocking and use select() to find out when the socket is writable:

import errno
import select
import socket

sock = socket.socket()
sock.connect(('localhost', 1234))
sock.setblocking(0)

buf = buffer('foo\n' * 10 * 1024 * 1024)
print "starting"
while len(buf):
    try:
        buf = buf[sock.send(buf):]
    except socket.error, e:
        if e.errno != errno.EAGAIN:
            raise e
        print "blocking with", len(buf), "remaining"
        select.select([], [sock], [])
        print "unblocked"
print "finished"

As you can see, when send() returns an EAGAIN error, we call select() and will sleep until the socket is writable. This is a basic example of an event loop. It's obviously a loop, but the "event" part refers to our waiting on the "socket is writable" event.

This example doesn't look terribly useful because we're still spending the same amount of time sleeping, but we could in fact be doing something useful rather than sleeping in select(). For example, if we had a listening socket, we could also pass it to select() and select() would tell us when a new connection is available. That way we could easily alternate between handling new connections and writing data to our socket.

To prove this "do something useful while we're waiting" idea, how about we add a little busy loop to the I/O loop:

        if e.errno != errno.EAGAIN:
            raise e

        i = 0
        while i < 5000000:
            i += 1

        print "blocking with", len(buf), "remaining"
        select.select([], [sock], [], 0)
        print "unblocked"

The difference is we've passed a timeout of zero to select() - this means select() never actually blocks - and any time send() would have blocked, we do a bunch of computation in user-space. If we run this using the 'time' command you'll see something like:

$> time python ./test-nonblocking-write.py 
starting
blocking with 8028160 remaining
unblocked
blocking with 5259264 remaining
unblocked
blocking with 4456448 remaining
unblocked
blocking with 3915776 remaining
unblocked
blocking with 3768320 remaining
unblocked
blocking with 3768320 remaining
unblocked
blocking with 3670016 remaining
unblocked
blocking with 3670016 remaining
...
real    0m10.901s
user    0m10.465s
sys     0m0.016s

The fact that there's very little difference between the 'real' and 'user' times means we spent very little time sleeping. We can also see that sometimes we get to run the busy loop multiple times while waiting for the socket to become writable.
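The earlier idea of passing a listening socket to select() alongside the socket we're writing to can be sketched like this. It's a self-contained Python 3 illustration rather than a real server: the three local clients and the socketpair() peer (standing in for the remote reader) are assumptions made purely so the example runs on its own.

```python
import select
import socket

# Listening socket on an ephemeral local port (stand-in for an API server).
listener = socket.socket()
listener.bind(("127.0.0.1", 0))
listener.listen(5)
listener.setblocking(False)

# Three clients queue up in the listen backlog.
clients = [socket.create_connection(listener.getsockname()) for _ in range(3)]

# `out` is the socket we write to; `peer` stands in for the remote reader.
out, peer = socket.socketpair()
out.setblocking(False)
peer.setblocking(False)

buf = memoryview(b"foo\n" * 256 * 1024)  # 1 MiB of data to send
total = len(buf)
accepted = []
received = 0

while received < total or len(accepted) < len(clients):
    # Only ask about writability while there is something left to send.
    wlist = [out] if len(buf) else []
    readable, writable, _ = select.select([listener, peer], wlist, [])

    if listener in readable:
        conn, _addr = listener.accept()  # a new connection is waiting
        accepted.append(conn)
    if peer in readable:
        received += len(peer.recv(65536))  # the "remote" side consumes data
    if out in writable:
        try:
            buf = buf[out.send(buf):]
        except BlockingIOError:
            pass  # buffer filled up again; select() will tell us when to retry

print("accepted", len(accepted), "connections while sending", received, "bytes")

for s in clients + accepted + [listener, out, peer]:
    s.close()
```

A single select() call sleeps until any of the events we care about happens, so new connections get accepted in between writes rather than after them.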

Eventlet

Ok, so how about eventlet? Presumably eventlet makes it a lot easier to implement non-blocking I/O than the above example? Here's what it looks like with eventlet:

from eventlet.green import socket

sock = socket.socket()
sock.connect(('localhost', 1234))
sock.send('foo\n' * 10 * 1024 * 1024)

Yes, that does look very like the first example. What has happened here is that by creating the socket using eventlet.green.socket.socket() we have put the socket into non-blocking mode and when the write to the socket blocks, eventlet will schedule any other work that might be pending. Hitting Ctrl-C while this is running is actually pretty instructive:

$> python test-eventlet-write.py 
^CTraceback (most recent call last):
  File "test-eventlet-write.py", line 6, in <module>
    sock.send('foo\n' * 10 * 1024 * 1024)
  File ".../eventlet/greenio.py", line 289, in send
    timeout_exc=socket.timeout("timed out"))
  File ".../eventlet/hubs/__init__.py", line 121, in trampoline
    return hub.switch()
  File ".../eventlet/hubs/hub.py", line 187, in switch
    return self.greenlet.switch()
  File ".../eventlet/hubs/hub.py", line 236, in run
    self.wait(sleep_time)
  File ".../eventlet/hubs/poll.py", line 84, in wait
    presult = self.do_poll(seconds)
  File ".../eventlet/hubs/epolls.py", line 61, in do_poll
    return self.poll.poll(seconds)
KeyboardInterrupt

Yes, indeed, there's a whole lot going on behind that innocuous looking send() call. You see mention of a 'hub' which is eventlet's name for an event loop. You also see this trampoline() call which means "put the current code to sleep until the socket is writable". And, there at the very end, we're still sleeping in a call to poll() which is basically the same thing as select().
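To see how close poll() is to select(), here's a minimal sketch using the standard library's select.poll (Python 3, not available on Windows). This is not eventlet's code, just the same "is it writable?" question asked through the poll interface, again with a socketpair() as an assumed stand-in for a real connection.

```python
import select
import socket

a, b = socket.socketpair()
a.setblocking(False)
fd = a.fileno()

poller = select.poll()
poller.register(a, select.POLLOUT)  # "tell me when `a` is writable"

# Fresh socket with an empty buffer: writable straight away.
# The timeout is in milliseconds; 0 means don't sleep at all.
events_empty = poller.poll(0)

# Fill the buffer; nobody is reading from `b`.
try:
    while True:
        a.send(b"x" * 65536)
except BlockingIOError:
    pass

# Now the kernel has no room left, so no POLLOUT event is reported.
events_full = poller.poll(0)

print("empty buffer:", events_empty)
print("full buffer: ", events_full)

a.close()
b.close()
```

poll() returns a list of (fd, event mask) pairs rather than select()'s three lists, but the information is the same.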

To do some "useful" work rather than sleeping all the time, we can run a busy loop in a greenthread:

import eventlet
from eventlet.green import socket

def busy_loop():
    while True:
        i = 0
        while i < 5000000:
            i += 1
        print "yielding"
        eventlet.sleep()
eventlet.spawn(busy_loop)

sock = socket.socket()
sock.connect(('localhost', 1234))
sock.send('foo\n' * 10 * 1024 * 1024)

Now every time the socket isn't writable, we switch to the busy_loop() greenthread and do some work. Greenthreads must cooperatively yield to one another so we call eventlet.sleep() in busy_loop() to once again poll the socket to see if it's writable. Again, if we use the 'time' command to run this:

$> time python ./test-eventlet-write.py 
yielding
yielding
yielding
...
real    0m5.386s
user    0m5.081s
sys     0m0.088s

you can see we're spending very little time sleeping.

(As an aside, I was going to take a look at gevent, but it doesn't seem fundamentally different from eventlet. Am I wrong?)

Twisted

Long, long ago, in times of old, Nova switched from twisted to eventlet so it makes sense to take a quick look at twisted:

from twisted.internet import protocol
from twisted.internet import reactor

class Test(protocol.Protocol):
    def connectionMade(self):
        self.transport.write('foo\n' * 2 * 1024 * 1024)

class TestClientFactory(protocol.ClientFactory):
    def buildProtocol(self, addr):
        return Test()

reactor.connectTCP('localhost', 1234, TestClientFactory())
reactor.run()

What complicates the example most is twisted's protocol abstraction, which we need to use simply to write to the socket. The 'reactor' abstraction is simply twisted's name for an event loop. So, we create a non-blocking socket, block in the event loop (using e.g. select()) until the connection completes and then write to the socket. The transport.write() call will actually queue a writer in the reactor, return immediately and whenever the socket is writable, the writer will continue its work.

To show how you can run something in parallel, here's how to run some code in a deferred callback:

def busy_loop():
    i = 0
    while i < 5000000:
        i += 1
    reactor.callLater(0, busy_loop)

reactor.connectTCP(...)
reactor.callLater(0, busy_loop)
reactor.run()

I'm using a timeout of zero here and it shows up a weakness in both twisted and eventlet - we want this busy_loop() code to only run when the socket isn't writable. In other words, we want the task to have a lower priority than the writer task. In both twisted and eventlet, the timed tasks are run before the I/O tasks and there is no way to add a task which is only run if there are no runnable I/O tasks.

GLib

My introduction to async I/O was back when I was working on GNOME (beginning with GNOME's CORBA ORB, called ORBit) so I can't help comparing the above abstractions to GLib's main loop. Here's some equivalent code:

/* build with gcc -g -O0 -Wall $(pkg-config --libs --cflags glib-2.0) test-glib-write.c -o test-glib-write */

#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>

#include <glib.h>

GMainLoop    *main_loop = NULL;
static gchar *strv[10 * 1024 * 1024];
static gchar *data = NULL;
int           remaining = -1;

static gboolean
socket_writable(GIOChannel   *source,
                GIOCondition  condition,
                gpointer      user_data)
{
  int fd, sent;

  fd = g_io_channel_unix_get_fd(source);
  do
    {
      sent = write(fd, data, remaining);
      if (sent == -1)
        {
          if (errno != EAGAIN)
            {
              fprintf(stderr, "Write error: %s\n", strerror(errno));
              goto finished;
            }
          return TRUE;
        }

      data = &data[sent];
      remaining -= sent;
    }
  while (sent > 0 && remaining > 0);

  if (remaining <= 0)
    goto finished;

  return TRUE;

 finished:
  g_main_loop_quit(main_loop);
  return FALSE;
}

static gboolean
busy_loop(gpointer data)
{
  int i = 0;
  while (i < 5000000)
    i += 1;
  return TRUE;
}

int
main(int argc, char **argv)
{
  GIOChannel         *io_channel;
  guint               io_watch;
  int                 fd;
  struct sockaddr_in  addr;
  int                 i;
  gchar              *to_free;

  for (i = 0; i < G_N_ELEMENTS(strv)-1; i++)
    strv[i] = "foo\n";
  strv[G_N_ELEMENTS(strv)-1] = NULL;

  data = to_free = g_strjoinv(NULL, strv);
  remaining = strlen(data);

  fd = socket(AF_INET, SOCK_STREAM, 0);

  memset(&addr, 0, sizeof(struct sockaddr_in));
  addr.sin_family      = AF_INET;
  addr.sin_port        = htons(1234);
  addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);

  if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) == -1)
    {
      fprintf(stderr, "Error connecting to server: %s\n", strerror(errno));
      return 1;
    }

  fcntl(fd, F_SETFL, O_NONBLOCK);

  io_channel = g_io_channel_unix_new(fd);
  io_watch = g_io_add_watch(io_channel,
                            G_IO_OUT,
                            (GIOFunc)socket_writable,
                            GINT_TO_POINTER(fd));

  g_idle_add(busy_loop, NULL);

  main_loop = g_main_loop_new(NULL, FALSE);

  g_main_loop_run(main_loop);
  g_main_loop_unref(main_loop);

  g_source_remove(io_watch);
  g_io_channel_unref(io_channel);

  close(fd);

  g_free(to_free);

  return 0;
}

Here I create a non-blocking socket, set up an 'I/O watch' to tell me when the socket is writable and, when it is, I keep blasting data into the socket until I get an EAGAIN. This is the point at which write() would block if it was a blocking socket and I return TRUE from the callback to say "call me again when the socket is writable". Only when I've finished writing all of the data do I return FALSE and quit the main loop causing the g_main_loop_run() call to return.

The point about task priorities is illustrated nicely here. GLib does have the concept of priorities and has an "idle callback" facility you can use to run some code when no higher priority task is waiting to run. In this case, the busy_loop() function will *only* run when the socket is not writable.
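The same "idle callbacks only run when no I/O is ready" rule is easy to sketch in Python. The MiniLoop class below is a hypothetical toy written for this illustration (it is not GLib's, twisted's or eventlet's API), and to keep it self-contained the idle callback also plays the part of the remote reader by draining the other end of a socketpair().

```python
import select
import socket


class MiniLoop:
    """Toy select() loop: writer callbacks always beat idle callbacks."""

    def __init__(self):
        self.writers = {}  # socket -> callback
        self.idlers = []
        self.running = False

    def add_writer(self, sock, callback):
        self.writers[sock] = callback

    def remove_writer(self, sock):
        self.writers.pop(sock, None)

    def add_idle(self, callback):
        self.idlers.append(callback)

    def stop(self):
        self.running = False

    def run(self):
        self.running = True
        while self.running:
            # Don't sleep in select() if there is idle work we could do.
            timeout = 0 if self.idlers else None
            _, writable, _ = select.select([], list(self.writers), [], timeout)
            if writable:
                for sock in writable:
                    callback = self.writers.get(sock)
                    if callback:
                        callback()
                continue  # re-check I/O before considering idle work
            for callback in list(self.idlers):
                callback()


out, peer = socket.socketpair()
out.setblocking(False)
peer.setblocking(False)

loop = MiniLoop()
pending = memoryview(b"foo\n" * 256 * 1024)  # 1 MiB
total = len(pending)
stats = {"writes": 0, "idle_runs": 0, "drained": 0}


def on_writable():
    global pending
    try:
        pending = pending[out.send(pending):]
        stats["writes"] += 1
    except BlockingIOError:
        return
    if not len(pending):
        loop.remove_writer(out)


def on_idle():
    # Only reached when select() reported nothing writable.
    stats["idle_runs"] += 1
    try:
        stats["drained"] += len(peer.recv(65536))
    except BlockingIOError:
        pass
    if stats["drained"] == total:
        loop.stop()


loop.add_writer(out, on_writable)
loop.add_idle(on_idle)
loop.run()
print(stats)

out.close()
peer.close()
```

The `continue` after dispatching writers is the whole trick: idle callbacks are only reached on an iteration where select() found nothing to do.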

Tulip

There's a lot of talk lately about Guido's Asynchronous IO Support Rebooted (PEP 3156) efforts so, of course, we've got to have a look at that.

One interesting aspect of this effort is that it aims to support both the coroutine and callback style programming models. We'll try out both models below.

Tulip, of course, has an event loop, time-based callbacks, I/O callbacks and I/O helper functions. We can build a simple variant of our non-blocking I/O example above using tulip's event loop and I/O callback:

import errno
import select
import socket

import tulip

sock = socket.socket()
sock.connect(('localhost', 1234))
sock.setblocking(0)

buf = memoryview(str.encode('foo\n' * 2 * 1024 * 1024))
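def do_write():
    global buf
    # Stop once everything is sent; send() on an empty buffer just
    # returns 0, so looping unconditionally would spin forever here.
    while len(buf):
        try:
            buf = buf[sock.send(buf):]
        except socket.error as e:
            if e.errno != errno.EAGAIN:
                raise e
            return
    # All data written: stop watching the socket and end the loop.
    event_loop.remove_writer(sock)
    event_loop.stop()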

def busy_loop():
    i = 0
    while i < 5000000:
        i += 1
    event_loop.call_soon(busy_loop)

event_loop = tulip.get_event_loop()
event_loop.add_writer(sock, do_write)
event_loop.call_soon(busy_loop)
event_loop.run_forever()

We can go a step further and use tulip's Protocol abstraction and connection helper:

import errno
import select
import socket

import tulip

class Protocol(tulip.Protocol):

    buf = b'foo\n' * 10 * 1024 * 1024

    def connection_made(self, transport):
        event_loop.call_soon(busy_loop)
        transport.write(self.buf)
        transport.close()

    def connection_lost(self, exc):
        event_loop.stop()

def busy_loop():
    i = 0
    while i < 5000000:
        i += 1
    event_loop.call_soon(busy_loop)

event_loop = tulip.get_event_loop()
tulip.Task(event_loop.create_connection(Protocol, 'localhost', 1234))
event_loop.run_forever()

This is pretty similar to the twisted example and shows up yet another example of the lack of task prioritization being an issue. If we added the busy loop to the event loop before the connection completed, the scheduler would run the busy loop every time the connection task yields.

Coroutines, Generators and Subgenerators

Under the hood, tulip depends heavily on generators to implement coroutines. It's worth digging into that concept a bit to understand what's going on.

Firstly, remind yourself how a generator works:

def gen():
    i = 0
    while i < 2:
        print(i)
        yield
        i += 1

i = gen()
print("yo!")
next(i)
print("hello!")
next(i)
print("bye!")
try:
    next(i)
except StopIteration:
    print("stopped")

This will print:

yo!
0
hello!
1
bye!
stopped

Now imagine a generator function which writes to a non-blocking socket and calls yield every time the write would block. You have the beginnings of coroutine based async I/O. To flesh out the idea, here's our familiar example with some generator based infrastructure around it:

import collections
import errno
import select
import socket

sock = socket.socket()
sock.connect(('localhost', 1234))
sock.setblocking(0)

def busy_loop():
    while True:
        i = 0
        while i < 5000000:
            i += 1
        yield

def write():
    buf = memoryview(b'foo\n' * 2 * 1024 * 1024)
    while len(buf):
        try:
            buf = buf[sock.send(buf):]
        except socket.error as e:
            if e.errno != errno.EAGAIN:
                raise e
            yield
    quit()

Task = collections.namedtuple('Task', ['generator', 'wfd', 'idle'])

tasks = [
    Task(busy_loop(), wfd=None, idle=True),
    Task(write(), wfd=sock, idle=False)
]

running = True

def quit():
    global running
    running = False

while running:
    finished = []
    for n, t in enumerate(tasks):
        try:
            next(t.generator)
        except StopIteration:
            finished.append(n)
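    # Remove finished tasks, highest index first so the remaining
    # indices stay valid (map() is lazy in Python 3 and would do nothing).
    for n in reversed(finished):
        tasks.pop(n)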

    wfds = [t.wfd for t in tasks if t.wfd]
    timeout = 0 if [t for t in tasks if t.idle] else None

    select.select([], wfds, [], timeout)

You can see how the generator-based write() and busy_loop() coroutines are cooperatively yielding to one another just like greenthreads in eventlet would do. But, there's a pretty fundamental flaw here - if we wanted to refactor the code above to re-use that write() function to e.g. call it multiple times with different input, we'd need to do something like:

def write_stuff():
    for i in write(b'foo' * 10 * 1024 * 1024):
        yield
    for i in write(b'bar' * 10 * 1024 * 1024):
        yield

but that's pretty darn nasty! Well, that's the whole idea behind Syntax for Delegating to a Subgenerator (PEP 380). Since Python 3.3, a generator can delegate to another generator using the 'yield from' syntax. This allows us to do:

...
def write(data):
    buf = memoryview(data)
    while len(buf):
        try:
            buf = buf[sock.send(buf):]
        except socket.error as e:
            if e.errno != errno.EAGAIN:
                raise e
            yield

def write_stuff():
    yield from write(b'foo\n' * 2 * 1024 * 1024)
    yield from write(b'bar\n' * 2 * 1024 * 1024)
    quit()

Task = collections.namedtuple('Task', ['generator', 'wfd', 'idle'])

tasks = [
    Task(busy_loop(), wfd=None, idle=True),
    Task(write_stuff(), wfd=sock, idle=False)
]
...
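To see the delegation on its own, without sockets, here's a small runnable sketch (Python 3.3 or later). The write() here is a pretend version that just yields after every chunk as if the socket had blocked, and run() is a hypothetical driver; both are made up for illustration. It also shows something the manual for-loop version can't do easily: 'yield from' hands the subgenerator's return value back to the caller.

```python
def write(data, chunk=4):
    """Pretend to write `data`, yielding after each chunk as if blocked."""
    sent = 0
    while sent < len(data):
        sent += len(data[sent:sent + chunk])
        yield sent
    return sent  # becomes the value of the `yield from` expression


def write_stuff_manual():
    # The pre-PEP 380 way: re-yield everything by hand.
    for progress in write(b"foo\n" * 2):
        yield progress
    for progress in write(b"bar\n"):
        yield progress


def write_stuff():
    # Delegation: yields pass straight through, return values come back.
    first = yield from write(b"foo\n" * 2)
    second = yield from write(b"bar\n")
    return first + second


def run(gen):
    """Drive a generator to completion; return (number of yields, result)."""
    steps = 0
    while True:
        try:
            next(gen)
            steps += 1
        except StopIteration as stop:
            return steps, stop.value


manual_yields = list(write_stuff_manual())
delegated_yields = list(write_stuff())
steps, result = run(write_stuff())

print(manual_yields, delegated_yields)
print(steps, result)
```

Both versions yield exactly the same sequence to the scheduler, but only the 'yield from' version gets to see how many bytes each write() reported.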

Conclusions?

Yeah, this is the point where I've figured out what we should do in OpenStack. Or not.

I really like the explicit nature of Tulip's model - for each async task, you explicitly decide whether to block the current coroutine on its completion (or put another way, yield to another coroutine until the task has completed) or you register a callback to be notified of the task's completion. I'd much prefer this to the rather cavalier "don't worry your little head" approach of hiding the async nature of what's going on.
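As a rough illustration of those two choices, here's a sketch using asyncio, the standard library module that grew out of tulip and PEP 3156. It assumes Python 3.7 or later, where 'await' is the modern spelling of 'yield from' for coroutines, and slow_task() is a made-up stand-in for some async operation.

```python
import asyncio


async def slow_task(delay, value):
    """Stand-in for some async operation (hypothetical)."""
    await asyncio.sleep(delay)
    return value


async def main():
    results = []

    # Callback style: start the task and ask to be told when it completes.
    task = asyncio.ensure_future(slow_task(0.01, "callback"))
    task.add_done_callback(lambda t: results.append(t.result()))

    # Coroutine style: suspend here, letting other tasks run, until done.
    results.append(await slow_task(0.05, "awaited"))
    return results


results = asyncio.run(main())
print(results)
```

Either way, the point where control may pass to something else is visible in the source, which is the explicitness being praised here.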

However, the prospect of porting something like Nova to this model is more than a little daunting. If you think about the call stack of a REST API request being handled and ultimately doing an rpc.cast() and that the entire call stack would need to be ported to 'yield from' in order for us to yield and handle another API request while waiting for the result of rpc.cast() .... as I said, daunting.

What I'm most interested in is how to design our new messaging API to be able to support any and all of these models in future. I haven't quite figured that out either, but it feels pretty doable.


April 14th OpenStack Foundation Board Meeting

On April 14, the day before the OpenStack Summit in Portland began, the OpenStack Foundation Board held an all-day, in-person meeting. Like all of our board meetings, the agenda was packed solid.

After our meetings, the goal is for Jonathan to post a summary to the mailing list (like this one) and I try and follow up with a longer blog post with a bit more commentary. For a bunch of reasons, we've let things slide this time, but here's my attempt at a summary anyway. It's been a while since the meeting, so a lot of the details are pretty vague at this point.

One thing I really liked about the meeting was that we had chairs set up so that anyone who wished to attend the meeting and listen was welcome to do so. With so many stackers already in town for the summit, it really was a great opportunity to open the workings of the board to anyone interested.

Transparency Committee

The first meaty item on the agenda was an update from Joshua McKenty on the progress of the board's Transparency Committee.

We began by discussing Nick Barcet's efforts to choose and implement a document management system for the board. OwnCloud, Google Drive, Dropbox and github have all been considered. Google Drive and Dropbox would not be accessible by all board members. It was noted that the Foundation staff will also require a document management system and we might be able to use the same system for both use cases. The conclusion was for Nick and Monty to see if the OpenStack Infrastructure team could host OwnCloud for this purpose.

Next we discussed the transparency policy which Lauren Sell and Mark Radcliffe have been working on. The core concern is how to balance the need for having a culture of transparency and openness with the need for some information (like legal or personnel matters) to be kept confidential. The general approach discussed was having the board's mailing list open, but confidential information would be shared via private documents in a document management system that could be referred to in mailing list posts. The distinction between confidential and embargoed information was discussed and it was agreed that embargoed information should have a disclosure date associated with it.

There was also a brief discussion on the possibility of having a transparency ombudsman. This would be a person trusted by the community with whom complaints related to transparency could be raised. It was felt this could be a duty of a foundation employee, if that individual had already established the reputation for trustworthiness and independence required.

2013 Individual Director Elections

Next up, Rob Hirschfeld led a discussion on possible changes to the election process in time for the 2013 Individual Director elections.

We began by discussing how we would go about making any required changes to the bylaws. It was recognized that we would need a firm proposal to be made to the board in August so that notice could be given in September of a vote in October in time for the elections in November.

It was also noted that a bylaws change would require a majority vote of the individual members with a voter turnout of 25%. For reference, we had a 28% turnout at the previous election.

We also briefly discussed the potential objectives for any changes to the elections process but avoided going deep on this because of the limited time available on the agenda.

Legal Affairs Committee

Next on the agenda was a discussion (led by me) about the Legal Affairs Committee.

The main concern I raised was that the project (or the "Technical Community and Committee", as I called it) needed the support of subject matter experts when dealing with the kind of legal issues regularly encountered by all open-source projects, particularly in the area of copyright and licensing. I proposed Richard Fontana's idea that, rather than depending exclusively on the Legal Affairs Committee or the Foundation's legal counsel for help with such matters, we would create a legal-discuss mailing list where any subject matter experts could contribute to resolving issues raised by the technical community. The idea was that we would also have a FAQ wiki page to document any conclusions reached on the list. There was some discussion about whether opinions expressed on the list could be construed as legal advice and it was agreed to put in place disclaimers to avoid that perception.

(For the record, we've since created the legal-discuss mailing list and the legal issues FAQ.)

We also discussed whether the description of scope of the Legal Affairs Committee in the bylaws could be improved and whether the strict five member limit in the bylaws was really such a good idea. It was agreed that I would work with Alice King to see if we can address those issues.

(Again for the record, Alice and I had a productive discussion and we put together the Legal Affairs Committee wiki page to at least ensure the committee's scope, membership and progress could be properly documented.)

Gold Member Applications

Next up, we had presentations from Juniper and Ericsson on their applications for Gold Membership. Once the presentations had been completed, we convened an executive session to discuss the applications in private.

During the executive session, the board reviewed each application using the previously agreed criteria (https://wiki.openstack.org/wiki/Governance/Foundation/PotentialMemberCriteria).

Once the executive session was completed, the board reconvened the meeting and carried out the official, public vote on the new member applications. The applications of both Ericsson and Juniper were approved.

As part of the voting process, some directors chose to publicly restate their concerns with the applications that had been discussed during the executive session. One concern raised by Rob Hirschfeld and Joshua McKenty was that, while both applicants shared their plans for increasing their OpenStack engagement, it could be argued that future plans aren't the same as the "demonstrated commitment" talked about in the membership criteria.

Another concern discussed at length was that applicants could provide much more in the way of written detail about how they meet the membership criteria. One take on this was that applicants should be expected to provide a resume of their involvement with, and commitment to, OpenStack. Simon Anderson proposed that we form a membership committee which would mentor applicants and advise them on how to prepare their application.

In total, discussion of these applications and the process for new gold members took up 4 hours of the board's time. While this may seem excessive, it was very apparent that everyone on the board was keen for the Foundation to have gold members that would help drive OpenStack's success. With the addition of Ericsson and Juniper, we now have 16 of the 24 gold member slots filled. As the number of remaining slots shrinks, I think we will see the board having higher expectations of applications.

Programs

At this point, the board seemed collectively quite drained of energy and the agenda had to be significantly reworked because there wasn't much time remaining.

Monty took the opportunity to briefly discuss an idea he had to introduce the concept of "programs" in addition to our current concept of projects. We would use this to recognize efforts like documentation and CI on equal footing with the integrated projects. Other efforts like Oslo, TripleO and puppet recipes were also mentioned as potential programs.

Rather than go into too detailed a discussion on the topic, we generally expressed support for the concept but agreed it needed further discussion.

Joint Session

Part of the agenda for the meeting was that members of the Technical Committee were invited to join the meeting for a joint session. Monty and I were obviously already present, but Michael Still and Thierry Carrez also joined. Other TC members had not yet arrived in Portland or weren't fully aware of the joint session. We had some discussion about how to have a joint session with better attendance from both groups and the idea of a joint design summit session at the next summit was mooted.

User Committee

Tim Bell stepped up next to give an overview of the results of the User Committee's recently completed survey of OpenStack users.

Tim is presenting these exciting results at the summit, so I won't repeat them here. However, we did have an interesting discussion about how representative the survey was of OpenStack deployments. Tim offered 50-75% as his gut feeling for the percentage of deployments covered but Joshua McKenty reckoned it was probably less than 25%. There was some discussion about future surveys and the potential for an opt-in usage stats tracking tool to increase the percentage of deployments we know about.

Incubation Update Committee

Next up, Alan Clark presented the final report of the Incubation Update Committee.

The board was generally very supportive of the work of the committee, but it was clear further discussion was required on the topic of Core status and the board's trademark usage program. Much discussion was had around the desire for projects like Heat and Ceilometer to be able to adopt official OpenStack project names (like "OpenStack Measurement" for Ceilometer) and whether this was all that would be implied by Core status. However, there is also the question of whether clouds which wish to use the OpenStack brand should be required to use all Core projects or whether a trademark program around more targeted interoperability testing made more sense.

It was agreed that this was an incredibly important and urgent topic but that we had run out of time at this meeting to make progress on it.


February 12th OpenStack Foundation Board Meeting

Yesterday was the first in-person meeting of the 2013 Foundation Board. We met at Rackspace's offices in San Francisco at Jim Curry's invitation. The meeting also coincided with the Foundation staff all getting together in-person for the first time. With so many OpenStack people converging on the one place, it was quite funny to randomly run into various people at the Courtyard Marriott around the corner.

The meeting kicked off at 10am after some informal introductions. For me, it was a case of playing a "hey, are you Tristan?"/"No, I'm Sean" guessing game. Monty brought his usual wackiness to the occasion by handing out beads and wearing a luminous, sparkling orange fedora with bright blue flashing LEDs. It's Mardi Gras time!

Formalities

The first item on the agenda was a formal roll call. 5 board members couldn't attend and Tim Bell was on the phone. Also attending were Jonathan Bryce and Mark from the Foundation staff. Mark Radcliffe, our outside counsel, was on the phone.

From that start, it was obvious that it was going to be extremely difficult for those dialled into the meeting to be able to participate effectively. This is definitely something we will continue to struggle with and, as one of the few who had to travel halfway around the world for the meeting, I can certainly sympathise with those on the phone.

As a further formality, Alan Clark reviewed some of the existing board policies:

  • that observers are allowed at board meetings, except in the case of executive sessions
  • that board members wouldn't blog or tweet until after an official summary of the meeting had been posted to the foundation list

There was some discussion about how executive sessions should be handled: how we can describe what the topics of any executive sessions were and perhaps also hold any votes arising from those sessions in public.

Rob Hirschfeld proposed that our general decision making process should involve first coming to an agreement on the criteria for making the decision and then applying the criteria. The idea was that this process would avoid some circular debates and save time.

User Committee

Next up, Ryan Lane presented an update on the progress of the User Committee. He described the mandate of the committee as "fighting for the users". The initial goals of the committee are to define their charter and a set of categories of users. Ryan encouraged the board to review and comment on the Google Doc they had prepared.

Much discussion was had about how to gather data about our users. Options included a CRM system, user polls, anonymous usage reporting tools, aggregating statistics to protect the innocent and the like. This is clearly going to be a very hot topic for the committee.

During the discussion, Josh McKenty posted a blueprint for how an opt-in tracking system might work.

We also had an interesting debate about how the committee should move forward with adding more members. The committee's own proposal for this was to be democratic and have elections for representatives from different geographies or categories of users. Quite a few board members raised concerns about this approach and asked the committee members to press forward quickly and appoint a diverse set of members with minimal bureaucracy.

Once Ryan had finished, the board expressed their thanks and support for the efforts of the committee. Ryan was asked to stay and observe the board meeting, but Ryan preferred to go and get some "real work" done. That's the spirit!

If you want to follow the progress of the user committee, the best way is to subscribe to the user-committee@lists.openstack.org mailing list. Ryan would be happy to share his slide deck if you email him.

Legal Affairs Committee

Alice King and Nissa Strottman took the floor and presented the work of the Legal Affairs Committee on the whole area of patents and risk mitigation.

Alice and Nissa first talked the board through some background on patents. They described the "risk landscape" in terms of competing companies suing and counter-suing each other for infringement (Alice had an awesome diagram of "who's suing who in the mobile industry" which got a good laugh from everyone) but also, perhaps more importantly, the threat of Non-Practicing Entities (NPEs) or "patent trolls".

We talked in some detail about the Patent Grant in the Apache License and how it does a lot to mitigate the risks but doesn't completely eliminate them. In particular, it does little to help with the threat of NPEs. An interesting debate followed about various nuances of the Apache License definitions and how they relate to OpenStack.

Alice and Nissa described approaches taken by other communities - e.g. OIN, GPLv3, Open Compute and Eclipse - and also how an additional contributor agreement could be adopted by the project to further help. Brian Stevens talked about how the OIN works and how it isn't strictly limited to Linux. It was noted that many of the larger members of the Foundation are also members of OIN. Eileen Evans described how the adoption of a new contributor agreement would be a massive bureaucratic challenge for some of the larger companies involved in the project.

A point on one of the slides that a more robust patent policy would remove a barrier to OpenStack's adoption triggered some discussion about whether patent risk is in fact hindering OpenStack's adoption. The consensus seemed to be that this wasn't seen as a huge issue and that, in fact, OpenStack is in as good a position as most any other open-source project. While it's important for the Foundation to do due diligence on this matter, it's also important that we don't overstate the actual risk.

Another topic discussed was the question of "defensive publication" of ideas generated by the developer community so that they are properly documented in a way that is accessible to lawyers who wish to demonstrate prior art. Various ideas were suggested about how blueprints could form the basis for such an approach and how to do this without placing the burden on the developers.

It's amusing the way we wind ourselves in knots over these things and generally come to the conclusion that the issues are difficult, but that things are working pretty well as they currently stand.

Finally, with limited time available, we picked up again on some of the discussion from the previous board meeting about the scope, name and makeup of the committee. Some board members seemed fine with how things currently stand while others reiterated the issues previously discussed.

Anyone interested in the legal affairs committee should talk to Alice King about how to get involved. She would be happy to share her slides with those interested.

Lunch

At this point, it was time to break for lunch. Most board members stayed for the lunch provided on-site and had productive, informal discussions while eating.

Monty, Nick Barcet and I attended a Technical Committee meeting over IRC while also eating and chatting with the rest of the board. That's multi-tasking! Amazingly, every single member of the TC attended the meeting and we voted unanimously to update the Incubation Process as recommended by the IncUp committee. Such a level of TC consensus is completely unprecedented and took a tonne of work to achieve. Kudos to Thierry for herding the cats on this.

Financial Report

Sean Roberts presented a report on the work of the Financial Oversight committee and we discussed the progress on completing the Foundation's first financial audit and the updated financial forecast for the year.

The topic was so uncontroversial (you could even say boring) that we didn't even take much in the way of notes on it. There was unanimous agreement that this is how it should be and we'd be doing something wrong if the topic was "interesting". The board praised the committee and the staff for their diligence in doing everything to make sure the Foundation's financial affairs were beyond question.

Transparency

Josh McKenty was next up, presenting his proposal for a Transparency Policy for the board and asking for volunteers to form a committee to finalize the details.

He described the committee's goal as:

To improve transparency and foster collaboration between the foundation members and members of the board, technical committee, user committee and other committees. Specifically, to draft statements and prototype systems changes for board review and approval.

A large part of the discussion was around trying to quantify the problem we're trying to solve and understanding how we'd achieve closure on it. The board doesn't want to be consumed with this indefinitely, so how will we know when it's simply a case of "you can't please everyone"?

We wrapped this up by gathering volunteers for the committee:

  • Nick Barcet
  • Jonathan Bryce
  • Alan Clark
  • Eileen Evans
  • Tristan Goode
  • Rob Hirschfeld
  • Kyle MacDonald
  • Joshua McKenty
  • Mark McLoughlin
  • Lauren Sell

Director's Report

Next up, Jonathan provided an awesome update on the Foundation's progress and plans. It's clear an awful lot of time and thought went into this super-helpful and encouraging update.

One of Jonathan's slides showed some mind-blowing statistics detailing the growth of the developer community, user community and ecosystem. The statistics on the number of patches merged every month were really stunning and the board praised the success of the project's infrastructure team in enabling this level of activity.

Jonathan talked to his hiring plan and introduced the latest Foundation employees. We're executing almost to the plan and there are two positions still to fill - another infrastructure engineer and another community manager.

There was a brief discussion of the updated budget. The summary is that slightly more money is both coming in and going out than planned, giving a positive net effect. A motion was passed to approve the new budget.

The discussion moved more to event planning and marketing matters and Lauren Sell filled the board in on a lot of the details in this area.

Planning of the Havana summit was well in hand and it was noted that the choice of venue means that costs will rise more linearly with attendance than at previous summits. Lauren discussed the venue selection process for the October 2013 summit which should come to a conclusion soon. Future summits were discussed with general agreement that a two year cycle should be roughly East Coast US, Asia, West Coast US and Europe. Several board members expressed a desire to get started on the selection process for the October 2014 summit since the size of the event means that suitable locations get booked up very far in advance.

Lauren also described her efforts to bring the various marketers at dozens of OpenStack companies together with some of the same principles that drive the collaborative software development process. She has created a mailing list which already has almost 100 members and holds a regular phone meeting of the group. The goal is for all those involved in marketing OpenStack to be highly co-ordinated and "on message".

Finally, Lauren quickly presented an independent study she had just received on OpenStack's marketing impact relative to other open-source projects and industry players. The results of the study were simply stunning and most of us appeared to be struggling to believe them. However, even if the statistics on OpenStack are massively inflated, we're still being hugely successful. Lauren, Mark and Jonathan have already worked with the authors of the report to find any data that could undermine the conclusions and will continue to do so.

Jonathan and Lauren can be contacted directly for the materials they presented to the board.

Strategy Session

Jonathan moved on to kick off a strategy session where board members would have an opportunity to put their thinking hats on and brainstorm together. He teed up the discussion by presenting his thesis that OpenStack was building a "platform ecosystem" with huge potential for network effects that would result in OpenStack being the dominant cloud platform in the marketplace.

We then moved onto having a short exercise where we used sticky-notes to throw out our ideas in three areas - "interoperability", "reference architectures" and "engaging users". Once the ideas had been gathered, we split into breakout groups to discuss ideas for each of those areas.

Rather than trying to capture all of the ideas and action items from the discussions here, it's perhaps easier to call out some highlights:

  • We're going to explore the use of Tempest as an API compatibility testing tool which can be pointed at an OpenStack instance. The idea is that anyone who wants to license the OpenStack trademark would request a test run which would result in a scorecard. At every release, the board would update the
    results required for each OpenStack mark so that obtaining a trademark license simply becomes a matter of fixing any failing tests which are required for the desired mark.
  • We're going to explore the definition of reference architecture "flavours" and how these reference architectures could be defined and maintained. The initial idea is that Heat templates could be used as deployable reference architecture definitions.
  • In terms of engaging users, we decided that trystack.org and deployment tracking tools were crucial.
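To make the scorecard idea a little more concrete, here is a minimal sketch in Python of how it might hang together. To be clear, this is my own illustration rather than anything the board looked at: the mark names, the test names and the `scorecard()` and `qualifies()` helpers are all made up, and it doesn't call into Tempest at all. It just assumes a test run boils down to a mapping of test name to pass/fail.

```python
# Hypothetical table of the tests required for each mark. The idea is that
# the board would update something like this at every release.
REQUIRED_TESTS = {
    "compute": {"test_boot_server", "test_list_flavors"},
    "storage": {"test_create_volume", "test_upload_object"},
}


def scorecard(results, required=REQUIRED_TESTS):
    """Return, for each mark, the sorted list of required tests not passed.

    `results` maps a test name to True (passed) or False (failed). A required
    test that is missing from the results is treated as failing.
    """
    return {
        mark: sorted(test for test in tests if not results.get(test, False))
        for mark, tests in required.items()
    }


def qualifies(results, mark, required=REQUIRED_TESTS):
    """A cloud qualifies for a mark when it has no failing required tests."""
    return not scorecard(results, required)[mark]


if __name__ == "__main__":
    # Made-up results from a test run against some OpenStack instance.
    results = {
        "test_boot_server": True,
        "test_list_flavors": True,
        "test_create_volume": True,
        "test_upload_object": False,
    }
    for mark, failing in scorecard(results).items():
        status = "OK" if not failing else "failing: " + ", ".join(failing)
        print(f"{mark}: {status}")
```

With a table like this, obtaining a given mark is just a matter of getting its list of failing tests down to empty.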

Election Process Committee

Todd Moore raised the question of whether allowing Foundation staff to run for the Individual Member Director elections was a waste of a board seat since staff members are so involved in the Foundation anyway. The question of potential conflicts of interest was also raised. There was no time to debate this question so the board resolved to reconstitute the Election Process Committee to consider this question and the general question of the eligibility to run and vote for these board seats.

The committee will consist of Todd as chair and the 8 individual board members.

Evening Event

We wrapped up at 6pm sharp and headed over to a nearby restaurant with all of the Foundation staff members. Far from simply being a social event, it was incredible to see 30+ focused and committed folks spend over 4 hours going from conversation to conversation about how to continue OpenStack's success into the future. For me personally, those conversations alone were enough to justify the effort it took to get to the meeting.


First Board Meeting

Disclaimer: apparently board members aren't supposed to comment on board meetings until the official minutes have been published (up to 2 weeks after the meeting). However, Jonathan does publish timely summaries of the meetings to bridge that gap. This post isn't intended as an official record of the meeting. Read it as if it was a summary by a non-board-member listening to the public part of the meeting.

I attended my first OpenStack Foundation Board meeting today.

When I was observing the board from the outside (yet not listening in on their calls), I found the meeting minutes a fairly colourless way of following what was going on. I'm going to attempt to write up some thoughts after each meeting, but no promises :-)

The first thing that struck me is that we spent a fair amount of time messing about with the conference call system, taking a roll call, debating proper procedure and the like. My guess is that over the course of the 150 minute call, we spent 30 minutes on this stuff. Still, given that the Foundation is still relatively new and there are 24 board members attending along with some Foundation staff members, it actually wasn't terrible.

Another thing is that I'd say roughly only half of the folks on the call actively contributed to the discussions. I'm curious whether that was because it's difficult to jump into a discussion on such a big call or just that all points were being covered satisfactorily by those contributing. I myself often sit through conference calls without saying a word for the latter reason. Not today, though.

The first meaty item on the agenda was Jonathan giving a nice 30 minute summary of how the Foundation has progressed since it was formed. At one point, Jonathan had to keep going while some hold music played in the background. It was all really positive stuff, nicely putting our achievements into context. I hope the slide deck will be forwarded to the foundation mailing list.

Next up was Alice King with a proposed charter for the Legal Affairs Committee (which doesn't have a page on our governance page - we should fix that). The committee was formed under the bylaws:

4.15 Legal Affairs Committee. The Legal Affairs Committee shall be an advisory committee to the Board of Directors and shall be comprised of no more than five (5) members. The Board of Directors shall appoint the members of Legal Affairs Committee. The Legal Affairs Committee shall advise the Board of Directors on the management of: (i) compliance with and enforcement of the Trademark Policy, (ii) strategies to promote the efficient intellectual property protection of the OpenStack Project, including without limitation, the resolution of patent and other intellectual property issues and disputes related to the Members’ use of the OpenStack Project, and (iii) all programs that the Board of Directors is considering relating to intellectual property management and protection.

but this proposed charter attempted to elaborate that the role of the committee would be as policy advisor in the area of Intellectual Property policy. This seemed to catch some board members by surprise (including me) and there were a bunch of really good points raised, like whether a name like "IP Policy Committee" would be more appropriate and that more diversity (including non-lawyers) beyond the existing 5 members is needed for such a charter. I'm very happy to see the level of interest in getting this right, because it's hugely important. In the end, we agreed to the charter as proposed but without the list of specific projects the committee might undertake. The whole topic will be revisited in more detail at our next meeting.

Next up, I was presenting as a Technical Committee representative about the progress of the Incubation and Core Update Committee. More specifically, I was briefing the board on how the TC intends to proceed with the incubation process for Ceilometer and Heat based on our understanding of the TC's existing mandate. This is all covered by an excellent diagram from Thierry and a wordy etherpad. This seemed to be fairly well received, barring some confusion about whether a vote was required.

At this point, our scheduled time was almost up but we still had two agenda items that really needed covering. Both needed to be discussed in private (for good reasons IMHO), however one of them will be adequately summarised in the minutes while the other will be open to public discussion very soon.

All in all, I have to say I enjoyed the meeting and look forward to the full day meeting on Feb 12 in San Francisco.


Image Building Service Demo

Martyn Taylor and Steven Hardy have done an awesome job of demoing an Image Building Service for OpenStack:

http://www.youtube.com/watch?v=MdBM4HA3QUk

I think this has huge potential. Imagine an OpenStack API to which you could send a request for a fresh image build of any OS, request specific packages, software or other content to be included and have that image be uploaded to Glance or Cinder once it's built. The image gets built in the cloud on a Nova instance/VM and the cloud provider bills you for the compute and I/O resources needed to complete the build.
