elarson’s posterous

 

 

Parallelism Relevance

The other day I got rather caught up in the whole * is Unix discussion and wanted to get a better understanding of some of the more low level details associated with writing servers. Eventually, I had a crude version of CherryPy that forked instead of using threads. The benchmark in CherryPy seemed to suggest it was a great deal faster, but honestly I think that was a fluke, especially considering I'm pretty sure there are some serious deficiencies in terms of process management. When Bob posted his forking example on the CherryPy list and mentioned signals, it was clear that I was missing some pretty important understanding.

One thing I thought seemed helpful was that it seemed to use both processors on my system. My understanding is that calling fork creates a second, separate process, which the operating system is then free to schedule on a different processor. My understanding is that this is how the multiprocessing module works (on Linux at least). With this new cursory knowledge under my belt it was interesting to see an article in my ACM Communications magazine remark on the future of computing and the need to make programming for multiple processors as approachable as sequential programming. I should note that this is one of the very few times there was anything remotely interesting to me in my ACM magazine, so I felt compelled to check it out. The idea is not new to me and while the arguments that this is a huge issue are pretty valid, I also started thinking that maybe it is not as big a deal in practice.
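To make the fork idea a little more concrete, here is a minimal sketch, assuming a Unix-like system and a current Python 3 interpreter. The `run_in_children` helper is a hypothetical name for illustration and is not taken from CherryPy or from Bob's example. Each job runs in its own forked process and reports back to the parent over a pipe.

```python
import os


def run_in_children(jobs):
    """Fork one child per job and collect (child_pid, result) pairs."""
    pending = []
    for n in jobs:
        read_fd, write_fd = os.pipe()
        pid = os.fork()
        if pid == 0:
            # Child process: do the work, report back, and exit
            # without ever returning into the parent's code path.
            try:
                os.close(read_fd)
                total = sum(range(n))
                os.write(write_fd, str(total).encode())
            finally:
                os._exit(0)
        # Parent process: keep the read end and move on to the next job.
        os.close(write_fd)
        pending.append((pid, read_fd))

    results = []
    for pid, read_fd in pending:
        with os.fdopen(read_fd) as pipe:
            value = int(pipe.read())
        os.waitpid(pid, 0)  # reap the child so it does not linger
        results.append((pid, value))
    return results


if __name__ == "__main__":
    for pid, value in run_in_children([10, 100, 1000]):
        print("child", pid, "computed", value)
```

Whether the children actually land on different processors is up to the operating system's scheduler. The sketch only shows that each job gets its own process with its own pid.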

The reason for this is the web. Back in the early days of computing people had dumb terminals and logged into a server where all the actual "computing" was performed. While thick clients have been the rage for quite a while, the web is effectively becoming the mainframe in the sky in terms of where people are doing their computing. This should be very helpful in making the jump to parallel computing because there is a definite history and set of tools that have been developed to scale across many processors. In other words, who really cares if your computer has 2 quad core processors when you're searching email online, browsing Facebook, checking your bank statements and reading the news through your web browser.

On the web, what really matters is that the sites doing the processing have optimized their architectures to handle the load. The other side of the equation is the browser, but in reality, this is an area that is already becoming more robust. Google Chrome is a good example with its separate processes for each tab, but I would even argue that Javascript is well suited to handle distributed tasks. We've seen a ton of articles on using async servers. Javascript is already using an async model that is only getting faster with the recent developments in new Javascript engines. None of this means that parallel computing and utilization of more processors is not very important. I just don't think it is quite as critical as some might think. That doesn't mean programmers shouldn't try to understand it. After all, it is a hard problem and programmers typically like hard problems. My whole point though, is that the problem might be more of a fad than an actual crux for the IT industry.

Filed under  //   programming   python  

My Org-Mode Server

I went ahead and started working out how to get my todo list online. I started off pretty simple and ended up with a relatively nice system. The basic idea is that I can push my org files to my webserver and edit them. Likewise, I can pull from the server. It started with some simple paver scripts that uploaded the files and quickly became an actual application.

Here is the paver file for some of the operations:


import os
from mercurial import commands, ui, hg
from paver.easy import *
import subprocess

IONROCK_HG = 'ssh://eric@ionrock.org/path/to/todos/'
REMOTE_TODO = IONROCK_HG # '/local/dev/path/to/todos'

@task
def server():
    import cherrypy
    cherrypy.tree.graft(TodoServer(base_url='/'), '/')
    cherrypy.quickstart()

@task
def create_repo():
    cmd = subprocess.call("fab create_repo:hosts='ionrock.org'", shell=True)

@task
def commit():
    conf = ui.ui()    
    user = conf.username()
    repo = hg.repository(conf, '.')
    files = [f for f in os.listdir('.') if f.endswith('.org')]
    commands.add(conf, repo, *files)
    commands.commit(conf, repo, addremove=True, message='Syncing org files')
    commands.push(conf, repo, REMOTE_TODO)

@task
@needs('commit')
def pull():
    conf = ui.ui()    
    user = conf.username()
    repo = hg.repository(conf, '.')
    commands.pull(conf, repo, REMOTE_TODO)
    commands.update(conf, repo)

@task
@needs('commit')
def update():
    cmd = subprocess.call("fab update_todos:hosts='ionrock.org'", shell=True)
    

The server task was for starting the eventual web application for development. The commit task just automatically commits the current org files and pushes them to the remote server. The pull command does the commit first, then pulls from the remote server. These two commands use the mercurial libraries to work with the mercurial repos.

The create_repo task was just a simple way to create a mercurial repo. More interesting is the update task which updates the remote todo mercurial repo. I'm using fabric for this aspect. It was all really easy. Here is the fabfile:


from fabric import run

def update_todos():
    run('cd /home/eric/htdocs/todo && hg up')

def create_repo():
    run('cd /home/eric/htdocs/todo && hg init')

Hopefully it is really clear what is happening here. Fabric lets you run commands via ssh on a remote server.

The actual todo server is a bit longer but also pretty simple.


import os
import re
import posixpath as path
import difflib
from selector import Selector
from webob import Response, Request
from webob.exc import *
from mercurial import commands, ui, hg
import datetime


class TodoFile(object):
    def __init__(self, fn):
        self.fn = fn
        self.html_diff = difflib.HtmlDiff()
        self.diff = difflib.Differ()
        self.matcher = difflib.SequenceMatcher()
        lines = [l for l in open(fn, 'r')]
        self.matcher.set_seq2(lines)

    def _hg(self):
        conf = ui.ui()
        user = conf.username()
        # hg.repository takes the ui object first, as in the paver file.
        repo = hg.repository(conf, os.path.dirname(self.fn))
        return conf, repo, user

    def __str__(self):
        return ''.join(self.read())
    
    def read(self):
        return [l for l in open(self.fn, 'r')]

    def write(self, new):
        f = open(self.fn, 'w')
        # Strip the carriage returns browsers add to textarea input.
        clean = re.sub('\r', '', new)
        f.write(clean)
        f.close()
        conf, repo, user = self._hg()
        date = datetime.datetime.now().strftime('%m-%d-%y %H:%M')
        commands.commit(conf, repo, message='Web write on %s' % date)
        
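    def is_different(self, new):
        # Compare line by line, ignoring line endings on both sides.
        current = [l.rstrip('\n') for l in self.read()]
        self.matcher.set_seqs(new.splitlines(), current)
        return self.matcher.ratio() < 1.0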

    def diff_txt(self, new):
        return list(difflib.context_diff(new.split('\n'), self.read()))

    def diff_html(self, new):
        return self.html_diff.make_file(self.read(), new.split('\n'))        
    

class TodoStore(object):
    def __init__(self, directory):
        self.dir = os.path.abspath(directory)

    def get_todo(self, name):
        for fn in os.listdir(self.dir):
            if fn.endswith('.org') and (fn[:-4] == name):
                return TodoFile(os.path.join(self.dir, fn))
        # No matching .org file was found.
        return None

    def all(self):
        return [fn[:-4] for fn in os.listdir(self.dir) if fn.endswith('.org')]


class Auth(object):
    def __init__(self, creds, login_url, success_url=None):
        self.login_url = login_url
        self.success_url = success_url
        self.creds = creds

    def __call__(self, f):
        def func(env, sr):
            sess = env['beaker.session']
            if sess.get('auth.user'):
                return f(env, sr)
            req = Request(env)
            sess['auth.after_login_url'] = req.url
            sess.save()
            return HTTPSeeOther(location=self.login_url)(env, sr)
        return func

    def login(self, env, sr):
        res = Response()
        sess = env['beaker.session']
        flash = sess.get('flash', '')
        if flash:
            sess['flash'] = ''
            sess.save()
        res.write('''<div>%s</div>
        <form action="%s" method="post">
          <label for="username">Username</label>
          <input type="text" name="username" value=""><br />
          <label for="password">Password</label>
          <input type="password" name="password" value=""><br />
          <input type="submit" value="login" />
        </form>''' % (flash, self.login_url))
        return res(env, sr)

    def handle_login(self, env, sr):
        req = Request(env)
        post = req.POST
        sess = env['beaker.session']        
        if post.get('username') and post.get('password'):
            if self.creds.get(post['username']):
                if self.creds[post['username']] == post['password']:
                    sess['auth.user'] = post['username']
                    url = sess.get('auth.after_login_url', self.success_url)
                    sess.save()
                    return HTTPSeeOther(location=url)(env, sr)
        sess['flash'] = 'Error logging in.'
        sess.save()
        return HTTPSeeOther(location=self.login_url)(env, sr)
            

class TodoServer(object):

    def __init__(self, **config):
        self.conf = {
            'todo_dir': os.path.dirname(os.path.abspath(__file__)),
        }
        self.conf.update(config or {})

        self.auth = Auth(self.conf.get('creds', {}),
                         self.url('login'),
                         self.url())
        

        self.store = TodoStore(self.conf['todo_dir'])

        self.router = Selector([
            ('[/]', {'GET': self.listing}),
            ('/login[/]', {
                'GET': self.auth.login,
                'POST': self.auth.handle_login
            }),
            ('/{name}/edit[/]', {
                'GET': self.edit,
                'POST':  self.auth(self.update)
            }),
            ('/{name}[/]', {'GET': self.read}),
        ])

    def url(self, extras=None):
        extras = extras or ''
        if isinstance(extras, list):
            extras = '/'.join(extras)
        return path.join(self.conf['base_url'], extras)

    def _header(self):
        return '''<html><head>
        <title>org todo server</title>
        <style type="text/css">
        body {
            font-size: 2em; font-family: sans-serif;
        }
        </style>
        </head><body>
        '''

    def _footer(self):
        return '''</body></html>'''

    def edit(self, env, sr):
        res = Response()
        req = Request(env)
        name = req.urlvars['name']
        td = self.store.get_todo(name)

        res.write(self._header())
        res.write('''
        <form action="%s" method="post">
        <input type="submit" name="submit" value="save" /><br />        
        <textarea rows="50" cols="80" name="new_body">%s</textarea>
        </form>
        ''' % (self.url('%s/edit' % name), str(td)))
        res.write(self._footer())
        
        return res(env, sr)

    def update(self, env, sr):
        req = Request(env)
        name = req.urlvars['name']
        new_body = req.POST['new_body']
        todo = self.store.get_todo(name)
        todo.write(new_body.strip())
        location = self.url('%s' % name)
        return HTTPSeeOther(location=location)(env, sr)

    def read(self, env, sr):
        res = Response()
        req = Request(env)
        name = req.urlvars['name']

        res.write(self._header())
        res.write('''
        <a href="%s">Home</a> | <a href="%s">Edit</a>
        <hr />
        <pre>''' % (self.url(), self.url('%s/edit' % name)))
        td = self.store.get_todo(name)
        res.write(str(td))
        res.write('</pre>')
        res.write(self._footer())
        
        return res(env, sr)

    def listing(self, env, sr):
        res = Response()

        res.write(self._header())
        res.write('<ul>\n')
        for f in self.store.all():
            res.write('<li><a href="%s">%s</a></li>\n' % (self.url(f), f))
        res.write('</ul>\n')
        res.write(self._footer())
        
        return res(env, sr)

    def __call__(self, env, sr):
        return self.router(env, sr)


This is a WSGI app simply because I'm using WSGI for my main application. I save a bit of memory by running all my smaller apps via one WSGI server (CherryPy), which makes a difference as I use a VPS.

One observation I made is that things would have been simpler had I been able to use CherryPy. Things like sessions, form processing and even URL routing would have been built in and made the whole thing a lot simpler in terms of dependencies and actual code.

This also made me realize what the problem is with building applications with WSGI. You really need a framework. I don't mean Pylons, web.py or some other WSGI framework. But you will undoubtedly write some glue code to help handle things like request and response objects that help to deal with form handling, sessions and cookies. It is nice to know that it is so easy to create these micro frameworks, but at the same time, it is clear that people will make bad decisions along the way. I only say that because I'm one of them.
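To make the "glue code" point concrete, here is a minimal sketch of the kind of wrapper that tends to appear, written against the Python 3 standard library only. The names `SimpleRequest`, `hello_app` and `call_app` are hypothetical and are not part of WebOb, CherryPy or the todo server above.

```python
from urllib.parse import parse_qs
from wsgiref.util import setup_testing_defaults


class SimpleRequest(object):
    """Hypothetical glue: a thin read-only view over the WSGI environ."""

    def __init__(self, environ):
        self.environ = environ
        self.method = environ.get('REQUEST_METHOD', 'GET')
        self.path = environ.get('PATH_INFO', '/')
        # parse_qs returns a list of values per key; keep the first one.
        query = parse_qs(environ.get('QUERY_STRING', ''))
        self.params = dict((k, v[0]) for k, v in query.items())


def hello_app(environ, start_response):
    """A plain WSGI callable that leans on the glue class above."""
    req = SimpleRequest(environ)
    body = ('hello %s' % req.params.get('name', 'world')).encode('utf-8')
    start_response('200 OK', [('Content-Type', 'text/plain'),
                              ('Content-Length', str(len(body)))])
    return [body]


def call_app(app, path='/', query=''):
    """Call a WSGI app directly with a fake environ; returns (status, body)."""
    environ = {'PATH_INFO': path, 'QUERY_STRING': query}
    setup_testing_defaults(environ)
    seen = {}

    def start_response(status, headers, exc_info=None):
        seen['status'] = status

    body = b''.join(app(environ, start_response))
    return seen['status'], body
```

Even this small wrapper already makes choices (first value wins, no POST parsing, no cookies), which is exactly where each micro framework starts to drift from the next.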

When I think of the micro frameworks I've written throughout the past few years, it is clear that I've had to experiment quite a bit. WebOb was a helpful library for sure, but the API you build translating the request to a WebOb Request means breaking WSGI at some level. That means that you've lost the advantages of WSGI as an API for your application. In my mind, it makes me wonder why then the app was written with WSGI in the first place as there is a solid and proven API already built with something like CherryPy.

I doubt I'll rewrite my whole site anytime soon, but if I do, the application framework will most definitely revolve around the framework rather than WSGI. The advantages that I believed were present ended up being much less than I thought. Having a tool like CherryPy manages to take care of the generic aspects enough while letting me use more opinionated aspects such as templating or databases. You could most certainly substitute your framework of choice, but for me CherryPy is making more and more sense.

Filed under  //   emacs   mercurial   python  

App Server Performance Thoughts

It is always interesting to see that CherryPy is included in Python web server benchmarks. I think it is a testament to the code base being considered a standard option as well as signifying that it is a reasonably fast baseline against which to compare other options. Oftentimes it is not the fastest option, but at the same time, rarely is most of the time spent simply responding to the request. Databases and application logic traditionally take much longer than serving the response.

I'm not trying to argue that performance isn't important for a web server of course. CherryPy uses a threaded model, which has its issues in certain situations. For example, handling many clients for long periods of time is often difficult for a server like CherryPy. Notice that I just said "like" CherryPy. Threaded servers no matter the language or implementation often have similar characteristics. This is why you have things like prefork/mpm with Apache for example.

The other thing to consider regarding web application performance is the state. No matter what you do, there is going to have to be some concept of state that will be a bottleneck. There is a subtle abstraction I'm making here that is meant to generalize the essence of web applications, and it differs from the concept of state within HTTP. HTTP is a stateless protocol, but web applications almost always have a state in some shape or form.

In this case I'm defining a "state" as something that must be read before handling the request. Anything from checking the authentication, reading a file or querying a database all involves some concept of state at some level. If the connection to the DB is open, then request this query, otherwise, make a new connection. If the file exists, read it. If the user exists, let the next function or object handle the rest of the request process. In all these cases there is some element of state that must be considered before handling the eventual response to the client.
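The "if the connection is open, use it, otherwise make a new one" question can be sketched in a few lines. This is only an illustration using the standard library's sqlite3 module under Python 3. The `LazyConnection` class and its `opened` counter are hypothetical names, not something CherryPy provides.

```python
import sqlite3


class LazyConnection(object):
    """Hypothetical holder that opens its database connection on first use."""

    def __init__(self, path=':memory:'):
        self.path = path
        self._conn = None
        self.opened = 0  # how many times a real connection was created

    def query(self, sql, params=()):
        # The "state" question asked before every request: is the
        # connection open? If not, make a new one, then run the query.
        if self._conn is None:
            self._conn = sqlite3.connect(self.path)
            self.opened += 1
        return self._conn.execute(sql, params).fetchall()

    def close(self):
        if self._conn is not None:
            self._conn.close()
            self._conn = None
```

Every call to `query` pays for the check, and the first call after a `close` also pays for a reconnect, which is the kind of cost a raw request benchmark never sees.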

Going back to performance, the questions asked of the state traditionally are going to be what really hinders performance. Databases are the traditional bottleneck, but it is definitely not limited to this. Sessions are a great example where state needs to be maintained. If you have many servers running, how is that session state managed? Authentication is another area that is traditionally not associated with a single server. I mention this because while it is totally appropriate to consider how fast a web server handles responses, it is also just as important to consider how fast a session service or directory service handles their respective tasks. Likewise, there is the question of whether the server is responsible for handling some of these more global requirements. Apache and CherryPy can both handle sessions where a standalone WSGI server like Paste's HTTP Server relegates that to the application. Comparing a generic WSGI server to CherryPy may not really be as similar as one might think. Comparing a WSGI server with an app that uses Beaker, Static, URLMap, Routes and WebOb might get you closer to an actual apples to apples comparison.

One nice thing about CherryPy is that while it comes equipped with a healthy set of features, it is often relatively simple to use an external tool. For example, you can use sessions out of the box, or implement your own distributed session system. When considering performance for something needing support for a massive amount of clients, it might simply mean starting more servers and increasing the threadpool of the servers while using an external service for sessions. An asynchronous server might be better equipped to handle more clients initially, but the bottleneck of session state will still most likely need to be handled at which point the faster server might not have a trivial way of allowing a different session tool. Or it might have a great way of using other session tools! The point being there is more to performance than simply handling requests.

I'm not suggesting that you should use CherryPy for everything. What I am saying is that when considering performance the measurement is partly going to be specific to the application because of its dependence on some idea of state. CherryPy is a great server that is well tested and very stable. It may not be the fastest, but in terms of writing an application that uses something like a session service or other remote state tracking services, it can be very effective. Its concept of an engine bus is very powerful for integrating these kinds of services and connections. In other words, just as a framework makes writing application code easier, CherryPy's facilities help to create a more robust server environment relatively easily. These features can make scaling easier even though they most likely impact raw response performance. That may or may not be an effective trade off.

I should point out that I'm biased as we happily use CherryPy at work and I've used it personally for quite a while. That said, my goal is not to promote CherryPy, but to show where it optimizes the web application building process. Sometimes its facilities are going to be extremely helpful. Other times, not so much. The nice thing is that there are plenty of great options that facilitate many different styles of applications to meet different requirements. It is also important that as web developers consider performance it is done with an eye towards real measurements and an understanding of state. There is nothing new here of course, but it can't hurt to provide a slightly more specific argument as to why web server benchmarks may not be as telling as you might think.

Filed under  //   programming   python  

URL Matching and CherryPy

I don't know what it is about URLs, but I always manage to get somewhat hung up on specific patterns. For example, in Rails, the traditional :controller/:method/:id pattern just gets on my nerves. My (rather silly) issue is that the resource is not being described separately from its actions. In my mind, the RESTful way to design the URL is to provide the resource URL and utilize the protocol methods to perform actions. If you need a special kind of action that may not easily be supported, then analyze the idempotency of the requirement and mint yourself a new URL. It all is pretty simple.

Unfortunately, for whatever reason, the default dispatchers that come with CherryPy seem to never really allow me to easily design the URLs like this. I've tried the routes dispatcher and every time I end up getting hung up because of the rather greedy matching. The method dispatcher can be really helpful at times, but lacks the robustness to dynamically call a specific controller or function. I always end up heading back to Sylvain's Selector dispatcher for CherryPy. It provides a simple way of attaching a function to a URL pattern and HTTP method. Here is an example:



controller = SomeController()
d = SelectorDispatcher()

d.add('/api/:model/:id', { 
    'GET': controller.read, 
    'PUT': controller.update, 
    'DELETE': controller.delete
})


While this may seem pretty verbose, it is easy enough to have controllers return their own dict for mapping the HTTP methods to the controller methods. Likewise, you can just have a controller that uses method names such as "PUT". The Selector docs have more information as well on doing partial matches and declaring optional aspects of the URL. This pattern has made sense to me for a while now and it is definitely my preferred mapping library.
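The idea of a controller handing back its own method mapping can be sketched without any routing library at all. What follows is a hypothetical stand-in written for Python 3, not Selector's or CherryPy's actual API. `TinyDispatcher`, `ModelController` and `mapping` are names made up for the illustration.

```python
import re


class TinyDispatcher(object):
    """Hypothetical stand-in for a URL pattern + HTTP method router."""

    def __init__(self):
        self.routes = []

    def add(self, pattern, methods):
        # Turn '/api/:model/:id' into a regex with named groups.
        regex = re.sub(r':(\w+)', r'(?P<\1>[^/]+)', pattern)
        self.routes.append((re.compile('^%s$' % regex), methods))

    def dispatch(self, method, path):
        for regex, methods in self.routes:
            match = regex.match(path)
            if match is None:
                continue
            handler = methods.get(method)
            if handler is None:
                return '405 Method Not Allowed'
            return handler(**match.groupdict())
        return '404 Not Found'


class ModelController(object):
    def read(self, model, id):
        return 'read %s %s' % (model, id)

    def delete(self, model, id):
        return 'delete %s %s' % (model, id)

    def mapping(self):
        # The controller supplies its own HTTP method -> function dict.
        return {'GET': self.read, 'DELETE': self.delete}


d = TinyDispatcher()
d.add('/api/:model/:id', ModelController().mapping())
```

With this in place, `d.dispatch('GET', '/api/todo/42')` calls `read` with the values pulled out of the URL, and the routing table stays readable as a description of the API.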

UPDATE:

One thing that might be confusing here is that I'm not bashing CherryPy, Rails, Routes or any other routing tools. I'm merely pointing out how I personally like to think when coding a class that is meant to be an API. There are plenty of ways to design a web app framework to organize code. Personally, I like to have the URL router handle sending things directly to a function and Selector fits my own way of thinking. There are probably countless arguments for organizing handlers as classes, functions, or a combination thereof, dealing with HTTP methods or completely ignoring them. Likewise, I like a URL routing tool that lets someone take a look and get a good idea what the API is. Other people prefer to use tests, with others using comments. To each his/her own.

Filed under  //   programming   python  

Transplanting with Mercurial

At work we use Mercurial. I don't know that we will keep using it as we are a rather global company and some of the other teams don't have the time to adopt a new VCS that is much more complicated than existing systems. Despite mercurial's mixed reviews among the team, I'm becoming more of a fan. I can't say I'm really a fan of mercurial per se, but it is becoming clear how a DVCS is beneficial in a more intimate way. There are the traditional arguments surrounding things like "commit on a plane" and "branching made easy" but I don't think people totally see the impact until they really have to work with a tool like mercurial for an extended period of time. It doesn't mean it's easy by any means, but after a while there are definite advantages.

One of the benefits of a DVCS is the ability to take a set of changes and place them in another branch. This is not as simple as it sounds. There is a suite of things to consider and even more potential data to keep track of. Where did the patch come from? Can you revert the changes to a different version that existed before the current version was added? If it is a set of patches or changesets, do you get to revert specific changes or is it an all or nothing kind of operation? Is there now a permanent link between the two branches/tags/heads after copying over the changes? How would that even work?!

When you start limiting things a bit, the idea becomes manageable. Mercurial has a plugin called transplant that makes some decisions. You don't necessarily get massive amounts of information which makes it relatively simple to move changesets around without much hassle. It also moves the changeset around as an atomic entity, which means that after you've transplanted, you don't need to commit or add a message saying you transplanted things. All in all, it is pretty easy once you get the hang of it.

To do a transplant first you need a repo. We are going to do everything in place, which means we are not going to clone to another directory, creating an implicit new branch.



elarson $ echo 'print "hello world!"' > hello.py
elarson $ hg add hello.py 
elarson $ hg branch 1.0
marked working directory as branch 1.0
elarson $ echo 'print "goodbye world!"' >> hello.py
elarson $ hg st
A hello.py
elarson $ hg ci -m 'ended'
elarson $ hg id
020db5c02665 (1.0) tip
elarson $ hg branch 2.0
marked working directory as branch 2.0
elarson $ echo 'print "wait... ah nvmd"' >> hello.py 
elarson $ hg ci -m 'nvmd'
elarson $ hg up 1.0
1 files updated, 0 files merged, 0 files removed, 0 files unresolved
elarson $ echo 'print "talk to you again later"' >> hello.py
elarson $ hg st
M hello.py
elarson $ hg ci -m 'tty'
created new head
elarson $ hg id
65a5be09f306 (1.0) tip
elarson $ hg heads
changeset:   2:65a5be09f306
branch:      1.0
tag:         tip
parent:      0:020db5c02665
user:        Eric Larson <eric@ionrock.org>
date:        Mon Feb 09 21:10:41 2009 -0600
summary:     tty

changeset:   1:93127fc79160
branch:      2.0
user:        Eric Larson <eric@ionrock.org>
date:        Mon Feb 09 21:09:38 2009 -0600
summary:     nvmd

elarson $ hg up 2.0
1 files updated, 0 files merged, 0 files removed, 0 files unresolved
elarson $ hg transplant -b 1.0 2
applying 65a5be09f306
65a5be09f306 transplanted to 9cbf35e4f623
elarson $


In the example we made two branches and both had some work in them. In the real world, this is as if you are working on a new release (2.0) and you fixed a bug in a previous release (1.0) that you need to forward port to your new release branch. If there were a series of commits you'd just do something like:

hg transplant -b 1.0 3:5 7

That would transplant the changesets 3 through 5 and also changeset 7. If there are conflicts you will get to merge as usual. For example, if you use Emacs (really, what else would you use?) ediff should come up with the merge interface and you can move along.

Also, I should mention there is something to be said for being able to work in the same directory all the time. As a Python developer, I use virtualenv, but for day to day development, it can be much easier to just keep your system somewhat bleeding edge and only use virtualenvs for specific projects or sandboxes. It is nice to have your server running, hg up to some branch to test and see your server restart and be ready to go. It is a small issue, but once you get used to it, it is pretty convenient.

If you are using mercurial I hope you spend some time trying to learn the more detailed aspects of it. The concept of heads, while trying, is pretty helpful at times. There are also a host of plugins that can be helpful. For example, Mercurial Queues is one that consistently comes up when comparing Git and rebasing. I've found queues to be extremely confusing, but transplanting has worked for me. There are also other plugins like Local Branch that seem pretty nice. A DVCS raises the complexity bar in terms of possible work flows, and there is a pretty good chance that whatever DVCS you choose, there should be a way to make it work. For me, transplant works.

Filed under  //   mercurial   python  

A Contextmanager Based Connection Pool

At work we have a whole variety of services we use that utilize some concept of a connection. Some of the services are RESTful, while others utilize Pyro and straight sockets. Seeing as these are so common, our makeshift web framework has a simple pool implementation that allows you to reuse the connections in a threadsafe way. One of these services is called Faststore (and yes this name is pretty bad). Faststore is a storage tool that aims at making writes extremely fast. It uses a bsddb underneath and handles a massive amount of data for us already. We've also written a CouchDB like app on top of Faststore called Ottoman (which I thought was a very clever name). In both cases, there is a pool that can be made available via CherryPy tools that allows you to use and reuse connections to these data storage services.

Seeing as our goal is to eventually make these apps open source, I've started playing around with them in my free time to see what it would take to remove a few of the more coupled aspects, which led me to needing a pool implementation. I could have simply taken the pool from our framework, but it seemed like a good opportunity to learn something new and I read this article from Jesse Noller on context managers that seemed applicable. The result is a simple connection pool using context managers. I call it "poodle" because it kind of sounds like "pool".



from __future__ import with_statement

import thread, threading
import contextlib
import random
import time


class Pool(object):
    def __init__(self, factory, args=None, kwargs=None, cleanup=None, min=5):
        self.pool = []
        self.swimmers = {}
        self.args = args or tuple()
        self.kwargs = kwargs or {}
        self.factory = factory
        self.cleanup = cleanup
        self._lock = threading.Semaphore()
        # Pre-fill the pool with the minimum number of connections.
        for i in xrange(min):
            self.pool.append(self._create())

    def _create(self):
        return self.factory(*self.args, **self.kwargs)

    def _get(self):
        # Look up the connection held by the calling thread.
        return self.swimmers[thread.get_ident()]

    @contextlib.contextmanager
    def get(self):
        id = thread.get_ident()
        # Hold the lock only while touching the shared structures, not
        # while the caller is using the connection.
        with self._lock:
            if id not in self.swimmers:
                if self.pool:
                    self.swimmers[id] = self.pool.pop()
                else:
                    self.swimmers[id] = self._create()
            conn = self.swimmers[id]
        try:
            yield conn
        finally:
            # Runs even when the body of the with block raises.
            with self._lock:
                conn = self.swimmers.pop(id, None)
                if conn is not None:
                    if self.cleanup:
                        self.cleanup(conn)
                    else:
                        self.pool.append(conn)


class MockThread(threading.Thread):

    def __init__(self, pool, name, indent=0):
        threading.Thread.__init__(self)
        self.pool = pool
        self.name = name
        self.indent = '\t'.join(['|' for i in xrange(0, indent)])

    def m(self, *s):
        print '%s %s' % (str(self.indent), ''.join(map(str, s)))

    def run(self):
        with self.pool.get() as conn:
            waiting = random.randint(1, 3)
            self.m('got connection')
            self.m('using connection in ', self.name)
            time.sleep(waiting)
            conn(self.indent, ' hello world')
        return


class MockConn(object):
    def __init__(self, name):
        self.name = name
    
    def __call__(self, *args):
        print ''.join(args)


if __name__ == '__main__':
    tp = Pool(MockConn, ['eric'], min=0)
    workers = [MockThread(tp, x, x) for x in range(0, 10)]
    for i, w in enumerate(workers):
        w.start()

The win here is that by using context managers, I've eliminated the need to check to see if threads are using a connection. This is traditionally necessary because an exception somewhere can be missed, which in turn never releases the lock on the connection. The context managers should automatically release the lock no matter what, which simplifies the code quite a bit. That said, I have no idea if there are blocking issues I'm missing or anything so feel free to comment and set me straight!
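The "released no matter what" behaviour depends on a try/finally around the yield, so it is worth spelling out. Here is a stripped-down sketch for Python 3. `TinyPool` is a hypothetical name and is deliberately simpler than the poodle code (no per-thread bookkeeping, no cleanup hook).

```python
import contextlib
import threading


class TinyPool(object):
    """Hypothetical minimal pool used only to show the release pattern."""

    def __init__(self, factory):
        self.factory = factory
        self.idle = []
        self._lock = threading.Lock()

    @contextlib.contextmanager
    def get(self):
        # Hold the lock only while touching the shared list.
        with self._lock:
            conn = self.idle.pop() if self.idle else self.factory()
        try:
            yield conn
        finally:
            # Runs even if the body of the with block raised.
            with self._lock:
                self.idle.append(conn)
```

If the body of `with pool.get() as conn:` raises, the exception is re-raised at the `yield`, the `finally` clause puts the connection back, and the exception then continues up to the caller.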

Filed under  //   python  

Does Javascript Need Jack?

Kevin Dangoor mentioned Jack, a Rack implementation in Javascript. While I'm a fan of Rack, I don't get Jack. Why, you ask? Well, I'll tell you!

The reason Rack works as it does is that Ruby doesn't support functions as first-class citizens. Yes, I realize that many Rubyists fancy their language as an entry to functional programming (I'm assuming because of blocks), but Ruby is far from functional. Python, on the other hand, while also far from functional, does allow passing functions as arguments. The WSGI specification, from which Rack draws its inspiration, in fact works by passing callables (a duck type for functions) along with an environment. Just as the "." in XSLT represents the current context, the environment represents the current state of the request. Secondly, WSGI allows two places to adjust the request and response: you either adjust the input (the environment) or the output. Again, this is very functional. Basically, you nest functions together according to the WSGI spec, and when the last one calls the start_response function, the output bubbles back out through the function hierarchy.
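A rough sketch of that nesting, assuming nothing beyond the WSGI calling convention itself. The names hello_app, set_name and shout, and the 'demo.name' environment key, are all made up for illustration. The application is driven by hand with a fake start_response rather than by a real server.

```python
def hello_app(environ, start_response):
    # Innermost callable: reads the environment and starts the response.
    name = environ.get('demo.name', 'world')
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return [('hello ' + name).encode('utf-8')]


def set_name(app, name):
    # Middleware that adjusts the input (the environment).
    def wrapped(environ, start_response):
        environ['demo.name'] = name
        return app(environ, start_response)
    return wrapped


def shout(app):
    # Middleware that adjusts the output as it bubbles back out.
    def wrapped(environ, start_response):
        return [chunk.upper() for chunk in app(environ, start_response)]
    return wrapped


# Nest the callables; the outermost one is what a server would call.
application = shout(set_name(hello_app, 'eric'))

# Record what the application passes to start_response.
seen = {}


def fake_start_response(status, headers, exc_info=None):
    seen['status'] = status
    seen['headers'] = headers


body = b''.join(application({}, fake_start_response))
```

Each piece is just a function that takes or returns another function, so none of it needs anything Javascript doesn't already have.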

Given how WSGI works, it makes no sense whatsoever to implement a WSGI copy in the style of Rack. Javascript already supports the necessary tools to make a rather pure WSGI implementation (let's call it JSGI) using the same patterns. One distinction the Jack docs make is that Jack is meant for server-side Javascript. Again, I don't think this is a problem whatsoever.

I don't want to suggest that Jack is a bad idea, but basing it on Rack seems to lack foresight into how the design is actually used. WSGI has proven itself in a very powerful way: there is a multitude of applications, libraries and middleware that all make it clear WSGI is a successful technology. Rack, on the other hand, does not have the same track record. Including Rack in Rails 3 doesn't really count either, as Rails 3 is not released and there are no examples of sites or services using it. It also seems like a shame that the functional features of Javascript are not being utilized. The success of jQuery, I believe, is largely a function of its functional approach and respect for the language. It seems that a similar functional approach should be taken when implementing something like WSGI in Javascript.

Filed under  //   javascript   python   ruby  

Needing More Features

Seeing as I actually enjoy programming, sometimes it is nice to make small projects for myself. I have another project I'm working on for a friend that has been taking up most of my free programming time, so my blog makes for an arena of experimentation. Recently, I've been interested in using CouchDB, partly because it is not a traditional database. While I can put SQL on my resume, I'm far from an expert and lack the desire to become one. Unless I find a good backend storage system, I'm always going to be a slave to either the filesystem or my own concoctions.

One option is to move the backend to CouchDB, but honestly, SQLite has been treating me very well, so I don't mind sticking with it for the time being. Moving my friend's project is also an option, but my job is to make a working project, not experiment. One thing my blog has always lacked is Web 2.0-isms. I don't have any widgets, tagging, or more generally, social features. At this point, my interest is not to participate, but to contribute. My writing has become more important to me this past year and, without readership, I'm losing out on important feedback from readers. Making it easy to post articles to popular sites might help boost readership and generally get a little more attention. Tagging, on the other hand, just seems semantically cool over the long run.

So, my next post should hopefully include some nifty "Post to Social Website X" widgets as well as a simple tags list. With luck, it will help improve my readership.

Filed under  //   couchdb   python