elarson’s posterous

 

 

Parallelism Relevance

The other day I got rather caught up in the whole * is Unix discussion and wanted to get a better understanding of some of the more low level details associated with writing servers. Eventually, I had a crude version of CherryPy that forked instead of using threads. The benchmark in CherryPy seemed to suggest it was a great deal faster, but honestly I think that was a fluke, especially considering I'm pretty sure there are some serious deficiencies in terms of process management. When Bob posted his forking example on the CherryPy list and mentioned signals, it was clear that I was missing some pretty important understanding.

One thing I thought seemed helpful was that it seemed to use both processors on my system. My understanding is that calling fork creates a second, separate process, and that the operating system is then free to schedule at least some of the forked processes on different processors. My understanding is that this is how the multiprocessing module works (on Linux at least). With this new cursory knowledge under my belt it was interesting to see an article in my ACM Communications magazine remark on the future of computing and the need to handle multiple processors in a manner similar to sequential paradigms. I should note that this is one of the very few times there was anything remotely interesting to me in my ACM magazine, so I felt compelled to check it out. The idea is not new to me and while the argument that this is a huge issue is pretty valid, I also started thinking that maybe it is not as big a deal in practice.
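
To make the fork idea concrete, here is a minimal sketch in Python. It is not the forking CherryPy experiment or Bob's example; `fork_workers` is a hypothetical helper name, and it assumes a POSIX system since `os.fork` is not available on Windows. Each job runs in its own child process and reports back through a pipe, and the operating system decides which processor each child runs on.

```python
import os


def fork_workers(jobs):
    """Run each zero-argument callable in its own forked child process.

    Returns the children's results as strings, in the order of `jobs`.
    """
    children = []
    for job in jobs:
        read_fd, write_fd = os.pipe()
        pid = os.fork()
        if pid == 0:
            # Child: do the work, report through the pipe, and exit at once
            # so that it never returns into the parent's code path.
            status = 1
            try:
                os.close(read_fd)
                os.write(write_fd, str(job()).encode())
                status = 0
            finally:
                os._exit(status)
        # Parent: keep only the read end and remember which child owns it.
        os.close(write_fd)
        children.append((pid, read_fd))

    results = []
    for pid, read_fd in children:
        with os.fdopen(read_fd) as pipe:
            results.append(pipe.read())  # returns at EOF, when the child exits
        os.waitpid(pid, 0)  # reap the child so it does not linger as a zombie
    return results
```

All children are forked before any result is read, so CPU-bound jobs can overlap on a multi-core machine. The sketch leaves out signal handling and worker restarts, which is exactly the process management a real forking server needs.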

The reason for this is the web. Back in the early days of computing people had dumb terminals and logged into a server where all the actual "computing" was performed. While thick clients have been the rage for quite a while, the web is effectively becoming the mainframe in the sky in terms of where people are doing their computing. This should be very helpful in making the jump to parallel computing because there is a definite history and set of tools that have been developed to scale and that take advantage of parallel processors. In other words, who really cares if your computer has 2 quad core processors when you're searching email online, browsing Facebook, checking your bank statements and reading the news through your web browser?

On the web, what really matters is that the sites doing the processing have optimized their architectures to handle the load. The other side of the equation is the browser, but in reality, this is an area that is already becoming more robust. Google Chrome is a good example with its separate processes for each tab, but I would even argue that Javascript is well suited to handle distributed tasks. We've seen a ton of articles on using async servers. Javascript is already using an async model that is only getting faster with the recent developments in new Javascript engines. None of this means that parallel computing and utilization of more processors is not very important. I just don't think it is quite as critical as some might think. That doesn't mean programmers shouldn't try to understand it. After all, it is a hard problem and programmers typically like hard problems. My whole point though, is that the problem might be more of a fad than an actual crux for the IT industry.

Filed under  //   programming   python  

Just Checking In

Today I had the desire to write something down, but really didn't have a concise idea of what to write about. So this post is just going to be a small summary of some thoughts and experiences.

Free Software and Open Source

I recently read the RMS opinion on Codeplex and Miguel's response. After a quick glance over at planet gnome I noticed a few people taking sides and it occurred to me that the whole argument is rather silly. When I was in college the concept of free software made a ton of sense. Looking back it was because I didn't have any money, so generally anything free made a ton of sense. Now that I'm a full fledged tax paying adult, the glamour of free software has lost its glitz. It is not that free software has become unimportant or useless. What has happened, in my mind at least, is the arguments associated with free software have become rather stale. By stale, I simply mean it isn't anything exciting for me personally. I think free software is critical, but I have better things to do than care about it in its own right. I'm probably just getting old, but it was an interesting realization for me nonetheless.

Test Driven Development

At work I've been trying to improve my tests. By "improve" I really mean write them in earnest. It is a really difficult thing to write code using TDD. It is a similar approach to modeling in that it forces you to consider an abstract idea of what some code should do and look like. TDD is sort of like UML in the age of Ruby on Rails, which is kind of funny as the recent web frameworks and NoSQL all suggest rapid prototyping over planning before coding. While both UML and TDD are doing pretty much the same in terms of hashing out code, the obvious benefit of TDD is that you get something that can be used in the future. At the same time, a well tested code base is not that important if the tests are bad and are hard to run. Testing in web browsers is the most obvious case in point. The larger point then is obviously that planning, whether through tests, Visio or some hodge podge of tools, is helpful for writing better code. It might also be argued that it is faster since the design is fleshed out to some extent, but I would ask if the time spent planning is included in that calculation and if it is a real calculation at all. Programmers have a nasty habit of estimating because of the constant requirement to create hypotheses in debugging. My bet is that many of the virtues of TDD (like UML as well) are overblown and the only real benefit is forcing a developer to focus on what the problem is. One of my issues is that it creates a whole new class of code that deals with testing. This is totally fine, but where are the tests for the tests? It seems like a story that we'll probably never see the end of.

Text Browsing

I'm going to suggest that if you're a programmer, it would treat you well to try out a good text web browser. My recommendation is w3m due to its Emacs integration, but anything that can keep you in your work environment works. My guess is vimmers would get similar usage from links/lynx assuming the terminal is their environment. The reason is that if you are constantly editing text and reading it in your dev environment, browsing the web textually can be a helpful tool to keep focus. For me, I get the same keybindings, easy copy and pasting, and simpler window/frame/buffer management. Beyond this though, it feels faster when it comes to reading documentation and finding helpful code. Your mileage may vary, but it sure couldn't hurt to try.

Administering Systems

At work we recently rolled out a new system. It isn't actually new, but is in fact the latest step in an improvement to a current system. What always strikes me about the smart folks I work with is how gracefully they walk the line between system administrator and programmer. The two fields are completely intertwined, but the best programmers are those that have the better understanding of both sides. This is probably partly why I'm not that great of a programmer! For whatever reason, my mind doesn't ever seem to really indulge in the system administration side of things. It is always a challenge for me to make pre-existing software work the way I think it should. That doesn't mean I'm not trying of course! But it does mean that I have a ton to learn and will for the foreseeable future.

ACL Wrap Up

This past weekend was ACL in Austin and it was a blast. We saw Them Crooked Vultures, The Walkmen, School of Seven Bells, Broken Social Scene and some guys from Phoenix DJ. We also played a show with The Riverboat Gamblers and The Soldier Thread. It was a ton of fun. On the road we don't really get to hang out that much. There is usually somewhere to drive, something to load or unload or something to sell that usually keeps us busy. It was great to come home, rest and then have a great weekend of music and friends. We didn't go to ACL proper and I'm glad. There were plenty of bands I would have liked to catch but the weather was horrid and my guess is I would have been pretty miserable in the mud. Hopefully next year there will be some nicer weather. Who knows, maybe we'll even get to play!

Filed under  //   emacs   music   programming  

Finally Using Emacs for Web Browsing and Email

It has taken a long time, but I've finally managed to get email working relatively well in Emacs! In the end Wanderlust was the client of choice for me. It has a pretty simple file format for organizing mail and the keybindings have already begun to be rather natural. The reasoning behind my quest for merging my email and Emacs had more to do with my recent tour than an actual desire or need for checking mail in my text editor. I was perfectly happy with Gmail and in fact I'm technically still using it through IMAP. The real reason was that I wanted to try and reduce my bandwidth usage. On tour I had a 5 Gig cap on my network usage. Having never really monitored this sort of thing before, I had no idea how much bandwidth things like IRC might use when you're talking about every day, all day usage. To make things worse, I forgot to grab the USB cable that would let me check my current usage for the month, so I was essentially flying blind!

My fear of an enormous bill made me consider what I could do to reduce my bandwidth. The obvious avoidance of listening to music and watching videos was easy. Also, reading local versions of docs was another helpful tactic. This led me to see a few things about w3m, which I had tried to use before in Emacs. It dawned on me that things like code examples viewed in a text based web browser make nothing but sense in my text editor. It makes things like copying and pasting code samples a breeze. After getting used to w3m in Emacs it became clear that I should really try to finally tackle email. As I said, Gmail has been just fine, so the motivation wasn't very strong to move. The killer feature in my mind was tying things into my todo list and avoiding Gmail's constant stream of Ajax requests. I'm sure the bandwidth savings are practically nothing, but nonetheless, it seemed like if I got things working it might really be helpful in ways similar to w3m.

Once things were configured and I managed to get everything working, I did notice some helpful bits. Silly annoyances that really should never have been a problem smoothed out. I'm talking about basic copy and paste issues I noticed here and there when working with Emacs and the rest of my desktop. Little things like working with Trac were also a good deal faster, although it was something of a let down that I couldn't submit updates to tickets. Finally, it seems more reasonable to keep up with mailing lists as most of the necessary content (code) is close at hand. This last aspect seems to be the biggest draw of using Emacs. I primarily write code and keeping more tools inside Emacs makes it faster to transition between those tools and the code I'm working on. Of course it does seem kind of cool in a geeky way to work almost solely in text, but the main nicety is actually being more productive. I'm hoping in the next couple weeks things can continue to become a little more streamlined. The killer app that I would like to find a good Emacs replacement for is my feed reader. Google Reader is great but something more bare might help to move through things faster and generally filter out the cruft. That said, blogs were one of the first things I cut out in my quest to lower bandwidth. My desire to check what is new in the blogosphere has greatly diminished since taking a break. I'd like to potentially keep it that way for a while.

Filed under  //   emacs   programming  

Some Org-Mode Workflow

I've got to say that org-mode has been treating me rather well. To contrast this, I also started playing with Evernote on my iPhone. In theory, Evernote seems really helpful for working with todo lists. It has been reasonably nice for things like going to the store to buy a list of things. One feature that seemed really helpful was the ability to take a picture of something as a note. I took a picture of a pedal board I was making for Lauren that had some measurements on it. Unfortunately, for some odd reason, it wasn't saved. I think this happened when I tried to write a note that went along with the photo. It wasn't a huge deal and I'm sure user error was part of the equation, but I really hope that the photo feature lets me jot down some notes in addition to simply having a picture.

Going back to org-mode, I found this tutorial. I had read it plenty of times before but it didn't really click what was happening. I was always frustrated that the agenda side of org-mode required configuration. That was probably more a function of my own lack of understanding of how to tweak my .emacs file than a real negative. Now that I'm more familiar with Emacs Lisp, doing small configuration details in my .emacs doesn't seem like nearly as big a deal. After all, that is what it is there for!

This morning I went ahead and created a few different org files to better manage my todo list items. The next step will be to migrate my current todo file to the different files and start throwing in a few deadlines (C-c C-d). Hopefully once that is working, the agenda side of org can be of more use to me. The end of that tutorial mentions a time log of sorts as well, which seems pretty helpful. The other idea is to use the org clock in/out feature which I've used in the past.

At one of my jobs I had to do a lot of service work that was billable which meant keeping track of time. Org-mode was very helpful there in the end. I did write a small command line utility as well, but maintaining it was just silly. We had an idea for a script we called brain that acted like a catchall for generic tools to help productivity. It was kind of modeled after Paste's paster command. We used IronPython for it and it ended up being somewhat helpful. If I were to do something like that I think I would use Paver. In fact, this gives me an idea!

It will be trivial to write a simple paver command that lets me sync a rendered version of my todo list to my server. I can also use Emacs on the server to re-gen the HTML view after editing it. It should be really simple to have an edit window with the text and then regen on the server. Then on my home machine, I'll effectively pull down any changes and be on my way. At the very least, pushing via paver will be trivial.

What is nice is that thanks to my recent todo list, I really have been more productive. There have been a million things to do before tour and it hasn't been that bad getting them all done alongside taking care of my work. While it is pretty mundane, I'm still really excited.

Filed under  //   emacs   programming  

App Server Performance Thoughts

It is always interesting to see that CherryPy is included in Python web server benchmarks. I think it is a testament to the code base being considered a standard option as well as signifying that it is a reasonably fast baseline against which to consider other options. Oftentimes it is not the fastest option, but at the same time, rarely is the most time spent simply responding to the request. Databases and application logic traditionally take much longer than serving the response.

I'm not trying to argue that performance isn't important for a web server of course. CherryPy uses a threaded model, which has its issues in certain situations. For example, handling many clients for long periods of time is often difficult for a server like CherryPy. Notice that I just said "like" CherryPy. Threaded servers, no matter the language or implementation, often have similar characteristics. This is why you have things like the prefork MPM in Apache for example.

The other thing to consider regarding web application performance is the state. No matter what you do, there is going to have to be some concept of state that will be a bottleneck. There is a subtle abstraction I'm making here that is meant to generalize the essence of web applications and that differs from the concept of state within HTTP. HTTP is a stateless protocol, but web applications almost always have a state in some shape or form.

In this case I'm defining a "state" as something that must be read before handling the request. Anything from checking the authentication, reading a file or querying a database all involves some concept of state at some level. If the connection to the DB is open, then request this query, otherwise, make a new connection. If the file exists, read it. If the user exists, let the next function or object handle the rest of the request process. In all these cases there is some element of state that must be considered before handling the eventual response to the client.
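
Those checks can be sketched in a few lines of Python. Everything here is a hypothetical stand-in: `USERS` and `FILES` are in-memory dicts playing the role of a directory service and a filesystem, and `handle_request` is an illustrative name rather than any framework's API.

```python
# Hypothetical stand-ins for external state; a real app would ask an
# auth service, a filesystem, or a database instead of these dicts.
USERS = {"alice": "secret-token"}
FILES = {"/notes.txt": "remember the milk"}


def handle_request(user, token, path):
    """Answer a request only after every piece of state has been checked."""
    # State check 1: does the user exist, and does the token match?
    if user not in USERS or USERS[user] != token:
        return 401, "unauthorized"
    # State check 2: does the requested resource exist?
    if path not in FILES:
        return 404, "not found"
    # Only now is the response itself produced, which is the cheap part.
    return 200, FILES[path]
```

With dicts the checks cost next to nothing. Once each lookup becomes a network round trip to another service, the lookups are likely to dominate the request time, not the final `return`.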

Going back to performance, the questions asked of the state traditionally are going to be what really hinders performance. Databases are the traditional bottleneck, but it is definitely not limited to this. Sessions are a great example where state needs to be maintained. If you have many servers running, how is that session state managed? Authentication is another area that is traditionally not associated with a single server. I mention this because while it is totally appropriate to consider how fast a web server handles responses, it is also just as important to consider how fast a session service or directory service handles its respective tasks. Likewise, there is the question of whether the server is responsible for handling some of these more global requirements. Apache and CherryPy can both handle sessions where a standalone WSGI server like Paste's HTTP Server relegates that to the application. Comparing a generic WSGI server to CherryPy may not really be as similar as one might think. Comparing a WSGI server with an app that uses Beaker, Static, URLMap, Routes and WebOb might get you closer to an actual apples to apples comparison.

One nice thing about CherryPy is that while it comes equipped with a healthy set of features, it is often relatively simple to use an external tool. For example, you can use sessions out of the box, or implement your own distributed session system. When considering performance for something needing support for a massive amount of clients, it might simply mean starting more servers and increasing the threadpool of the servers while using an external service for sessions. An asynchronous server might be better equipped to handle more clients initially, but the bottleneck of session state will still most likely need to be handled at which point the faster server might not have a trivial way of allowing a different session tool. Or it might have a great way of using other session tools! The point being there is more to performance than simply handling requests.
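
As a rough illustration of why the session store matters more than the server, here is a sketch with a swappable store. None of this is CherryPy's session API; `MemorySessionStore` and `App` are hypothetical names, and a single shared store object stands in for an external session service.

```python
class MemorySessionStore:
    """Hypothetical store; one shared instance stands in for an external service."""

    def __init__(self):
        self._data = {}

    def load(self, session_id):
        # Return a copy so callers cannot mutate stored state by accident.
        return dict(self._data.get(session_id, {}))

    def save(self, session_id, values):
        self._data[session_id] = dict(values)


class App:
    """Stand-in for one server process that counts visits per session."""

    def __init__(self, store):
        self.store = store

    def visit(self, session_id):
        session = self.store.load(session_id)
        session["hits"] = session.get("hits", 0) + 1
        self.store.save(session_id, session)
        return session["hits"]
```

Two `App` instances given the same store agree on the hit count, while an instance with its own private store starts again from one. Adding servers only helps when they can all reach the same session state.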

I'm not suggesting that you should use CherryPy for everything. What I am saying is that when considering performance the measurement is partly going to be specific to the application because of its dependence on some idea of state. CherryPy is a great server that is well tested and very stable. It may not be the fastest, but in terms of writing an application that uses something like a session service or other remote state tracking services, it can be very effective. Its concept of an engine bus is very powerful for integrating these kinds of services and connections. In other words, just as a framework makes writing application code easier, CherryPy's facilities help to create a more robust server environment relatively easily. These features can make scaling easier even though they most likely impact raw response performance. That may or may not be an effective trade off.

I should point out that I'm biased as we happily use CherryPy at work and I've used it personally for quite a while. That said, my goal is not to promote CherryPy, but to show where it optimizes the web application building process. Sometimes its facilities are going to be extremely helpful. Other times, not so much. The nice thing is that there are plenty of great options that facilitate many different styles of applications to meet different requirements. It is also important that as web developers consider performance it is done with an eye towards real measurements and an understanding of state. There is nothing new here of course, but it can't hurt to provide a slightly more specific argument as to why web server benchmarks may not be as telling as you might think.

Filed under  //   programming   python  

Front Loaded Mercurial

I'm going to have to go back and see how I can avoid laying a big fat patch bomb on a repo and I'm not happy about it. There is no one to blame but myself. That doesn't make it any nicer. My big issue is that for all the cool features of Mercurial there is a consistent front loading requirement. You cannot simply work and then later construct your commits that you'll be pushing. MQ does help with this sort of thing and I'm going to have to find out just how much tomorrow, but it would have been really nice if I could have started coding and, when I finished, had a convenient way to go through all the files and commit them in reasonable chunks.

The astute reader will recognize that this issue is really just a sign of bad DVCS habits and I'm not about to argue otherwise. Still, I'm very much a part of the "not a great coder" club, and as such, seem like a good candidate for how to help out the normal developers using these powerful tools. One might also suggest that I open a ticket, or even better, contribute a patch. Again, my "not a great coder" club membership explicitly states that any gripes need to stay far away from those folks getting a lot of work done (a la the mercurial devs), hence I'm totally fine leaving my whining here on my blog. My bet is bringing it up here will do more to improve my own habits than suggesting to others they are real problems.

Next time I'm really going to do a better job managing my patches. Feel free to hold my feet to the fire in the future, seeing how I've done.

Filed under  //   mercurial   programming  

URL Matching and CherryPy

I don't know what it is about URLs, but I always manage to get somewhat hung up on specific patterns. For example, in Rails, the traditional :controller/:method/:id pattern just gets on my nerves. My (rather silly) issue is that the resource is not being described separately from its actions. In my mind, the RESTful way to design the URL is to provide the resource URL and utilize the protocol methods to perform actions. In the case you need a special kind of action that may not easily be supported, then analyze the idempotency of the requirement and mint yourself a new URL. It is all pretty simple.

Unfortunately, for whatever reason, the default dispatchers that come with CherryPy seem to never really allow me to easily design the URLs like this. I've tried the routes dispatcher and every time I end up getting hung up because of the rather greedy matching. The method dispatcher can be really helpful at times, but lacks the robustness to dynamically call a specific controller or function. I always end up heading back to Sylvain's Selector dispatcher for CherryPy. It provides a simple way of attaching a function to a URL pattern and HTTP method. Here is an example:



# SomeController and SelectorDispatcher are assumed to be defined or imported elsewhere.
controller = SomeController()
d = SelectorDispatcher()

d.add('/api/:model/:id', {
    'GET': controller.read,
    'PUT': controller.update,
    'DELETE': controller.delete
})


While this may seem pretty verbose, it is easy enough to have controllers return their own dict for mapping the HTTP methods to the controller methods. Likewise, you can just have a controller that uses method names such as "PUT". The Selector docs have more information as well on doing partial matches and declaring optional aspects of the URL. This pattern has made sense to me for a while now and it is definitely my preferred mapping library.
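
To show the idea of a controller handing over its own method mapping, here is a self-contained sketch. It is not Sylvain's Selector dispatcher and does not use CherryPy; `TinySelector`, `ModelController` and `method_map` are hypothetical names, and the `:name` handling is a simplification that does not escape other regex characters in the pattern.

```python
import re


class TinySelector:
    """Illustrative dispatcher: maps a URL pattern plus HTTP method to a callable."""

    def __init__(self):
        self._routes = []

    def add(self, pattern, method_map):
        # Turn '/api/:model/:id' into a regex with one named group per ':name'.
        regex = re.sub(r":(\w+)", r"(?P<\1>[^/]+)", pattern)
        self._routes.append((re.compile("^" + regex + "$"), method_map))

    def dispatch(self, http_method, path):
        for regex, method_map in self._routes:
            match = regex.match(path)
            if match is None:
                continue
            handler = method_map.get(http_method)
            if handler is None:
                return 405, "method not allowed"
            return 200, handler(**match.groupdict())
        return 404, "not found"


class ModelController:
    def read(self, model, id):
        return "read %s %s" % (model, id)

    def delete(self, model, id):
        return "delete %s %s" % (model, id)

    def method_map(self):
        # The controller supplies its own HTTP-method-to-function mapping.
        return {"GET": self.read, "DELETE": self.delete}
```

Registering a resource then takes one line, `d.add('/api/:model/:id', ModelController().method_map())`, and the route table still reads as a description of the API.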

UPDATE:

One thing that might be confusing here is that I'm not bashing CherryPy, Rails, Routes or any other routing tools. I'm merely pointing out how I personally like to think when coding a class that is meant to be an API. There are plenty of ways to design a web app framework to organize code. Personally, I like to have the URL router handle sending things directly to a function and Selector fits my own way of thinking. There are probably countless arguments for organizing handlers as classes, functions, or a combination thereof dealing with HTTP methods or completely ignoring them. Likewise, I like a URL routing tool that lets someone take a look and get a good idea what the API is. Other people prefer to use tests, with others using comments. To each his/her own.

Filed under  //   programming   python  

A Kind of Programmer

I got into programming with PHP. It was an easy way to get started and proved to be a valuable tool for a very long time. In school, it was a no brainer to knock out projects with a few bits of PHP and gladly take the A. Although, for whatever reason, I ended up getting more interested in other languages. My interest in programming partly was fueled by an interest in open source communities such as Gnome and Linux. This got me interested in C# and Mono, and to some extent, Python. As I moved forward in learning these other languages, my opinion of PHP fell radically. It started to become ugly and archaic. Eventually, PHP seemed like a dirty and uninteresting tool.

Now that I'm a bit older, I realize that my opinions were immature. There is nothing wrong with PHP and there are many great examples of how it is used effectively and elegantly. Yet, even though I recognize this silly reluctance to enjoy a practical language, there is still a part of me that glamorizes more complicated systems. At times I idolize those developers that have to wrangle memory into place and construct interesting data structures to improve I/O and caching. In my mind these are the people that create the basic environment that all other programmers must work under, which must mean they are an extremely intelligent and elite group of people.

I'm not for a second going to diminish what these excellent programmers do. But, I am going to stop believing it is something out of the ordinary. When I was young, my view of PHP was partly attributed to my desire to be one of these influential hackers who seemed to write the code that other coders would use. People who wrote databases, kernels, and servers seemed like their influence was spread far and wide among people who had to get things done. In a way, my attitude was that writing code for people was not as cool or interesting as writing code that enabled more code. Obviously, this is one of the lamest ideas ever.

While I still have dreams of being associated with the elite group of people who manage to write amazing databases and highly distributed server systems, the reality is I'm just not that kind of programmer. I've been doing a lot of work with Javascript and it has made me realize that it is a problem space that fits me rather well. Up until this point, I don't know that I would have wanted to admit that, but the reality is reality. I wouldn't say I'm a "front end" coder as I would associate that more with having an eye for design. But at this point in my programming career I'm happy to admit I've finally begun to really appreciate writing applications for end users.

I'm sure there are still tons of interesting lower level applications that I'll dream of writing, but for the time being I'm going to stick with thinking about end users. I missed an opportunity when I first started programming to appreciate people using my code because I wanted those people using it to be other programmers. That shouldn't happen again. I'm hoping this kind of attitude will not only help me improve my code, but also help improve the lives of those using my code.

Filed under  //   programming  

Free Time Coding

The other day a miracle happened and I had a little time to code on whatever I wanted. Lately I've been so busy with music and work that there have been almost no opportunities to hack on something fun. Honestly, even though work has been tough, I've actually been having a lot of fun with Javascript, so it is not as though coding is a chore. The bigger issue is finding something creative to code.

The last time a little free time came up I took a look at Javascript on the server. I wrote a tiny framework and started writing a simple template language. Eventually I got stuck trying to figure out how to compile/eval some Javascript code within the context of a set of variables, which is something I had done in Python. Since the tactic wasn't going to work and it seemed as though time was not on my side, I figured I'd move along.
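
For reference, the Python side of that trick can be sketched with the built-in `compile` and `eval`. This is only one way to do it and not necessarily what my earlier code looked like; `render_expression` is a hypothetical helper name.

```python
def render_expression(source, context):
    """Evaluate a Python expression with `context` as its only visible names."""
    code = compile(source, "<template>", "eval")
    # An empty __builtins__ keeps the expression from reaching open() and
    # friends. This is a convenience, not a security boundary.
    return eval(code, {"__builtins__": {}}, dict(context))
```

A template engine could call `render_expression("greeting + ', ' + name", {"greeting": "Hello", "name": "Ume"})` for each placeholder it finds, handing in whatever variables the page was rendered with.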

This most recent bit of free time got me thinking about Erlang. Recently _why disappeared and as a result Hacker News became a posting board for theories and eulogies. To combat this theme a bunch of users started posting Erlang articles. It got me interested so I started taking a look at the Erlang web frameworks and tools out there. It didn't take long for me to lose interest in Erlang. It seems like a great language in terms of having functional qualities, but past that it doesn't seem like that much fun. I'm sure I'll take a stab some other time at really digging in, but for now, I'm ok letting Erlang gestate a bit more in my head.

So, the big question is what's next? I'm coming to the conclusion that my desire to start something new is rapidly falling by the wayside. I'm not in the mood to learn an entirely new framework or language at this point. This attitude could change, but at the moment it seems like it would be better to really dig in on an idea or existing project. The hardest thing about figuring out what that project or idea might be is simply finding an itch that needs to be scratched.

I know Bob has quite a few projects that could use help. Dejavu seems incredibly interesting as an ORM, but I'm really not much of a database guy. There is always making an effort to tackle a ticket on CherryPy, but again, that code is darn solid with quite a bit of history, so bugs seem like they would need a good deal of knowledge. This is totally fine, but it doesn't necessarily fit my free time coding ideal of doing something that lets me code quickly. That said, I also don't want to be lazy either so biting the bullet and working with something like Dejavu or CherryPy would be worthwhile.

Another area that is of interest to me is creating some tools for Ume. We are terribly unorganized when it comes to our merch. We're hitting the road soon and having some simple tools to help keep track of how we're doing financially seems like it might be nice. The problem there would be getting users. I know my wife pretty well and suggesting that we use some hacked together web page to input how many shirts we sold is going to be a challenge when you have a mini-rush at the merch table. Also, it is doubtful a well designed spreadsheet wouldn't be more than adequate.

It looks like my best bet is to take another look at Dejavu and see if there might be something I could do. At the very least it would be nice to get back in to the SQL world a bit and possibly brush up on things that I've long forgotten. I can use a simple accounting or merch tracking app as an example and see what I figure out. Making it a CherryPy app instead of WSGI might also help me learn something as well and it might be possible to help folks out on the mailing list a bit more.

Honestly, I can't imagine I'm going to get very far, but having a plan seems like part of the battle. Wish me luck as I'm sure I'll need it!

Filed under  //   programming  

Emacs Regexp Search and Replace

This is mainly just a reminder on how to do a search and replace with Emacs using regular expressions.

First step is to find what you want to find using the regexp builder (M-x regexp-builder). I'm going to consider this crucial since Emacs regex seems to be a little different from other regex engines (e.g. Python's).

Second, once you have your regex, you define a section to capture for later use. Emacs uses parens as literals in regex, so grouping is done via escaping them:



\(https?\)://\(.*\)


This provides two groups, the scheme ("http" or "https") and the rest after the "://".
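
For comparison, here is a sketch of how a pattern for the same purpose looks in Python, where parentheses group without escaping. `split_url` is a hypothetical helper name used only for illustration.

```python
import re

# Python groups with bare parentheses; Emacs needs \( and \) instead.
URL = re.compile(r"(https?)://(.*)")


def split_url(url):
    """Return (scheme, rest) for http/https URLs, or None when nothing matches."""
    match = URL.match(url)
    return match.groups() if match else None
```

Moving a regex from one engine to the other mostly means adding or removing those backslashes, which is why checking the pattern in the regexp builder first is worth the time.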

Lastly, you do the actual replace-regexp and provide the necessary details. To use something you captured (put in a group) in your replacement, you either use "\&" to insert the entire match when you have no groups, or "\1" where 1 is the group index (like in other regex engines). Here is some code I used it on:



class SomeTest(SeleniumTest):
    def __init__(self):
        self.t.open('/path/to/test', '')
        self.t.pause('500', '')
        # ... other commands
        self.t.pause('1100', '')
        #.... more commands
        self.t.pause('1500', '')


What I wanted to do was turn all those pause calls to "self.pause(${time})". Here is what I did in emacs:



M-x replace-regexp RET self.t.pause('\([[:digit:]]+\)', '') RET self.pause('\1') RET


Not very difficult, but when you don't practice that sort of thing it becomes easy to forget.
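
As a cross-check, the same substitution can be sketched in Python with `re.sub`. `simplify_pauses` is a hypothetical helper name, and unlike the Emacs command above it escapes the dots and the literal parentheses, which Python's syntax requires for an exact match.

```python
import re


def simplify_pauses(text):
    """Rewrite self.t.pause('N', '') calls as self.pause('N')."""
    # \1 in the replacement refers to the captured digits, as in Emacs.
    return re.sub(r"self\.t\.pause\('(\d+)', ''\)", r"self.pause('\1')", text)
```

Lines that do not match, such as the `self.t.open(...)` call, pass through untouched.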

Filed under  //   emacs   programming