elarson’s posterous

 

 

Issues with TDD

I'm making a concerted effort to do a lot more testing at work. Up until this point, tests have always been an afterthought. Something to provide a sanity check that helps to be sure things don't randomly break. There was obviously a little voice in my head nagging me that I should do a better job testing. The goal is to create a more stable environment where we can better evaluate a release. This is an important goal because it establishes the real point of testing, providing better software.

One observation I've found is that unless you start doing TDD or have some sense of what the tests should look like, it can be very difficult to get tests in order. Some might argue that if the tests are difficult to write, it is indicative of the code needing some refactoring. I honestly couldn't tell you if that is true, but my gut suggests that might very well be the case. On the other side of the spectrum, great tests do not make users happy through more usable applications or libraries. There has got to be a balance.

One thing I'm noticing in our own application is a set of global state that seems to always muck up the works. It can be a pain in the neck to always pass around variables instead of relying on a global. You have to consider if that mutable variable you just gave to one object will be modified and potentially affect another object's use of the variable. One solution is to try to pass variables wherever possible, but at that point we've somewhat missed some of the advantages of objects. With that in mind, the globals we do have seem relatively benign, as they traditionally wrap some storage or persistence piece that doesn't logically make sense to always pass around. This doesn't even begin to answer the questions of handling thread safety.

When I start thinking about all these things it really makes me frustrated. It is definitely a morale killer because improving the code slows momentum to a crawl. You want to refactor things to get something more testable and reliable, yet in doing so, you can easily create regressions by changing the previously working tests. Along the same lines, refactoring tests seems like a recipe for disaster, as you are changing the one bit of code that stands as a benchmark for functionality and quality. Today was particularly frustrating because it felt like every time I started working on cleaning up some piece of code, it became intertwined in the code I was trying to isolate. At this point it seems clear that there is a real need, which is a good thing. Hopefully tomorrow things might be a bit clearer on how to start untangling things in a way that actually helps improve the reliability of the code and makes for a better user experience.

Filed under  //   programming   python  

Introducing Focusr

One aspect of time management that is critical to success is finding a way to focus on tasks. For many people, myself included, it is a pretty serious battle that takes tons of practice and creative techniques for fooling yourself into sticking to the task at hand. One such technique is the Pomodoro Technique. I haven't read the book, nor would I consider myself an expert by any stretch, but the basic idea seems simple enough to run with despite the lack of formal training.

In a nutshell, you give yourself 25 minutes to complete a task and then take a short 5 minute break before moving on to the next task. From what I understand, the book emphasizes using an egg timer that is visible to make the whole process convenient. Seeing as I'm a programmer and there are a multitude of ways built into my desktop to get my attention, it seemed like a good opportunity to create a simple tool.

The result is Focusr. This is a really simple timer that helps to complete Pomodoro-like cycles. You say you want to start a task, it starts the 25 minute timer, lets you know when the time's up and does the same for the break. Rinse and repeat. It is super simple and surprisingly effective.

You can grab it from the web or install it with easy_install or pip. It uses libnotify's notify-send command to do the actual notification. Also, I created a simple Emacs function so I could start it easily.

 
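(defun pomo ()
  "Start a pomodoro task: 25 minutes working and 5 off."
  (interactive)
  ;; Bind msg and cmd locally instead of setting global variables with setq.
  (let* ((msg (read-string "What do you want to work on? "))
         (cmd (concat "focusr " msg)))
    ;; Run the focusr command in a bash comint buffer named *pomodoro-task*.
    (comint-simple-send (make-comint "pomodoro-task" "bash") cmd)))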
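
For anyone curious what a timer like this involves, here is a minimal Python sketch of the work/break cycle described above. It is not Focusr's actual source; the function names, the message text and the injectable `sleep` and `notifier` parameters are all my own assumptions for illustration. It calls `notify-send` when the command is available and falls back to printing otherwise.

```python
import shutil
import subprocess
import time


def notify(message):
    """Show a desktop notification via notify-send, or print as a fallback."""
    if shutil.which("notify-send"):
        # notify-send takes a summary followed by an optional body.
        subprocess.call(["notify-send", "Pomodoro", message])
    else:
        print(message)


def pomodoro(task, work_minutes=25, break_minutes=5,
             sleep=time.sleep, notifier=notify):
    """Run one work period followed by one break (names are illustrative)."""
    sleep(work_minutes * 60)
    notifier("Time's up: " + task)
    sleep(break_minutes * 60)
    notifier("Break over, pick the next task")
```

Passing `sleep` and `notifier` in as arguments is only there so the cycle can be exercised without actually waiting 30 minutes.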

While I'm sure buying the book could be helpful, it seems more helpful to understand what Pomodoro is actually doing. For myself, it presents an attainable period of time to focus on a task. I've read over and over again that one key to better productivity is breaking large tasks into smaller tasks. This is easier said than done though. By taking on the day in 25 minute chunks you're forced to consider how you can break up tasks such that you finish a task within the time limit. In addition to getting better practice breaking up tasks, you are also exercising your estimation skills and getting a better understanding of how much work you can really do. Like I said before, the concepts are really simple with or without formal training.

For myself, I also appreciate the obvious openness of the system. Becoming more productive is partly effectively utilizing systems while always evolving your techniques. As a person you have an innate ability to hack around your own efforts. I think this technique is simple enough that it can be used many different ways to help keep your mind guessing, which in turn helps to truly learn how to get more focus.

Filed under  //   productivity   programming   python  

Experiments with Diesel: Repeater

It feels as though web development has begun to focus on other parts of the stack. Up until recently, the framework decisions seem to be the biggest focus, with MVC based patterns reigning in the masses. At this point though, there is a wealth of documentation and options that make trying out the latest and greatest MVC framework slightly passe. To combat this stagnation in hip web technology, the focus has changed slightly to server technology.

Here are a few recent articles that make the point regarding the state of web development:


While most of these ideas and concepts are not particularly new in the realm of computer science, they are new to me. I represent a rather large audience of web developers who did not necessarily see the web develop from socket libraries to Rails. Instead, my experience began with PHP and didn't include any understanding of what actually happens on the web. Fortunately though, my own forays into web development exposed some consistent patterns that helped me understand what was really happening. So, with these new-ish ideas coming around, it seemed like a great learning opportunity to become better acquainted with more of the lower-level aspects of web development.

While I have a few projects lying around my home directory for each of the ideas mentioned above, this one is specific to Diesel. I'm somewhat partial to it for social reasons. When thinking of an idea for something that might very well benefit from being asynchronous (that is not a chat server), I came up with the idea for a database hub.

The idea for Repeater is that you set up a proxy that will "repeat" the requests to a set of services. For example, if you were using CouchDB, you could set up Repeater, which would be a round robin proxy for the different instances and allow updating all the instances with one request. This doesn't work yet! But that is the idea.
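
To make the two behaviours concrete, here is a small Python sketch of the selection logic only: round-robin for ordinary requests, and "every backend" for a repeated update. This is not Repeater's code and it has nothing to do with Diesel's API; the class and method names are hypothetical, and the backend URLs in the usage note are placeholders.

```python
import itertools


class RoundRobinBalancer:
    """Pick backends in turn; names here are illustrative, not Repeater's."""

    def __init__(self, backends):
        if not backends:
            raise ValueError("at least one backend is required")
        self.backends = list(backends)
        # itertools.cycle loops over the backends forever, in order.
        self._cycle = itertools.cycle(self.backends)

    def next_backend(self):
        """Return the backend that should serve the next ordinary request."""
        return next(self._cycle)

    def repeat_targets(self):
        """Return every backend, for a request that must reach all instances."""
        return list(self.backends)
```

With something like `RoundRobinBalancer(["http://couch-a:5984", "http://couch-b:5984"])`, reads would alternate between the two instances while an update would be sent to both.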

What does work is a basic proxy that will balance between some set of services. It works pretty well from the standpoint of basic tests not failing, but I have no idea if it would be extremely slow in practice. The thing that got me stuck is how to handle more requests without blocking when requesting the other site. From the code, you can see where I started messing around with threads to make sure I don't block. The gotcha in all this was that I couldn't figure out a way to effectively yield a sleep or join, where the join happens when the thread has finished making the request. I'm sure there is a way to do it, but my experiments were not very fruitful.
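
The general shape of what I was after can be sketched in plain Python, without Diesel: hand the blocking request to a worker thread, and have a generator keep yielding until the thread is done. Everything below is an assumption made for illustration. `blocking_fetch` is a stand-in that just sleeps, and `run` is a toy scheduler, not how Diesel's loop works.

```python
import time
from concurrent.futures import ThreadPoolExecutor

_pool = ThreadPoolExecutor(max_workers=4)


def blocking_fetch(url):
    """Stand-in for a slow upstream request (hypothetical; no network used)."""
    time.sleep(0.05)
    return "response from " + url


def proxy_request(url):
    """Yield None while the worker thread is busy, then yield the result."""
    future = _pool.submit(blocking_fetch, url)
    while not future.done():
        yield None
    yield future.result()


def run(tasks):
    """Toy scheduler: step each generator in turn until all have a result."""
    results = {}
    pending = dict(enumerate(tasks))
    while pending:
        for key, task in list(pending.items()):
            value = next(task)
            if value is not None:
                results[key] = value
                del pending[key]
        time.sleep(0.001)  # avoid spinning at full speed while waiting
    return [results[i] for i in range(len(tasks))]
```

Calling `run([proxy_request("a"), proxy_request("b")])` lets both requests wait at the same time instead of one after the other, which is the behaviour the thread experiments were aiming for.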

My overall impressions are pretty good. I'm still a bit hazy on the applications where an async system is an order of magnitude better than a threaded or forking (or both) system. The chat example seems obvious because you have a direct relationship between a client connection and a single process that needs to make responses in order. This is similar to a database connection in that it may need to handle a large number of connections, but at the same time, as soon as you start doing things like requests over the web, it seems inevitable that blocking will be an issue. Although, it seems very solvable.

Filed under  //   programming   python  

Noticing Small Features

We recently returned from a tour. While on the road, I'm still working. I have a slick bean bag chair, curtains and MiFi router, which makes it possible to get my work done while cruising down the freeway. This is something I've discussed before.

My two recent annoyances with my setup are not having an automatic way of completing an email address and slow updates to the server. So far, the slow updates will probably involve setting up a local IMAP server to sync to and from. It is something that I've started looking into, but haven't spent that much time on it. It ends up being easy enough to just ignore email and save the time. What is rather annoying is having to enter email addresses.

It is not that bad, but it is relatively time consuming. My typing could be much better and it can be easy to make mistakes. Also, there are many times I'm copying the email address from some other resource such as a web page or IRC, in which case, there is a lot of buffer management to deal with. Again, not a huge deal but enough to warrant looking into solutions.

Fortunately, like most things in the Emacs landscape, there was already a good solution in place: BBDB, the Insidious Big Brother Database. I initially tried using the tips mentioned on emacs-fu but found they didn't work. Fortunately, I found the solution was incredibly convenient and directly in the Wanderlust docs. The result is that I'm now collecting email addresses for completion much like I would get in Gmail, in addition to having an actual address book I might consider using. Pretty nice ROI for googling a bit.

This realization that you need a feature is a pretty common occurrence in Linux. The conceptual basis is usually there to create a solution, but oftentimes it takes a bit of work to really get things configured. Emacs is very similar, with the exception being it is almost expected that the configuration might very well involve writing a new mode or piece of functionality yourself. It serves as a good reminder that I shouldn't go recommending Emacs to my parents anytime soon. Although it would be pretty cool if I did get a call from my dad asking about setting up a keybinding for some lisp function he wrote.

Filed under  //   emacs   programming  

Parallelism Relevance

The other day I got rather caught up in the whole * is Unix discussion and wanted to get a better understanding of some of the more low level details associated with writing servers. Eventually, I had a crude version of CherryPy that forked instead of using threads. The benchmark in CherryPy seemed to suggest it was a great deal faster, but honestly I think that was a fluke, especially considering I'm pretty sure there are some serious deficiencies in terms of process management. When Bob posted his forking example on the CherryPy list and mentioned signals, it was clear that I was missing some pretty important understanding.

One thing I thought seemed helpful was that it seemed to use both processors on my system. My understanding is that calling fork will create two separate processes, which effectively should place at least some of the forked processes on different processors. My understanding is that this is how the multiprocessing module works (on Linux at least). With this new cursory knowledge under my belt it was interesting to see an article in my ACM Communications magazine remark on the future of computing and the need to handle multiple processors in a similar manner as sequential paradigms. I should note that this is one of the very few times there was anything remotely interesting to me in my ACM magazine, so I felt compelled to check it out. The idea is not new to me and while the arguments that this is a huge issue are pretty valid, I also started thinking that maybe it is not as big a deal in practice.
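
As a small illustration of the forking behaviour mentioned above, the following Python sketch forks once, runs a function in the child and passes the result back to the parent through a pipe. It is Unix-only, it is not the CherryPy experiment, and the helper name `run_in_child` is made up for this example. Which processor the child ends up on is left entirely to the operating system's scheduler.

```python
import os


def run_in_child(func):
    """Fork, run func() in the child, return (child_pid, output) in the parent."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: send the result to the parent, then exit without cleanup
        # handlers so nothing from the parent runs twice.
        os.close(read_fd)
        try:
            os.write(write_fd, str(func()).encode())
        finally:
            os.close(write_fd)
            os._exit(0)
    # Parent: read until the child closes its end of the pipe, then reap it.
    os.close(write_fd)
    with os.fdopen(read_fd, "rb") as reader:
        output = reader.read().decode()
    os.waitpid(pid, 0)
    return pid, output
```

Calling `run_in_child(os.getpid)` returns the child's process id twice, once from `os.fork` and once as reported by the child itself, which shows the work really did happen in a separate process.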

The reason for this is the web. Back in the early days of computing people had dumb terminals and logged into a server where all the actual "computing" was performed. While thick clients have been the rage for quite a while, the web is effectively becoming the mainframe in the sky in terms of where people are doing their computing. This should be very helpful in making the jump to parallel computing because there is a definite history and set of tools that have been developed to scale across parallel processors. In other words, who really cares if your computer has 2 quad core processors when you're searching email online, browsing Facebook, checking your bank statements and reading the news through your web browser?

On the web, what really matters is that the sites doing the processing have optimized their architectures to handle the load. The other side of the equation is the browser, but in reality, this is an area that is already becoming more robust. Google Chrome is a good example with its separate processes for each tab, but I would even argue that Javascript is well suited to handle distributed tasks. We've seen a ton of articles on using async servers. Javascript is already using an async model that is only getting faster with the recent developments in new Javascript engines. None of this means that parallel computing and utilization of more processors is not very important. I just don't think it is quite as critical as some might think. That doesn't mean programmers shouldn't try to understand it. After all, it is a hard problem and programmers typically like hard problems. My whole point though, is that the problem might be more of a fad than an actual crux for the IT industry.

Filed under  //   programming   python  

Just Checking In

Today I had the desire to write something down, but really didn't have a concise idea of what to write about. So this post is just going to be a small summary of some thoughts and experiences.

Free Software and Open Source

I recently read the RMS opinion on Codeplex and Miguel's response. After a quick glance over at planet gnome I noticed a few people taking sides and it occurred to me that the whole argument is rather silly. When I was in college the concept of free software made a ton of sense. Looking back it was because I didn't have any money, so generally anything free made a ton of sense. Now that I'm a full fledged tax paying adult, the glamour of free software has lost its glitz. It is not that free software has become unimportant or useless. What has happened, in my mind at least, is the arguments associated with free software have become rather stale. By stale, I simply mean it isn't anything exciting for me personally. I think free software is critical, but I have better things to do than care about it in its own right. I'm probably just getting old, but it was an interesting realization for me nonetheless.

Test Driven Development

At work I've been trying to improve my tests. By "improve" I really mean write them in earnest. It is a really difficult thing to write code using TDD. It is a similar approach to modeling in that it forces you to consider an abstract idea of what some code should do and look like. TDD is sort of like UML in the age of Ruby on Rails, which is kind of funny as the recent web frameworks and NoSQL all suggest rapid prototyping over planning before coding. While both UML and TDD are doing pretty much the same thing in terms of hashing out code, the obvious benefit of TDD is that you get something that can be used in the future. At the same time, a well tested code base is not that important if the tests are bad and are hard to run. Testing in web browsers is the most obvious case in point. The larger point then is obviously that planning, whether through tests, Visio or some hodgepodge of tools, is helpful for writing better code. It might also be argued that it is faster since the design is fleshed out to some extent, but I would ask if the time spent planning is included in that calculation and if it is a real calculation at all. Programmers have a nasty habit of estimating because of the constant requirement to create hypotheses in debugging. My bet is that many of the virtues of TDD (like UML as well) are overblown and the only real benefit is forcing a developer to focus on what the problem is. One of my issues is that it creates a whole new class of code that deals with testing. This is totally fine, but where are the tests for the tests? It seems like a story that we'll probably never see the end of.

Text Browsing

I'm going to suggest that if you're a programmer, it would treat you well to try out a good text web browser. My recommendation is w3m due to its Emacs integration, but anything that can keep you in your work environment works. My guess is vimmers would get similar usage from links/lynx assuming the terminal is their environment. The reason is that if you are constantly editing text and reading it in your dev environment, browsing the web textually can be a helpful tool to keep focus. For me, I get the same keybindings, easy copy and pasting, and simpler window/frame/buffer management. Beyond this though, it feels faster when it comes to reading documentation and finding helpful code. Your mileage may vary, but it sure couldn't hurt to try.

Administering Systems

At work we recently rolled out a new system. It isn't actually new, but is in fact the latest step in an improvement to a current system. What always strikes me about the smart folks I work with is how gracefully they walk the line between system administrator and programmer. The two fields are completely intertwined, but the best programmers are those that have the better understanding of both sides. This is probably partly why I'm not that great of a programmer! For whatever reason, my mind doesn't ever seem to really indulge in the system administration side of things. It is always a challenge for me to make pre-existing software work the way I think it should. That doesn't mean I'm not trying of course! But it does mean that I have a ton to learn and will for the foreseeable future.

ACL Wrap Up

This past weekend was ACL in Austin and it was a blast. We saw Them Crooked Vultures, The Walkmen, School of Seven Bells, Broken Social Scene and some guys from Phoenix DJ. We also played a show with The Riverboat Gamblers and The Soldier Thread. It was a ton of fun. On the road we don't really get to hang out that much. There is usually somewhere to drive, something to load or unload or something to sell that usually keeps us busy. It was great to come home, rest and then have a great weekend of music and friends. We didn't go to ACL proper and I'm glad. There were plenty of bands I would have liked to catch but the weather was horrid and my guess is I would have been pretty miserable in the mud. Hopefully next year there will be some nicer weather. Who knows, maybe we'll even get to play!

Filed under  //   emacs   music   programming  

Finally Using Emacs for Web Browsing and Email

It has taken a long time, but I've finally managed to get email working relatively well in Emacs! In the end Wanderlust was the client of choice for me. It has a pretty simple file format for organizing mail and the keybindings have already begun to be rather natural. The reasoning behind my quest for merging my email and Emacs had more to do with my recent tour than an actual desire or need for checking mail in my text editor. I was perfectly happy with Gmail and in fact I'm technically still using it through IMAP. The real reason was that I wanted to try and reduce my bandwidth usage. On tour I had a 5 Gig cap on my network usage. Having never really monitored this sort of thing before, I had no idea how much bandwidth things like IRC might use when you're talking about every day, all day usage. To make things worse, I forgot to grab the USB cable that would let me check my current usage for the month, so I was essentially flying blind!

My fear of an enormous bill made me consider what I could do to reduce my bandwidth. The obvious avoidance of listening to music and watching videos was easy. Also, reading local versions of docs was another helpful tactic. This led me to see a few things about w3m, which I had tried to use before in Emacs. It dawned on me that viewing things like code examples in a text based web browser makes nothing but sense in my text editor. It makes things like copying and pasting code samples a breeze. After getting used to w3m in Emacs it became clear that I should really try to finally tackle email. As I said, Gmail has been just fine, so the motivation wasn't very strong to move. The killer feature in my mind was tying things into my todo list and avoiding Gmail's constant stream of Ajax requests. I'm sure the bandwidth savings are practically nothing, but nonetheless, it seemed like if I got things working it might really be helpful in ways similar to w3m.

Once things were configured and I managed to get everything working, I did notice some helpful bits. Silly annoyances that really should never have been a problem smoothed out. I'm talking about basic copy and paste issues I noticed here and there when working with Emacs and the rest of my desktop. Little things like working with Trac were also a good deal faster, although it was something of a letdown that I couldn't submit updates to tickets. Finally, it seems more reasonable to keep up with mailing lists as most of the necessary content (code) is close at hand. This last aspect seems to be the biggest draw of using Emacs. I primarily write code and keeping more tools inside Emacs makes it faster to transition between those tools and the code I'm working on. Of course it does seem kind of cool in a geeky way to work almost solely in text, but the main nicety is actually being more productive. I'm hoping in the next couple weeks things can continue to become a little more streamlined. The killer app that I would like to find a good Emacs replacement for is my feed reader. Google Reader is great but something more bare might help to move through things faster and generally filter out the cruft. That said, blogs were one of the first things I cut out in my quest to lower bandwidth. My desire to check what is new in the blogosphere has greatly diminished since taking a break. I'd like to potentially keep it that way for a while.

Filed under  //   emacs   programming  

Some Org-Mode Workflow

I've got to say that org-mode has been treating me rather well. To contrast this, I also started playing with Evernote on my iPhone. In theory, Evernote seems really helpful for working with todo lists. It has been reasonably nice for things like going to the store to buy a list of things. One feature that seemed really helpful was the ability to take a picture of something as a note. I took a picture of a pedal board I was making for Lauren that had some measurements on it. Unfortunately, for some odd reason, it wasn't saved. I think this happened when I tried to write a note that went along with the photo. It wasn't a huge deal and I'm sure user error was part of the equation, but I really hope that the photo feature is able to let me jot down some notes in addition to simply having a picture.

Going back to org-mode, I found this tutorial. I had read it plenty of times before but it didn't really click what was happening. I was always frustrated that the agenda side of org-mode required configuration. That was probably more a function of my own lack of understanding tweaking my .emacs file than a real negative. Now that I'm more familiar with Emacs Lisp, doing small configuration details in my .emacs doesn't seem like nearly as big a deal. After all, that is what it is there for!

This morning I went ahead and created a few different org files to better manage my todo list items. The next step will be to migrate my current todo file to the different files and start throwing in a few deadlines (C-c C-d). Hopefully once that is working, the agenda side of org can be of more use to me. The end of that tutorial mentions a time log of sorts as well, which seems pretty helpful. The other idea is to use the org clock in/out feature which I've used in the past.

At one of my jobs I had to do a lot of service work that was billable which meant keeping track of time. Org-mode was very helpful there in the end. I did write a small command line utility as well, but maintaining it was just silly. We had an idea for a script we called brain that acted like a catchall for generic tools to help productivity. It was kind of modeled after Paste's paster command. We used IronPython for it and it ended up being somewhat helpful. If I were to do something like that I think I would use Paver. In fact, this gives me an idea!

It will be trivial to write a simple paver command that lets me sync a rendered version of my todo list to my server. I can also use Emacs on the server to re-gen the HTML view after editing it. It should be really simple to have an edit window with the text and then regen on the server. Then, on my home machine, I'll effectively pull down any changes and be on my way. At the very least, pushing via paver will be trivial.

What is nice is that thanks to my recent todo list, I really have been more productive. There have been a million things to do before tour and it hasn't been that bad getting them all done alongside taking care of my work. While it is pretty mundane, I'm still really excited.

Filed under  //   emacs   programming  

App Server Performance Thoughts

It is always interesting to see that CherryPy is included in Python web server benchmarks. I think it is a testament to the code base being considered a standard option as well as signifying that it is a reasonably fast baseline against which to consider other options. Oftentimes it is not the fastest option, but at the same time, rarely is the most time spent simply responding to the request. Databases and application logic traditionally take much longer than serving the response.

I'm not trying to argue that performance isn't important for a web server of course. CherryPy uses a threaded model, which has its issues in certain situations. For example, handling many clients for long periods of time is often difficult for a server like CherryPy. Notice that I just said "like" CherryPy. Threaded servers no matter the language or implementation often have similar characteristics. This is why you have things like prefork/mpm with Apache for example.

The other thing to consider regarding web application performance is the state. No matter what you do, there is going to have to be some concept of state that will be a bottleneck. There is a subtle abstraction I'm making here that is meant to generalize the essence of web applications and that differs from the concept of state within HTTP. HTTP is a stateless protocol, but web applications almost always have a state in some shape or form.

In this case I'm defining a "state" as something that must be read before handling the request. Anything from checking the authentication, reading a file or querying a database all involves some concept of state at some level. If the connection to the DB is open, then request this query, otherwise, make a new connection. If the file exists, read it. If the user exists, let the next function or object handle the rest of the request process. In all these cases there is some element of state that must be considered before handling the eventual response to the client.
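
Those checks can be written down as a tiny Python sketch. It is purely illustrative: the class, the hard-coded user set and the use of an in-memory SQLite database are my own assumptions, not code from CherryPy or any real application. The point is only that every request has to ask a question of some piece of state before it can produce a response.

```python
import sqlite3


class StatefulHandler:
    """Toy request handler that consults state before responding."""

    def __init__(self, db_path=":memory:", users=("alice",)):
        self.db_path = db_path
        self.users = set(users)  # hypothetical stand-in for an auth service
        self._conn = None

    def connection(self):
        # If the connection is open, reuse it; otherwise make a new one.
        if self._conn is None:
            self._conn = sqlite3.connect(self.db_path)
        return self._conn

    def handle(self, user, query):
        # If the user exists, let the rest of the request proceed.
        if user not in self.users:
            return 401, None
        rows = self.connection().execute(query).fetchall()
        return 200, rows
```

Each of the three lookups (the user set, the cached connection, the query itself) is a place where a slow answer slows the whole response, no matter how quickly the server itself can write bytes to the socket.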

Going back to performance, the questions asked of the state traditionally are going to be what really hinders performance. Databases are the traditional bottleneck, but it is definitely not limited to this. Sessions are a great example where state needs to be maintained. If you have many servers running, how is that session state managed? Authentication is another area that is traditionally not associated with a single server. I mention this because while it is totally appropriate to consider how fast a web server handles responses, it is also just as important to consider how fast a session service or directory service handles its respective tasks. Likewise, there is the question of whether the server is responsible for handling some of these more global requirements. Apache and CherryPy can both handle sessions, whereas a standalone WSGI server like Paste's HTTP Server relegates that to the application. Comparing a generic WSGI server to CherryPy may not really be as similar as one might think. Comparing a WSGI server with an app that uses Beaker, Static, URLMap, Routes and WebOb might get you closer to an actual apples to apples comparison.

One nice thing about CherryPy is that while it comes equipped with a healthy set of features, it is often relatively simple to use an external tool. For example, you can use sessions out of the box, or implement your own distributed session system. When considering performance for something needing support for a massive amount of clients, it might simply mean starting more servers and increasing the threadpool of the servers while using an external service for sessions. An asynchronous server might be better equipped to handle more clients initially, but the bottleneck of session state will still most likely need to be handled at which point the faster server might not have a trivial way of allowing a different session tool. Or it might have a great way of using other session tools! The point being there is more to performance than simply handling requests.
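
Here is a rough Python sketch of the "external service for sessions" idea. None of it is CherryPy's session API; `SessionStore`, `AppServer` and their methods are hypothetical names, and the shared dictionary simply stands in for whatever remote service would really hold the data. What it shows is that once the session state lives outside the server, any number of servers can answer for the same client.

```python
import uuid


class SessionStore:
    """Stand-in for an external session service shared by all servers."""

    def __init__(self):
        self._data = {}

    def create(self):
        session_id = uuid.uuid4().hex
        self._data[session_id] = {}
        return session_id

    def load(self, session_id):
        return self._data.get(session_id)


class AppServer:
    """Toy server that keeps no session state of its own."""

    def __init__(self, name, store):
        self.name = name
        self.store = store

    def login(self, user):
        session_id = self.store.create()
        self.store.load(session_id)["user"] = user
        return session_id

    def whoami(self, session_id):
        session = self.store.load(session_id)
        return session["user"] if session else None
```

If `server_a` and `server_b` are both built with the same `SessionStore`, a session created by `server_a.login("eric")` can be read back through `server_b.whoami(...)`. The store then becomes the thing to measure, which is the point about benchmarks made above.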

I'm not suggesting that you should use CherryPy for everything. What I am saying is that when considering performance the measurement is partly going to be specific to the application because of its dependence on some idea of state. CherryPy is a great server that is well tested and very stable. It may not be the fastest, but in terms of writing an application that uses something like a session service or other remote state tracking services, it can be very effective. Its concept of an engine bus is very powerful for integrating these kinds of services and connections. In other words, just as a framework makes writing application code easier, CherryPy's facilities help to create a more robust server environment relatively easily. These features can make scaling easier even though they most likely impact raw response performance. That may or may not be an effective trade off.

I should point out that I'm biased as we happily use CherryPy at work and I've used it personally for quite a while. That said, my goal is not to promote CherryPy, but to show where it optimizes the web application building process. Sometimes its facilities are going to be extremely helpful. Other times, not so much. The nice thing is that there are plenty of great options that facilitate many different styles of applications to meet different requirements. It is also important that as web developers consider performance it is done with an eye towards real measurements and an understanding of state. There is nothing new here of course, but it can't hurt to provide a slightly more specific argument as to why web server benchmarks may not be as telling as you might think.

Filed under  //   programming   python  

Front Loaded Mercurial

I'm going to have to go back and see how I can avoid laying a big fat patch bomb on a repo and I'm not happy about it. There is no one to blame but myself. That doesn't make it any nicer. My big issue is that for all the cool features of Mercurial there is a consistent front loading requirement. You cannot simply work and then later construct the commits that you'll be pushing. MQ does help with this sort of thing and I'm going to have to find out just how much tomorrow, but it would have been really nice if I could have started coding and, when I finished, had a convenient way to go through all the files and commit them in reasonable chunks.

The astute reader will recognize that this issue is really just a sign of bad DVCS habits and I'm not about to argue otherwise. Still, I'm very much a part of the "not a great coder" club, and as such, seem like a good candidate for figuring out how to help the normal developers using these powerful tools. One might also suggest that I open a ticket, or even better, contribute a patch. Again, my "not a great coder" club membership explicitly states that any gripes need to stay far away from those folks getting a lot of work done (a la the Mercurial devs), hence I'm totally fine leaving my whining here on my blog. My bet is bringing it up here will do more to improve my own habits than suggesting to others they are real problems.

Next time I'm really going to do a better job managing my patches. Feel free to hold my feet to the fire in the future once you see how I've done.

Filed under  //   mercurial   programming