It is always interesting to see CherryPy included in Python web server benchmarks. I think it is a testament to the code base being considered a standard option, and it suggests that CherryPy is a reasonably fast baseline against which to compare other options. Oftentimes it is not the fastest option, but at the same time, simply responding to the request is rarely where most of the time is spent. Databases and application logic traditionally take much longer than serving the response.
I'm not trying to argue that performance isn't important for a web server, of course. CherryPy uses a threaded model, which has its issues in certain situations. For example, handling many clients for long periods of time is often difficult for a server like CherryPy. Notice that I just said "like" CherryPy. Threaded servers, no matter the language or implementation, often have similar characteristics. This is why Apache, for example, offers alternatives such as the prefork MPM.
The other thing to consider regarding web application performance is state. No matter what you do, there is going to be some concept of state that will be a bottleneck. There is a subtle abstraction I'm making here, one meant to generalize the essence of web applications, and it differs from the concept of state within HTTP. HTTP is a stateless protocol, but web applications almost always have state in some shape or form.
In this case I'm defining "state" as something that must be read before handling the request. Checking authentication, reading a file, and querying a database all involve some concept of state at some level. If the connection to the DB is open, then run the query; otherwise, make a new connection first. If the file exists, read it. If the user exists, let the next function or object handle the rest of the request process. In all these cases there is some element of state that must be considered before producing the eventual response to the client.
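To make those three checks concrete, here is a minimal sketch in plain Python. Everything in it is hypothetical and made up for illustration: `FakeDB` stands in for a real database connection, and `handle_request`, `get_connection`, and `read_if_exists` are not part of CherryPy or any other server. The point is only that each check reads some state before a response can be produced.

```python
import os
import tempfile


class FakeDB:
    """Hypothetical stand-in for a database connection (not a real driver)."""

    def __init__(self):
        self.connected = False
        self.connect_count = 0

    def connect(self):
        self.connected = True
        self.connect_count += 1

    def query(self, sql):
        if not self.connected:
            raise RuntimeError("not connected")
        return [("result of", sql)]


def get_connection(db):
    # If the connection is open, reuse it; otherwise make a new one.
    if not db.connected:
        db.connect()
    return db


def read_if_exists(path):
    # If the file exists, read it; otherwise report that it is missing.
    if os.path.exists(path):
        with open(path) as f:
            return f.read()
    return None


def handle_request(user, known_users, db, path):
    # State check 1: does the user exist?
    if user not in known_users:
        return 401, "unknown user"
    # State check 2: is there an open DB connection?
    get_connection(db).query("SELECT 1")
    # State check 3: does the requested file exist?
    body = read_if_exists(path)
    if body is None:
        return 404, "not found"
    return 200, body


def demo():
    # Exercise each branch against a temporary file.
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "page.txt")
        with open(path, "w") as f:
            f.write("hello")
        db = FakeDB()
        users = {"alice"}
        first = handle_request("alice", users, db, path)
        second = handle_request("alice", users, db, path)
        denied = handle_request("mallory", users, db, path)
        missing = handle_request("alice", users, db, path + ".missing")
    return first, second, denied, missing, db.connect_count
```

In a real application each of these checks may involve a network round trip, which is why they tend to dominate the time spent on a request.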
Going back to performance, the questions asked of the state are traditionally what really hinders performance. Databases are the traditional bottleneck, but the problem is definitely not limited to them. Sessions are a great example of state that needs to be maintained. If you have many servers running, how is that session state managed? Authentication is another area that is traditionally not tied to a single server. I mention this because while it is totally appropriate to consider how fast a web server handles responses, it is just as important to consider how fast a session service or directory service handles its respective tasks. Likewise, there is the question of whether the server is responsible for handling some of these more global requirements. Apache and CherryPy can both handle sessions, whereas a standalone WSGI server like Paste's HTTP Server leaves that to the application. Comparing a generic WSGI server to CherryPy may not be as fair a comparison as one might think. Comparing CherryPy to a WSGI server with an app that uses Beaker, Static, URLMap, Routes and WebOb might get you closer to an actual apples-to-apples comparison.
One nice thing about CherryPy is that while it comes equipped with a healthy set of features, it is often relatively simple to swap in an external tool. For example, you can use sessions out of the box, or implement your own distributed session system. When an application needs to support a massive number of clients, scaling might simply mean starting more servers and increasing the thread pool of each server while using an external service for sessions. An asynchronous server might be better equipped to handle more clients initially, but the bottleneck of session state will most likely still need to be handled. At that point the faster server might not have a trivial way of plugging in a different session tool. Or it might have a great way of using other session tools! The point is that there is more to performance than simply handling requests.
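As a rough illustration of why the session store matters once you start more servers, here is a sketch in plain Python. It deliberately does not use CherryPy's session API; `MemorySessionStore`, `SharedSessionStore`, and `Server` are hypothetical names, and the shared `backend` dict merely stands in for an external session service reached over the network.

```python
import uuid


class MemorySessionStore:
    """Per-process storage: fast, but invisible to other servers."""

    def __init__(self):
        self._data = {}

    def load(self, sid):
        return dict(self._data.get(sid, {}))

    def save(self, sid, values):
        self._data[sid] = dict(values)


class SharedSessionStore:
    """Stand-in for an external session service shared by all servers."""

    def __init__(self, backend):
        # In a real deployment this would be a network client, not a dict.
        self._backend = backend

    def load(self, sid):
        return dict(self._backend.get(sid, {}))

    def save(self, sid, values):
        self._backend[sid] = dict(values)


class Server:
    """Hypothetical server that counts hits per session."""

    def __init__(self, store):
        self.store = store

    def handle(self, sid=None):
        sid = sid or uuid.uuid4().hex
        session = self.store.load(sid)
        session["hits"] = session.get("hits", 0) + 1
        self.store.save(sid, session)
        return sid, session["hits"]


# Two servers with private stores do not see each other's sessions.
a, b = Server(MemorySessionStore()), Server(MemorySessionStore())
sid, _ = a.handle()
_, hits_private = b.handle(sid)

# Two servers sharing one backend do.
backend = {}
c, d = Server(SharedSessionStore(backend)), Server(SharedSessionStore(backend))
sid2, _ = c.handle()
_, hits_shared = d.handle(sid2)
```

Because the server only talks to the store through `load` and `save`, changing where sessions live does not require changing the request handling code. The cost is that every request now pays for a lookup in the shared service, and a request benchmark that ignores sessions will not show that cost.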
I'm not suggesting that you should use CherryPy for everything. What I am saying is that when considering performance, the measurement is partly going to be specific to the application because of its dependence on some idea of state. CherryPy is a great server that is well tested and very stable. It may not be the fastest, but for writing an application that uses something like a session service or other remote state tracking services, it can be very effective. Its concept of an engine bus is very powerful for integrating these kinds of services and connections. In other words, just as a framework makes writing application code easier, CherryPy's facilities make it relatively easy to create a more robust server environment. These features can make scaling easier even though they most likely cost some raw response performance. That may or may not be a worthwhile trade-off.
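The general idea behind a bus like this is publish/subscribe: services register callbacks on named channels, and the server publishes events such as start and stop. The sketch below is a toy version in plain Python, not CherryPy's implementation. CherryPy exposes its real bus as `cherrypy.engine`, which has `subscribe` and `publish` methods; `MiniBus`, `SessionServicePlugin`, and the `session-get` channel here are names I made up for illustration.

```python
class MiniBus:
    """Toy publish/subscribe bus (an illustration, not CherryPy's bus)."""

    def __init__(self):
        self._listeners = {}

    def subscribe(self, channel, callback):
        self._listeners.setdefault(channel, []).append(callback)

    def publish(self, channel, *args, **kwargs):
        # Call every listener on the channel and collect the results.
        callbacks = self._listeners.get(channel, [])
        return [cb(*args, **kwargs) for cb in callbacks]


class SessionServicePlugin:
    """Hypothetical plugin that owns the link to a remote session service."""

    def __init__(self, bus):
        self.bus = bus
        self.connected = False
        self._sessions = {}  # stands in for the remote service

    def subscribe(self):
        self.bus.subscribe("start", self.start)
        self.bus.subscribe("stop", self.stop)
        self.bus.subscribe("session-get", self.get)

    def start(self):
        # A real plugin would open a network connection here.
        self.connected = True

    def stop(self):
        self.connected = False

    def get(self, sid):
        if not self.connected:
            raise RuntimeError("session service not connected")
        return self._sessions.setdefault(sid, {})


bus = MiniBus()
plugin = SessionServicePlugin(bus)
plugin.subscribe()

bus.publish("start")
session = bus.publish("session-get", "abc123")[0]
session["user"] = "alice"
```

Request handling code only needs to know the channel name, so the service behind it can be replaced or moved to another machine without touching the handlers.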
I should point out that I'm biased, as we happily use CherryPy at work and I've used it personally for quite a while. That said, my goal is not to promote CherryPy, but to show where it optimizes the web application building process. Sometimes its facilities are going to be extremely helpful. Other times, not so much. The nice thing is that there are plenty of great options that support many different styles of applications and meet different requirements. It is also important that web developers consider performance with an eye towards real measurements and an understanding of state. There is nothing new here, of course, but it can't hurt to provide a slightly more specific argument as to why web server benchmarks may not be as telling as you might think.