Perl + DBIx::Class + Catalyst - Our Technology Choice
Written by Jon Schutz
Friday, 09 February 2007
Our software had started out as one thing and then, beaten over the years by the irrepressible forces of business necessity, had been bent and twisted and jemmied to fit a different set of requirements, until it could be bent no more. A full system-wide upgrade, a rewrite from scratch, had become unavoidable. This is the story of how we came to choose Perl + DBIx::Class + Catalyst as our system platform.
1. Why we chose Perl
Our core application was written in C. Whilst it was largely configurable through an XML configuration file, it soon became evident that we needed to run some custom code on a per-customer basis. We had two solutions for this: we could either pass our context out to an external program, or run through an HTTP proxy, which could use PHP, mod_perl or any CGI-based program. We weren't strict about which approach to take; depending on the customer's requirements, one solution or the other generally made more sense.
Over time, and across many customer installations, it emerged that the language people preferred to use was Perl, in both the HTTP proxy and the external program modes. This is perhaps not surprising, since our processing was text-oriented and used lots of regular expressions. Programs written in PHP were long and ugly; those written in Perl were neat and simple. Some developers started encapsulating recurring parts of the custom components in libraries, and thus Perl became our preferred language.
This choice was not without its fair share of drama, however. Perl as CGI or as an external program came with a speed and load penalty, since the program had to be loaded, compiled and run on every invocation. Some of the developers experimented more with mod_perl; on several occasions, we lost all customer services because Apache would not start after someone had deployed a new module into the mod_perl environment with a missing dependency or an incorrect file path.
Nevertheless, with due exhortations to discipline when deploying new mod_perl components, we found that server load dropped to a more satisfactory level and outages were infrequent, although the risk was always lurking.
Character encoding issues were our next major concern. The changes in Unicode handling between Perl 5.8.0 and later releases were part of the problem. The fact that customers' websites often mixed character sets, or failed to identify them correctly, also contributed. Mostly we found ourselves beating our heads against the cold hard stone wall when no combination of 'use utf8', 'use encoding' and 'use bytes' would do what it was &#$&()(@# supposed to. In the end, with careful attention to the character set and encoding at each layer, these issues were also resolved.
We had no doubt that the benefits of using Perl outweighed the few problems we had from time to time. When our software infrastructure as it stood became unmaintainable and we set about planning a new version, Perl was always a strong candidate. It was the discovery of DBIx::Class that finally tipped the scales.
2. Why we chose DBIx::Class
We took a system view of the planned upgrade. Our hardware setup includes ten or so high-availability cluster pairs located in two geographically separated data centres. Each pair runs DRBD for data mirroring and Heartbeat for failover. As well as running our core services, each pair generally has one or more additional dedicated tasks, such as LVS load balancing, DNS, NFS or MySQL. Managing the high-availability side was itself complex, and hence prone to failure; more than once, a dying server in a pair maliciously took out its partner to spite us. So a key aspect of our new proposal was to integrate the management of our systems so that all configuration information could be sourced from one database.
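As an illustration of the single-source idea only, here is a minimal sketch, written in Python with an in-memory SQLite database purely for brevity. Everything in it is hypothetical: the table names cluster_pair, node and service, the host names, and the plain "key = value" output, which is not a real DRBD or Heartbeat file format. What it shows is that each node's settings, including its failover partner and the extra dedicated services its pair runs, are derived by querying one central database rather than being edited by hand on every machine.

```python
import sqlite3

# Hypothetical schema: one row per HA pair, one per physical node,
# and one per additional dedicated service (LVS, DNS, NFS, MySQL...).
SCHEMA = """
CREATE TABLE cluster_pair (
    id         INTEGER PRIMARY KEY,
    name       TEXT NOT NULL UNIQUE,
    datacentre TEXT NOT NULL
);
CREATE TABLE node (
    hostname TEXT PRIMARY KEY,
    pair_id  INTEGER NOT NULL REFERENCES cluster_pair(id)
);
CREATE TABLE service (
    pair_id INTEGER NOT NULL REFERENCES cluster_pair(id),
    name    TEXT NOT NULL,
    PRIMARY KEY (pair_id, name)
);
"""


def render_node_config(conn: sqlite3.Connection, hostname: str) -> str:
    """Build one node's settings purely from the central database."""
    row = conn.execute(
        "SELECT p.id, p.name, p.datacentre "
        "FROM node AS n JOIN cluster_pair AS p ON p.id = n.pair_id "
        "WHERE n.hostname = ?",
        (hostname,),
    ).fetchone()
    if row is None:
        raise KeyError(f"unknown host: {hostname}")
    pair_id, pair_name, datacentre = row

    # Every other node in the same pair is this node's failover partner.
    partners = [
        r[0]
        for r in conn.execute(
            "SELECT hostname FROM node WHERE pair_id = ? AND hostname != ? "
            "ORDER BY hostname",
            (pair_id, hostname),
        )
    ]
    # Extra dedicated services run by this pair, in a stable order.
    services = [
        r[0]
        for r in conn.execute(
            "SELECT name FROM service WHERE pair_id = ? ORDER BY name",
            (pair_id,),
        )
    ]

    lines = [
        f"host = {hostname}",
        f"pair = {pair_name}",
        f"datacentre = {datacentre}",
        f"partner = {','.join(partners)}",
        f"services = {','.join(services)}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO cluster_pair VALUES (1, 'pair01', 'dc-a')")
    conn.executemany("INSERT INTO node VALUES (?, 1)", [("web01a",), ("web01b",)])
    conn.executemany("INSERT INTO service VALUES (1, ?)", [("lvs",), ("dns",)])
    print(render_node_config(conn, "web01a"), end="")
```

Run as a script, this prints the settings for the hypothetical host web01a, with web01b as its partner and "dns,lvs" as its services. Under this design, adding a pair or moving a service between pairs becomes a change to rows in one place, and every affected host picks it up the next time its configuration is rendered.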
Since our existing configuration was largely XML-based, we started looking for an XML database that we could cluster, and came up with eXist (Java) as the answer. Java was also a strong candidate for our applications, given the various technologies available for working with standards-based XML.
I don't wish to denigrate eXist; in fact, I think it has great potential. We ran into problems when we started pushing it a bit, trying to load and query around 1GB of XML, and hit the ol' Java memory limit thing. 256MB of heap was insufficient, so we tried giving it 512MB. It ran a bit longer, and then, crash! Same problem. We tried 768MB. Same problem. This is where I had to say "No more!". I cannot dedicate 1GB or more of memory to a single process and still not be sure that it will be sufficient for the task. Until Java learns to grow and shrink its heap properly, it will remain unsuitable for a range of tasks.
Then we considered Caché, a commercial object database with built-in XML translation. The sales people were very sure it would meet our needs. Even I thought it might, and at first indication it seemed reasonably priced. But when it came to the crunch, I had to choose between two pricing models: one that was very expensive, and another that required me to know how many users would be logged into the database at any one time. Now, I have a few hundred customers and no idea how many of them would be logged in at once, and the sales person wasn't sure whether they counted as separate database logins or not. I also have several developers who might be logged in, and whether they used the same password or different passwords seemed to make a difference. And whether I clustered the database over 8 or 16 machines seemed to be significant. To preserve my sanity I just put down the telephone, and I haven't spoken to them since. They did send me a nice little remote-control racing car, though, which I gave to my kids.
Perhaps one day I'll understand why...
At the end of the road with Java and object databases, I made the firm decision to stick with traditional, low-risk relational database technology, and went back to CPAN to dig around for XML-relational mappers. I didn't find exactly what I wanted, but somewhere along the way I came across DBIx::Class, which caught my attention. Firstly, it sounded like DBIx::Class could provide the functionality I needed, although not via XML as I had planned. Secondly, the extensive documentation and tutorials struck me as marks of the package's quality. I tried it out by importing part of our existing schema and, blow me over, it worked! DBIx::Class felt right for the application. Next, I needed a framework for web deployment, for both HTML-based GUI applications and web service delivery.
3. Why we chose Catalyst
I had more or less settled on DBIx::Class in a mod_perl environment. mod_perl irritated me a little: I'd had bad experiences trying to get it to compile in the past, there were the issues described above with it taking down the whole server, and there was the confusion between the version 1 and version 2 APIs. Nevertheless, it is an efficient environment for Perl-based web services, and we already had it set up. I knew of the Template Toolkit and thought something like it might be useful for the GUI aspects, so I started digging around CPAN again to see whether it was still the "best of breed" templating system available. In the course of reading up on DBIx::Class and the Template Toolkit, I started seeing references to Catalyst. Marketers say that you must be exposed to a brand seven times before the brain acknowledges it and you suddenly have brand recognition. I think it was like that for me with Catalyst. I wasn't looking for it, but it came up so often in my searching that it was time to find out what it was all about.
I wasn't looking for an application server, but when I found one I realised it was just what I needed. There are so many things that we do in the mod_perl environment that are not standardised, and that ultimately leads to quality issues of one form or another. Now that I knew I needed one, I looked about on CPAN to see what else was available, and came across AxKit, Maypole, App::Context and a few others.
I soon decided that AxKit wasn't the right choice. We had used XML/XSLT extensively in the past, and whilst it was excellent for producing standards-compliant HTML, it was all a bit painful to use, and the developers went out of their way to avoid it wherever possible. I tried to understand the documentation for App::Context, but frankly struggled to get my head around it. I'm not particularly stupid most of the time, so if I struggled, I knew the rest of my team would too; I left it and moved on. Maypole seemed fairly mature. Eventually I noticed that Catalyst was originally based on Maypole. I also noticed that Catalyst had very good and extensive documentation, much like DBIx::Class. (Unfortunately, the Catalyst documentation has since been split from the main package, leaving some dead-end links and a poor user experience for those who don't know where to look. Luckily for me, it wasn't like that back then.)
So Catalyst struck me as a reasonably mature package with a strong heritage, good documentation and a wealth of third-party plugins providing all sorts of functionality, and the decision was made. Well, the decision was made for the HTML GUI component. For the web services component, I still felt that performance would demand a "bare to the bone" programming approach: raw SQL without any database abstraction, mod_perl request/response objects, and custom classes for the rest. So I set about designing and prototyping various aspects of the system.
I had something that resembled a model there, a controller here, a view over there; I wanted to cater for plugins to make it future-proof; I wanted the flexibility of multiple views; and I wasn't quite sure of the final URL structure, so I thought I'd need to keep that flexible. Eventually I realised I was about to rewrite rafts of features that Catalyst already provided. As noted above, I'm not all that stupid, not always, so Catalyst made sense for the whole application. I recognised that some performance, in terms of responses per second, might be sacrificed by choosing Perl, DBIx::Class and Catalyst, but if I had to provision another server or two, that was a small cost compared to all the coding and testing I was saving and the flexibility I was introducing.
4. Why we chose FastCGI
Decisions, decisions. The deployment recommendations for Catalyst suggested that in a mod_perl environment I'd need a front-end and a back-end Apache instance, the first to farm out the requests and the second to do the mod_perl/Catalyst heavy lifting. Having had considerable experience with reverse proxies, I found that all a bit messy, with two sets of Apache configuration files to manage, and potentially unreliable. The FastCGI option was attractive. The FastCGI server could be started by the Apache process, would be restarted automatically if it died, and required minimal configuration. Functionally it is equivalent to the front-end/back-end Apache arrangement, but without the Apache wrapper around the back-end component.
I was pleased to see that I could also run Catalyst as a standalone server. That makes the development environment so much simpler; previously, we had tried to maintain a staging cluster of four servers to replicate the production environment.
Unfortunately, maintenance of the staging cluster was always neglected, being of secondary importance to the production clusters of course, so it never worked that well and the developers were more prone to taking a shortcut straight into production. With the standalone server option, we could decouple the functionality the developers needed from the full load-balanced, redundant environment, so that adequate development and testing could be done on each developer's desktop.
The one hurdle we had to deal with was the longer "maintenance" operations, which would cause Apache to time out waiting for the request to complete. We needed to fork these processes and put them into the background. Unfortunately, Catalyst didn't play nicely with fork, at least in the development environment. The child process didn't know about all the file descriptors the parent process might hold open, so it could not close them; if the Catalyst engine then tried to restart, it would fail because the background process was still holding the main socket open. This would have caused problems in production if an Apache instance was restarted while a background process was running. In the end we settled on a separate XML-RPC service, invoked by xinetd, to run the background tasks.
So that is our system. It was quick to deploy and has been running flawlessly for some months now. For our needs, Perl, DBIx::Class and Catalyst provided an excellent solution.
About the Author
Jon Schutz is the CTO of YourAmigo, a Search Engine Marketing and Optimisation firm in Adelaide, South Australia. Jon has a Master's degree in Mathematical Science in Signal and Information Processing and an Honours degree in Electrical and Electronic Engineering. He has extensive experience in systems and software engineering and in network and computer systems management, particularly on Linux. He previously worked in the fields of radar and control systems.
Last Updated (Tuesday, 13 February 2007)