Tuesday, March 4, 2008

It's OK to Let Software Die

I think we should let PDF::Writer die.

Why single out a specific library? Just because I'm fairly familiar with some details of it. It's nothing personal and the message behind this post is intended to apply to many projects. For example, the Ruby Core team has publicly stated that they want to see the standard cgi.rb library replaced. I'm sure we all feel that way about some software. I'll stick with PDF::Writer and you can mentally replace it with a project you are familiar with.

Now back to the point: I think we should let PDF::Writer die. I guess that sounds kind of drastic, but give me a chance to explain. There's a great quote that Matz, the creator of Ruby, showed on a slide in a talk he gave to Google recently. It said, "OSS Should Move Forward or Die." That's an important truth.

Why are Matz and I so ready to start handing out the destruction? The reason is not at all complicated: a project can get to the point where it's hindering more than it's helping. I believe PDF::Writer is there.

I have the utmost respect for Austin and his work to build PDF::Writer. Back then, it was a welcome effort. Today is a different time though and the landscape has changed:
  • Austin no longer maintains PDF::Writer
  • PDF::Writer's new maintainers (more like patch appliers) don't completely understand the system
  • There are several known issues that just aren't practical to fix for various reasons
  • PDF::Writer is a vast and complex code base
  • There are serious performance issues with the library
  • The API is far from ideal, requiring complex wrapping for just about any implementation
  • It would be a substantial effort to port the library to Ruby 1.9
If we put all of this together, the picture becomes clear: PDF::Writer has stopped "moving forward," as Matz put it. It's on life support. That's worse than being dead because it means we're burning valuable effort to keep things in this obviously less-than-ideal state.

Now, if we could just get the coroner to call the time of death for PDF::Writer, we could move on. Where would we go next? Who knows. Anywhere is better though, because we would again be moving forward. Some options we might explore in the immediate future:
  • Using a different format for printable content, RTF perhaps
  • Piping some HTML through html2ps and ps2pdf
  • Prince XML
The fact is, we've used all three of those options in production applications at work within the last two years. None of them are perfect. Prince is amazing, but so is the price tag. html2ps is just shy of being as useful as we would love it to be in some areas. And if you really need PDF, substituting just may not be an option. That said, all three supported our needs better than PDF::Writer.
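For the html2ps and ps2pdf route, the piping can be driven from Ruby. What follows is a minimal sketch, not the setup we ran in production. It assumes both tools are installed, that html2ps reads standard input when given no file, and that ps2pdf accepts "-" for both input and output. The names HTML_TO_PDF, pipe_through, and html_to_pdf are made up for illustration.

```ruby
require "open3"

# Assumed command lines: html2ps reading stdin, ps2pdf reading stdin ("-")
# and writing stdout ("-"). Adjust to match the installed versions.
HTML_TO_PDF = [["html2ps"], ["ps2pdf", "-", "-"]].freeze

# Hypothetical helper: send `input` through a chain of commands and
# return whatever the last command writes to stdout.
def pipe_through(input, commands)
  Open3.pipeline_rw(*commands) do |stdin, stdout, _threads|
    stdin.binmode
    stdout.binmode
    # Write from a separate thread so a full pipe buffer cannot deadlock us.
    writer = Thread.new do
      stdin.write(input)
      stdin.close
    end
    output = stdout.read
    writer.join
    output
  end
end

# HTML in, PDF bytes out (assuming the two tools behave as described above).
def html_to_pdf(html)
  pipe_through(html, HTML_TO_PDF)
end
```

Something like File.binwrite("report.pdf", html_to_pdf(html)) would then save the result. Each stage is a separate process, which is part of the extra moving machinery you take on with this approach.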

Perhaps the only viable long-term solution is a shiny and sleek rewrite of PDF::Writer. We know we have at least a few people interested in the project, so if we could free them from monitoring the life support systems we might just have the beginnings of a rebirth effort. That's the way we need to get things moving.

The moral is simple: it is not just OK to let PDF::Writer (or whatever project) die, it can actually be a blessing. Sure, we would mourn the loss of a once great resource, but eventually we will also choose to move on. That's for the good of us all.

RIP PDF::Writer.


James Healy said...

I guess I agree with what you've said here - PDF::Writer in its current state is only suitable for relatively basic documents.

Starting from scratch appeals to my inner rubyist, but not my inner time manager. It would be a significant undertaking.

There is definitely a certain amount of modest code cleaning and refactoring that could make the code significantly more maintainable, but maybe a clean slate and fresh ideas are a worthy goal.

I too have hit the complications with the alternatives you've suggested, and have decided against using them in any significant way in production projects.

donn said...

I think the point has the same sound as that made about COBOL, and yet, 90% of all financial transactions are touched by COBOL.

You have also made the point that there is not yet a suitable successor. Some replacements require extra steps, such as converting html/xml to pdf. That is extra processing flow complexity.

I did find humor in asking the coroner to call a time of death. If it's the coroner calling it, then the patient is being seen by the wrong doctor.

I personally like the implementation, given what is available to me within my work environment. We recently started a discussion about using Reporting Services. And that works great if one is already processing within a SQL environment. But my customers work in a flat/fixed file environment. We are not able to simply implement Reporting Services to use XML as a dataset, format the reports, convert them to PDF, then save them in a project archive folder. It just doesn't lend itself to that. PDF::Writer does lend itself well.

Maybe the implementation isn't the greatest. But it works, like COBOL.

JEG2 said...

I've found PDF::Writer to be the one that adds to my work flow way more than the other way around. Its pseudo tag structure is a solid miss at HTML/XML support and its Ruby API is painful enough to make simple programmatic generation too difficult to be practical.

This is the reason we've used so many alternate solutions at work. We just couldn't afford the time to reimplement all the pieces of our application that would need PDF::Writer specific handling.

I've been thinking about how to improve the Ruby interface. A builder-like syntax would be just awesome for the page layout stuff. Unfortunately, it's pretty clumsy for things like inline bolding of some text. I'm sure there's a solid approach; we just need to dream it up.

James Healy nails the primary issue though: "PDF::Writer in its current state is only suitable for relatively basic documents." Do we really need over 5,000 lines of library code for basic handling? No. That's what makes the library so cumbersome to use and maintain. Why the heck is it using transactions, threading, and Mutexes to lay out tables in a linear data format? I've got a programming student who has been practicing with ASCII art tables and who I'm confident could rewrite the PDF table generation output code field by field (assuming I provided the font metric methods, of course). The student's code would be a huge speed boost too.
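To give a feel for what "field by field" could mean, here is a rough sketch of a single-pass table layout. It is not PDF::Writer's code or anyone's actual rewrite. The layout_table name, its keyword arguments, and the text_width callable (standing in for the font metric methods) are all assumptions made up for this example.

```ruby
# Hypothetical single-pass table layout: compute where each field's text
# should start, in PDF points, with the origin at the top-left of the table.
# `text_width` is any callable that returns the width of a string in points;
# real code would back it with font metrics.
def layout_table(rows, text_width, padding: 4, row_height: 14, origin: [72, 720])
  columns = rows.map(&:size).max || 0

  # Each column is as wide as its widest field plus padding on both sides.
  widths = (0...columns).map do |col|
    rows.map { |row| text_width.call(row[col].to_s) }.max + 2 * padding
  end

  left, top = origin
  rows.each_with_index.flat_map do |row, r|
    x = left
    (0...columns).map do |col|
      cell = { text: row[col].to_s, x: x + padding, y: top - r * row_height }
      x += widths[col]
      cell
    end
  end
end
```

With a toy metric such as ->(s) { s.length * 6 }, the method returns one hash per field with its text and position, ready to be written out in order. There is no shared state to protect, which is the point: nothing here calls for transactions or a Mutex.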

Where James and I disagree is the difficulty of replacing it. For some reason, a lot of people think PDFs are really hard. Honestly, I think it's the 1,300 page specification that frightens so many of us. Since I spoke up about the library though, I've been reading it. It's not half as bad as people think. PDF is a pretty simple format and the large majority of that scary specification just doesn't apply to a version 1.0 replacement for the generator.
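As a rough illustration of how small the core of the format is, the sketch below writes a complete one-page PDF by hand: five objects, a cross-reference table of byte offsets, and a trailer. It is not one of the prototypes mentioned in this thread, and the minimal_pdf name is made up. It assumes US Letter pages, the built-in Helvetica font, and plain ASCII text.

```ruby
# Hypothetical helper: return the bytes of a one-page PDF showing `text`.
def minimal_pdf(text)
  # Backslash and parentheses must be escaped inside a PDF literal string.
  escaped = text.gsub(/[\\()]/) { |c| "\\#{c}" }
  # Content stream: select font F1 at 24pt, move to (72, 720), draw the text.
  content = "BT /F1 24 Tf 72 720 Td (#{escaped}) Tj ET"

  objects = [
    "<< /Type /Catalog /Pages 2 0 R >>",
    "<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
    "<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] " \
      "/Contents 4 0 R /Resources << /Font << /F1 5 0 R >> >> >>",
    "<< /Length #{content.bytesize} >>\nstream\n#{content}\nendstream",
    "<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>"
  ]

  pdf = +"%PDF-1.4\n"
  offsets = []
  objects.each_with_index do |body, i|
    offsets << pdf.bytesize # byte offset of this object, needed by the xref
    pdf << "#{i + 1} 0 obj\n#{body}\nendobj\n"
  end

  # Cross-reference table: one 20-byte entry per object, plus the free entry 0.
  xref_at = pdf.bytesize
  pdf << "xref\n0 #{objects.size + 1}\n"
  pdf << "0000000000 65535 f \n"
  offsets.each { |offset| pdf << format("%010d 00000 n \n", offset) }

  pdf << "trailer\n<< /Size #{objects.size + 1} /Root 1 0 R >>\n"
  pdf << "startxref\n#{xref_at}\n%%EOF\n"
  pdf
end
```

Calling File.binwrite("hello.pdf", minimal_pdf("Hello, PDF")) should produce a file a viewer can open. Everything a real generator adds (more pages, fonts, images, compression) builds on this same object-and-offset bookkeeping, though the details are where the work is.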

Adam said...

Don't forget HTMLDOC, which can meet many PDF output requirements and has a very straightforward Ruby API in PDF::HTMLDoc.

cmdicely said...

"It's not hard" is a lot more convincing when it's attached to something showing it done: lots of things are conceptually easy on a broad view, but tricky in important implementation details. PDF::Writer fills an important need: it may not do it as well as one would like, and it may be easier to replace it than to fix it. But I think the most convincing argument for that is going to be an alpha (or pre-alpha) release of something that shows some real promise of being a viable replacement that lacks PDF::Writer's warts.

I don't particularly have strong feelings either way; I've never dug into PDF::Writer's internals and use it mostly indirectly.

JEG2 said...

I played with multiple prototypes while reading the PDF specification. That's how I came to these conclusions.

I haven't publicly released any of my work because I haven't yet found a user code interface I've fallen in love with. I'm still thinking about how to get that right.

Beyond that, I've offered to help the Ruby Mendicant project with the replacement, if that's the selected activity.

I'll put code where my mouth is. I promise.