Serious business services really can't go down, whether due to hardware or software failures. If your necessary services rely on MySQL, clustering and high availability can prevent failures. Kris Buytaert's article Building a High-Availability MySQL Cluster shows how his group recently used MySQL Cluster and Heartbeat to provide redundant, failure-proof replication and availability of their data.

Hello, readers. With great pleasure I introduce the revamped and reworked O'Reilly Network Databases site. We've revised the site display to feature the knowledge, opinions, and wisdom of our expert webloggers as well as to give you better access to newer, fresher information.

Our goal is to update the site several times a week with new postings from our webloggers as well as original articles and links to useful information elsewhere.

We're still in transition putting all of the pieces together (and gathering varied articles from several years of the O'Reilly Network to present in a meaningful and useful way), but we'll have everything up and running in the next couple of weeks. In the meantime, please feel free to let us know what we're doing right, where we can improve, which projects and authors to watch and to recruit in the comments section here, or by mailing me directly at chromatic@oreilly.com.

Thanks for reading!

I am not sure that the soothing discourse of vendors about "easy administration" is heading in the right direction. When I read this in Oracle's "2 Day DBA" guide:

Prior knowledge or experience with managing databases is not required. The only requirement is a basic knowledge of computers.

I can only wonder. Everything would no doubt be perfect if we were living in a world where things just keep rolling along the way they are, where volumes increase only slowly, where no one makes any mistakes and where DBMS products have no bugs. Unfortunately, in a world where databases are often the nexus of information systems and where Murphy's law rules, database administration often requires a bit more knowledge than being able to locate Ctrl, Alt and Del on the keyboard.
Good skills are as necessary as ever, and possibly more than ever. Automation only affects the boring, routine part of the job. But IT departments will suffer if the image of the DBA (or, for that matter, of any IT staff) is destroyed.
Working as a DBA used to be an enviable career, for university graduates as much as for self-taught IT professionals. Welcome to the days when basic computer skills will decide between a career as a DBA and one as a hamburger-flipper.

Recently my niece, who is a senior in high-school, asked me for some advice on buying a laptop to use at college. Below is my response to her, a sort of brain-dump of the different trade-offs she should consider. She is studying music. She does not have any particular interest in PC technology for its own sake, and so I've taken some pains to explain terms like "pixel" that I might otherwise leave unexplained. She is a user in the true sense of the word, in that she cares only about what she can do with a computer, and not about the technology itself.

Dear Mary Beth,

You asked my advice on purchasing a laptop computer for college. I might give some specific suggestions later, but first and foremost I want to encourage you to think about how you might use a laptop while in school, and what you might use it for. Then I want to make you aware of some tradeoffs, or choices that you have when shopping for a laptop.

The good news is that it is very difficult these days to make an outright mistake. Any laptop on the market today will easily handle all of the common things you might use a computer for:

  • Word processing
  • Spreadsheets
  • Web browsing
  • Email
  • Managing your budget
  • Light photo editing
  • Other run-of-the-mill tasks

With tasks like these, what laptop to buy boils down to preferences related to size and weight and style and so forth (more on these tradeoffs below). You do need to be a bit more careful though, if you fall into the following categories:

  • You are a gamer, and you want to play computer games on your laptop
  • You plan to do video editing on your laptop
  • You want to do a large amount of heavy-duty image editing
  • You have some other specialized need that falls outside of "the usual thing"

I do not believe you fall into any of the above categories. If you have any needs that fall outside of my "common things" list, just let me know. I'll be happy to talk through the requirements for anything you plan to do.

Your Target College: My first bit of advice is to check with your target college to see whether they have any specific advice to offer or program that you may want to take advantage of. For example, Northern Michigan University (near my home) has what they call a Teaching, Learning, and Communication Laptop Initiative that offers students their choice between a Lenovo Thinkpad (Windows) and an Apple iBook (Mac OS X). From what I can tell from NMU's website, the cost of a Thinkpad is rolled into your tuition. The deal also includes software that you might need, support in case something goes wrong, theft insurance, and damage insurance. If your college offers something similar, you should almost certainly take advantage of it.

Windows versus Macintosh: The operating-system is the program that runs when you first turn a computer on, and from there you do all your other work, start all your other programs. Operating-systems are like universes: you live in one or the other, and not both. There are two universes to consider: Microsoft Windows and Apple Macintosh. Programs that run on Windows generally do not run on the Macintosh, and vice-versa. If you go down the Windows path, you will not be able to do the same things in the same way as someone who went down the Macintosh path. You'll usually be able to do the same things (no great worry there), but the specific software that you use and how you use it will often differ.

You are probably most familiar with Microsoft Windows. Windows dominates the market today. If your computers at high-school (or at your local library) have a green "Start" button at the lower left corner of the screen, then you are using Windows. Your cousin, my daughter Jenny, owns an Apple iBook. She runs the Mac OS X operating-system ("Macintosh" is commonly truncated to just "Mac"). Jenny just spent two weeks at grandma's, where you and she spent a lot of time together. I don't know how much she showed you of her laptop during her time there, but if you've seen her screen, then you've seen Mac OS X.

Apple is the only vendor to offer Mac OS X laptops. The vast majority of laptops on offer in stores will be running Windows. Here are my thoughts on the choice between the two:

  • Windows is the "safe" choice when it comes to being able to use any bit of hardware or software that you might run across. Windows has well over 90% of the market, so software and hardware vendors tend to support Windows first and foremost. For example, my wife has it in mind to buy a box to let her program cards for her sewing machine. Several companies make such boxes. Last we checked, none supported anything other than Windows. Another example is Brother's MFC-7220 printer. It supports faxing from a Windows PC, but not from a Mac (link to system requirements).
  • Mac is the "safe" (or at least much safer) choice when it comes to viruses and spyware and such "malware". My neighbors are frequently confounded by malware on their Windows PCs. I just do not hear that same sort of weeping, wailing, and gnashing of teeth from Mac users. People will argue the cause, whether it's because Windows is more vulnerable, or whether Windows just gets attacked more, but all I care about here are the results: you will have less trouble with malware if you go down the Mac path.
  • Apple, and by extension their Mac OS X operating-system, places a strong emphasis on ease-of-use. Their buzzphrase is that they want their products to "just work". And they do a good job in that area. It's why I bought Jenny a Mac. It's why, could I do it over again, I would buy my wife and mother Macs instead of the Thinkpads that they have now. Here's an example, btw, of the sort of difference you might find between a Mac and a Windows PC: all I have to do to lock up any of my Windows laptops is to close the lid, let the sleep process begin, and then quickly reopen the lid before the laptop fully goes to sleep. Jenny's Mac, by contrast, will handle that sequence of actions with aplomb.

Knowing you as I do, I believe you would do well to consider going down the path of buying a Macintosh. At the very least, try to find out what other students in your chosen program of study (music) are using. Find out whether any are using Macs. Ask about software. Try to find out whether you would be required to run a Windows-only program. If your fellow students in the music program are successfully using Macs, then you probably are on safe ground if you buy one for yourself. You don't want to be the "odd man out" though.

Despite all that I say in this section, I tread the Windows path. The Mac world feels to me like a closed and somewhat walled-in world. There is only one manufacturer, which tends to be rather controlling. Software choices are more limited than for the Windows platform. Apple can sometimes value style too highly over substance. The Oracle Database software that I often write about is a big factor for reasons I won't bore you with. Still, because I know you don't like to play with computers just for the sake of playing with computers, I encourage you to give serious thought to the Mac.

Shape of the Screen: Moving on to some far less controversial ground, another tradeoff to consider is the overall shape of the screen on whatever laptop you ultimately buy. You have two fundamental choices:

  • The traditional, squarish screen like Jenny and I have. For example, both she and I have screens that are 1024 dots wide by 768 deep. The ratio of width to depth is 4:3 (1024/768 = 4/3). For many years, the 4:3 ratio was the standard.
  • The ever more common "widescreen" shape. This shape approximates the shape of a widescreen DVD movie. For example, Apple's mid-range, 15-inch Powerbook screen is 1440 dots wide by 960 deep, giving a ratio of 1.5:1 (3:2), which is much closer than 4:3 to the 16:9 ratio used in widescreen DVD movies.

I'm a big fan of the wider screen ratio. I find it very pleasing to the eye. It's easier to lay out two documents side-by-side. Watching a DVD movie on a widescreen laptop is a far more pleasing experience. I have no hesitation at all in recommending that you look for a widescreen laptop. But do bear in mind that you lose nothing critical with the more traditional, 4:3 aspect-ratio. Either way, you'll still be able to get all your college work done.

A note here. I run a traditional, 4:3 screen on my laptop, and I will likely stay with that when I (hopefully) replace my laptop this year. The more squarish screen combined with the very small size of my laptop means that I have room left over on an airline tray-table to set down a drink or a small sandwich. I say this to illustrate why it's important to think about how and where you will use a laptop.

Pixels on the Screen: All the information you see on a laptop screen is made up of little, colored dots called "pixels". The more pixels, the more information you'll be able to see at one time: the more lines of text in a letter, the more cells in a spreadsheet, more of a web page, etc. Jenny's laptop and mine each have a screen that is 1024 pixels wide by 768 tall. I would not want anything less. That 15-inch Apple Powerbook I mention in the previous section shows more information in both dimensions. It is 1440 pixels wide versus my 1024 (41% wider) and 960 tall versus my 768 (25% taller). The upshot is that the particular Apple screen that I'm talking about will show all that my laptop can show plus about 76% as much again. Here's the math:

  1. 1024 x 768 = 786,432
  2. 1440 x 960 = 1,382,400
  3. 1,382,400 - 786,432 = 595,968 more pixels on the 15-inch Apple Powerbook versus my laptop
  4. 595,968 / 786,432 = 0.76 (rounded), or about 76% more pixels than my screen
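If you would rather let a computer do the arithmetic, the four steps above can be sketched in a few lines of Python. This is only an illustration; the names (mine, powerbook, pixel_count) are ones I made up for the example, and the screen sizes are the two quoted above.

```python
# Screen dimensions from the letter, as (width, height) in pixels.
mine = (1024, 768)
powerbook = (1440, 960)


def pixel_count(screen):
    """Total number of pixels on a screen given as (width, height)."""
    width, height = screen
    return width * height


# Steps 1 and 2: total pixels on each screen.
mine_total = pixel_count(mine)
powerbook_total = pixel_count(powerbook)

# Step 3: how many more pixels the Powerbook shows.
extra = powerbook_total - mine_total

# Step 4: the extra pixels as a fraction of my screen.
ratio = extra / mine_total

print(mine_total)        # 786432
print(powerbook_total)   # 1382400
print(extra)             # 595968
print(f"{ratio:.0%}")    # 76%
```

Swap in the pixel dimensions of any two screens you are comparing to get the same comparison.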

Don't obsess about the math. You don't need to walk the aisles of Best Buy or CompUSA with a calculator in hand. Pixels are expensive too. With your budget, you probably cannot afford to let the pixel count be a driving factor in your decision. Just know that different screens show differing amounts of information, and that more pixels is generally better.

Think about your usage pattern here too. I sometimes wish I had more pixels on my own laptop screen, but I made the tradeoff that I did in order to get a smaller size laptop that is easier to travel with. If you do a lot of graphics work, or software development, or if you just like to have many windows and applications open at one time, then more pixels become very desirable.

Physical Size of the Screen: A physically larger screen does not necessarily mean that you are getting more pixels. For example, my 12-inch laptop screen is 1024 pixels wide by 768 tall. My wife's laptop has a 14-inch screen that is also 1024 pixels wide by 768 tall. Her screen is "bigger" than mine, but she and I see the same number of pixels, and so the same amount of information. The difference is one of magnification. Letters, buttons, and such are simply larger on my wife's screen than on mine.
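To put a rough number on that magnification, here is a small Python sketch that works out pixels per inch for the two screens. It assumes the advertised size is the diagonal of the visible screen and that pixels are square; the function name pixels_per_inch is just something I made up for this example.

```python
import math


def pixels_per_inch(width_px, height_px, diagonal_in):
    """Pixel density along the diagonal, assuming square pixels."""
    diagonal_px = math.hypot(width_px, height_px)
    return diagonal_px / diagonal_in


# Same pixel count, two different physical sizes.
ppi_12 = pixels_per_inch(1024, 768, 12)  # my 12-inch screen
ppi_14 = pixels_per_inch(1024, 768, 14)  # my wife's 14-inch screen

print(round(ppi_12, 1))  # 106.7
print(round(ppi_14, 1))  # 91.4

# Everything on the 14-inch screen is drawn this many times larger.
print(round(ppi_12 / ppi_14, 2))  # 1.17
```

Fewer pixels per inch means each letter or button takes up more physical space, which is why the same screenful looks bigger on the 14-inch model.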

Laptop manufacturers often stress the physical size of a screen without referencing the pixel count, and so you have to be careful when comparing two screens to consider both physical size and the number of pixels. My 12-inch Thinkpad screen shows the same amount of information as my wife's 14-inch Thinkpad screen. But you can also buy a 14-inch Thinkpad screen with a great deal more pixels (1400 x 1050). Similarly, Apple makes a 14-inch iBook screen that shows the same number of pixels as Jenny's 12-inch model.

If you have a difficult time reading small text, then you might give preference to a somewhat larger screen size (e.g. 14-inch over 12-inch) while keeping the pixel count equal. Sometimes the magnification can be too much. For example, I'm ok with a 14-inch screen at 1024 x 768, but a 15-inch screen with the same pixel count just seems to me to make everything look too big. It's best to look at a few different screen sizes to get a feel for what you prefer.

Two other considerations related to physical screen size are: larger screens tend to reduce battery life, and a physically larger screen means a physically larger and heavier laptop.

Size and Weight of the Laptop: Physical size and weight are important to think about, because they have a lot to do with how easily you can carry a laptop around with you, and that in turn might have a strong effect on how much use you get out of the laptop. Permit me to somewhat arbitrarily divide the world of laptops into three categories:

Heavy and hulking desktop replacements - These are laptops that are best bought by someone who wants to set one up at home and not move it around very much. The advantage is that a big, heavy laptop still takes less space than a desktop, and you have the option to travel with such a laptop, even though it might be cumbersome to lug along. Any laptop that weighs 7lbs or more, or with greater than a 14-inch traditional screen or a 15-inch widescreen, I tend to lump into this category. Much of what you see in Walmart or Best Buy fits here. You're going to college. You'll want to lug your laptop from dorm room to library. You'll want to sit around a commons area with your friends. You'll want to bring your laptop home on breaks. Don't saddle yourself with something that can't easily be thrown into a backpack.

Mainstream laptops - It's pretty easy these days to get a laptop that falls into the 4-7lb range. Five and six pound weights are easily doable, and without breaking the bank. Watch the screen size too. Keep it to 14 inches or less on a traditional, 4:3 screen, and to 15 inches or less on a widescreen. You'll have something that's much easier to carry around campus with you.

Really small laptops - Here I lump anything weighing less than four pounds, or that has a small screen such as the 12-inch screen that Jenny and I use. You might consider this category. Jenny's iBook, for example, while it weighs over 4lbs, has a small, 12-inch screen. It slides nicely into her book pack, leaving plenty of room for other things. I've been impressed with the iBook's durability, and Jenny carries it with her everywhere.

Bottom line: don't get anything too heavy, and don't get anything too large. You want a laptop that you can easily take with you as you move about your college campus. Next time I come down to visit, I'll bring mine and my wife's with me so that you can have a look at some different sizes and get a feel for what it might mean to carry each around all day.

Battery life: Do think about how long you might want to run on batteries. A four to five hour runtime is probably a reasonable goal these days. Physically smaller screens tend to translate into greater battery life. Battery sizes can vary from model to model. Some Lenovo Thinkpads offer a choice between, for example, six-cell and nine-cell batteries. Some models, like mine for example, allow you to snap on a second battery in order to gain a longer battery run-time at the expense of increased weight and bulk.

If you buy a Windows laptop, either buy one that is branded as a "Centrino", or talk to me first. Centrino laptops are built with a collection of power-saving chips from Intel. Centrino is just a brand, but it is a brand built around the promise of longer battery life, and it generally delivers on that promise. Be careful of some of the inexpensive, desktop-replacement notebooks that you see. Sometimes inexpensive models are built with desktop chips that consume a great deal more power than a mobile chip would. You don't want to wind up with a 30-minute battery life. If you stick with a Centrino-branded laptop, you should be on safe ground.

Graphics acceleration: If you are a gamer, or are doing heavy graphics work, then you will want a laptop that has a discrete, so-called graphics accelerator chip along with dedicated video memory. (I realize I might be losing you a bit here.) Such a chip takes a lot of the graphics load off of the laptop's central processing unit. All Apple Macintosh laptops (that I am aware of) include such a chip. Many Windows laptops use an embedded graphics solution in which a big chunk of your laptop's main memory is used to manipulate the graphics to be shown on screen. Embedded solutions do not perform nearly as well as discrete solutions. However, the difference does not matter to most people. Gamers notice it, because game-playing taxes a computer's graphics ability to the limit (and usually beyond). I do not believe you need to worry at all, one way or the other, about graphics acceleration. But if you feel you might get heavily into gaming, or that you might end up doing a great deal of image editing or video editing, then we should probably talk.

My own laptop, by the way, uses a shared-memory solution. My next laptop (the one that I have in mind to buy) will also use an embedded, shared-memory solution. Such solutions cost less, and they draw less current from your battery. You should be aware that you have a choice here, but I truly do not believe you need to worry about this issue one way or the other.

Upcoming technology changes: As it turns out, laptop technology in both the Windows and the Mac universes is going through some major changes this year. If having the "latest and greatest" is important to you, then you should wait until spring, and perhaps late spring, before buying anything.

The new brand to look for on the Windows side of the fence is "Centrino Duo". The big change is that Intel's central processing units (CPUs) now come as the equivalent of two CPUs in one. Performance is improved. Battery life is also significantly improved.

On the Mac side of the fence, the big change is that Apple will begin this year to change their hardware architecture from IBM's PowerPC chip to Intel's new Core processors, which are an evolution of Intel's Pentium processors.

The change on the Mac side is very significant. If you decide to buy a Mac, make sure to wait for the newer technology. Whatever you buy will probably need to last you all four years of college. Thus, you want a Mac laptop with the newer, Intel chip. You do not want the soon-to-be-bygone PowerPC platform.

The change on the Windows side is not so much to worry about. Having two CPUs in a laptop is wonderful and fun, but in the grand scheme of things it is really not all that important. If you need two CPUs, you would already know it. I still do a lot of work, including running the Oracle database, on a four-year-old laptop. For the day-to-day work that you'll be doing, one CPU will do just fine for as far into the future as I can foresee. Consider two CPUs a nice bonus, but don't be afraid to pick up a good deal on a close-out, single-CPU model.

Ports and connectors and such: Laptops come with a variety of ports and connectors and features. At a minimum, I'd recommend the following:

  • At least two USB ports (one for a mouse, another for, perhaps, a printer)
  • A drive capable of playing DVDs and recording CDs. (Look for "CD-RW/DVD-ROM" in the feature list)
  • Wireless networking (often called "wi-fi"). Look for at least 802.11b. Better is 802.11b/g. If you see an "a", as in "a/b/g", that "a" is of little real use. "b/g" is what you really want.
  • An external display connector so that you can plug in an external monitor. For example, you might want to plug in to an overhead projector for a class presentation.

Most laptops will come with all of the above. Some still come without wireless though, so do watch out for that.

Memory: Buy a laptop that can be expanded to hold at least 1GB of memory. Many come initially with only 256MB. That's really not enough. You want at least 512MB right off the bat, so you may need to budget for an immediate upgrade. Give yourself some breathing room too, by making sure that you can eventually plug in 1GB.

Some specific recommendations: I should probably try and make a few, specific recommendations. And so I will:

  • Were I to buy a Windows laptop today, I would look first to Lenovo Thinkpad. My last two laptops have been Thinkpads, and I've bought them for both my wife and my mother. Thinkpads are good, solid machines that give little trouble, and the support I get when something goes wrong is top-notch. I hardly look at any other brand these days.
  • Aside from Thinkpads, I know at least one person who is happy with his Hewlett-Packard laptop; Acer seems to be making some well-reviewed laptops (link and link) these days (though I know nothing about their support); another friend is very happy with his recently purchased Toshiba Tablet PC; and Sony seems to make some stylish models. These are all mainstream brands that you probably wouldn't go wrong in buying.
  • You asked about Dell. I've used a Dell Latitude in the past, and it was a solid, well-built machine. Beware though, that Dell markets two lines of laptops. Dell Latitudes are marketed to businesses. Dell Inspirons, however, are consumer-oriented. I've seen a few Inspirons and have never been impressed with them. If you go Dell, go Latitude.
  • If you want a Mac, you have only one choice and that is Apple (iBooks and Powerbooks). Fortunately, Apple hardware is good stuff. I have no concerns at all about their quality and support.

Dare I recommend a specific model? You might consider Lenovo's Thinkpad Z60t series (watch for the "t" at the end; the "m" is a heavier model). The Z60t is relatively small yet mainstream, weighs only 4.6 lbs (some configurations weigh a tad less), has a 14-inch widescreen display with a reasonable pixel count, gives two battery choices (for shorter or longer life), supports a second battery in place of the CD drive (for up to 8 hours of battery life), and garners good reviews (link and link and link). Thinkpads are a bit pricey, but worth it, in my opinion. Also look at Apple's new, Intel-based laptops when they are announced this month (probably by the time you get this letter).

And I've rambled on long enough: Mary Beth, that's all the advice I have to offer for now. If you're reading this online, watch the space below. Others will no doubt have differing opinions that you might want to consider. Readers might also suggest tradeoffs and features that I didn't think to mention. Good luck with your decision. Feel free to let me know what you plan to purchase before you pull the trigger. I'll be happy to discuss more, offer further opinions, etc.

With love,

Uncle Jonathan

It seems that developers are more likely to take advantage of new
database features in new applications than when maintaining and
upgrading old ones. That is a pity, because there are very significant
gains to be had at low cost. I'll take a simple example: a few years
ago, I had to improve a program that displayed on the user's screen both
the first rows returned by a query and the total number of rows found.
The query was built dynamically and executed twice: once to return the
data, and once to count the rows, by merely substituting COUNT(*) for
the select list. I significantly improved performance by reworking the
count queries and narrowing them to the tables and conditions that truly
identified the result set, leaving out the numerous joins that were
required only to return complementary information. As a result, the
count query took a fraction of the time of the "real" query, instead of
doubling the execution time.

That was the best that could be done at that time.


Very recently I met something quite similar in a JDBC program. But
today, many DBMS products implement window (sometimes known as
analytical or OLAP) functions. Append something such as COUNT(*) OVER ()
to your select list, and your query will magically return the data AND
the total count in each row, at almost no cost if the query already
contains an ORDER BY clause. Any ordering requires identifying the full
result set before returning the data, so counting the rows comes free.
The application code is simpler, and the database is queried once where
it was queried twice. That's a cheap way to make a query almost twice
as fast as before.
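
To make the idea concrete, here is a minimal sketch of the single-query approach. The program I mentioned was JDBC against another DBMS; this stand-in uses Python with an in-memory SQLite database instead (window functions need SQLite 3.25 or later), and the orders table, its columns and the filter are all invented for the example.

```python
import sqlite3

# Hypothetical table and data, purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer, amount) VALUES (?, ?)",
    [(f"customer_{i}", i * 10.0) for i in range(1, 26)],
)

# COUNT(*) OVER () adds the size of the whole result set to every row,
# so one query serves both the display and the "N rows found" message.
cursor = conn.execute(
    """
    SELECT id, customer, amount, COUNT(*) OVER () AS total_rows
    FROM orders
    WHERE amount >= 50
    ORDER BY amount DESC
    """
)

# Fetch only the first rows for display, as the original program did.
first_page = cursor.fetchmany(5)
total_rows = first_page[0][3] if first_page else 0

for row in first_page:
    print(row[:3])
print("rows found:", total_rows)
```

One thing to watch for: if the query returns no rows at all, there is no row to carry the count, so the application has to treat an empty result as a count of zero, as the last assignment does.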

Sal Cangeloso's brief article on Linux application names caught my eye yesterday morning. Having just come away from a struggle with cryptic Linux/Unix command names the evening before, I was in the mood to reflect on the importance of giving a name. I'll come back to my struggle. First I want to talk about cars.

If there's one industry that's given a tremendous amount of thought to naming, surely it is the automotive industry. Auto makers don't choose names that directly describe their products (else I would be driving a "Geo Small Slow Car" rather than a "Geo Metro"). Auto makers, at least from my observation, carefully choose names for their visceral, emotional appeal, and then they carefully build a brand around each name to reinforce their chosen image. The Ford Mustang is a good example. Ford carefully associates "Mustang" with wild horses, freedom of the open range, youth, vitality, strength, speed(!). And then the car itself is marketed to a demographic that wishes they had all those things. Customers don't buy a car, they buy an image. The key is that the name is consistently reinforced by images, by logos, by a product that can be emotionally associated with the name, and so forth. Ford Mustang is a well-known and successful brand.

In the open source world, I see similar success in the names of Mozilla's flagship products: Firefox and Thunderbird. It used to be that when you said "Thunderbird", I thought of the car. Now my first thought is of email. Mozilla has taken common, easy to pronounce words such as "fire" and "fox", "thunder" and "bird", and combined them into colorful and easily pronounceable product names. And Mozilla has backed those names up with advertising, with promotion, and with colorful logos. (And don't underestimate the value of those great logos.)

(As a side note here, I often wonder about all the "K-names" for KDE applications. KDE is too locked into the cute, K thing. Gnome has the advantage here, I think, in that Gnome developers are more free to choose colorful names. No one had to worry about slipping a "G" into Mozilla's product names.)

So I disagree somewhat with the argument in Sal's article that application names need to be somehow descriptive of what the applications do. Firefox would be less successful, at least less memorable, were it to be called "Web Browser". Firefox is memorable because of the colorful name backed up by a good logo and consistent marketing. Acrobat is memorable for the same reason. Even "Gimp" can be a successful name with the right marketing effort.

But I do agree that Linux command and application names, and especially command names, are often cryptic, confounding, and a stumbling block. That struggle I had the other evening? It involved the command needed to format a hard drive. I was using Knoppix to help a neighbor rescue some files from a Windows machine gone bad. We were intending to copy the files to a second hard drive that I'd just installed. Trouble was, I needed to format that drive, and the GUI didn't seem to offer an option to do that, so it was off to the command-line for a solution. Some Googling led me to the mkfs command. Mkfs for format? My neighbor was completely confounded by that, and, frankly, so was I. (At least the command worked, and the files were saved.) It wasn't until the next morning, just before reading Sal's article, that the probable association between "mkfs" and "make filesystem" occurred to me. Sure it makes sense now, but it is not intuitive to associate "mkfs" with "format". And "mkfs" is not even a pronounceable word, something I feel is very important in making a command memorable and usable.

Names are important. And troublesome names are certainly not the exclusive domain of Linux. Good branding can make a name memorable (e.g.: Firefox). Descriptive naming certainly doesn't hurt (e.g.: Notepad). Pronounceable names help me a lot (Kate is perfectly fine for an editor name). Abbreviations and truncated words often cause me to stumble (e.g.: df and mkfs). Unusual associations can make a name memorable (e.g.: Gimp).

Linux distributions, by the way, have tackled the naming problem by making their menu choices descriptive of what an application does. For example, KDE in Suse 10.0 uses "Media Player (Kaffeine)". That's a helpful approach for two reasons: it helps me get started on finding a media player, and it reinforces the association between "Kaffeine" and "media player". This gets back to branding and marketing. Kaffeine is a perfectly good name (IMHO). It just needs to be backed by enough marketing for people to learn to associate the name with the function. The dual-entry menu name really helps with that.

It's a slightly provocative question to ask oneself whether the automation of database administration isn't taking database administrators back to the bad old days of panic-mode, reactive behavior, after they have painfully tried to be proactive for years. For many years, the hallmark of a good DBA was a collection of automated reports that warned about storage issues to come, or about parameters no longer adapted to a changing load, and all the intelligence of the work was in preventing issues. Automated administration certainly means that one DBA is now able to administer many more databases than before. But it also means that action will be focused on nasty bugs and, generally speaking, on everything that has slipped through the net of automation. Let's hope that vendors' support will be, well, supportive.

I have been poring over an Oracle procedure of death in which a loop over a complex cursor calls a no less complex function in which multiple SELECT statements set various ancillary variables before a row insert accompanied by an UPDATE of the corresponding row of one of the tables referenced in the initial cursor. Still with me?

I have several times heard that procedures and simplistic SQL statements make for easier-to-maintain programs. I am not that sure; my only certainty is that it makes it easier to assign more junior, and therefore cheaper, developers to the task. I am under the, possibly false, impression that to most young graduates GROUP BY represents the ultimate level of SQL sophistication.

But complexity isn't inherent to a language, whether it is SQL or a wrapper language to embed SQL statements. Complexity is born of business requirements, and usually made significantly worse by poor database design.
I am not sure that a long succession of if ... else if ... else if ... embedding a number of SQL statements makes for an easier read than a CASE construct and a handful of outer joins. Actually, my feeling is that the larger the number of lines, the longer it takes me to grasp what the damn thing is meant to do.
But what I am certain of is that if we don't replace this procedure with an INSERT ... SELECT and a trigger to update the other table, we won't be able to go much faster than the 150,000 rows per hour that are currently processed.
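
To make the comparison concrete, here is a minimal sketch of the set-based shape described above, using Python's built-in sqlite3 module rather than Oracle. The tables (orders, customers, invoices), the discount rule, and the trigger name are all invented for illustration; the real procedure is far more complex. The outer join stands in for the lookup SELECT statements, the CASE stands in for the if ... else if chain, and the trigger stands in for the per-row UPDATE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, tier TEXT);
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER,
        amount      INTEGER,
        processed   INTEGER DEFAULT 0
    );
    CREATE TABLE invoices (order_id INTEGER PRIMARY KEY, billed INTEGER);

    -- Hypothetical trigger: replaces the UPDATE issued inside the loop.
    CREATE TRIGGER mark_processed AFTER INSERT ON invoices
    BEGIN
        UPDATE orders SET processed = 1 WHERE order_id = NEW.order_id;
    END;

    INSERT INTO customers VALUES (1, 'gold'), (2, 'silver');
    INSERT INTO orders (order_id, customer_id, amount) VALUES
        (10, 1, 100), (11, 2, 100), (12, 3, 100);
""")

# One statement instead of a cursor loop. Order 12 has no matching
# customer, so the outer join yields NULL and CASE falls through to ELSE.
conn.execute("""
    INSERT INTO invoices (order_id, billed)
    SELECT o.order_id,
           CASE c.tier
               WHEN 'gold'   THEN o.amount * 80 / 100
               WHEN 'silver' THEN o.amount * 90 / 100
               ELSE o.amount
           END
    FROM orders o
    LEFT OUTER JOIN customers c ON c.customer_id = o.customer_id
    WHERE o.processed = 0
""")

invoices = conn.execute(
    "SELECT order_id, billed FROM invoices ORDER BY order_id"
).fetchall()
unprocessed = conn.execute(
    "SELECT COUNT(*) FROM orders WHERE processed = 0"
).fetchone()[0]
print(invoices, unprocessed)
```

Whether this shape delivers the hoped-for speed-up on the real tables is something only a test against the real data can show; the sketch only illustrates that the whole loop body can collapse into a single statement.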

No doubt the OpenSource community is shaken by the announcement on October 7th that Oracle is acquiring Innobase, the company behind InnoDB. InnoDB is the product that enables MySQL to turn into an enterprise-grade DBMS, with support for commit and rollback, foreign keys, row locking ... Forget about all the reassuring noises about "commitment to open source software", etc. Commitment to keep OpenSource software under close watch, so that it stays a toy, and not a menace in the corporate world? (OpenSource database software, that is. Anything that has to do with operating systems, word-processing or spreadsheets is welcome.)

In fact, there are many positive aspects to this announcement. Firstly, Oracle still has an interest in databases, something that wasn't absolutely obvious from the recent buying spree. Secondly, Oracle implicitly acknowledges that OpenSource software databases have reached a level of maturity that makes them worthy contenders in the corporate market. Thirdly, it may be an opportunity for some other OpenSource enterprise-grade DBMS products to step out of the MySQL shadow. Fourthly, it makes MySQL a greater champion, if the MySQL announcement is to be believed, of the GPL license; time to dust off business models perhaps. And fifthly, it may announce the birth of a new lucrative cottage industry: developing strong storage engines.

It's funny how people tend to side naturally with one school of thought or another. Database topics often polarize practitioners, and one of the issues that are hotly debated happens to be the 'natural key vs surrogate key' question, that is, whether the primary key should have some significance outside the information system or merely be an internally generated number. I don't see any reason why I would call 357914358 something that I could call 'SPADE', and I use natural keys whenever I can. But a friend had a very interesting reaction to my mentioning that primary keys should not be updated:

We don't update PK's because of the inherent difficulty of propagating changes to related tables (assuming that we are using natural keys). This is one reason people go with surrogates. But what uniquely identifies a row can and will change. For example, what about a Companies table, and a company goes through a name change. Is the PK partly based on the name of the company? ... This idea that we don't update PK's seems rooted more in the difficulty of updating PK's when natural keys are used.

Cough, choke! There seems to me to be a confusion between the identification of a row (what the primary key is about) and the subtly weaker condition of distinguishing one row from another. As it happened, my company changed its name some time ago. Why could we keep our bank accounts without having to close and reopen them? Why could we keep our contracts running? Simply because the registration number attached to the company when it was incorporated didn't change. The name of a company distinguishes it from another. But it's not what truly identifies it. Some might argue that a registration number is a surrogate key of sorts; and indeed it might be considered a shorter alias for a company that was created under a given name at a given place and date by some particular people and as a particular type of company. It's a surrogate key, but it's rooted in the real world. It seems to me quite acceptable to use a surrogate key to identify a company, as long as we clearly understand that it's merely a shorthand for information such as incorporation details that we have no use for in our model otherwise.

If you were to change what truly identifies something, how would you know that it is the same thing? There is no way to distinguish an update from a delete followed by an insert. "Updating a primary key" implicitly acknowledges that you have some out-of-model knowledge that the before-update and after-update values truly represent the same item. What truly defines the row isn't in your model. Don't blame the theory, blame your model.
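
The company example can be sketched in a few lines, again with Python's built-in sqlite3 module. The table layout, the registration number 'REG-0001', and the company names are all made up for illustration. The point is only that when the key is what identifies the company, a name change is an ordinary attribute update and nothing needs to be propagated to related tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE companies (
        registration_no TEXT PRIMARY KEY,    -- identifies the company
        name            TEXT NOT NULL UNIQUE -- merely distinguishes it
    );
    CREATE TABLE contracts (
        contract_id     INTEGER PRIMARY KEY,
        registration_no TEXT NOT NULL
            REFERENCES companies (registration_no)
    );
    INSERT INTO companies VALUES ('REG-0001', 'Old Name Ltd');
    INSERT INTO contracts VALUES (1, 'REG-0001');
""")

# The company changes its name; the key, and every row that refers to it,
# stays exactly as it was.
conn.execute(
    "UPDATE companies SET name = 'New Name Ltd' "
    "WHERE registration_no = 'REG-0001'"
)

row = conn.execute("""
    SELECT c.name
    FROM contracts k
    JOIN companies c ON c.registration_no = k.registration_no
    WHERE k.contract_id = 1
""").fetchone()
print(row)
```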

Windows Vista is on the way, with the first beta soon to be released on 3 August. It may be too late to influence Microsoft, but I've put together the following list of features that would get me excited about the next release of Windows:

  • Enforce at the operating-system level a rule that no application is ever allowed to steal focus. It really wrecks flow for me to be typing along at an article only to have the Office Assistant ask me a question, or my spyware scanner interrupt me to say that new updates have been downloaded, or whatever. No interruptions. Ever. That's what I want.
  • More reliable killing of applications. When I ctrl-alt-del, highlight an application, and click the end button, I'd like the app to end right then. I know I can go to the Processes tab to end the underlying process more reliably, but the mapping from application name to process name is not always clear.
  • Longer battery life. This is more of a hardware problem. But Win XP actually does a good job at milking every minute from my batteries. Any further software improvements to extend battery life under Windows Vista would be most welcome.
  • I'd like to see better integration with Linux/Unix. For example, build in really good ssh and sftp clients. Maybe bundle in a good scripting language such as Python.
  • Ship Windows with some sort of built-in programming environment. I know it's easy enough to install, say, Python, but interesting things might happen if we could count on all users to just have it already.
  • Build in a stand-alone address book so that I can manage all my email contacts independently of whatever email program I happen to be using.
  • Take a good, hard look at the Start menu. I think we need a new approach to finding and starting applications. When I go to Start->All Programs, my list is so long that it wraps into two columns. And sometimes I have to navigate through not one, but two folders before I can click on an executable. I wish I had a solution to offer here, but I can't help but think there might be a better way.
  • Throw in a really good text editor. Vim maybe? Just something more capable than Notepad.

I'm sure as soon as I post this that other ideas will come to mind.

A key benefit from having a weblog is that you can quickly and easily post your thoughts on a topic for all to see. Alas, lurking within that advantage is the risk that you might post in the heat of the moment something that you later regret. Last week I made such a post.

Friday last week, in response to the new name "Windows Vista", I took a rather gratuitous potshot at Microsoft. It is true that I find their new product name uninspiring, and I really did immediately think of "station-wagon" (I'm old enough to remember the Vista Cruiser), but my post on the topic brought no benefit to anyone. If you're one of the people who read my post, I apologize for its inflammatory nature. You deserve better. Next time I will make it a point to calm down a bit and consider my words more thoughtfully.

I spoke with Andy Astor, CEO of EnterpriseDB, and got more details on their market strategy for encouraging enterprise adoption of their commercial PostgreSQL bundle.

Their initial push will be focused on providing an alternative to large corporations looking for some relief from Oracle's steep licensing. Large corporations currently pay the most, and therefore have the most to gain from a viable Oracle work-alike. E-DB's Oracle emulation currently consists of extensions to the PL/pgSQL language to support the range of Oracle data types, as well as the input-output parameters.

This seems like a reasonable approach, as Postgres's SQL dialect is pretty close to Oracle's. But pretty close can still mean very expensive to migrate when you consider a nontrivial database application. To address this, EDB must make it ridiculously easy to port by providing something like 5 9's Oracle compatibility (and yes, I just made that phrase up). But having taken a stab at SQL dialect translation, I have to say that I think this is a pretty lofty goal. There are so many corner cases to consider, and the risk of not considering them is that porting to EDB becomes too labor-intensive or risky to attempt. Raising the ante even more, EDB's goal is to handle these corner cases not just for Oracle but for SQL Server as well.

As a concrete example, consider running something like OpenACS on EDB. OpenACS is a ~70KLOC open source web appdev framework that currently supports both PostgreSQL and Oracle. It makes liberal use of packages (not supported in the first EDB release, but 'soon' in sp1), and of course it uses Intermedia if you want full-text search, and has at least one Java stored procedure that I know about, etc. Any OpenACS+Oracle site considering saving money with EDB would have to address these niggling syntax mismatch issues at the application level, and then do the same for their production support procedures. In other words, people currently using OpenACS in production with Oracle will have to redo their disaster recovery plans and their sql*loader scripts, and plan and execute an extensive regression testing phase for each application being ported. Ouch.

On the Microsoft side, nontechnical challenges present themselves. MS is cleverly pitching their low-end SQL Server Express at the price-sensitive part of the market. For $0, you get the same code base they sell for $25k per CPU, with limits of 4 GB per database, 1 GB of RAM, and 1 CPU. If you had to pick a database for Windows and can meet these limits, PostgreSQL/EDB is going to be a tough choice for a while.

To be honest, I want EDB to succeed -- I would love to give clients the choice to start their app with a free database and upgrade if/when they decide they need to, rather than making a choice based solely on ISV support. And that's where I ultimately see them being successful: not necessarily in providing a path off Oracle, but in providing a path *to* Oracle. In other words, if your developers learn parts of the Oracle API not supported by EDB, they can code their app to EDB and scale up to Oracle if there's ever a compelling business reason to do so. This option should be very compelling both for start-ups without any legacy app support issues (already prime candidates for open source alternatives) and workgroup applications at Oracle-centric enterprises looking to cut costs and gain some vendor independence.

I believe this is how JBoss initially gained traction with shops who would deploy on commercial java app servers like WebLogic. JBoss provided something simpler, developer friendly, compatible, and way cheaper. At some point, as JBoss matured, people started thinking about the WebLogic deployment phase as optional. For EDB's sake, let's hope that history repeats itself.

O'Reilly has just published one of the three most significant and meaningful books that I've edited during my almost five years with the company. In case you haven't seen the announcement yet, Chris Date's latest book, Database In Depth, is now available. The subtitle, Relational Theory for Practitioners, sets out the goal for the book, which is to impart the fundamentals of relational theory to professionals who need a quick refresher, or who have come up in the field through learning specific vendor products.

Chris is a giant in the field, and I consider him to be one of the founders. A man named E.F. Codd was the founder, but Chris was one of the first to see the genius in Codd's idea. Chris and Ted (E.F. was Ted to his friends) worked together for many years, first at IBM and later at their own firm, constantly refining and developing the relational model that underlies practically every commercial database of significance today. Together, they gave birth to a new field.

Life sometimes runs in strange circles. Some 23 years ago, it was Chris, through his book A Guide to the SQL Standard, who led me into the world of SQL and relational databases. My memory of browsing his book in a Saginaw, Michigan bookstore is vividly clear even to this day. Ironically, the most influential part of that book was his Appendix A (I believe it was A) containing a critique of the then-current SQL language. That critique was my introduction to relational theory, and the knowledge I gained from that one appendix set me head-and-shoulders above my coworkers (at the time) in my understanding of SQL and relational databases.

As the years wore on, I tended to focus more on products, notably on Oracle, neglecting the importance of fundamentals and theory. Then, one day, a jarring argument about subquery optimization that hinged on a point of relational theory led (surprisingly!) to my meeting Chris Date, and that meeting led to the book I'm talking about today. I'm still amazed when I think about how this book project came together.

I'm the book's editor, and so I'm biased, but permit me still to make a recommendation: If you do any work at all with relational databases, pick up a copy of Database In Depth. I won't say it is an easy read, because in some chapters you'll really need to think, and you may need to read some chapters twice, but the book is less than 200 pages, and reading those 200 pages is a small, yet very strong investment that you can make in your career. Reading Database In Depth is especially important if you've come into the field through learning a specific product. Product knowledge is vital for getting day-to-day work done, but the sort of fundamental knowledge you'll get from Chris is likewise important in the long-term. You'll better understand the day-to-day work that you are doing, and you'll better advance your career.

Pick up a copy of Chris's book. Learn the fundamentals of your field. You won't regret the investment.

PyCon 2005. I've just returned to the office and recovered from a fantastic trip to Washington D.C. last week for PyCon, the annual Python Conference. While I enjoyed the conference, my wife took the kids around to a few of the many museums and monuments that are to be found in our nation's capital. I didn't get in on that action, except to manage a late evening subway excursion with my daughter to see the Capitol Building, the Supreme Court, and Union Station. I love old-style train stations, and the lighted Capitol dome is gorgeous at 8:00pm on a dark evening.

The venue. I can't say enough about how very much I like the venue for PyCon. The George Washington University area is bursting with life, so unlike many conference venues. There are shops, restaurants, and monuments; people live in the area, and things are happening. Even the food court known as J-Street on floor one of the Marvin Center (where the conference was held) was fun. My daughter and I found a great source of vegetarian sandwiches down there. Jenny's a strict vegetarian, so this was no small discovery! And the University bookstore was in the building. How can you not like a bookstore that sells cool medical equipment like stethoscopes and reflex hammers (I forget the proper name) and whatnot? I came this close to buying my nine-year-old his own stethoscope (uh, Jeff, I hope you're not reading this 'blog). What I did find to buy was a copy of Timothy Gowers's Mathematics: A Very Short Introduction.

Python Books. Alex Martelli and Anna Martelli Ravenscroft had only just finished revising the Python Cookbook. We managed to ship several dozen copies of the Second Edition direct from the printer to the show bookstore. If you were at the conference, you are among the first to see the book. I greatly enjoyed working with Alex and Anna on this second edition. They are excellent writers, passionate about their topic, and knowledgeable. For me, the high point of the project was the day that Alex added me to the credit list for the recipe on "Finding Last Friday". While editing the chapter on time and money, I'd read the first draft of the recipe, and was hit with an idea for improving it. I worked out some details of modular arithmetic with a friend while hiking in the Pictured Rocks National Lakeshore. Then I emailed Alex my new algorithm and put the matter out of my mind. I about fell out of my chair when he sent the chapter back with my friend's and my name in the credit list for that recipe. It just goes to show that no matter what your Python expertise (and mine is very little indeed), you can still contribute to the cookbook. Alex, I'm honored.

For any who are curious, my friend's and my solution is the one on page 119.
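
For readers without the book at hand, here is a minimal sketch of how modular arithmetic can find the last Friday. This is not the Cookbook recipe itself, just my illustration of the general idea; the function name and the choice to count today when today is a Friday are assumptions of the sketch.

```python
import datetime

FRIDAY = 4  # date.weekday() numbers Monday as 0, so Friday is 4


def last_friday(today):
    """Return the most recent Friday on or before `today`."""
    # The modulo wraps negative differences around the week, so no
    # if/else on the day of the week is needed.
    days_back = (today.weekday() - FRIDAY) % 7
    return today - datetime.timedelta(days=days_back)


print(last_friday(datetime.date(2000, 1, 1)))  # a Saturday
```

Python's % operator always returns a non-negative result for a positive modulus, which is what makes the one-line calculation work for Monday through Thursday as well.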

Favorite Session. There were many good sessions at the conference. Perhaps my favorite sessions were the two back-to-back sessions on the new decimal module by Michael Chermside and Facundo Batista. Perhaps it's because of my background in COBOL supporting a payroll system, but I've always thought it important to have support for true-decimal arithmetic, and so often languages seem not to provide that support. It's nice to see it coming to Python. Later that same day, I sat in on an "open space" session led by Facundo, in which Facundo, Alex, Anna, myself, and a few others discussed the possibility of creating a specific datatype for money. In the past, I've been skeptical of the benefit from such a type, but now I'm rethinking the idea. I've seen money types that are nothing more than fixed, two-digit decimal types. Those probably don't add much value. But what if you could create a money type that combined both an amount and a currency type, so that you could store a value of USD 100 in one variable of the type and CAD 100 in another variable of the type, and the currency unit would be part of each value? What if you could somehow automate comparison of values across currency units? Well, that last is certainly an interesting challenge, isn't it?
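
To make the idea a bit more tangible, here is a rough sketch of what such a type might look like, built on the decimal module. Nothing here comes from the open space session: the class name Money, the converted method, and the exchange rate are all hypothetical, and the sketch sidesteps the hard part (where the rates come from and when they apply) by making the caller pass them in.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Money:
    """An amount that carries its currency unit as part of the value."""
    amount: Decimal
    currency: str

    def __add__(self, other):
        if not isinstance(other, Money):
            return NotImplemented
        if other.currency != self.currency:
            # Refuse to mix units silently; conversion must be explicit.
            raise ValueError(
                f"cannot add {other.currency} to {self.currency}"
            )
        return Money(self.amount + other.amount, self.currency)

    def converted(self, currency, rates):
        """Convert using `rates`, a dict of (from, to) -> Decimal rate."""
        if currency == self.currency:
            return self
        rate = rates[(self.currency, currency)]
        return Money(self.amount * rate, currency)


usd = Money(Decimal("100"), "USD")
cad = Money(Decimal("100"), "CAD")

# Made-up rate, for illustration only.
rates = {("CAD", "USD"): Decimal("0.80")}

print(usd == cad)                    # same amount, different unit
print(cad.converted("USD", rates))   # comparable with usd once converted
```

Comparison across currencies then reduces to converting one side first, which makes the dependence on an exchange rate explicit rather than hiding it inside the comparison operator.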

On the subject of time, Anna Martelli Ravenscroft gave an excellent presentation on The Time of Day, tackling topics such as time zone support and Coordinated Universal Time (UTC). Anna also pointed me towards what appears to be a very comprehensive resource on Timezone Information. A few years back, I did a fair bit of research into time and time zones while revising the datetime chapter in Steven Feuerstein's Oracle PL/SQL Programming to cover the then-new time zone, timestamp, and interval support in Oracle9i Database. Time and calendars: these things are not so simple. I have a lot yet to learn, and time is a fascinating area to explore. I never knew, for example, that Detroit, the city I grew up in, once had its own time zone.

In the favorite quote department, I had to laugh out loud when Greg Lindstrom of Novasys Health made the comment that the largest obstacle to corporate adoption of Python is that "Python is too easy." A close runner-up was Guido's comment during his keynote that "Perl isn't all bad."

I met many authors at the conference whom I don't get a chance to see often. Alex and Anna I've mentioned already; there were also David Ascher, Mark Lutz, Ray Lischner (of C++ In A Nutshell fame), and Abe Fettig (upcoming book on Twisted). There were many other good sessions, on Scripting the Mac with Python, on PythonCard, on Design Patterns, and many more.

I thoroughly enjoyed my two days at the conference. It was great meeting people in person whom I usually can only trade emails with. The venue was great. My wife says it's the best vacation (for her and the kids, anyway) that I've put together in a long time. I can't wait to see what next year brings.