Database Performance and Row Size

I was reading through some posts in a database-related newsgroup or mailing list (actually, right now I can’t remember which one it was). The conversation I was reading was in response to a question like “Does the number of columns or size of the row matter in terms of performance?”

Actually, the question asked what kind of a performance impact might be expected if a query was issued against two similar tables. The first table had (say) 20 columns, and the second table had the same 20 columns, as well as 35 additional columns.

Well, most of the basic responses were similar. The consensus was that as long as the query was going against the same columns then performance should be about the same. I disagree. Here is why.

You also need to factor in the I/O requests that are required to return the data. The DBMS will perform I/O at the block (or page) level — this is so whether you return one row or millions of rows. For multi-row results, accessing data from the table with the wider row (more columns) will usually be less efficient. This is so because fewer rows will exist on each page (the row with 100 columns is smaller than the row with 150 columns so more rows can reside in a single, pre-sized block/page). The bigger the result set, the more pronounced the performance degradation can be (because more physical I/Os are required to retrieve the data).

Think about it this way. Is it faster to pull smaller peaches out of a basket than bigger peaches? That is about the same type of question and anybody can envision the process. Say you want 100 peaches. Small peaches fit 25 per basket; big peaches fit ten per basket. To get 100 small peaches you’d need to pull 4 baskets from the shelf. To get 100 big peaches you’d need to pull 10 baskets from the shelf. The second task will clearly take more time.

Of course, things can get more complicated, too. Let’s say that you want 1000 peaches instead of 100, but you have 5 varieties of peaches and the baskets are all stored in a store room on shelves but they are intermingled. So, you now have to look up 40 locations from which to pull the small peaches and 100 to pull the larger peaches. That means that you not only have to deal with more baskets when you deal with the larger peaches but you also have to spend more time finding them. (with thanks to Ralph D. Wilson II for his insight on this paragraph).

Of course, the exact performance difference is difficult to calculate — especially over an online forum and without knowledge of the specific DBMS being used. But there will, more than likely, be a performance effect on queries when you add columns to a table.

Advertisements

About craig@craigsmullins.com

I'm a strategist, researcher, and consultant with nearly three decades of experience in all facets of database systems development.
This entry was posted in database design, performance. Bookmark the permalink.

3 Responses to Database Performance and Row Size

  1. Abel Martins says:

    I agree you; we normally need think about all access set and the I/O parallelism then whether
    we have normalized data, we have less accessed pages/blocks and it’s possible to parallel I/O
    operations.

  2. Eliana says:

    Great post and good information. I would like to see more of these.

  3. Pingback: Database Performance and Row Size | MYSQL | Scoop.it

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s