Database Performance Tuning Tips

January 17, 2010 - 8:56pm
Submitted by gary

The following is the list of things to avoid or do to increase performance in database speed:

 

1. Avoid or minimize the use of views!!

    Yes, views are nice and handy when retrieving complicated queries. But views are the worse things you can have when exposed to other developers who will use them.

    Lets assume the following tables: car, engine, and drivers,   (I wont put the effort into drawing a UML for these table implementations)

    logically, we assume car -> many-to-many -> drivers, car -> one-to-many ->engines (we can change engines)

   Imagine a view  denote v1 that has the results of : select * from car as c, engine as e where c.color = red and e.speed > 1000rpm

   Now imagine someone else wants a car with color yellow, and speed lower than 2000, he makes v2.

   Since these are heavily used queries, they think is best to implement a view for them. (And we can not argue WHEN exactly is a right time to make a view)

   Now programmer3 comes along and sees the 2 views, and is exactly what he wants, so he joins the 2 views to get the results.

   Now he is executing 3 queries to get the result he wants. instead of 1 single query.

   See the sign of database performance hit? Now imagine 10 developers making 10 views that are relying on other views. You are now into a query nightmare!!!

   One would argue that programmers are not that stupid. No, this has nothing to do with stupidity, even te smartest developers would fall into this trap because our logic of code reuse and avoid code duplication. (To use something already exist rather than creating something all over)

    This will ALWAYS happen if you enable the use of views...

    So if we avoid views totally, then we solve one huge maintenance and performance problem.

 

2. Avoid the use of sub selects

    This is very similar to views. For every sub select is an extra query to the database. In some situations, you NEED a sub select to get the result you want. So is unavoidable. But that is where (Object Relational Mapping) ORM frameworks like Hibernate comes in. (alright, what if you are not using a ORM framework, then yes you are stuck with the sub select, but those situations are rare)

    Lets look at an example:

        assume the tables: car, driver, engine. lets asume same structure, with link table car_driver for the many-to-many relationship.

        How would you find cars that driven by drivers d1, d2, ... dn, AND with engines speed > 1000rpm.

        Select c.* from car as c, engine as e, car_driver as cd where c.engine_fk = e.engine_pk and c.car_pk = cd.car_fk and cd.driver_fk in (
                     select d.* from driver as d where d.driver_name in (d1, d2, .... dn) )

         (Might have some minor errors in the query, but you get my idea)

       All this complicated query, you can easily search through if your objects are mapped correctly in hibernate.

      You can easily do:

       DetachedCriteria criteria = DetachedCriteria.forClass(Car.class);
       criteria.add(Restrictions.in(Car,DRIVER, driverList));
       criteria.add(Restrictions.ge(Engine.SPEED, 1000rpm));
       getHibernateTemplate().findByCriteria(criteria);

       One can argue that hibernate might be doing something even more slowly than your raw queries.

       I have 2 counter comments to that:

           1. Hibernate mapped object cache is ALWAYS faster than your database cache. (Please prove me wrong with counter examples if you have them) So even with the same amount of queries, hibernate will be faster than your raw queries.
           2. Notice the raw sql speed is: O(n)*O(m) = O(n2) loops through list of cars and drivers

               The query Hibernate goes through is O(C)*O(n) = O(n) is loop through cars, c is the constant driver list.

       So please avoid sub selects, and make use of your ORM objects. (Please prove me wrong if you have counter examples)

 

3. Do batch updates rather than hammering the database with single entry updates (REALLY BAD!!!)

    if you are saving a bunch of entries, make sure to use one single batch update rather than saving them after each entry is ready.

    In Hibernate, make use the getHibernateTemplate().saveOrUpdateAll() method, it combines all the entries and make a batch update rather than millions of transactions.

    Example:

          BAD!! foreach (Object object : objects) {
                         //do stuff to object
                        dao.save(object);
                     }

          GOOD: List tempList;
                       foreach(Object object : objects) {
                          //do stuff to object
                          tempList.add(object);
                       }
                       dao.batchUpdate(tempList);

    This will speed up database performance at least by 5-10 times the speed. (You will notice the speed increase on a million entry update)

 

4 This is a simple one, but index columns that get searched often (One can argue to index every column, but that is not necessary, mostly are just a waste of space)

    Simple enough, when a column is queried on often, an index on it will help increase the query speed dramatically.

    Index uses a tree(hash) structure, so that the speed to search for an element in the table is usually faster than O(n). There are claims that some indexing algorithms can be as fast as O(C). But that is a whole different topic on its own.

 

5. Clustered indexing on read-only tables

    If a table is huge, but not updated often, adding a clustered index will result in a notable performance increase.

    This is the case because all the entries in your database or table is optimally stored in your harddrive for easy retrieval. (Think disk defragment, clustered index is similar to defragmenting)

 

more to come...