BI Yearly Maintenance
I look at data warehouses and data marts like cabinets where you store specific items. And like any cabinet, at least in my house, over time they get a little overstuffed, you may find things that don’t really belong, or things that you no longer want or need, maybe even a few things that have expired. So, much like I do a regular cleaning up of the contents of my cabinets, I also schedule regular maintenance of the data that my team is responsible for. We put this maintenance into two different categories the first being business logic related and the second being structural.
Business logic will most likely live in your presentation layers or in your build processes but may also lurk in your data structures; computed columns are a good example. It’s always good from a hygiene perspective to do a quick review of the business logic that impacts how your data is consumed. For us this one is really pretty straightforward, we review all of the reports that we are responsible for, see who’s been using them and how often. If they’re not being used we have the opportunity to go back to the customer and find out if something is wrong with report that we can fix or is it just not needed anymore. We also review the build process to ensure we’re not negatively impacting a data point looking for an increase in NULL values, empty columns or errors in columns.
A specific example of this, when we updated our fiscal calendar tables for 2014 a computed column was resolving to 51 weeks in the year instead of 52 – that would be a problem.
Technical issues are usually within the data storage itself and are easily found by checking the performance of jobs. I start here with some easy spreadsheets that show if jobs are running longer or if one suddenly running shorter and of course noting any regular failures. This will give you an idea where to focus first and will also help you assess the impact of any changes later. From there we take a look at indexes and the freshness of the statistics to find additional areas for improvement before moving onto my favorite maintenance task -purging data! Every year we take between six months to a year’s worth of data out of every warehouse/mart and for the most part its purged completely though we do occasionally archived them off temporarily, just in case the date is needed at a later date (rarely).
And finally I like to do a little bit of a wish list, which now ends up in my Agile backlog, of things I would like to further scrub, research or fix going forward. Things like old stored procedures that need to be reviewed, old jobs that aren’t currently scheduled, and areas where we have data redundancy or manual effort. I like to think of this from the perspective of someone new walking into the job tomorrow, would they be able to take a look at a server and understand what it was used for, if not then we have some work to do.
SQL Server makes a lot of this super easy, our Reporting Services environment automatically logs all report executions and statuses, and our warehouse jobs are all set to log to system tables for easy historical reporting – we don’t even have to trouble a DBA for help with any of this. With 15 different warehouses/mart and a medium sized report server, this all takes about a month to complete with the bulk of that in planning and reviewing and the actual changes are done via scheduled jobs off hours.