This case study provides an example of why systems should be proactively managed and have regular health checks.
We were asked by a large Blue-Chip Organisation to Health Check and Optimise their ERP system, which had serious performance issues. At peak times the ERP system was used by several thousand people globally entering timesheets, holidays, expenses, financial data, HR information, running reports and payrolls etc… the system was the most business-critical application in the organisation.
Not only was performance impacting Business as Usual and the End Users, it was also having a major impact on their SAN environment. The SAN was consistently seeing a very high utilisation of disk I/O of around 90% capacity. In addition, all the VMWare virtual disks were located on this one SAN, so all the other server-side applications on the SAN were running slower due to the Disk I/O load.
It was identified internally that the cause of this issue was their ERP system. Under pressure from the business to improve the situation, the IT Department was being forced towards the purchase of a second SAN (the cost of which, was six figures), solely to accommodate the ERP system.
We sent in one of our Senior DBA’s who specialises in Database Optimisation, not only did Paul sort out the ERP performance issue, but he saved them a six-figure bill for a new SAN and helped restore the kudos of the IT Department.
Paul very quickly identified the primary cause of why the ERP system was so Disk I/O intensive…
An internal application expert or perhaps an external consultant had created a very simple SQL Server stored procedure, which checked for specific types of transactions entering the system. If it found any of these transactions the stored procedure would updated the status field.
Simple stored procedures like this which provide a small amount of bespoke customisation are not uncommon in these types of systems. The stored procedure had been scheduled from within the ERP application to run every 2 minutes.
Although the stored procedure was simple in what it did it was not particularly well written. The execution time of the procedure was over 1 minute, during this period several things occurred that effected system performance: –
- The UPDATE statement put locks on the main transaction table in the database, which caused a huge amount of blocking in the system. For users who were trying to access the table at the same time, this caused their application to effectively hang for the remaining duration of the UPDATE.
- The UPDATE statement joined several tables together to determine which transactions needed updating, as such SQL Server identified this a large query that could be made more efficient by spreading the SQL statement across all the CPU’s that were in the server. This amounted to a SQL statement being processed by 8 CPU’s for the duration of the UPDATE statement. This produced a massive amount of disk I/O, all of which had to be read from the SAN and which took over 1 minute.
- The query result set of the UPDATE was so big SQL Server had to use all the memory that was available to it to process the statement. This flushed all other cached data out of memory. For any interactions with the system after the UPDATE had finished, the data had to be read from disk as opposed to being retrieved from memory.
During the 50+ second period in between the procedure finishing and then starting again, the load on the system from the application users was a lot heavier than normal due to them finding that the system was suddenly responding better.
After identifying the problem stored procedure, a plan was put in place to optimise it. The optimisation was simple enough and reduced the run time of the procedure from over 1 minute to under 1 second. It did this by reducing the huge amount of reads that the UPDATE statement did to under 100 reads.
The changes provided a significant performance improvement, the application stopped hanging, the SAN Disk I/O was reduced from 90% to 10% utilisation and the other server-side applications also experienced a significant improvement in performance.
The above case study is a good example of an unmanaged system. We have seen many similar issues where system performance has caused the system to become partly unusable and has affected XX environment and other systems. This has been across a wide range of applications including Patient Records, ERP systems, Student Records, Manufacturing and HR Systems.
Although the applications for the systems may be very different, the options and the way that SQL Server processes SQL statements from these applications remains the same. In-house developed systems and systems that have been implemented by third party software companies can all benefit from SQL Server optimisation.