This is part four of an ongoing series on Optimizing Script Performance – part three focused on Keyword Searches.
Hello my fellow scripting friends! I hope you are ready for some more tips and tricks in the world of data analytics…so let’s get right to it.
Today, we are going back to the basics as we investigate the often-used (and often-misused) PRESORT command.
When trying to get the most out of your analytics, the cost of the PRESORT (or the less common SECSORT) command is often overlooked. While scripting in ACL Analytics, it is easy to just leave in the PRESORT command every time you SUMMARIZE, JOIN, CLASSIFY, etc. Is this the best thing to do, or are we doing this out of sheer laziness? I’m going with laziness (at least for me)…although we can mask our laziness by calling it ‘selective scripting’…yeah, that sounds better.
When I plan out a script, I usually like to think of the series of commands that I will need to achieve my goal. When writing my script, I often skip digging too far into the logic and just leave PRESORT in most of my commands (it is way easier to just leave it there than to think about it!). This sounds like a great idea; however, it can severely impact the performance of your script.
As an example, let us look at a scenario using SUMMARIZE followed by a JOIN on the same key field. If your data is already sorted on the key field, you can leave the PRESORT out of both commands. If the data has not been sorted, you can add PRESORT to the SUMMARIZE command and leave it out of the JOIN command (assuming, of course, that you are joining the summarized table, which comes out already ordered on the key, and that the other table in the join is sorted on the key as well). The PRESORT command creates a temporary sorted table in the background to work from, so every unnecessary PRESORT is an extra sort you wait for before you get your results (this is especially true when you have large data files).
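To make the reasoning concrete, here is a minimal sketch of the same idea in Python. This is an analogy only: it is not ACL syntax and it does not claim to show how ACL works internally. The function names (`summarize`, `join_sorted`), the `presort` flag, and the sample vendor data are all made up for illustration. It shows a summarize step that relies on sorted input, and a join that can then reuse that order without sorting again.

```python
from itertools import groupby
from operator import itemgetter


def summarize(rows, key, amount, presort=False):
    """Total `amount` per `key`. Relies on sorted input unless presort=True."""
    if presort:
        # The extra pass you pay for: build a sorted copy to work from.
        rows = sorted(rows, key=itemgetter(key))
    # groupby only merges ADJACENT rows with equal keys, so order matters.
    return [
        {key: k, amount: sum(r[amount] for r in group)}
        for k, group in groupby(rows, key=itemgetter(key))
    ]


def join_sorted(primary, secondary, key):
    """Merge-join two lists that are BOTH already sorted on `key`.

    Assumes keys in `secondary` are unique (a many-to-one lookup).
    """
    out = []
    sec = iter(secondary)
    current = next(sec, None)
    for row in primary:
        # Advance the secondary side until it catches up with the primary key.
        while current is not None and current[key] < row[key]:
            current = next(sec, None)
        if current is not None and current[key] == row[key]:
            out.append({**row, **current})
    return out


# Hypothetical sample data: transactions are unsorted, vendors are sorted.
transactions = [
    {"vendor": "V3", "amount": 50},
    {"vendor": "V1", "amount": 20},
    {"vendor": "V3", "amount": 25},
    {"vendor": "V1", "amount": 5},
]
vendors = [
    {"vendor": "V1", "name": "Acme"},
    {"vendor": "V3", "name": "Zenith"},
]

# Sort once, in the summarize step...
totals = summarize(transactions, "vendor", "amount", presort=True)
# ...and the join can use the summarized output as-is, with no second sort.
joined = join_sorted(totals, vendors, "vendor")

# Without the presort, unsorted input gives fragmented groups instead.
fragmented = summarize(transactions, "vendor", "amount")
```

In this sketch `totals` comes out ordered on the key, so sorting it again before the join would be wasted work, which is the same redundancy the scenario above describes. The `fragmented` result is the flip side: skipping the sort is only safe when you know the data is already in key order.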
I have to admit that I’ve often been guilty of leaving the PRESORT command in unnecessary places until a co-worker recently called me on it, hence the idea to write about the topic. I guess the lesson here is to think critically about your code and to make sure that you are not performing redundant tasks that could cause long run times on your analytics.