When you’re working with large datasets, it’s not uncommon for a do-file to take several hours to run. Especially if you run it overnight, you might be curious how long it took. Stata puts timestamps at the opening and closing of log files, but maybe you want to know the time taken by a particular command.
I remember one of my first tasks as a research assistant was to extract a few years’ records from a tremendous file for a data request (we later broke the file up into smaller files). Believe it or not, this task was well into the realm of “things so long and resource-intensive that you run them overnight.” I was curious to know just how much of that time was spent simply loading the file into main memory, before selecting the relevant years, keeping only the requested variables, and writing the output to disk. Here’s a nice .ado file I’ve been using ever since: you type clock in front of any command, and it reports when the command started, when it finished, and how many seconds it took. I think it’s an important tool for finding out which commands are the most efficient for a given purpose, and I’m surprised Stata doesn’t include it.
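program define clock
    * Usage: clock <any Stata command>
    di "starting at " c(current_time)
    * Timers accumulate across calls, so reset timer 1 first
    timer clear 1
    timer on 1
    * Run the command that was typed after -clock-
    `0'
    timer off 1
    * -timer list- stores the elapsed seconds in r(t1)
    quietly timer list 1
    di "finished at " c(current_time)
    di "total length was " r(t1) " seconds"
end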
A fun extension I was considering would be to have it write out to a simple comma-delimited log: the current timestamp, the command run, and the time taken to execute. For those sharing a server with coworkers, as I was, it would be interesting to see how the same commands perform under different load and network conditions. For instance, I always wondered what the cost was of having two users load data from disk into memory at the same time, which might cause non-sequential disk reads with extra seeks. Users who don’t share resources might still be interested in evaluating efficiency while other programs are running.
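A minimal sketch of that logging extension might look like the following. The program name clocklog and the log file name clocklog.csv are made up for illustration, and the column layout (date, time, seconds, return code, command) is just one reasonable choice. It relies on Stata's timer and file commands, and it does not escape double quotes inside the logged command, so treat it as a starting point rather than a finished tool.

```stata
capture program drop clocklog
program define clocklog
    * Hypothetical log location; change it to suit your setup
    local logfile "clocklog.csv"

    * Time the command typed after -clocklog-
    timer clear 1
    timer on 1
    capture noisily `0'
    local rc = _rc
    timer off 1
    quietly timer list 1
    local secs = r(t1)

    * Append one line: date, time, seconds, return code, "command"
    * (embedded double quotes in the command are not escaped)
    tempname fh
    file open `fh' using "`logfile'", write text append
    file write `fh' "`c(current_date)',`c(current_time)',`secs',`rc'," _char(34) `"`0'"' _char(34) _n
    file close `fh'

    di "total length was `secs' seconds"
    * Pass any error from the timed command back to the caller
    exit `rc'
end
```

You would use it the same way as clock, for example clocklog sysuse auto, clear. Because the file is opened in append mode, each call adds one row, so several users pointing at a shared log would build up a record you could later read back into Stata and compare.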