Automating Log Checks: A Case for Innovation in Statistical Programming


September 26, 2024

Innovation in statistical programming is essential for the evolving clinical development landscape, where the demands of managing and analyzing large datasets are growing rapidly. As clinical trials become more data-intensive and timelines shorten, outdated manual processes and inefficiencies in statistical programming workflows can cause costly delays and even affect the integrity of trial results.

One of the biggest challenges faced by statistical programmers is managing the large volume of outputs (tables, graphs, and listings) generated during trials, particularly in later phases. A few years ago, I faced a related challenge that many programmers dealt with then and still struggle with today: the tedious and error-prone process of manually checking log files. To address this, I developed an automated solution that significantly improved the speed and accuracy of the process.

In this blog, I share how this simple, yet innovative solution solved the problem of manual log checks.

The Challenge

As part of Cytel’s Functional Service Provider engagement with a large biotech company, I led a team of statistical programmers working on a Phase 4 immunology trial. The project had a tight deadline, requiring us to deliver 150 to 200 outputs within a couple of days. We used a variety of macros to produce these outputs, from box plots and Kaplan-Meier (KM) plots to efficacy tables.

Before submitting a task, each log had to be manually checked for 1) errors, 2) warnings, 3) uninitialized variables, 4) MERGE statements with repeated BY values, 5) character to numeric conversions, and 6) numeric to character conversions. If any issues were found in the logs, the code had to be updated and rerun until the logs were clean.

This manual process quickly became tedious and time-consuming. Moreover, it was prone to error as there was always the risk of missing specific terms in the review cycle. Recognizing the inefficiency of this approach, I began brainstorming a more effective solution, ultimately creating an automated log-checking process.

 

The Solution

After a couple of weeks of trial and error, I wrote code that automated the log-checking process. It was designed to scan the logs for:

  • Errors
  • Warnings
  • Uninitialized variables
  • MERGE statements in which more than one data set has repeats of BY values
  • Character values that have been converted to numeric values
  • Numeric values that have been converted to character values
  • Missing values that were generated as a result of performing an operation on missing values

 

The syntax is as follows:

options noxwait;

X 'findstr /i "error warning repeats" input-path\progname.log > output-path\log_check.txt';

X 'output-path\log_check.txt';

 

First Command:

options noxwait;

This command makes the Command Prompt window close automatically without your having to type EXIT when the process is finished.

Second Command:

X 'findstr /i "error warning repeats" input-path\progname.log > output-path\log_check.txt';

This command searches the log file at input-path for lines containing any of the words in double quotes and writes the matching lines to the text file at output-path.

X: Lets you submit operating system commands (here, Windows commands) without ending the SAS session

FINDSTR (short for “find string”): Searches files for lines that contain the specified text

/i: Specifies that the search is not case-sensitive

"error warning repeats": Words to search for in the log files. Words separated by spaces are treated as alternatives, so a line matches if it contains any one of them

input-path\progname.log: Input path and name of the log. You can use a wildcard such as *.log to check all log files, or abc_xyz_*.log to check similar logs with different subtypes.

> output-path\log_check.txt: Redirects the output of the program to output-path\log_check.txt
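As an illustration of the wildcard form, the sketch below scans every log in a folder in one call. It keeps the same placeholder paths as above and adds FINDSTR's /n switch, which is not part of the original command, so that each match carries its line number in the log.

```sas
/* Sketch only: input-path and output-path are placeholders, as above. */
options noxwait;

/* /i ignores case; /n prefixes each matching line with its line       */
/* number in the log. *.log makes FINDSTR search every log in the      */
/* input folder instead of a single named log.                         */
x 'findstr /i /n "error warning repeats" input-path\*.log > output-path\log_check.txt';
```

When more than one file is searched, FINDSTR prefixes each matching line with the name of the file it came from, so the result file shows which program each message belongs to.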

Third Command:

X 'output-path\log_check.txt';

This command opens the text file at output-path, which lists the log lines containing any of the words in double quotes.

If the log check file is empty, no issues were detected and no further action is needed. If issues are found, the file lists the program name followed by each line from the logs that contains any of the words given in the second command. Once the issues are resolved, the batch programs are rerun, and the process is repeated until the log check file is empty.
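The three search words in the second command match errors, warnings, and the MERGE note about repeats of BY values. One possible way to also flag the other log messages listed under The Solution is to pass each phrase with FINDSTR's /c: switch, which searches for a literal string that may contain spaces. The phrases below are a sketch based on the usual wording of SAS log notes. They are an assumption, not the author's original command, and should be adjusted to the messages your SAS version writes.

```sas
/* Sketch only: the search phrases are assumed wordings of SAS log     */
/* notes; input-path and output-path are placeholders, as above.       */
options noxwait;

/* Each /c:"..." is one literal search phrase; a line matches if it    */
/* contains any of them. /i keeps the search case-insensitive.         */
x 'findstr /i /c:"error" /c:"warning" /c:"uninitialized" /c:"repeats of BY values" /c:"have been converted to" /c:"Missing values were generated" input-path\progname.log > output-path\log_check.txt';
```

Short words such as error also match inside longer tokens, which keeps the check conservative but can flag lines that turn out to be harmless.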

 

Outcome

Despite the effort involved in manually verifying the log files, we were never sure that we had caught everything. Once the three commands were integrated into our batch files, we no longer needed to spend time checking logs by hand. The code is clean and concise, built on a single Windows command, and far simpler than the more complex SAS solutions that achieve similar objectives. It can be used as a macro or as a snippet in a program, offering flexibility for different workflows. By automating the process, we not only saved time but also improved the accuracy and reliability of our outputs. With this automated process in place, we consistently saw clean log check files, boosting our confidence in the final results.
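One way the commands might be wrapped as a macro is sketched below. The macro name logcheck, its parameters, and the macro variable n_flagged are hypothetical and chosen only for illustration. Instead of opening the result file, this version counts its lines so that a batch run can report the outcome in the SAS log.

```sas
/* Hypothetical macro: logcheck, logs=, out=, terms= and n_flagged are  */
/* illustrative names, not part of the original solution.               */
%macro logcheck(logs=, out=, terms=error warning repeats);
    /* noxwait closes the command window; xsync makes SAS wait for     */
    /* FINDSTR to finish before the next step reads the result file.   */
    options noxwait xsync;

    /* Same FINDSTR call as above, with paths and terms parameterized. */
    x "findstr /i ""&terms"" &logs > &out";

    /* Count the lines in the result file. For an empty file the       */
    /* INPUT statement reaches end of file at once, the step stops,    */
    /* and the count stays at 0.                                       */
    %global n_flagged;
    %let n_flagged = 0;
    data _null_;
        infile "&out" end=last;
        input;
        if last then call symputx('n_flagged', _n_);
    run;

    %put NOTE: Log check wrote &n_flagged line(s) to &out..;
%mend logcheck;

/* Example call with the placeholder paths used in this post. */
%logcheck(logs=input-path\*.log, out=output-path\log_check.txt);
```

A count of 0 corresponds to the empty log check file described above; any other value means the result file should be reviewed.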

 

Final Takeaways

With the growing emphasis on innovation, it is crucial to implement practical, user-friendly solutions that meet the evolving needs of clinical trial data analysis. While more sophisticated tools and SAS platforms with built-in macros have emerged in recent years, many of them are overly complex. My code offers a simple alternative that can be used within macros to automate log checks. It’s a flexible, adaptable solution that continues to benefit statistical programmers today.


Dr. Sharayu More-Deshmukh

Principal Statistical Programmer

Dr. Sharayu More-Deshmukh is Principal Statistical Programmer at Cytel. She has over 10 years of experience, primarily in Oncology and Immunology. Prior to joining Cytel, she was a Homeopathic practitioner, before later deciding to turn to Bioinformatics, leading to her career in SAS programming.
