Nuts and Bolts: Code Analysis. Lead image © Andrey KOTKO, 123RF.com

Static code analysis finds avoidable errors

At the Source

Static code analysis tools like JSLint, Splint, RATS, and Coverity help you find code vulnerabilities. By Tobias Eggendorfer

Buffer overflow attacks have been understood since 1972, and yet buffer overflows still dominate the warning lists of security analysts today. In this article, I make a plea for compliance with coding standards, more and better source code reviews, and the use of good tools for static analysis that can improve quality and security.

Lack of training and inadequate quality assurance and checks are the root causes of breakdown-prone software. As early as 1972, scientists described the first, albeit theoretical, buffer overflow attack [1], and SQL injections [2] have been around since 1998. Both still account for the majority of IT security vulnerabilities today, and both can be easily avoided: buffer overflows with n functions (e.g., strncpy() instead of strcpy()) and SQL injections with prepared statements. But who makes sure developers adhere to these practices?

Although coding standards that prescribe secure functions would help avoid repeating these classic errors, hardly any company or project lays down mandatory rules, let alone monitors compliance with them. As a result, very few developers align their work with security goals.

Brackets and Indentation

Having to check manually whether a fellow programmer has, for example, correctly indented their curly braces is unlikely to make code review a more attractive task. Tools can and should do this kind of no-brainer work, but they have to be flexible.

Opinions differ, even in terms of the visual arrangement of curly braces and their indentation depth. Although I can never completely shake off my Pascal past and always start a block in a new line with a bracket (Pascal begin style), many view the loop header or the condition as sufficient and place the opening bracket at the end of the line. The pseudocode in Listing 1 illustrates the differences.

Listing 1: Two Ways to Indent Code

01 if (x == y) then
02   {
03      do something;
04   }
05
06 while ( x < y) {
07   x = x+1;
08   }

The indentation depth in the source code is equally a matter of interpretation. Two spaces? A tab? What is the maximum number of spaces equivalent to a tab?

Although a supporting tool can clean up indentation, will it enforce the ban on strcpy() and other functions that are tagged as deprecated? A comprehensive set of C coding standards for the development of embedded software is available online [3].
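One way to enforce such a ban at the compiler level is sketched below. It is not taken from the coding standard cited above and relies on the poison pragma offered by GCC (Clang accepts the same pragma), which turns any later use of the listed identifiers into a compile-time error. The helper copy_string() is a hypothetical replacement that I made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* After this line, every use of strcpy or strcat in this file fails to
 * compile. The pragma has to follow the system headers, which may
 * legitimately mention these names. */
#pragma GCC poison strcpy strcat

/* Hypothetical replacement: bounded copy that always terminates dst.
 * Returns 0 on success and -1 if src had to be truncated. */
static int copy_string(char *dst, size_t dstsize, const char *src)
{
    assert(dst != NULL && src != NULL && dstsize > 0);
    size_t len = strlen(src);
    size_t n = (len < dstsize) ? len : dstsize - 1;
    memcpy(dst, src, n);
    dst[n] = '\0';
    return (len < dstsize) ? 0 : -1;
}
```

A project could put the pragma into a header that every source file includes, so the rule is checked on every build rather than in a review.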

Some development environments, such as Eclipse or Vim, support full or partial compliance with standards out of the box, as do independent programs such as Uncrustify [4].

School of Life

Even a visual cleanup can help you identify errors in control structures. I learned this early on in school when a learning program written by my teacher suddenly started to behave in a peculiar way. When I pointed this out, she handed me the source code with the homework assignment of removing any potential bugs.

The code was a mess: sometimes indented, sometimes not. After running a code beautifier against it, it turned out that the block ends were chaotic and random. To keep the compiler from complaining about not having enough block ends, the teacher had simply moved them elsewhere. As this anecdote shows, errors can be avoided simply by adopting visually clean programming techniques.

JSLint: Bane of JavaScript Authors

Code analysis tools go one step beyond a clean approach. For JavaScript, for example, JSLint [5] offers an online check, which you can see in Figure 1 testing the quadrat.js script I borrowed from an online tutorial [6]. You can curb the far too detailed criticism by telling JSLint to limit itself to one browser. Nevertheless, in some cases, it is simply too harsh. For example, it likes to criticize the handling of quotation marks, although generally accepted references [7] take a far more relaxed view of this issue.

Figure 1: JSLint complains about a sample program (CC BY-SA 3.0 [8]; German translated to English).

Splint for C Programmers

Like JSLint, Splint [9] for C offers thorough code analysis. Listing 2 shows a deliberately broken sample program. The potential buffer overflow in line 8 and the format string vulnerability in line 9 are even noticeable in a quick manual review. However, GCC compiles the source code and creates an executable program without putting up much of a fight. Figure 2 confirms how broken the program is.

Listing 2: Don't Do This at Home!

01  #include <strings.h>
02  #include <stdio.h>
03
04  #define BUFSIZE 10
05  int main(int argc, char * argv [])
06    {
07      char buffer [BUFSIZE];
08      strcpy(buffer, argv[1]);
09      printf(buffer);
10    } // end main()

Figure 2: Disaster is imminent with this sample program.

Before Splint can show whether it can identify the bugs, you need to go through classic steps of downloading, unpacking, compiling, and perhaps installing:

tar -xzf splint-3.1.2.src.tgz
cd splint-3.1.2
./configure
make
make install # if so desired

After winning this battle, Splint needs to know where it can find the header file details. If you omit the install step, they will be in the lib subdirectory, which results in the possible path:

export LARCH_PATH=/home/<user>/code/splint-3.1.2/lib/

Some distributions also keep the current Splint binaries in their repositories.

Now, you're ready to go:

/bin/splint -strict example1.c

Listing 3 shows the output and thus the full extent of the digital horror story. Splint uncompromisingly reveals any sloppiness; in fact, it outputs seven warnings, which is far more than gcc -Wall (i.e., output all warnings) shows in Listing 4.

Listing 3: Splint Warnings

splint -strict example1.c
Splint 3.1.2 --- 11 May 2019
example1.c: (in function main)
example1.c:9:5: Format string parameter to printf is not a compile-time
                   constant: buffer
  Format parameter is not known at compile-time.  This can lead to security
  vulnerabilities because the arguments cannot be type checked. (Use
  -formatconst to inhibit warning)
example1.c:9:5: Called procedure printf may access file system state, but
                   globals list does not include globals fileSystem
  A called function uses internal state, but the globals list for the function
  being checked does not include internalState (Use -internalglobs to inhibit
  warning)
example1.c:9:5: Undocumented modification of file system state possible from
                   call to printf: printf(buffer)
  report undocumented file system modifications (applies to unspecified
  functions if modnomods is set) (Use -modfilesys to inhibit warning)
example1.c:10:4: Path with no return in function declared to return int
  There is a path through a function declared to return a value on which there
  is no return statement. This means the execution may fall through without
  returning a meaningful result to the caller. (Use -noret to inhibit warning)
example1.c:8:5: Possible out-of-bounds store: strcpy(buffer, argv[1])
    Unable to resolve constraint:
    requires maxRead(argv[1] @ example1.c:8:20) <= 9
     needed to satisfy precondition:
    requires maxSet(buffer @ example1.c:8:12) >= maxRead(argv[1] @
    example1.c:8:20)
     derived from strcpy precondition: requires maxSet(<parameter 1>) >=
    maxRead(<parameter 2>)
  A memory write may write to an address beyond the allocated buffer. (Use
  -boundswrite to inhibit warning)
example1.c:8:20: Possible out-of-bounds read: argv[1]
    Unable to resolve constraint:
    requires maxRead(argv @ example1.c:8:20) >= 1
     needed to satisfy precondition:
    requires maxRead(argv @ example1.c:8:20) >= 1
  A memory read references memory beyond the allocated storage. (Use
  -boundsread to inhibit warning)
example1.c:5:14: Parameter argc not used
  A function parameter is not used in the body of the function. If the argument
  is needed for type compatibility or future plans, use /*@unused@*/ in the
  argument declaration. (Use -paramuse to inhibit warning)
Finished checking --- 7 code warnings

Listing 4: GCC Warnings

gcc -Wall example1.c
example1.c:9:12: warning: format string is not a string literal (potentially insecure) [-Wformat-security]
    printf(buffer);
           ^~~~~~
example1.c:9:12: note: treat the string as an argument to avoid this
    printf(buffer);
           ^
           "%s",
1 warning generated.

Bug Report

The first warning refers to the potential format string vulnerability in line 9 of Listing 2. This oversight is not difficult for a QA software tool to detect: printf() expects a format string as its first parameter, followed by the data to be formatted, so passing a single user-controlled parameter as the format string is dangerous. An attacker could simply pass format specifiers to printf(), as shown in Figure 2. printf() then looks on the stack for arguments that were never passed and reads out whatever it finds there. With a few tricks (e.g., the %n conversion, which writes to memory), an attacker could also overwrite the stack content.

The solution is simple: printf() always needs at least one fixed format string and, if applicable, the strings to be used there. The correct version of line 9 is thus:

printf("%s", buffer);
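To see why the constant format string matters, consider the following minimal sketch. The helper render_untrusted() is a hypothetical name of my own; it uses snprintf() instead of printf() only so that the result can be inspected in a buffer. Conversion specifiers in the untrusted text are copied literally, because they arrive as data rather than as the format.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: renders untrusted text into out through a
 * constant format string, so specifiers such as %x inside text are
 * copied literally instead of being interpreted.
 * Returns the value reported by snprintf(). */
static int render_untrusted(char *out, size_t outsize, const char *text)
{
    assert(out != NULL && text != NULL && outsize > 0);
    return snprintf(out, outsize, "%s", text);
}
```

Calling render_untrusted() with the input "%x %x %n" leaves exactly those eight characters in the output buffer; nothing is read from or written to the stack on the attacker's behalf.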

The next warning worth a closer look points to untidy style: main() is declared with an integer return value, but the control flow contains no return statement. If main() does not explicitly return a value, it can lead to unpleasant surprises.

Splint then warns about the buffer overflow. It cannot find anything in the code that prevents a write beyond the boundaries of the buffer array. Although Splint describes the problem in a fairly roundabout way, it is pointing to the potential overflow. The solution is strncpy(), provided the length limit leaves room for the terminating null byte. By the way, this function was added to the ISO C standard in 1990 and is definitely mature enough by now for programmers to use it by default.

In the same line, Splint detects another error: a possible out-of-bounds read. What happens if a call is made without any further parameters? Instead of checking argc first, the program simply assumes there is a parameter. If you try this, you run into another segmentation fault. The last warning also indicates this error: argc was declared as a function parameter, but never used.
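The missing check can be sketched in a few lines. The helper name first_argument() is hypothetical; the point is merely that argv[0] holds the program name, so a first real argument only exists if argc is at least 2.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: returns argv[1] only if the caller really
 * passed an argument, otherwise NULL. argv[0] is the program name,
 * so a first real argument exists only when argc is at least 2. */
static const char *first_argument(int argc, char *argv[])
{
    assert(argv != NULL);
    if (argc < 2) {
        return NULL;
    }
    return argv[1];
}
```

A caller then tests the result against NULL before copying or printing anything, which also means argc is actually used.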

If you regularly run Splint against your programs, it will quickly discover small careless errors and warn you of major problems. This approach should be standard, because this type of quality assurance effectively prevents vulnerabilities.

Teaching Splint

Splint might warn you about something, because it lacks the information to assess your code correctly. However, you can fix such warnings by adding comments to the source code. A corrected version of Listing 2 is shown in Listing 5 with control information that keeps Splint from outputting some warnings.

Listing 5: Splint Control Info

01  #include <string.h>
02  #include <stdio.h>
03
04  #define BUFSIZE 10
05  int main(int argc, char * argv [])
06    /*@requires maxRead(argv) >= 1 @*/
07    /*@-modfilesys@*/
08    {
09      /*@-initallelements@*/
10      char buffer [BUFSIZE] = {'\0'};
11      /*@+initallelements@*/
12      if ( ( argc > 1 )  && (argv != NULL) )
13        {
14          strncpy(buffer, argv[1], (size_t) (BUFSIZE-1));
15          printf("%s\n",buffer);
16        }
17      return 0;
18    } // end main()

The hints for Splint are bundled in C comments that start and end with at signs (@; e.g., see line 6). Here, the programmer promises that at least argv[1] exists, which puts paid to the out-of-bounds read warning. Line 12 does check for argc, as promised.

Line 7 assures Splint that no filesystem operations are planned, thus alleviating its worries about the printf() in line 15, which has now also been enriched by a proper format string. Line 10 is shorthand that fills the entire array with ASCII 0. Splint does not understand this syntax and would complain that not all elements of the array are filled, but line 9 disables this warning. To make sure Splint continues to examine the rest of the code, line 11 switches monitoring back on again. A plus or minus before the value does the trick.

In line 14, an explicit type conversion has been added – otherwise Splint would complain about the wrong data type. By the way, a classic programming error is taken into account here: C terminates strings with a binary zero, which consumes 1 byte in the buffer; that is, a maximum of nine readable characters fit into a 10-character buffer.
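The nine-plus-one arithmetic can be written down as a small idiom. The following is a sketch rather than part of the listing; the function name copy_to_buffer() is an assumption of mine. It copies at most BUFSIZE-1 characters and then sets the terminator explicitly, because strncpy() does not append one if the source is too long.

```c
#include <assert.h>
#include <string.h>

#define BUFSIZE 10

/* Copies at most BUFSIZE-1 characters and then sets the terminator
 * explicitly; strncpy() does not add one when src is too long. */
static void copy_to_buffer(char buffer[BUFSIZE], const char *src)
{
    assert(buffer != NULL && src != NULL);
    strncpy(buffer, src, (size_t) (BUFSIZE - 1));
    buffer[BUFSIZE - 1] = '\0';
}
```

With a 16-character source string, the buffer ends up holding the first nine characters followed by the null byte.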

If you want to try this out, have a look at Listing 6, which includes all the required Splint instructions but still provokes an explicit warning (Figure 3).

Listing 6: Too Many Characters

01 #include <stdio.h>
02 #include <string.h>
03 #define BUFSIZE 10
04
05 int main(/*@unused@*/ int argc,
06          /*@unused@*/ char * argv []
07         )
08   /*@-modfilesys@*/
09   {
10     /*@unused@*/
11     char pwd[18] = "It is top secret!";
12     /*@-initallelements@*/
13     char target [BUFSIZE] = { '\0' };
14     /*@+initallelements@*/
15     char source [17] = "0123456789ABCDEF";
16
17     printf("source: %s\ntarget: %s\n",source,target);
18     strncpy(target, source, (size_t) BUFSIZE );
19     printf("source: %s\ntarget: %s\n",source,target);
20     return 0;
21   }

Figure 3: Splint warns the programmer of a likely out-of-bounds error.
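The problem in line 18 of Listing 6 is that strncpy() is allowed to fill all BUFSIZE bytes of target, so no room remains for the terminating null byte, and the printf() in line 19 reads past the end of the array. A minimal sketch of how to test a buffer for this condition follows; is_terminated() is a hypothetical helper name, not a Splint or libc function.

```c
#include <assert.h>
#include <string.h>

/* Returns 1 if buf contains a '\0' within its first size bytes and
 * 0 otherwise. memchr() never looks beyond size bytes. */
static int is_terminated(const char *buf, size_t size)
{
    assert(buf != NULL);
    return memchr(buf, '\0', size) != NULL;
}
```

A check like this is a debugging aid at best; the cleaner fix is to limit the copy to BUFSIZE-1 characters.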

It's Worth It

Adding Splint comments to source code sounds unproductive, but remember that Splint only criticizes what it deems to be unusual – that is, what a third party reading the source code would also stumble over. By the way, after a few months, even the program's author practically becomes a third party. Moreover, the Splint comments make you aware of where the pitfalls lie. Once seen and documented, they can also be avoided.

The comments are reminiscent of Hoare logic, a formal program verification method. The developer defines a precondition, a computational step, and a postcondition. The postcondition must be derivable from the precondition by way of the computational step. If you want to read about this in more detail, you can find the original article online [10].
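As a rough illustration of the idea, the following sketch writes such a triple as runtime assertions. This is an approximation on my part: Hoare logic proves the conditions statically, whereas assert() only checks them for the values that actually occur. The function increment_below() is a made-up example.

```c
#include <assert.h>

/* Hoare-style triple, written as runtime checks:
 *   { 0 <= x && x < limit }   y = x + 1   { 0 < y && y <= limit }  */
static int increment_below(int x, int limit)
{
    assert(0 <= x && x < limit);   /* precondition */
    int y = x + 1;                 /* computational step */
    assert(0 < y && y <= limit);   /* postcondition */
    return y;
}
```

Because the precondition guarantees x < limit, the addition cannot overflow, and the postcondition follows directly from the computational step.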

This very formal approach has limited suitability for many application programs, but it is a good school of thought, because it forces you not "simply" to convert the individual program steps into code form, but really to rethink them.

RATS Catcher

If you find Splint too finicky and detailed, try domesticating RATS (rough auditing tool for security) [11]. Instead of concrete warnings, the tool gives more general hints on critical code locations. You can install RATS with the classic five-step sequence:

unzip rough-auditing-tool-for-security-master.zip
cd rough-auditing-tool-for-security-master
./configure
make
make install # if so desired

Listing 7 shows the RATS output at warning level 3 (-w3; compare with Listing 3 for Splint -strict reporting). If the tool only resides in the current directory and has not been installed, the path to rats-c.xml, which describes typical C errors, must be passed in with the -d option.

Listing 7: RATS Warnings

rats -w3 example1.c
Entries in c database: 310
Analyzing example1.c
example1.c:7: High: fixed size local buffer
Extra care should be taken to ensure that character arrays that are allocated on the stack are used safely. They are prime targets for buffer overflow attacks.
example1.c:8: High: strcpy
Check to be sure that argument 2 passed to this function call will not copy more data than can be handled, resulting in a buffer overflow.
example1.c:9: High: printf
Check to be sure that the non-constant format string passed as argument 1 to this function call does not come from an untrusted source that could have added formatting characters that the code is not prepared to handle.
Total lines analyzed: 11
Total time 0.000192 seconds
57291 lines per second

RATS messages help developers find the critical points in the code, and the explanations are quite understandable, especially with regard to security problems. However, Splint additionally points out further problems in the source code, beyond known security risks. Splint's checks are also more substantial. In comparison, the corrected program version in Listing 5 provokes two warnings from RATS, both false positives, triggered by the "fixed size local buffer" and strncpy() detection features.

Where false positives raise their heads, the risk of overlooking genuine problems becomes more real. The resulting impression of having done everything possible in terms of security is fatal, of course. On the positive side, however, RATS can rightly claim that it is able to check code in many programming languages.

Commercial Coverity

Coverity is not free software, but it is well known in the open source scene for its detection capabilities. Coverity [12] is a commercial tool for static code analysis of C, C++, C#, and Java programs. It comes in a local variant and as a cloud service. The extent to which the cloud service is compatible with your own corporate policy and legal requirements such as GDPR is something you have to decide for yourself.

If you manage an open source project that is recognized by the manufacturer, you can use the cloud branch [13] to perform code analysis free of charge. The service is used by the Linux kernel project, and since this testing began, the code quality has increased significantly, as measured by the number of findings.

Virtue out of Necessity

If you want to get used to a thorough and clean programming style, going with Splint is undoubtedly a good idea – you will be in good company. Developers who also want to investigate every false positive thoroughly will find RATS a helpful companion.

In all cases, the results are important: enforcing quality assurance; rethinking and relearning from the constant, unyielding criticism of the check tools; and ensuring low-security-risk software. OpenBSD shows that static code analysis, reviews, and coding standards can make secure programming a reality, as evidenced by just two remotely exploitable security vulnerabilities in the default install in 20 years.