C String Functions

ECE 8473, Fall 2022




1. C character strings and string functions

In C, character strings are just arrays of characters, with a terminating zero byte marking the end of the string.

This implies that you cannot usefully put a zero byte in the middle of a string: you can store one there, but the standard library functions will treat the first zero byte as the end of the string.

Example:
        char s[80] = { 'a', 'b', 'c', '\0', 'd', 'e', 'f', '\0' };
or
        char s[80] = "abc\0def";
strlen(s) will return 3.

There is not much you can do with strings in C without using the standard library string functions.

For example, to change the value of the string s above you cannot write:

        s = "xyz"; /* illegal */
because C does not copy arrays by assignment (except, of course, in the initialization part of a declaration).

But you could do either:

        s[0] = 'x';  s[1] = 'y';  s[2] = 'z';  s[3] = '\0';
or
        strcpy( s, "xyz");
Note that the illegal statement above would be legal if the left-hand side were a pointer:
        char *p;

        p = "xyz";      /* OK */
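because, in this case, the array of characters "xyz" is not copied; only the address of that array is assigned to p. The characters of a string literal must not be modified (doing so is undefined behavior), so such a pointer is better declared as `const char *p'.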
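A minimal sketch of the two cases above, wrapped in a hypothetical function set_both (not a library function) so it can be compiled: the caller's array receives its own copy of the characters, while the pointer only receives an address.

```c
#include <string.h>

/* set_both: hypothetical demo function.
   s must point to an array with room for at least 4 characters. */
void set_both(char s[], const char **p)
{
    strcpy(s, "xyz");   /* copies the 4 bytes 'x', 'y', 'z', '\0' into s */
    *p = "xyz";         /* copies only the address of the string literal */
}
```

Afterwards s and *p both hold "xyz", but they are different arrays: changing s[0] does not affect the characters *p points to.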


2. string length

The length of a string is the number of characters in it, not including the terminating zero byte.
        strlen( "hi")  is  2
        strlen( "")    is  0
It's easy to compute a string length. Given an array of characters s containing a string:
        int len = 0;

        while( s[len] != '\0')
            ++len;
If `len' is the length of a string s, then s[0]...s[len-1] are the characters in the string, and s[len] is the terminating zero byte.

Putting the above into a function:

        int strlen( char s[])
        {
            int len = 0;
            while( s[len] != '\0')
                ++len;
            return len;
        }
The standard C library definition of strlen is similar, but since the length of a string cannot be negative, it returns `size_t', the unsigned integer type of the value yielded by sizeof, instead of int.

Also, since strlen() does not modify the string contents, the string argument can be declared as an array of constant characters:

        size_t strlen( const char s[])
The `const' in the declaration tells the user that the function does not modify the string contents, and it tells the compiler to generate an error message if the function definition attempts to modify the string contents.

Since C passes only an array's starting address to a function, never the array itself, array parameters are usually declared using address (pointer) notation; the two declarations are equivalent, so it makes no difference in writing the function:

        size_t strlen(const char *s)
        {
            size_t len = 0;
            while( s[len] != '\0')
                ++len;
            return len;
        }

3. string.h functions

The first part of each string.h function description below was mostly copied from K&R:

The C Programming Language, Second Edition, by Brian W. Kernighan and Dennis M. Ritchie

3.1. strchr

char *strchr( const char *s, int c);
strchr returns a pointer to the first occurrence of c in s, or NULL if not present.

The terminating '\0' of s is considered to be part of the string.

For example, to replace all the '/' characters in a string str with '\\':

        char *p;

        while( (p = strchr( str, '/')) != NULL)
            *p = '\\';
We could improve the efficiency of the above by having strchr start its search for the next match at the position of the current match, i.e.
        char *p = str;

        while( (p = strchr( p, '/')) != NULL)
            *p = '\\';
Since strchr considers the terminating '\0' to be part of the string, it can be used to compute the string length:
        len = strchr( str, '\0') - str;
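A minimal sketch of how strchr might be written, using the hypothetical name my_strchr to avoid clashing with the library function. Testing for c before testing for the end of the string is what makes the terminating '\0' count as part of the string, which is why the length computation above works.

```c
#include <stddef.h>   /* NULL */

/* my_strchr: hypothetical name for a sketch of strchr. */
char *my_strchr(const char *s, int c)
{
    const char ch = (char) c;       /* strchr looks for c converted to char */

    for (;;) {
        if (*s == ch)               /* checked first, so '\0' itself can be found */
            return (char *) s;      /* strchr's signature drops the const */
        if (*s == '\0')             /* end of string: c is not present */
            return NULL;
        ++s;
    }
}
```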

3.2. strrchr

char *strrchr( const char *s, int c);
strrchr returns a pointer to the last occurrence of c in s, or NULL if not present.

The terminating '\0' of s is considered to be part of the string.

strrchr is like strchr, except it starts its search for the character at the end of the string and goes through the string in reverse order.

For example, if we have a path name like "/user/perry/osp/a1/main.c" and want to get a pointer to the filename part, "main.c", we could use strrchr to search for the last slash:

        char *p = strrchr( path, '/');
        char *q = (p == NULL) ? path : p+1;     /* pointer to filename part */
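A sketch of strrchr following the description above (start at the end and search in reverse order); my_strrchr is a hypothetical name. Starting at the terminating '\0' rather than at the last character keeps the treatment of '\0' consistent with strchr.

```c
#include <stddef.h>   /* NULL */
#include <string.h>   /* strlen */

/* my_strrchr: hypothetical name for a sketch of strrchr. */
char *my_strrchr(const char *s, int c)
{
    const char ch = (char) c;
    const char *p = s + strlen(s);  /* start at the terminating '\0' */

    for (;;) {
        if (*p == ch)               /* first match from the end is the last occurrence */
            return (char *) p;
        if (p == s)                 /* first character checked: c not present */
            return NULL;
        --p;
    }
}
```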

3.3. strstr

char *strstr( const char *s1, const char *s2);
strstr returns a pointer to the first occurrence of string s2 in s1, or NULL if not present.

strstr searches a string for a substring.

For example, to search stdin for lines containing the phrase "beef stew":

        #include <stdio.h>
        #include <string.h>
        char line[BUFSIZ];
        ...
        while( fgets( line, BUFSIZ, stdin) != NULL)     /* read one line */
            if( strstr( line, "beef stew") != NULL)     /* check for match */
                fputs( line, stdout);
strstr can be written using strlen and strncmp.
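A minimal sketch of that approach, under the hypothetical name my_strstr: compare n = strlen(s2) characters with strncmp at each position of s1. This is the simple quadratic method; library versions may use faster search algorithms.

```c
#include <stddef.h>   /* NULL, size_t */
#include <string.h>   /* strlen, strncmp */

/* my_strstr: hypothetical name for a sketch of strstr. */
char *my_strstr(const char *s1, const char *s2)
{
    size_t n = strlen(s2);

    if (n == 0)
        return (char *) s1;         /* an empty s2 matches at the start of s1 */
    for (; *s1 != '\0'; ++s1)
        /* near the end of s1, strncmp stops at s1's '\0' (a mismatch),
           so it never reads past the end of s1 */
        if (strncmp(s1, s2, n) == 0)
            return (char *) s1;
    return NULL;
}
```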

3.4. strcmp

int strcmp( const char *s1, const char *s2);
strcmp compares string s1 to s2 and returns <0 if s1<s2, 0 if s1==s2, or >0 if s1>s2.

Note: the characters are compared, not the pointers.

For example, to search stdin for lines containing only "beef stew\n":

        while( fgets( line, BUFSIZ, stdin) != NULL)     /* read one line */
            if( strcmp( line, "beef stew\n") == 0)      /* check for match */
                fputs( line, stdout);
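A sketch of strcmp under the hypothetical name my_strcmp. The C standard specifies that the characters are compared as unsigned char, so the pointers are converted first; the result is the difference of the first pair of characters that differ, which has the required sign.

```c
/* my_strcmp: hypothetical name for a sketch of strcmp. */
int my_strcmp(const char *s1, const char *s2)
{
    const unsigned char *p1 = (const unsigned char *) s1;
    const unsigned char *p2 = (const unsigned char *) s2;

    while (*p1 != '\0' && *p1 == *p2) {   /* skip the common prefix */
        ++p1;
        ++p2;
    }
    return *p1 - *p2;   /* <0, 0 or >0; also handles one string ending first */
}
```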

3.5. strcpy

char *strcpy( char * restrict s1, const char * restrict s2);
strcpy copies string s2 to string s1, including '\0', and returns s1. The array s1 must be large enough to hold s2 and its terminating zero byte; strcpy does not check.
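A sketch of strcpy in the pointer-loop style of K&R, under the hypothetical name my_strcpy. The assignment inside the loop test copies each character, including the final '\0', before checking whether it was the end.

```c
/* my_strcpy: hypothetical name for a sketch of strcpy. */
char *my_strcpy(char *s1, const char *s2)
{
    char *d = s1;

    while ((*d++ = *s2++) != '\0')  /* copy a character; stop after copying '\0' */
        ;
    return s1;                      /* strcpy returns its first argument */
}
```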

3.6. strcat

char *strcat( char * restrict s1, const char * restrict s2);
strcat concatenates string s2 to the end of string s1, and returns s1.

strcat can be written using strlen and strcpy:

        strcpy( &s1[strlen(s1)], s2);
Since strcat and strcpy return their first argument, the function return value can be used to simplify the writing of certain code.

For example, say we have a PC directory name and a file name, and we want to construct a complete path name:

        char dir[80] = "C:\\OSP\\A3";
        char fname[80] = "X.C";
        char path[80];
We want to set path to "C:\\OSP\\A3\\X.C" which requires copying the directory name, appending a backslash, then appending the filename:
        strcpy( path, dir);
        strcat( path, "\\");
        strcat( path, fname);
This can be done in one statement, using the strcpy and strcat return values as arguments to strcat:
        strcat( strcat( strcpy( path, dir), "\\"), fname);
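The one-statement version trusts path to be large enough. A sketch of a hypothetical helper, join_path, that checks the length first and then uses the same nested strcpy/strcat call:

```c
#include <stddef.h>   /* size_t, NULL */
#include <string.h>   /* strlen, strcpy, strcat */

/* join_path: hypothetical helper.  Builds dir + "\\" + fname in path,
   an array of pathsize characters.  Returns path, or NULL (leaving path
   unchanged) if the result and its '\0' would not fit. */
char *join_path(char *path, size_t pathsize, const char *dir, const char *fname)
{
    if (strlen(dir) + 1 + strlen(fname) + 1 > pathsize)
        return NULL;
    return strcat(strcat(strcpy(path, dir), "\\"), fname);
}
```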

3.7. strncpy

char *strncpy( char * restrict s1, const char * restrict s2, size_t n);
strncpy copies at most n characters of string s2 to s1, and returns s1.

strncpy pads s1 with '\0's if s2 has fewer than n characters.

If s2 contains n or more characters, a '\0' is not copied to s1.

For example, to replace "abc" in a string str with "xyz":

        char *p = strstr( str, "abc");
        if( p != NULL)
            strncpy( p, "xyz", 3);
Another way of describing strncpy is that it always writes exactly n characters into s1, padding with '\0's if s2 contains fewer than n characters.

strncpy is not a direct safe replacement for strcpy.

For example, to prevent this buffer write overflow:

        char b[10], s[100];
        ... /* now suppose strlen(s) is 50 */
        strcpy( b, s); /* overflows b */
we can't just replace strcpy with strncpy:
        strncpy( b, s, 10); /* can not overflow b */
because now b does not contain a terminating zero byte, so string access to b will go past the end of the array, i.e. a buffer read overflow.

The correct way to use strncpy in this case is:

        strncpy( b, s, 10); /* can not overflow b */
        b[9] = 0; /* terminate b */
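The strncpy-then-terminate idiom can be packaged as a small function; copy_string below is a hypothetical helper, not a standard one. It silently truncates long strings, which may or may not be acceptable to the caller, and strncpy still writes all size characters, padding with '\0's.

```c
#include <stddef.h>   /* size_t */
#include <string.h>   /* strncpy */

/* copy_string: hypothetical helper.  dst is an array of size characters
   (size must be > 0).  Copies at most size-1 characters of src and
   always terminates dst. */
char *copy_string(char *dst, size_t size, const char *src)
{
    strncpy(dst, src, size);    /* cannot write past dst[size-1] */
    dst[size - 1] = '\0';       /* terminate even if src was too long */
    return dst;
}
```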

3.8. strncat

char *strncat( char * restrict s1, const char * restrict s2, size_t n);
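strncat concatenates at most n characters of string s2 to string s1, terminates s1 with '\0', and returns s1. The array s1 must be large enough for the result, which can be up to strlen(s1) + n + 1 characters including the '\0'.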

Examples:

        strcpy( str, "abc");
        strncat( str, "def", 2);
Now str contains "abcde".
        strcpy( str, "abc");
        strncat( str, "def", 9);
Now str contains "abcdef".
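A bounded append built on strncat, as a sketch (append_string is a hypothetical helper). The n argument counts only the characters taken from s2, and strncat always adds the '\0' itself, so for an array of size characters holding a string of length len, the room left is size - len - 1.

```c
#include <stddef.h>   /* size_t */
#include <string.h>   /* strlen, strncat */

/* append_string: hypothetical helper.  dst is an array of size
   characters that already holds a terminated string.  Appends as much
   of src as fits, leaving dst terminated. */
char *append_string(char *dst, size_t size, const char *src)
{
    size_t len = strlen(dst);

    if (len + 1 < size)                     /* any room left at all? */
        strncat(dst, src, size - len - 1);  /* strncat adds the '\0' */
    return dst;
}
```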

3.9. strncmp

int strncmp( const char *s1, const char *s2, size_t n);
strncmp compares at most n characters of string s1 to string s2, returns <0 if s1<s2, 0 if s1==s2, or >0 if s1>s2.

For example, to search stdin for lines starting with "int":

        while( fgets( line, BUFSIZ, stdin) != NULL)     /* read one line */
            if( strncmp( line, "int", 3) == 0)          /* check for match */
                fputs( line, stdout);
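A small generalization with the hypothetical name has_prefix: passing strlen(prefix) as n avoids hard-coding the count 3. Like the loop above, it only tests the first characters, so a line starting with "integer" also matches "int".

```c
#include <string.h>   /* strlen, strncmp */

/* has_prefix: hypothetical helper; returns 1 if s begins with prefix,
   otherwise 0. */
int has_prefix(const char *s, const char *prefix)
{
    return strncmp(s, prefix, strlen(prefix)) == 0;
}
```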

3.10. strspn

size_t strspn( const char *s1, const char *s2);
strspn returns the length of the prefix of s1 consisting of characters in s2.

In other words, strspn counts how many of the leading characters of s1 are from the set of characters specified in s2, and stops when it finds a character which is not in s2.

Say, for example, that you want to strip off the leading blanks and tabs on lines from stdin:

        size_t n;

        while( fgets( line, BUFSIZ, stdin) != NULL)     /* read one line */
        {
            n = strspn( line, " \t");
            fputs( &line[n], stdout);
        }
strspn can be written using strchr:
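        size_t strspn( const char *s1, const char *s2)
        {
            size_t n = 0;

            /* test s1[n] first: strchr( s2, '\0') would find s2's own terminator */
            while( s1[n] != '\0'  &&  strchr( s2, s1[n]) != NULL)
                ++n;

            return n;
        }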

3.11. strcspn

size_t strcspn( const char *s1, const char *s2);
strcspn returns the length of the prefix of s1 consisting of characters NOT in s2.

In other words, strcspn counts how many of the leading characters of s1 are NOT from the set of characters specified in s2.
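Since strcspn( s, "\n") is the index of the first newline in s (or strlen(s) if there is none), it gives a common idiom for removing the newline that fgets keeps at the end of a line. remove_newline is a hypothetical name for this sketch.

```c
#include <string.h>   /* strcspn */

/* remove_newline: hypothetical helper.  Replaces the first '\n' in s
   with '\0'; if there is no newline, it overwrites the existing '\0'
   with '\0', leaving s unchanged. */
void remove_newline(char *s)
{
    s[strcspn(s, "\n")] = '\0';
}
```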

3.12. strpbrk

char *strpbrk( const char *s1, const char *s2);
strpbrk returns a pointer to the first occurrence in string s1 of any character of string s2, or NULL if none are present.

For example, to find the first blank or tab in string str:

        char *p = strpbrk( str, " \t");

strpbrk can be written using strchr.
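A sketch of strpbrk in terms of strchr, under the hypothetical name my_strpbrk. The loop stops at s1's '\0' before calling strchr, because strchr( s2, '\0') would always succeed by finding s2's own terminating zero byte.

```c
#include <stddef.h>   /* NULL */
#include <string.h>   /* strchr */

/* my_strpbrk: hypothetical name for a sketch of strpbrk. */
char *my_strpbrk(const char *s1, const char *s2)
{
    for (; *s1 != '\0'; ++s1)
        if (strchr(s2, *s1) != NULL)    /* is this character in the set s2? */
            return (char *) s1;
    return NULL;
}
```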

3.13. strtok

char *strtok( char * restrict s1, const char * restrict s2);
strtok searches s1 for tokens delimited by characters from s2.

A sequence of calls of strtok( s1, s2) splits s1 into tokens, each delimited by a character from s2.

The first call in a sequence has a non-NULL s1. It finds the first token in s1 consisting of characters not in s2; it terminates that by overwriting the next character of s1 with '\0' and returns a pointer to the token.

Each subsequent call, indicated by a NULL value of s1, returns the next such token, searching from just past the end of the previous one.

strtok returns NULL when no further token is found.

The string s2 may be different on each call.

Since later calls pass NULL instead of the string, strtok has to keep a static local variable to keep track of where it stopped in the string the last time it was called, e.g.

        static char *p = NULL;
Because of this hidden state, only one sequence of strtok calls can be in progress at a time. strtok is useful for splitting a line into separate words.

For example, with lines of input like

        copy Amat b

the program fragment below uses strtok in the copy() function, which does not have direct access to the line variable in main:
#include <stdio.h>
#include <string.h>

#define SPACE " \t\n"           /* whitespace on input lines */
...
void copy( void)
{
    char *from = strtok( NULL, SPACE);
    char *to = strtok( NULL, SPACE);
    char *extra = strtok( NULL, SPACE);

    if( from == NULL  ||  to == NULL  ||  extra != NULL)
    {
        error( "copy takes two arguments");
        return;
    }

    if( strcmp( from, to) != 0)
        mm_copy( from, to);
}
...
int main( void)
{
    char line[BUFSIZ];          /* one line of user input */
    char *word;                 /* pointer to one word of the input line */

    while( fgets( line, BUFSIZ, stdin) )
    {
        if( (word = strtok( line, SPACE)) == NULL)      /* empty line */
            continue;

        ... check for matching command, call a function like copy() above
    }
}
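A self-contained sketch of the usual strtok loop, collecting the words of a line into an array; split_words and its max parameter are hypothetical. Note that strtok writes '\0's into line, so line must be a modifiable array, not a string literal.

```c
#include <stddef.h>   /* NULL */
#include <string.h>   /* strtok */

#define SPACE " \t\n"           /* whitespace on input lines */

/* split_words: hypothetical helper.  Stores pointers to at most max
   words of line in words[] and returns the number stored. */
int split_words(char *line, char *words[], int max)
{
    int n = 0;
    char *w = strtok(line, SPACE);      /* first call: non-NULL s1 */

    while (w != NULL && n < max) {
        words[n++] = w;
        w = strtok(NULL, SPACE);        /* later calls: NULL s1 */
    }
    return n;
}
```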

4. mem... functions

The mem... functions are meant for manipulating objects as character arrays.

The intent is an interface to efficient routines which can be implemented using low-level machine instructions.

The pointer arguments are `void *' so they can accept any pointer type.

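However, a void pointer cannot be dereferenced, and standard C does not allow pointer arithmetic on it, so it cannot be used directly to access the bytes of an object.

Therefore, these functions must use local `unsigned char *' (or `char *') variables initialized with their `void *' arguments, e.g.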

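int     memcmp  (const void *v1, const void *v2, size_t size)
{
        const unsigned char *p1 = v1;
        const unsigned char *p2 = v2;

        for( ; size > 0; --size, ++p1, ++p2)   /* compare byte by byte */
            if( *p1 != *p2)
                return *p1 - *p2;   /* sign of the first difference */
        return 0;                   /* all size bytes are equal */
}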

4.1. memchr

void *memchr( const void *v, int c, size_t size);
memchr returns a pointer to the first occurrence of character c in v, or NULL if not present among the first `size' characters.
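A sketch of memchr under the hypothetical name my_memchr, using the unsigned char pointer pattern from above. Unlike strchr, it examines exactly size bytes and does not stop at a zero byte.

```c
#include <stddef.h>   /* size_t, NULL */

/* my_memchr: hypothetical name for a sketch of memchr. */
void *my_memchr(const void *v, int c, size_t size)
{
    const unsigned char *p = v;
    unsigned char ch = (unsigned char) c;   /* memchr compares c as unsigned char */

    for (; size > 0; --size, ++p)
        if (*p == ch)
            return (void *) p;              /* memchr's signature drops the const */
    return NULL;
}
```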

4.2. memset

void *memset( void *v, int c, size_t size);
memset places character c into the first `size' characters of v, and returns v.

Example implementation in C:

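        void   *memset  (void *v, int c, size_t size)
        {
            unsigned char *p = v;   /* memset stores c converted to unsigned char */

            while( size > 0 )       /* fill from the last byte back to the first */
            {
                --size;
                p[size] = (unsigned char) c;
            }
            return v;
        }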
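A typical use of memset is clearing an array; clear_counts is a hypothetical example. The size argument is in bytes, so it is the element count times sizeof(int). Because memset stores the same byte everywhere, it reliably sets ints to 0, but memset( counts, 1, ...) would not make each element 1 (with 4-byte ints, each would become 0x01010101).

```c
#include <string.h>   /* memset, size_t */

/* clear_counts: hypothetical example; sets n ints to 0. */
void clear_counts(int counts[], size_t n)
{
    memset(counts, 0, n * sizeof(int));     /* size is in bytes, not elements */
}
```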

4.3. memcmp

int memcmp( const void *v1, const void *v2, size_t size);
memcmp compares the first `size' characters of v1 with v2, and returns as with strcmp.

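For example, to compare two arrays of double (memcmp compares bytes, so this tests for identical bit patterns; 0.0 and -0.0, for instance, compare unequal):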

        double x[NMAX], y[NMAX];
        int n; /* number of elements used */
        ...
        if( memcmp( x, y, n * sizeof(double)) == 0)
            do_something; /* arrays contain the same data */
        else
            do_something_else;

4.4. memcpy

void *memcpy( void * restrict v1, const void * restrict v2, size_t size);
memcpy copies `size' characters from v2 to v1, and returns v1.

For example, to copy array y to x from the example above:

        memcpy( x, y, n * sizeof(double));
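A sketch of memcpy that copies from the beginning of the object, under the hypothetical name my_memcpy; this front-to-back order is the assumption the memmove discussion relies on.

```c
#include <stddef.h>   /* size_t */

/* my_memcpy: hypothetical name for a sketch of memcpy.
   The objects must not overlap. */
void *my_memcpy(void *v1, const void *v2, size_t size)
{
    unsigned char *d = v1;
    const unsigned char *s = v2;

    while (size > 0) {          /* copy front to back, one byte at a time */
        *d++ = *s++;
        --size;
    }
    return v1;
}
```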

4.5. memmove

void *memmove( void *v1, const void *v2, size_t size);
memmove is the same as memcpy except that it works even if the objects overlap.

Consider the case of copying overlapping objects using a strcpy function which copies from the beginning of the string:

        char str[80] = "abcdefg";
        char *p = &str[2];  /* p points to the 'c' in str */
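With such a strcpy, strcpy( str, p) will work and str will contain "cdefg". (Even so, the C standard says the behavior of strcpy is undefined when the strings overlap, so portable code should not rely on this.)

But strcpy( p, str) will not work properly.

If it did, you would expect str to end up as "ababcdefg".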

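But what happens is that the 'a' is copied into the 'c' position, then 'b' on top of 'd', then 'a' (now in the position of the original 'c') on top of 'e', then 'b' ... etc. The terminating zero byte is overwritten before the copy reaches it, so the copy never finds the end of the string, and str ends up as:

        "ababababab.... never stops!
The copy keeps going past the end of the array, overwriting whatever memory follows it (undefined behavior).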
However, memmove( p, str, strlen(str)+1) will work as expected.

The +1 is needed so that the terminating zero byte will be copied.

If memcpy copies from the beginning of the object, then memmove can just call memcpy if v1 < v2. If v1 > v2, memmove has to do the copy starting from the end of v2 to the end of v1, copying the bytes in reverse order so that each byte of v2 is read before it can be overwritten.
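A sketch of that strategy as a hypothetical my_memmove, with both copy loops written out instead of calling memcpy. Comparing pointers into different objects with < or > is not defined by the C standard, so this version is only guaranteed to work when v1 and v2 point into the same array; a library implementation can rely on how its own platform lays out memory.

```c
#include <stddef.h>   /* size_t */

/* my_memmove: hypothetical name for a sketch of memmove. */
void *my_memmove(void *v1, const void *v2, size_t size)
{
    unsigned char *d = v1;
    const unsigned char *s = v2;

    if (d < s) {
        /* destination starts first: a front-to-back copy is safe */
        while (size > 0) {
            *d++ = *s++;
            --size;
        }
    } else if (d > s) {
        /* destination starts later: copy back to front so each source
           byte is read before it can be overwritten */
        while (size > 0) {
            --size;
            d[size] = s[size];
        }
    }
    return v1;  /* d == s: nothing to do */
}
```

With str and p as in the example above, my_memmove( p, str, 8) copies "abcdefg" and its '\0' to leave "ababcdefg" in str.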


5. strerror

char *strerror( int errnum);
strerror returns a pointer to an implementation-defined string corresponding to error `errnum'.

On some Unix systems, strerror is implemented using an internal array of strings:

        extern int   sys_nerr;          /* number of error messages */
        extern char *sys_errlist[];     /* array of error messages */
If 0 <= errnum < sys_nerr, the strerror return value is sys_errlist[ errnum]; otherwise it may return NULL or some string, and may or may not set errno depending on the system. These arrays are obsolete and some current C libraries no longer declare them, so programs should call strerror instead.

The standard library function perror, which prints to stderr an error message corresponding to the current value of errno (normally set by the most recent failed library or system call), can be written using strerror:

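        #include <stdio.h>
        #include <string.h>
        #include <errno.h>        /* declares errno (often a macro, not a plain variable) */

        void perror( const char *s)
        {
            char *p = strerror( errno);    /* get the message before errno can change */

            if( s != NULL  &&  *s != '\0')   /* the standard perror omits an empty prefix */
                fprintf( stderr, "%s: ", s);
            fprintf( stderr, "%s\n", (p == NULL) ? "" : p);
        }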
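A common use of strerror is reporting why a call failed; open_or_report is a hypothetical helper. ISO C does not require fopen to set errno, but POSIX systems do, so the message is meaningful there.

```c
#include <errno.h>    /* errno */
#include <stdio.h>    /* FILE, fopen, fprintf */
#include <string.h>   /* strerror */

/* open_or_report: hypothetical helper.  Opens name for reading; on
   failure prints "name: reason" to stderr and returns NULL. */
FILE *open_or_report(const char *name)
{
    FILE *f = fopen(name, "r");

    if (f == NULL)
        fprintf(stderr, "%s: %s\n", name, strerror(errno));
    return f;
}
```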

6. strcoll and strxfrm

strcoll() compares two strings using a comparison rule that depends on the current locale.

strxfrm() transforms a string based on the current locale.

strcoll() and strxfrm() are not discussed further in this document.


7. Using a pointer to access an array

When an array is passed as a function argument, only the starting address of the array is passed, and the function receives this address in a pointer variable which can be used as a local variable inside the function.

For example, rewriting strlen using pointer access instead of array indexing:

        size_t strlen( const char *s)
        {
            size_t len = 0;
            while( *s != '\0') { ++len; ++s; }
            return len;
        }
Alternatively, the string length can be computed by subtracting pointers to the beginning and end of the string:
        size_t strlen(const char *s)
        {
            const char *start = s;

            while( *s != '\0') ++s;

            return s - start;
        }
Functions which operate on arrays of int or double can also be written using pointer access.

For example, for the sum of an array of double:

        double sum( const double x[], int n)  /* x is `const double *' */
        {
            double r = 0;

            while( n > 0)
            {
                r += *x;  --n;

                ++x;    /*  ++x  adds  1 * sizeof(double)  */
            }
            return r;
        }
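Another common pointer style walks up to an end pointer one past the last element; C allows forming (but not dereferencing) that address. sum_range is a hypothetical variant of sum above.

```c
#include <stddef.h>   /* size_t */

/* sum_range: hypothetical variant of sum(); adds the n doubles at x. */
double sum_range(const double *x, size_t n)
{
    const double *end = x + n;  /* one past the last element: never dereferenced */
    double r = 0;

    for (; x < end; ++x)        /* ++x advances by sizeof(double) bytes */
        r += *x;
    return r;
}
```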