SI 204 Spring 2017 / Notes


This is the archived website of SI 204 from the Spring 2017 semester.

Unit 4: I/O

1 Streams

All input and output in standard C goes to a stream. Think of a stream as a potentially unending list of individual characters, where each read operation “consumes” some characters and moves past them, whereas a writing operation adds some characters to the end of the stream.

You’ve already been using streams in every call to fputs, writenum, and similar functions. Now we’ll see what other kinds of streams exist, and how to use them in more interesting ways.

1.1 Standard input and output streams

You are by now quite familiar with the standard input and output streams, known as stdin and stdout respectively. They are normally used to get user input typed in at the terminal, and to print out to that terminal.

There is actually a third stream that is also defined by default, which is an output stream called stderr, or the “standard error” stream. By default, stderr also displays its output on the terminal like stdout. But unlike standard out, standard error is not buffered, so its output shows up immediately from your program (more on that below). Also, it can be convenient in testing sometimes to separate debugging messages (typically written to stderr) from normal program output written to stdout.

The best practice is to use stdout for all of your normal program print functions, and only when you are adding extra debugging messages to use stderr. This can be convenient for a number of reasons, but one of them is that it makes it very easy to search your program for the word stderr and remove those debugging messages before you submit your final product!
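As a small illustration of that practice, here is a sketch that keeps its debugging messages on stderr. The function name sum_up_to is made up for this example, and it uses the standard <stdio.h> header directly rather than si204.h.

```c
#include <stdio.h>

// Computes 1 + 2 + ... + n.
// The DEBUG lines go to stderr, so they never mix with normal output.
int sum_up_to(int n) {
  int total = 0;
  fputs("DEBUG: starting the sum\n", stderr);
  for (int i = 1; i <= n; i++) {
    total = total + i;
  }
  fputs("DEBUG: finished the sum\n", stderr);
  return total;
}
```

In a typical shell, running a program with 2> debug.txt after its name sends everything written to stderr into the file debug.txt, while stdout still appears on the terminal.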

1.2 Creating streams from files

In the last unit, you saw how the command fopen can be used to create an input stream that reads from a file, like so:

cstring fname; // name of the file, like data.txt
stream fin;    // STREAM for the file, working like stdin

fputs("Enter a filename: ", stdout);
readstring(fname, stdin);

fin = fopen(fname, "r"); // <--- the magic is here!

int x = readnum(fin);
fputs("The first number in your file is: ", stdout);
writenum(x, stdout);
fputs("\n", stdout);

fclose(fin); // close the file that we opened

The first argument to the fopen command can be any file on your computer. If you don’t specify a directory explicitly, the directory you ran the program from will be searched for a file with that name.

The second argument to fopen specifies the mode of the open file. A mode of "r" means “reading” and indicates that the stream will be an input stream. This is what you have seen already.

Of course, you can also create an output stream from a filename. A mode of "w" means “writing” and causes three things to happen:

  1. If the file doesn’t exist, it is created.
  2. The file is “truncated” so that any current contents are removed and the file size goes down to zero.
  3. The returned stream will begin writing to the beginning of the file.

Notice the second step! This means that opening a file with "w" mode can be dangerous, because it will delete all the contents of the file if it already exists, without any warning.

In case you just want to add on to the end of the file and not overwrite the current contents, you can use mode "a" (for “append”).

There are also other modes that you can read about in the documentation, but for this class we will mostly just need to use reading "r" and sometimes writing "w".

As we learned in the last unit, if an error occurs during the file opening, then 0 is returned (which, as you know, will be interpreted as “false” in C). For "r" mode files, this means the file doesn’t exist or isn’t readable by the user running the program. For "w" and "a" mode files, it means the file can’t be created or can’t be changed by the current user.

And don’t forget to clean up after yourself! In the case of streams opened with fopen, this means calling fclose when you’re done in order to close the file properly, as in the example above.
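To make the modes and the error check concrete, here is a minimal sketch using plain <stdio.h>. The helper names count_chars and write_text are hypothetical, and fgetc (which reads one character at a time) and EOF are standard library features used here only for counting.

```c
#include <stdio.h>

// Returns the number of characters in the file,
// or -1 if the file can't be opened for reading.
long count_chars(const char *fname) {
  FILE *fin = fopen(fname, "r");
  if (!fin) {
    return -1;  // fopen returned 0: missing or unreadable file
  }
  long n = 0;
  while (fgetc(fin) != EOF) {
    n++;
  }
  fclose(fin);
  return n;
}

// Writes text to the file using the given mode ("w" or "a").
// Returns 1 on success, or 0 if the file could not be opened.
int write_text(const char *fname, const char *mode, const char *text) {
  FILE *fout = fopen(fname, mode);
  if (!fout) {
    return 0;
  }
  fputs(text, fout);
  fclose(fout);  // also sends along anything still sitting in the buffer
  return 1;
}
```

Calling write_text with "w" twice in a row leaves only the second text in the file, whereas following a "w" call with an "a" call leaves both texts, one after the other.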

1.3 Stream buffering

Input/output is typically one of the slowest things your program does. Compare what has to happen in the computer when you multiply two numbers with what has to happen when you write a single char to the screen. To multiply numbers, your CPU (which is already running your program) executes a single instruction, which is super fast! But to write that char to your screen, it needs to send the written data to the host operating system, which in turn sends it to another program that’s running your terminal, and which ultimately causes your screen to update some pixels.

The easiest way to avoid some of that slowness is to delay sending all the characters for some time. Specifically, the characters are stored temporarily in a buffer (think of it as a holding area) until there are enough of them to all be sent at once.

In C, stdout and stdin are line-buffered when they are connected to a terminal, meaning that the buffer contents are sent along whenever a newline character '\n' is seen.

You can see the effects on stdin with the following program:

#include "si204.h"

int main() {
  cstring word;
  readstring(word, stdin);

  fputs("Read one word: ", stdout);
  fputs(word, stdout);
  fputs("\n", stdout);

  return 0;
}

If you run that program and type cheese on a line and hit enter, the program immediately finishes and prints out cheese. But if you type cheese and hit a space, the program hangs. You can keep typing more words with spaces, and the program sits and does nothing. Then when you finally hit enter, only the first word cheese is printed; the other words were never read.

In the second case, the program was waiting on the first word the whole time! Because stdin is line-buffered, the word cheese isn’t even sent to the program until the entire first line is ready to be sent.

Similarly, you can see the effects of buffering on stdout with the following example, which makes use of the usleep function to make the program pause and do nothing for one second:

#include "si204.h"
#include <unistd.h> // for usleep()

int main() {
  fputs("Begin test...", stdout);
  usleep(1000000); // pauses for 1 second
  fputs(" end test\n", stdout);

  return 0;
}

You can see in the code that this program prints the first part of the message, pauses for one second, then prints the second part. But when you run it, you see nothing for a whole second, and then the entire message is printed out.

As you probably guessed, this is because stdout is line-buffered too. The first fputs command really does execute before the program pauses for 1 second, but those characters are not printed to the screen until the newline in the second part of the message.

Other streams that you open with fopen are also buffered, but based on a fixed number of characters rather than newlines. When you close the stream with fclose, the first thing that happens is any buffered data that’s “on hold” is sent through.

Importantly though, stderr is not buffered at all. So if you ran the program above but replaced stdout with stderr, you would actually see the first part of the message, then a pause, then the second part! Because stderr is meant for debugging, the tradeoff in speed from buffering the output is “worth it” to be able to see your program’s output immediately.

1.4 Buffer flushing

Usually stream buffering is a good thing and you don’t want to mess with it. As long as you remember to properly close your files, everything should be fine.

But sometimes it’s necessary to manually specify that an output stream’s buffer should be emptied right now no matter what. To do that, you use the command fflush. In the example above, adding a line

fflush(stdout);

before the usleep command would cause the behaviour we probably expect — the first part of the message is output before the program pauses for a second.

2 Format strings and fprintf

2.1 The problem: too many calls to output functions

So far, you have to have a separate function call for every part of what you are writing to the screen. This can be a bit annoying when you have a mix between different types, and a single line of output can result in many lines in your code. For example:

#include "si204.h"

int main() {
  // The info we want to print
  cstring name = "Billy";
  int whichletter = 12;

  // print out the message
  fputs("Hello, ", stdout);
  fputs(name, stdout);
  fputs(". The ", stdout);
  writenum(whichletter, stdout);
  fputs("th letter of the alphabet is ", stdout);
  fputc('A' + (whichletter - 1), stdout);
  fputs(".\n", stdout);

  return 0;
}

Running that program prints just a single line of output:

Hello, Billy. The 12th letter of the alphabet is L.

But it took 7 lines of various print statements to get it! This is really what the computer has to do — the way numbers, strings, and characters are treated for printing is all different — but when you write larger programs it can be annoying to have to type in separate lines for all of this. If only there were a way to specify that entire thing on a single line…

2.2 fprintf to the rescue

The fprintf function does exactly that. It allows you to print a bunch of information, with a mixture of different types, all with a single command.

The most basic usage of fprintf is as follows:

fprintf(stdout, "Hello, world!\n");

That causes — you guessed it — Hello, world! to be printed in a single line to the screen.

The first argument, as you can see, is a stream, and the second argument is known as a format string that specifies what is to be printed. In the example above, the format string is just a normal string, so that’s what will be printed. In this simplest usage, fprintf is just like fputs, except that the stream comes first instead of second.

But fprintf can do much much more than fputs! The key is that we can insert special instructions, called conversion specifiers, into the format string in order to paste in other output between the regular characters. Every conversion specifier starts with a %, and then is followed by some other characters to describe what is being inserted and how. In the most basic usage, you will have %i for an integer, %g for a double, %c for a single char, or %s for a string.

For example, let’s say we have an int alpha that stores someone’s 6-digit alpha number. Then we could write their email address with three statements:

fputs("m", stdout);
writenum(alpha, stdout);
fputs("@usna.edu\n", stdout);

Or we can replace those three statements with a single fprintf as follows:

fprintf(stdout, "m%i@usna.edu\n", alpha);

Notice the %i in the middle of the format string: that’s where the alpha number will be inserted. Also notice that the arguments to fprintf that come after the format string are the values that will be inserted according to the conversion specifiers.

You can also insert more than one value in the middle of the format string, by using multiple conversion specifiers, and adding more arguments to the end of the command. For example, the seven print lines in the “letters of the alphabet” program above can be written with a single fprintf statement as:

fprintf(stdout, "Hello, %s. The %ith letter of the alphabet is %c.\n",
        name, whichletter, 'A' + (whichletter - 1));

In that example, the format string contains three conversion specifiers: %s, then %i, then %c. So there are three corresponding arguments after the format string, first a cstring, then an int, and then a char.

There are of course more conversion specifiers than these that you can read about in the documentation, but mostly in this class you just need to remember %i for integers, %g for doubles, %c for chars, and %s for strings. Oh, and when you want to put an actual percent sign % in your output, you just double it up in the format string as %%.
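Here is a short sketch that uses all four basic specifiers plus %% in one call. The function print_report and its parameters are invented for this example; it takes the stream as an argument so the same code could write to stdout or to a file.

```c
#include <stdio.h>

// Writes a one-line quiz report to the given stream.
// name is a string, correct and total are ints, grade is a single char.
void print_report(FILE *out, const char *name, int correct, int total,
                  char grade) {
  double percent = 100.0 * correct / total;
  // %s string, %i int, %g double, %% literal percent sign, %c char
  fprintf(out, "%s got %i of %i right (%g%%), grade %c.\n",
          name, correct, total, percent, grade);
}
```

For instance, print_report(stdout, "Billy", 7, 8, 'B') should print "Billy got 7 of 8 right (87.5%), grade B." on a line by itself.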

2.3 Controlling the output format

Sometimes the way fprintf decides to format things might not agree with the way we’d like to see them formatted. For example, if x is a double with value 4.0, fprintf with %g will write “4”. Or sometimes you want to “line up” some numbers or strings in your output, by inserting extra blank spaces in front so they have the same width. Both of those tweaks can be accomplished by adding more information to the appropriate conversion specifier.

To line things up horizontally, you use the width specifier, which is just a number in the format string between the leading % sign and the conversion type such as i or s. For example, the following program to display a basketball score specifies the team name width is 10 and the score width is 3, so that the colons : and ends of the numbers will always line up:

cstring name1 = "Hornets";
cstring name2 = "76ers";

int score1 = 93;
int score2 = 102;

fprintf(stdout, "%10s: %3i\n", name1, score1);
fprintf(stdout, "%10s: %3i\n", name2, score2);

The result of running the code above would look good:

   Hornets:  93
     76ers: 102

The other thing you frequently want to adjust is the number of decimal places, also known as the precision of a double. You do that with a decimal point . and then a number for the number of digits, all after the leading % and before the type specifier such as g. Specifying a precision such as %.8g says to use up to 8 digits, counting before and after the decimal point.

That can be useful, but usually we want to specify a fixed precision after the decimal point, no matter what. In this case, you use the %f conversion instead of %g, which forces the output to never use scientific notation. For example, %.2f specifies to have exactly two digits after the decimal point (useful for printing monetary amounts!).

The width and precision can both be specified; in this case the width comes before the precision. For example, here is a small program to print out names and account balances, all nicely lined up and with the familiar two decimal places for cents:

cstring name1 = "Jared";
cstring name2 = "Elizabeth";

double amt1 = 128.5;
double amt2 = 15;

fprintf(stdout, "ACCOUNT BALANCES\n");
fprintf(stdout, "%12s: %6.2f\n", name1, amt1);
fprintf(stdout, "%12s: %6.2f\n", name2, amt2);

Running that code produces this output:

ACCOUNT BALANCES
       Jared: 128.50
   Elizabeth:  15.00
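To compare these options side by side, the following sketch prints the same double four different ways. The function name show_formats is hypothetical, and the square brackets are only there to make the padding visible.

```c
#include <stdio.h>

// Prints x using four different conversion specifiers, one per line.
void show_formats(FILE *out, double x) {
  fprintf(out, "[%g]\n", x);      // default %g: up to 6 significant digits
  fprintf(out, "[%.8g]\n", x);    // up to 8 significant digits
  fprintf(out, "[%.2f]\n", x);    // exactly 2 digits after the decimal point
  fprintf(out, "[%10.2f]\n", x);  // same, padded on the left to width 10
}
```

Calling show_formats(stdout, 2.0 / 3.0) should print [0.666667], then [0.66666667], then [0.67], and finally [      0.67] with six spaces of padding.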

3 A quick look at pointers

Now it’s time to introduce one of the most powerful concepts in C programming, the mighty pointer. Simply put, a pointer is the memory address of some other object (such as an int, char or double). We know that everything in C is stored in memory, and pointers are the way that our program can access where in memory those things get stored.

As with any powerful tool, pointers can also be quite dangerous. When we deal with memory addresses, it’s very easy to do all sorts of horrible things like write an int where a double should be, or read from memory that doesn’t belong to our program. So, more than anything else in C so far, you are advised to only use pointers when you really NEED them, and otherwise stick to regular variables. You’ve been warned!

3.1 Getting and printing memory addresses

The ampersand operator &, when used before an expression, returns the memory address (or pointer) for that expression’s value. Usually we use this to get the address of a certain variable, so that if x is a variable, then &x is the address of x.

The format flag %p can be used in a fprintf statement to display a pointer. For example:

int x = 10;
fprintf(stdout, "The address of x is %p\n", &x);

Notice that we used %p inside the format string to tell fprintf to expect a pointer, and then we used &x to actually get the pointer to x. If you run that program the output will look something like

The address of x is 0x7ffcd3d11b20

What that’s showing you is the actual memory location in your computer where the compiler and operating system decided to store the 4 bytes for int x. The memory address is in hexadecimal notation, or base 16 — that’s what the 0x means.

Now when I said your output will look something like that, I really meant it. If you run this program on your computer, you will get a different memory address. And even if you run the same program on the same computer at different times, you are likely to get different addresses! That’s because every time you run a program, the operating system decides what part of memory that program gets to use. Since you are probably running other programs on your computer at the same time, the region of memory allocated to your program can change every time you run it.

We could play with this and probably learn quite a bit about the operating system, but that’s a topic for a different class. For our purposes, the thing to remember is that you can’t count on the specific value of a pointer in the program, since it can change every time the program is run. That is, we know for certain that &x is the address of the variable x, but we have no idea what that address will be!

3.2 Pointer types and dereferencing

Since a pointer such as &x is itself an expression in C, it must have a type. The pointer type is based on the type of the thing it’s pointing to, with an asterisk * afterwards. So if we have int x, then the type of &x is int*. Or if x were a double, then the type of &x would be double*, and so on.

Now that we have a type, we can declare variables that are pointers, using the pointer type as the variable’s type! Here’s a small example:

char letter = 'H';
char* ptr = &letter;
fprintf(stdout, "The address of %c is %p\n", letter, ptr);

Now what can we do with these pointer variables? Well, the obvious thing to do is dereferencing: getting the actual value that the pointer points to. To do that, you use the * operator right before the pointer value, and it turns it from a pointer back into the value that it’s pointing to. For example:

double length = 5.5;
double* lptr = &length;
fprintf(stdout, "These two numbers are the same: %g = %g\n", length, *lptr);

Think of the * dereferencing operator as the opposite of the & operator. The & operator makes a pointer out of an object, and * makes an object out of a pointer.

Caution 1: We have now seen 3 different uses for the asterisk * in C, so let’s list them out and make sure we don’t get confused between them.

  1. Between two expressions, an asterisk * does multiplication, like (x + 3) * y
  2. After a type, an asterisk * refers to a pointer type, like int* x;
  3. Before an expression, an asterisk * does dereferencing, like *x

Caution 2: Things don’t work the way you expect when you try to declare multiple pointers on the same line! For example:

int x, y; // x and y are both int's
int* xptr, yptr; // WHOOPS! xptr is a pointer but yptr is an int
xptr = &x; // OK, the type of xptr is int*
yptr = &y; // ERROR, the type of yptr is int, not int*

This is just the way the C syntax works: when you declare multiple variables on the same line, only the type specifier (int in this case) applies to all the variables, whereas the * only applies to the variable name it’s next to (in this case, xptr). For this reason, many C programmers prefer to always put the * next to the variable name, like so:

int x, y; // x and y both have type int
int *xptr, *yptr; // xptr and yptr both have type int*

Another option is to just avoid declaring multiple variables (especially pointer variables) on the same line, like so:

int x;
int y;
int* xptr;
int* yptr;

Which way you go is a matter of personal preference; pick one option or the other and try to be consistent about it so you don’t get tripped up.

3.3 Pointers and modifying data

So what are pointers used for anyway? One of the main uses we’ll see for pointers is to refer to the same piece of data in multiple ways. Check out this example program:

#include "si204.h"

int main() {
  fprintf(stdout, "Enter x: ");
  int x = readnum(stdin);
  int y = x;
  int* xp = &x;

  fprintf(stdout, "The value of x is %i and the address of x is %p\n",
          x, &x);
  fprintf(stdout, "The value of y is %i and the address of y is %p\n",
          y, &y);
  fprintf(stdout, "The pointer xp is %p and the dereferenced value is %i\n",
          xp, *xp);

  fprintf(stdout, "\nEnter a new value for x: ");
  x = readnum(stdin);

  fprintf(stdout, "The value of x is %i and the address of x is %p\n",
          x, &x);
  fprintf(stdout, "The value of y is %i and the address of y is %p\n",
          y, &y);
  fprintf(stdout, "The pointer xp is %p and the dereferenced value is %i\n",
          xp, *xp);

  return 0;
}

And here’s a sample run of that code:

Enter x: 10
The value of x is 10 and the address of x is 0x7ffc475e0404
The value of y is 10 and the address of y is 0x7ffc475e0400
The pointer xp is 0x7ffc475e0404 and the dereferenced value is 10

Enter a new value for x: 15
The value of x is 15 and the address of x is 0x7ffc475e0404
The value of y is 10 and the address of y is 0x7ffc475e0400
The pointer xp is 0x7ffc475e0404 and the dereferenced value is 15

As you can see, changing the value of x from 10 to 15 does not (directly) change either of the other variables; y is still 10 and xp is still 0x7ffc475e0404. But since xp points to x, the value of *xp, the dereferenced value, goes from 10 to 15 when x is changed. In other words, x and *xp will always be the same value as long as xp points to x.
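The connection also works in the other direction: assigning to *xp changes x. The following sketch shows this; the function name change_through_pointer is made up for the example.

```c
#include <stdio.h>

// Changes x only by going through the pointer xp, then returns x.
int change_through_pointer(void) {
  int x = 10;
  int *xp = &x;

  *xp = 15;  // stores 15 at the address held in xp, which is where x lives

  fprintf(stdout, "x is now %i and *xp is %i\n", x, *xp);
  return x;
}
```

Even though x never appears on the left side of an assignment after it is declared, the function returns 15.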

3.4 You are using pointers already

In your use of the si204.h header file, you have already been using two kinds of pointers even though you didn’t know it!

The first is stream. While this looks like any other type in C, it is actually a synonym for the type FILE* — that is, a pointer to a type called FILE. Yes, even stdin and stdout are considered FILEs in C! This FILE type is part of the C standard library from the header file <stdio.h> (more on that later).

The second type of pointer you have been using is cstring, which is actually an array of chars, which in most contexts is the same as char*, a pointer to the first character in the string. We’ll learn more about arrays in a few weeks, but for now just remember that cstring should be treated like a pointer type char*.

Now for a deeper question: Why are stream and cstring pointer types? The reason is how they are used. Both stream and cstring variables are passed into functions that need to modify the underlying data. Consider the following code, which by now should be quite familiar to you:

stream fin = fopen("somefile.txt", "r");
cstring word;
readstring(word, fin);

What’s actually happening in that readstring function call? It’s copying some characters from the file into the word variable’s storage, and moving the file position forward so that the next read operation won’t try to read the same thing again.

This means that both the underlying string word and file stream fin have to be modified in order to make things work. This is why it makes sense that the readstring function needs the address of these pieces of data, so that it can make changes to what’s stored there.

The details of passing an address to a function, rather than the object itself, will be more clear in the next unit when we learn about functions, but hopefully the general idea is starting to make sense already. The most important thing to remember for now is to treat streams and cstrings like pointers when you use them.
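As a rough preview of that idea, here is a sketch with two made-up functions, add_bonus and add_bonus_to_copy. One receives the address of an int and the other receives just a copy of its value.

```c
// Receives the ADDRESS of an int, so it can change the caller's variable.
void add_bonus(int *score, int bonus) {
  *score = *score + bonus;
}

// Receives a COPY of the int, so the caller's variable is left untouched.
void add_bonus_to_copy(int score, int bonus) {
  score = score + bonus;  // only changes the local copy
}
```

If s is an int holding 10, then add_bonus_to_copy(s, 5) leaves s at 10, while add_bonus(&s, 5) changes s to 15. This is the same reason readstring needs the address of the string it fills in.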

4 fscanf

The fscanf function does for reading what fprintf does for writing: it gives us a convenient way to read multiple variables in a single line of code, using a format string. We’ll look at the most important features of fscanf below, and as always you can also browse the complete documentation online or by typing man fscanf in the terminal.

4.1 Using fscanf

As with fprintf, the format string consists of regular characters mixed in with conversion specifiers. These conversion specifiers again start with a percent sign % followed by some type indicator. These are mostly the same as in fprintf: %i for an int, %c for char, and %s for strings. But double is different: you have to use %lg (notice that’s a lowercase ell l) instead of a plain %g, to indicate that you want an 8-byte double and not a 4-byte float.

For example, to read in an integer into a variable n using fscanf you would do:

int n;
fprintf(stdout, "Enter n: ");
fflush(stdout);
fscanf(stdin, " %i", &n); // <-- look carefully here!
fprintf(stdout, "The number you entered is %i\n", n);

What do you notice in the code snippet above?

  • The fflush(stdout) is necessary to ensure that the prompt "Enter n: " is printed out before the program waits for the user to type in an integer. Remember, stdout is line-buffered by default, which means there’s no guarantee that text will show up on the screen until a newline is printed or we manually call fflush.

  • There’s a space before the %i in the fscanf format string. Any space in a fscanf format string causes the program to read and skip any whitespace characters before reading the next part.

    (Actually "%i" will also skip whitespace characters before reading the number, but "%c" doesn’t skip whitespace automatically. So I (Dr. Roche) think it’s a good habit to always put the space in front of your scanf if you want it to skip whitespace.)

  • The argument to fscanf is given as a pointer &n rather than the plain variable n. This is so the fscanf function gets the address in memory where it will store the result.

Just as with fprintf, we can also read multiple variables in a single call to fscanf. This involves passing more arguments to the fscanf function, in the same order as the corresponding format specifiers. Just remember that every argument to fscanf after the format string must be a pointer, because the function needs to know the addresses where it will save the data that it reads.

Here’s another example. To read the name and price of some item formatted like

Chocolate costs $1.50

we could use the following program:

cstring food;
cstring costsword;
char dollarsign;
double price;

fscanf(stdin, " %s %s %c%lg", food, costsword, &dollarsign, &price);

Take careful note of the fscanf call there. As before, we put a space in front of the %s and %c specifiers to indicate that fscanf should skip any whitespace before reading the food name and before reading the dollar sign. (Technically we only really need this space in front of the %c, but Dr. Roche likes the habit of putting spaces in the format string whenever a space might be expected in the input.)

There’s something else interesting in the fscanf call above — you pass the cstring arguments directly without calling the address operator &. Remember from the last section that the type cstring can (in most situations) be thought of like char*, i.e., it’s already a pointer type. We don’t have to take the address of it for fscanf because it’s already an address!

One more thing about fscanf format strings: any non-whitespace characters that are not conversion specifiers with a % indicate characters that fscanf should “expect” to see and then ignore. So actually in the previous example, we could forget about the useless costsword and dollarsign variables and just do:

cstring food;
double price;
fscanf(stdin, " %s costs $%lg", food, &price);

Be careful with this feature of fscanf however, because the literal non-whitespace characters in the format string have to match exactly with the input given, or else the reading stops at that point. More on that below.

In summary, the biggest “gotchas” about fscanf are:

  1. Use %lg for doubles, unlike %g that you would use in fprintf.
  2. Usually put a space in front of %c so that it skips whitespace and then reads the next non-whitespace character.
  3. Pass all the arguments as addresses with the & operator, except for strings which are already pointers.
  4. Any non-whitespace literal characters in your format string are passed over by fscanf, but only if they match exactly what’s in the input.
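The second gotcha is the easiest one to trip over, so here is a small sketch of the difference. Both function names are hypothetical; the only difference between them is the spaces in the format string.

```c
#include <stdio.h>

// Reads two letters, skipping any whitespace before each one.
// Returns the number of values fscanf managed to store.
int read_letters_skipping(FILE *in, char *first, char *second) {
  return fscanf(in, " %c %c", first, second);
}

// Reads the next two characters exactly as they appear, spaces included.
int read_letters_raw(FILE *in, char *first, char *second) {
  return fscanf(in, "%c%c", first, second);
}
```

If the input starts with a space and then x, followed by y on the next line, the first function stores 'x' and 'y', but the second one stores a space and then 'x'.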

4.2 Return value and error checking

Any time you’re reading input, things can go wrong. Someone might type a word when you’re trying to read a number, or they use the wrong format, or the file ends when you’re trying to read more data, etc. So far, using the reading functions provided in si204.h, any of these issues causes an error message like

ERROR in readnum: Maybe ill-formatted number?

and then your program aborts unceremoniously.

But with fscanf, we can detect and adapt to things going wrong, and we get to decide as programmers how to handle it.

The key is that fscanf returns an int indicating the number of arguments it successfully assigned. So if you tried to read in 3 things, and your fscanf call returns 3, everything worked. If it returns something less than 3, something went wrong.
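Here is a tiny sketch of that return value in action. The function read_three is invented for this example; it simply hands back whatever fscanf returns.

```c
#include <stdio.h>

// Tries to read three ints from the stream.
// Returns the number of ints that were actually stored.
int read_three(FILE *in, int *a, int *b, int *c) {
  return fscanf(in, " %i %i %i", a, b, c);
}
```

With input 10 20 30 the function returns 3. With input 10 20 oops it returns 2, because the third conversion fails. With input that starts with oops it returns 0, because nothing could be read at all.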

Here’s a more complete program along the lines of the previous example, but with some actual error checking.

#include "si204.h"

int main() {
  cstring food;
  double price;

  fprintf(stdout, "Enter data in the form \"<food> costs $<price>\".\n");
  int check = fscanf(stdin, " %s costs $%lg", food, &price);

  if (check != 2) {
    fprintf(stdout, "ERROR: invalid input.\n");
    return 1;
  }

  int quantity = 6;
  fprintf(stdout, "Yum, I'll take %i %ss for $%.2f please.\n",
          quantity, food, quantity*price);

  return 0;
}

How would you modify the program above so it keeps asking for valid input in a loop until the fscanf call succeeds?
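One possible answer, offered as a sketch rather than the official solution, is below. The function name read_item is made up. The %127s width is an addition that keeps the food name within a 128-character cstring. The code also relies on fgetc and on the special value EOF (discussed in section 6.2) to throw away the rest of a bad line.

```c
#include <stdio.h>

typedef char cstring[128];

// Keeps reading until some input matches "<food> costs $<price>".
// Returns 1 on success, or 0 if the input runs out first.
int read_item(FILE *in, cstring food, double *price) {
  while (1) {
    int check = fscanf(in, " %127s costs $%lg", food, price);
    if (check == 2) {
      return 1;  // both values were read
    }
    if (check == EOF) {
      return 0;  // nothing left to read
    }

    // Bad input: discard the rest of the current line and try again.
    fprintf(stdout, "Invalid input, please try again.\n");
    int c = fgetc(in);
    while (c != '\n' && c != EOF) {
      c = fgetc(in);
    }
    if (c == EOF) {
      return 0;
    }
  }
}
```

Skipping to the end of the line is only one simple policy. Without some kind of skipping, though, the same bad characters would still be waiting in the stream on the next attempt.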

5 Do we still need si204.h?

We’ve been using the si204.h library for everything in the class so far. Here’s what it really gives us:

  1. The definition of type cstring
  2. The definition of type stream
  3. Access to everything in the standard header <string.h> including strlen, strcpy, and strcmp.
  4. Access to everything in the standard header <stdio.h> including fputs, fputc, fopen, fprintf, and fscanf.
  5. A few extra functions for reading and writing, namely readchar, readstring, readnum, and writenum.

Let’s work backwards up this list. We don’t need those four special read/write functions anymore, because fscanf and fprintf can do all of that reading and writing for us now. In fact, if you look in si204.h, you’ll see that those functions really just call fscanf or fprintf with some extra error checking.

As for (3) and (4) in the list above, we can just include those standard headers in our code directly, by putting

#include <stdio.h>
#include <string.h>

at the top of your .c file. Notice that you have to use angle brackets instead of quotation marks for these #include statements, because these header files are in standard system directories, as opposed to si204.h which always had to be in the same directory as your code.

As for (2) in the list above, we learned already that stream is the same as FILE*, and the type FILE is a standard type defined in stdio.h. So you can replace any use of the type stream with type FILE* and you’re good.

Finally we come to (1), the type of cstring. This is the only part that we really “need” si204.h for still, because we haven’t learned about arrays yet. But I’ll tell you that the type cstring is actually an array of characters, which you can get just by adding this “magic” line of code to the top of your program before main:

typedef char cstring[128];

Right now you don’t know what that means, but I bet you can make a pretty good guess! The other thing to remember about cstring is that whenever you pass one to a function, it gets automatically converted to type char*, a pointer to the first character in the string.

Here’s a complete program that doesn’t use si204.h. You might call it our first “real” C program.

#include <stdio.h>
#include <string.h>

typedef char cstring[128];

int main() {
  cstring secret = "opensesame";
  cstring entered;
  int check;

  fprintf(stdout, "What's the secret password? ");
  fflush(stdout);
  check = fscanf(stdin, " %s", entered);

  if (check == 1 && strcmp(secret, entered) == 0) {
    fprintf(stdout, "Welcome, trusted friend.\n");
  }

  return 0;
}

6 Other odds and ends

6.1 Dropping stdout and stdin

You have probably noticed that most of your calls to printing functions go to the stream stdout, and most of your calls to reading functions come from the stream stdin. Since this is so common, the C standard I/O library stdio.h includes some “convenience” functions that automatically use stdin/stdout:

Long function                           Shorter version
fprintf(stdout, "format", arg1, ...)    printf("format", arg1, ...)
fscanf(stdin, "format", arg1, ...)      scanf("format", arg1, ...)

There are other convenience functions too if you look at the full documentation, but scanf and printf are the two most useful ones.

Caution: Some of these are not exactly equivalent, even though they look like they are! In particular, there is a puts function (similar to fputs, but goes to stdout), with an important difference that puts automatically inserts a newline "\n" at the end of the string you tell it to print, while fputs doesn’t.

That can all be rather confusing, so it’s probably best to stick with (f)printf and (f)scanf now that we know how to use them.

6.2 EOF

We discussed above how the int returned from a call to fscanf indicates how many arguments were successfully read in.

There is also a special return value of EOF that indicates the end of the file has been reached. This is a negative number, usually defined to be -1 but technically it could be any negative number.

This gives us a nice way to read in all the contents of a file up to the end of the file. For example, let’s say we have a file with a bunch of lines, each formatted like <name> <score>, for example:

Sagan 470
Kittel 228
Matthews 199
Greipel 178

Then we could write a loop like this to read in all the lines in the file, no matter the length of the file itself:

FILE* fin = fopen("points.txt", "r");

cstring name;
int points;

while (fscanf(fin, " %s %i", name, &points) != EOF) {
  printf("%s has %i points.\n", name, points);
}

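That’s pretty good, but it has one flaw: if one of the lines is misformatted (say, there’s an extra name without a score), then fscanf will return some number less than 2, but not EOF. The loop keeps going anyway and prints leftover or mismatched data. With a format that starts with a number instead of a string, it could even go into an infinite loop of trying (and failing) to read the misformatted data!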

So a better solution would be more like this:

#include <stdio.h>

typedef char cstring[128];

int main() {
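  FILE* fin = fopen("points.txt", "r");
  if (fin == NULL) {
    // fopen gives back NULL when the file can't be opened
    printf("ERROR: could not open points.txt\n");
    return 1;
  }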

  cstring name;
  int points;

  int check = fscanf(fin, " %s %i", name, &points);
  while (check == 2) {
    printf("%s has %i points.\n", name, points);
    check = fscanf(fin, " %s %i", name, &points);
  }

  if (check != EOF) {
    printf("ERROR in points.txt file\n");
    fclose(fin);
    return 1;
  }

  fclose(fin);

  return 0;
}

(Note, there is also a standard function feof that can be used to check if the given stream is at the end.)

6.3 Reading whitespace

So far we’ve made sure that all of the reading operations, for strings, characters, and numbers, skip any whitespace (spaces, tabs, or newlines) before reading the next thing. However, sometimes you really want to read in those spaces!

The easiest and simplest way to do this is one character at a time, using the format %c in fscanf without any space before it. For example, the following code reads a word and then counts how many ordinary spaces come right after it, before the next word starts:

cstring firstword;
char middle;

// read the first word
scanf(" %s", firstword);

int count = 0;

// read chars until you hit something that's not a space (or the input ends)
while (scanf("%c", &middle) == 1 && middle == ' ') {
  count++;
}

printf("You typed %i spaces.\n", count);

(There is also a built-in function getc that has a similar effect.)

6.4 Writing to and reading from strings

We won’t have much use for this in our class, but you might be interested to know that you can also do printf and scanf stuff where the “stream” is not actually a terminal or a file, but rather a string!

The functions to do that are sscanf and sprintf, and they take a string argument in place of the FILE* argument to fscanf or fprintf.

For example, here is a program that reads in an email address as a single string, and then uses sscanf to check whether it’s a Midshipmen email address:

#include <stdio.h>

typedef char cstring[128];

int main() {
  printf("Enter your email address: ");
  fflush(stdout);

  cstring email;
  scanf(" %s", email);

  int alpha;
  if (sscanf(email, "m%i@usna.edu", &alpha) == 1) {
    printf("Your alpha is %i.\n", alpha);
  } else {
    printf("You must not be a Mid.\n");
  }

  return 0;
}

7 Problems

  1. A simple data-conversion problem. You’d be amazed how often you need to write programs that do nothing more than convert data from one format to another. Write a program that reads in a file (name given by user) that contains points in ordered pair notation, and writes the same points to a file (name also given by user) in gnuplot notation, i.e. one point per line, each point given by x-coordinate tab (‘\t’) y-coordinate. For a nice small file to test with, we have testin.txt. For a nice big challenge file, we have in.txt.
    a solution.
  2. Census Statistics - The census keeps tables of populations and population densities for all of our states. Each state has its own file giving the names of all cities, towns, and CDP’s (“census designated place” - this appears to be census-eese for “other”) in that state. For example, take a look at Maryland’s geographic census data.

  1. Here’s a good problem to work on, as it takes into account a number of the things we’ve talked about: The file scores.txt contains the scores of students on various problems on an exam. Each row corresponds to a student, and the scores along that row are that student’s scores on problems 1, 2, 3 etc.
    Your job: figure out which problem was the hardest! You may assume that for every problem, at least one student got full credit for that problem. If the average score for problem X as a percentage of the full credit score for X is less than the average score for problem Y as a percentage of the full credit score for Y, then problem X is “harder” than problem Y.

    ~/$ ./prob1
    Problem p4 is hardest (ave = 48.5294%)

    Check out this solution.