Unit 8: Compound data
1 Struct intro
This lesson begins a big new topic for us — user defined types, or structs as they are called in C. This is something that you have, I hope, already felt the need for. Let’s consider a few example problems:
- Suppose I want a function midpoint that takes two points and returns their midpoint.
- Suppose I want to read in a list of 20 Midshipmen names and alpha codes, and print out the Midshipmen names ordered by class year.
- Suppose I want to store a bunch of student names along with their grades on 10 homework assignments.
Each of these is something we can do (think about how), but only with difficulty. The problem is that in each case we are working with “physical” objects that do not have a corresponding built-in type in C. It would be natural to write a midpoint function if there were a type called point that encapsulated both the x and y coordinates: its prototype would be point midpoint(point a, point b);. It would be natural to sort 20 Midshipmen ordered by alpha codes (which would order by class year) if there were a type mid that encapsulated both alpha code and name: I’d have an array mid* A = calloc(20, sizeof(mid));. Finally, it’d be natural to store student names along with homework info if there were a type student: I’d just store it in an array of student objects. Clearly, all of these problems scream out for the ability of the user to wrap up one or more existing types into one package and call it a new type. In C, struct is the mechanism that allows you to do this.
1.1 First example: point struct
Let’s take the example of our midpoint function. We decided that the existence of a type point
would make such a function simple and natural. We need to wrap up a double
for the x-coordinate and a double
for the y-coordinate into a single object of a new type - point
. Here’s how that’s accomplished in C:
struct point { // Declares a new type called "struct point"
double x; // The first "field" of a point is a double named x
double y; // The second "field" of a point is a double named y
}; // Don't forget the ;
This struct definition, like function definitions, appears outside of main
or of any other function definitions, and it must appear before you try to use an object of type struct point
. From the point of this definition onwards you can use struct point
as a new type. If you want to access the double x
within a point object named P
, you write P.x
— note that P.x
is an object of type double, so anything you can do with a double you can do with P.x
! Moreover, it is an l-value: it can be assigned to, have its address taken with &, etc. The objects packaged together in a new struct
are called data members, or fields. We’ll start off simple by creating an object of type point
, reading values into the object, and printing it out:
int main() {
// Creates an object P of type struct point
struct point P;
// Reads & stores coordinate values
printf("Enter x-coord: ");
fflush(stdout);
scanf(" %lg", &P.x);
printf("Enter y-coord: ");
fflush(stdout);
scanf(" %lg", &P.y);
// Writes out point P
printf("Point is (%g, %g)\n", P.x, P.y);
return 0;
}
Think of a struct as something very similar to an array, but where each element in the struct has its own name (instead of an array index) and may also have its own type. That second part is really important and is really where structs get their “power”, as we will see. So even though the struct point
could possibly be replaced by a size-two array of doubles
, this won’t be true when we look at more complicated struct definitions. And besides that, it’s much clearer what a program is doing when we have a variable of type struct point
rather than a variable of type, say double*
.
2 Creating new types
2.1 Typedef
In fact, we’ve already seen a way to make a new type name in C before struct
, with the typedef
keyword. As you know, the cstring
type we used with si204.h
for the first half of the semester was actually defined with:
typedef char cstring[128];
That meant that the type cstring
could be used to create (or pass to/return from functions) an array of chars of length 128.
More generally, typedef
is a way of getting a “type alias”, where we use one name to refer to some type. The exact syntax is that you start with the keyword typedef
, and then you do what looks like a normal variable declaration, except that what would be the name of the variable is the name of your type alias. Here are some more examples:
typedef int card; // type card is the same as type int
typedef double* vec; // type vec is a pointer to a double (or an array of doubles)
typedef card deck[52]; // type deck is an array of 52 cards (which are ints!)
One thing to realize about typedef
is that it’s really just for our convenience as programmers; it doesn’t really allow us to do anything new. However (as in the card
and deck
examples above), besides convenience typedefs can also be useful as a way of making our programs more clear and more flexible.
For example, given the definitions above, if I wanted to make it so cards were represented by a string instead of an int, I might just be able to change the typedef for card to something like
typedef char card[5]; // a card is a string of length at most 4
without having to change every single other definition throughout my program.
2.2 Regular struct usage
Usually in a C program, any struct
s you use are declared at the beginning (or in header files), and given names, and then you use struct yourstructname
as a type within the program’s functions and main
.
For example, here’s a program that creates a struct
for kids with a name and age, and then in the main
reads in two kids and says which one is older:
#include <stdio.h>
struct kid {
char name[128];
int age;
};
int main() {
struct kid a;
struct kid b;
// read names and ages
printf("Enter name and age of both kids:\n");
scanf(" %127s %i", a.name, &a.age); // %127s leaves room for the '\0' in name[128]
scanf(" %127s %i", b.name, &b.age);
// determine who's older
if (a.age > b.age) {
printf("%s is older.\n", a.name);
} else if (b.age > a.age) {
printf("%s is older.\n", b.name);
} else {
printf("%s and %s are the same age.\n", a.name, b.name);
}
return 0;
}
Notice that we have to end the struct declaration with a semicolon ; after the closing curly brace; this is easy to forget! Also notice that an expression such as &a.age does what you expect, which is to get the address of the age field within the kid struct a. That works because the dot operator has higher precedence than the address-of operator &, so the expression is read as &(a.age).
2.3 Odd ways of using structs
Now let’s break down that struct
definition to get a better understanding of what’s going on. All the code in this section is just for understanding the syntax of struct
s in C — I don’t recommend you program like this!
A struct
statement in C
is actually a type. This type has to contain:
- The keyword
struct
- (optionally) A name like
point
- Opening curly brace
{
- Any number of variable declarations (not definitions, just declarations!)
- Closing curly brace
}
Since the name is optional, you must be thinking “what’s the point in declaring a struct with no name?”. Well, just like any other type, you can use this directly in a variable declaration, like:
<struct_type> <variable_name>;
For example, in the previous program we could have not declared anything before main
and just changed the first line of main
to specify the struct type and declare two variables a
and b
with that type, like:
int main() {
struct kid {
char name[128];
int age;
} a, b;
/* ...the rest of main... */
}
And now you can see why the name is optional. In situations like this where you’re just making up a new struct
type and declaring a few variables with that type, and you’ll never use it again, you can leave off the name kid
and nothing is harmed.
The same idea explains what happens when you use just the struct name in a declaration, like we did in the previous part:
int main() {
struct kid a;
struct kid b;
/* ...the rest of main... */
}
Writing struct kid
in this situation is like saying “look up the previous definition of struct kid
and use that again right here”.
Again, I don’t recommend programming this way, but you will see other C code that works like this, so it’s useful to understand what’s going on. Plus, it explains why you have to have that pesky semicolon at the end of a “normal” struct
declaration — that’s telling the compiler that you just declared a struct but you don’t want to declare any variables yet.
2.4 Typedef struct
Some people don’t like having to type struct mystructname
whenever they want to declare variables (or function parameters) with that type, and would rather that mystructname
by itself just stand for that new type.
Lo and behold, you can achieve that effect in C with a very clever (and very common!) use of typedef
and struct
together:
typedef struct {
char name[128];
int age;
} kid;
This looks strange at first — and indeed, it is strange — but it’s just following the normal typedef <type> <name>;
syntax, where the <type>
is itself a nameless struct! If you have the definition above, then in your program you could declare variables like:
kid a;
kid b;
without having to re-use the struct
keyword every time. This is very common to see in C programs. For example, if you look up the definition of FILE
in /usr/include/stdio.h
, you’ll find that it’s just a typedef for a struct:
typedef struct _IO_FILE FILE;
You can find the actual definition of that mysterious struct in /usr/include/libio.h
if you dare to look! The point is, structs are really useful, and with a typedef
you can be using a struct (like FILE
) and not even realize it.
3 Using structs
Now that we know what a struct
definition looks like, what can we do with it?
3.1 Initialization
Let’s say we have the same kid
struct as before:
struct kid {
char name[128];
int age;
};
If we want to declare and define a kid
called theOne
with name "Neo"
and age 35
, we could do it like this:
struct kid theOne;
strcpy(theOne.name, "Neo");
theOne.age = 35;
(Notice that we had to use strcpy
from <string.h>
since you can’t copy arrays (strings) using =
.)
Just like with arrays, we can also use the curly-brace syntax to declare and initialize a struct all at once, for convenience:
struct kid theOne = {"Neo", 35};
In this case, the order of the values in the curly braces has to match the order in the struct
definition.
3.2 Struct assignment
To copy from one struct to another of the same type, you can of course copy each field one at a time, for example:
struct kid a;
struct kid b;
// copy b to a
strcpy(a.name, b.name);
a.age = b.age;
As a nice convenience, the built-in =
assignment operator also works on structs (of the same type!), and does exactly that — it makes a copy of each field from one to another, for example:
struct kid a;
struct kid b;
// copy b to a
a = b;
Simple as that!
3.3 Functions that take or return structs
As you probably guessed, you can write functions with structs too. Going back to our original motivation of getting the midpoint between two points defined with
struct point {
double x;
double y;
};
this function will compute and return the midpoint between the two points:
struct point midpoint(struct point p1, struct point p2) {
struct point mid;
mid.x = (p1.x + p2.x) / 2;
mid.y = (p1.y + p2.y) / 2;
return mid;
}
It’s important to recognize that structure arguments and return values are copied using the copy operation we just learned about. This means that, for example, if the midpoint
function above made any changes to p1.x
, those changes would not go back to the part of the program that called midpoint
, because the function would just be changing a copy of the struct, not the original one.
This is the most important difference to remember between how structs and arrays work. In most ways you can think of a struct as an array that contains a combination of different types and that uses names and the .
dot operator instead of indexes and the []
operator. But this is the key difference: structs are copied when you call a function, and arrays are not (just the pointer to the array is copied).
3.4 What we can’t do
It’s useful to also be clear about two things that are not possible with structs we define:
- Applying any built-in arithmetic or logical operators like +, -, *, /, &&, ||, etc.
- Reading or writing entire structs using scanf and printf.
Since we’re making up these structs, there’s no way the compiler could know how to do these things for any kind of struct. What would it even mean, for example, to divide one struct kid
by another one?
This is also the reason why, when defining our own structs, it’s usually a good idea to also define some helper functions to (at least) read and write them, to make it easier to work with those structs in our programs. In fact, this is a good example of a place to use bottom-up programming: first define the struct you need, then write some useful functions that use that struct, and then you’ll have a much easier time of writing your main program.
4 Structs with pointers and arrays
Now that we understand the basics of using struct
s to store “heterogeneous data”, let’s see how they mix with what we’ve already learned about pointers and arrays. We’re talking about structs that contain pointers and arrays, as well as pointers and arrays of structs.
4.1 Arrays of structs
Making an array of structs is just like making an array of any other type. Going back to our example of 2D points:
struct point {
double x;
double y;
};
let’s say we want to make an array arr
of 50 points. You could declare that array on the stack like this:
struct point arr[50];
or on the heap like this:
struct point* arr = calloc(50, sizeof(struct point));
The same benefits and drawbacks of heap vs stack allocation that we learned about with arrays apply here just as well.
Now if we want to access, say, the y coordinate of the point at index 13, we have to first go into index 13 of the array with the []
operator, and then pull out the y
field with the dot operator, like so:
arr[13].y
That expression is a double, and anything you can do with a double variable, you can also do with arr[13].y.
(And don’t forget to free(arr);
if you allocated using calloc
!)
4.2 Pointers to structs
We just saw one usage of a pointer to a struct in the heap-based allocation example above. As with any other type, a pointer to a struct could be just a pointer to a single object, or to the first object in an array.
For example, here is the prototype for a function that takes a pointer to a single point, and changes the coordinates of that point so that it is rotated 90 degrees counter-clockwise around the origin \((0,0)\).
void rotate(struct point* pt);
All this function needs to do is modify the x
and y
fields of the struct so that the new y-coordinate is the old x-coordinate, and the new x-coordinate is the old y-coordinate times \(-1\).
Accessing these fields inside the function gets a little tricky because the dot operator has higher precedence than the dereference operator. What that means is that if we were to write
*pt.x
in the rotate
function, it wouldn’t work! That would try to first get the x
field of variable pt
, but that’s an error since pt
is not a struct (it’s a pointer to a struct). Instead, we would have to write
(*pt).x
with parentheses to force the operator order. Because writing those parentheses is so annoying, and passing around pointers to structs is so common in C programming, there’s a special operator ->
(looks like an arrow) to do just that. It extracts a field from a struct after dereferencing a pointer to that struct.
Armed with our fancy new operator, we can write the rotate
function as follows:
void rotate(struct point* pt) {
double oldy = pt->y;
pt->y = pt->x; // new y coord is the old x coord
pt->x = - oldy; // new x coord is -1 times old y coord
}
4.3 Structs that contain arrays
Suppose we have a file named grades.txt, which contains student grade information. The file looks like this:
11 students
10 homeworks
Adams 58 96 65 72 93 67 59 74 95 56
Brown 96 67 56 74 94 100 98 68 95 65
.
.
.
telling us initially how many students we have, how many homework scores for each student, and then listing all the student names followed by their homework scores. Now, I’d like to simply read in this data, store it, and then answer user queries concerning the data. For an easy start, we’ll just assume that the query will simply be a student name and we’re supposed to give the homework average for that student.
Now, assuming there are numstu
students and numhw
homeworks, the natural way for us to think of this is to say “I’d like to have an array of numstu objects of type student
.” We know how to construct a struct student
, so that’ll be no problem, but what data members would we need to store this student data? We’d need a char
array to store the name, and we’d need … well, we’d need an array of numhw int
s to store the homework grades. Since we don’t know in advance how big to make this array, we’ll have to use heap allocation for the grades and store it as a pointer of type int*
:
struct student {
int* hw;
char name[128];
};
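Packaged up this way, an object of type struct student representing the student “Brown” from above would look like this: the name field stores the characters of "Brown" inside the struct itself, while the hw field stores only a pointer to a separate array on the heap holding the ten grades 96, 67, 56, 74, 94, 100, 98, 68, 95, 65.
Now, the question is, if variable stu is that student object, what expression would give me the value of the homework assignment with index 2? Well, stu.hw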
is the name of the pointer to the array of grades, so I just need to subscript it with a 2: stu.hw[2]
! I might be tempted at this point to look back at my original problem and say that I’ll create my storage for all this student/grade data with:
struct student* stu = calloc(numstu, sizeof(struct student));
It’s true that stu is now an array containing numstu objects of type student, but remember that each student object has data member hw, which does not point to any array of grades at this juncture (calloc fills the memory with zeros, so on typical systems it is a null pointer). We need to go back and allocate homework-grade arrays for each student object in the array. So creating our storage really looks like this:
struct student* stu = calloc(numstu, sizeof(struct student));
for(int i = 0; i < numstu; ++i) {
stu[i].hw = calloc(numhw, sizeof(int));
}
At this point, writing the program is not very new for us: Here’s my solution. All I did was use the top-down design that we’ve seen so many times. I simply wrote the main function the way I wished I could, and created the structs and functions that made writing main that way possible.
4.4 What assignment and pass-by-value mean with structs
Recall that doing an assignment of one struct to another does member-wise assignment in C, meaning that each field of one struct is copied into the corresponding field of the other.
You have to think carefully about the consequences of this. Let’s say we have two struct student
s called A
and B
. After doing A = B
, the pointers A.hw
and B.hw
have the same value … that means they both point to the same array! This may not be what you expect. Because of this, for example, the line
A.hw[5] = 100;
results in both A
and B
having 100 for the index 5 value of their hw
field.
The same holds true for pass-by-value with structs: you get member-wise copy. So, when you have structs with data members that are pointers, the pointers get copied but the actual array pointed to stays the same. The same is not true when you have fixed-size arrays declared inside the struct (like name in struct student): those actually get copied, element by element, when you copy the struct.
So the main rule to remember, and to be careful and conscious of when programming, is that copying a struct with pointers does not mean that whatever is pointed to also gets copied.
5 Sorting structs
Looking at some examples where we want to sort an array of structs will be a good review of the last few units, and also shows the power and limitations of our “generic” sorting routine using the before
function.
5.1 Sort based on 4th HW
Let’s build on the previous examples with student grades and try to print out the grades in sorted order, rather than answering lookup queries. In particular, let’s sort them by their grades on homework assignment #4, so that the student with lowest grade on assignment #4 comes first, and the student with highest grade on assignment #4 comes last. We’ll have our array stu
of objects of type student, and we’ll use the same old selection sort routine we’ve always been using.
Now, at some point in selection sort, we’ll swap two elements of stu
and I want to take a brief moment to look at what that means. The following picture shows how swapping the elements at index i
and j
affects things. Hopefully you see, once again, that we’re not moving the actual arrays of grades at all, we’re simply moving the pointers to those arrays.
Now, if we’re going to sort this array of student
objects, it’s just like sorting anything else!
void sort(struct student* data, int size) {
for(int length = size; length > 1; --length) {
// Find imax, the index of the largest
int imax = 0;
for(int i = 1; i < length; ++i) {
if (before(data[imax], data[i])) {
imax = i;
}
}
// Swap data[imax] & the last element
struct student temp = data[imax];
data[imax] = data[length - 1];
data[length - 1] = temp;
}
}
So, the only thing that remains is to produce a before
function that will take two student
objects and decide whether the first needs to come before the second. Remember we want our student
objects in order from lowest to highest score on homework #4 (taken here to mean index 4 of the hw array). So, for before(a,b)
, I need to determine whether the homework #4 score for a
is less than the homework #4 score for b
.
int before(struct student a, struct student b) {
return a.hw[4] < b.hw[4];
}
With this, it’s easy to write a program that prints out students and their HW#4 scores ordered from lowest to highest HW#4 score. Here’s a complete program.
5.2 Letting the user choose which grade to sort on breaks our sort scheme!
Now, a better version of the above program would allow the user to choose which homework assignment we sorted on. So, if x
is the assignment number, we’d have to modify before
to be:
int before(struct student a, struct student b) {
return a.hw[x] < b.hw[x];
}
However, there’s a problem! Where does the before
function get x
from? The only way it can get x
is if we pass it in as a parameter. So the function must look like:
int before(struct student a, struct student b, int x) {
return a.hw[x] < b.hw[x];
}
Now, this fixes before
, but it necessitates a change in sort
, which is the function that calls before
:
void sort(struct student* data, int size, int x) {
for(int length = size; length > 1; --length) {
// Find imax, the index of the largest
int imax = 0;
for(int i = 1; i < length; ++i) {
if (before(data[imax], data[i], x)) {
imax = i;
}
}
// Swap data[imax] & the last element
struct student temp = data[imax];
data[imax] = data[length - 1];
data[length - 1] = temp;
}
}
Notice that sort
doesn’t really do anything with this index x
on its own, but it passes that parameter along to the before
function.
So really our generic sorting setup is flexible enough to handle sorting based on any homework number, but in this case we have to add an extra parameter to the sort
and before
functions to pass along that extra information.
6 Composition of structs
Imagine a scenario in which we are performing experiments with cockroaches. We’ll suppose we get data readings for one of our roaches that give us a time in hh:mm:ss format, and a position in (x,y) coordinates. For example, trial.txt. We want the user to be able to enter a time in hh:mm:ss format and we’ll tell them where the roach is (i.e. en route between which two points).
We’ve dealt with points before, and we will again, so let’s go ahead and give the basic struct definition & function prototypes for points:
//--- POINT ---------------------------------//
struct point {
double x;
double y;
};
void read_point(struct point* pt, FILE* fin);
void write_point(struct point pt, FILE* fout);
Now, while we may not have dealt with times in hh:mm:ss for a while, it is not unlikely we’ll have to deal with such things again. Therefore, it is natural to give a struct definition and some prototypes for an hhmmss struct:
//--- TIME IN HH:MM:SS ----------------------//
struct hhmmss {
int hrs;
int mins;
int secs;
};
void read_hhmmss(struct hhmmss* time, FILE* fin);
int before(struct hhmmss time1, struct hhmmss time2);
Being a little adventurous, I’m living on the edge and defining the before
function for hhmmss
objects. Might come in useful if I have to figure out what happened first! Now, a data reading consists of a time and a position, so it might be nice to have a datum
struct that records a single data reading. It’d look something like this
struct datum {
struct point position;
struct hhmmss time;
};
… and I’d probably want a function void read_datum(struct datum* dat, FILE* fin);
to read in such objects. With all of this (and with all these functions defined!) writing my main function is not too difficult:
// Read and store data readings
struct datum* path = calloc(num, sizeof(struct datum));
for(int i=0; i < num; ++i) {
read_datum(&path[i], fin);
}
// Get the query time from the user
struct hhmmss time;
printf("Enter a time: ");
fflush(stdout);
read_hhmmss(&time, stdin);
// Find the first sighting at or after given time
int k = 0;
while (k < num && before(path[k].time, time)) {
++k;
}
Then it would just remain to write out the information to the user. Take a look at this complete program. There are several important things to look at here:
- I used bottom-up design to solve this problem. I started off with the pieces (structs and functions) that I knew I could define easily, and that I knew would come in handy here and probably in other programs as well. Then I started to put these pieces together to create a program.
- My datum struct contains as data members two other structs I defined: point and hhmmss. This is called composition of data types.
- Notice how we have to be careful that all the function names are distinct when we start throwing a bunch of functions and structs into a larger program. That’s why we have to have read_point, read_hhmmss, and read_datum functions.
- Notice how I used short-circuit evaluation of boolean expressions in my while-loop. The program may well have crashed without the short-circuit feature, because the array would be indexed at k even after k had run past the last reading!