"If you look at an introductory statistics textbook," begins Colin Goodall, "what you see are scatter plots of data, tables of heights and weights, graphs of leaf sizes against time . . .
"It's a very restricted view of statistics — to think solely in terms of comparing one set of numbers to another."
Goodall and his students compare shapes and patterns and forms.
"It's been evolving, actually." An associate professor at Penn State, he stretches back to pull a book off a shelf, D'Arcy Thompson's 1917 On Growth and Form. "There are some classic pictures here" — he thumbs to page 295 — "describing the morphological differences between these three species of beetles." The beetle shapes sit on funhouse-mirror grids, as if the graph paper had been stretched to accommodate a mandible here, shrunk or wrinkled for another there.
"I don't really know how he did this. He produced these drawings without leaving any mathematical specifics." Goodall smiles, drops his voice. "He might have deformed the grid by eye."
Since the '70s, when computers began making libraries of images available, a handful of statisticians like Goodall have sought to make the comparison of shapes rigorous. They came to it with various needs. Goodall pulls two more books from the shelf, both by F.L. Bookstein of the University of Michigan. "Bookstein is very close to being a biologist," he notes. To analyze animal shapes, Bookstein added the mathematical concepts of landmarks (the tip of the tail, the point of the jaw, the sutures between bones) and bending energy (imagine the graph paper is metal, physically deformed by a certain amount of energy to match a new specimen) to Thompson's grids. David Kendall, another whose work Goodall is building on, "is a probabilist, a mathematician, not a biologist," at the University of Cambridge. "Kendall," says Goodall (reaching for another book, and a stack of transparencies), "was motivated by archeoastronomy, by the alignments of standing stones." (What's the probability, for example, that Stonehenge's alignment with the solstice sunrise is accidental?) "He's developed a rigorous theory of shape, work that draws on differential geometry and an area of probability known as geometrical probability." It includes, Goodall adds, the concept of "shape spaces."
To compare the roughly triangular pharyngeal bones of fish, in order to separate two subspecies, Goodall and graduate student Bill Cooper constructed a shape space with seven landmarks. Using the shapes of the entire fish, Cooper and Goodall were clearly able to separate four out of five closely related species.
Goodall himself entered shape statistics as a doctoral student at Stanford (he gets out his thesis, adds it to the stack on the table), wanting to apply statistics to developmental biology: He studied the development of a leaf, cell by cell, beginning with the "leaf primordia," which first appear as bumps on the tip of the stem.
To date, he and his graduate student and faculty collaborators have applied shape statistics to behavioral development (when a pregnant rat's diet is deficient, how does it affect the pups' head shapes?), machine vision (how can a robot distinguish between two very similar stacks of papers on a desk?), food science (how do you analyze the scores when 10 judges rate six different egg dishes on seven attributes, but each judge interprets "taste," "smell," and "texture" differently, and each chooses different attributes to record?), reverse engineering (given a car part, perhaps made in Japan, how do you unravel its surfaces, and so learn to make it?), and zoology (what are the shape differences between two subspecies of Cichlid, a fish found in Lake Malawi?).
"Comparing two samples of numbers is a classic problem in statistics," Goodall notes. "Say you have two sets of heights, 20 men's heights, 20 women's. You compare the difference in mean heights with the variability within each sample.
"When you're comparing two samples of shapes, you can do essentially the same thing, but it's much more complicated.
"Take 10 different triangles. What's the mean shape? How do you average 10 triangles? Now you have a second sample. Say you've somehow found the mean shape of those triangles. How are you going to compare the difference between these two mean shapes to the variability?
"It's not nearly so straightforward as in the traditional, simple case of two batches of numbers."
He spreads the transparencies out on the desk, chooses a diagram of a sphere peppered with various triangles. "This shows how you can associate each triangular shape with a point on a sphere. With the North Pole, we associate an equilateral triangle" — the points are labeled 1, 2, and 3; 3 points up. "If you start here at the North Pole and run down what could be called the Greenwich Meridian, the triangle flattens until you reach the equator, which is colinear triples" — 1-3-2 in a straight line. "Then landmark number 3 moves through, and the triangle unflattens and becomes an equilateral triangle again at the South Pole" — point down. On each meridian — each great circle around the sphere through the poles — the triangle deforms from point-up equilateral to point-down in a different way, stretched and shrunk by the relative movements of its three landmarks.
"A triangle of data with landmarks labeled 1,2,3 can be matched to exactly one of the triangles on the sphere," Goodall says, "by means of translation, scaling, and rotation." The process is called "Procrustean superimposition." ("Procrustes is from Greek mythology," Goodall explains. "He was an innkeeper who had only one bed. If his guests were too small for the bed, they were stretched. If too big, their feet were chopped off.") The Procrustean distance is "the shortest distance between sets of landmarks across all possible superimpositions," Goodall says, an idea akin to the "best-fit" curve.
"This is where my specific contributions to shape analysis began," says Goodall. In the early '80s, "before anyone in the United States took proper notice of Kendall's work," Goodall began applying the Procrustes technique to the statistical analysis of shape.
"The Procrustes metric provides a sort of natural mathematical framework," he explains. "The Procrustean distance is like Euclidean distance in the usual Euclidean geometry. It turns out that for any two triangles on the shape sphere, the Procrustean distance is a simple function of the great circle distance around the sphere. Which means that, in analyzing the shapes of triangles, you can analyze the data as though you'd collected data on a sphere."
The same is true for any shape defined by a set of landmarks — in a sense. "Shape space" is defined as the complex projective space that has a dimension equal to the number of landmarks minus two, Goodall explains. (In this geometry, even the concept of "dimension" is complex: A sphere is the equivalent of a line in Euclidean geometry, which has length, but no depth or height.) "So for triangles," continues Goodall, "the dimension is one, and shape space is just a sphere." For four landmarks, it's a rather hard-to-imagine donut: "Take a circle and associate with each point of the circle another circle. That's a donut. Now, for shape spaces, do the same operation, but using spheres instead of circles."
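The dimension count can be checked by subtracting the degrees of freedom that superimposition removes. As a brief worked equation (the symbol k for the number of landmarks and Kendall's notation for the shape space of k planar landmarks are assumptions introduced here, not taken from the article):

```latex
\dim_{\mathbb{R}} \Sigma_2^k
  = \underbrace{2k}_{\text{coordinates}}
  - \underbrace{2}_{\text{translation}}
  - \underbrace{1}_{\text{scale}}
  - \underbrace{1}_{\text{rotation}}
  = 2k - 4 = 2(k - 2),
\qquad
\dim_{\mathbb{C}} \Sigma_2^k = k - 2 .
```

For triangles, k = 3 gives one complex dimension, which is two real dimensions, the surface of a sphere. For the seven-landmark fish bones, k = 7 gives five complex dimensions, or ten real ones.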
As the landmarks increase, the shape space grows ever more complex. Goodall swivels to click on the computer behind him: a sphere full of vaguely triangular shapes fills the screen. "These are the actual shapes of 17 pharyngeal bones — the lower jaw, I think — of fish. They each have seven landmarks." The bones are from two subspecies of Cichlid that Jay Stauffer, a Penn State professor of fishery sciences, had collected in Lake Malawi; Stauffer had asked Goodall and graduate student Bill Cooper to separate the species by jawbone and to quantify the variation within each.
"What I'm showing here is the principle complex direction of the variation in the shapes. If I'm going to summarize the variation with a single sphere, this one is the best, the one that captures the most. You can also see that the actual variation in shape for these data is really quite small compared to the total variation that is theoretically — from the shape space point of view — available."
Does this sphere of vague triangles separate the species? "Not very well, no. But using the bending energy approach, one can find directions that discriminate better. Bending energy takes into account the closeness of the landmarks — it expects nearby landmarks to deform similarly.
Goodall leans back in his chair, away from the stack of books and the computer's slowly spinning sphere. "For each particular application, you have to modify your approach to suit the data," he explains. "The use of bending energy in this case is analogous, for instance, to taking covariance into account in traditional statistical studies. The Procrustes metric provides the starting point for a statistical analysis of shape — it's the simplest analogue of the two-sample test. What I'm doing now is spending time reinforcing the theory."
Colin Goodall, Ph.D., is associate professor in the Eberly College of Science, University Park, PA 16802; 814-865-3993. Graduate students working with him include: Pramod Chikuse, a Ph.D. candidate in mechanical engineering; William Cooper, a master's student in statistics who is also working with Jay Stauffer, Ph.D., professor of ichthyology; and Magda Nitica, a master's student in statistics working on archeoastronomy. Goodall's research is funded by the National Science Foundation, the EPA, and the Pennsylvania Department of Agriculture.