Measurement Consistency, an Introduction : Cliff Stamp

The need for performance comparisons

When any aspect of a knife's performance is measured, there has to be a reference point before the result can be interpreted. Performance can only be judged to be high or low if the results are known for other knives. Consider a bowie knife made from a new wonder steel which chops through twenty 2x4's and still shaves. Is this high or low performance? To get a point of comparison, a well-known steel such as 154CM could be used as a reference. Assume a 154CM bowie chopped fifteen 2x4's before it stopped shaving. Twenty is more than fifteen, but is this really proof the new steel is superior? Would the results actually be fifteen and twenty each time the test was run with those knives?

The necessity of judging consistency

Suppose the cutting was repeated, and on the next run the 154CM blade cut through twenty-two 2x4's while the wonder steel only managed eighteen. Now there seems to be too much variability: the results change too much to say which steel performs better, because on one run the 154CM blade was better and on the other run the new steel was better. In other words, there is a need for some way to determine the general performance, and some measure of consistency, so that a meaningful comparison can be made. These two quantities are the basic statistical values of the average and the standard error of the average.
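For reference, with n runs giving results x_1 through x_n, these two quantities are usually defined as below. Here s is the sample standard deviation in its n-1 form, which is what the spreadsheet function STDEV computes; the symbols are chosen for this sketch and do not appear in the original text.

```latex
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad
s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2}, \qquad
\mathrm{SE} = \frac{s}{\sqrt{n}}
```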

The basic numerical quantities

Continuing this simulated experiment, five runs are made with each knife, chopping 2x4's until the blade loses shaving ability: the 154CM knife achieves 15, 22, 16, 14 and 20, and the wonder steel gets 20, 18, 22, 19 and 23. The average and standard error are calculated for each knife and shown in the EXCEL sheet at the right. As an example, for the 154CM knife the average is "=AVERAGE(B3:B7)" and the standard error is "=STDEV(B3:B7)/SQRT(5)", where the five in the square root is the number of runs.
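The same calculation can be sketched outside a spreadsheet. This is a minimal Python version, assuming only the five-run numbers given above; `statistics.stdev` uses the n-1 sample form, so it should match the spreadsheet STDEV formula. The helper name `standard_error` is made up for this illustration.

```python
from math import sqrt
from statistics import mean, stdev

# Number of 2x4's chopped before the edge stopped shaving, five runs per knife
runs_154cm = [15, 22, 16, 14, 20]
runs_wonder = [20, 18, 22, 19, 23]

def standard_error(values):
    """Sample standard deviation divided by the square root of the
    number of runs, i.e. STDEV(range)/SQRT(n) in the spreadsheet."""
    return stdev(values) / sqrt(len(values))

for name, runs in [("154CM", runs_154cm), ("wonder steel", runs_wonder)]:
    print(f"{name}: {mean(runs):.1f} +/- {standard_error(runs):.1f}")
```

Run as written, this should print 17.4 +/- 1.5 for 154CM and 20.4 +/- 0.9 for the wonder steel, the values used in the comparison that follows.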

How to compare results

The procedure for comparing the results, to see whether they differ and whether one is superior, is very straightforward. The average for the 154CM knife is 17.4 with a standard error of 1.5, so it is written as 17.4 +/- 1.5, which means the average performance can be expected to lie between 17.4-1.5 and 17.4+1.5, or from 15.9 to 18.9. The wonder steel is 20.4 +/- 0.9, so it spreads from 19.5 to 21.3. Since the ranges do not overlap (the low end for the wonder steel, 19.5, is higher than the high end for 154CM, 18.9), this is evidence (at least for these two knives) that the wonder steel is significantly superior.
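The overlap check can be written out in a few lines. This sketch assumes the "average plus or minus one standard error" ranges described above; the helper names `spread` and `overlaps` are hypothetical. It follows the rule of thumb in the text and is not a formal significance test such as a t-test.

```python
from math import sqrt
from statistics import mean, stdev

def spread(values):
    """Return (low, high): the average minus and plus one standard error."""
    avg = mean(values)
    se = stdev(values) / sqrt(len(values))
    return avg - se, avg + se

def overlaps(a, b):
    """True if the two (low, high) ranges share any values."""
    return a[0] <= b[1] and b[0] <= a[1]

range_154cm = spread([15, 22, 16, 14, 20])
range_wonder = spread([20, 18, 22, 19, 23])

print(f"154CM: {range_154cm[0]:.1f} to {range_154cm[1]:.1f}")         # 15.9 to 18.9
print(f"wonder steel: {range_wonder[0]:.1f} to {range_wonder[1]:.1f}")  # 19.5 to 21.3
print("ranges overlap:", overlaps(range_154cm, range_wonder))          # False
```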

Larger trials, the power of averaging

There is a lot of variation in the results, as would be expected for such work: the wood changes from one 2x4 to the next, as anyone who cuts lumber knows, and the force and angle of each chop also change a lot over the various runs. To make a clearer statement of superiority, simply make more runs. The table at the right shows the same experiment continued to ten runs. 154CM now spreads from 15.4 to 17.2 and the wonder steel from 19.4 to 20.6. The average will not change significantly, but the measure of consistency (the standard error) will get smaller and smaller as more runs are made, and thus the spreads will narrow. If there is a real difference, it will become clearer as the spreads separate more distinctly.
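Why more runs help follows from the standard error formula, which divides by the square root of the number of runs. This is a rough sketch, assuming (an assumption, not a measured result) that the run-to-run standard deviation stays near the five-run 154CM value of about 3.4; the function name `expected_standard_error` is hypothetical.

```python
from math import sqrt

# Assumption: run-to-run standard deviation stays near the five-run 154CM value
RUN_SPREAD = 3.435

def expected_standard_error(spread, n):
    """Standard error for n runs if the spread between runs stays fixed."""
    return spread / sqrt(n)

for n in (5, 10, 20, 40):
    print(f"{n:>2} runs: +/- {expected_standard_error(RUN_SPREAD, n):.2f}")
```

Quadrupling the number of runs halves the standard error. The ten-run spread quoted above for 154CM (about +/-0.9) is in the same neighborhood as the estimate for ten runs here; the exact figure depends on how much the extra runs actually vary.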


In order to judge whether a knife's performance is actually superior, it has to be compared to at least one other knife. To make such a comparison meaningful, the average and the consistency (standard error) of the performance should be calculated for each knife. If the performance spreads do not overlap, then one knife can be said to have been shown to be superior.


Comments can be emailed to cliffstamp@[REMOVE] Delete the [REMOVE] to respond.

Written: August, 2007 Copyright (c) 2007 : Cliff Stamp