Sed and awk are two Unix editing tools that are particularly useful for processing data taken in lab. Awk
is programmable, with a syntax that resembles C and shell programming.
If you have little or no programming experience, and you find it difficult to use the support programs
included in the manual, try using awk to process your data. It is easy to learn, and there are many awk
scripts throughout the manual. Awk is a free download for Win32 machines from the physics
department website, and is part of any Linux installation.
As an example we will process data for the Nuclear Counting Statistics experiment.
Suppose that we count nuclear decays for 10-second periods, obtaining the following 20 measurements:
1016
1207
1186
1244
1110
1099
1099
1185
1220
1286
1117
1280
1190
1083
1177
1189
1200
1201
1188
1291
We enter these into a file called list, one number per line as shown above, so that the command
cat list
produces the output above. We need to find the largest and smallest values. That is easy for 20 data points, but what if we had hundreds? We can sort the data using the Unix program sort, with the -n switch telling sort to order the lines by numeric value:
sort -n < list > sorted_list
which looks like this:
1016
1083
1099
1099
1110
1117
1177
1185
1186
1188
1189
1190
1200
1201
1207
1220
1244
1280
1286
1291
With some versions of sort this file will have a blank first line, in which case the
second line of the file sorted_list and the last line are the smallest and largest entries
in our data file. Check for this behaviour when running sort for the first time, and check
which field awk sees as the smallest entry: if it is the first field, replace $2 with $1 in
the script below. We can pipe the output of sort into an awk script, min_max.awk, that reads
the whole file and reports the second and last lines to the console. To do this we tell awk
to treat the entire file as a single record in which each line is a field. The record
separator RS is set to the empty string, so that only blank lines separate records, and the
field separator FS is set to a newline character.
In its normal mode of operation, awk acts on each line of a file in turn, treating each line as a
record with several data fields separated by a user-defined field separator. The fields
are accessed by the names $1, $2, and so forth. The last data field is called $NF. Here
is the awk script min_max.awk:
#min_max.awk
BEGIN { FS = "\n"; RS = "" }
{ print "smallest is " , $2, " " , " largest is", $NF }
We can obtain the largest and smallest numbers in our data set by running sort on the file list and piping the output into this awk script:
sort -n < list | awk -f min_max.awk
which produces the output line
smallest is  1016    largest is 1291
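The same two numbers can also be found without sorting at all. The following is a minimal sketch, not one of the manual's scripts: the shell function name min_max is an illustrative choice, and it assumes one number per line on standard input. Because it skips blank lines and never relies on line positions, it does not depend on whether a blank first line is present.

```shell
# min_max (hypothetical helper name): report the smallest and largest
# number read from standard input, one number per line.
min_max() {
    awk '
        NF == 0 { next }                     # ignore blank lines
                { x = $1 + 0 }               # force a numeric value
        !seen   { min = max = x; seen = 1 }  # first number initialises both
        x < min { min = x }
        x > max { max = x }
        END     { if (seen) print "smallest is", min, "largest is", max }
    '
}

# The 20 measurements from the file list, fed in directly.
printf '%s\n' 1016 1207 1186 1244 1110 1099 1099 1185 1220 1286 \
              1117 1280 1190 1083 1177 1189 1200 1201 1188 1291 | min_max
```

With the data already stored in the file list, the equivalent call would be min_max < list.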
The next stage in processing the data for this experiment is to sort the data into bins of a given width in count-space. The range of counts is 1291 - 1016 = 275, which the script below covers with 11 bins of width 25:

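#bin_sort.awk
{
    i = 0
    while ( i < 11 ) {
        # Both bin edges are inclusive, so a value lying exactly on an
        # interior edge would be printed for two neighbouring bins.
        if ( $1 >= 1016+25*i && $1 <= 1016+25+25*i )
            print i, " ", $1
        ++i
    }
}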
This will run through the loop for each line of the file list in turn, assign each data point to a bin, and print the bin first, then the data point separated by a blank space. The output of
awk -f bin_sort.awk list
is
0 1016
7 1207
6 1186
9 1244
3 1110
3 1099
3 1099
6 1185
8 1220
10 1286
4 1117
10 1280
6 1190
2 1083
6 1177
6 1189
7 1200
7 1201
6 1188
10 1291
We can sort this data by running
awk -f bin_sort.awk list | sort -n
which will produce the sorted output below. Note that sort will perform numeric sorting based on the first field of each data line.
0 1016
2 1083
3 1099
3 1099
3 1110
4 1117
6 1177
6 1185
6 1186
6 1188
6 1189
6 1190
7 1200
7 1201
7 1207
8 1220
9 1244
10 1280
10 1286
10 1291
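The loop in bin_sort.awk tests every bin for every data point. Since all bins have the same width, the bin number can also be computed directly by integer division. The sketch below assumes the same lower edge (1016), bin width (25) and number of bins (11) as bin_sort.awk; the function name bin_index and the variable names lo, width and nbins are illustrative and do not come from the manual. Unlike the loop, it assigns a value lying exactly on an interior bin edge to the upper bin only, and it places the top edge, 1291, in the last bin.

```shell
# bin_index (hypothetical helper name): print "bin value" for each
# number read from standard input.
bin_index() {
    awk -v lo=1016 -v width=25 -v nbins=11 '
        NF == 0 { next }                               # ignore blank lines
        $1 < lo || $1 > lo + nbins * width { next }    # outside the binned range
        {
            b = int(($1 - lo) / width)                 # integer part gives the bin
            if (b == nbins) b = nbins - 1              # top edge goes in the last bin
            print b, $1
        }
    '
}

# A few of the measurements from the file list.
printf '%s\n' 1016 1207 1117 1291 | bin_index
```

For the 20 measurements in list, bin_index < list should assign the same bins as the loop, since none of those values lies on an interior bin edge.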
We can do even better by telling bin_sort.awk to print only the bin number of each data
point and piping the output through sort into the uniq command. With the -c switch, uniq counts
the number of occurrences of each bin label in the list and outputs two columns, bin population
followed by bin number.
#bin_sort.awk
{
    i = 0
    while ( i < 11 ) {
        if ( $1 >= 1016+25*i && $1 <= 1016+25+25*i )
            print i
        ++i
    }
}
We run the command
awk -f bin_sort.awk list | sort -n | uniq -c
which produces output
1 0
1 2
3 3
1 4
6 6
3 7
1 8
1 9
3 10
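The counting step can also be done inside awk with an associative array, which is one way to see what sort -n | uniq -c is doing here. This is a sketch only; the function name count_labels is an illustrative choice, and it assumes one bin label per line on standard input. Because awk does not guarantee the order in which array elements are visited, the result is sorted numerically on the second column afterwards.

```shell
# count_labels (hypothetical helper name): count how often each label
# occurs on standard input and print "population label", ordered by label.
count_labels() {
    awk '
        NF  { count[$1]++ }                        # tally each non-blank label
        END { for (label in count) print count[label], label }
    ' | sort -k2,2n                                # order rows by bin label
}

# A short list of bin labels as bin_sort.awk would print them.
printf '%s\n' 3 0 3 10 3 | count_labels
```

In the pipeline above, this function would take the place of the last two stages, as in awk -f bin_sort.awk list | count_labels.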
We can save this data to a file called bin_pop with
awk -f bin_sort.awk list | sort -n | uniq -c >bin_pop
and print out the contents of the file in a nice format that can be imported into a LaTeX lab report as a table
awk '{ print $2, "&", $1, "\\\\", "\\hline"}' bin_pop
which produces
0 & 1 \\ \hline
2 & 1 \\ \hline
3 & 3 \\ \hline
4 & 1 \\ \hline
6 & 6 \\ \hline
7 & 3 \\ \hline
8 & 1 \\ \hline
9 & 1 \\ \hline
10 & 3 \\ \hline
We now add a few lines to this and we have our processed data in the form of a nice LaTeX table.
\begin{tabular}{|c|c|}\hline
Bin label & Bin population \\ \hline
0 & 1 \\ \hline
2 & 1 \\ \hline
3 & 3 \\ \hline
4 & 1 \\ \hline
6 & 6 \\ \hline
7 & 3 \\ \hline
8 & 1 \\ \hline
9 & 1 \\ \hline
10 & 3 \\ \hline
\end{tabular}
and this prints as seen below
| Bin label | Bin population |
| 0 | 1 |
| 2 | 1 |
| 3 | 3 |
| 4 | 1 |
| 6 | 6 |
| 7 | 3 |
| 8 | 1 |
| 9 | 1 |
| 10 | 3 |
We can include another column for the center of the bin with
awk -f table.awk bin_pop
using the script
#table.awk
{ print $2, "&", $1, "&", 1016+12+25*$2, "\\\\", "\\hline"}
which produces the output
0 & 1 & 1028 \\ \hline
2 & 1 & 1078 \\ \hline
3 & 3 & 1103 \\ \hline
4 & 1 & 1128 \\ \hline
6 & 6 & 1178 \\ \hline
7 & 3 & 1203 \\ \hline
8 & 1 & 1228 \\ \hline
9 & 1 & 1253 \\ \hline
10 & 3 & 1278 \\ \hline
After adding a few LaTeX lines this becomes the table
| Bin | Bin pop. | Bin center |
| 0 | 1 | 1028 |
| 2 | 1 | 1078 |
| 3 | 3 | 1103 |
| 4 | 1 | 1128 |
| 6 | 6 | 1178 |
| 7 | 3 | 1203 |
| 8 | 1 | 1228 |
| 9 | 1 | 1253 |
| 10 | 3 | 1278 |
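The surrounding LaTeX lines need not be added by hand: awk's BEGIN and END blocks run before the first and after the last input line, so they can print the opening and closing of the tabular environment. The following is a sketch under stated assumptions: the function name latex_table is illustrative, the input is assumed to have the two-column layout of bin_pop (population, then bin number), and the column headings are copied from the table above.

```shell
# latex_table (hypothetical helper name): turn "population bin" lines on
# standard input into a complete three-column LaTeX tabular environment.
latex_table() {
    awk '
        BEGIN   { print "\\begin{tabular}{|c|c|c|}\\hline"
                  print "Bin & Bin pop. & Bin center \\\\ \\hline" }
        NF >= 2 { print $2, "&", $1, "&", 1016+12+25*$2, "\\\\", "\\hline" }
        END     { print "\\end{tabular}" }
    '
}

# The first three lines of bin_pop, fed in directly.
printf '1 0\n1 2\n3 3\n' | latex_table
```

With the file from the earlier step, the call would be latex_table < bin_pop, and its output could be redirected into a file for inclusion in the lab report.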