One of the most common programming actions is to print, or output, some or all of the input. Use the print statement for simple output, and the printf statement for fancier formatting.
The print statement is not limited when computing which values to print. However, with two exceptions, you cannot specify how to print them--how many columns, whether to use exponential notation or not, and so on. (For the exceptions, see section Output Separators, and section Controlling Numeric Output with print.) For that, you need the printf statement (see section Using printf Statements for Fancier Printing).
Besides basic and formatted printing, this major node also covers I/O redirections to files and pipes, introduces the special file names that @command{gawk} processes internally, and discusses the close built-in function.
The print Statement
The print statement is used to produce output with simple, standardized formatting. Specify only the strings or numbers to print, in a list separated by commas. They are output, separated by single spaces, followed by a newline. The statement looks like this:
print item1, item2, ...
The entire list of items may be optionally enclosed in parentheses. The parentheses are necessary if any of the item expressions uses the `>' relational operator; otherwise it could be confused with a redirection (see section Redirecting Output of print and printf).
The items to print can be constant strings or numbers, fields of the current record (such as $1), variables, or any @command{awk} expression. Numeric values are converted to strings and then printed.

The simple statement `print' with no items is equivalent to `print $0': it prints the entire current record. To print a blank line, use `print ""', where "" is the empty string.
To print a fixed piece of text, use a string constant, such as "Don't Panic", as one item. If you forget to use the double quote characters, your text is taken as an @command{awk} expression, and you will probably get an error. Keep in mind that a space is printed between any two items.
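For example, here is a one-line illustration (a sketch; any @command{awk} should produce the same output):

$ awk 'BEGIN { print "The answer is", 6 * 7 }'
-| The answer is 42

The number 42 is converted to a string, and a single space separates the two items.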
Examples of print Statements
Each print statement makes at least one line of output. However, it isn't limited to only one line. If an item value is a string that contains a newline, the newline is output along with the rest of the string. A single print statement can make any number of lines this way.
The following is an example of printing a string that contains embedded newlines (the `\n' is an escape sequence, used to represent the newline character; see section Escape Sequences):
$ awk 'BEGIN { print "line one\nline two\nline three" }'
-| line one
-| line two
-| line three
The next example, which is run on the `inventory-shipped' file, prints the first two fields of each input record, with a space between them:
$ awk '{ print $1, $2 }' inventory-shipped
-| Jan 13
-| Feb 15
-| Mar 15
...
A common mistake in using the print statement is to omit the comma between two items. This often has the effect of making the items run together in the output, with no space. The reason for this is that juxtaposing two string expressions in @command{awk} means to concatenate them. Here is the same program, without the comma:
$ awk '{ print $1 $2 }' inventory-shipped
-| Jan13
-| Feb15
-| Mar15
...
To someone unfamiliar with the `inventory-shipped' file, neither example's output makes much sense. A heading line at the beginning would make it clearer. Let's add some headings to our table of months ($1) and green crates shipped ($2). We do this using the BEGIN pattern (see section The BEGIN and END Special Patterns) so that the headings are only printed once:
awk 'BEGIN { print "Month Crates"
             print "----- ------" }
           { print $1, $2 }' inventory-shipped
When run, the program prints the following:
Month Crates
----- ------
Jan 13
Feb 15
Mar 15
...
The only problem, however, is that the headings and the table data don't line up! We can fix this by printing some spaces between the two fields:
awk 'BEGIN { print "Month Crates"
             print "----- ------" }
           { print $1, "     ", $2 }' inventory-shipped
Lining up columns this way can get pretty complicated when there are many columns to fix. Counting spaces for two or three columns is simple, but any more than this can take up a lot of time. This is why the printf statement was created (see section Using printf Statements for Fancier Printing); one of its specialties is lining up columns of data.
Note: You can continue either a print or printf statement simply by putting a newline after any comma (see section @command{awk} Statements Versus Lines).
As mentioned previously, a print statement contains a list of items separated by commas. In the output, the items are normally separated by single spaces. However, this doesn't need to be the case; a single space is only the default. Any string of characters may be used as the output field separator by setting the built-in variable OFS. The initial value of this variable is the string " "---that is, a single space.
The output from an entire print statement is called an output record. Each print statement outputs one output record, and then outputs a string called the output record separator (or ORS). The initial value of ORS is the string "\n"; i.e., a newline character. Thus, each print statement normally makes a separate line.
In order to change how output fields and records are separated, assign new values to the variables OFS and ORS. The usual place to do this is in the BEGIN rule (see section The BEGIN and END Special Patterns), so that it happens before any input is processed. It can also be done with assignments on the command line, before the names of the input files, or using the @option{-v} command-line option (see section Command-Line Options).
The following example prints the first and second fields of each input
record, separated by a semicolon, with a blank line added after each
newline:
$ awk 'BEGIN { OFS = ";"; ORS = "\n\n" }
>            { print $1, $2 }' BBS-list
-| aardvark;555-5553
-|
-| alpo-net;555-3412
-|
-| barfly;555-7685
...
If the value of ORS does not contain a newline, the program's output is run together on a single line.
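For example, this sketch separates the records with a semicolon and a space instead of a newline (only the first few records of `BBS-list' are shown):

$ awk 'BEGIN { ORS = "; " } { print $1 }' BBS-list
-| aardvark; alpo-net; barfly; ...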
Controlling Numeric Output with print
When the print statement is used to print numeric values, @command{awk} internally converts the number to a string of characters and prints that string. @command{awk} uses the sprintf function to do this conversion (see section String Manipulation Functions). For now, it suffices to say that the sprintf function accepts a format specification that tells it how to format numbers (or strings), and that there are a number of different ways in which numbers can be formatted. The different format specifications are discussed more fully in section Format-Control Letters.
The built-in variable OFMT contains the default format specification that print uses with sprintf when it wants to convert a number to a string for printing. The default value of OFMT is "%.6g".
The way print prints numbers can be changed by supplying different format specifications as the value of OFMT, as shown in the following example:
$ awk 'BEGIN {
>   OFMT = "%.0f"    # print numbers as integers (rounds)
>   print 17.23, 17.54 }'
-| 17 18
According to the POSIX standard, @command{awk}'s behavior is undefined if OFMT contains anything but a floating-point conversion specification. (d.c.)
Using printf Statements for Fancier Printing
For more precise control over the output format than what is normally provided by print, use printf. printf can be used to specify the width to use for each item, as well as various formatting choices for numbers (such as what output base to use, whether to print an exponent, whether to print a sign, and how many digits to print after the decimal point). This is done by supplying a string, called the format string, that controls how and where to print the other arguments.
Introduction to the printf Statement
A simple printf statement looks like this:
printf format, item1, item2, ...
The entire list of arguments may optionally be enclosed in parentheses. The parentheses are necessary if any of the item expressions use the `>' relational operator; otherwise it can be confused with a redirection (see section Redirecting Output of print and printf).
The difference between printf and print is the format argument. This is an expression whose value is taken as a string; it specifies how to output each of the other arguments. It is called the format string.

The format string is very similar to that in the ISO C library function printf. Most of format is text to output verbatim. Scattered among this text are format specifiers---one per item. Each format specifier says to output the next item in the argument list at that place in the format.
The printf statement does not automatically append a newline to its output. It outputs only what the format string specifies. So if a newline is needed, you must include one in the format string.
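For instance, in this small sketch the two printf statements produce a single line of output, because the first format string does not contain a `\n':

$ awk 'BEGIN { printf "no newline yet"; printf " -- now there is one\n" }'
-| no newline yet -- now there is one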
The output separator variables OFS and ORS have no effect on printf statements. For example:
$ awk 'BEGIN {
>    ORS = "\nOUCH!\n"; OFS = "+"
>    msg = "Dont Panic!"
>    printf "%s\n", msg
> }'
-| Dont Panic!
Here, neither the `+' nor the `OUCH' appear when the message is printed.
A format specifier starts with the character `%' and ends with a format-control letter---it tells the printf statement how to output one item. The format-control letter specifies what kind of value to print. The rest of the format specifier is made up of optional modifiers that control how to print the value, such as the field width. Here is a list of the format-control letters:
%c
This prints a number as an ASCII character; thus, `printf "%c", 65' outputs the letter `A'. The output for a string value is the first character of the string.
%d, %i
These are equivalent; both print a decimal integer.
%e, %E
These print a number in scientific (exponential) notation; for example:
printf "%4.3e\n", 1950
prints `1.950e+03', with a total of four significant figures, three of which follow the decimal point. (The `4.3' represents two modifiers, discussed in the next node.) `%E' uses `E' instead of `e' in the output.
%f
This prints a number in floating-point notation. For example:
printf "%4.3f", 1950
prints `1950.000', with a minimum field width of four and three digits following the decimal point. (The `4.3' represents two modifiers, discussed in the next node.)
%g, %G
These print a number in either scientific notation or floating-point notation, whichever uses fewer characters; nonsignificant zeros are not printed. `%G' uses `E' instead of `e' when scientific notation is chosen.
%o
This prints an unsigned octal integer.
%s
This prints a string.
%u
This prints an unsigned decimal integer. (This format is of marginal use, because all numbers in @command{awk} are floating-point; it is provided primarily for compatibility with C.)
%x, %X
These print an unsigned hexadecimal integer; `%X' uses the letters `A' through `F' instead of `a' through `f'.
%%
This isn't really a format-control letter; the sequence `%%' outputs one `%'. It does not consume an argument and it ignores any modifiers.
Note: When using the integer format-control letters for values that are outside the range of a C long integer, @command{gawk} switches to the `%g' format specifier. Other versions of @command{awk} may print invalid values or do something else entirely. (d.c.)
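Here is a small sketch showing several of these letters together (the output is as produced by @command{gawk}; other implementations should agree for these letters):

$ awk 'BEGIN { printf "%c %d %o %x %e %g %s\n", 65, 255, 255, 255, 0.5, 0.5, "hi" }'
-| A 255 377 ff 5.000000e-01 0.5 hi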
Modifiers for printf Formats

A format specification can also include modifiers that can control how much of the item's value is printed, as well as how much space it gets. The modifiers come between the `%' and the format-control letter. In the following examples, the symbol "*" represents a space in the output. Here are the possible modifiers, in the order in which they may appear:
N$
An integer constant N followed by a `$' is a positional specifier: the format specification is applied to the Nth argument, rather than to the next argument in the list. Positional specifiers count from one. Thus:
printf "%s %s\n", "don't", "panic"
printf "%2$s %1$s\n", "panic", "don't"
prints the famous friendly message twice. At first glance, this feature doesn't seem to be of much use. It is in fact a @command{gawk} extension, intended for use in translating messages at runtime. See section Rearranging printf Arguments, which describes how and why to use positional specifiers. For now, we will not use them.
-
The minus sign, used before the width modifier (see below), says to left-justify the argument within its specified width. Normally, the argument is printed right-justified in the specified width. Thus:
printf "%-4s", "foo"
prints `foo*'.
space
For numeric conversions, prefix positive values with a space and negative values with a minus sign.
+
The plus sign, used before the width modifier (see below), says to always supply a sign for numeric conversions, even if the value is positive. The `+' overrides the space modifier.
#
Use an "alternate form" for certain control letters: for `%o', supply a leading zero; for `%x' and `%X', supply a leading `0x' or `0X' for a nonzero result; for `%e', `%E', and `%f', the result always contains a decimal point; for `%g' and `%G', trailing zeros are not removed.
0
A leading `0' (zero) acts as a flag indicating that output should be padded with zeros instead of spaces. This only has an effect when the field width is wider than the value being printed. (Several of these flags are shown in the example after this list.)
width
This is a number specifying the desired minimum width of a field. Inserting any number between the `%' sign and the format-control character forces the field to expand to this width. The default way to do this is to pad with spaces on the left. For example:
printf "%4s", "foo"
prints `*foo'. The value of width is a minimum width, not a maximum. If the item value requires more than width characters, it can be as wide as necessary. Thus, the following:
printf "%4s", "foobar"
prints `foobar'. Preceding the width with a minus sign causes the output to be padded with spaces on the right, instead of on the left.
.prec
A period followed by an integer constant specifies the precision to use when printing. The meaning of the precision varies by control letter:
%e, %E, %f
Number of digits to the right of the decimal point.
%g, %G
Maximum number of significant digits.
%d, %i, %o, %u, %x, %X
Minimum number of digits to print.
%s
Maximum number of characters from the string that should print. Thus, the following:
printf "%.4s", "foobar"
prints `foob'.
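The following sketch shows several of the flag and width modifiers in action (the output is as produced by @command{gawk}):

$ awk 'BEGIN { printf "|%5d|%-5d|%05d|%+d|% d|%#o|%#x|\n", 42, 42, 42, 42, 42, 8, 255 }'
-| |   42|42   |00042|+42| 42|010|0xff|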
The C library printf's dynamic width and prec capability (for example, "%*.*s") is supported. Instead of supplying explicit width and/or prec values in the format string, they are passed in the argument list. For example:
w = 5
p = 3
s = "abcdefg"
printf "%*.*s\n", w, p, s
is exactly equivalent to:
s = "abcdefg"
printf "%5.3s\n", s
Both programs output `**abc'. Earlier versions of @command{awk} did not support this capability. If you must use such a version, you may simulate this feature by using concatenation to build up the format string, like so:
w = 5
p = 3
s = "abcdefg"
printf "%" w "." p "s\n", s
This is not particularly easy to read but it does work.
C programmers may be used to supplying additional `l', `L', and `h' modifiers in printf format strings. These are not valid in @command{awk}. Most @command{awk} implementations silently ignore these modifiers. If @option{--lint} is provided on the command line (see section Command-Line Options), @command{gawk} warns about their use. If @option{--posix} is supplied, their use is a fatal error.
Examples Using printf
The following is a simple example of how to use printf to make an aligned table:
awk '{ printf "%-10s %s\n", $1, $2 }' BBS-list
This command prints the names of the bulletin boards ($1) in the file `BBS-list' as a string of 10 characters that are left-justified. It also prints the phone numbers ($2) next on the line. This produces an aligned two-column table of names and phone numbers, as shown here:
$ awk '{ printf "%-10s %s\n", $1, $2 }' BBS-list
-| aardvark   555-5553
-| alpo-net   555-3412
-| barfly     555-7685
-| bites      555-1675
-| camelot    555-0542
-| core       555-2912
-| fooey      555-1234
-| foot       555-6699
-| macfoo     555-6480
-| sdace      555-3430
-| sabafoo    555-2127
In this case, the phone numbers had to be printed as strings because the numbers are separated by a dash. Printing the phone numbers as numbers would have produced just the first three digits: `555'. This would have been pretty confusing.
It wasn't necessary to specify a width for the phone numbers because they are last on their lines. They don't need to have spaces after them.
The table could be made to look even nicer by adding headings to the tops of the columns. This is done using the BEGIN pattern (see section The BEGIN and END Special Patterns) so that the headers are only printed once, at the beginning of the @command{awk} program:
awk 'BEGIN { print "Name      Number"
             print "----      ------" }
           { printf "%-10s %s\n", $1, $2 }' BBS-list
The above example mixed print and printf statements in the same program. Using just printf statements can produce the same results:
awk 'BEGIN { printf "%-10s %s\n", "Name", "Number"
             printf "%-10s %s\n", "----", "------" }
           { printf "%-10s %s\n", $1, $2 }' BBS-list
Printing each column heading with the same format specification used for the column elements ensures that the headings are aligned just like the columns.
The fact that the same format specification is used three times can be emphasized by storing it in a variable, like this:
awk 'BEGIN { format = "%-10s %s\n"
             printf format, "Name", "Number"
             printf format, "----", "------" }
           { printf format, $1, $2 }' BBS-list
At this point, it would be a worthwhile exercise to use the printf statement to line up the headings and table data for the `inventory-shipped' example that was covered earlier in the minor node on the print statement (see section The print Statement).
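If you want to check your work, here is one possible solution (only a sketch; the field widths used here are one reasonable choice, not the only one):

awk 'BEGIN { printf "%-5s %6s\n", "Month", "Crates"
             printf "%-5s %6s\n", "-----", "------" }
           { printf "%-5s %6d\n", $1, $2 }' inventory-shipped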
Redirecting Output of print and printf
So far, the output from print and printf has gone to the standard output, usually the terminal. Both print and printf can also send their output to other places. This is called redirection. A redirection appears after the print or printf statement.
Redirections in @command{awk} are written just like redirections in shell commands, except that they are written inside the @command{awk} program. There are four forms of output redirection: output to a file, output appended to a file, output through a pipe to another command, and output to a coprocess. They are all shown for the print statement, but they work identically for printf:
print items > output-file
This redirection prints the items into the output file output-file. The file name output-file can be any expression; its value is converted to a string and then used as a file name. The output file is erased before the first output is written to it; subsequent writes to the same output-file do not erase it, but append to it. For example, here is how an @command{awk} program can write a list of BBS names to the file `name-list', and a list of phone numbers to the file `phone-list':
$ awk '{ print $2 > "phone-list"
>        print $1 > "name-list" }' BBS-list
$ cat phone-list
-| 555-5553
-| 555-3412
...
$ cat name-list
-| aardvark
-| alpo-net
...
Each output file contains one name or number per line.
print items >> output-file
This redirection prints the items into the pre-existing output file output-file. The difference between this and the single-`>' redirection is that the old contents (if any) of output-file are not erased. Instead, the @command{awk} output is appended to the file. If output-file does not exist, it is created.
print items | command
It is also possible to send output to another program through a pipe instead of into a file. This redirection opens a pipe to command, and writes the values of items through this pipe to another process created to execute command. The redirection argument command is actually an @command{awk} expression; its value is converted to a string whose contents give the shell command to be run. For example, the following produces two files, one unsorted list of BBS names, and one list sorted in reverse alphabetical order:
awk '{ print $1 > "names.unsorted"
       command = "sort -r > names.sorted"
       print $1 | command }' BBS-list
The unsorted list is written with an ordinary redirection, while the sorted list is written by piping through the @command{sort} utility. The next example uses redirection to mail a message to the mailing list `bug-system'. This might be useful when trouble is encountered in an @command{awk} script run periodically for system maintenance:
report = "mail bug-system"
print "Awk script failed:", $0 | report
m = ("at record number " FNR " of " FILENAME)
print m | report
close(report)
The message is built using string concatenation and saved in the variable m. It is then sent down the pipeline to the @command{mail} program. (The parentheses group the items to concatenate--see section String Concatenation.)
The close function is called here because it's a good idea to close the pipe as soon as all the intended output has been sent to it. See section Closing Input and Output Redirections, for more information on this.
This example also illustrates the use of a variable to represent a file or command---it is not necessary to always use a string constant. Using a variable is generally a good idea, because @command{awk} requires that the string value be spelled identically every time.
print items |& command
This redirection prints the items to the input of command. The difference between this and the single-`|' redirection is that the output from command can be read back with getline. Thus, command is a coprocess that works together with, but subsidiary to, the @command{awk} program.
This feature is a @command{gawk} extension, and is not available in POSIX @command{awk}. See section Two-Way Communications with Another Process, for a more complete discussion.
Redirecting output using `>', `>>', `|', or `|&' asks the system to open a file, pipe, or coprocess, only if the particular file or command you specify has not already been written to by your program or if it has been closed since it was last written to.
It is a common error to use `>' redirection for the first print to a file, and then to use `>>' for subsequent output:
# clear the file
print "Don't panic" > "guide.txt"
...
# append
print "Avoid improbability generators" >> "guide.txt"
This is indeed how redirections must be used from the shell. But in @command{awk}, it isn't necessary. In this kind of case, a program should use `>' for all the print statements, since the output file is only opened once.
As mentioned earlier (see section Points About getline to Remember), many @command{awk} implementations limit the number of pipelines that an @command{awk} program may have open to just one! In @command{gawk}, there is no such limit. @command{gawk} allows a program to open as many pipelines as the underlying operating system permits.
A particularly powerful way to use redirection is to build command lines, and pipe them into the shell, @command{sh}. For example, suppose you have a list of files brought over from a system where all the file names are stored in uppercase, and you wish to rename them to have names in all lowercase. The following program is both simple and efficient:
{ printf("mv %s %s\n", $0, tolower($0)) | "sh" }
END { close("sh") }
The tolower function returns its argument string with all uppercase characters converted to lowercase (see section String Manipulation Functions). The program builds up a list of command lines, using the @command{mv} utility to rename the files. It then sends the list to the shell for execution.
@command{gawk} provides a number of special file names that it interprets internally. These file names provide access to standard file descriptors, process-related information, and TCP/IP networking.
Running programs conventionally have three input and output streams already available to them for reading and writing. These are known as the standard input, standard output, and standard error output. These streams are, by default, connected to your terminal, but they are often redirected with the shell, via the `<', `<<', `>', `>>', `>&', and `|' operators. Standard error is typically used for writing error messages; the reason there are two separate streams, standard output and standard error, is so that they can be redirected separately.
In other implementations of @command{awk}, the only way to write an error message to standard error in an @command{awk} program is as follows:
print "Serious error detected!" | "cat 1>&2"
This works by opening a pipeline to a shell command that can access the standard error stream that it inherits from the @command{awk} process. This is far from elegant, and it is also inefficient, because it requires a separate process. So people writing @command{awk} programs often don't do this. Instead, they send the error messages to the terminal, like this:
print "Serious error detected!" > "/dev/tty"
This usually has the same effect but not always: although the standard error stream is usually the terminal, it can be redirected; when that happens, writing to the terminal is not correct. In fact, if @command{awk} is run from a background job, it may not have a terminal at all. Then opening `/dev/tty' fails.
@command{gawk} provides special file names for accessing the three standard streams, as well as any other inherited open files. If the file name matches one of these special names when @command{gawk} redirects input or output, then it directly uses the stream that the file name stands for. (These special file names work for all operating systems that @command{gawk} has been ported to, not just those that are POSIX-compliant.):
The file names `/dev/stdin', `/dev/stdout', and `/dev/stderr' are aliases for `/dev/fd/0', `/dev/fd/1', and `/dev/fd/2', respectively. However, they are more self-explanatory. The proper way to write an error message in a @command{gawk} program is to use `/dev/stderr', like this:
print "Serious error detected!" > "/dev/stderr"
Note the use of quotes around the file name. Like any other redirection, the value must be a string. It is a common error to omit the quotes, which leads to confusing results.
@command{gawk} also provides special file names that give access to information about the running @command{gawk} process. Each of these "files" provides a single record of information. To read them more than once, they must first be closed with the close function (see section Closing Input and Output Redirections). The file names are:
`/dev/pid'
Reading this file returns the process ID of the current process, in decimal form, terminated with a newline.
`/dev/ppid'
Reading this file returns the parent process ID of the current process, in decimal form, terminated with a newline.
`/dev/pgrpid'
Reading this file returns the process group ID of the current process, in decimal form, terminated with a newline.
`/dev/user'
Reading this file returns a single record terminated with a newline. The fields are separated with spaces, and they represent the following information:
$1
The return value of the getuid system call (the real user ID number).
$2
The return value of the geteuid system call (the effective user ID number).
$3
The return value of the getgid system call (the real group ID number).
$4
The return value of the getegid system call (the effective group ID number).
If there are any additional fields, they are the group IDs returned by the getgroups system call. (Multiple groups may not be supported on all systems.)
These special file names may be used on the command line as data files, as well as for I/O redirections within an @command{awk} program. They may not be used as source files with the @option{-f} option.
Note: The special files that provide process-related information are now considered obsolete and will disappear entirely in the next release of @command{gawk}. @command{gawk} prints a warning message every time you use one of these files. To obtain process-related information, use the PROCINFO array. See section Built-in Variables That Convey Information.
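For example, a quick sketch of the PROCINFO approach (the "pid" and "uid" elements are standard @command{gawk} elements; the values printed depend, of course, on your system):

$ gawk 'BEGIN { print "pid =", PROCINFO["pid"], "real uid =", PROCINFO["uid"] }'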
Starting with version 3.1 of @command{gawk}, @command{awk} programs can open a two-way TCP/IP connection, acting as either a client or server. This is done using a special file name of the form:
`/inet/protocol/local-port/remote-host/remote-port'
The protocol is one of `tcp', `udp', or `raw', and the other fields represent the other essential pieces of information for making a networking connection. These file names are used with the `|&' operator for communicating with a coprocess (see section Two-Way Communications with Another Process). This is an advanced feature, mentioned here only for completeness. Full discussion is delayed until section Using @command{gawk} for Network Programming.
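For the curious, here is a bare-bones sketch. It assumes that a TCP "daytime" service is listening on the local host, which many modern systems no longer enable; no request needs to be written first, because a daytime server sends its reply as soon as the connection is opened:

$ gawk 'BEGIN {
>   service = "/inet/tcp/0/localhost/daytime"
>   service |& getline line
>   print line
>   close(service)
> }'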
Here is a list of things to bear in mind when using the special file names that @command{gawk} provides:
The special files that provide process-related information are obsolete; to obtain that information, use the PROCINFO array. See section Built-in Variables.
Output to a special file name such as `/dev/fd/4' writes directly on file descriptor 4, and not on a new file descriptor that is dup'ed from file descriptor 4. Most of the time this does not matter; however, it is important to not close any of the files related to file descriptors 0, 1, and 2. Doing so results in unpredictable behavior.
If the same file name or the same shell command is used with getline more than once during the execution of an @command{awk} program (see section Explicit Input with getline), the file is opened (or the command is executed) the first time only. At that time, the first record of input is read from that file or command. The next time the same file or command is used with getline, another record is read from it, and so on.
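A common idiom that relies on this behavior is a read loop: the command is started once, and each getline call returns the next line of its output (a sketch, assuming a file `names' exists):

while (("sort -r names" | getline line) > 0)
    print line
close("sort -r names")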
Similarly, when a file or pipe is opened for output, the file name or command associated with it is remembered by @command{awk}, and subsequent writes to the same file or command are appended to the previous writes. The file or pipe stays open until @command{awk} exits.
This implies that special steps are necessary in order to read the same file again from the beginning, or to rerun a shell command (rather than reading more output from the same command). The close function makes these things possible:
close(filename)
or:
close(command)
The argument filename or command can be any expression. Its value must exactly match the string that was used to open the file or start the command (spaces and other "irrelevant" characters included). For example, if you open a pipe with this:
"sort -r names" | getline foo
then you must close it with this:
close("sort -r names")
Once this function call is executed, the next getline from that file or command, or the next print or printf to that file or command, reopens the file or reruns the command.
Because the expression that you use to close a file or pipeline must
exactly match the expression used to open the file or run the command,
it is good practice to use a variable to store the file name or command.
The previous example becomes the following:
sortcom = "sort -r names"
sortcom | getline foo
...
close(sortcom)
This helps avoid hard-to-find typographical errors in your @command{awk} programs. Here are some of the reasons for closing an output file:
To write a file and read it back later on in the same @command{awk} program. Close the file after writing it, then read it back sequentially with getline.
To write numerous files, successively, in the same @command{awk} program. If the files aren't closed, eventually @command{awk} may exceed a system limit on the number of open files in one process.
If you use more files than the system allows you to have open, @command{gawk} attempts to multiplex the available open files among your data files. @command{gawk}'s ability to do this depends upon the facilities of your operating system, so it may not always work. It is therefore both good practice and good portability advice to always use close on your files when you are done with them.
In fact, if you are using a lot of pipes, it is essential that you close commands when done. For example, consider something like this:
{
    ...
    command = ("grep " $1 " /some/file | my_prog -q " $3)
    while ((command | getline) > 0) {
        process output of command
    }
    # need close(command) here
}
This example creates a new pipeline based on data in each record. Without the call to close indicated in the comment, @command{awk} creates child processes to run the commands, until it eventually runs out of file descriptors for more pipelines.
Even though each command has finished (as indicated by the end-of-file return status from getline), the child process is not terminated;(19) more importantly, the file descriptor for the pipe is not closed and released until close is called or @command{awk} exits.
close will silently do nothing if given an argument that does not represent a file, pipe or coprocess that was opened with a redirection.
When using the `|&' operator to communicate with a coprocess, it is occasionally useful to be able to close one end of the two-way pipe without closing the other. This is done by supplying a second argument to close. As in any other call to close, the first argument is the name of the command or special file used to start the coprocess. The second argument should be a string, with either of the values "to" or "from". Case does not matter.
As this is an advanced feature, a more complete discussion is delayed until section Two-Way Communications with Another Process, which discusses it in more detail and gives an example.
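As a preview, here is a sketch of the usual pattern with a @command{sort} coprocess; the "to" end must be closed first, so that sort sees end-of-file and can begin writing its results:

cmd = "sort"
print "cherry" |& cmd
print "apple"  |& cmd
print "banana" |& cmd
close(cmd, "to")      # sort now sees end-of-file on its input
while ((cmd |& getline line) > 0)
    print "sorted:", line
close(cmd)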
Using close's Return Value
In many versions of Unix @command{awk}, the close function is actually a statement. It is a syntax error to try and use the return value from close: (d.c.)
command = "..."
command | getline info
retval = close(command)  # syntax error in most Unix awks
@command{gawk} treats close as a function. The return value is -1 if the argument names something that was never opened with a redirection, or if there is a system problem closing the file or process. In these cases, @command{gawk} sets the built-in variable ERRNO to a string describing the problem.
In @command{gawk}, when closing a pipe or coprocess, the return value is the exit status of the command. Otherwise, it is the return value from the system's close or fclose C functions when closing input or output files, respectively. This value is zero if the close succeeds, or -1 if it fails.
The return value for closing a pipeline is particularly useful. It allows you to get the output from a command as well as its exit status.
For POSIX-compliant systems, if the exit status is a number above 128, then the program was terminated by a signal. Subtract 128 to get the signal number:
exit_val = close(command)
if (exit_val > 128)
    print command, "died with signal", exit_val - 128
else
    print command, "exited with code", exit_val
Currently, in @command{gawk}, this only works for commands piping into getline. For commands piped into from print or printf, the return value from close is that of the library's pclose function.