As you have already seen, each @command{awk} statement consists of a pattern with an associated action. This major node describes how you build patterns and actions, what kinds of things you can do within actions, and @command{awk}'s built-in variables.
The pattern-action rules and the statements available for use within actions form the core of @command{awk} programming. In a sense, everything covered up to here has been the foundation that programs are built on top of. Now it's time to start building something useful.
Patterns in @command{awk} control the execution of rules--a rule is executed when its pattern matches the current input record. The following is a summary of the types of patterns in @command{awk}:
/regular expression/
     A regular expression. It matches when the text of the input record fits the regular expression.
expression
     A single expression. It matches when its value is nonzero (if a number) or non-null (if a string).
pat1, pat2
     A pair of patterns separated by a comma, specifying a range of records. The range includes both the initial record that matches pat1 and the final record that matches pat2.
BEGIN
END
     Special patterns for you to supply startup or cleanup actions for your @command{awk} program. (See section The BEGIN and END Special Patterns.)
empty
     The empty pattern matches every input record.
Regular expressions are one of the first kinds of patterns presented in this book. This kind of pattern is simply a regexp constant in the pattern part of a rule. Its meaning is `$0 ~ /pattern/'. The pattern matches when the input record matches the regexp. For example:
/foo|bar|baz/  { buzzwords++ }
END            { print buzzwords, "buzzwords seen" }
Any @command{awk} expression is valid as an @command{awk} pattern. The pattern matches if the expression's value is nonzero (if a number) or non-null (if a string). The expression is reevaluated each time the rule is tested against a new input record. If the expression uses fields such as $1, the value depends directly on the new input record's text; otherwise it depends only on what has happened so far in the execution of the @command{awk} program.
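For example, a minimal sketch of an expression pattern that never looks at the record's text (the data file name is illustrative):

awk 'NR % 2 == 0 { print }' data    # print every second record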
Comparison expressions, using the comparison operators described in section Variable Typing and Comparison Expressions, are a very common kind of pattern. Regexp matching and non-matching are also very common expressions. The left operand of the `~' and `!~' operators is a string. The right operand is either a constant regular expression enclosed in slashes (/regexp/), or any expression whose string value is used as a dynamic regular expression (see section Using Dynamic Regexps).
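A small sketch of a dynamic regexp as the right operand (the variable name pat is illustrative):

awk 'BEGIN { pat = "foo" }
     $1 ~ pat { print $2 }' BBS-list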
The following example prints the second field of each input record
whose first field is precisely `foo':
$ awk '$1 == "foo" { print $2 }' BBS-list
(There is no output, because there is no BBS site with the exact name `foo'.) Contrast this with the following regular expression match, which accepts any record with a first field that contains `foo':
$ awk '$1 ~ /foo/ { print $2 }' BBS-list
-| 555-1234
-| 555-6699
-| 555-6480
-| 555-2127
A regexp constant as a pattern is also a special case of an expression pattern. The expression /foo/ has the value one if `foo' appears in the current input record. Thus, as a pattern, /foo/ matches any record containing `foo'.
Boolean expressions are also commonly used as patterns. Whether the pattern matches an input record depends on whether its subexpressions match. For example, the following command prints all the records in `BBS-list' that contain both `2400' and `foo':
$ awk '/2400/ && /foo/' BBS-list
-| fooey        555-1234     2400/1200/300     B
The following command prints all records in `BBS-list' that contain either `2400' or `foo' (or both, of course):
$ awk '/2400/ || /foo/' BBS-list
-| alpo-net     555-3412     2400/1200/300     A
-| bites        555-1675     2400/1200/300     A
-| fooey        555-1234     2400/1200/300     B
-| foot         555-6699     1200/300          B
-| macfoo       555-6480     1200/300          A
-| sdace        555-3430     2400/1200/300     A
-| sabafoo      555-2127     1200/300          C
The following command prints all records in `BBS-list' that do not contain the string `foo':
$ awk '! /foo/' BBS-list
-| aardvark     555-5553     1200/300          B
-| alpo-net     555-3412     2400/1200/300     A
-| barfly       555-7685     1200/300          A
-| bites        555-1675     2400/1200/300     A
-| camelot      555-0542     300               C
-| core         555-2912     1200/300          C
-| sdace        555-3430     2400/1200/300     A
The subexpressions of a Boolean operator in a pattern can be constant regular expressions, comparisons, or any other @command{awk} expressions. Range patterns are not expressions, so they cannot appear inside Boolean patterns. Likewise, the special patterns BEGIN and END, which never match any input record, are not expressions and cannot appear inside Boolean patterns.
A range pattern is made of two patterns separated by a comma, in the form `begpat, endpat'. It is used to match ranges of consecutive input records. The first pattern, begpat, controls where the range begins, while endpat controls where the range ends. For example, the following:
awk '$1 == "on", $1 == "off"' myfile
prints every record in `myfile' between `on'/`off' pairs, inclusive.
A range pattern starts out by matching begpat against every input record. When a record matches begpat, the range pattern is turned on and the range pattern matches this record as well. As long as the range pattern stays turned on, it automatically matches every input record read. The range pattern also matches endpat against every input record; when this succeeds, the range pattern is turned off again for the following record. Then the range pattern goes back to checking begpat against each record.
The record that turns on the range pattern and the one that turns it off both match the range pattern. If you don't want to operate on these records, you can write if statements in the rule's action to distinguish them from the records you are interested in.
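For example, a sketch of this technique (the `on' and `off' markers are illustrative):

$1 == "on", $1 == "off" {
    if ($1 != "on" && $1 != "off")
        print    # act only on the records between the two markers
}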
It is possible for a pattern to be turned on and off by the same
record. If the record satisfies both conditions, then the action is
executed for just that record.
For example, suppose there is text between two identical markers (say the `%' symbol), each on its own line, that should be ignored. A first attempt would be to combine a range pattern that describes the delimited text with the next statement (not discussed yet, see section The next Statement). This causes @command{awk} to skip any further processing of the current record and start over again with the next input record. Such a program looks like this:
/^%$/,/^%$/    { next }
               { print }
This program fails because the range pattern is both turned on and turned off by the first line, which just has a `%' on it. To accomplish this task, write the program in the following manner, using a flag:
/^%$/     { skip = ! skip; next }
skip == 1 { next } # skip lines with `skip' set
In a range pattern, the comma (`,') has the lowest precedence of all the operators (i.e., it is evaluated last). Thus, the following program attempts to combine a range pattern with another simpler test:
echo Yes | awk '/1/,/2/ || /Yes/'
The intent of this program is `(/1/,/2/) || /Yes/'. However, @command{awk} interprets this as `/1/, (/2/ || /Yes/)'. This cannot be changed or worked around; range patterns do not combine with other patterns:
$ echo yes | gawk '(/1/,/2/) || /Yes/'
error--> gawk: cmd. line:1: (/1/,/2/) || /Yes/
error--> gawk: cmd. line:1:           ^ parse error
error--> gawk: cmd. line:2: (/1/,/2/) || /Yes/
error--> gawk: cmd. line:2:                   ^ unexpected newline
The BEGIN and END Special Patterns
All the patterns described so far are for matching input records. The BEGIN and END special patterns are different. They supply startup and cleanup actions for @command{awk} programs. BEGIN and END rules must have actions; there is no default action for these rules because there is no current record when they run.
BEGIN and END rules are often referred to as "BEGIN and END blocks" by long-time @command{awk} programmers.
A BEGIN rule is executed once only, before the first input record is read. Likewise, an END rule is executed once only, after all the input is read. For example:
$ awk '
> BEGIN { print "Analysis of \"foo\"" }
> /foo/ { ++n }
> END   { print "\"foo\" appears", n, "times." }' BBS-list
-| Analysis of "foo"
-| "foo" appears 4 times.
This program finds the number of records in the input file `BBS-list' that contain the string `foo'. The BEGIN rule prints a title for the report. There is no need to use the BEGIN rule to initialize the counter n to zero, since @command{awk} does this automatically (see section Variables). The second rule increments the variable n every time a record containing the pattern `foo' is read. The END rule prints the value of n at the end of the run.
The special patterns BEGIN and END cannot be used in ranges or with Boolean operators (indeed, they cannot be used with any operators).
An @command{awk} program may have multiple BEGIN and/or END rules. They are executed in the order in which they appear: all the BEGIN rules at startup and all the END rules at termination.
BEGIN and END rules may be intermixed with other rules. This feature was added in the 1987 version of @command{awk} and is included in the POSIX standard. The original (1978) version of @command{awk} required the BEGIN rule to be placed at the beginning of the program, the END rule to be placed at the end, and only allowed one of each. This is no longer required, but it is a good idea to follow this template in terms of program organization and readability.
Multiple BEGIN and END rules are useful for writing library functions, because each library file can have its own BEGIN and/or END rule to do its own initialization and/or cleanup. The order in which library functions are named on the command line controls the order in which their BEGIN and END rules are executed. Therefore, you have to be careful when writing such rules in library files so that the order in which they are executed doesn't matter. See section Command-Line Options, for more information on using library functions. @xref{Library Functions, ,A Library of @command{awk} Functions}, for a number of useful library functions.
If an @command{awk} program has only a BEGIN rule and no other rules, then the program exits after the BEGIN rule is run. (The original version of @command{awk} kept reading and ignoring input until end of file was seen.) However, if an END rule exists, then the input is read, even if there are no other rules in the program. This is necessary in case the END rule checks the FNR and NR variables.
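As a minimal sketch, a program whose only rule is an END rule still reads all of its input, which is what makes counting the records possible:

awk 'END { print NR, "records read" }' BBS-list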
Input/Output from BEGIN and END Rules
There are several (sometimes subtle) points to remember when doing I/O from a BEGIN or END rule. The first has to do with the value of $0 in a BEGIN rule. Because BEGIN rules are executed before any input is read, there simply is no input record, and therefore no fields, when executing BEGIN rules. References to $0 and the fields yield a null string or zero, depending upon the context. One way to give $0 a real value is to execute a getline command without a variable (see section Explicit Input with getline). Another way is to simply assign a value to $0.
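A sketch of both approaches (assuming at least one input record exists):

BEGIN {
    if ((getline) > 0)    # read the first input record into $0
        print "fields in first record:", NF
    $0 = "a b c"          # direct assignment also works; NF becomes 3
}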
The second point is similar to the first, but from the other direction. Traditionally, due largely to implementation issues, $0 and NF were undefined inside an END rule. The POSIX standard specifies that NF is available in an END rule. It contains the number of fields from the last input record. Most probably due to an oversight, the standard does not say that $0 is also preserved, although logically one would think that it should be. In fact, @command{gawk} does preserve the value of $0 for use in END rules. Be aware, however, that Unix @command{awk}, and possibly other implementations, do not.
The third point follows from the first two. The meaning of `print' inside a BEGIN or END rule is the same as always: `print $0'. If $0 is the null string, then this prints an empty line. Many long-time @command{awk} programmers use an unadorned `print' in BEGIN and END rules, to mean `print ""', relying on $0 being null. Although one might generally get away with this in BEGIN rules, it is a very bad idea in END rules, at least in @command{gawk}. It is also poor style, since if an empty line is needed in the output, the program should print one explicitly.
Finally, the next and nextfile statements are not allowed in a BEGIN rule, because the implicit read-a-record-and-match-against-the-rules loop has not started yet. Similarly, those statements are not valid in an END rule, since all the input has been read. (See section The next Statement, and see @ref{Nextfile Statement, ,Using @command{gawk}'s nextfile Statement}.)
An empty (i.e., non-existent) pattern is considered to match every input record. For example, the program:
awk '{ print $1 }' BBS-list
prints the first field of every record.
Using Shell Variables in Programs
@command{awk} programs are often used as components in larger programs written in shell. For example, it is very common to use a shell variable to hold a pattern that the @command{awk} program searches for. There are two ways to get the value of the shell variable into the body of the @command{awk} program.
The most common method is to use shell quoting to substitute the variable's value into the program inside the script. For example, in the following program:
echo -n "Enter search pattern: " read pattern awk "/$pattern/ "'{ nmatches++ } END { print nmatches, "found" }' /path/to/data
the @command{awk} program consists of two pieces of quoted text that are concatenated together to form the program. The first part is double-quoted, which allows substitution of the pattern variable inside the quotes. The second part is single-quoted.
Variable substitution via quoting works, but can be potentially messy. It requires a good understanding of the shell's quoting rules (see section Shell Quoting Issues), and it's often difficult to correctly match up the quotes when reading the program.
A better method is to use @command{awk}'s variable assignment feature (see section Assigning Variables on the Command Line) to assign the shell variable's value to an @command{awk} variable's value. Then use dynamic regexps to match the pattern (see section Using Dynamic Regexps). The following shows how to redo the previous example using this technique:
echo -n "Enter search pattern: " read pattern awk -v pat="$pattern" '$0 ~ pat { nmatches++ } END { print nmatches, "found" }' /path/to/data
Now, the @command{awk} program is just one single-quoted string. The assignment `-v pat="$pattern"' still requires double quotes, in case there is whitespace in the value of $pattern. The @command{awk} variable pat could be named pattern too, but that would be more confusing. Using a variable also provides more flexibility, since the variable can be used anywhere inside the program--for printing, as an array subscript, or for any other use--without requiring the quoting tricks at every point in the program.
Actions
An @command{awk} program or script consists of a series of rules and function definitions interspersed. (Functions are described later. See section User-Defined Functions.) A rule contains a pattern and an action, either of which (but not both) may be omitted. The purpose of the action is to tell @command{awk} what to do once a match for the pattern is found. Thus, in outline, an @command{awk} program generally looks like this:
[pattern]  [{ action }]
[pattern]  [{ action }]
...
function name(args) { ... }
...
An action consists of one or more @command{awk} statements, enclosed in curly braces (`{' and `}'). Each statement specifies one thing to do. The statements are separated by newlines or semicolons. The curly braces around an action must be used even if the action contains only one statement, or if it contains no statements at all. However, if you omit the action entirely, omit the curly braces as well. An omitted action is equivalent to `{ print $0 }':
/foo/  { }    match foo, do nothing -- empty action
/foo/         match foo, print the record -- omitted action
The following types of statements are supported in @command{awk}:

Expressions
     Call functions or assign values to variables (see section Expressions). Executing this kind of statement simply computes the value of the expression.
Control statements
     Specify the control flow of @command{awk} programs. The @command{awk} language gives you C-like constructs (if, for, while, and do) as well as a few special ones (see section Control Statements in Actions).
Compound statements
     Enclose one or more statements in curly braces. A compound statement is used in order to put several statements together in the body of an if, while, do, or for statement.
Input statements
     Use the getline command (see section Explicit Input with getline), the next statement (see section The next Statement), and the nextfile statement (@pxref{Nextfile Statement, ,Using @command{gawk}'s nextfile Statement}).
Output statements
     Such as print and printf. See section Printing Output.
Deletion statements
     For deleting array elements. See section The delete Statement.
Control Statements in Actions
Control statements, such as if, while, and so on, control the flow of execution in @command{awk} programs. Most of the control statements in @command{awk} are patterned on similar statements in C. All the control statements start with special keywords, such as if and while, to distinguish them from simple expressions.

Many control statements contain other statements. For example, the if statement contains another statement that may or may not be executed. The contained statement is called the body. To include more than one statement in the body, group them into a single compound statement with curly braces, separating them with newlines or semicolons.
The if-else Statement
The if-else statement is @command{awk}'s decision-making statement. It looks like this:
if (condition) then-body [else else-body]
The condition is an expression that controls what the rest of the statement does. If the condition is true, then-body is executed; otherwise, else-body is executed. The else part of the statement is optional. The condition is considered false if its value is zero or the null string; otherwise the condition is true.
Refer to the following:
if (x % 2 == 0)
    print "x is even"
else
    print "x is odd"
In this example, if the expression `x % 2 == 0' is true (that is, if the value of x is evenly divisible by two), then the first print statement is executed; otherwise the second print statement is executed.
If the else keyword appears on the same line as then-body and then-body is not a compound statement (i.e., not surrounded by curly braces), then a semicolon must separate then-body from the else.
To illustrate this, the previous example can be rewritten as:
if (x % 2 == 0) print "x is even"; else
      print "x is odd"
If the `;' is left out, @command{awk} can't interpret the statement and it produces a syntax error. Don't actually write programs this way, because a human reader might fail to see the else if it is not the first thing on its line.
The while Statement
In programming, a loop is a part of a program that can be executed two or more times in succession. The while statement is the simplest looping statement in @command{awk}. It repeatedly executes a statement as long as a condition is true. For example:
while (condition)
  body
body is a statement called the body of the loop, and condition is an expression that controls how long the loop keeps running.
The first thing the while statement does is test the condition. If the condition is true, it executes the statement body. After body has been executed, condition is tested again, and if it is still true, body is executed again. This process repeats until the condition is no longer true. If the condition is initially false, the body of the loop is never executed and @command{awk} continues with the statement following the loop.
This example prints the first three fields of each record, one per line:
awk '{ i = 1
       while (i <= 3) {
           print $i
           i++
       }
}' inventory-shipped
The body of this loop is a compound statement enclosed in braces, containing two statements. The loop works in the following manner: first, the value of i is set to one. Then, the while statement tests whether i is less than or equal to three. This is true when i equals one, so the i-th field is printed. Then the `i++' increments the value of i and the loop repeats. The loop terminates when i reaches four.
A newline is not required between the condition and the body; however, using one makes the program clearer unless the body is a compound statement or is very simple. The newline after the open-brace that begins the compound statement is not required either, but the program is harder to read without it.
The do-while Statement
The do loop is a variation of the while looping statement. The do loop executes the body once and then repeats the body as long as the condition is true. It looks like this:
do
  body
while (condition)
Even if the condition is false at the start, the body is executed at least once (and only once, unless executing body makes condition true). Contrast this with the corresponding while statement:
while (condition)
  body
This statement does not execute body even once if the condition is false to begin with. The following is an example of a do statement:
{ i = 1
  do {
     print $0
     i++
  } while (i <= 10)
}
This program prints each input record ten times. However, it isn't a very realistic example, since in this case an ordinary while would do just as well. This situation reflects actual experience; only occasionally is there a real use for a do statement.
The for Statement
The for statement makes it more convenient to count iterations of a loop. The general form of the for statement looks like this:
for (initialization; condition; increment)
  body
The initialization, condition, and increment parts are arbitrary @command{awk} expressions, and body stands for any @command{awk} statement.
The for statement starts by executing initialization. Then, as long as the condition is true, it repeatedly executes body and then increment. Typically, initialization sets a variable to either zero or one, increment adds one to it, and condition compares it against the desired number of iterations.
For example:
awk '{ for (i = 1; i <= 3; i++)
          print $i
}' inventory-shipped
This prints the first three fields of each input record, with one field per line.
It isn't possible to set more than one variable in the initialization part without using a multiple assignment statement such as `x = y = 0'. This makes sense only if all the initial values are equal. (But it is possible to initialize additional variables by writing their assignments as separate statements preceding the for loop.)
The same is true of the increment part. Incrementing additional variables requires separate statements at the end of the loop. The C compound expression, using C's comma operator, is useful in this context but it is not supported in @command{awk}.
Most often, increment is an increment expression, as in the previous example. But this is not required; it can be any expression whatsoever. For example, the following statement prints all the powers of two between 1 and 100:
for (i = 1; i <= 100; i *= 2)
  print i
If there is nothing to be done, any of the three expressions in the parentheses following the for keyword may be omitted. Thus, `for (; x > 0;)' is equivalent to `while (x > 0)'. If the condition is omitted, it is treated as true, effectively yielding an infinite loop (i.e., a loop that never terminates).
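For instance, a sketch of an omitted condition combined with an explicit break (the break statement is described later in this major node):

for (i = 1; ; i++) {
    if (i * i > 100)
        break        # the loop body must terminate the loop itself
    print i * i
}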
In most cases, a for loop is an abbreviation for a while loop, as shown here:
initialization
while (condition) {
  body
  increment
}
The only exception is when the continue statement (see section The continue Statement) is used inside the loop. Changing a for statement to a while statement in this way can change the effect of the continue statement inside the loop.
The @command{awk} language has a for statement in addition to a while statement because a for loop is often both less work to type and more natural to think of. Counting the number of iterations is very common in loops. It can be easier to think of this counting as part of looping rather than as something to do inside the loop.
The break Statement
The break statement jumps out of the innermost for, while, or do loop that encloses it. The following example finds the smallest divisor of any integer, and also identifies prime numbers:
# find smallest divisor of num
{  num = $1
   for (div = 2; div*div <= num; div++)
     if (num % div == 0)
       break
   if (num % div == 0)
     printf "Smallest divisor of %d is %d\n", num, div
   else
     printf "%d is prime\n", num
}
When the remainder is zero in the first if statement, @command{awk} immediately breaks out of the containing for loop. This means that @command{awk} proceeds immediately to the statement following the loop and continues processing. (This is very different from the exit statement, which stops the entire @command{awk} program. See section The exit Statement.)
The following program illustrates how the condition of a for or while statement could be replaced with a break inside an if:
# find smallest divisor of num
{ num = $1
  for (div = 2; ; div++) {
    if (num % div == 0) {
      printf "Smallest divisor of %d is %d\n", num, div
      break
    }
    if (div*div > num) {
      printf "%d is prime\n", num
      break
    }
  }
}
The break statement has no meaning when used outside the body of a loop. However, although it was never documented, historical implementations of @command{awk} treated the break statement outside of a loop as if it were a next statement (see section The next Statement). Recent versions of Unix @command{awk} no longer allow this usage. @command{gawk} supports this use of break only if @option{--traditional} has been specified on the command line (see section Command-Line Options). Otherwise, it is treated as an error, since the POSIX standard specifies that break should only be used inside the body of a loop. (d.c.)
The continue Statement
As with break, the continue statement is used only inside for, while, and do loops. It skips over the rest of the loop body, causing the next cycle around the loop to begin immediately. Contrast this with break, which jumps out of the loop altogether.
The continue statement in a for loop directs @command{awk} to skip the rest of the body of the loop and resume execution with the increment-expression of the for statement. The following program illustrates this fact:
BEGIN {
     for (x = 0; x <= 20; x++) {
         if (x == 5)
             continue
         printf "%d ", x
     }
     print ""
}
This program prints all the numbers from 0 to 20--except for five, for which the printf is skipped. Because the increment `x++' is not skipped, x does not remain stuck at five. Contrast the for loop from the previous example with the following while loop:
BEGIN {
     x = 0
     while (x <= 20) {
         if (x == 5)
             continue
         printf "%d ", x
         x++
     }
     print ""
}
This program loops forever once x reaches five.
The continue statement has no meaning when used outside the body of a loop. Historical versions of @command{awk} treated a continue statement outside a loop the same way they treated a break statement outside a loop: as if it were a next statement (see section The next Statement). Recent versions of Unix @command{awk} no longer work this way, and @command{gawk} allows it only if @option{--traditional} is specified on the command line (see section Command-Line Options). Just like the break statement, the POSIX standard specifies that continue should only be used inside the body of a loop. (d.c.)
The next Statement
The next statement forces @command{awk} to immediately stop processing the current record and go on to the next record. This means that no further rules are executed for the current record, and the rest of the current rule's action isn't executed.
Contrast this with the effect of the getline function (see section Explicit Input with getline). That also causes @command{awk} to read the next record immediately, but it does not alter the flow of control in any way (i.e., the rest of the current action executes with a new input record).
At the highest level, @command{awk} program execution is a loop that reads an input record and then tests each rule's pattern against it. If you think of this loop as a for statement whose body contains the rules, then the next statement is analogous to a continue statement. It skips to the end of the body of this implicit loop and executes the increment (which reads another record).
For example, suppose an @command{awk} program works only on records with four fields, and it shouldn't fail when given bad input. To avoid complicating the rest of the program, write a "weed out" rule near the beginning, in the following manner:
NF != 4 {
  err = sprintf("%s:%d: skipped: NF != 4\n", FILENAME, FNR)
  print err > "/dev/stderr"
  next
}
Because of the next statement, the program's subsequent rules won't see the bad record. The error message is redirected to the standard error output stream, as error messages should be. @xref{Special Files, ,Special File Names in @command{gawk}}.
According to the POSIX standard, the behavior is undefined if the next statement is used in a BEGIN or END rule. @command{gawk} treats it as a syntax error. Although POSIX permits it, some other @command{awk} implementations don't allow the next statement inside function bodies (see section User-Defined Functions).
Just as with any other next statement, a next statement inside a function body reads the next record and starts processing it with the first rule in the program.
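As a hedged sketch of that usage (implementations vary, as noted above; the function name is illustrative):

function require_fields(n) {
    if (NF < n)
        next    # abandons the current record from inside the function
}
{ require_fields(4) }
{ print $1, $4 }    # never reached for records with fewer than four fields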
If the next statement causes the end of the input to be reached, then the code in any END rules is executed. See section The BEGIN and END Special Patterns.
Using @command{gawk}'s nextfile Statement
@command{gawk} provides the nextfile statement, which is similar to the next statement. However, instead of abandoning processing of the current record, the nextfile statement instructs @command{gawk} to stop processing the current data file. The nextfile statement is a @command{gawk} extension. In most other @command{awk} implementations, or if @command{gawk} is in compatibility mode (see section Command-Line Options), nextfile is not special.
Upon execution of the nextfile statement, FILENAME is updated to the name of the next data file listed on the command line, FNR is reset to one, ARGIND is incremented, and processing starts over with the first rule in the program. (ARGIND hasn't been introduced yet. See section Built-in Variables.)
If the nextfile statement causes the end of the input to be reached, then the code in any END rules is executed. See section The BEGIN and END Special Patterns.
The nextfile statement is useful when there are many data files to process but it isn't necessary to process every record in every file. Normally, in order to move on to the next data file, a program has to continue scanning the unwanted records. The nextfile statement accomplishes this much more efficiently.
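For example, a sketch that looks at only the first record of each data file:

FNR == 1 { print FILENAME ": " $0; nextfile }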
While one might think that `close(FILENAME)' would accomplish the same as nextfile, this isn't true. close is reserved for closing files, pipes, and coprocesses that are opened with redirections. It is not related to the main processing that @command{awk} does with the files listed in ARGV.
If it's necessary to use an @command{awk} version that doesn't support nextfile, see section Implementing nextfile as a Function, for a user-defined function that simulates the nextfile statement.
The current version of the Bell Laboratories @command{awk} (@pxref{Other Versions, ,Other Freely Available @command{awk} Implementations}) also supports nextfile. However, it doesn't allow the nextfile statement inside function bodies (see section User-Defined Functions). @command{gawk} does; a nextfile inside a function body reads the next record and starts processing it with the first rule in the program, just as any other nextfile statement.
Caution: Versions of @command{gawk} prior to 3.0 used two words (`next file') for the nextfile statement. In version 3.0, this was changed to one word, because the treatment of `file' was inconsistent. When it appeared after next, `file' was a keyword; otherwise, it was a regular identifier. The old usage is no longer accepted; `next file' generates a syntax error.
The exit Statement
The exit statement causes @command{awk} to immediately stop executing the current rule and to stop processing input; any remaining input is ignored. The exit statement is written as follows:
exit [return code]
When an exit statement is executed from a BEGIN rule, the program stops processing everything immediately. No input records are read. However, if an END rule is present, as part of executing the exit statement, the END rule is executed (see section The BEGIN and END Special Patterns).
If exit is used as part of an END rule, it causes the program to stop immediately.
An exit statement that is not part of a BEGIN or END rule stops the execution of any further automatic rules for the current record, skips reading any remaining input records, and executes the END rule if there is one. In such a case, if you don't want the END rule to do its job, set a variable to nonzero before the exit statement and check that variable in the END rule. See section Assertions, for an example that does this.
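A minimal sketch of that flag technique (the variable name die and the trigger condition are illustrative):

$1 == "fatal" {
    die = 1
    exit 1
}
END {
    if (die)
        exit 1    # skip the normal cleanup work
    print "processed", NR, "records"
}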
If an argument is supplied to exit, its value is used as the exit status code for the @command{awk} process. If no argument is supplied, exit returns status zero (success). In the case where an argument is supplied to a first exit statement, and then exit is called a second time from an END rule with no argument, @command{awk} uses the previously supplied exit value. (d.c.)
For example, suppose an error condition occurs that is difficult or impossible to handle. Conventionally, programs report this by exiting with a nonzero status. An @command{awk} program can do this using an exit statement with a nonzero argument, as shown in the following example:
BEGIN { if (("date" | getline date_now) <= 0) { print "Can't get system date" > "/dev/stderr" exit 1 } print "current date is", date_now close("date") }
Built-in Variables
Most @command{awk} variables are available for you to use for your own purposes; they never change unless your program assigns values to them, and they never affect anything unless your program examines them. However, a few variables in @command{awk} have special built-in meanings. @command{awk} examines some of these automatically, so that they enable you to tell @command{awk} how to do certain things. Others are set automatically by @command{awk}, so that they carry information from the internal workings of @command{awk} to your program.
This minor node documents all the built-in variables of @command{gawk}, most of which are also documented in the chapters describing their areas of activity.
Built-in Variables That Control @command{awk}
The following is an alphabetical list of variables that you can change to control how @command{awk} does certain things. The variables that are specific to @command{gawk} are marked with a pound sign (`#').
BINMODE #
     On non-POSIX systems, this variable specifies use of binary mode for all I/O. Numeric values of one, two, or three specify that input files, output files, or all files, respectively, should use binary I/O. Alternatively, string values of "r" or "w" specify that input files and output files, respectively, should use binary I/O. A string value of "rw" or "wr" indicates that all files should use binary I/O. Any other string value is equivalent to "rw", but @command{gawk} generates a warning message. BINMODE is described in more detail in @ref{PC Using, ,Using @command{gawk} on PC Operating Systems}.
     This variable is a @command{gawk} extension. In other @command{awk} implementations (except @command{mawk}, @pxref{Other Versions, , Other Freely Available @command{awk} Implementations}), or if @command{gawk} is in compatibility mode (see section Command-Line Options), it is not special.
CONVFMT
     This string controls conversion of numbers to strings. It works by being passed, in effect, as the first argument to the sprintf function (see section String Manipulation Functions). Its default value is "%.6g". CONVFMT was introduced by the POSIX standard.
FIELDWIDTHS #
     This is a space-separated list of columns that tells @command{gawk} how to split input with fixed columnar boundaries. Assigning a value to FIELDWIDTHS overrides the use of FS for field splitting. See section Reading Fixed-Width Data, for more information.
     If @command{gawk} is in compatibility mode (see section Command-Line Options), then FIELDWIDTHS has no special meaning, and field-splitting operations occur based exclusively on the value of FS.
FS
     This is the input field separator (see section Specifying How Fields Are Separated). If the value is the null string (""), then each character in the record becomes a separate field. (This behavior is a @command{gawk} extension. POSIX @command{awk} does not specify the behavior when FS is the null string.)
     The default value is " ", a string consisting of a single space. As a special exception, this value means that any sequence of spaces, tabs, and/or newlines is a single separator. (In POSIX @command{awk}, newline does not count as whitespace.) It also causes spaces, tabs, and newlines at the beginning and end of a record to be ignored.
     You can set the value of FS on the command line using the @option{-F} option:

     awk -F, 'program' input-files

     If @command{gawk} is using FIELDWIDTHS for field splitting, assigning a value to FS causes @command{gawk} to return to the normal, FS-based field splitting. An easy way to do this is to simply say `FS = FS', perhaps with an explanatory comment.
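A sketch of switching back (gawk-specific; the column widths and the trigger record are illustrative):

BEGIN { FIELDWIDTHS = "5 3 8" }    # split early records by fixed columns
NR == 10 { FS = FS }               # revert to normal FS-based splitting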
IGNORECASE #
     If IGNORECASE is nonzero or non-null, then all string comparisons and all regular expression matching are case-independent. Thus, regexp matching with `~' and `!~', as well as the gensub, gsub, index, match, split, and sub functions, record termination with RS, and field splitting with FS, all ignore case when doing their particular regexp operations. However, the value of IGNORECASE does not affect array subscripting. See section Case Sensitivity in Matching.
     If @command{gawk} is in compatibility mode (see section Command-Line Options), then IGNORECASE has no special meaning. Thus, string and regexp operations are always case-sensitive.
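For instance, a small gawk-only sketch:

gawk 'BEGIN { IGNORECASE = 1 }
      /foo/ { print }' BBS-list    # now also matches "Foo", "FOO", ...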
LINT #
     When this variable is true (nonzero or non-null), @command{gawk} behaves as if the @option{--lint} command-line option is in effect (see section Command-Line Options). With a value of "fatal", lint warnings become fatal errors. Any other true value prints non-fatal warnings. Assigning a false value to LINT turns off the lint warnings.
     This variable is a @command{gawk} extension. It is not special in other @command{awk} implementations. Unlike the other special variables, changing LINT does affect the production of lint warnings, even if @command{gawk} is in compatibility mode. Much as the @option{--lint} and @option{--traditional} options independently control different aspects of @command{gawk}'s behavior, the control of lint warnings during program execution is independent of the flavor of @command{awk} being executed.
OFMT
     This string controls conversion of numbers to strings for printing with the print statement. It works by being passed as the first argument to the sprintf function (see section String Manipulation Functions). Its default value is "%.6g". Earlier versions of @command{awk} also used OFMT to specify the format for converting numbers to strings in general expressions; this is now done by CONVFMT.
OFS
     This is the output field separator. It is output between the fields printed by a print statement. Its default value is " ", a string consisting of a single space.
ORS
     This is the output record separator. It is output at the end of every print statement. Its default value is "\n", the newline character. (See section Output Separators.)
RS
     This is @command{awk}'s input record separator. Its default value is a string containing a single newline character, which means that an input record consists of a single line of text. The ability for RS to be a regular expression is a @command{gawk} extension. In most other @command{awk} implementations, or if @command{gawk} is in compatibility mode (see section Command-Line Options), just the first character of RS's value is used.
SUBSEP
     This is the subscript separator. It has the default value of "\034" and is used to separate the parts of the indices of a multidimensional array. Thus, the expression foo["A", "B"] really accesses foo["A\034B"] (see section Multidimensional Arrays).
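A sketch of recovering the parts of such an index with split:

BEGIN {
    foo["A", "B"] = 1
    for (combined in foo) {
        split(combined, parts, SUBSEP)
        print parts[1], parts[2]    # prints "A B"
    }
}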
TEXTDOMAIN #
     This variable is used for internationalization of programs at the @command{awk} level. It sets the default text domain for specially marked string constants in the source text, as well as for the dcgettext and bindtextdomain functions (@pxref{Internationalization, ,Internationalization with @command{gawk}}). The default value of TEXTDOMAIN is "messages".
     This variable is a @command{gawk} extension. In other @command{awk} implementations, or if @command{gawk} is in compatibility mode (see section Command-Line Options), it is not special.
Built-in Variables That Convey Information
The following is an alphabetical list of variables that @command{awk} sets automatically on certain occasions in order to provide information to your program. The variables that are specific to @command{gawk} are marked with a pound sign (`#').
ARGC, ARGV
     The command-line arguments available to @command{awk} programs are stored in an array called ARGV. ARGC is the number of command-line arguments present. See section Other Command-Line Arguments. Unlike most @command{awk} arrays, ARGV is indexed from 0 to ARGC - 1.
In the following example:
$ awk 'BEGIN {
>        for (i = 0; i < ARGC; i++)
>            print ARGV[i]
> }' inventory-shipped BBS-list
-| awk
-| inventory-shipped
-| BBS-list
ARGV[0] contains "awk", ARGV[1] contains "inventory-shipped", and ARGV[2] contains "BBS-list". The value of ARGC is three, one more than the index of the last element in ARGV, because the elements are numbered from zero.
The names ARGC and ARGV, as well as the convention of indexing the array from 0 to ARGC - 1, are derived from the C language's method of accessing command-line arguments. The value of ARGV[0] can vary from system to system. Also, you should note that the program text is not included in ARGV, nor are any of @command{awk}'s command-line options. See section Using ARGC and ARGV, for information about how @command{awk} uses these variables.
ARGIND #
     The index in ARGV of the current file being processed. Every time @command{gawk} opens a new data file for processing, it sets ARGIND to the index in ARGV of the file name. When @command{gawk} is processing the input files, `FILENAME == ARGV[ARGIND]' is always true.
     This variable is useful in file processing; it allows you to tell how far along you are in the list of data files as well as to distinguish between successive instances of the same file name on the command line. While you can change the value of ARGIND within your @command{awk} program, @command{gawk} automatically sets it to a new value when the next file is opened.
     This variable is a @command{gawk} extension. In other @command{awk} implementations, or if @command{gawk} is in compatibility mode (see section Command-Line Options), it is not special.
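For example, a gawk-only sketch that reports progress through the argument list:

FNR == 1 { printf "file %d of %d: %s\n", ARGIND, ARGC - 1, FILENAME }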
ENVIRON
     An associative array that contains the values of the environment. The array indices are the environment variable names; the elements are the values of the particular environment variables. For example, ENVIRON["HOME"] might be `/home/arnold'. Changing this array does not affect the environment passed on to any programs that @command{awk} may spawn via redirection or the system function.
     Some operating systems may not have environment variables. On such systems, the ENVIRON array is empty (except for ENVIRON["AWKPATH"], @pxref{AWKPATH Variable, ,The @env{AWKPATH} Environment Variable}).
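A minimal sketch:

awk 'BEGIN { print "home directory is", ENVIRON["HOME"] }'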
ERRNO #
     If a system error occurs during a redirection for getline, during a read for getline, or during a close operation, then ERRNO contains a string describing the error.
     This variable is a @command{gawk} extension. In other @command{awk} implementations, or if @command{gawk} is in compatibility mode (see section Command-Line Options), it is not special.
FILENAME
     The name of the file that @command{awk} is currently reading. When no data files are listed on the command line, @command{awk} reads from the standard input and FILENAME is set to "-". FILENAME is changed each time a new file is read (see section Reading Input Files). Inside a BEGIN rule, the value of FILENAME is "", since there are no input files being processed yet. (Some early implementations of Unix @command{awk} initialized FILENAME to "-", even if there were data files to be processed. This behavior was incorrect and should not be relied upon in your programs.) (d.c.) Note, though, that using getline (see section Explicit Input with getline) inside a BEGIN rule can give FILENAME a value.
FNR
     The current record number in the current file. FNR is incremented each time a new record is read (see section Explicit Input with getline). It is reinitialized to zero each time a new input file is started.
NF
     The number of fields in the current input record. NF is set each time a new record is read, when a new field is created, or when $0 changes (see section Examining Fields).
NR
     The number of input records @command{awk} has processed since the beginning of the program's execution. NR is incremented each time a new record is read.
PROCINFO #
     The elements of this array provide access to information about the running @command{awk} program. The following elements are guaranteed to be available:
     PROCINFO["egid"]
          The value of the getegid system call.
     PROCINFO["euid"]
          The value of the geteuid system call.
     PROCINFO["FS"]
          This is "FS" if field splitting with FS is in effect, or it is "FIELDWIDTHS" if field splitting with FIELDWIDTHS is in effect.
     PROCINFO["gid"]
          The value of the getgid system call.
     PROCINFO["pgrpid"]
          The process group ID of the current process.
     PROCINFO["pid"]
          The process ID of the current process.
     PROCINFO["ppid"]
          The parent process ID of the current process.
     PROCINFO["uid"]
          The value of the getuid system call.
     On some systems, there may be elements in the array, "group1" through "groupN" for some N. N is the number of supplementary groups that the process has. Use the in operator to test for these elements (see section Referring to an Array Element).
     This array is a @command{gawk} extension. In other @command{awk} implementations, or if @command{gawk} is in compatibility mode (see section Command-Line Options), it is not special.
RLENGTH
     The length of the substring matched by the match function (see section String Manipulation Functions). RLENGTH is set by invoking the match function. Its value is the length of the matched string, or -1 if no match is found.
RSTART
     The start-index of the substring matched by the match function (see section String Manipulation Functions). RSTART is set by invoking the match function. Its value is the position of the string where the matched substring starts, or zero if no match was found.
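A minimal sketch showing both variables:

BEGIN {
    if (match("The quick brown fox", /qu.*ck/))
        print RSTART, RLENGTH    # prints "5 5" for the substring "quick"
}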
RT #
     This is set each time a record is read. It contains the input text that matched the text denoted by RS, the record separator.
     This variable is a @command{gawk} extension. In other @command{awk} implementations, or if @command{gawk} is in compatibility mode (see section Command-Line Options), it is not special.
Changing NR and FNR
@command{awk} increments NR and FNR each time it reads a record, instead of setting them to the absolute value of the number of records read. This means that a program can change these variables and their new values are incremented for each record. (d.c.)
This is demonstrated in the following example:
$ echo '1
> 2
> 3
> 4' | awk 'NR == 2 { NR = 17 }
> { print NR }'
-| 1
-| 17
-| 18
-| 19
Before FNR was added to the @command{awk} language (see section Major Changes Between V7 and SVR3.1), many @command{awk} programs used this feature to track the number of records in a file by resetting NR to zero when FILENAME changed.
Using ARGC and ARGV
Section Built-in Variables That Convey Information presented the following program describing the information contained in ARGC and ARGV:
$ awk 'BEGIN {
>        for (i = 0; i < ARGC; i++)
>            print ARGV[i]
> }' inventory-shipped BBS-list
-| awk
-| inventory-shipped
-| BBS-list
In this example, ARGV[0] contains `awk', ARGV[1] contains `inventory-shipped', and ARGV[2] contains `BBS-list'. Notice that the @command{awk} program is not entered in ARGV. The other special command-line options, with their arguments, are also not entered. This includes variable assignments done with the @option{-v} option (see section Command-Line Options). Normal variable assignments on the command line are treated as arguments and do show up in the ARGV array:
$ cat showargs.awk
-| BEGIN {
-|     printf "A=%d, B=%d\n", A, B
-|     for (i = 0; i < ARGC; i++)
-|         printf "\tARGV[%d] = %s\n", i, ARGV[i]
-| }
-| END { printf "A=%d, B=%d\n", A, B }
$ awk -v A=1 -f showargs.awk B=2 /dev/null
-| A=1, B=0
-|     ARGV[0] = awk
-|     ARGV[1] = B=2
-|     ARGV[2] = /dev/null
-| A=1, B=2
A program can alter ARGC and the elements of ARGV. Each time @command{awk} reaches the end of an input file, it uses the next element of ARGV as the name of the next input file. By storing a different string there, a program can change which files are read. Use "-" to represent the standard input. Storing additional elements and incrementing ARGC causes additional files to be read.
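A sketch of adding a file this way (the file name is illustrative):

BEGIN {
    ARGV[ARGC] = "extra-data"    # append a hypothetical file to the list
    ARGC++
}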
If the value of ARGC is decreased, that eliminates input files from the end of the list. By recording the old value of ARGC elsewhere, a program can treat the eliminated arguments as something other than file names.
To eliminate a file from the middle of the list, store the null string ("") into ARGV in place of the file's name. As a special feature, @command{awk} ignores file names that have been replaced with the null string.
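For instance, a sketch that prevents the second file named on the command line from being read:

BEGIN { ARGV[2] = "" }    # the second data file is silently skipped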
Another option is to use the delete statement to remove elements from ARGV (see section The delete Statement).
All of these actions are typically done in the BEGIN rule, before actual processing of the input begins. See section Splitting a Large File into Pieces, and see section Duplicating Output into Multiple Files, for examples of each way of removing elements from ARGV.
The following fragment processes ARGV in order to examine, and then remove, command-line options:
BEGIN {
    for (i = 1; i < ARGC; i++) {
        if (ARGV[i] == "-v")
            verbose = 1
        else if (ARGV[i] == "-d")
            debug = 1
        else if (ARGV[i] ~ /^-?/) {
            e = sprintf("%s: unrecognized option -- %c",
                    ARGV[0], substr(ARGV[i], 1, 1))
            print e > "/dev/stderr"
        } else
            break
        delete ARGV[i]
    }
}
To actually get the options into the @command{awk} program, end the @command{awk} options with @option{--} and then supply the @command{awk} program's options, in the following manner:
awk -f myprog -- -v -d file1 file2 ...
This is not necessary in @command{gawk}. Unless @option{--posix} has been specified, @command{gawk} silently puts any unrecognized options into ARGV for the @command{awk} program to deal with. As soon as it sees an unknown option, @command{gawk} stops looking for other options that it might otherwise recognize. The previous example with @command{gawk} would be:
gawk -f myprog -d -v file1 file2 ...
Because @option{-d} is not a valid @command{gawk} option, it and the following @option{-v} are passed on to the @command{awk} program.