

Preface

Several kinds of tasks occur repeatedly when working with text files. You might want to extract certain lines and discard the rest. Or you may need to make changes wherever certain patterns appear, but leave the rest of the file alone. Writing single-use programs for these tasks in languages such as C, C++ or Pascal is time-consuming and inconvenient. Such jobs are often easier with awk. The awk utility interprets a special-purpose programming language that makes it easy to handle simple data-reformatting jobs.
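As a rough illustration of those two kinds of tasks, the following sketch runs two one-line awk programs from the shell. The sample data and the variable names (data, big, fixed) are invented for this example; it is only one of many ways to write such programs.

```shell
# Hypothetical sample data: a name and a number on each line.
data='alice 10
bob 25
carol 7'

# Extract certain lines and discard the rest:
# keep only the lines whose second field is greater than 8.
big=$(printf '%s\n' "$data" | awk '$2 > 8')

# Make a change wherever a pattern appears, but leave the rest alone:
# set the second field to 0 on lines starting with "bob", print every line.
fixed=$(printf '%s\n' "$data" | awk '/^bob/ { $2 = 0 } { print }')

printf '%s\n' "$big"
printf '%s\n' "$fixed"
```

The first program consists of a pattern with no action, so awk prints each matching line unchanged. The second pairs a pattern with an action and then prints every line, modified or not.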

The GNU implementation of awk is called gawk; it is fully compatible with the System V Release 4 version of awk. gawk is also compatible with the POSIX specification of the awk language. This means that all properly written awk programs should work with gawk. Thus, we usually don't distinguish between gawk and other awk implementations.

Using awk allows you to:

In addition, gawk provides facilities that make it easy to:

This Info file teaches you about the awk language and how you can use it effectively. You should already be familiar with basic system commands, such as cat and ls,(1) as well as basic shell facilities, such as Input/Output (I/O) redirection and pipes.

Implementations of the awk language are available for many different computing environments. This Info file, while describing the awk language in general, also describes the particular implementation of awk called gawk (which stands for "GNU awk"). gawk runs on a broad range of Unix systems, ranging from 80386 PC-based computers, up through large-scale systems, such as Crays. gawk has also been ported to Mac OS X, MS-DOS, Microsoft Windows (all versions) and OS/2 PCs, Atari and Amiga micro-computers, BeOS, Tandem D20, and VMS.

History of awk and gawk

Recipe For A Programming Language

1 part  egrep    1 part  snobol
2 parts ed       3 parts C

Blend all parts well using lex and yacc. Document minimally and release.

After eight years, add another part egrep and two more parts C. Document very well and release.

The name awk comes from the initials of its designers: Alfred V. Aho, Peter J. Weinberger and Brian W. Kernighan. The original version of awk was written in 1977 at AT&T Bell Laboratories. In 1985, a new version made the programming language more powerful, introducing user-defined functions, multiple input streams, and computed regular expressions. This new version became widely available with Unix System V Release 3.1 (SVR3.1). The version in SVR4 added some new features and cleaned up the behavior in some of the "dark corners" of the language. The specification for awk in the POSIX Command Language and Utilities standard further clarified the language. Both the gawk designers and the original Bell Laboratories awk designers provided feedback for the POSIX specification.

Paul Rubin wrote the GNU implementation, gawk, in 1986. Jay Fenlason completed it, with advice from Richard Stallman. John Woods contributed parts of the code as well. In 1988 and 1989, David Trueman, with help from me, thoroughly reworked gawk for compatibility with the newer awk. Circa 1995, I became the primary maintainer. Current development focuses on bug fixes, performance improvements, standards compliance, and occasionally, new features.

In May of 1997, Jürgen Kahrs felt the need for network access from awk, and with a little help from me, set about adding features to do this for gawk. At that time, he also wrote the bulk of TCP/IP Internetworking with gawk (a separate document, available as part of the gawk distribution). His code finally became part of the main gawk distribution with gawk version 3.1.

See section Major Contributors to gawk, for a complete list of those who made important contributions to gawk.

A Rose by Any Other Name

The awk language has evolved over the years. Full details are provided in section The Evolution of the awk Language. The language described in this Info file is often referred to as "new awk" (nawk).

Because of this, many systems have multiple versions of awk. Some systems have an awk utility that implements the original version of the awk language and a nawk utility for the new version. Others have an oawk for the "old awk" language and plain awk for the new one. Still others only have one version, which is usually the new one.(2)

All in all, this makes it difficult for you to know which version of awk you should run when writing your programs. The best advice I can give here is to check your local documentation. Look for awk, oawk, and nawk, as well as for gawk. It is likely that you already have some version of new awk on your system, which is what you should use when running your programs. (Of course, if you're reading this Info file, chances are good that you have gawk!)
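Besides reading the local documentation, one way to look for these commands is a small shell loop such as the sketch below. It assumes a POSIX shell that provides the command -v builtin. The function name twice and the variable names are made up for this example, and the final probe is only a heuristic: it checks for user-defined functions, one of the features that new awk introduced.

```shell
# Report which of the usual awk command names exist on this system.
report=$(
    for name in awk oawk nawk gawk; do
        if path=$(command -v "$name" 2>/dev/null); then
            echo "$name: $path"
        else
            echo "$name: not found"
        fi
    done
)
echo "$report"

# Probe whether plain awk accepts a user-defined function.
# An old awk is expected to reject this program.
probe=$(awk 'function twice(x) { return 2 * x } BEGIN { print twice(21) }' 2>/dev/null)
if [ "$probe" = "42" ]; then
    echo "awk appears to be a new awk"
else
    echo "awk may be the old version; try nawk or gawk"
fi
```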

Throughout this Info file, whenever we refer to a language feature that should be available in any complete implementation of POSIX awk, we simply use the term awk. When referring to a feature that is specific to the GNU implementation, we use the term gawk.

Using This Book

Documentation is like sex: when it is good, it is very, very good; and when it is bad, it is better than nothing.
Dick Brandon

The term awk refers to a particular program as well as to the language you use to tell this program what to do. When we need to be careful, we call the program "the awk utility" and the language "the awk language." This Info file explains both the awk language and how to run the awk utility. The term awk program refers to a program written by you in the awk programming language.
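The distinction can be seen in a single command line. In the sketch below, the shell variable program (a name chosen only for this example) holds an awk program, that is, text written in the awk language, and the awk utility is the command that reads and runs it. The input numbers are arbitrary.

```shell
# The awk program: text written in the awk language.
program='{ total += $1 } END { print "total:", total }'

# The awk utility: the command that interprets that program
# against its input, here three numbers supplied on a pipe.
result=$(printf '%s\n' 1 2 3 | awk "$program")
echo "$result"
```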

Primarily, this Info file explains the features of awk, as defined in the POSIX standard. It does so in the context of the gawk implementation. While doing so, it also attempts to describe important differences between gawk and other awk implementations.(3) Finally, any gawk features that are not in the POSIX standard for awk are noted.

This Info file has the difficult task of being both a tutorial and a reference. If you are a novice, feel free to skip over details that seem too complex. You should also ignore the many cross references; they are for the expert user and for the online Info version of the document.

There are subsections labelled as Advanced Notes scattered throughout the Info file. They add a more complete explanation of points that are relevant, but not likely to be of interest on first reading. All appear in the index, under the heading "advanced notes."

Most of the time, the examples use complete awk programs. In some of the more advanced sections, only the part of the awk program that illustrates the concept currently being described is shown.

While this Info file is aimed principally at people who have not been exposed to awk, there is a lot of information here that even the awk expert should find useful. In particular, the description of POSIX awk and the example programs in section A Library of awk Functions, and in section Practical awk Programs, should be of interest.

section Getting Started with awk, provides the essentials you need to know to begin using awk.

section Regular Expressions, introduces regular expressions in general, and in particular the flavors supported by POSIX awk and gawk.

section Reading Input Files, describes how awk reads your data. It introduces the concepts of records and fields, as well as the getline command. I/O redirection is first described here.

section Printing Output, describes how awk programs can produce output with print and printf.

section Expressions, describes expressions, which are the basic building blocks for getting most things done in a program.

section Patterns, Actions, and Variables, describes how to write patterns for matching records, actions for doing something when a record is matched, and the built-in variables awk and gawk use.

section Arrays in awk, covers awk's one-and-only data structure: associative arrays. Deleting array elements and whole arrays is also described, as well as sorting arrays in gawk.

section Functions, describes the built-in functions awk and gawk provide for you, as well as how to define your own functions.

section Internationalization with gawk, describes special features in gawk for translating program messages into different languages at runtime.

section Advanced Features of gawk, describes a number of gawk-specific advanced features. Of particular note are the abilities to have two-way communications with another process, perform TCP/IP networking, and profile your awk programs.

section Running awk and gawk, describes how to run gawk, the meaning of its command-line options, and how it finds awk program source files.

section A Library of awk Functions, and section Practical awk Programs, provide many sample awk programs. Reading them allows you to see awk being used for solving real problems.

section The Evolution of the awk Language, describes how the awk language has evolved from its first release to the present. It also describes how gawk has acquired features over time.

section Installing gawk, describes how to get gawk, how to compile it under Unix, and how to compile and use it on different non-Unix systems. It also describes how to report bugs in gawk and where to get three other freely available implementations of awk.

section Implementation Notes, describes how to disable gawk's extensions, as well as how to contribute new code to gawk, how to write extension libraries, and some possible future directions for gawk development.

section Basic Programming Concepts, provides some very cursory background material for those who are completely unfamiliar with computer programming. Also centralized there is a discussion of some of the issues involved in using floating-point numbers.

The section Glossary, defines most, if not all, of the significant terms used throughout the book. If you find terms that you aren't familiar with, try looking them up.

section GNU General Public License, and section GNU Free Documentation License, present the licenses that cover the gawk source code, and this Info file, respectively.

Typographical Conventions

This Info file is written using Texinfo, the GNU documentation formatting language. A single Texinfo source file is used to produce both the printed and online versions of the documentation. Because of this, the typographical conventions are slightly different than in other books you may have read. This minor node briefly documents the typographical conventions used in Texinfo.

Examples you would type at the command-line are preceded by the common shell primary and secondary prompts, `$' and `>'. Output from the command is preceded by the glyph "-|". This typically represents the command's standard output. Error messages, and other output on the command's standard error, are preceded by the glyph "error-->". For example:

$ echo hi on stdout
-| hi on stdout
$ echo hello on stderr 1>&2
error--> hello on stderr
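The two glyphs correspond to two separate output streams, which a shell script can capture independently. The following sketch assumes a POSIX shell; the variable names out and err are chosen only for this illustration, and the commands are written without prompts so that they can be run as a script.

```shell
# Capture standard output only; this is what the "-|" glyph marks.
out=$(echo hi on stdout 2>/dev/null)

# Capture standard error only; this is what the "error-->" glyph marks.
# For the braced group, standard error is pointed at the capture first,
# and standard output is then discarded.
err=$( { echo hello on stderr 1>&2; } 2>&1 >/dev/null )

echo "stdout carried: $out"
echo "stderr carried: $err"
```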

In the text, command names appear in this font, while code segments appear in the same font and quoted, `like this'. Some things are emphasized like this, and if a point needs to be made strongly, it is done like this. The first occurrence of a new term is usually its definition and appears in the same font as the previous occurrence of "definition" in this sentence. File names are indicated like this: `/path/to/ourfile'.

Characters that you type at the keyboard look like this. In particular, there are special characters called "control characters." These are characters that you type by holding down both the CONTROL key and another key, at the same time. For example, a Ctrl-d is typed by first pressing and holding the CONTROL key, next pressing the d key and finally releasing both keys.

Dark Corners

Dark corners are basically fractal -- no matter how much you illuminate, there's always a smaller but darker one.
Brian Kernighan

Until the POSIX standard (and The Gawk Manual), many features of awk were either poorly documented or not documented at all. Descriptions of such features (often called "dark corners") are noted in this Info file with "(d.c.)". They also appear in the index under the heading "dark corner."

As noted by the opening quote, though, any coverage of dark corners is, by definition, incomplete.

The GNU Project and This Book

Software is like sex: it's better when it's free.
Linus Torvalds

The Free Software Foundation (FSF) is a non-profit organization dedicated to the production and distribution of freely distributable software. It was founded by Richard M. Stallman, the author of the original Emacs editor. GNU Emacs is the most widely used version of Emacs today.

The GNU(4) Project is an ongoing effort on the part of the Free Software Foundation to create a complete, freely distributable, POSIX-compliant computing environment. The FSF uses the "GNU General Public License" (GPL) to ensure that their software's source code is always available to the end user. A copy of the GPL is included in this Info file for your reference (see section GNU General Public License). The GPL applies to the C language source code for gawk. To find out more about the FSF and the GNU Project online, see the GNU Project's home page. This Info file may also be read from their web site.

A shell, an editor (Emacs), highly portable optimizing C, C++, and Objective-C compilers, a symbolic debugger and dozens of large and small utilities (such as gawk) have all been completed and are freely available. The GNU operating system kernel (the HURD) has been released but is still in an early stage of development.

Until the GNU operating system is more fully developed, you should consider using GNU/Linux, a freely distributable, Unix-like operating system for Intel 80386, DEC Alpha, Sun SPARC, IBM S/390, and other systems.(5) There are many books on GNU/Linux. One that is freely available is Linux Installation and Getting Started, by Matt Welsh. Many GNU/Linux distributions are available in computer stores or bundled on CD-ROMs with books about Linux. (There are three other freely available, Unix-like operating systems for 80386 and other systems: NetBSD, FreeBSD, and OpenBSD. All are based on the 4.4-Lite Berkeley Software Distribution, and they use recent versions of gawk for their versions of awk.)

The Info file you are reading now is actually free--at least, the information in it is free to anyone. The machine readable source code for the Info file comes with gawk; anyone may take this Info file to a copying machine and make as many copies of it as they like. (Take a moment to check the Free Documentation License; see section GNU Free Documentation License.)

Although you could just print it out yourself, bound books are much easier to read and use. Furthermore, the proceeds from sales of this book go back to the FSF to help fund development of more free software.

The Info file itself has gone through a number of previous editions. Paul Rubin wrote the very first draft of The GAWK Manual; it was around 40 pages in size. Diane Close and Richard Stallman improved it, yielding a version that was around 90 pages long and barely described the original, "old" version of awk.

I started working with that version in the fall of 1988. As work on it progressed, the FSF published several preliminary versions (numbered 0.x). In 1996, Edition 1.0 was released with gawk 3.0.0. The FSF published the first two editions under the title The GNU Awk User's Guide.

This edition maintains the basic structure of Edition 1.0, but with significant additional material, reflecting the host of new features in gawk version 3.1. Of particular note is section Sorting Array Values and Indices with gawk, as well as section Using gawk's Bit Manipulation Functions, section Internationalization with gawk, and also section Advanced Features of gawk, and section Adding New Built-in Functions to gawk.

GAWK: Effective AWK Programming will undoubtedly continue to evolve. An electronic version comes with the gawk distribution from the FSF. If you find an error in this Info file, please report it! See section Reporting Problems and Bugs, for information on submitting problem reports electronically, or write to me in care of the publisher.

How to Contribute

As the maintainer of GNU awk, I am starting a collection of publicly available awk programs. For more information, see ftp://ftp.freefriends.org/arnold/Awkstuff. If you have written an interesting awk program, or have written a gawk extension that you would like to share with the rest of the world, please contact me (arnold@gnu.org). Making things available on the Internet helps keep the gawk distribution down to manageable size.

Acknowledgments

The initial draft of The GAWK Manual had the following acknowledgments:

Many people need to be thanked for their assistance in producing this manual. Jay Fenlason contributed many ideas and sample programs. Richard Mlynarik and Robert Chassell gave helpful comments on drafts of this manual. The paper A Supplemental Document for awk by John W. Pierce of the Chemistry Department at UC San Diego, pinpointed several issues relevant both to awk implementation and to this manual, that would otherwise have escaped us.

I would like to acknowledge Richard M. Stallman, for his vision of a better world and for his courage in founding the FSF and starting the GNU project.

The following people (in alphabetical order) provided helpful comments on various versions of this book, up to and including this edition. Rick Adams, Nelson H.F. Beebe, Karl Berry, Dr. Michael Brennan, Rich Burridge, Claire Coutier, Diane Close, Scott Deifik, Christopher ("Topher") Eliot, Jeffrey Friedl, Dr. Darrel Hankerson, Michal Jaegermann, Dr. Richard J. LeBlanc, Michael Lijewski, Pat Rankin, Miriam Robbins, Mary Sheehan, and Chuck Toporek.

Robert J. Chassell provided much valuable advice on the use of Texinfo. He also deserves special thanks for convincing me not to title this Info file How To Gawk Politely. Karl Berry helped significantly with the TeX part of Texinfo.

I would like to thank Marshall and Elaine Hartholz of Seattle and Dr. Bert and Rita Schreiber of Detroit for large amounts of quiet vacation time in their homes, which allowed me to make significant progress on this Info file and on gawk itself.

Phil Hughes of SSC contributed in a very important way by loaning me his laptop GNU/Linux system, not once, but twice, which allowed me to do a lot of work while away from home.

David Trueman deserves special credit; he has done a yeoman job of evolving gawk so that it performs well and without bugs. Although he is no longer involved with gawk, working with him on this project was a significant pleasure.

The intrepid members of the GNITS mailing list, and most notably Ulrich Drepper, provided invaluable help and feedback for the design of the internationalization features.

Nelson Beebe, Martin Brown, Scott Deifik, Darrel Hankerson, Michal Jaegermann, Jürgen Kahrs, Pat Rankin, Kai Uwe Rommel, and Eli Zaretskii (in alphabetical order) are long-time members of the gawk "crack portability team." Without their hard work and help, gawk would not be nearly the fine program it is today. It has been and continues to be a pleasure working with this team of fine people.

David and I would like to thank Brian Kernighan of Bell Laboratories for invaluable assistance during the testing and debugging of gawk, and for help in clarifying numerous points about the language. We could not have done nearly as good a job on either gawk or its documentation without his help.

Chuck Toporek, Mary Sheehan, and Claire Coutier of O'Reilly & Associates contributed significant editorial help for this Info file for the 3.1 release of gawk.

I must thank my wonderful wife, Miriam, for her patience through the many versions of this project, for her proof-reading, and for sharing me with the computer. I would like to thank my parents for their love, and for the grace with which they raised and educated me. Finally, I also must acknowledge my gratitude to G-d, for the many opportunities He has sent my way, as well as for the gifts He has given me with which to take advantage of those opportunities.

Arnold Robbins
Nof Ayalon
ISRAEL
March, 2001

