.TH AWK 1
.SH NAME
awk \- pattern-directed scanning and processing language
.SH SYNOPSIS
.B awk
[
.B -F
.I fs
]
[
.B -d
]
[
.BI -mf
.I n
]
[
.B -mr
.I n
]
[
.B -safe
]
[
.B -v
.I var=value
]
[
.B -f
.I progfile
|
.I prog
]
[
.I file ...
]
.SH DESCRIPTION
.I Awk
scans each input
.I file
for lines that match any of a set of patterns specified literally in
.I prog
or in one or more files
specified as
.B -f
.IR progfile .
With each pattern
there can be an associated action that will be performed
when a line of a
.I file
matches the pattern.
Each line is matched against the
pattern portion of every pattern-action statement;
the associated action is performed for each matched pattern.
The file name
.L -
means the standard input.
Any
.IR file
of the form
.I var=value
is treated as an assignment, not a file name,
and is executed at the time it would have been opened if it were a file name.
The option
.B -v
followed by
.I var=value
is an assignment to be done before the program
is executed;
any number of
.B -v
options may be present.
The
.B -F
.IR fs
option defines the input field separator to be the regular expression
.IR fs .
.PP
An input line is normally made up of fields separated by white space,
or by regular expression
.BR FS .
The fields are denoted
.BR $1 ,
.BR $2 ,
\&..., while
.B $0
refers to the entire line.
If
.BR FS
is null, the input line is split into one field per character.
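.PP
As a minimal sketch of field splitting, assuming a colon-separated input file with the hypothetical name
.IR users.txt ,
the following command prints the first field and the field count of each line:
.PP
.EX
awk -F: '{ print $1, NF }' users.txt
.EE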
.PP
To compensate for inadequate implementation of storage management,
the
.B -mr
option can be used to set the maximum size of the input record,
and the
.B -mf
option to set the maximum number of fields.
.PP
The
.B -safe
option causes
.I awk
to run in
``safe mode,''
in which it is not allowed to
run shell commands or open files
and the environment is not made available
in the
.B ENVIRON
variable.
.PP
A pattern-action statement has the form
.IP
.IB pattern " { " action " }
.PP
A missing
.BI { " action " }
means print the line;
a missing pattern always matches.
Pattern-action statements are separated by newlines or semicolons.
.PP
An action is a sequence of statements.
A statement can be one of the following:
.PP
.EX
.ta \w'\fLdelete array[expression]'u
if(\fI expression \fP)\fI statement \fP\fR[ \fPelse\fI statement \fP\fR]\fP
while(\fI expression \fP)\fI statement\fP
for(\fI expression \fP;\fI expression \fP;\fI expression \fP)\fI statement\fP
for(\fI var \fPin\fI array \fP)\fI statement\fP
do\fI statement \fPwhile(\fI expression \fP)
break
continue
{\fR [\fP\fI statement ... \fP\fR] \fP}
\fIexpression\fP	#\fR commonly\fP\fI var = expression\fP
print\fR [ \fP\fIexpression-list \fP\fR] \fP\fR[ \fP>\fI expression \fP\fR]\fP
printf\fI format \fP\fR[ \fP,\fI expression-list \fP\fR] \fP\fR[ \fP>\fI expression \fP\fR]\fP
return\fR [ \fP\fIexpression \fP\fR]\fP
next	#\fR skip remaining patterns on this input line\fP
nextfile	#\fR skip rest of this file, open next, start at top\fP
delete\fI array\fP[\fI expression \fP]	#\fR delete an array element\fP
delete\fI array\fP	#\fR delete all elements of array\fP
exit\fR [ \fP\fIexpression \fP\fR]\fP	#\fR exit immediately; status is \fP\fIexpression\fP
.EE
.DT
.PP
Statements are terminated by
semicolons, newlines or right braces.
An empty
.I expression-list
stands for
.BR $0 .
String constants are quoted \&\fL"\ "\fR,
with the usual C escapes recognized within.
Expressions take on string or numeric values as appropriate,
and are built using the operators
.B + \- * / % ^
(exponentiation), and concatenation (indicated by white space).
The operators
.B
! ++ \-\- += \-= *= /= %= ^= > >= < <= == != ?:
are also available in expressions.
Variables may be scalars, array elements
(denoted
.IB x  [ i ] )
or fields.
Variables are initialized to the null string.
Array subscripts may be any string,
not necessarily numeric;
this allows for a form of associative memory.
Multiple subscripts such as
.B [i,j,k]
are permitted; the constituents are concatenated,
separated by the value of
.BR SUBSEP .
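.PP
The following sketch, using the hypothetical array names
.B grid
and
.BR idx ,
illustrates multiple subscripts: it stores one element, recovers the constituents of its subscript by splitting on
.BR SUBSEP ,
and tests membership with the parenthesized form of
.BR in :
.PP
.EX
BEGIN {
	grid[1,2] = "x"
	for (k in grid) {
		split(k, idx, SUBSEP)	# idx[1] is 1, idx[2] is 2
		print idx[1], idx[2], grid[k]
	}
	if ((1,2) in grid)
		print "present"
}
.EE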
.PP
The
.B print
statement prints its arguments on the standard output
(or on a file if
.BI > file
or
.BI >> file
is present or on a pipe if
.BI | cmd
is present), separated by the current output field separator,
and terminated by the output record separator.
.I file
and
.I cmd
may be literal names or parenthesized expressions;
identical string values in different statements denote
the same open file.
The
.B printf
statement formats its expression list according to the format
(see
.IR fprintf (3)) .
The built-in function
.BI close( expr )
closes the file or pipe
.IR expr .
The built-in function
.BI fflush( expr )
flushes any buffered output for the file or pipe
.IR expr .
If
.IR expr
is omitted or is a null string, all open files are flushed.
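.PP
As an illustrative sketch, assuming a
.IR sort (1)
command is available, the following program pipes the first field of every line to
.I sort
and closes the pipe explicitly once the input is exhausted.
The string given to
.B close
must be identical to the one used after
.BR | :
.PP
.EX
	{ print $1 | "sort" }
END	{ close("sort") }
.EE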
.PP
The mathematical functions
.BR exp ,
.BR log ,
.BR sqrt ,
.BR sin ,
.BR cos ,
and
.BR atan2
are built in.
Other built-in functions:
.TF length
.TP
.B length
If its argument is a string, the string's length is returned.
If its argument is an array, the number of subscripts in the array is returned.
If no argument is given, the length of
.B $0
is returned.
.TP
.B rand
random number on [0,1)
.TP
.B srand
sets seed for
.B rand
and returns the previous seed.
.TP
.B int
truncates to an integer value
.TP
.B utf
converts its numerical argument, a character number, to a
.SM UTF
string
.TP
.BI substr( s , " m" , " n\fL)
the
.IR n -character
substring of
.I s
that begins at position
.IR m
counted from 1.
.TP
.BI index( s , " t" )
the position in
.I s
where the string
.I t
occurs, or 0 if it does not.
.TP
.BI match( s , " r" )
the position in
.I s
where the regular expression
.I r
occurs, or 0 if it does not.
The variables
.B RSTART
and
.B RLENGTH
are set to the position and length of the matched string.
.TP
.BI split( s , " a" , " fs\fL)
splits the string
.I s
into array elements
.IB a [1]\f1,
.IB a [2]\f1,
\&...,
.IB a [ n ]\f1,
and returns
.IR n .
The separation is done with the regular expression
.I fs
or with the field separator
.B FS
if
.I fs
is not given.
An empty string as field separator splits the string
into one array element per character.
.TP
.BI sub( r , " t" , " s\fL)
substitutes
.I t
for the first occurrence of the regular expression
.I r
in the string
.IR s .
If
.I s
is not given,
.B $0
is used.
.TP
.B gsub
same as
.B sub
except that all occurrences of the regular expression
are replaced;
.B sub
and
.B gsub
return the number of replacements.
.TP
.BI sprintf( fmt , " expr" , " ...\fL)
the string resulting from formatting
.I expr ...
according to the
.I printf
format
.I fmt
.TP
.BI system( cmd )
executes
.I cmd
and returns its exit status
.TP
.BI tolower( str )
returns a copy of
.I str
with all upper-case characters translated to their
corresponding lower-case equivalents.
.TP
.BI toupper( str )
returns a copy of
.I str
with all lower-case characters translated to their
corresponding upper-case equivalents.
.PD
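.PP
A brief sketch of
.B match
used together with
.BR substr :
the following program prints the first run of digits on each line that contains one.
The result of
.B match
serves directly as the pattern, since a position of 0 means no match.
.PP
.EX
match($0, /[0-9]+/) { print substr($0, RSTART, RLENGTH) }
.EE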
.PP
The ``function''
.B getline
sets
.B $0
to the next input record from the current input file;
.B getline
.BI < file
sets
.B $0
to the next record from
.IR file .
.B getline
.I x
sets variable
.I x
instead.
Finally,
.IB cmd " | getline
pipes the output of
.I cmd
into
.BR getline ;
each call of
.B getline
returns the next line of output from
.IR cmd .
In all cases,
.B getline
returns 1 for a successful input,
0 for end of file, and \-1 for an error.
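.PP
Because
.B getline
returns \-1 on error, a loop should test for a result greater than 0
rather than merely nonzero.
A minimal sketch, assuming a
.IR date (1)
command is available and using the hypothetical variable name
.BR line :
.PP
.EX
BEGIN {
	while (("date" | getline line) > 0)
		print "now:", line
	close("date")
}
.EE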
.PP
Patterns are arbitrary Boolean combinations
(with
.BR "! || &&" )
of regular expressions and
relational expressions.
Regular expressions are as in
.MR regexp (7) .
Isolated regular expressions
in a pattern apply to the entire line.
Regular expressions may also occur in
relational expressions, using the operators
.BR ~
and
.BR !~ .
.BI / re /
is a constant regular expression;
any string (constant or variable) may be used
as a regular expression, except in the position of an isolated regular expression
in a pattern.
.PP
A pattern may consist of two patterns separated by a comma;
in this case, the action is performed for all lines
from an occurrence of the first pattern
through an occurrence of the second.
.PP
A relational expression is one of the following:
.IP
.I expression matchop regular-expression
.br
.I expression relop expression
.br
.IB expression " in " array-name
.br
.BI ( expr , expr,... ") in " array-name
.PP
where a
.I relop
is any of the six relational operators in C,
and a
.I matchop
is either
.B ~
(matches)
or
.B !~
(does not match).
A conditional is an arithmetic expression,
a relational expression,
or a Boolean combination
of these.
.PP
The special patterns
.B BEGIN
and
.B END
may be used to capture control before the first input line is read
and after the last.
.B BEGIN
and
.B END
do not combine with other patterns.
.PP
Variable names with special meanings:
.TF FILENAME
.TP
.B CONVFMT
conversion format used when converting numbers
(default
.BR "%.6g" )
.TP
.B FS
regular expression used to separate fields; also settable
by option
.BI \-F fs\f1.
.TP
.BR NF
number of fields in the current record
.TP
.B NR
ordinal number of the current record
.TP
.B FNR
ordinal number of the current record in the current file
.TP
.B FILENAME
the name of the current input file
.TP
.B RS
input record separator (default newline)
.TP
.B OFS
output field separator (default blank)
.TP
.B ORS
output record separator (default newline)
.TP
.B OFMT
output format for numbers (default
.BR "%.6g" )
.TP
.B SUBSEP
separates multiple subscripts (default 034)
.TP
.B ARGC
argument count, assignable
.TP
.B ARGV
argument array, assignable;
non-null members are taken as file names
.TP
.B ENVIRON
array of environment variables; subscripts are names.
.PD
.PP
Functions may be defined (at the position of a pattern-action statement) thus:
.IP
.L
function foo(a, b, c) { ...; return x }
.PP
Parameters are passed by value if scalar and by reference if array name;
functions may be called recursively.
Parameters are local to the function; all other variables are global.
Thus local variables may be created by providing excess parameters in
the function definition.
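.PP
A sketch of the excess-parameter idiom, with the hypothetical function name
.BR max :
the extra parameters
.B i
and
.B m
serve as local variables and are set off from the real parameters by extra white space,
which is only a readability convention.
The array
.B v
is passed by reference.
.PP
.EX
function max(a, n,    i, m) {
	m = a[1]
	for (i = 2; i <= n; i++)
		if (a[i] > m)
			m = a[i]
	return m
}
BEGIN {
	n = split("3 9 4", v)
	print max(v, n)	# prints 9
}
.EE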
.SH EXAMPLES
.TP
.L
length($0) > 72
Print lines longer than 72 characters.
.TP
.L
{ print $2, $1 }
Print first two fields in opposite order.
.PP
.EX
BEGIN { FS = ",[ \et]*|[ \et]+" }
      { print $2, $1 }
.EE
.ns
.IP
Same, with input fields separated by comma and/or blanks and tabs.
.PP
.EX
	{ s += $1 }
END	{ print "sum is", s, " average is", s/NR }
.EE
.ns
.IP
Add up first column, print sum and average.
.TP
.L
/start/, /stop/
Print all lines between start/stop pairs.
.PP
.EX
BEGIN	{	# Simulate echo(1)
	for (i = 1; i < ARGC; i++) printf "%s ", ARGV[i]
	printf "\en"
	exit }
.EE
.SH SOURCE
.B \*9/src/cmd/awk
.SH SEE ALSO
.MR sed (1) ,
.MR regexp (7) ,
.br
A. V. Aho, B. W. Kernighan, P. J. Weinberger,
.I
The AWK Programming Language,
Addison-Wesley, 1988.  ISBN 0-201-07981-X
.SH BUGS
There are no explicit conversions between numbers and strings.
To force an expression to be treated as a number add 0 to it;
to force it to be treated as a string concatenate
\&\fL""\fP to it.
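.PP
For example, in this sketch (the variable names are arbitrary) the first
.B print
compares numerically and prints 1, while the second compares as strings and prints 0:
.PP
.EX
BEGIN {
	x = "10"; y = "9"
	print (x+0 > y+0)
	print ((x "") > (y ""))
}
.EE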
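.PP
The scope rules for variables in functions are a botch;
the syntax is worse.
.PP
UTF is not always dealt with correctly,
though
.I awk
does make an attempt to do so.
The
.I split
function with an empty string as final argument now copes
with UTF in the string being split.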
    553 UTF is not always dealt with correctly,
    554 though
    555 .I awk
    556 does make an attempt to do so.
    557 The
    558 .I split
    559 function with an empty string as final argument now copes
    560 with UTF in the string being split.