.TH AWK 1
.SH NAME
awk \- pattern-directed scanning and processing language
.SH SYNOPSIS
.B awk
[
.B -F
.I fs
]
[
.B -d
]
[
.B -mf
.I n
]
[
.B -mr
.I n
]
[
.B -safe
]
[
.B -v
.I var=value
]
[
.B -f
.I progfile
|
.I prog
]
[
.I file ...
]
.SH DESCRIPTION
.I Awk
scans each input
.I file
for lines that match any of a set of patterns specified literally in
.I prog
or in one or more files
specified as
.B -f
.IR progfile .
With each pattern
there can be an associated action that will be performed
when a line of a
.I file
matches the pattern.
Each line is matched against the
pattern portion of every pattern-action statement;
the associated action is performed for each matched pattern.
The file name
.L -
means the standard input.
Any
.I file
of the form
.I var=value
is treated as an assignment, not a file name,
and is executed at the time it would have been opened if it were a file name.
The option
.B -v
followed by
.I var=value
is an assignment to be done before the program
is executed;
any number of
.B -v
options may be present.
The
.B -F
.I fs
option defines the input field separator to be the regular expression
.IR fs .
.PP
An input line is normally made up of fields separated by white space,
or by the regular expression
.BR FS .
The fields are denoted
.BR $1 ,
.BR $2 ,
\&..., while
.B $0
refers to the entire line.
If
.B FS
is null, the input line is split into one field per character.
.PP
To compensate for inadequate implementation of storage management,
the
.B -mr
option can be used to set the maximum size of the input record,
and the
.B -mf
option to set the maximum number of fields.
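.PP
As a rough illustration of the option and operand rules described above,
the sketch below feeds two made-up records to awk on the standard input.
The data and the names label, n, sum and opts_out are hypothetical and chosen only for this example.
It assumes an awk that accepts the attached form of the field separator option.

```shell
# Hypothetical input: two colon-separated records on standard input.
opts_out=$(printf 'alice:10\nbob:20\n' | awk -F: -v label=total '
BEGIN { printf "%s n=[%s]\n", label, n }   # the -v assignment is visible; n is not set yet
      { sum += $2 * n }                    # n=5 was assigned before "-" was opened
END   { print label, sum }
' n=5 -)
printf '%s\n' "$opts_out"
```

The operand n=5 takes effect only when awk reaches it in the argument list,
so the BEGIN action prints an empty value for n while the main action sees 5.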
.PP
The
.B -safe
option causes
.I awk
to run in
``safe mode,''
in which it is not allowed to
run shell commands or open files
and the environment is not made available
in the
.B ENVIRON
variable.
.PP
A pattern-action statement has the form
.IP
.IB pattern " { " action " }
.PP
A missing
.BI { " action " }
means print the line;
a missing pattern always matches.
Pattern-action statements are separated by newlines or semicolons.
.PP
An action is a sequence of statements.
A statement can be one of the following:
.PP
.EX
.ta \w'\fLdelete array[expression]'u
if(\fI expression \fP)\fI statement \fP\fR[ \fPelse\fI statement \fP\fR]\fP
while(\fI expression \fP)\fI statement\fP
for(\fI expression \fP;\fI expression \fP;\fI expression \fP)\fI statement\fP
for(\fI var \fPin\fI array \fP)\fI statement\fP
do\fI statement \fPwhile(\fI expression \fP)
break
continue
{\fR [\fP\fI statement ... \fP\fR] \fP}
\fIexpression\fP	#\fR commonly\fP\fI var = expression\fP
print\fR [ \fP\fIexpression-list \fP\fR] \fP\fR[ \fP>\fI expression \fP\fR]\fP
printf\fI format \fP\fR[ \fP,\fI expression-list \fP\fR] \fP\fR[ \fP>\fI expression \fP\fR]\fP
return\fR [ \fP\fIexpression \fP\fR]\fP
next	#\fR skip remaining patterns on this input line\fP
nextfile	#\fR skip rest of this file, open next, start at top\fP
delete\fI array\fP[\fI expression \fP]	#\fR delete an array element\fP
delete\fI array\fP	#\fR delete all elements of array\fP
exit\fR [ \fP\fIexpression \fP\fR]\fP	#\fR exit immediately; status is \fP\fIexpression\fP
.EE
.DT
.PP
Statements are terminated by
semicolons, newlines or right braces.
An empty
.I expression-list
stands for
.BR $0 .
String constants are quoted \&\fL"\ "\fR,
with the usual C escapes recognized within.
Expressions take on string or numeric values as appropriate,
and are built using the operators
.B + \- * / % ^
(exponentiation), and concatenation (indicated by white space).
The operators
.B
! ++ \-\- += \-= *= /= %= ^= > >= < <= == != ?:
are also available in expressions.
Variables may be scalars, array elements
(denoted
.IB x [ i ] )
or fields.
Variables are initialized to the null string.
Array subscripts may be any string,
not necessarily numeric;
this allows for a form of associative memory.
Multiple subscripts such as
.B [i,j,k]
are permitted; the constituents are concatenated,
separated by the value of
.BR SUBSEP .
.PP
The
.B print
statement prints its arguments on the standard output
(or on a file if
.BI > file
or
.BI >> file
is present or on a pipe if
.BI | cmd
is present), separated by the current output field separator,
and terminated by the output record separator.
.I file
and
.I cmd
may be literal names or parenthesized expressions;
identical string values in different statements denote
the same open file.
The
.B printf
statement formats its expression list according to the format
(see
.IR fprintf (3)) .
The built-in function
.BI close( expr )
closes the file or pipe
.IR expr .
The built-in function
.BI fflush( expr )
flushes any buffered output for the file or pipe
.IR expr .
If
.IR expr
is omitted or is a null string, all open files are flushed.
.PP
The mathematical functions
.BR exp ,
.BR log ,
.BR sqrt ,
.BR sin ,
.BR cos ,
and
.BR atan2
are built in.
Other built-in functions:
.TF length
.TP
.B length
If its argument is a string, the string's length is returned.
If its argument is an array, the number of subscripts in the array is returned.
If no argument, the length of
.B $0
is returned.
.TP
.B rand
random number on (0,1)
.TP
.B srand
sets seed for
.B rand
and returns the previous seed.
.TP
.B int
truncates to an integer value
.TP
.B utf
converts its numerical argument, a character number, to a
.SM UTF
string
.TP
.BI substr( s , " m" , " n\fL)
the
.IR n -character
substring of
.I s
that begins at position
.IR m
counted from 1.
.TP
.BI index( s , " t" )
the position in
.I s
where the string
.I t
occurs, or 0 if it does not.
.TP
.BI match( s , " r" )
the position in
.I s
where the regular expression
.I r
occurs, or 0 if it does not.
The variables
.B RSTART
and
.B RLENGTH
are set to the position and length of the matched string.
.TP
.BI split( s , " a" , " fs\fL)
splits the string
.I s
into array elements
.IB a [1]\f1,
.IB a [2]\f1,
\&...,
.IB a [ n ]\f1,
and returns
.IR n .
The separation is done with the regular expression
.I fs
or with the field separator
.B FS
if
.I fs
is not given.
An empty string as field separator splits the string
into one array element per character.
.TP
.BI sub( r , " t" , " s\fL)
substitutes
.I t
for the first occurrence of the regular expression
.I r
in the string
.IR s .
If
.I s
is not given,
.B $0
is used.
.TP
.B gsub
same as
.B sub
except that all occurrences of the regular expression
are replaced;
.B sub
and
.B gsub
return the number of replacements.
.TP
.BI sprintf( fmt , " expr" , " ...\fL)
the string resulting from formatting
.I expr ...
according to the
.I printf
format
.I fmt
.TP
.BI system( cmd )
executes
.I cmd
and returns its exit status
.TP
.BI tolower( str )
returns a copy of
.I str
with all upper-case characters translated to their
corresponding lower-case equivalents.
.TP
.BI toupper( str )
returns a copy of
.I str
with all lower-case characters translated to their
corresponding upper-case equivalents.
.PD
.PP
The ``function''
.B getline
sets
.B $0
to the next input record from the current input file;
.B getline
.BI < file
sets
.B $0
to the next record from
.IR file .
.B getline
.I x
sets variable
.I x
instead.
Finally,
.IB cmd " | getline
pipes the output of
.I cmd
into
.BR getline ;
each call of
.B getline
returns the next line of output from
.IR cmd .
In all cases,
.B getline
returns 1 for a successful input,
0 for end of file, and \-1 for an error.
.PP
Patterns are arbitrary Boolean combinations
(with
.BR "! || &&" )
of regular expressions and
relational expressions.
Regular expressions are as in
.MR regexp (7) .
Isolated regular expressions
in a pattern apply to the entire line.
Regular expressions may also occur in
relational expressions, using the operators
.BR ~
and
.BR !~ .
.BI / re /
is a constant regular expression;
any string (constant or variable) may be used
as a regular expression, except in the position of an isolated regular expression
in a pattern.
.PP
A pattern may consist of two patterns separated by a comma;
in this case, the action is performed for all lines
from an occurrence of the first pattern
through an occurrence of the second.
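.PP
The comma form of a pattern can be sketched with a small, made-up input.
The seven input lines and the name range_out are hypothetical;
the program itself is only the two patterns with no action, so matching lines are printed.

```shell
# Hypothetical input: one complete start/stop pair, then a start with no stop.
range_out=$(printf 'a\nstart\nb\nstop\nc\nstart\nd\n' | awk '/start/, /stop/')
printf '%s\n' "$range_out"
```

The first range covers start, b and stop.
The second start opens a new range that never meets a stop line,
so it runs on to the end of the input and d is printed as well.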
.PP
A relational expression is one of the following:
.IP
.I expression matchop regular-expression
.br
.I expression relop expression
.br
.IB expression " in " array-name
.br
.BI ( expr , expr,... ") in " array-name
.PP
where a
.I relop
is any of the six relational operators in C,
and a
.I matchop
is either
.B ~
(matches)
or
.B !~
(does not match).
A conditional is an arithmetic expression,
a relational expression,
or a Boolean combination
of these.
.PP
The special patterns
.B BEGIN
and
.B END
may be used to capture control before the first input line is read
and after the last.
.B BEGIN
and
.B END
do not combine with other patterns.
.PP
Variable names with special meanings:
.TF FILENAME
.TP
.B CONVFMT
conversion format used when converting numbers
(default
.BR "%.6g" )
.TP
.B FS
regular expression used to separate fields; also settable
by option
.BI \-F fs\f1.
.TP
.BR NF
number of fields in the current record
.TP
.B NR
ordinal number of the current record
.TP
.B FNR
ordinal number of the current record in the current file
.TP
.B FILENAME
the name of the current input file
.TP
.B RS
input record separator (default newline)
.TP
.B OFS
output field separator (default blank)
.TP
.B ORS
output record separator (default newline)
.TP
.B OFMT
output format for numbers (default
.BR "%.6g" )
.TP
.B SUBSEP
separates multiple subscripts (default 034)
.TP
.B ARGC
argument count, assignable
.TP
.B ARGV
argument array, assignable;
non-null members are taken as file names
.TP
.B ENVIRON
array of environment variables; subscripts are names.
.PD
.PP
Functions may be defined (at the position of a pattern-action statement) thus:
.IP
.L
function foo(a, b, c) { ...; return x }
.PP
Parameters are passed by value if scalar and by reference if array name;
functions may be called recursively.
Parameters are local to the function; all other variables are global.
Thus local variables may be created by providing excess parameters in
the function definition.
.SH EXAMPLES
.TP
.L
length($0) > 72
Print lines longer than 72 characters.
.TP
.L
{ print $2, $1 }
Print first two fields in opposite order.
.PP
.EX
BEGIN { FS = ",[ \et]*|[ \et]+" }
      { print $2, $1 }
.EE
.ns
.IP
Same, with input fields separated by comma and/or blanks and tabs.
.PP
.EX
    { s += $1 }
END { print "sum is", s, " average is", s/NR }
.EE
.ns
.IP
Add up first column, print sum and average.
.TP
.L
/start/, /stop/
Print all lines between start/stop pairs.
.PP
.EX
BEGIN {    # Simulate echo(1)
    for (i = 1; i < ARGC; i++) printf "%s ", ARGV[i]
    printf "\en"
    exit }
.EE
.SH SOURCE
.B \*9/src/cmd/awk
.SH SEE ALSO
.MR sed (1) ,
.MR regexp (7) ,
.br
A. V. Aho, B. W. Kernighan, P. J. Weinberger,
.I
The AWK Programming Language,
Addison-Wesley, 1988.  ISBN 0-201-07981-X
.SH BUGS
There are no explicit conversions between numbers and strings.
To force an expression to be treated as a number add 0 to it;
to force it to be treated as a string concatenate
\&\fL""\fP to it.
.PP
The scope rules for variables in functions are a botch;
the syntax is worse.
.PP
UTF is not always dealt with correctly,
though
.I awk
does make an attempt to do so.
The
.I split
function with an empty string as final argument now copes
with UTF in the string being split.
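.PP
The conversion idioms and the excess-parameter convention for local variables
noted in this section can be sketched as follows.
The function bump, the variables tmp, as_str, as_num and bugs_out,
and the input record are all hypothetical names and data used only for illustration.

```shell
# Hypothetical input: a single record with two numeric-looking fields.
bugs_out=$(printf '10 9\n' | awk '
# "tmp" is an excess parameter that callers never pass; it acts as a local.
function bump(x,    tmp) { tmp = x + 1; return tmp }
{
    tmp = "global"                     # a global with the same name
    as_str = (($1 "") < ($2 ""))       # string comparison: "10" sorts before "9"
    as_num = (($1 + 0) < ($2 + 0))     # numeric comparison: 10 is not less than 9
    print as_str, as_num, bump(1), tmp
}')
printf '%s\n' "$bugs_out"
```

Concatenating the null string makes the comparison a string comparison,
adding 0 makes it numeric,
and the global tmp is left untouched by the call because the parameter of the same name shadows it.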