.TH HTML 3
.SH NAME
parsehtml,
printitems,
validitems,
freeitems,
freedocinfo,
dimenkind,
dimenspec,
targetid,
targetname,
fromStr,
toStr
\- HTML parser
.SH SYNOPSIS
.nf
.PP
.ft L
#include <u.h>
#include <libc.h>
#include <html.h>
.ft P
.PP
.ta \w'\fLToken* 'u
.B
Item*	parsehtml(uchar* data, int datalen, Rune* src, int mtype,
.B
	int chset, Docinfo** pdi)
.PP
.B
void	printitems(Item* items, char* msg)
.PP
.B
int	validitems(Item* items)
.PP
.B
void	freeitems(Item* items)
.PP
.B
void	freedocinfo(Docinfo* d)
.PP
.B
int	dimenkind(Dimen d)
.PP
.B
int	dimenspec(Dimen d)
.PP
.B
int	targetid(Rune* s)
.PP
.B
Rune*	targetname(int targid)
.PP
.B
uchar*	fromStr(Rune* buf, int n, int chset)
.PP
.B
Rune*	toStr(uchar* buf, int n, int chset)
.SH DESCRIPTION
.PP
This library implements a parser for HTML 4.0 documents.
The parsed HTML is converted into an intermediate representation that
describes how the formatted HTML should be laid out.
.PP
.I Parsehtml
parses an entire HTML document contained in the buffer
.I data
and having length
.IR datalen .
The URL of the document should be passed in as
.IR src .
.I Mtype
is the media type of the document, which should be either
.B TextHtml
or
.BR TextPlain .
The character set of the document is described in
.IR chset ,
which can be one of
.BR US_Ascii ,
.BR ISO_8859_1 ,
.B UTF_8
or
.BR Unicode .
The return value is a linked list of
.B Item
structures, described in detail below.
As a side effect,
.BI * pdi
is set to point to a newly created
.B Docinfo
structure, containing information pertaining to the entire document.
.PP
The library expects two allocation routines to be provided by the
caller,
.B emalloc
and
.BR erealloc .
These routines are analogous to the standard malloc and realloc routines,
except that they should not return if the memory allocation fails.
In addition,
.B emalloc
is required to zero the memory.
.PP
For debugging purposes,
.I printitems
may be called to display the contents of an item list; individual items may
be printed using the
.B %I
print verb, installed on the first call to
.IR parsehtml .
.I Validitems
traverses the item list, checking that all of the pointers are valid.
It returns
.B 1
if everything is ok, and
.B 0
if an error was found.
Normally, one would not call these routines directly.
Instead, one sets the global variable
.I dbgbuild
and the library calls them automatically.
One can also set
.IR warn ,
to cause the library to print a warning whenever it finds a problem with the
input document, and
.IR dbglex ,
to print debugging information in the lexer.
.PP
When an item list is finished with, it should be freed with
.IR freeitems .
Then,
.I freedocinfo
should be called on the pointer returned in
.BI * pdi\f1.
.PP
.I Dimenkind
and
.I dimenspec
are provided to interpret the
.B Dimen
type, as described in the section
.IR "Dimension Specifications" .
.PP
Frame target names are mapped to integer ids via a global, permanent mapping.
To find the value for a given name, call
.IR targetid ,
which allocates a new id if the name hasn't been seen before.
The name of a given, known id may be retrieved using
.IR targetname .
The library predefines
.BR FTtop ,
.BR FTself ,
.B FTparent
and
.BR FTblank .
.PP
The library handles all text as Unicode strings (type
.BR Rune* ).
Character set conversion is provided by
.I fromStr
and
.IR toStr .
.I FromStr
takes
.I n
Unicode characters from
.I buf
and converts them to the character set described by
.IR chset .
.I ToStr
takes
.I n
bytes from
.IR buf ,
interpreted as belonging to character set
.IR chset ,
and converts them to a Unicode string.
Both routines null-terminate the result, and use
.B emalloc
to allocate space for it.
.SS Items
The return value of
.I parsehtml
is a linked list of variant structures,
with the generic portion described by the following definition:
.PP
.EX
.ta 6n +\w'Genattr* 'u
typedef struct Item Item;
struct Item
{
	Item*	next;
	int	width;
	int	height;
	int	ascent;
	int	anchorid;
	int	state;
	Genattr*	genattr;
	int	tag;
};
.EE
.PP
The field
.B next
points to the successor in the linked list of items, while
.BR width ,
.BR height ,
and
.B ascent
are intended for use by the caller as part of the layout process.
.BR Anchorid ,
if non-zero, gives the integer id assigned by the parser to the anchor that
this item is in (see section
.IR Anchors ).
.B State
is a collection of flags and values described as follows:
.PP
.EX
.ta 6n +\w'IFindentshift = 'u
enum
{
	IFbrk =	0x80000000,
	IFbrksp =	0x40000000,
	IFnobrk =	0x20000000,
	IFcleft =	0x10000000,
	IFcright =	0x08000000,
	IFwrap =	0x04000000,
	IFhang =	0x02000000,
	IFrjust =	0x01000000,
	IFcjust =	0x00800000,
	IFsmap =	0x00400000,
	IFindentshift =	8,
	IFindentmask =	(255<<IFindentshift),
	IFhangmask =	255
};
.EE
.PP
.B IFbrk
is set if a break is to be forced before placing this item.
.B IFbrksp
is set if a 1 line space should be added to the break (in which case
.B IFbrk
is also set).
.B IFnobrk
is set if a break is not permitted before the item.
.B IFcleft
is set if left floats should be cleared (that is, if the list of pending left floats should be placed)
before this item is placed, and
.B IFcright
is set for right floats.
In both cases, IFbrk is also set.
.B IFwrap
is set if the line containing this item is allowed to wrap.
.B IFhang
is set if this item hangs into the left indent.
.B IFrjust
is set if the line containing this item should be right justified,
and
.B IFcjust
is set for center justified lines.
.B IFsmap
is used to indicate that an image is a server-side map.
The low 8 bits, represented by
.BR IFhangmask ,
indicate the current hang into left indent, in tenths of a tabstop.
The next 8 bits, represented by
.B IFindentmask
and
.BR IFindentshift ,
indicate the current indent in tab stops.
.PP
The field
.B genattr
is an optional pointer to an auxiliary structure, described in the section
.IR "Generic Attributes" .
.PP
Finally,
.B tag
describes which variant type this item has.
It can have one of the values
.BR Itexttag ,
.BR Iruletag ,
.BR Iimagetag ,
.BR Iformfieldtag ,
.BR Itabletag ,
.B Ifloattag
or
.BR Ispacertag .
For each of these values, there is an additional structure defined, which
includes Item as an unnamed initial substructure, and then defines additional
fields.
.PP
Items of type
.B Itexttag
represent a piece of text, using the following structure:
.PP
.EX
.ta 6n +\w'Rune* 'u
struct Itext
{
	Item;
	Rune*	s;
	int	fnt;
	int	fg;
	uchar	voff;
	uchar	ul;
};
.EE
.PP
Here
.B s
is a null-terminated Unicode string of the actual characters making up this text item,
.B fnt
is the font number (described in the section
.IR "Font Numbers" ),
and
.B fg
is the RGB encoded color for the text.
.B Voff
measures the vertical offset from the baseline; subtract
.B Voffbias
to get the actual value (negative values represent a displacement down the page).
The field
.B ul
is the underline style:
.B ULnone
if no underline,
.B ULunder
for conventional underline, and
.B ULmid
for strike-through.
.PP
Items of type
.B Iruletag
represent a horizontal rule, as follows:
.PP
.EX
.ta 6n +\w'Dimen 'u
struct Irule
{
	Item;
	uchar	align;
	uchar	noshade;
	int	size;
	Dimen	wspec;
};
.EE
.PP
Here
.B align
is the alignment specification (described in the corresponding section),
.B noshade
is set if the rule should not be shaded,
.B size
is the height of the rule (as set by the size attribute),
and
.B wspec
is the desired width (see section
.IR "Dimension Specifications" ).
.PP
Items of type
.B Iimagetag
describe embedded images, for which the following structure is defined:
.PP
.EX
.ta 6n +\w'Iimage* 'u
struct Iimage
{
	Item;
	Rune*	imsrc;
	int	imwidth;
	int	imheight;
	Rune*	altrep;
	Map*	map;
	int	ctlid;
	uchar	align;
	uchar	hspace;
	uchar	vspace;
	uchar	border;
	Iimage*	nextimage;
};
.EE
.PP
Here
.B imsrc
is the URL of the image source,
.B imwidth
and
.BR imheight ,
if non-zero, contain the specified width and height for the image,
and
.B altrep
is the text to use as an alternative to the image, if the image is not displayed.
.BR Map ,
if set, points to a structure describing an associated client-side image map.
.B Ctlid
is reserved for use by the application, for handling animated images.
.B Align
encodes the alignment specification of the image.
.B Hspace
contains the number of pixels to pad the image with on either side, and
.B Vspace
the padding above and below.
.B Border
is the width of the border to draw around the image.
.B Nextimage
points to the next image in the document (the head of this list is
.BR Docinfo.images ).
.PP
For items of type
.BR Iformfieldtag ,
the following structure is defined:
.PP
.EX
.ta 6n +\w'Formfield* 'u
struct Iformfield
{
	Item;
	Formfield*	formfield;
};
.EE
.PP
This adds a single field,
.BR formfield ,
which points to a structure describing a field in a form, described in section
.IR Forms .
.PP
For items of type
.BR Itabletag ,
the following structure is defined:
.PP
.EX
.ta 6n +\w'Table* 'u
struct Itable
{
	Item;
	Table*	table;
};
.EE
.PP
.B Table
points to a structure describing the table, described in the section
.IR Tables .
.PP
For items of type
.BR Ifloattag ,
the following structure is defined:
.PP
.EX
.ta 6n +\w'Ifloat* 'u
struct Ifloat
{
	Item;
	Item*	item;
	int	x;
	int	y;
	uchar	side;
	uchar	infloats;
	Ifloat*	nextfloat;
};
.EE
.PP
The
.B item
points to a single item (either a table or an image) that floats (the text of the
document flows around it), and
.B side
indicates the margin that this float sticks to; it is either
.B ALleft
or
.BR ALright .
.B X
and
.B y
are reserved for use by the caller; these are typically used for the coordinates
of the top of the float.
.B Infloats
is used by the caller to keep track of whether it has placed the float.
.B Nextfloat
is used by the caller to link together all of the floats that it has placed.
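.PP
As an illustration of the packed layout of the generic
state field, the following sketch extracts the indent and hang values
and tests for a forced break.
It is a minimal example in portable C rather than part of the library:
the constants are copied from the enum shown earlier in this section,
while the helper names stateindent, statehang, needsbreak and mkstate
are hypothetical, and the state word is handled as an unsigned value
(an assumption made here to avoid sign problems with the top bit).
.EX
```c
#include <assert.h>

/* Values copied from the state enum in the Items section. */
#define IFbrk		0x80000000u
#define IFbrksp		0x40000000u
#define IFindentshift	8
#define IFindentmask	(255u << IFindentshift)
#define IFhangmask	255u

/* Current indent in tab stops: bits 8 to 15 of the state word. */
unsigned
stateindent(unsigned state)
{
	return (state & IFindentmask) >> IFindentshift;
}

/* Current hang into the left indent, in tenths of a tab stop: low 8 bits. */
unsigned
statehang(unsigned state)
{
	return state & IFhangmask;
}

/* Nonzero if a break is to be forced before placing the item. */
int
needsbreak(unsigned state)
{
	return (state & IFbrk) != 0;
}

/* Hypothetical helper, for testing only: pack flags, indent and hang. */
unsigned
mkstate(unsigned flags, unsigned indent, unsigned hang)
{
	assert(indent <= 255 && hang <= 255);
	return flags | (indent << IFindentshift) | hang;
}
```
.EE
.PP
A caller holding an Item pointer it would pass (unsigned)it->state
to these helpers; the cast is only needed because the field is declared int.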
.PP
For items of type
.BR Ispacertag ,
the following structure is defined:
.PP
.EX
.ta 6n +\w'Item; 'u
struct Ispacer
{
	Item;
	int	spkind;
};
.EE
.PP
.B Spkind
encodes the kind of spacer, and may be one of
.B ISPnull
(zero height and width),
.B ISPvline
(takes on height and ascent of the current font),
.B ISPhspace
(has the width of a space in the current font) and
.B ISPgeneral
(for all other purposes, such as between markers and lists).
.SS Generic Attributes
.PP
The genattr field of an item, if non-nil, points to a structure that holds
the values of attributes not specific to any particular
item type, as they occur on a wide variety of underlying HTML tags.
The structure is as follows:
.PP
.EX
.ta 6n +\w'SEvent* 'u
typedef struct Genattr Genattr;
struct Genattr
{
	Rune*	id;
	Rune*	class;
	Rune*	style;
	Rune*	title;
	SEvent*	events;
};
.EE
.PP
Fields
.BR id ,
.BR class ,
.B style
and
.BR title ,
when non-nil, contain values of correspondingly named attributes of the HTML tag
associated with this item.
.B Events
is a linked list of events (with corresponding scripted actions) associated with the item:
.PP
.EX
.ta 6n +\w'SEvent* 'u
typedef struct SEvent SEvent;
struct SEvent
{
	SEvent*	next;
	int	type;
	Rune*	script;
};
.EE
.PP
Here,
.B next
points to the next event in the list,
.B type
is one of
.BR SEonblur ,
.BR SEonchange ,
.BR SEonclick ,
.BR SEondblclick ,
.BR SEonfocus ,
.BR SEonkeypress ,
.BR SEonkeyup ,
.BR SEonload ,
.BR SEonmousedown ,
.BR SEonmousemove ,
.BR SEonmouseout ,
.BR SEonmouseover ,
.BR SEonmouseup ,
.BR SEonreset ,
.BR SEonselect ,
.B SEonsubmit
or
.BR SEonunload ,
and
.B script
is the text of the associated script.
.SS Dimension Specifications
.PP
Some structures include a dimension specification, used where
a number can be followed by a
.B %
or a
.B *
to indicate
percentage of total or relative weight.
This is encoded using the following structure:
.PP
.EX
.ta 6n +\w'int 'u
typedef struct Dimen Dimen;
struct Dimen
{
	int	kindspec;
};
.EE
.PP
Separate kind and spec values are extracted using
.I dimenkind
and
.IR dimenspec .
.I Dimenkind
returns one of
.BR Dnone ,
.BR Dpixels ,
.B Dpercent
or
.BR Drelative .
.B Dnone
means that no dimension was specified.
In all other cases,
.I dimenspec
should be called to find the absolute number of pixels, the percentage of total,
or the relative weight.
.SS Background Specifications
.PP
It is possible to set the background of the entire document, and also
for some parts of the document (such as tables).
This is encoded as follows:
.PP
.EX
.ta 6n +\w'Rune* 'u
typedef struct Background Background;
struct Background
{
	Rune*	image;
	int	color;
};
.EE
.PP
.BR Image ,
if non-nil, is the URL of an image to use as the background.
If this is nil,
.B color
is used instead, as the RGB value for a solid fill color.
.SS Alignment Specifications
.PP
Certain items have alignment specifiers taken from the following
enumerated type:
.PP
.EX
.ta 6n
enum
{
	ALnone = 0, ALleft, ALcenter, ALright, ALjustify,
	ALchar, ALtop, ALmiddle, ALbottom, ALbaseline
};
.EE
.PP
These values correspond to the various alignment types named in the HTML 4.0
standard.
If an item has an alignment of
.B ALleft
or
.BR ALright ,
the library automatically encapsulates it inside a float item.
.PP
Tables, and the various rows, columns and cells within them, have a more
complex alignment specification, composed of separate vertical and
horizontal alignments:
.PP
.EX
.ta 6n +\w'uchar 'u
typedef struct Align Align;
struct Align
{
	uchar	halign;
	uchar	valign;
};
.EE
.PP
.B Halign
can be one of
.BR ALnone ,
.BR ALleft ,
.BR ALcenter ,
.BR ALright ,
.B ALjustify
or
.BR ALchar .
.B Valign
can be one of
.BR ALnone ,
.BR ALmiddle ,
.BR ALbottom ,
.BR ALtop
or
.BR ALbaseline .
.SS Font Numbers
.PP
Text items have an associated font number (the
.B fnt
field), which is encoded as
.BR style*NumSize+size .
Here,
.B style
is one of
.BR FntR ,
.BR FntI ,
.B FntB
or
.BR FntT ,
for roman, italic, bold and typewriter font styles, respectively, and size is
.BR Tiny ,
.BR Small ,
.BR Normal ,
.B Large
or
.BR Verylarge .
The total number of possible font numbers is
.BR NumFnt ,
and the default font number is
.B DefFnt
(which is roman style, normal size).
.SS Document Info
.PP
Global information about an HTML page is stored in the following structure:
.PP
.EX
.ta 6n +\w'DestAnchor* 'u
typedef struct Docinfo Docinfo;
struct Docinfo
{
	// stuff from HTTP headers, doc head, and body tag
	Rune*	src;
	Rune*	base;
	Rune*	doctitle;
	Background	background;
	Iimage*	backgrounditem;
	int	text;
	int	link;
	int	vlink;
	int	alink;
	int	target;
	int	chset;
	int	mediatype;
	int	scripttype;
	int	hasscripts;
	Rune*	refresh;
	Kidinfo*	kidinfo;
	int	frameid;

	// info needed to respond to user actions
	Anchor*	anchors;
	DestAnchor*	dests;
	Form*	forms;
	Table*	tables;
	Map*	maps;
	Iimage*	images;
};
.EE
.PP
.B Src
gives the URL of the original source of the document,
and
.B base
is the base URL.
.B Doctitle
is the document's title, as set by a
.B <title>
element.
.B Background
is as described in the section
.IR "Background Specifications" ,
and
.B backgrounditem
is set to be an image item for the document's background image (if given as a URL),
or else nil.
.B Text
gives the default foreground text color of the document,
.B link
the unvisited hyperlink color,
.B vlink
the visited hyperlink color, and
.B alink
the color for highlighting hyperlinks (all in 24-bit RGB format).
.B Target
is the default target frame id.
.B Chset
and
.B mediatype
are as for the
.I chset
and
.I mtype
parameters to
.IR parsehtml .
.B Scripttype
is the type of any scripts contained in the document, and is always
.BR TextJavascript .
.B Hasscripts
is set if the document contains any scripts.
Scripting is currently unsupported.
.B Refresh
is the contents of a
.B "<meta http-equiv=Refresh ...>"
tag, if any.
.B Kidinfo
is set if this document is a frameset (see section
.IR Frames ).
.B Frameid
is this document's frame id.
.PP
.B Anchors
is a list of hyperlinks contained in the document,
and
.B dests
is a list of hyperlink destinations within the page (see the following section for details).
.BR Forms ,
.B tables
and
.B maps
are lists of the various forms, tables and client-side maps contained
in the document, as described in subsequent sections.
.B Images
is a list of all the image items in the document.
.SS Anchors
.PP
The library builds two lists for all of the
.B <a>
elements (anchors) in a document.
Each anchor is assigned a unique anchor id within the document.
For anchors which are hyperlinks (the
.B href
attribute was supplied), the following structure is defined:
.PP
.EX
.ta 6n +\w'Anchor* 'u
typedef struct Anchor Anchor;
struct Anchor
{
	Anchor*	next;
	int	index;
	Rune*	name;
	Rune*	href;
	int	target;
};
.EE
.PP
.B Next
points to the next anchor in the list (the head of this list is
.BR Docinfo.anchors ).
.B Index
is the anchor id; each item within this hyperlink is tagged with this value
in its
.B anchorid
field.
.B Name
and
.B href
are the values of the correspondingly named attributes of the anchor
(in particular, href is the URL to go to).
.B Target
is the value of the target attribute (if provided) converted to a frame id.
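.PP
Because each item records only an anchor id, a caller that wants the
hyperlink for an item has to search the anchor list.
The following is a minimal sketch of such a lookup in portable C, not
a routine supplied by the library: findanchor is a hypothetical helper
name, the Anchor structure is repeated so the fragment stands alone,
and Rune is replaced by a local stand-in typedef (an assumption; the
real type comes from the system headers).
.EX
```c
#include <stddef.h>

typedef unsigned int Rune;	/* assumption: stand-in for the system Rune type */

typedef struct Anchor Anchor;
struct Anchor
{
	Anchor*	next;
	int	index;
	Rune*	name;
	Rune*	href;
	int	target;
};

/*
 * Hypothetical helper: return the hyperlink whose index equals
 * anchorid, or NULL if there is none.  An anchorid of zero means
 * the item is not inside any anchor.
 */
Anchor*
findanchor(Anchor* anchors, int anchorid)
{
	Anchor* a;

	if(anchorid == 0)
		return NULL;
	for(a = anchors; a != NULL; a = a->next)
		if(a->index == anchorid)
			return a;
	return NULL;
}
```
.EE
.PP
A typical call would pass the anchors field of the Docinfo and the
anchorid field of the item under the pointer.
A NULL result for a non-zero id is possible in this sketch, for example
when the enclosing anchor has no href and so is not in the hyperlink list.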
.PP
Destinations within the document (anchors with the name attribute set)
are held in the
.B Docinfo.dests
list, using the following structure:
.PP
.EX
.ta 6n +\w'DestAnchor* 'u
typedef struct DestAnchor DestAnchor;
struct DestAnchor
{
	DestAnchor*	next;
	int	index;
	Rune*	name;
	Item*	item;
};
.EE
.PP
.B Next
is the next element of the list,
.B index
is the anchor id,
.B name
is the value of the name attribute, and
.B item
points to the item within the parsed document that should be considered
to be the destination.
.SS Forms
.PP
Any forms within a document are kept in a list, headed by
.BR Docinfo.forms .
The elements of this list are as follows:
.PP
.EX
.ta 6n +\w'Formfield* 'u
typedef struct Form Form;
struct Form
{
	Form*	next;
	int	formid;
	Rune*	name;
	Rune*	action;
	int	target;
	int	method;
	int	nfields;
	Formfield*	fields;
};
.EE
.PP
.B Next
points to the next form in the list.
.B Formid
is a serial number for the form within the document.
.B Name
is the value of the form's name or id attribute.
.B Action
is the value of any action attribute.
.B Target
is the value of the target attribute (if any) converted to a frame target id.
.B Method
is one of
.B HGet
or
.BR HPost .
.B Nfields
is the number of fields in the form, and
.B fields
is a linked list of the actual fields.
.PP
The individual fields in a form are described by the following structure:
.PP
.EX
.ta 6n +\w'Formfield* 'u
typedef struct Formfield Formfield;
struct Formfield
{
	Formfield*	next;
	int	ftype;
	int	fieldid;
	Form*	form;
	Rune*	name;
	Rune*	value;
	int	size;
	int	maxlength;
	int	rows;
	int	cols;
	uchar	flags;
	Option*	options;
	Item*	image;
	int	ctlid;
	SEvent*	events;
};
.EE
.PP
Here,
.B next
points to the next field in the list.
.B Ftype
is the type of the field, which can be one of
.BR Ftext ,
.BR Fpassword ,
.BR Fcheckbox ,
.BR Fradio ,
.BR Fsubmit ,
.BR Fhidden ,
.BR Fimage ,
.BR Freset ,
.BR Ffile ,
.BR Fbutton ,
.B Fselect
or
.BR Ftextarea .
.B Fieldid
is a serial number for the field within the form.
.B Form
points back to the form containing this field.
.BR Name ,
.BR value ,
.BR size ,
.BR maxlength ,
.B rows
and
.B cols
each contain the values of corresponding attributes of the field, if present.
.B Flags
contains per-field flags, of which
.B FFchecked
and
.B FFmultiple
are defined.
.B Image
is only used for fields of type
.BR Fimage ;
it points to an image item containing the image to be displayed.
.B Ctlid
is reserved for use by the caller, typically to store a unique id
of an associated control used to implement the field.
.B Events
is the same as the corresponding field of the generic attributes
associated with the item containing this field.
.B Options
is only used by fields of type
.BR Fselect ;
it consists of a list of possible options that may be selected for that
field, using the following structure:
.PP
.EX
.ta 6n +\w'Option* 'u
typedef struct Option Option;
struct Option
{
	Option*	next;
	int	selected;
	Rune*	value;
	Rune*	display;
};
.EE
.PP
.B Next
points to the next element of the list.
.B Selected
is set if this option is to be displayed initially.
.B Value
is the value to send when the form is submitted if this option is selected.
.B Display
is the string to display on the screen for this option.
.SS Tables
.PP
The library builds a list of all the tables in the document,
headed by
.BR Docinfo.tables .
Each element of this list has the following format:
.PP
.EX
.ta 6n +\w'Tablecell*** 'u
typedef struct Table Table;
struct Table
{
	Table*	next;
	int	tableid;
	Tablerow*	rows;
	int	nrow;
	Tablecol*	cols;
	int	ncol;
	Tablecell*	cells;
	int	ncell;
	Tablecell***	grid;
	Align	align;
	Dimen	width;
	int	border;
	int	cellspacing;
	int	cellpadding;
	Background	background;
	Item*	caption;
	uchar	caption_place;
	Lay*	caption_lay;
	int	totw;
	int	toth;
	int	caph;
	int	availw;
	Token*	tabletok;
	uchar	flags;
};
.EE
.PP
.B Next
points to the next element in the list of tables.
.B Tableid
is a serial number for the table within the document.
.B Rows
is an array of row specifications (described below) and
.B nrow
is the number of elements in this array.
Similarly,
.B cols
is an array of column specifications, and
.B ncol
the size of this array.
.B Cells
is a list of all cells within the table (structure described below)
and
.B ncell
is the number of elements in this list.
Note that a cell may span multiple rows and/or columns, thus
.B ncell
may be smaller than
.BR nrow*ncol .
.B Grid
is a two-dimensional array of cells within the table; the cell
at row
.B i
and column
.B j
is
.BR Table.grid[i][j] .
A cell that spans multiple rows and/or columns will
be referenced by
.B grid
multiple times, but it will occur only once in
.BR cells .
.B Align
gives the alignment specification for the entire table,
and
.B width
gives the requested width as a dimension specification.
.BR Border ,
.B cellspacing
and
.B cellpadding
give the values of the corresponding attributes for the table,
and
.B background
gives the requested background for the table.
.B Caption
is a linked list of items to be displayed as the caption of the
table, either above or below depending on whether
.B caption_place
is
.B ALtop
or
.BR ALbottom .
Most of the remaining fields are reserved for use by the caller,
except
.BR tabletok ,
which is reserved for internal use.
The type
.B Lay
is not defined by the library; the caller can provide its
own definition.
.PP
The
.B Tablecol
structure is defined for use by the caller.
The library ensures that the correct number of these
is allocated, but leaves them blank.
The fields are as follows:
.PP
.EX
.ta 6n +\w'Point 'u
typedef struct Tablecol Tablecol;
struct Tablecol
{
	int	width;
	Align	align;
	Point	pos;
};
.EE
.PP
The rows in the table are specified as follows:
.PP
.EX
.ta 6n +\w'Background 'u
typedef struct Tablerow Tablerow;
struct Tablerow
{
	Tablerow*	next;
	Tablecell*	cells;
	int	height;
	int	ascent;
	Align	align;
	Background	background;
	Point	pos;
	uchar	flags;
};
.EE
.PP
.B Next
is only used during parsing; it should be ignored by the caller.
.B Cells
provides a list of all the cells in a row, linked through their
.B nextinrow
fields (see below).
.BR Height ,
.B ascent
and
.B pos
are reserved for use by the caller.
.B Align
is the alignment specification for the row, and
.B background
is the background to use, if specified.
.B Flags
is used by the parser; ignore this field.
.PP
The individual cells of the table are described as follows:
.PP
.EX
.ta 6n +\w'Background 'u
typedef struct Tablecell Tablecell;
struct Tablecell
{
	Tablecell*	next;
	Tablecell*	nextinrow;
	int	cellid;
	Item*	content;
	Lay*	lay;
	int	rowspan;
	int	colspan;
	Align	align;
	uchar	flags;
	Dimen	wspec;
	int	hspec;
	Background	background;
	int	minw;
	int	maxw;
	int	ascent;
	int	row;
	int	col;
	Point	pos;
};
.EE
.PP
.B Next
is used to link together the list of all cells within a table
.RB ( Table.cells ),
whereas
.B nextinrow
is used to link together all the cells within a single row
.RB ( Tablerow.cells ).
.B Cellid
provides a serial number for the cell within the table.
.B Content
is a linked list of the items to be laid out within the cell.
.B Lay
is reserved for the user to describe how these items have
been laid out.
.B Rowspan
and
.B colspan
are the number of rows and columns spanned by this cell,
respectively.
.B Align
is the alignment specification for the cell.
.B Flags
is some combination of
.BR TFparsing ,
.B TFnowrap
and
.B TFisth
or'd together.
Here
.B TFparsing
is used internally by the parser, and should be ignored.
.B TFnowrap
means that the contents of the cell should not be
wrapped if they don't fit the available width;
instead, the table should be expanded if need be
(this is set when the nowrap attribute is supplied).
.B TFisth
means that the cell was created by the
.B <th>
element (rather than the
.B <td>
element),
indicating that it is a header cell rather than a data cell.
.B Wspec
provides a suggested width as a dimension specification,
and
.B hspec
provides a suggested height in pixels.
.B Background
gives a background specification for the individual cell.
.BR Minw ,
.BR maxw ,
.B ascent
and
.B pos
are reserved for use by the caller during layout.
.B Row
and
.B col
give the indices of the row and column of the top left-hand
corner of the cell within the table grid.
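.PP
Since a spanning cell is referenced from every grid slot it covers,
code that walks the grid needs a way to visit each cell once.
One approach, sketched below in portable C, is to act on a cell only
at the slot matching its own row and col fields.
This is an illustration under stated assumptions, not library code:
countcells is a hypothetical helper, Tablecell is cut down to the four
fields the sketch reads, and grid slots covered by no cell are assumed
to be NULL.
.EX
```c
#include <stddef.h>

/* Reduced stand-in for Tablecell: only the fields read below. */
typedef struct Tablecell Tablecell;
struct Tablecell
{
	int	rowspan;
	int	colspan;
	int	row;
	int	col;
};

/*
 * Hypothetical helper: count the distinct cells reachable through
 * grid[nrow][ncol].  A cell is counted only at its top left-hand
 * slot, so a cell spanning several rows or columns counts once.
 * Assumption: slots covered by no cell hold NULL.
 */
int
countcells(Tablecell*** grid, int nrow, int ncol)
{
	int i, j, n;
	Tablecell* c;

	n = 0;
	for(i = 0; i < nrow; i++)
		for(j = 0; j < ncol; j++){
			c = grid[i][j];
			if(c != NULL && c->row == i && c->col == j)
				n++;
		}
	return n;
}
```
.EE
.PP
For a two by two table whose first row is a single cell with colspan 2,
this yields 3, which is smaller than nrow*ncol as noted above.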
.SS Client-side Maps
.PP
The library builds a list of client-side maps, headed by
.BR Docinfo.maps ,
and having the following structure:
.PP
.EX
.ta 6n +\w'Rune* 'u
typedef struct Map Map;
struct Map
{
	Map*	next;
	Rune*	name;
	Area*	areas;
};
.EE
.PP
.B Next
points to the next element in the list,
.B name
is the name of the map (used to bind it to an image), and
.B areas
is a list of the areas within the image that comprise the map,
using the following structure:
.PP
.EX
.ta 6n +\w'Dimen* 'u
typedef struct Area Area;
struct Area
{
	Area*	next;
	int	shape;
	Rune*	href;
	int	target;
	Dimen*	coords;
	int	ncoords;
};
.EE
.PP
.B Next
points to the next element in the map's list of areas.
.B Shape
describes the shape of the area, and is one of
.BR SHrect ,
.B SHcircle
or
.BR SHpoly .
.B Href
is the URL associated with this area in its role as
a hypertext link, and
.B target
is the target frame it should be loaded in.
.B Coords
is an array of coordinates for the shape, and
.B ncoords
is the size of this array (number of elements).
.SS Frames
.PP
If the
.B Docinfo.kidinfo
field is set, the document is a frameset.
In this case, it is typical for
.I parsehtml
to return nil, as a document which is a frameset should have no actual
items that need to be laid out (such will appear only in subsidiary documents).
It is possible that items will be returned by a malformed document; the caller
should check for this and free any such items.
.PP
The
.B Kidinfo
structure itself reflects the fact that framesets can be nested within a document.
It is defined as follows:
.PP
.EX
.ta 6n +\w'Kidinfo* 'u
typedef struct Kidinfo Kidinfo;
struct Kidinfo
{
	Kidinfo*	next;
	int	isframeset;

	// fields for "frame"
	Rune*	src;
	Rune*	name;
	int	marginw;
	int	marginh;
	int	framebd;
	int	flags;

	// fields for "frameset"
	Dimen*	rows;
	int	nrows;
	Dimen*	cols;
	int	ncols;
	Kidinfo*	kidinfos;
	Kidinfo*	nextframeset;
};
.EE
.PP
.B Next
is only used if this structure is part of a containing frameset; it points to the next
element in the list of children of that frameset.
.B Isframeset
is set when this structure represents a frameset; if clear, it is an individual frame.
.PP
Some fields are used only for framesets.
.B Rows
is an array of dimension specifications for rows in the frameset, and
.B nrows
is the length of this array.
.B Cols
is the corresponding array for columns, of length
.BR ncols .
.B Kidinfos
points to a list of components contained within this frameset, each
of which may be a frameset or a frame.
.B Nextframeset
is only used during parsing, and should be ignored.
.PP
The remaining fields are used if the structure describes a frame, not a frameset.
.B Src
provides the URL for the document that should be initially loaded into this frame.
Note that this may be a relative URL, in which case it should be interpreted
using the containing document's URL as the base.
.B Name
gives the name of the frame, typically supplied via a name attribute in the HTML.
If no name was given, the library allocates one.
.BR Marginw ,
.B marginh
and
.B framebd
are the values of the marginwidth, marginheight and frameborder attributes, respectively.
.B Flags
can contain some combination of the following:
.B FRnoresize
(the frame had the noresize attribute set, and the user should not be allowed to resize it),
.B FRnoscroll
(the frame should not have any scroll bars),
.B FRhscroll
(the frame should have a horizontal scroll bar),
.B FRvscroll
(the frame should have a vertical scroll bar),
.B FRhscrollauto
(the frame should be automatically given a horizontal scroll bar if its contents
would not otherwise fit), and
.B FRvscrollauto
(the frame gets a vertical scrollbar only if required).
.SH SOURCE
.B \*9/src/libhtml
.SH SEE ALSO
.MR fmt (1)
.PP
W3C World Wide Web Consortium,
``HTML 4.01 Specification''.
.SH BUGS
The entire HTML document must be loaded into memory before
any of it can be parsed.