cat Library/Implementation/Version.make
assuming that you are at the top of the WWW tree created when unpacking the distribution file. You can compare your version of the code with the current version, which is available from the online documentation on our WWW server.
The Library functionality is divided into a set of modules, each of which has its own include file. Their names all start with WWW, and as they will be referenced throughout this guide, we might as well introduce them from the beginning:
The application must explicitly initialize the Library before it can start using it, so that the internal file descriptors and variables can be set to their respective values. This only has to be done once while the application is running and is typically done when the application starts. The application should also close down the Library when it has stopped using it, typically when the application is closing down. The Library will then return resources like file descriptors and dynamic memory to the operating system. In practice, initialization and termination are done using the following two functions:
BOOL HTLibInit(const char * AppName, const char * AppVersion)
This function initializes memory, file descriptors, interrupt handlers, etc. By default it also calls the initialization functions for all the dynamic modules in the Library. The dynamic modules are described in "Libwww Architecture". A major part of the User's Guide is devoted to describing how the Library can be configured, both at run time and at compile time, and the dynamic modules are an important part of the Library configuration.
The two arguments to the function are the name of the application and its version number, respectively. These values do not have to be unique, and both can be the empty string (""). However, as the strings are used by the HTTP protocol module when communicating with other WWW applications, it is strongly recommended that the values be chosen according to the HTTP specification. The most important requirement is to use plain ASCII characters without any whitespace, as in the example below.
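As a rough illustration of what "plain ASCII without whitespace" means in HTTP terms, the following minimal sketch checks a name against the HTTP "token" rule (RFC 1945 and RFC 2616), which product names sent in HTTP headers are expected to follow. The helper is_http_token is hypothetical and not part of the Library; note that it rejects the empty string, which HTLibInit itself accepts.

```c
#include <string.h>

/* Hypothetical helper, not a Library function.
   Returns 1 if s is a non-empty HTTP token: printable US-ASCII with no
   spaces, no control characters and none of the HTTP separator characters. */
int is_http_token(const char *s)
{
    static const char separators[] = "()<>@,;:\\\"/[]?={} \t";
    if (s == NULL || *s == '\0')
        return 0;
    for (; *s; s++) {
        unsigned char c = (unsigned char) *s;
        if (c <= 32 || c >= 127)           /* controls, space, non-ASCII */
            return 0;
        if (strchr(separators, c) != NULL) /* HTTP separator character */
            return 0;
    }
    return 1;
}
```

With this check, "TestApp" and "1.0" are acceptable, while "Test App" is not.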
BOOL HTLibTerminate()
This function cleans up memory, closes open file descriptors, and returns all resources to the operating system. It is essential that HTLibInit(...) is the first call to the Library and HTLibTerminate() is the last, as the behavior is otherwise undefined.
To try this out, compile and link the following program, making sure that the compiler can find the include file WWWLib.h and also knows where to find the binary library, often called libwww.a on Unix platforms and libwww.lib on Windows. Again, the result might depend on the setup of the dynamic modules, but if no dynamic modules are enabled, the example will generate an executable file. If you are in doubt about how to set up your compiler, you can often get some good ideas by looking at the Line Mode Browser.
#include "WWWLib.h"
int main()
{
HTLibInit("TestApp", "1.0");
HTLibTerminate();
return 0;
}
Some platforms require a socket library when building network applications. This is, for example, the case when building on Macintosh or Windows machines. The Library uses the GUSI socket library on the Macintosh and the WinSock library on Windows platforms. Please check the documentation for these libraries on how to install them and whether there are any specific requirements on your platform when building network applications.
While the previous application wasn't capable of doing anything, we will now add functionality so that we can request a URL from a remote Web server and save the result to a file. To do this we need to register two additional modules after initializing the Library: a protocol module that handles HTTP and a stream that can save data to a local file. Both modules are already in the distribution file, and in this example we show how to enable them. It is also possible to write your own versions of these modules and register them instead of the ones provided with the Library. This makes no difference to the core part of the Library and is an example of how the functionality can be extended or changed by adding new modules as needed.
#include "WWWLib.h"
#include "HTTP.h"
int main()
{
HTList *converters;
/* Initialize the Library; this must be the first Library call */
HTLibInit("TestApp", "1.0");
/* Create a list object */
converters = HTList_new();
/* Register the HTTP Module */
HTProtocol_add("http", YES, HTLoadHTTP);
/* Add a conversion to our empty list */
HTConversion_add(converters, "*/*", "www/present", HTSaveLocally, 1.0, 0.0, 0.0);
/* Register our list with one conversion */
HTFormat_setConversion(converters);
/* Delete the list with one conversion */
HTConversion_deleteAll(converters);
/* Terminate the Library */
HTLibTerminate();
return 0;
}
The new thing in this example is that we now have two registration functions. We will explain more about these functions as we go along; for now we will only introduce the functions and their arguments. The interesting part about the two registration functions is that they represent the two ways of registering things in the Library: some things, like the protocol module, are registered directly, and other things, like the converters, are registered as lists of objects. The reason for this is to make the registration process easier for the application to handle: protocol modules are often initialized only once while the application is running, so it is easier to register them directly. As we will see later in this guide, converters, however, can be enabled and disabled on a regular basis depending on what the application is trying to do. It is therefore easier to keep the converters in lists so that they can be enabled and disabled in batch. Now, let's take a closer look at the two registration functions. The first registers the HTTP protocol module, which enables the Library to access documents using HTTP from any server on the Internet.
extern BOOL HTProtocol_add (const char * name, BOOL preemtive, HTEventCallBack * callback);
The first argument is a name equivalent to the scheme part of a URL; for example, in http://www.w3.org, http is the scheme part. When a request is issued to the Library using a URL, the Library looks at the URL scheme and checks whether it knows how to handle it. If not, an error is issued. The second argument describes whether the protocol module uses blocking or non-blocking sockets. This is a decision made when the module is first designed and normally cannot be changed. In the example we pass YES, registering HTTP for using blocking sockets, but all native Library protocol modules, including HTTP, FTP, News, Gopher, and access to the local file system, support non-blocking sockets. The third argument is the name of the protocol function to be called when the Library is about to hand off the request to the module.
extern void HTConversion_add (HTList * conversions, CONST char * rep_in, CONST char * rep_out, HTConverter * converter, double quality, double secs, double secs_per_byte);
This function has many arguments and we will not go into details at this point. The important thing to note is that we build a list of converters. Each call to HTConversion_add creates a new converter object and adds it to the list. A converter object is described by an input format (rep_in), an output format (rep_out), the function name of the converter, and a quality factor describing how good the conversion is. The last two arguments are currently not used but are reserved for future use. We will return to the quality factor later and see how it can be used to distinguish between multiple conversions in order to pick the best one.
Even though we have now initialized a protocol module and a converter, the program example is still not actively doing anything. It only starts the Library, registers two modules, and then terminates the Library again. Our third and last example in this section does the same amount of initialization but also issues a request to the Library to fetch a URL.
#include <stdio.h>
#include "WWWLib.h"
#include "HTTP.h"
int main (int argc, char ** argv)
{
HTList *converters;
HTRequest *request;
HTLibInit("TestApp", "1.0");        /* Always the first call to the Library */
converters = HTList_new();
request = HTRequest_new();          /* Create a request object */
HTProtocol_add("http", YES, HTLoadHTTP);
HTConversion_add(converters, "*/*", "www/present", HTSaveLocally, 1.0, 0.0, 0.0);
HTFormat_setConversion(converters);
if (argc == 2) {
HTLoadAbsolute(argv[1], request);
} else
printf("Type the URL to fetch\n");
HTRequest_delete(request); /* Delete the request object */
HTConversion_deleteAll(converters);
HTLibTerminate();
return 0;
}
When this program is run, it takes the argument and calls the Library to fetch it. As we haven't given any name for the file that we are creating on our local disk, the Library will prompt the user for a file name. Automatic redirection and access authentication are handled by the HTTP module but might require the user to type in a user name and a password. An example of how to run this program is:
./fetch_url http://www.w3.org/pub/WWW/
The result stored in the file contains the whole message returned by the remote HTTP server except for the status line. This means that if we ask an HTTP/1.0 compliant server, we receive a header and a body, where the header contains metainformation about the object, for example content type, content language etc. We shall see later how the MIME parser stream can strip out the header information so that we end up with only the body of the response message.
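Until the MIME parser stream is introduced, it may help to see what stripping the header amounts to. The sketch below is a hypothetical standalone helper, message_body, not a Library function: it assumes the whole saved message is in memory as a NUL-terminated string and returns a pointer to the body, which starts after the first empty line. It accepts both CRLF and bare LF line endings, since saved files may use either.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of the Library.
   Returns a pointer to the first byte of the body in a saved message
   (header lines followed by an empty line), or NULL if no empty line
   is found. Accepts both CRLF and bare LF line endings. */
const char *message_body(const char *msg)
{
    const char *p;
    for (p = msg; *p; p++) {
        if (p[0] == '\n') {
            if (p[1] == '\n')                  /* LF LF */
                return p + 2;
            if (p[1] == '\r' && p[2] == '\n')  /* CRLF CRLF */
                return p + 3;
        }
    }
    return NULL;
}
```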
In the next chapters we shall see that protocol modules and converters are only part of what can be registered in the Library and that the application can specify many other types of preferences and capabilities.
The Core is basically a set of registration mechanisms that glue together the application modules, and in the following chapter we will look at how to configure the core to contain exactly the functionality we want for our application. If you are interested in a more detailed description of the architecture of the core, to see how the glue is designed, please read the chapter on the model behind the Core of the Library in the Architecture document. In this section we will concentrate on the APIs defined by the WWWCore.h include file and how to use the Core in a real application.
HTConversion_add(...):
extern void HTConversion_add (HTList * conversions, CONST char * rep_in, CONST char * rep_out, HTConverter * converter, double quality, double secs, double secs_per_byte);
The first argument is a list object. List objects are one of several container objects in the Library, and they are explained in more detail in the W3C Library Internals. All we have to know at this point is how to create a list object:
extern HTList * HTList_new (void);
The next two arguments describe the input format and the output format of the data entering and leaving the converter, respectively. The syntax for these formats follows the syntax defined by the HTTP protocol and the MIME specification, which is a type string and a subtype string separated by a slash "/":
<type> "/" <subtype>
Some of the most common examples are:
text/plain
text/html
image/gif
audio/basic
*/*
In addition to these "official" MIME types, the Library has a small set of internal representations that exist only within the Library. They are used to describe data formats that are not really formats but an intermediate state of the document. The two most used formats of this type are
www/present
www/unknown
The internal formats are characterized by having the type www, which doesn't exist anywhere but in the Library. The first of the two subtypes shown represents the rendered document as presented to the user, and the second represents an unknown data format.
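To make the <type>/<subtype> syntax and the * wildcard concrete, here is a small sketch of how a registered format such as */* or text/* could be compared with the type of incoming data. It is an illustration under stated assumptions, not the Library's own matching code: media_type_match is a hypothetical name, and the comparison ignores case because MIME type and subtype names are case-insensitive.

```c
#include <ctype.h>
#include <string.h>

/* Case-insensitive comparison of the first n characters of a and b. */
static int same_ci(const char *a, const char *b, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++)
        if (tolower((unsigned char) a[i]) != tolower((unsigned char) b[i]))
            return 0;
    return 1;
}

/* Hypothetical helper: returns 1 if the media type `type` (for example
   text/html) matches `pattern`; the type or subtype half of the pattern
   may be a lone '*' wildcard. */
int media_type_match(const char *pattern, const char *type)
{
    const char *ps = strchr(pattern, '/');
    const char *ts = strchr(type, '/');
    size_t plen, tlen;
    if (ps == NULL || ts == NULL)
        return 0;                      /* not of the form type/subtype */
    plen = (size_t) (ps - pattern);
    tlen = (size_t) (ts - type);
    /* Compare the type halves unless the pattern's type is the wildcard. */
    if (!(plen == 1 && pattern[0] == '*') &&
        !(plen == tlen && same_ci(pattern, type, plen)))
        return 0;
    ps++;                              /* skip the slashes */
    ts++;
    /* Compare the subtype halves unless the pattern's subtype is the wildcard. */
    if (strcmp(ps, "*") == 0)
        return 1;
    return strlen(ps) == strlen(ts) && same_ci(ps, ts, strlen(ps));
}
```

Internal formats such as www/present are matched exactly like the official ones, since they follow the same syntax.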
The converter argument is a pointer to the function that is to be called in order to create a converter object capable of handling the conversion from the input type to the output type. Because a pointer to the converter function is registered rather than a ready-made converter object, the converter can be set up dynamically. This allows the Library to evaluate the set of registered converters each time a conversion is requested and then choose the most suitable converter on the fly.
The next argument is the quality factor which we will describe in a separate paragraph later in this chapter. The last two arguments are not currently used but are reserved for future use. For now, using a value of 0 is perfectly valid.
Converters are intended to be used when we have our own module to handle the data coming from the remote server. The module can either be one provided by the Library or one made by the application. However, in some cases we would rather hand off the data to an external application for presenting the data. External applications are often viewers of some sort, for example a PostScript viewer or an MPEG viewer. The Library lets us register external applications as presenters in much the same way as converters, as becomes clear when we look at how presenters are registered:
extern void HTPresentation_add (HTList * conversions, CONST char * representation, CONST char * command, CONST char * test_command, double quality, double secs, double secs_per_byte);
As was the case with converters, the first argument is a list, which we create in exactly the same way as shown before. Presenters only need an input format, as the data is handed off to the external application and the Library never sees it again. Because presenters and converters are very similar, they are also treated very much alike internally in the Library. Therefore a list object can contain both converters and presenters at the same time. This often makes management easier for the application, as it does not have to deal with two separate lists.
The test_command argument is reserved for use with mailcap parsers, where it corresponds to the test field of a mailcap file. The Library does not yet directly support mailcap files, but the registration of presenters is designed to be able to work with them. The Arena browser is an example of an application that has its own mailcap file parser while using the Library. The description of the test field in RFC 1524 is included below:
The "test" field may be used to test some external condition (e.g., the machine architecture, or the window system in use) to determine whether or not the mailcap line applies. It specifies a program to be run to test some condition. The semantics of execution and of the value returned by the test program are operating system dependent, with UNIX semantics specified in Appendix A. If the test fails, a subsequent mailcap entry should be sought. Multiple test fields are not permitted -- since a test can call a program, it can already be arbitrarily complex.
The last three arguments are identical to those of the conversion registration, so there is no need to describe them again here. Again, the quality factor will be described in detail later in this chapter.
extern void HTLanguage_add (HTList * list, CONST char * lang, double quality);
The list object containing the set of natural languages is similar to the lists containing the converters and the presenters. However, in contrast to those two, which can actually be one list, the list of natural languages must be a list of its own.
The semantics of the language argument closely follow the language tag of the HTTP protocol, which in turn is based on RFC 1766. Some example tags are
en
en-US
en-cockney
i-cherokee
x-pig-latin
where any two-letter primary tag is an ISO 639 language abbreviation and any two-letter initial subtag is an ISO 3166 country code.
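How the Library compares a registered language with the language of a document is not described here. One common rule, used by HTTP/1.1 for the Accept-Language header (RFC 2616, section 14.4), is that a range matches a tag if it equals the tag, or equals a prefix of it followed by "-", ignoring case; so en matches en-US and en-cockney but not eng. The sketch below implements that rule as a hypothetical helper, lang_range_matches; treat it as an assumption about matching, not a description of the Library.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical helper, not a Library function.
   Returns 1 if the language range `range` matches the language tag `tag`:
   equal ignoring case, or equal to a prefix of the tag that is immediately
   followed by '-'. The range "*" matches every tag. */
int lang_range_matches(const char *range, const char *tag)
{
    size_t i, n = strlen(range);
    if (strcmp(range, "*") == 0)
        return 1;
    if (strlen(tag) < n)
        return 0;                      /* range longer than the tag */
    for (i = 0; i < n; i++)
        if (tolower((unsigned char) range[i]) != tolower((unsigned char) tag[i]))
            return 0;
    return tag[n] == '\0' || tag[n] == '-';
}
```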
extern void HTEncoding_add (HTList * list, CONST char * encoding, double quality);
The list argument is the now well-known way of handling these preferences, and we will see it many more times throughout the guide. The encoding argument is a constant string just like the data format descriptions used when registering converters and presenters. The values are also strongly inspired by the HTTP protocol and the MIME specification, and some of the most common examples are:
base64
compress
gzip
As with natural languages, the list of encoders and decoders must be a separate list.
extern void HTCharset_add (HTList * list, CONST char * charset, double quality);
The charset argument is also inspired by the HTTP protocol and the MIME specification. Some of the most common examples of the charset parameter are:
US-ASCII
ISO-8859-1
UNICODE-1-1
Again, the list of preferred character sets must be a separate list.
It is a bit different for converters, where the quality often reflects the application's ability to handle the data format rather than the user's perception. For example, it is often faster to use a converter than a presenter, as it takes time to launch the external application, and the Library cannot use progressive display, which is often possible with converters. Therefore, if we are capable of handling an image in GIF format inline but rely on an external viewer for presenting PostScript, we might set up the following list:
HTConversion_add (converters, "image/gif", "www/present", GifPresenter, 1.0, 0.0, 0.0);
HTPresentation_add (presenters, "application/postscript", "ghostview %s", NULL, 0.5, 0.0, 0.0);
where the GIF converter is registered with a quality factor of 1.0 and the PostScript presenter with a quality factor of 0.5.
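The quality factor lets the Library choose between several registered conversions for the same data. A simplified model of that choice is sketched below, assuming converters and presenters sit in one table (presenters modelled with an output of www/present), that the highest quality wins, and that the wildcard input type */* matches anything. The struct and the best_conversion function are hypothetical; the Library's real selection uses its own data structures and may weigh other factors.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified model of one registered converter or presenter. */
struct conversion {
    const char *rep_in;   /* input format, e.g. image/gif, or the wildcard type */
    const char *rep_out;  /* output format, e.g. www/present */
    const char *name;     /* stands in for the converter function or command */
    double quality;       /* 0.0 (useless) to 1.0 (perfect) */
};

/* Returns the highest-quality entry that converts `in` to `out`, or NULL if
   there is none. The wildcard input type matches any input format; on a tie
   the entry registered first wins. */
const struct conversion *best_conversion(const struct conversion *list,
                                         size_t n, const char *in,
                                         const char *out)
{
    const struct conversion *best = NULL;
    size_t i;
    for (i = 0; i < n; i++) {
        const struct conversion *c = &list[i];
        int in_ok = strcmp(c->rep_in, "*/*") == 0 || strcmp(c->rep_in, in) == 0;
        if (!in_ok || strcmp(c->rep_out, out) != 0)
            continue;                  /* cannot handle this conversion */
        if (best == NULL || c->quality > best->quality)
            best = c;                  /* strictly better quality replaces */
    }
    return best;
}
```

With the two registrations above plus a hypothetical catch-all */* entry of quality 0.3, a GIF image would go to the GIF converter, PostScript to the ghostview presenter, and anything else to the catch-all.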
Here we will only show how to enable the preferences globally. Later, when we have discussed how to create a request object, we will see how to enable the preferences locally and also whether they are to be added to the global list or completely override the global list for a particular request.
extern void HTFormat_setConversion (HTList *list); extern HTList * HTFormat_conversion (void);
extern void HTFormat_setEncoding (HTList *list); extern HTList * HTFormat_encoding (void);
extern void HTFormat_setLanguage (HTList *list); extern HTList * HTFormat_language (void);
extern void HTFormat_setCharset (HTList *list); extern HTList * HTFormat_charset (void);
Common to all the cleanup methods is that once they have been called, you can no longer use the lists, as they no longer point to valid memory. The first mechanism for cleaning up lists is to call the cleanup method of each preference as indicated below:
extern void HTConversion_deleteAll (HTList * list);
extern void HTPresentation_deleteAll (HTList * list);
extern void HTEncoding_deleteAll (HTList * list);
extern void HTLanguage_deleteAll (HTList * list);
extern void HTCharset_deleteAll (HTList * list);
The second mechanism, which cleans up all globally registered preferences at once, can often be used to simplify the management done by the application. Note, however, that all global lists then become inaccessible for future reference. If you want to define new sets of preferences, you need to start all over again and create new list objects.
extern void HTFormat_deleteAll (void);
extern void HTConverterInit (HTList * conversions);
There is a similar function for registering a common set of presenters that can be found on many (especially Unix) platforms:
extern void HTPresenterInit (HTList * conversions);
In order to show the similarity between how converters and presenters are handled in the Library, there is also a single function that does the work of the two previous functions at once:
extern void HTFormatInit (HTList * conversions);
All protocol modules are dynamically bound to an access scheme. Take for example the following URL:
http://www.w3.org/
It has the access scheme http, and if we have a protocol module capable of handling HTTP, we can make the binding between http and this module. As mentioned in the introduction to this chapter, the Library already comes with a large set of protocol modules, including HTTP, so all we have to do in this case is to register the HTTP module with the Library as being capable of handling http URLs.
Let's see how we can register a protocol module. The support for this is provided by the protocol manager which exports the following function:
extern BOOL HTProtocol_add (CONST char * scheme, BOOL preemtive, HTEventCallBack * callback);
This function follows exactly the same naming scheme as we have seen many times before. The first argument is the access scheme that the protocol module is capable of handling. This can for example be http, but it can also be a nonstandard scheme used for experimental protocol implementations, for example whois. If a protocol module is capable of handling more than one access scheme, it can be registered multiple times with different schemes. This is the case with the Telnet access module, which can also handle rlogin and tn3270 terminal sessions.
The next argument tells the Library whether the module is capable of handling non-blocking sockets or not. This is normally a design decision made when implementing the protocol module, and we will not discuss this argument further in this guide. The Library Architecture document discusses in more detail how a protocol module can be designed to support non-blocking sockets.
The last argument is the actual function name to call when a request has been issued and a protocol module has been found associated with the access scheme used. Although it may not be clear at this point, HTEventCallBack is the type of function that the event handler uses in order to initiate requests in the Library.
A protocol module can be disabled at any time during execution. This is rarely needed, but the dynamic nature of the binding leaves the choice to the application. If desired, you can do so by calling the following function:
extern BOOL HTProtocol_delete (CONST char * scheme);
The argument is exactly the same scheme as described above. One special case is the support for access to WAIS databases. WAIS has its own code library called freeWAIS, which is required in order to access wais URLs directly. We shall not describe in detail here how this can be enabled, as it is described in the WWW-WAIS gateway.
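The dynamic binding between schemes and protocol modules can be pictured as a small table keyed by scheme. The sketch below is a hypothetical model, not the Library's protocol manager: protocol_add, protocol_delete and protocol_load only mimic the roles of HTProtocol_add, HTProtocol_delete and the request dispatch described above, and protocol_handler is a simplified stand-in for HTEventCallBack. Schemes are assumed to be registered in lower case, and the scheme of a URL is lower-cased before lookup.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define MAX_PROTOCOLS 16
#define MAX_SCHEME 32

/* Hypothetical handler type: loads the given URL and returns 0 on success. */
typedef int protocol_handler(const char *url);

/* The binding table: one entry per registered scheme. */
static struct {
    char scheme[MAX_SCHEME];
    protocol_handler *handler;
} protocols[MAX_PROTOCOLS];
static size_t n_protocols;

/* Copies the scheme of `url` (the part before ':') into buf in lower case.
   Returns 1 on success, 0 if the URL does not start with a valid scheme. */
static int url_scheme(const char *url, char *buf, size_t size)
{
    size_t i;
    if (!isalpha((unsigned char) url[0]))
        return 0;
    for (i = 0; url[i] != '\0' && url[i] != ':'; i++) {
        unsigned char c = (unsigned char) url[i];
        if (!(isalnum(c) || c == '+' || c == '-' || c == '.') || i + 1 >= size)
            return 0;
        buf[i] = (char) tolower(c);
    }
    buf[i] = '\0';
    return url[i] == ':';
}

/* Binds a lower-case scheme to a handler, replacing any earlier binding. */
int protocol_add(const char *scheme, protocol_handler *handler)
{
    size_t i;
    for (i = 0; i < n_protocols; i++)
        if (strcmp(protocols[i].scheme, scheme) == 0) {
            protocols[i].handler = handler;
            return 1;
        }
    if (n_protocols == MAX_PROTOCOLS || strlen(scheme) >= MAX_SCHEME)
        return 0;                      /* table full or scheme too long */
    strcpy(protocols[n_protocols].scheme, scheme);
    protocols[n_protocols].handler = handler;
    n_protocols++;
    return 1;
}

/* Removes the binding for a scheme; returns 0 if it was not registered. */
int protocol_delete(const char *scheme)
{
    size_t i;
    for (i = 0; i < n_protocols; i++)
        if (strcmp(protocols[i].scheme, scheme) == 0) {
            protocols[i] = protocols[--n_protocols];  /* fill gap with last entry */
            return 1;
        }
    return 0;
}

/* Dispatches a URL to the module bound to its scheme; -1 if none is bound. */
int protocol_load(const char *url)
{
    char scheme[MAX_SCHEME];
    size_t i;
    if (!url_scheme(url, scheme, sizeof scheme))
        return -1;
    for (i = 0; i < n_protocols; i++)
        if (strcmp(protocols[i].scheme, scheme) == 0)
            return protocols[i].handler(url);
    return -1;
}

/* A trivial handler used only to exercise the table. */
static int example_handler(const char *url)
{
    (void) url;
    return 0;
}
```

Registering one module under several schemes, as with the Telnet module, is simply several protocol_add calls with the same handler.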
Files in a file system are often classified by some sort of suffix; for example, GIF files often end in .gif, text files in .txt, etc. This binding is not static, and it is therefore necessary to have a dynamic binding just like the preferences themselves. An example is HTML files, which on most Unix systems end in .html, whereas on many MS-DOS based systems they end in .htm.
The HTBind module provides a generic binding mechanism between a file and its representation internally in the Library. It is not limited to simple file suffix classification but can also be used in more advanced environments using databases etc. However, at this point we are interested in how we can register bindings between file suffixes and, for example, content types, content languages, etc.
Before starting a more detailed description of how to register file suffixes, it is useful to define what a file suffix actually is and what set of delimiters separates suffixes on a particular platform. The Bind manager comes with a certain knowledge about the set of delimiters, but more can be added to provide the functionality desired. First, whether suffixes are matched case sensitively can be set using the following function:
extern void HTBind_caseSensitive (BOOL sensitive);
where sensitive can be either YES or NO. The set of delimiters can be defined using the following functions:
extern CONST char *HTBind_delimiters (void);
extern void HTBind_setDelimiters (CONST char * new_suffixes);
Examples of delimiter sets are
"._"
"."
"._-"
Note that the suffixes chosen do not have to be connected with what is available on a particular platform. However, a certain coupling will probably make maintenance of the file system easier for all parties. In the following we will show the API for adding bindings between the preferences and the file system. You can add a binding between a content type and a suffix by using the following function:
extern BOOL HTBind_addType (CONST char * suffix, CONST char * format, double value);
Calling this with suffix set to "*" sets the default representation, which is used when no other suffix fits the actual file. Using a suffix set to "*.*" sets the default representation for files with unknown suffixes that contain a ".". The format argument is exactly as described in the section Request Preferences. In exactly the same way you can add a binding between an encoding and a file suffix using the following function:
extern BOOL HTBind_addEncoding (CONST char * suffix, CONST char * encoding, double value);
Bindings can also be made between a file suffix and a specific natural language:
extern BOOL HTBind_addLanguage (CONST char * suffix, CONST char * language, double value);
In all cases, note that a suffix can contain characters that normally must be escaped in a URL, for example space, < and >. However, they should not be encoded when passed as the suffix parameter but left as is.
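A simplified model of suffix binding may make the two defaults clearer. The sketch assumes that only the last suffix of a file name matters and that only content types are bound; the real Bind manager also handles encodings and languages and can consider several suffixes. bind_type and the table layout are hypothetical. It follows the rules described above: the delimiter set decides where the suffix starts, case sensitivity is a flag, an unknown suffix in a name containing "." falls back to the "*.*" binding, and otherwise the "*" binding is used.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical suffix-to-content-type binding. */
struct suffix_binding {
    const char *suffix;   /* suffix without delimiter ("html"), or "*" or "*.*" for defaults */
    const char *type;     /* content type, e.g. text/html */
};

/* String equality, optionally ignoring case. */
static int eq(const char *a, const char *b, int case_sensitive)
{
    if (case_sensitive)
        return strcmp(a, b) == 0;
    for (; *a && *b; a++, b++)
        if (tolower((unsigned char) *a) != tolower((unsigned char) *b))
            return 0;
    return *a == *b;                   /* both strings ended together */
}

static const char *lookup(const struct suffix_binding *t, size_t n,
                          const char *suffix, int case_sensitive)
{
    size_t i;
    for (i = 0; i < n; i++)
        if (eq(t[i].suffix, suffix, case_sensitive))
            return t[i].type;
    return NULL;
}

/* Returns the content type bound to the last suffix of `filename`, where
   `delims` plays the role of the delimiter set. Falls back to the "*.*"
   binding for unknown suffixes of names containing '.', then to "*". */
const char *bind_type(const struct suffix_binding *t, size_t n,
                      const char *filename, const char *delims,
                      int case_sensitive)
{
    const char *p, *suffix = NULL, *type = NULL;
    for (p = filename; *p; p++)
        if (strchr(delims, *p) != NULL)
            suffix = p + 1;            /* text after the last delimiter */
    if (suffix != NULL && *suffix != '\0')
        type = lookup(t, n, suffix, case_sensitive);
    if (type == NULL && strchr(filename, '.') != NULL)
        type = lookup(t, n, "*.*", 1);
    if (type == NULL)
        type = lookup(t, n, "*", 1);
    return type;
}
```

With case-sensitive matching, INDEX.HTM would not match an htm binding and would fall back to the "*.*" default, which is one reason the case sensitivity setting matters on platforms like MS-DOS.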
From and Pragma. The reason is that
the former in general requires permission by the user and the latter has special meanings for proxy
servers.
It should be mentioned, however, that this API is simple to use if you have a relatively small amount of extra metainformation to provide and it easily fits into an existing protocol. It is not suited for building entirely new protocols or for providing a massive amount of new information. In that case you need a more powerful model, which the Library also provides: building your own stream. This is actually exactly how the Library implements large parts of itself, but it normally requires a bit more work before you can put an application together.
Let us jump right into it and have a closer look at the API. Exactly as for the request preferences, you can add and delete an element, which in this case is a callback function. This function has a special definition, which is given by
typedef int HTPostCallback (HTRequest *request, HTStream * target);
We have already seen the Request object before, but the Stream object is new. Or actually it isn't; it just has not been mentioned explicitly in the previous sections. We will hear a lot more about the stream object later in this guide. For now it is sufficient to know that a stream is an object that accepts streams of characters, much like an ANSI file stream object does. The return value of the callback function is currently not used but is reserved for future use. We can register a callback function of type HTPostCallback by using the following function:
extern BOOL HTGenerator_add (HTList * gens, HTPostCallback * callback);
The first argument is the well-known list object and the second is the address of the function that we want to be called each time a request is generated. When the callback function is called by the Library, it must generate its metainformation and send it down the stream, which will eventually end up on the network as part of the final request. In exactly the same way you can unregister a callback function at any time by calling the following function:
extern BOOL HTGenerator_delete (HTList * gens, HTPostCallback * callback);
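The generator mechanism can be modelled in a few lines: every registered callback is called in turn with the request and a target stream, and each writes its header lines to the stream. In this sketch the stream is a hypothetical fixed-size buffer and the callback type only mirrors the shape of HTPostCallback; the names stream, header_generator, gen_x_app and generate_headers, and the X-App header, are illustrative assumptions rather than Library API.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a Library stream: an append-only text buffer. */
struct stream {
    char buf[512];
    size_t len;
};

/* Appends a string to the stream; text that would not fit is dropped. */
static void stream_puts(struct stream *s, const char *str)
{
    size_t n = strlen(str);
    if (s->len + n < sizeof s->buf) {
        memcpy(s->buf + s->len, str, n + 1);  /* copy including the NUL */
        s->len += n;
    }
}

/* Same shape as HTPostCallback: the request plus the target stream. */
typedef int header_generator(void *request, struct stream *target);

/* Example generator that adds one application-specific header line. */
static int gen_x_app(void *request, struct stream *target)
{
    (void) request;                           /* not needed here */
    stream_puts(target, "X-App: TestApp\r\n");
    return 0;                                 /* return value unused, as in the text */
}

/* Calls every registered generator in order for one outgoing request. */
void generate_headers(header_generator **gens, size_t n,
                      void *request, struct stream *target)
{
    size_t i;
    for (i = 0; i < n; i++)
        gens[i](request, target);
}
```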
Allow
ContentEncoding
ContentLanguage
ContentLength
ContentType
ContentType (the header now supports the charset and level parameters, but neither of them is used by the HTML parser)
Date, Expires, RetryAfter, and LastModified
DerivedFrom, Version
Again, the API for handling extra headers is provided by the Header Manager and is based on managing list objects, just like we have seen many times before. Each time a request is received and an unknown header is encountered by the internal MIME parser, the Library looks to see if a list of callback functions has been registered to parse additional metainformation. If a parser is found for this particular header, the callback is called with the header and all parameters that might follow it. As MIME headers can contain line wrappings, the MIME parser canonicalizes the header line before the callback function is called, which makes the job easier for the callback function.
Exactly as for the header generators, you can add and delete an element, which in this case is also a callback function. This function has a special definition, which is given by
typedef int HTParserCallback (HTRequest * request, CONST char * token);
The request object is the current request being handled, and the token is the header that was encountered together with all parameters following it. The callback can pass a value back to the Library through its return code. Currently two return values are recognized by the Library:
HT_OK if the token is received and understood
HT_ERROR if the callback encounters a fatal error and any further parsing should be stopped.
extern BOOL HTParser_add (HTList * parsers, CONST char * token, BOOL case_sensitive, HTParserCallback * callback);
Again, the first argument is a list as we have seen before. The token is the header token for which the callback function should be called. This token can contain a wildcard (*), which matches zero or more arbitrary characters. You can also specify whether the token should be matched using a case-sensitive or case-insensitive matching algorithm. Let's look at an example of how to register a parser callback function:
HTParser_add(mylist, "PICS-*", NO, myparser);
This registers the myparser function as being capable of handling all tokens starting with "PICS-" in any combination of upper and lower case, for example:
PICS-start
pics-Token
PiCs-Label
As for header generators, you can unregister a callback function by using the following function:
extern BOOL HTParser_delete (HTList * parsers, CONST char * token);
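The wildcard rule for parser tokens, where * matches zero or more arbitrary characters and matching may or may not respect case, can be expressed as a short recursive function. This is a sketch of the rule as stated, not the Library's implementation, and token_match is a hypothetical name. Under this rule a bare PICS token does not match PICS-*, since the pattern requires the dash.

```c
#include <ctype.h>

/* Compares two characters, optionally ignoring case. */
static int same_char(char a, char b, int case_sensitive)
{
    if (case_sensitive)
        return a == b;
    return tolower((unsigned char) a) == tolower((unsigned char) b);
}

/* Hypothetical helper: returns 1 if `token` matches `pattern`, where each
   '*' in the pattern matches zero or more arbitrary characters. */
int token_match(const char *pattern, const char *token, int case_sensitive)
{
    if (*pattern == '\0')
        return *token == '\0';
    if (*pattern == '*') {
        /* Let the '*' absorb 0, 1, 2, ... characters of the token. */
        do {
            if (token_match(pattern + 1, token, case_sensitive))
                return 1;
        } while (*token++ != '\0');
        return 0;
    }
    return *token != '\0' && same_char(*pattern, *token, case_sensitive)
           && token_match(pattern + 1, token + 1, case_sensitive);
}
```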
Here we will only show how to handle the global registration as the local registration is part of the description of the request object.
extern void HTHeader_setParser (HTList * list); extern HTList * HTHeader_parser (void);
extern void HTHeader_setGenerator (HTList * list); extern HTList * HTHeader_generator (void);
As for the other deletion methods, once they have been called you can no longer use the lists, as they no longer point to valid memory. The first mechanism for cleaning up lists is to call the cleanup method of each list as indicated below:
extern BOOL HTParser_delete (HTList * parsers, CONST char * token); extern BOOL HTParser_deleteAll (HTList * parsers);
extern BOOL HTGenerator_delete (HTList * gens, HTPostCallback * callback); extern BOOL HTGenerator_deleteAll (HTList * gens);
The easy way of cleaning up all global lists at once is to call the following function:
extern void HTHeader_deleteAll (void);
The Library provides a large number of such pre- and post-processing modules. However, the exact set used by an application depends on its purpose. Simple script-like applications typically do not need any history mechanism etc. Therefore these modules are not part of the core; instead they can be registered like all other preferences. The Net Manager provides functionality for registering a set of callback functions that are called before and after a request has been executed. Of course, the result of pre-processing might be that the request does not have to be executed at all, in which case the request can be terminated before the protocol module is called to execute it.
extern BOOL HTNetCall_add (HTList * list, HTNetCallback * cbf, int status);
The callback function has to be of type HTNetCallback, which is defined as
typedef int HTNetCallback (HTRequest * request, int result);
This means that a callback function is called with the current request object and the result of the request. If the callback is registered as a pre callback, we obviously do not yet have a result, and the function is called with the code HT_OK. However, if it is a post callback function, the result code may take any of the following values:
A callback function may return any code it likes, but if the return code is different from HT_OK, then the callback loop is stopped. If we are in the before loop and a function returns anything other than HT_OK, then we immediately jump to the after loop, passing the last return code from the before loop.
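To make the loop semantics concrete, here is a minimal, self-contained sketch of the before and after callback loops. It does not call the Library: `Request`, `Callback`, `run_request()` and the numeric values of `HT_OK`, `HT_ERROR` and `HT_LOADED` are hypothetical stand-ins, and the Net Manager's real implementation may differ in detail.

```c
#include <stddef.h>

/* Hypothetical stand-ins; the Library headers define the real values. */
enum { HT_OK = 0, HT_ERROR = -1, HT_LOADED = 200 };

typedef struct Request { int id; } Request;           /* stands in for HTRequest */
typedef int Callback (Request * request, int result);  /* same shape as HTNetCallback */

/* Run the before loop, the request itself, and the after loop. A return
   code other than HT_OK stops the current loop; if that happens in the
   before loop, the request is skipped and the code goes to the after loop. */
static int run_request (Request * req,
                        Callback ** before, size_t nbefore,
                        Callback ** after, size_t nafter,
                        int (*execute) (Request *))
{
    int status = HT_OK;
    size_t i;
    for (i = 0; i < nbefore && status == HT_OK; i++)
        status = before[i](req, HT_OK);   /* pre callbacks are called with HT_OK */
    if (status == HT_OK)
        status = execute(req);            /* nobody objected: run the request */
    for (i = 0; i < nafter; i++)
        if (after[i](req, status) != HT_OK)
            break;                        /* stop the after loop */
    return status;
}

/* Example callbacks used to exercise the loops. */
static int after_calls = 0;
static int allow (Request * r, int s) { (void) r; (void) s; return HT_OK; }
static int deny  (Request * r, int s) { (void) r; (void) s; return HT_ERROR; }
static int count (Request * r, int s) { (void) r; (void) s; after_calls++; return HT_OK; }
static int fetch (Request * r)        { (void) r; return HT_LOADED; }
```

In this sketch the after loop always runs, even when a before callback stopped the request, which matches the jump described above.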
Likewise, a callback function can be removed from a list using the following function:
extern BOOL HTNetCall_delete (HTList * list, HTNetCallback *cbf);
or, if you simply want to remove all functions from a list, you can use
extern BOOL HTNetCall_deleteAll (HTList * list);
extern BOOL HTNet_setBefore (HTList * list);
In many cases you know, when you register a callback function, that it should always be called either when a request starts up or when it terminates. In the former case you can register the callback directly using the following function:
extern BOOL HTNetCall_addBefore (HTNetCallback *cbf, int status);
extern BOOL HTNet_setAfter (HTList * list);
In the latter case the corresponding function is:
extern BOOL HTNetCall_addAfter (HTNetCallback *cbf, int status);
As part of the core Library, the error manager is intended to pass information about errors and messages occurring in the Library back to the application. Each error is kept as an object, so multiple errors can be nested together using the well-known HTList object. Nested error management can be used to build complicated error messages with an arbitrary level of detail, for example:
This URL could not be retrieved: http://www.foo.com
Reason: The host name could not be resolved
Reason: DNS service is not available
The principle behind the error manager is the same as for any other registration module in the Library: it creates an object and binds it to a list that the caller provides. Often, errors are related to a specific request object, and each request object therefore keeps its own list of errors. However, errors can also be maintained as separate lists which are not directly related to a request; for example, the application can keep its own list of errors independent of any Library errors.
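The nesting in the example above can be sketched as a simple chain in which the first entry is the headline and every later entry is printed as a "Reason:" line. This is a hypothetical, stand-alone illustration: the `Err` type and `render_errors()` are not part of the Library, which keeps its errors in an HTList and presents them through the Messaging Module.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical, much-simplified error record; the Library's HTError
   object carries more fields (severity, element, parameter, location). */
typedef struct Err {
    const char * msg;
    struct Err * next;      /* next, more detailed reason */
} Err;

/* Render a chain of errors into buf. Returns the number of characters
   written (excluding the terminating NUL), or -1 if buf is too small. */
static int render_errors (const Err * e, char * buf, size_t size)
{
    size_t used = 0;
    int first = 1;
    for (; e; e = e->next, first = 0) {
        int n = snprintf(buf + used, size - used, "%s%s\n",
                         first ? "" : "Reason: ", e->msg);
        if (n < 0 || (size_t) n >= size - used)
            return -1;      /* output would have been truncated */
        used += (size_t) n;
    }
    return (int) used;
}
```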
Errors are roughly categorized into two classes: system errors and other errors. System errors include all errors that occur while interacting with the operating system. Often these errors occur as a result of insufficient availability of, or access rights to, a system resource. In many operating systems, the system provides a set of error messages, each associated with an error code made available to the application via the errno variable or equivalent. All other errors are registered with an error message belonging to the Library Error manager. Note that there is no difference in how system errors and other errors are treated: they are the same data objects and can be registered together without exception.
extern BOOL HTError_add (HTList * list, HTSeverity severity, BOOL ignore, int element, void * parameter, unsigned int length, char * where);
The first argument is a list object and, as always, we need to create the list object using the HTList_new method. The next argument indicates how serious the error is in the situation where it occurred. Classifying errors by severity is known from many operating systems, for example VMS, and it gives the application the opportunity to decide whether the current operation should be continued or aborted. The Library provides four severity categories:
typedef enum _HTSeverity {
ERR_FATAL,
ERR_NON_FATAL,
ERR_WARN,
ERR_INFO
} HTSeverity;
An error is not always an error at the moment it occurs. In some situations it only becomes an error
later in the process, depending on the outcome of other factors, or it might be circumvented so that
no special action is required. The ignore flag provides this functionality: an error can be
registered at any time with the notion "Register this error but ignore it for now".
The element argument is an index into a table of all error messages. This table is
maintained in the HTError Module and contains, for each entry, an error
message together with a URL that might be included in an error message presented to the user. The
values of the element argument itself are given by the HTErrorElement
enumeration definition in the HTEvntrg Module.
The next two arguments are used to register any parameters associated with the error, for example the name of a file which could not be opened or a URL which could not be accessed. By letting the parameter be a void pointer together with a length indication, the parameter can be an arbitrary data object. The last argument is a location description indicating where the error occurred; often this is the name of a function or a module.
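Passing the parameter as a void pointer plus a length means an error object can keep its own copy of any data. The sketch below is a hypothetical illustration of that idea; `ErrRecord`, `err_init()` and `err_free()` are not Library functions, the element value used in the usage is arbitrary, and whether HTError copies the parameter or only stores the pointer is not stated in this guide.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical error record mirroring the HTError_add() arguments. */
typedef struct {
    int          element;   /* index into the error message table */
    void *       par;       /* private copy of the parameter, or NULL */
    unsigned int length;    /* number of bytes in par */
    const char * where;     /* function or module that registered it */
} ErrRecord;

/* Fill in a record, copying 'length' bytes of 'par'. Returns 0 on
   allocation failure, 1 otherwise. */
static int err_init (ErrRecord * e, int element, const void * par,
                     unsigned int length, const char * where)
{
    e->element = element;
    e->where = where;
    e->par = NULL;
    e->length = 0;
    if (par && length) {
        if ((e->par = malloc(length)) == NULL)
            return 0;
        memcpy(e->par, par, length);   /* works for any data object */
        e->length = length;
    }
    return 1;
}

static void err_free (ErrRecord * e)
{
    free(e->par);
    e->par = NULL;
    e->length = 0;
}
```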
One thing we did not mention when describing the Request object is that it provides a similar function for directly associating an error object with a request object. These functions use request objects rather than lists as the basic data object, so the caller does not have to worry about creating the list or assigning it to the request object; this is done automatically. The request version of the registration function looks very much like its more generic companion, so the arguments need no further explanation.
extern BOOL HTRequest_addError (HTRequest * request, HTSeverity severity, BOOL ignore, int element, void * par, unsigned int length, char * where);
System errors can be registered in much the same way as described above, but the set of parameters is a bit smaller and hopefully a bit easier to handle. The registration function is defined as:
extern BOOL HTError_addSystem (HTList * list, HTSeverity severity, int errornumber, BOOL ignore, char * syscall);
The only difference is the errornumber argument which, as described above, is in many situations provided by the operating system, for example through the errno variable. The syscall argument is simply the name of the function (typically the system call) that failed. This function also has a mirror function in the HTRequest object, and again they look very much alike:
extern BOOL HTRequest_addSystemError (HTRequest * request, HTSeverity severity, int errornumber, BOOL ignore, char * syscall);
Let's take a look at two examples of registering errors. The first example registers an informational message explaining that the HTTP module received a redirection notification from the remote HTTP server, or a fatal error if the redirection carried no location. The first example uses the Request versions of the error registration functions, and the second example uses the generic versions:
BOOL HTTPRedirect (HTRequest * request, int status, char * location)
{
if (location) {
if (status == 301) {
HTRequest_addError(request, ERR_INFO, NO, HTERR_MOVED,
location, strlen(location), "HTTPRedirect");
} else if (status == 302) {
HTRequest_addError(request, ERR_INFO, NO, HTERR_FOUND,
location, strlen(location), "HTTPRedirect");
}
return YES;
} else {
HTRequest_addError(request, ERR_FATAL, NO, HTERR_BAD_REPLY,
NULL, 0, "HTTPRedirect");
return NO;
}
}
The second example shows how to register a system error:
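#include <dirent.h>   /* opendir(), readdir(), closedir() */
#include <errno.h>    /* errno */
/* BOOL, YES/NO, HTList and STRUCT_DIRENT are assumed to come from the Library's headers */
BOOL HTReadDir (HTList * errorlist, const char * directory)
{
/* errorlist is created by the caller using HTList_new() */
DIR *dp;
if ((dp = opendir(directory))) {
STRUCT_DIRENT * dirbuf;
while ((dirbuf = readdir(dp))) {
/* Read Directory */
}
closedir(dp);
return YES;
} else {
/* opendir() failed: register errno as a fatal system error */
HTError_addSystem(errorlist, ERR_FATAL, errno, NO, "opendir");
return NO;
}
}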
typedef enum _HTErrorShow {
HT_ERR_SHOW_FATAL, /* Show only fatal errors */
HT_ERR_SHOW_NON_FATAL, /* Show non fatal and fatal errors */
HT_ERR_SHOW_WARNING, /* Show warnings, non fatal, and fatal errors */
HT_ERR_SHOW_INFO, /* Show all errors */
HT_ERR_SHOW_PARS, /* Show any parameters (if any) */
HT_ERR_SHOW_LOCATION, /* Show the location where the error occurred */
HT_ERR_SHOW_IGNORE, /* Show errors even if they are ignored */
HT_ERR_SHOW_FIRST, /* Show only the first registered error */
HT_ERR_SHOW_LINKS, /* Show any HTML links (if any) */
HT_ERR_SHOW_DEFAULT, /* Default level of details */
HT_ERR_SHOW_DETAILED, /* Somewhat detailed level */
HT_ERR_SHOW_DEBUG /* Very detailed */
} HTErrorShow;
The last three entries in the enumeration list are only for the convenience of the application. They
provide some useful default values for how error messages can be presented to the user. The
setup can be modified using the following functions:
extern HTErrorShow HTError_show (void); extern BOOL HTError_setShow (HTErrorShow mask);
The actual generation of error messages often involves a platform-dependent interface, including special windows etc. In order to keep the error manager itself completely platform independent, the error presentation functionality is part of the Messaging Module, which is described in detail later in this guide.
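One way to read the mask is that the first four entries are cumulative severity levels and the remaining ones are independent flags. The sketch below assumes exactly that and assigns bit values of its own; the numbers, the `SHOW_*` names and `show_error()` are hypothetical, and the real values are the ones defined for HTErrorShow in the Library headers.

```c
#include <stdbool.h>

typedef enum { ERR_FATAL, ERR_NON_FATAL, ERR_WARN, ERR_INFO } Severity;

/* Hypothetical bit values for this sketch only. Each severity level
   includes the bits of the more serious levels before it. */
enum {
    SHOW_FATAL     = 0x1,
    SHOW_NON_FATAL = 0x3,
    SHOW_WARNING   = 0x7,
    SHOW_INFO      = 0xF,
    SHOW_IGNORE    = 0x40
};

/* Should an error of this severity be presented under 'mask'? */
static bool show_error (unsigned mask, Severity sev, bool ignored)
{
    unsigned bit = 1u << sev;               /* FATAL 0x1, NON_FATAL 0x2, ... */
    if (ignored && !(mask & SHOW_IGNORE))
        return false;                       /* ignored errors hidden unless asked */
    return (mask & bit) != 0;
}
```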
extern BOOL HTError_doShow (HTError * info); extern BOOL HTError_ignoreLast (HTList * list); extern BOOL HTError_setIgnore (HTError * info); extern int HTError_index (HTError * info); extern HTSeverity HTError_severity (HTError * info); extern int HTError_parameter (HTError * info, void *parameter); extern CONST char * HTError_location (HTError * info);
If you use the generic registration functions (HTError_add and
HTError_addSystem), the cleanup is done exactly as for all other list-based
registration mechanisms in the Library. If you use the request-specific versions, the
request manager handles both creation and deletion of the error lists, so you do not have to do
anything. The generic interface for cleaning up looks like:
extern BOOL HTError_deleteAll (HTList * list); extern BOOL HTError_deleteLast (HTList * list);
extern BOOL HTLoad (HTRequest * request, HTPriority priority, BOOL recursive);
extern HTNetCallback HTLoad_terminate;
extern HTRequest * HTRequest_new (void);
extern void HTRequest_delete (HTRequest * request);
extern void HTRequest_setAnchor (HTRequest *request, HTAnchor *anchor); extern HTParentAnchor * HTRequest_anchor (HTRequest *request);
Methods are handled by the Method Module, and the default value is "GET".
extern void HTRequest_setMethod (HTRequest *request, HTMethod method); extern HTMethod HTRequest_method (HTRequest *request);
typedef enum _HTReload {
HT_ANY_VERSION = 0x0, /* Use any version available */
HT_MEM_REFRESH = 0x1, /* Reload from file cache or network */
HT_CACHE_REFRESH = 0x2, /* Update from network with IMS */
HT_FORCE_RELOAD = 0x4 /* Update from network with no-cache */
} HTReload;
extern void HTRequest_setReloadMode (HTRequest *request, HTReload mode);
extern HTReload HTRequest_reloadMode (HTRequest *request);
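The comments in the HTReload enumeration can be summarized as a small decision table. The sketch below is a hypothetical reading of those comments; `FetchPlan` and `plan_for()` are illustrative names, and the exact behaviour of each mode is determined by the Library's cache and protocol modules.

```c
typedef enum _HTReload {
    HT_ANY_VERSION   = 0x0,   /* Use any version available */
    HT_MEM_REFRESH   = 0x1,   /* Reload from file cache or network */
    HT_CACHE_REFRESH = 0x2,   /* Update from network with IMS */
    HT_FORCE_RELOAD  = 0x4    /* Update from network with no-cache */
} HTReload;

/* Hypothetical summary of what each mode permits. */
typedef struct {
    int use_memory;      /* may reuse a document already in memory      */
    int use_file_cache;  /* may reuse a copy from the file cache         */
    int send_ims;        /* revalidate with an If-Modified-Since request */
    int send_no_cache;   /* ask for a fresh copy (no-cache)              */
} FetchPlan;

static FetchPlan plan_for (HTReload mode)
{
    FetchPlan p = {0, 0, 0, 0};
    switch (mode) {
    case HT_ANY_VERSION:   p.use_memory = 1; p.use_file_cache = 1; break;
    case HT_MEM_REFRESH:   p.use_file_cache = 1; break;   /* bypass memory cache */
    case HT_CACHE_REFRESH: p.send_ims = 1; break;
    case HT_FORCE_RELOAD:  p.send_no_cache = 1; break;
    }
    return p;
}
```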
extern BOOL HTRequest_setMaxRetry (int newmax); extern int HTRequest_maxRetry (void); extern BOOL HTRequest_retry (HTRequest *request);
extern time_t HTRequest_retryTime (HTRequest * request);
Each request can have its own local set of accept headers, which are either added to the global set or replace it. None of the headers have to be set: if the global set is sufficient for all requests, then this is perfectly fine. If the parameter "override" is set, then only the local accept headers are used; otherwise both local and global headers are used.
extern void HTRequest_setFormat (HTRequest *request, HTList *type, BOOL override); extern HTList * HTRequest_format (HTRequest *request);
extern void HTRequest_setEncoding (HTRequest *request, HTList *enc, BOOL override); extern HTList * HTRequest_encoding (HTRequest *request);
extern void HTRequest_setLanguage (HTRequest *request, HTList *lang, BOOL override); extern HTList * HTRequest_language (HTRequest *request);
extern void HTRequest_setCharset (HTRequest *request, HTList *charset, BOOL override); extern HTList * HTRequest_charset (HTRequest *request);
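The override rule can be illustrated with plain string arrays standing in for HTList. In this hypothetical sketch, `effective_accept()` is not a Library function, and the order in which local and global entries are combined is an assumption.

```c
#include <stddef.h>

/* With override set, only the local entries are used; otherwise the
   local entries are followed by the global ones. Writes at most 'cap'
   entries to 'out' and returns how many were written. */
static size_t effective_accept (const char ** local, size_t nlocal,
                                const char ** global, size_t nglobal,
                                int override,
                                const char ** out, size_t cap)
{
    size_t n = 0, i;
    for (i = 0; i < nlocal && n < cap; i++)
        out[n++] = local[i];
    if (!override)
        for (i = 0; i < nglobal && n < cap; i++)
            out[n++] = global[i];
    return n;
}
```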
typedef enum _HTGnHd {
HT_DATE = 0x1,
HT_FORWARDED = 0x2,
HT_MESSAGE_ID = 0x4,
HT_MIME = 0x8,
HT_NO_CACHE = 0x10 /* Pragma */
} HTGnHd;
#define DEFAULT_GENERAL_HEADERS 0
extern void HTRequest_setGnHd (HTRequest *request, HTGnHd gnhd);
extern void HTRequest_addGnHd (HTRequest *request, HTGnHd gnhd);
extern HTGnHd HTRequest_gnHd (HTRequest *request);
From and
Pragma.
typedef enum _HTRqHd {
HT_ACCEPT_TYPE = 0x1,
HT_ACCEPT_CHAR = 0x2,
HT_ACCEPT_ENC = 0x4,
HT_ACCEPT_LAN = 0x8,
HT_FROM = 0x10,
HT_IMS = 0x20,
HT_ORIG_URI = 0x40,
HT_REFERER = 0x80,
HT_USER_AGENT = 0x200
} HTRqHd;
#define DEFAULT_REQUEST_HEADERS \
HT_ACCEPT_TYPE+HT_ACCEPT_CHAR+HT_ACCEPT_ENC+HT_ACCEPT_LAN+HT_REFERER+HT_USER_AGENT
extern void HTRequest_setRqHd (HTRequest *request, HTRqHd rqhd);
extern void HTRequest_addRqHd (HTRequest *request, HTRqHd rqhd);
extern HTRqHd HTRequest_rqHd (HTRequest *request);
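The header masks are ordinary bit sets. The sketch below copies the HTRqHd values shown above and assumes, from the function names only, that the `set` variant replaces the mask while the `add` variant ORs extra bits into it; `Req` and the `req_*` helpers are hypothetical stand-ins for the request object. The macro is parenthesized here so it can be used safely inside larger expressions.

```c
/* Request header bits copied from the HTRqHd enumeration above. */
typedef enum _HTRqHd {
    HT_ACCEPT_TYPE = 0x1,  HT_ACCEPT_CHAR = 0x2, HT_ACCEPT_ENC = 0x4,
    HT_ACCEPT_LAN  = 0x8,  HT_FROM = 0x10,       HT_IMS = 0x20,
    HT_ORIG_URI    = 0x40, HT_REFERER = 0x80,    HT_USER_AGENT = 0x200
} HTRqHd;

#define DEFAULT_REQUEST_HEADERS \
    (HT_ACCEPT_TYPE+HT_ACCEPT_CHAR+HT_ACCEPT_ENC+HT_ACCEPT_LAN+HT_REFERER+HT_USER_AGENT)

/* Hypothetical stand-in for the request object's header mask. */
typedef struct { unsigned rqhd; } Req;

static void req_setRqHd (Req * r, unsigned rqhd) { r->rqhd = rqhd; }   /* replace */
static void req_addRqHd (Req * r, unsigned rqhd) { r->rqhd |= rqhd; }  /* add bits */
static int  req_sends   (const Req * r, HTRqHd h) { return (r->rqhd & h) != 0; }
```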
typedef enum _HTEnHd {
HT_ALLOW = 0x1,
HT_CONTENT_ENCODING = 0x2,
HT_CONTENT_LANGUAGE = 0x4,
HT_CONTENT_LENGTH = 0x8,
HT_CTE = 0x10, /* Content-Transfer-Encoding */
HT_CONTENT_TYPE = 0x20,
HT_DERIVED_FROM = 0x40,
HT_EXPIRES = 0x80,
HT_LAST_MODIFIED = 0x200,
HT_LINK = 0x400,
HT_TITLE = 0x800,
HT_URI = 0x1000,
HT_VERSION = 0x2000
} HTEnHd;
#define DEFAULT_ENTITY_HEADERS 0xFFFF /* all */
extern void HTRequest_setEnHd (HTRequest *request, HTEnHd enhd);
extern void HTRequest_addEnHd (HTRequest *request, HTEnHd enhd);
extern HTEnHd HTRequest_enHd (HTRequest *request);
extern void HTRequest_setParent (HTRequest *request, HTParentAnchor *parent); extern HTParentAnchor * HTRequest_parent (HTRequest *request);
extern void HTRequest_setExtra (HTRequest *request, char *extra); extern char *HTRequest_extra (HTRequest *request);
The default output stream is NULL, which means that the stream goes to the user
(display).
extern void HTRequest_setOutputStream (HTRequest *request, HTStream *output); extern HTStream *HTRequest_OutputStream (HTRequest *request);
The desired format of the output stream. This can be used to get unconverted data etc. from the Library. If
NULL, then WWW_PRESENT is the default value.
extern void HTRequest_setOutputFormat (HTRequest *request, HTFormat format); extern HTFormat HTRequest_OutputFormat (HTRequest *request);
Object bodies that the server sends with a status code other than 200 OK will be put down this stream. This can be
used for redirecting such body information to, for example, a debug window. If the value is NULL (default),
then the stream is not set up.
extern void HTRequest_setDebugStream (HTRequest *request, HTStream *debug); extern HTStream *HTRequest_DebugStream (HTRequest *request);
The desired format of the error stream. This can be used to get unconverted data etc. from the Library. The default value is
WWW_HTML, as a character-based application only has one WWW_PRESENT.
extern void HTRequest_setDebugFormat (HTRequest *request, HTFormat format); extern HTFormat HTRequest_DebugFormat (HTRequest *request);
typedef int HTRequestCallback (HTRequest * request, void *param); extern void HTRequest_setCallback (HTRequest *request, HTRequestCallback *cb); extern HTRequestCallback *HTRequest_callback (HTRequest *request);
The callback function can be passed an arbitrary pointer (the void part) which can describe the context of the current request structure. If such context information is required, it can be set using the following methods:
extern void HTRequest_setContext (HTRequest *request, void *context); extern void *HTRequest_context (HTRequest *request);
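The callback and context pair follows the usual C pattern of a function pointer plus an opaque `void *`. This self-contained sketch assumes that the stored context is what arrives in the callback's `param` argument; `Request`, `request_finish()` and `count_done()` are hypothetical and only mimic the idea behind HTRequest_setCallback() and HTRequest_setContext().

```c
#include <stddef.h>

typedef struct Request Request;                      /* stands in for HTRequest */
typedef int RequestCallback (Request * request, void * param);

struct Request {
    RequestCallback * cb;        /* role of HTRequest_setCallback() */
    void *            context;   /* role of HTRequest_setContext()  */
};

/* Hypothetical: invoke the registered callback with the stored context. */
static int request_finish (Request * req)
{
    return req->cb ? req->cb(req, req->context) : 0;
}

/* Example application callback: the context is a counter of finished requests. */
static int count_done (Request * req, void * param)
{
    int * counter = param;
    (void) req;
    return ++*counter;
}
```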
extern void HTRequest_setPreemtive (HTRequest *request, BOOL mode); extern BOOL HTRequest_preemtive (HTRequest *request);
extern void HTRequest_setNegotiation (HTRequest *request, BOOL mode); extern BOOL HTRequest_negotiation (HTRequest *request);
extern HTList *HTRequest_errorStack (HTRequest *request);
extern long HTRequest_bytesRead(HTRequest * request);
HT_LOADED
HT_NO_DATA
HT_NO_DATA code might be the result
when a telnet session is started etc.
HT_ERROR
HT_RETRY
HTRequest->retry_after field. No action is taken by the
Library to automatically retry the request; this is entirely up to the
application.
HT_WOULD_BLOCK
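Since retrying is left to the application, a client might combine the retry time with a retry limit along the lines of the hypothetical helper below. `retry_delay()` is not a Library function; its `retry_after` and `max_retry` arguments merely play the roles of the values returned by HTRequest_retryTime() and HTRequest_maxRetry().

```c
#include <time.h>

/* Returns the number of seconds to wait before retrying, 0 to retry
   now, or -1 if the application should give up. */
static long retry_delay (int attempts, int max_retry,
                         time_t retry_after, time_t now)
{
    if (attempts >= max_retry)
        return -1;                  /* retry limit reached */
    if (retry_after <= now)
        return 0;                   /* retry time already passed */
    return (long) difftime(retry_after, now);
}
```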
OK, let's continue and get an overview of the functionality provided by the application modules.
HyperDoc structure in memory as the user keeps requesting
new documents. The HyperDoc structure is only declared in
the Library - the real definition is left to the application as it is
for the application to handle graphic objects. The Line Mode Browser
has its own definition of the HyperDoc structure called
HText. Before a request is processed over the net,
the anchor object is searched for a HyperDoc structure
and a new request is issued only if this is not present or the Library
has explicitly been asked to reload the document, which is described
in the section Short Circuiting the Cache.
As the management of the graphic object is handled by the application,
it is also for the application to handle the garbage collection of the
memory cache. The Line Mode Browser
has a very simple memory management of how long graphic objects stay
around in memory. It is determined by a constant in the GridText
module and is set to 5 documents by default. A more advanced approach could
base the memory garbage collection on the size of the graphic objects, when
they expire etc., but the API is the same no matter how the garbage collector
is implemented.
HyperDoc structure which is referenced by the
HTAnchor object. As the definition of the
HyperDoc structure is done by the application, there is no
explicit rule for which graphic objects cannot be described by the
HyperDoc, but they are often binary objects such as images.
The file cache in the Library is a very simple implementation in the
sense that no intelligent garbage collection has been defined. The goal
has been to collect experience from the file cache in the W3C
proxy server before an intelligent garbage collector is implemented in
the Library. Currently the following functions can be used to control
the cache, which is disabled by default:
HTCache_enable(), HTCache_disable(), and
HTCache_isEnabled()
HTCache_setRoot() and HTCache_getRoot()
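void HTRequest_setReload (HTRequest *request, HTReload mode); HTReload HTRequest_reload (HTRequest *request);
where HTReload can take any of the values of the HTReload enumeration listed earlier: HT_ANY_VERSION, HT_MEM_REFRESH, HT_CACHE_REFRESH, and HT_FORCE_RELOAD.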
void HTAccess_setExpiresMode (HTExpiresMode mode, char * notify); HTExpiresMode HTAccess_expiresMode ();
where HTExpiresMode can take any of the values:
HT_EXPIRES_IGNORE
HT_EXPIRES_NOTIFY
HT_EXPIRES_AUTO
The Library supports both proxies and gateways through the HTProxy module, and all requests can be redirected to a proxy or a gateway, even requests on the local file system. Of course, the Library can also be used in proxy or gateway applications, which in turn can use other proxies or gateways, so that a single request can be passed through a series of intermediate agents.
There is one main mechanism for registering both proxies and gateways, but there are two different APIs to follow. The application is free to choose whichever suits it best; the functionality provided by the Library is the same in both cases. The first API is based on a set of registration functions, as we have seen so often throughout this guide. Regardless of the registration mechanism used, proxy servers always take precedence over gateways, so if both a proxy server and a gateway are registered for the same access method, the proxy server will be used.
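Register a proxy as the server to contact for a specific access method. The proxy should be a fully valid name, like
http://proxy.w3.org:8001, but a domain name is not required. If an entry already exists for this
access method, it is deleted and the new one is used.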
extern BOOL HTProxy_add (CONST char * access, CONST char * proxy);
In addition to the proxy list, the Library supports a list of servers for which a proxy should not be consulted. This can be useful in order to avoid going via a proxy server for servers inside a firewall, or for remote servers that are known to be as well connected as the proxy or that are themselves proxy servers.
extern BOOL HTNoProxy_add (CONST char * host, CONST char * access, unsigned port);
The servers registered using this function are host names and domain names for which we don't contact a proxy, even though a proxy is in fact registered for the particular access method. When registering a server as a noproxy element, you can specify a port for the access method, in which case the entry is valid only for requests to this port. If `port' is 0 then it applies to all ports, and if `access' is NULL then it applies to all access methods. Examples of host names are:
w3.org www.fastlink.com
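The matching rules for noproxy entries can be sketched as below. This is a hypothetical stand-alone version: `NoProxy`, `domain_match()` and `bypass_proxy()` are illustrative, the access method is compared case sensitively as stated later in this section, and requiring a match on a label boundary (so that `foow3.org` does not match `w3.org`) is an assumption about how domain names are compared.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical noproxy entry mirroring HTNoProxy_add()'s arguments. */
typedef struct {
    const char * host;    /* host or domain name, e.g. "w3.org" */
    const char * access;  /* access method, or NULL for all methods */
    unsigned     port;    /* port, or 0 for all ports */
} NoProxy;

/* Case-insensitive check that 'name' ends with 'suffix' on a label boundary. */
static int domain_match (const char * name, const char * suffix)
{
    size_t n = strlen(name), s = strlen(suffix);
    if (s > n) return 0;
    for (size_t i = 0; i < s; i++)
        if (tolower((unsigned char) name[n - s + i]) !=
            tolower((unsigned char) suffix[i]))
            return 0;
    return s == n || name[n - s - 1] == '.';
}

/* Returns 1 if a request for access://host:port should bypass the proxy. */
static int bypass_proxy (const NoProxy * list, size_t len,
                         const char * access, const char * host, unsigned port)
{
    for (size_t i = 0; i < len; i++) {
        const NoProxy * np = &list[i];
        if (np->access && strcmp(np->access, access) != 0) continue;
        if (np->port && np->port != port) continue;
        if (domain_match(host, np->host)) return 1;
    }
    return 0;
}
```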
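Register a gateway as the server to contact for a specific access method. The gateway should be a fully valid name, like
http://gateway.w3.org:8001, but a domain name is not required. If an entry
already exists for this access method, it is deleted and the new one is used.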
extern BOOL HTGateway_add (CONST char * access, CONST char * gate);
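The precedence of proxies over gateways, together with the "replace an existing entry" behaviour of the registration functions, can be modelled with two small tables. This is a hypothetical sketch, not the HTProxy module's code; `Registry`, `registry_add()`, `registry_find()` and `intermediary_for()` are illustrative names, and the table size is arbitrary.

```c
#include <stddef.h>
#include <string.h>

#define MAX_ENTRIES 8

/* Hypothetical registry: one server per access method. */
typedef struct {
    const char * access[MAX_ENTRIES];
    const char * server[MAX_ENTRIES];
    size_t       count;
} Registry;

/* Add an entry, replacing any existing entry for the same access method. */
static int registry_add (Registry * r, const char * access, const char * server)
{
    for (size_t i = 0; i < r->count; i++)
        if (strcmp(r->access[i], access) == 0) { r->server[i] = server; return 1; }
    if (r->count == MAX_ENTRIES) return 0;
    r->access[r->count] = access;
    r->server[r->count++] = server;
    return 1;
}

static const char * registry_find (const Registry * r, const char * access)
{
    for (size_t i = 0; i < r->count; i++)
        if (strcmp(r->access[i], access) == 0) return r->server[i];
    return NULL;
}

/* Proxies take precedence over gateways for the same access method. */
static const char * intermediary_for (const Registry * proxies,
                                      const Registry * gateways,
                                      const char * access)
{
    const char * p = registry_find(proxies, access);
    return p ? p : registry_find(gateways, access);
}
```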
extern void HTProxy_getEnvVar (void);
There is no standard for the format of the environment variables, but the most accepted convention is the format described here:
WWW_<access>_GATEWAY
<access>_proxy
no_proxy
no_proxy="cern.ch,ncsa.uiuc.edu,some.host:8080"
export no_proxy
<access> is the specific access scheme, and it is case sensitive as access schemes
in URIs are case sensitive. Proxy servers have precedence over gateways, so if both a proxy server
and a gateway have been defined for a specific access scheme, the proxy server is selected to handle
the request.
It is important to note that the usage of proxy servers or gateways is an extension to the binding between an access scheme and a protocol module. An application can be set up to redirect all URLs with a specific access scheme without knowing about the semantics of this access scheme or how to access the information directly. That way, powerful client applications can be built with direct support for, for example, HTTP only.
Parsing a whole rule file is done using a converter stream. This means that a rule file can come
from anywhere, even across the network. We have defined a special content type for rule files
called WWW_RULES in HTFormat.
In some situations, a set of rules comes from a subset of a file or some other origin, for example INI files for X resources. In that case, you can also parse a single line from a rules file using the following function:
extern BOOL HTRule_parseLine (HTList * list, CONST char * config);
You can add a rule to a list of rules like any other preference. The pattern is a string
containing a single "*"; replace points to the equivalent string, with a * marking the place
where the text matched by the * in the pattern goes.
typedef enum _HTRuleOp {
HT_Invalid,
HT_Map,
HT_Pass,
HT_Fail,
HT_DefProt,
HT_Protect,
HT_Exec,
HT_Redirect,
HT_UseProxy
} HTRuleOp;
extern BOOL HTRule_add (HTList * list, HTRuleOp op, CONST char * pattern, CONST char * replace);
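Translating a token with a single-wildcard rule can be sketched as follows. `rule_apply()` is a hypothetical helper that handles only the pattern and replacement part of a rule; it ignores the HTRuleOp operation and the case-insensitive matching option that HTRule_translate() offers.

```c
#include <stdio.h>
#include <string.h>

/* Returns 1 and fills 'out' if 'token' matches 'pattern', else 0. */
static int rule_apply (const char * pattern, const char * replace,
                       const char * token, char * out, size_t size)
{
    const char * star = strchr(pattern, '*');
    int n;
    if (!star) {                                 /* no wildcard: exact match */
        if (strcmp(pattern, token) != 0) return 0;
        n = snprintf(out, size, "%s", replace);
    } else {
        size_t pre = (size_t) (star - pattern);  /* chars before '*' */
        size_t post = strlen(star + 1);          /* chars after '*'  */
        size_t len = strlen(token);
        if (len < pre + post || strncmp(token, pattern, pre) != 0 ||
            strcmp(token + len - post, star + 1) != 0)
            return 0;
        const char * rstar = strchr(replace, '*');
        if (rstar)                               /* splice matched text in */
            n = snprintf(out, size, "%.*s%.*s%s",
                         (int) (rstar - replace), replace,
                         (int) (len - pre - post), token + pre, rstar + 1);
        else
            n = snprintf(out, size, "%s", replace);
    }
    return n >= 0 && (size_t) n < size;
}
```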
As usual, you can delete a set of rules using this function:
extern BOOL HTRule_deleteAll (HTList *list);
extern HTList * HTRule_global (void); extern BOOL HTRule_setGlobal (HTList * list);
extern char * HTRule_translate (HTList * list, CONST char * token, BOOL ignore_case);
<HOST> <DATE> <METHOD> <URI> <RESULT> <CONTENT LENGTH>
where the date and time stamp can be either in local time or GMT. Logging is turned off by default, but the application can enable it at any time. However, it is also up to the application to disable logging in order to close any open file descriptors etc. The exact log API is described in the Log Manager.
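A log line in this layout could be produced as sketched below. The date format used here (day/month/year:time in GMT) is an assumption, as is the `format_log_line()` helper itself; the Log Manager defines the Library's actual output.

```c
#include <stdio.h>
#include <time.h>

/* Format "<HOST> <DATE> <METHOD> <URI> <RESULT> <CONTENT LENGTH>" into buf.
   Returns 1 on success, 0 on failure or truncation. */
static int format_log_line (char * buf, size_t size, const char * host,
                            time_t when, const char * method,
                            const char * uri, int result, long length)
{
    char date[32];
    struct tm * gmt = gmtime(&when);             /* GMT time stamp */
    if (!gmt || strftime(date, sizeof date, "%d/%b/%Y:%H:%M:%S", gmt) == 0)
        return 0;
    int n = snprintf(buf, size, "%s %s %s %s %d %ld",
                     host, date, method, uri, result, length);
    return n >= 0 && (size_t) n < size;
}
```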
The purpose of the history module is not to impose any particular history mechanism policy but instead to allow various different history mechanisms. The basic features of the history module are:

HTMLPresent() used to present a graphic object on the
screen from both the global and the local list of converters and
presenters.
HT_REENTRANT
strtok_r. The default value is OFF.
HT_SHARED_DISK_CACHE
HT_DIRECT_WAIS
HT_DEFAULT_WAIS_GATEWAY
HT_DIRECT_WAIS is not defined and no gateway has been
defined using environment variables
HT_FTP_NO_PORT
PASV and PORT when requesting a
document from an FTP server. If the application is a proxy server running on a firewall
machine, then PORT is normally not allowed, as a firewall does not accept incoming
connections on arbitrary ports. This define disables the use of PORT. By default,
PORT is used if PASV fails.
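The effect of the define can be pictured as a compile-time switch on the fallback path. The enum and `ftp_data_mode()` below are hypothetical; the FTP module's actual control flow is more involved.

```c
/* Hypothetical choice of FTP data connection mode. */
typedef enum { FTP_USE_PASV, FTP_USE_PORT, FTP_FAIL } FtpDataMode;

static FtpDataMode ftp_data_mode (int pasv_succeeded)
{
    if (pasv_succeeded)
        return FTP_USE_PASV;       /* PASV is always tried first */
#ifdef HT_FTP_NO_PORT
    return FTP_FAIL;               /* e.g. a proxy on a firewall: no incoming connections */
#else
    return FTP_USE_PORT;           /* default: fall back to PORT */
#endif
}
```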
WWWLIB_SIG
HT_TMP_ROOT
/tmp, which is not well suited for large amounts of data.
HT_CACHE_ROOT
/tmp.
HT_NO_RULES
HT_NO_PROXY