SiteScope User's Guide


About URL Sequences and Dynamic Content

Web pages that include client-side programming or dynamically generated content can present problems when you construct SiteScope URL Sequence monitors. Client-side programs might include Java applets, ActiveX controls, JavaScript, or VBScript. Web pages generated by server-side programming (Perl/CGI, ASP, CFM, SSI, and so forth) can also present problems if their link references or form attributes change frequently.

SiteScope does not interpret JavaScript, VBScript, Java applets, or ActiveX controls embedded in HTML files. This may not matter when the client-side program only produces visual effects on the page where it is embedded. Problems arise when client-side code controls links to other URLs or modifies data submitted to a server-side program. Because SiteScope does not interpret client-side programs, any actions or event handlers provided by scripts or applets are invisible to the URL Sequence wizard.
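The effect described above can be illustrated with a parser that reads HTML markup but never runs scripts. The sketch below uses Python's standard `html.parser` module; it is not SiteScope's parser, and the sample page and the `StaticLinkExtractor` class are hypothetical. The static link is found, while the target reached only through an onClick handler is not.

```python
from html.parser import HTMLParser


class StaticLinkExtractor(HTMLParser):
    """Collects HREF values from <a> tags without executing any script."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # html.parser reports tag and attribute names in lowercase.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


# Hypothetical page: one static link, one link reachable only via onClick.
PAGE = """
<a href="catalog.html">Catalog</a>
<span onClick="document.location='checkout.html'">Checkout</span>
"""


def static_links(html):
    """Return the HREF values visible in the markup itself."""
    parser = StaticLinkExtractor()
    parser.feed(html)
    parser.close()
    return parser.links


print(static_links(PAGE))  # ['catalog.html']
```

The checkout.html target exists only inside a script string, so any tool that reads the markup without interpreting the script has no link to follow.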

Some Web sites use dynamically generated link references on pages produced by server-side programming. Although these pages contain no client-side programs, link references or "cookie" data that change frequently can make a URL Sequence monitor difficult to set up and maintain.

Dynamic Content Workarounds

There are several ways to make a SiteScope URL Sequence monitor perform actions controlled by client-side programs and other dynamic content, and some of them are presented below. These workarounds generally require knowledge of Web page construction, CGI programming, Perl-style regular expressions, and the programming used to support the Web site being monitored.

Dynamic content: A Web page contains a script that controls a link to another URL (for example, onClick="document.location='http://...'").
SiteScope workaround: In the sequence step for that page, use a Match Content regular expression to retain the filename.ext value from the .location="filename.ext" pattern (the quotes may be single or double). The retained value can then be passed as a URL in the Other text box of the next step of the sequence.

Dynamic content: A client-side program reformats, edits, or adds data to a POST or GET data set collected by HTML form inputs.
SiteScope workaround: Manually apply the script's changes to the NAME=VALUE pairs displayed for that sequence step. You can do this in the text box under the Form option in the URL Sequence Wizard, or in the POST data box for the applicable step in the Edit URL Sequence form. This requires familiarity with what the script does and with CGI request headers.

Dynamic content: A client-side program generates HTML content that, after a Web browser interprets it, includes HTML <A HREF=...> links.
SiteScope workaround: Use a Match Content regular expression to retain the filename.ext value from the HREF="filename.ext" pattern, and pass it to the URL text box of the next sequence step.

Dynamic content: A client-side program generates HTML content that, after a Web browser interprets it, includes forms submitted to a CGI program.
SiteScope workaround: Manually enter the NAME=VALUE pairs for that sequence step, either in the text box under the Form option in the URL Sequence Wizard or in the POST data box for the applicable step in the Edit URL Sequence form. This requires familiarity with the script, the form structure, and CGI request headers.

Dynamic content: A script dynamically sets the ACTION attribute of an HTML <FORM> tag.
SiteScope workaround: Manually enter the ACTION URL for the next sequence step, either in the text box under the URL option in the URL Sequence Wizard or in the Step n Reference box for the applicable step in the Edit URL Sequence form. This requires familiarity with the script.

Dynamic content: A script dynamically sets the METHOD attribute of an HTML <FORM> tag.
SiteScope workaround: Manually enter the POST or GET data for the next sequence step. For the POST method, enter the data in the text box under the Form option in the URL Sequence Wizard or in the POST data box in the Edit URL Sequence form. For the GET method, enter the ACTION URL followed by ? and the NAME=VALUE pairs separated by & (for example, ACTION?NAME1=VALUE1&NAME2=VALUE2) in the text box under the URL option in the URL Sequence Wizard or in the Step n Reference box in the Edit URL Sequence form. This requires familiarity with the script, the form structure, and CGI request headers.
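Several of these workarounds come down to writing the NAME=VALUE pairs a browser would have sent. The sketch below, using Python's standard `urllib.parse.urlencode`, shows how the same pairs are encoded as a POST body and as a GET query string. The ACTION URL and field names are hypothetical, and the sketch shows the string as it travels over HTTP; how pairs are laid out in the SiteScope POST data box may differ.

```python
from urllib.parse import urlencode

# Hypothetical form: the ACTION URL and the fields a script might set or rewrite.
ACTION_URL = "http://www.example.com/cgi-bin/order.pl"
fields = [("item", "1234"), ("qty", "2"), ("note", "gift wrap")]


def post_data(pairs):
    """NAME=VALUE pairs, URL-encoded and joined with &, as sent in a POST body."""
    return urlencode(pairs)


def get_url(action, pairs):
    """ACTION URL, then ? (or & if a query already exists), then the encoded pairs."""
    separator = "&" if "?" in action else "?"
    return action + separator + urlencode(pairs)


print(post_data(fields))
# item=1234&qty=2&note=gift+wrap
print(get_url(ACTION_URL, fields))
# http://www.example.com/cgi-bin/order.pl?item=1234&qty=2&note=gift+wrap
```

Note that `urlencode` encodes the space in "gift wrap" as +, which is the standard encoding for HTML form data; values typed by hand need the same escaping.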

The figure below illustrates several principles of constructing a URL Sequence monitor with regular expressions. The regular expressions shown in the figure can be used to extract URLs from JavaScript or other Web page content. As indicated, content matches for a given step are performed on the content returned for that step. Parentheses in a regular expression cause the value matched by the expression inside them to be remembered, or retained. The retained value can be passed to the next step of the sequence with the {$n} variable. Because a regular expression can contain more than one set of parentheses, {$n} refers to the value matched by the nth set of parentheses. The example in the figure uses only one set of parentheses, so it references the retained value as {$1}.

Figure: Regular expressions in URL Sequences
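The retain-and-substitute mechanism can be sketched in Python: a content match on one step's returned page whose parenthesized group is retained, then substitution of {$n} into the next step's URL. This is not SiteScope code. The page content, the `retained_values` and `substitute` helpers, and the example.com URL are hypothetical, and SiteScope's own substitution rules may differ in detail. The pattern uses syntax that Python's `re` module shares with Perl.

```python
import re

# Hypothetical content returned for step 1 of a sequence.
STEP1_CONTENT = """
<input type="button" value="Next"
       onClick="document.location='step2.asp?session=A17F'">
"""

# Match Content expression: the parentheses retain the text between the quotes.
MATCH_CONTENT = r"""\.location\s*=\s*['"]([^'"]+)['"]"""


def retained_values(pattern, content):
    """Return retained values; element n-1 holds the value used for {$n}."""
    match = re.search(pattern, content)
    if match is None:
        return []
    return list(match.groups())


def substitute(template, values):
    """Replace each {$n} in a next-step URL template with the nth retained value."""
    return re.sub(r"\{\$(\d+)\}", lambda m: values[int(m.group(1)) - 1], template)


values = retained_values(MATCH_CONTENT, STEP1_CONTENT)
print(values)  # ['step2.asp?session=A17F']
print(substitute("http://www.example.com/{$1}", values))
# http://www.example.com/step2.asp?session=A17F
```

Because the match runs on the content returned for step 1, a session token that changes on every visit is picked up fresh each time the sequence runs, rather than being fixed when the monitor is created.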

Web pages containing code that performs either of the following present additional challenges:

  • A script parses a cookie or other dynamic content to be added to a CGI GET request.
  • Link information is contained in an external script file referenced by an HTML <SCRIPT SRC="http://..."> tag.
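For the first case, one approach consistent with the manual workarounds in the table is to reproduce by hand what the script computes. The sketch assumes a script that copies one cookie value into a GET query parameter; the cookie name, parameter name, and URL are hypothetical. It uses Python's standard `http.cookies.SimpleCookie` to read the value and `urllib.parse.urlencode` to append it.

```python
from http.cookies import SimpleCookie
from urllib.parse import urlencode

# Hypothetical Cookie header a browser would send, and the cookie the script reads.
COOKIE_HEADER = "SESSIONID=A17F; theme=dark"
COOKIE_NAME = "SESSIONID"


def value_from_cookie(header, name):
    """Parse a Cookie header and return the named value, or None if absent."""
    jar = SimpleCookie()
    jar.load(header)
    morsel = jar.get(name)
    return morsel.value if morsel is not None else None


def get_request_url(action, name, value):
    """Append the cookie-derived NAME=VALUE pair to a GET request URL."""
    separator = "&" if "?" in action else "?"
    return action + separator + urlencode({name: value})


session = value_from_cookie(COOKIE_HEADER, COOKIE_NAME)
print(get_request_url("http://www.example.com/cgi-bin/view.pl", "session", session))
# http://www.example.com/cgi-bin/view.pl?session=A17F
```

If the cookie value changes on every visit, a value entered by hand goes stale, which is why such pages remain hard to monitor.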

Web pages with dynamically generated link and form content will probably not be parsed correctly by the SiteScope URL Sequence Monitor Wizard.