SiteScope User's Guide


Monitoring the Topaz System Using the Topaz Watchdog Group

Topaz Watchdog is a SiteScope group that monitors the Topaz environment, and checks that Topaz is functioning correctly. If SiteScope discovers a problem, it can send alerts to designated users.

About the Topaz Watchdog Group

Topaz Watchdog is a SiteScope group that monitors the machines on which you deploy Topaz servers and components. It includes both component monitoring (services, processes, resources, and so forth), and end-to-end monitoring (the last reported data time from any Topaz Business Process Monitor or Client Monitor Agent).

Topaz Watchdog Monitors

SiteScope generates one Topaz Watchdog group where all Topaz services and components registered with Topaz are monitored. The Topaz Watchdog group keeps track of the Topaz servers and machines with monitors. Each monitor checks one specific component of Topaz. For example, the Supervisor Service monitor checks the status of the Topaz Supervisor service on the Agent Server, Admin Server, Graph Server, Alert Server, and Scheduled Tasks Server.

The Topaz components that are monitored by Topaz Watchdog are as follows:

  • Admin Server
  • Agent Server
  • Alert Server
  • Graph Server
  • Scheduled Tasks Server
  • Topaz Bus Server
  • Database Server
  • Topaz Business Process Monitor
  • Topaz Client Monitor Agent
  • EMS probe
  • SiteScope

The monitors are configured according to monitor templates. The templates include the conditions under which alerts are triggered.

If the status of a monitor changes to "error," SiteScope sends alerts to designated users. The status of a monitor is calculated according to predefined rules. You can customize these rules by editing the Topaz Watchdog monitors, or by changing the Topaz Watchdog templates.

Updating Topaz Watchdog

SiteScope automatically updates the Topaz Watchdog group when the Topaz topology changes, for example, when a new Business Process Monitor is installed. SiteScope checks for this change once hourly. Furthermore, SiteScope automatically updates the Topaz Watchdog group whenever the Topaz Watchdog settings are changed.

You can also manually update the Topaz Watchdog group. For details, see Updating the Topaz Watchdog Group Configuration.

E-mail Alerts

With minimal setup, you can immediately begin viewing reports on monitor status. However, to make SiteScope more proactive, some crucial Topaz Watchdog monitors are configured to send e-mail alerts when a monitor shows consecutive errors.

Topaz Watchdog Alerts

The Topaz Watchdog group can utilize all SiteScope alert methods, such as Script alerts and SNMP trap alerts. For details, refer to the SiteScope documentation.

Setting Up and Working with the Topaz Watchdog Group

The procedure for setting up and working with the Topaz Watchdog group is as follows:

  1. Set up the Topaz Watchdog group.

    You set up the Topaz Watchdog group to monitor Topaz machines. For details, see Setting Up the Topaz Watchdog Group.

  2. Set up a baseline for the Topaz Watchdog group.

    So that the Topaz Watchdog group can work reliably, you must set up a baseline for Topaz, against which you can compare subsequent monitor behavior. For details, see Creating a Baseline for the Topaz Watchdog Group.

  3. Set up e-mail alerts.

    You can set up the Topaz Watchdog group to send users e-mail, to alert them to any problems that may occur on a Topaz machine.

    You provide the e-mail address of the SiteScope administrator, and you build groups of recipients to whom SiteScope sends e-mail at certain times of the day. For details, see Mail Preferences.

    Note that Topaz Watchdog alerts are configured by default to be sent to all e-mail recipients.

  4. Set up and view alerts.

    You can set up SiteScope to trigger an alert when a Topaz Watchdog monitor has an error status. You can also view alerts that have been generated during the past 24 hours. SiteScope calculates alerts according to default criteria. For details, see SiteScope Alerts.

  5. View current monitor status.

    You can view the current status of any monitor. For details, see Monitor Group Detail page.

  6. View SiteScope reports.

    You can run reports to view the recent history of the Topaz Watchdog monitors. For details, see Management Reports.

Setting Up the Topaz Watchdog Group

You set up Topaz Watchdog as a SiteScope group, and you choose which services the Topaz Watchdog group should monitor. SiteScope automatically retrieves the Topaz components.

To set up the Topaz Watchdog group:

  1. In SiteScope, choose Preferences > Topaz. The Topaz Server Registration page opens.
  2. If SiteScope is configured to report to Topaz, the Required Settings boxes are filled in. Skip to step 4. If SiteScope is not configured to report to Topaz, fill out the Required Settings fields. Click Register.

    Note: It is recommended to register SiteScope with Topaz for the Topaz Watchdog group to work correctly. You can subsequently disable the connection, if you do not want SiteScope to report to Topaz.


  3. In the next page, select a profile name. Click Save Profile. You are returned to the main page. Continue to step 4.

    If you do not want SiteScope to report to Topaz, you do not have to select a profile. Click the browser back button to return to the Topaz Server Registration page. Skip step 4 and continue with step 5.

  4. Choose Preferences > Topaz to return to the Topaz Server Registration page.
  5. Go down to the bottom of the page, to the section entitled Topaz Watchdog Required Settings.
  6. Select the Enable Topaz Watchdog check box.
  7. Click Force Configuration Update. SiteScope displays a message that there is nothing to monitor (because you have not yet selected the services to monitor).
  8. Click Edit Topaz Watchdog Settings.
  9. In the Topaz Watchdog Settings page, select the Topaz services that you want the Topaz Watchdog group to monitor.
  10. Click Save Settings.

    The Watchdog Configuration Result page displays the results of SiteScope's attempt to create subgroups for the Topaz components. Results are displayed in red or black: black signifies that SiteScope was able to create a group for a Topaz machine; red signifies that SiteScope created a group on a machine that requires administrative privileges for remote access (the components that are monitored on that machine appear in bold).

    If all groups are displayed in black, you can continue with the setup. Skip to step 12, where SiteScope displays the Topaz Machine View that shows the name, location, and Topaz service for each machine.

    For any group displayed in red, continue to the next step.

  11. You must configure remote access for any machines that are displayed in red. You do this in the Remote NT Servers page, or the Remote UNIX Server page. For details, see Remote NT or Remote UNIX. Then, reconfigure the machine in the Topaz Watchdog Settings page:
    1. In the Topaz Machine View table, clear the check box of the machine whose components you want to reconfigure.
    2. Click Save Settings.
    3. In the Watchdog Configuration Result page, click Edit Topaz Watchdog Settings.
    4. In the Topaz Machine View table, select the check box of the machine whose components you want to reconfigure.
    5. Click Save Settings.
  12. SiteScope displays the Topaz Machine View that shows the name, location, and Topaz service for each machine.

    Click SiteScope in the tool bar to return to the SiteScope main view. A new group is displayed, called Topaz Watchdog.

Updating the Topaz Watchdog Monitor Path

If Topaz is installed on a volume other than C, you must change the path of the Topaz folder.

To update the Topaz path:

  1. Open the watchdog.config file in the SiteScope\groups folder.
  2. Locate the row _twdTopazFolder=C$\Topaz.
  3. Change C to the volume on which Topaz is installed.

    If the Topaz installation folder is shared, write the name by which it is shared. For example, if the Topaz installation folder is shared by the name Topaz, replace C$\Topaz with Topaz.

  4. Restart SiteScope.
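
The edit in steps 2 and 3 amounts to rewriting a single row of watchdog.config. The following Python sketch illustrates that rewrite on text held in memory. It is not a SiteScope utility: the function name set_topaz_folder and the sample file contents are hypothetical, and only the _twdTopazFolder key and the C$\Topaz value are taken from this guide. Reading and writing the file, and restarting SiteScope, are still up to you.

```python
KEY = "_twdTopazFolder="


def set_topaz_folder(config_text, new_value):
    """Return config_text with the _twdTopazFolder row set to new_value."""
    lines = config_text.splitlines()
    found = False
    for i, line in enumerate(lines):
        if line.startswith(KEY):
            lines[i] = KEY + new_value
            found = True
    if not found:
        # Assumption: fail loudly rather than guess where a missing row belongs.
        raise ValueError("no _twdTopazFolder row found")
    return "\n".join(lines)


# Hypothetical file contents; a real watchdog.config holds other rows as well.
sample = "_exampleKey=1\n_twdTopazFolder=C$\\Topaz"
print(set_topaz_folder(sample, "D$\\Topaz"))  # Topaz installed on volume D
print(set_topaz_folder(sample, "Topaz"))      # folder shared by the name Topaz
```

The sketch leaves every other row untouched and raises an error when the row is missing, so a typo in the file does not go unnoticed.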

Updating the Topaz Watchdog Group Configuration

You can force a configuration update of the Topaz Watchdog group. Any changes made to the Topaz Watchdog configuration take effect within a few minutes.

To manually update Topaz Watchdog:

  1. Choose Preferences > Topaz from the SiteScope main menu.
  2. Go down to the bottom of the page, to the section entitled Topaz Watchdog Required Settings.
  3. Click Force Configuration Update.

Reconfiguring a Topaz Watchdog Component

You can reconfigure a specific component.

To reconfigure a Topaz Watchdog component:

  1. In the Topaz Machine View table, clear the check box of the machine whose monitors you want to reconfigure.
  2. Click Save Settings.
  3. In the Watchdog Configuration Result page, click Edit Topaz Watchdog Settings.
  4. In the Topaz Machine View table, select the check box of the machine whose components you want to reconfigure.
  5. Click Save Settings.

Disabling the Topaz Watchdog Group

You can disable the Topaz Watchdog group, and prevent the group from appearing in the SiteScope Preferences pages.

To disable the Topaz Watchdog group:

  1. Locate the file master.config, in the \SiteScope\groups folder.
  2. Set the value of the key _disableTopazWatchdog to true.
  3. Save the file, and restart the SiteScope service.

You can also manually update the Topaz Watchdog group whenever you need to.

Creating a Baseline for the Topaz Watchdog Group

Following setup, SiteScope begins monitoring the Topaz machines registered in the Topaz Management database. You must now bring the Topaz Watchdog group to the state where all monitors have an OK status, that is, the Topaz Watchdog group's icon in the SiteScope main view is green. In this way, you can create a reliable baseline against which you can compare subsequent monitor behavior.

The following reasons explain why, following setup, the Topaz Watchdog group may show error or warning statuses:

Problem Description

Resolution

SiteScope does not have the appropriate permissions to access the machine on which the Topaz Admin Server is installed.

If this is the case, SiteScope shows error statuses for all monitors that are reporting on the operating system of the Topaz machine. For details of working with NT servers, see Remote NT. For details of working with UNIX servers, see Remote UNIX.

A machine is down for maintenance.

You can temporarily remove the machine from the Topaz Watchdog Settings page.

A Topaz Business Process Monitor is not sending data.

Check why the Business Process Monitor is not working. Possible causes are that the Business Process Monitor machine is down or a process is stuck; there are network problems so the Business Process Monitor cannot connect to the Topaz Agent Server; the Topaz loaders are failing to insert the data into the profile database; the Topaz Management database is down; the profile database is down.

Remote machines cannot be accessed from the SiteScope machine.

 

The monitor Last reported data time has an error among all its components.

The Last reported data time monitor is different from other monitors, because it checks a complete, round-trip process, and not one specific component of a process. A problem may have occurred anywhere in the round trip. Try to isolate the problem by looking at a Topaz Agent Server monitor.

All directory monitors indicate a "directory not found" error, and all log file monitors indicate an "unable to read log file" error.

This can happen when you monitor a Topaz system running on Windows machines, since the Topaz Watchdog monitors assume that Topaz is installed on C:\Topaz.

Before making any changes:

  • Back up the original SiteScope\groups\templates.sets.topazWatchdog folder.
  • Stop SiteScope.

After making any changes:

  • Restart SiteScope.

To resolve this problem, do one (or more) of the following:

  1. Edit the Topaz Watchdog monitor sets in the <SiteScope install path>\SiteScope\groups\templates.sets.topazWatchdog directory, and replace the $TOPAZ_FOLDER$ string with the location on the local disk where Topaz is installed. For example, if Topaz is installed on volume D, change $TOPAZ_FOLDER$ to D$\Topaz. If the Topaz folder is shared by the name Topaz, change $TOPAZ_FOLDER$ to Topaz.
    Navigate to the Topaz Watchdog Settings page and:
    • Disable machines for which the wrong directory path is used, and save your changes.
    • Re-enable the machines.

  2. Open SiteScope\groups\watchdog.config, and change the value of the _twdTopazFolder parameter.
    For example, you can set this parameter to D$\Topaz by changing the line
    _twdTopazFolder=C$\Topaz
    to
    _twdTopazFolder=D$\Topaz
    You can set this parameter to Topaz by changing the line
    _twdTopazFolder=C$\Topaz
    to
    _twdTopazFolder=Topaz

    Navigate to the Topaz Watchdog Settings page and:

    • Disable machines for which the wrong directory path is used, and save your changes.
    • Re-enable the machines.

    Note: If you clear the Topaz Watchdog settings, the _twdTopazFolder parameter is reset to C$\Topaz.

  3. Edit each problematic monitor, and change its directory path or the log file pathname attributes to the correct directory or file path.
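
The first option above, replacing the $TOPAZ_FOLDER$ string, can be sketched in Python as follows. This is an illustration under stated assumptions rather than a supported tool: the helper names topaz_folder_value and fill_placeholder and the sample template line are hypothetical, and the sketch assumes the installation folder is named Topaz at the root of the volume, as in the examples in this guide.

```python
PLACEHOLDER = "$TOPAZ_FOLDER$"


def topaz_folder_value(volume=None, share=None):
    """Build the replacement for $TOPAZ_FOLDER$.

    A share name is used as-is; a volume letter becomes <LETTER>$\\Topaz.
    """
    if share:
        return share
    if volume:
        # Accept "d", "D", or "D:" and normalize to an upper-case letter.
        return volume.rstrip(":").upper() + "$\\Topaz"
    raise ValueError("give either a volume letter or a share name")


def fill_placeholder(template_text, folder_value):
    """Replace every occurrence of the placeholder in template_text."""
    return template_text.replace(PLACEHOLDER, folder_value)


# Hypothetical template line; real monitor sets contain other settings.
line = "_path=$TOPAZ_FOLDER$\\log"
print(fill_placeholder(line, topaz_folder_value(volume="D")))
print(fill_placeholder(line, topaz_folder_value(share="Topaz")))
```

Back up the templates folder before applying any such change to real files, as described above.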

The monitor name: File-Age <sample type> Buffers Read on $TOPAZ_HOST_NAME$ indicates an error.
Note: <sample type> can be one of the following: Transaction, SiteScope, WebTrace, EMS, J2EE.

This problem may indicate that the Topaz loaders are failing to insert the data collection samples into the profile database.

SiteScope checks the time that the Read folder in each loader was last modified, and uses predefined thresholds for this monitor:

  • If the time since the loader's Read folder was last modified is more than 4 minutes, SiteScope issues a warning.
  • If the time since the loader's Read folder was last modified is more than 8 minutes, SiteScope issues an error.

If you know that the interval at which the relevant data collection agents report data to Topaz is longer than these thresholds, you should change the threshold for warnings and errors in these monitors. For details, see the help page for the relevant monitor.
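
The threshold rule for these monitors can be expressed as a short Python sketch. It only illustrates the comparison described above and is not SiteScope code: the function name read_folder_status and the status strings are assumptions, while the 4-minute and 8-minute defaults come from this guide.

```python
from datetime import timedelta

# Default thresholds stated in this guide; adjust them if your data
# collection agents report at a longer interval.
WARNING_AFTER = timedelta(minutes=4)
ERROR_AFTER = timedelta(minutes=8)


def read_folder_status(age, warning_after=WARNING_AFTER, error_after=ERROR_AFTER):
    """Classify the time since the Read folder was last modified."""
    if age > error_after:
        return "error"
    if age > warning_after:
        return "warning"
    return "ok"


for minutes in (3, 5, 9):
    print(minutes, read_folder_status(timedelta(minutes=minutes)))
```

Because both comparisons are strict, a folder modified exactly 4 minutes ago is still reported as OK in this sketch.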

Monitor Descriptions

The Topaz Watchdog group supervises the Topaz servers, machines, and components with monitors. The monitors are set up to report when any component ceases to work correctly.

The monitors are configured according to monitor templates (*.mg) located in the templates.groupsTopazWatchdog folder. The templates include the conditions under which alerts are triggered.

You can disable and enable monitors, for example, when you know in advance that monitors will be in error, such as during routine maintenance or a prolonged outage.

Monitors that depend on the Topaz Supervisor service apply to Topaz running on Windows (see Topaz Watchdog Monitors on Windows Platform). The equivalent monitors for Topaz running on UNIX (see Topaz Watchdog Monitors on UNIX Platform) should depend on the Topaz Supervisor process.

Topaz Watchdog does not configure alerts for any of the monitors.

The following tables, Topaz Watchdog Monitors on Windows Platform and Topaz Watchdog Monitors on UNIX Platform, list the monitors for each platform, and include:

  • the Topaz component being monitored
  • the type and name of the Topaz Watchdog monitor
  • the frequency with which a monitor checks a Topaz component
  • the resolution for any errors SiteScope displays
  • a description of the monitor
  • Topaz Watchdog Monitors on Windows Platform

    ID

    Group

    Monitor Type

    Monitor Name

    Freq. [min.]
    (on OK)

    Freq. [min.]
    (on Error)

    Warning condition

    Error Condition

    Error Resolution

    1 Common System Monitors
    (see note 4)
    Ping Ping: "<TOPAZ HOST NAME>" 10 1 n/a Ping fails Check Connectivity
    2 Common System Monitors
    (see note 4)
    CPU Utilization
    (depends on Ping)
    CPU Utilization on "<TOPAZ HOST NAME>" 10 1 CPU can't be measured CPU > 70% Check which process causes this. If it's a Topaz process, restart Topaz
    3 Common System Monitors
    (see note 4)
    Disk Space: $disk$ 
    (depends on Ping)
    Disk Space: <TOPAZ DRIVE LETTER> on "<TOPAZ HOST NAME>" 60 5 Disk Space can't be measured Disk Space > 85% Clean the disk
    4 Common System Monitors
    (see note 4)
    Memory
    (depends on Ping)
    Memory on "<TOPAZ HOST NAME>" 10 1 Memory can't be measured Memory > 85% Check which process causes this. If it's a Topaz process, restart Topaz
    5 Common System Monitors
    (see note 4)
    Memory: Available Mbytes 
    (depends on Ping)
    Memory: Available MBytes on "<TOPAZ HOST NAME>" 10 1 Avail Mbytes can't be measured Avail Mbytes < 50Mb Check which process causes this. If it's a Topaz process, restart Topaz
    6 Common Application Monitors
    (see note 5)
    Service: TopazSupervisor
    (depends on Ping)
    Topaz Supervisor Service on "<TOPAZ HOST NAME>" 10 1 Service is down n/a Restart Topaz
    7 Common Application Monitors
    (see note 5)
    Process: TopazSupervisor 
    (depends on Ping)
    Topaz Supervisor Process on "<TOPAZ HOST NAME>" 10 1 n/a Process not running Restart Topaz
    8 Graph/ Admin/ Agent Servers Service: IIS Admin Service
    (depends on Ping)
    * Disabled by default
    Service: IIS Admin Service on "<TOPAZ HOST NAME>" 10 1 n/a Service is down Restart Service / Process
    9 Graph/ Admin/ Agent Servers IIS Server
    (depends on Service: IIS Admin Service)
    * Disabled by default
    IIS Server on "<TOPAZ HOST NAME>" 10 1 See SiteScope documentation for IIS Server monitor See SiteScope documentation for IIS Server monitor Restart Service / Process
    10 Graph/ Admin/ Agent Servers Process : Thread Count : inetinfo
    (depends on Service: IIS Admin Service)
    OR
    Process : Thread Count : apache
    (depends on Service: TopazSupervisor)
    Process Thread Count: inetinfo on "<TOPAZ HOST NAME>"
    OR
    Process Thread Count: apache on "<TOPAZ HOST NAME>"
    10 1 Process not running Thread Count > 160 If number of threads > 160, restart IIS
    11 Graph/ Admin/ Agent Servers Process : Working Set : inetinfo
    (depends on Service: IIS Admin Service)
    OR
    Process : Working Set : apache
    (depends on Service: TopazSupervisor)
    Process Working Set: inetinfo on "<TOPAZ HOST NAME>"
    OR
    Process Working Set: apache on "<TOPAZ HOST NAME>"
    10 1 Process not running Working Set > 200Mb If memory increases towards more than physical memory, restart IIS
    12 Graph/ Admin/ Agent Servers Process CPU: topaz
    (depends on Service: TopazSupervisor)
    Process CPU: topaz on "<TOPAZ HOST NAME>" 10 1 n/a CPU > 85% or Process not running Restart Topaz when higher than 85% during several minutes.
    13 Graph/ Admin/ Agent Servers Process Memory: topaz
    (depends on Service: TopazSupervisor)
    Process Memory: topaz on "<TOPAZ HOST NAME>" 10 1 Process Memory > 600Mb Process Memory > 1.5Gb Restart Topaz when more than 75% of available physical Memory
    14 Graph/Admin Servers URL: TopazVerify.jsp
    (depends on Service: TopazSupervisor)
    URL: http://<TOPAZ HOST NAME> / topaz / topazVerify.jsp 10 1 n/a URL not available Restart Topaz.
    If this does not help, restart IIS
    If this does not help, reboot the machine
    15 Admin Server Process CPU: aes_twd
    (depends on Service: TopazSupervisor)
    Topaz Watchdog Aggregated Event Engine (CPU) on "<TOPAZ HOST NAME>" 10 1 n/a CPU > 85 % or Process not running Kill this process (TopazSupervisor will
    run it again)
    16 Admin Server Process Memory: aes_twd
    (depends on Service: TopazSupervisor)
    Topaz Watchdog Aggregated Event Engine (Memory) on "<TOPAZ HOST NAME>" 10 1 Process Memory > 600Mb Process Memory > 1.5Gb Kill this process (TopazSupervisor will
    run it again)
    17 Admin Server URL: sample_dispatcher URL Test for Site Scope Configuration Changes in Topaz 10 1 n/a URL results contain the word "filed" Verify that SiteScope configuration changes are reflected in Topaz.
    18 Admin Server Log File: aims.ejb.log
    (depends on Service: TopazSupervisor)
    * search for regular expression: "exception"
    Check for exceptions in SiteScope integration logs 10 10 n/a URL result contains the word "exception" Verify that SiteScope configuration changes are reflected in Topaz.
    19 Agent Server URL: getTopazServerTime URL: http://<TOPAZ HOST NAME> / topaz / topaz_api / api_getservertime.asp 10 1 n/a URL not available Restart Topaz.
    If this does not help, restart IIS
    If this does not help, reboot the machine
    20 Agent Server Directory Monitor for folder: <TOPAZ MACHINE NAME> \<TOPAZ FOLDER> \ .persist_dir \ lnch_persistent \ <TOPAZ HOST NAME>_web_driver \ guarantee \ 131072_project_topaz \.msgs Check for too many files on Guaranteed Delivery Buffers folder 10 1 n/a File Count > 2 Agent Server cannot pass messages to the
    Topaz Bus. Check if the Topaz Bus process
    is up and is not stuck.
    21 Agent Server Process: LoaderTX
    (depends on Service: TopazSupervisor)
    Transaction Loader Process on "<TOPAZ HOST NAME>" 10 1 n/a Process not running Kill this process (TopazSupervisor will
    run it again)
    22 Agent Server Process: LoaderWT
    (depends on Service: TopazSupervisor)
    WebTrace Loader Process on "<TOPAZ HOST NAME>" 10 1 n/a Process not running Kill this process (TopazSupervisor will
    run it again)
    23 Agent Server Process: LoaderSM
    (depends on Service: TopazSupervisor)
    SiteScope Loader Process on "<TOPAZ HOST NAME>" 10 1 n/a Process not running Kill this process (TopazSupervisor will
    run it again)
    24 Agent Server Process: LoaderNMMT
    (depends on Service: TopazSupervisor)
    EMS Loader Process on "<TOPAZ HOST NAME>" 10 1 n/a Process not running Kill this process (TopazSupervisor will
    run it again)
    25 Agent Server Process: LoaderABR
    (depends on Service: TopazSupervisor)
    J2EE Breakdown Loader Process on "<TOPAZ HOST NAME>" 10 1 n/a Process not running Kill this process (TopazSupervisor will
    run it again)
    26 Agent Server Directory Monitor: Transaction Buffers
    (depends on Service: TopazSupervisor)
    Too many Transaction Buffers are waiting to load on "<TOPAZ HOST NAME>" 10 10 n/a Size of "TransactionBuffers" folder > 2Mb Check whether there's a large change in # of reports.
    You may need to add an agent server if there are no problems
    27 Agent Server Directory Monitor: Transaction Buffers
    (depends on Service: TopazSupervisor)
    Files-Size of Transaction Buffers Read on "<TOPAZ HOST NAME>" 10 10 Size of "Read" folder > 2Mb Size of "Read" folder > 4Mb Check whether there's a large change in # of reports.
    You may need to add an agent server if there are no problems
    28 Agent Server Directory Monitor: Transaction Buffers
    (depends on Service: TopazSupervisor)
    Files-Age Transcation Buffers Read on "<TOPAZ HOST NAME>" 10 10 "Read" folder last modified more than 4 minutes ago "Read" folder last modified more than 8 minutes ago Not getting reports. Check connectivity
    to Agent Server.
    29 Agent Server Directory Monitor: Transaction Buffers
    (depends on Service: TopazSupervisor)
    Files-Count Transaction Buffers Failures on "<TOPAZ HOST NAME>" 10 10 n/a File Count of "Fail" folder > 1 Reprocess Failures
    30 Agent Server Directory Monitor: Transaction Buffers
    (depends on Service: TopazSupervisor)
    Files-Count Transaction Buffers Retry on "<TOPAZ HOST NAME>" 10 10 File Count of "Retry" folder > 30 File Count of "Retry" folder > 50 Indicates a problem is loading buffers to database.
    If problem is not resolved automatically, buffer files
    will be moved to the "Fail" folder and it will need
    to be reprocessed.
    31 Agent Server Directory Monitor: Transaction Buffers
    (depends on Service: TopazSupervisor)
    Files-Size of Transaction Buffers Retry on "<TOPAZ HOST NAME>" 10 10 File Size of "retry" folder > 1.5Mb File Size of "Retry" folder > 2Mb Indicates a problem is loading buffers to database.
    If problem is not resolved automatically, buffer files
    will be moved to the "Fail" folder and it will need
    to be reprocessed.
    32 Agent Server Directory Monitor: WebTrace Buffers
    (depends on Service: TopazSupervisor)
    Too many WebTrace Buffers are waiting to load on "<TOPAZ HOST NAME>" 10 10 n/a Size of "TransactionBuffers" folder > 2Mb Check whether there's a large change in # of reports.
    You may need to add an agent server if there are no problems
    33 Agent Server Directory Monitor: WebTrace Buffers
    (depends on Service: TopazSupervisor)
    Files-Size of WebTrace Buffers Read on "<TOPAZ HOST NAME>" 10 10 Size of "Read" folder > 2Mb Size of "Read" folder > 4Mb Check whether there's a large change in # of reports.
    You may need to add an agent server if there are no problems
    34 Agent Server Directory Monitor: WebTrace Buffers
    (depends on Service: TopazSupervisor)
    Files-Age WebTrace Buffers Read on "<TOPAZ HOST NAME>" 10 10 "Read" folder last modified more than 4 minutes ago "Read" folder last modified more than 8 minutes ago Not getting reports. Check connectivity
    to Agent Server.
    35 Agent Server Directory Monitor: WebTrace Buffers
    (depends on Service: TopazSupervisor)
    Files-Count WebTrace Buffers Failures on "<TOPAZ HOST NAME>" 10 10 n/a File Count of "Fail" folder > 1 Reprocess Failures
    36 Agent Server Directory Monitor: WebTrace Buffers
    (depends on Service: TopazSupervisor)
    Files-Count WebTrace Buffers Retry on "<TOPAZ HOST NAME>" 10 10 File Count of "Retry" folder > 30 File Count of "Retry" folder > 50 Indicates a problem is loading buffers to database.
    If problem is not resolved automatically, buffer files
    will be moved to the "Fail" folder and it will need
    to be reprocessed.
    37 Agent Server Directory Monitor: WebTrace Buffers
    (depends on Service: TopazSupervisor)
    Files-Size of WebTrace Buffers Retry on "<TOPAZ HOST NAME>" 10 10 File Size of "retry" folder > 1.5Mb File Size of "Retry" folder > 2Mb Indicates a problem is loading buffers to database.
    If problem is not resolved automatically, buffer files
    will be moved to the "Fail" folder and it will need
    to be reprocessed.
    38 Agent Server Directory Monitor: SiteScope Buffers
    (depends on Service: TopazSupervisor)
    Too many SiteScope Buffers are waiting to load on "<TOPAZ HOST NAME>" 10 10 n/a Size of "TransactionBuffers" folder > 2Mb Check whether there's a large change in # of reports.
    You may need to add an agent server if there are no problems
    39 Agent Server Directory Monitor: SiteScope Buffers
    (depends on Service: TopazSupervisor)
    Files-Size of SiteScope Buffers Read on "<TOPAZ HOST NAME>" 10 10 Size of "Read" folder > 2Mb Size of "Read" folder > 4Mb Check whether there's a large change in # of reports.
    You may need to add an agent server if there are no problems
    40 Agent Server Directory Monitor: SiteScope Buffers
    (depends on Service: TopazSupervisor)
    Files-Age SiteScope Buffers Read on "<TOPAZ HOST NAME>" 10 10 "Read" folder last modified more than 4 minutes ago "Read" folder last modified more than 8 minutes ago Not getting reports. Check connectivity
    to Agent Server.
    41 Agent Server Directory Monitor: SiteScope Buffers
    (depends on Service: TopazSupervisor)
    Files-Count SiteScope Buffers Failures on "<TOPAZ HOST NAME>" 10 10 n/a File Count of "Fail" folder > 1 Reprocess Failures
    42 Agent Server Directory Monitor: SiteScope Buffers
    (depends on Service: TopazSupervisor)
    Files-Count SiteScope Buffers Retry on "<TOPAZ HOST NAME>" 10 10 File Count of "Retry" folder > 30 File Count of "Retry" folder > 50 Indicates a problem is loading buffers to database.
    If problem is not resolved automatically, buffer files
    will be moved to the "Fail" folder and it will need
    to be reprocessed.
    43 Agent Server Directory Monitor: SiteScope Buffers
    (depends on Service: TopazSupervisor)
    Files-Size of SiteScope Buffers Retry on "<TOPAZ HOST NAME>" 10 10 File Size of "retry" folder > 1.5Mb File Size of "Retry" folder > 2Mb Indicates a problem is loading buffers to database.
    If problem is not resolved automatically, buffer files
    will be moved to the "Fail" folder and it will need
    to be reprocessed.
    44 Agent Server Directory Monitor: EMS Buffers
    (depends on Service: TopazSupervisor)
    * Disabled by default
    Too many EMS Buffers are waiting to load on "<TOPAZ HOST NAME>" 10 10 n/a Size of "TransactionBuffers" folder > 2Mb Check whether there's a large change in # of reports.
    You may need to add an agent server if there are no problems
    45 Agent Server Directory Monitor: EMS Buffers
    (depends on Service: TopazSupervisor)
    * Disabled by default
    Files-Size of EMS Buffers Read on "<TOPAZ HOST NAME>" 10 10 Size of "Read" folder > 2Mb Size of "Read" folder > 4Mb Check whether there's a large change in # of reports.
    You may need to add an agent server if there are no problems
    46 Agent Server Directory Monitor: EMS Buffers
    (depends on Service: TopazSupervisor)
    * Disabled by default
    Files-Age EMS Buffers Read on "<TOPAZ HOST NAME>" 10 10 "Read" folder last modified more than 4 minutes ago "Read" folder last modified more than 8 minutes ago Not getting reports. Check connectivity
    to Agent Server.
    47 Agent Server Directory Monitor: EMS Buffers
    (depends on Service: TopazSupervisor)
    * Disabled by default
    Files-Count EMS Buffers Failures on "<TOPAZ HOST NAME>" 10 10 n/a File Count of "Fail" folder > 1 Reprocess Failures
    48 Agent Server Directory Monitor: EMS Buffers
    (depends on Service: TopazSupervisor)
    * Disabled by default
    Files-Count EMS Buffers Retry on "<TOPAZ HOST NAME>" 10 10 File Count of "Retry" folder > 30 File Count of "Retry" folder > 50 Indicates a problem is loading buffers to database.
    If problem is not resolved automatically, buffer files
    will be moved to the "Fail" folder and it will need
    to be reprocessed.
    49 Agent Server Directory Monitor: EMS Buffers
    (depends on Service: TopazSupervisor)
    * Disabled by default
    Files-Size of EMS Buffers Retry on "<TOPAZ HOST NAME>" 10 10 File Size of "retry" folder > 1.5Mb File Size of "Retry" folder > 2Mb Indicates a problem is loading buffers to database.
    If problem is not resolved automatically, buffer files
    will be moved to the "Fail" folder and it will need
    to be reprocessed.
    50 Agent Server Directory Monitor: J2EE Buffers
    (depends on Service: TopazSupervisor)
    * Disabled by default
    Too many J2EE Buffers are waiting to load on "<TOPAZ HOST NAME>" 10 10 n/a Size of "TransactionBuffers" folder > 2Mb Check whether there's a large change in # of reports.
    You may need to add an agent server if there are no problems
    51 Agent Server Directory Monitor: J2EE Buffers
    (depends on Service: TopazSupervisor)
    * Disabled by default
    Files-Size of J2EE Buffers Read on "<TOPAZ HOST NAME>" 10 10 Size of "Read" folder > 2Mb Size of "Read" folder > 4Mb Check whether there's a large change in # of reports.
    You may need to add an agent server if there are no problems
    52 Agent Server Directory Monitor: J2EE Buffers
    (depends on Service: TopazSupervisor)
    * Disabled by default
    Files-Age J2EE Buffers Read on "<TOPAZ HOST NAME>" 10 10 "Read" folder last modified more than 4 minutes ago "Read" folder last modified more than 8 minutes ago Not getting reports. Check connectivity
    to Agent Server.
    53 Agent Server Directory Monitor: J2EE Buffers
    (depends on Service: TopazSupervisor)
    * Disabled by default
    Files-Count J2EE Buffers Failures on "<TOPAZ HOST NAME>" 10 10 n/a File Count of "Fail" folder > 1 Reprocess Failures
    54 Agent Server Directory Monitor: J2EE Buffers
    (depends on Service: TopazSupervisor)
    * Disabled by default
    Files-Count J2EE Buffers Retry on "<TOPAZ HOST NAME>" 10 10 File Count of "Retry" folder > 30 File Count of "Retry" folder > 50 Indicates a problem loading buffers to the database.
    If the problem is not resolved automatically, the buffer files
    are moved to the "Fail" folder and need
    to be reprocessed.
    55 Agent Server Directory Monitor: J2EE Buffers
    (depends on Service: TopazSupervisor)
    * Disabled by default
    Files-Size of J2EE Buffers Retry on "<TOPAZ HOST NAME>" 10 10 File Size of "Retry" folder > 1.5Mb File Size of "Retry" folder > 2Mb Indicates a problem loading buffers to the database.
    If the problem is not resolved automatically, the buffer files
    are moved to the "Fail" folder and need
    to be reprocessed.
    56 Alert Server Process CPU: AlertEngineMdrv
    (depends on Service: TopazSupervisor)
    Alert Engine Process (CPU) on "<TOPAZ HOST NAME>" 10 1 n/a CPU > 85% or Process not running Kill this process (TopazSupervisor will
    run it again)
    57 Alert Server Process Memory: AlertEngineMdrv
    (depends on Service: TopazSupervisor)
    Alert Engine Process (Memory) on "<TOPAZ HOST NAME>" 10 1 Process Memory > 600Mb Process Memory > 1.5Gb Kill this process (TopazSupervisor will
    run it again)
    58 Scheduled Tasks Server Process CPU: EmailReportsMdr
    (depends on Service: TopazSupervisor)
    Scheduled Reports Engine Process (CPU) on "<TOPAZ HOST NAME>" 10 1 n/a CPU > 85% or Process not running Kill this process (TopazSupervisor will
    run it again)
    59 Scheduled Tasks Server Process Memory: EmailReportsMdr
    (depends on Service: TopazSupervisor)
    Scheduled Reports Engine Process (Memory) on "<TOPAZ HOST NAME>" 10 1 Process Memory > 600Mb Process Memory > 1.5Gb Kill this process (TopazSupervisor will
    run it again)
    60 Scheduled Tasks Server Process CPU: topaz_pm
    (depends on Service: TopazSupervisor)
    Topaz Partition Manager (CPU) on "<TOPAZ HOST NAME>" 10 1 n/a CPU > 85% or Process not running Kill this process (TopazSupervisor will
    run it again)
    61 Scheduled Tasks Server Process Memory: topaz_pm
    (depends on Service: TopazSupervisor)
    Topaz Partition Manager (Memory) on "<TOPAZ HOST NAME>" 10 1 Process Memory > 600Mb Process Memory > 1.5Gb Kill this process (TopazSupervisor will
    run it again)
    62 EMS Probe Process CPU: TopazEmsProbe
    (depends on Service: TopazSupervisor)
    Topaz EMS Probe (CPU) on "<TOPAZ HOST NAME>" 10 1 n/a CPU > 85% or Process not running Kill this process (TopazSupervisor will
    run it again)
    63 EMS Probe Process Memory: TopazEmsProbe
    (depends on Service: TopazSupervisor)
    Topaz EMS Probe (Memory) on "<TOPAZ HOST NAME>" 10 1 Process Memory > 600Mb Process Memory > 1.5Gb Kill this process (TopazSupervisor will
    run it again)
    64 Topaz Bus Server Process CPU: dispatcher
    (depends on Service: TopazSupervisor)
    Topaz Bus Process (CPU) on "<TOPAZ HOST NAME>" 10 1 n/a CPU > 85% or Process not running Kill this process (TopazSupervisor will
    run it again)
    65 Topaz Bus Server Process Memory: dispatcher
    (depends on Service: TopazSupervisor)
    Topaz Bus Process (Memory) on "<TOPAZ HOST NAME>" 10 1 Process Memory > 600Mb Process Memory > 1.5Gb Kill this process (TopazSupervisor will
    run it again)
    66 Topaz Bus Server Process Working Set: dispatcher
    (depends on Service: TopazSupervisor)
    Process Working Set: dispatcher on "<TOPAZ HOST NAME>" 10 1 Process not running Working Set > 150Mb Kill this process (TopazSupervisor will
    run it again)
    67 Topaz Bus Server Process Thread Count: dispatcher
    (depends on Service: TopazSupervisor)
    Process Thread Count: dispatcher on "<TOPAZ HOST NAME>" 10 1 Process not running Thread Count > 10 Kill this process (TopazSupervisor will
    run it again)
    68 Topaz Bus Server Log File Monitor: <TOPAZ MACHINE NAME> \<TOPAZ FOLDER> \log\dispatcher_log.txt
    * search for regular expression: "error"
    Log File: <TOPAZ MACHINE NAME> \<TOPAZ FOLDER> \log\dispatcher_log.txt 10 10 "error" appears in log file more than 1 time "error" appears in log file more than 1000 times Error indicates one of two things.
    1. Topaz Bus cannot communicate with the Topaz Admin Server. This error is identified by the string "ERROR   [TMC]". Check that the Topaz Admin Server is up and running.
    2. Topaz Bus cannot translate samples which are being reported by some Business Process Monitors or SiteScopes.
    This can occur when profile configuration data is deleted, but these kinds of errors should last only a few minutes. If the problem persists, verify that the data collection agents report valid data.
    69 Topaz Bus Server Directory Monitor: <TOPAZ MACHINE NAME> \<TOPAZ FOLDER> \.persist_dir \lnch_persistent \<TOPAZ HOST NAME>_project_topaz \guarantee
    (depends on Service: TopazSupervisor)
    Directory: <TOPAZ MACHINE NAME> \<TOPAZ FOLDER> \.persist_dir \lnch_persistent \<TOPAZ HOST NAME>_project_topaz \guarantee 10 10 Size of "guarantee" folder > 30Mb Size of "guarantee" folder > 50Mb Indicates the Topaz Bus cannot pass messages to the alert server.
    Check that the Alert Server process is up and is not stuck.
    70 Topaz Bus Server Directory Monitor: <TOPAZ MACHINE NAME> \<TOPAZ FOLDER> \.persist_dir \dc_persist_queue
    (depends on Service: TopazSupervisor)
    Directory: <TOPAZ MACHINE NAME> \<TOPAZ FOLDER> \.persist_dir \dc_persist_queue 10 10 File Count of "dc_persist_queue" (including sub folders) folder > 5 File Count of "dc_persist_queue" (including sub folders) folder > 15 Indicates the Topaz Bus cannot translate samples which are being reported by some Business Process Monitors or SiteScopes.
    This can occur when profile configuration data is deleted, but these kinds of errors should last only a few minutes.
    If the problem persists, verify that the data collection agents report valid data.
    71 Business Process Monitor/
    SiteScope/Client Monitor
    Topaz Host Last Connection Time "<TOPAZ HOST NAME>" Last Connection Time 10 5 See SiteScope documentation for Topaz Host Last Connection Time monitor See SiteScope documentation for Topaz Host Last Connection Time monitor Verify this data collection agent is up and running.
    72 Business Process Monitor/
    SiteScope / Client Monitor
    Topaz Host Last Reported Data Time <TOPAZ HOST NAME> Last Reported Data Time 10 5 See SiteScope documentation for Topaz Host Last Data Time monitor See SiteScope documentation for Topaz Host Last Data Time monitor Verify this data collection agent is up and running.
    73 SiteScope SiteScope Health Status SiteScope Health Status on "<TOPAZ HOST NAME>" 10 1 SiteScope Health monitoring indicates warning status SiteScope Health monitoring indicates error status Check the Health page of the relevant SiteScope
    74 Database Server: SQL Service: MSSQLSERVER
    (depends on Ping)
    * Disabled by default
    MSSQLSERVER Service on "<TOPAZ HOST NAME>" 10 1 Service is down n/a Restart MSSQLSERVER Service
    75 Database Server: SQL NT Event Viewer Errors NT Event Viewer Application Log 10 10 n/a New errors appear in the Application Log Examine the Event Viewer for errors and check whether they are related to the MSSQLSERVER service.
    76 Database Server: Oracle Process: Oracle
    (depends on Ping)
    * Disabled by default
    Oracle Service on "<TOPAZ HOST NAME>" 10 1 n/a Service is down Restart Service / Process
    77 Database Server: Oracle Process: TNS Listener
    (depends on Ping)
    * Disabled by default
    Oracle TNS Listener Service on "<TOPAZ HOST NAME>" 10 10 n/a Service is down Restart Service / Process

    Topaz Watchdog Monitors on UNIX Platform

    ID | Group | Monitor Type | Monitor Name | Freq. [min.] (on OK) | Freq. [min.] (on Error) | Warning Condition | Error Condition | Error Resolution

    1 Common System Monitors
    (see note 4)
    Ping Ping: "<TOPAZ HOST NAME>" 10 1 n/a Ping fails Check Connectivity
    2 Common System Monitors
    (see note 4)
    CPU Utilization  (depends on Ping)
    CPU Utilization on "<TOPAZ HOST NAME>" 10 1 CPU can't be measured CPU > 70% Check which process causes this. If it's a Topaz process, restart Topaz
    3 Common System Monitors
    (see note 4)
    Disk Space: $disk$  (depends on Ping)
    Disk Space: <TOPAZ DRIVE LETTER> on "<TOPAZ HOST NAME>" 60 5 Disk Space can't be measured Disk Space > 85% Clean the disk
    4 Common System Monitors
    (see note 4)
    Memory (depends on Ping) Memory on "<TOPAZ HOST NAME>" 10 1 Memory can't be measured Memory > 85% Check which process causes this. If it's a Topaz process, restart Topaz
    5 Common Application Monitors
    (see note 5)
    Service: TopazSupervisor (depends on Ping) n/a 10 1 Service is down n/a Restart Topaz
    6 Graph/Admin/Agent Servers Apache Web Server (depends on Service: IIS Admin Service)
    * Disabled by default
    Apache Web Server on "<TOPAZ HOST NAME>" 10 1 See SiteScope documentation for IIS Server monitor See SiteScope documentation for IIS Server monitor Restart Service / Process
    7 Graph/Admin/Agent Servers Web Server Process Web Server Process on "<TOPAZ HOST NAME>" 10 1 Process not running Thread Count > 160 If number of threads > 160, restart IIS
    8 Graph/Admin/Agent Servers Process: topaz Topaz Process on "<TOPAZ HOST NAME>" 10 1 n/a CPU > 85% or Process not running Restart Topaz when higher than 85% during several minutes.
    9 Graph/Admin Servers URL: TopazVerify.jsp (depends on Service: TopazSupervisor) URL: http://<TOPAZ HOST NAME> /topaz /TopazVerify.jsp 10 1 n/a URL not available Restart Topaz.
    If this does not help, restart IIS.
    If this still does not help, reboot the machine.
    10 Admin Server Process: aes_twd (depends on Service: TopazSupervisor) Topaz Watchdog Aggregated Event Engine on "<TOPAZ HOST NAME>" 10 1 n/a CPU > 85% or Process not running Kill this process (TopazSupervisor will
    run it again)
    11 Admin Server URL: sample_dispatcher URL Test for SiteScope Configuration Changes in Topaz 10 1 n/a URL result contains the word "filed" Verify that SiteScope configuration changes are reflected in Topaz.
    12 Admin Server Log File: aims.ejb.log (depends on Service: TopazSupervisor)
    * search for regular expression: "exception"
    Check for exceptions in SiteScope integration logs 10 10 n/a URL result contains the word "exception" Verify that SiteScope configuration changes are reflected in Topaz.
    13 Agent Server URL: getTopazServerTime URL: http://<TOPAZ HOST NAME> / topaz / topaz_api / api_getservertime.asp 10 1 n/a URL not available Restart Topaz.
    If this does not help, restart IIS.
    If this still does not help, reboot the machine.
    14 Agent Server Directory Monitor for folder: <TOPAZ MACHINE NAME> \<TOPAZ FOLDER> \.persist_dir \lnch_persistent \<TOPAZ HOST NAME>_web_driver \guarantee \131072_project_topaz \.msgs Check for too many files on Guaranteed Delivery Buffers folder 10 1 n/a File Count > 2 Agent Server cannot pass messages to the
    Topaz Bus. Check if the Topaz Bus process
    is up and is not stuck.
    15 Agent Server Process: LoaderTX (depends on Service: TopazSupervisor) Transaction Loader Process on "<TOPAZ HOST NAME>" 10 1 n/a Process not running Kill this process (TopazSupervisor will
    run it again)
    16 Agent Server Process: LoaderWT (depends on Service: TopazSupervisor) WebTrace Loader Process on "<TOPAZ HOST NAME>" 10 1 n/a Process not running Kill this process (TopazSupervisor will
    run it again)
    17 Agent Server Process: LoaderSM (depends on Service: TopazSupervisor) SiteScope Loader Process on "<TOPAZ HOST NAME>" 10 1 n/a Process not running Kill this process (TopazSupervisor will
    run it again)
    18 Agent Server Process: LoaderNMMT (depends on Service: TopazSupervisor) EMS Loader Process on "<TOPAZ HOST NAME>" 10 1 n/a Process not running Kill this process (TopazSupervisor will
    run it again)
    19 Agent Server Process: LoaderABR (depends on Service: TopazSupervisor) J2EE Breakdown Loader Process on "<TOPAZ HOST NAME>" 10 1 n/a Process not running Kill this process (TopazSupervisor will
    run it again)
    20 Alert Server Process: AlertEngineMdrv (depends on Service: TopazSupervisor) Alert Engine Process on "<TOPAZ HOST NAME>" 10 1 n/a CPU > 85% or Process not running Kill this process (TopazSupervisor will
    run it again)
    21 Scheduled Tasks Server Process: EmailReportsMdrv (depends on Service: TopazSupervisor) Scheduled Reports Engine Process on "<TOPAZ HOST NAME>" 10 1 n/a CPU > 85% or Process not running Kill this process (TopazSupervisor will
    run it again)
    22 Scheduled Tasks Server Process: topaz_pm (depends on Service: TopazSupervisor) Topaz Partition Manager Process on "<TOPAZ HOST NAME>" 10 1 n/a CPU > 85% or Process not running Kill this process (TopazSupervisor will
    run it again)
    23 Topaz Bus Server Process: dispatcher (depends on Service: TopazSupervisor) Topaz Bus Process on "<TOPAZ HOST NAME>" 10 1 n/a CPU > 85% or Process not running Kill this process (TopazSupervisor will
    run it again)
    24 Topaz Bus Server Log File: /opt /Topaz /log /dispatcher_log.txt
    * search for regular expression: "error"
    Log File: /opt /Topaz /log /dispatcher_log.txt 10 10 "error" appears in log file more than 1 time "error" appears in log file more than 1000 times Error indicates one of two things.
    1. Topaz Bus cannot communicate with the Topaz Admin Server. This error is identified by the string "ERROR   [TMC]". Check that the Topaz Admin Server is up and running.
    2. Topaz Bus cannot translate samples which are being reported by some Business Process Monitors or SiteScopes.
    This can occur when profile configuration data is deleted, but these kinds of errors should last only a few minutes.
    If the problem persists, verify that the data collection agents report valid data.
    25 Business Process Monitor/
    SiteScope/Client Monitor
    Topaz Host Last Connection Time "<TOPAZ HOST NAME>" Last Connection Time 10 5 See SiteScope documentation for Topaz Host Last Connection Time monitor See SiteScope documentation for Topaz Host Last Connection Time monitor Verify this data collection agent is up and running.
    26 Business Process Monitor/
    SiteScope/Client Monitor
    Topaz Host Last Reported Data Time <TOPAZ HOST NAME> Last Reported Data Time 10 5 See SiteScope documentation for Topaz Host Last Data Time monitor See SiteScope documentation for Topaz Host Last Data Time monitor Verify this data collection agent is up and running.
    27 SiteScope SiteScope Health Status SiteScope Health Status on "<TOPAZ HOST NAME>" 10 1 SiteScope Health monitoring indicates warning status SiteScope Health monitoring indicates error status Check the Health page of the relevant SiteScope
    28 Database Server: Oracle Process: Oracle Checkpoint Oracle Checkpoint Process on "<TOPAZ HOST NAME>" for SID <TOPAZ ORACLE SID> 10 1 n/a Process not running Restart Service / Process
    29 Database Server: Oracle Process: Oracle Process Monitor Oracle Process Monitor on "<TOPAZ HOST NAME>" for SID <TOPAZ ORACLE SID> 10 10 n/a Process not running Restart Service / Process
    30 Database Server: Oracle Process: Oracle Service Monitor Oracle Service Monitor on "<TOPAZ HOST NAME>" for SID <TOPAZ ORACLE SID> 10 1 n/a Process not running Restart Service / Process
    31 Database Server: Oracle Process: Oracle Database Writer Oracle Database Writer Process on "<TOPAZ HOST NAME>" for SID <TOPAZ ORACLE SID> 10 1 n/a Process not running Restart Service / Process
    32 Database Server: Oracle Process: Oracle Log Writer Oracle Log Writer Processes on "<TOPAZ HOST NAME>" for SID <TOPAZ ORACLE SID> 10 10 n/a Process not running Restart Service / Process
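The threshold logic of the dispatcher_log.txt Log File monitor in the tables above can be sketched in a few lines of Python. This is only an illustration, not SiteScope's implementation: the function name classify_log, the "ok" status label, and the case-insensitive match are assumptions. The thresholds (more than 1 match for a warning, more than 1000 for an error) and the "ERROR   [TMC]" marker come from the tables.

```python
import re

# Thresholds from the dispatcher_log.txt rows of the monitor tables.
WARNING_MATCHES = 1     # warning when "error" appears more than 1 time
ERROR_MATCHES = 1000    # error when "error" appears more than 1000 times


def classify_log(text):
    """Count "error" matches and flag Admin Server communication errors.

    Assumption: the search is case-insensitive. The real monitor's
    matching rules may differ.
    """
    matches = len(re.findall(r"error", text, flags=re.IGNORECASE))
    # "ERROR   [TMC]" identifies a failure to reach the Topaz Admin Server.
    tmc_matches = len(re.findall(r"ERROR\s+\[TMC\]", text))

    if matches > ERROR_MATCHES:
        status = "error"
    elif matches > WARNING_MATCHES:
        status = "warning"
    else:
        status = "ok"
    return {"status": status, "matches": matches, "tmc_matches": tmc_matches}


if __name__ == "__main__":
    sample = (
        "ERROR   [TMC] cannot reach admin\n"
        "info sample accepted\n"
        "error translating sample\n"
    )
    print(classify_log(sample))
```

A nonzero tmc_matches value points to the first cause listed in the Error Resolution column (the Admin Server is unreachable). Otherwise the sample translation problem is the more likely cause.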

    Notes:

    1. For some monitors, the note "(depends on <monitor>)" is specified. This means that if the named monitor is in error status, the dependent monitor will be too. This can be prevented by configuring a dependency in SiteScope between the two monitors (the Topaz Watchdog does not configure the dependency automatically).
    2. The monitors which should be dependent on the Topaz Supervisor service are for Topaz on Windows. The equivalent monitors for Topaz on Unix should be dependent on the Topaz Supervisor process.
    3. The Topaz Watchdog does not configure alerts for any of the monitors.
    4. Monitors which are configured on every Topaz machine including Topaz databases.

      Note: it does not include data collection agents (for example, Business Process Monitors, SiteScopes and Client Monitors).

      These should be configured for the following Topaz services:

      • Admin Server
      • Graph Server
      • Agent Server
      • Topaz Bus Server
      • Alert Server
      • Scheduled Tasks Server
      • Aggregation Server
      • Topaz Database Server(s)
      • EMS Probe
    5. Monitors which are configured on every Topaz machine besides the data collection machines (for example, Business Process Monitors, SiteScopes and Client Monitors) and besides Topaz databases.

      These should be configured for the following Topaz services:

      • Admin Server
      • Graph Server
      • Agent Server
      • Topaz Bus Server
      • Alert Server
      • Scheduled Tasks Server
      • Aggregation Server
      • EMS Probe
    6. In the Topaz Watchdog monitor set templates the variable $TOPAZ_DRIVE_LETTER$ is replaced automatically with all the actual drives on the machine by the Topaz Watchdog.
    7. Loader Problem - Indicates a problem with the loaders transferring data to the Topaz Management database. If the problem is not resolved, Topaz moves the buffer files to the Retry folder for several retries. If the problem is still not resolved, Topaz moves the files to the Fail subfolder. For details on loaders and on recovering the data stored in the Fail subfolder, refer to the Topaz Administration Guide.
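The Directory monitors on the "Retry" folders compare a file count and a folder size against warning and error thresholds. The following Python sketch shows that comparison under stated assumptions: the helper names are hypothetical, only files directly inside the folder are counted, and 1 Mb is taken to mean 1024 * 1024 bytes. The threshold values (30 and 50 files, 1.5 Mb and 2 Mb) are the ones listed in the Windows table.

```python
import os

MB = 1024 * 1024  # assumption: "Mb" in the tables means mebibytes


def folder_stats(path):
    """Return (file_count, total_bytes) for files directly inside path."""
    count = 0
    total = 0
    with os.scandir(path) as entries:
        for entry in entries:
            if entry.is_file():
                count += 1
                total += entry.stat().st_size
    return count, total


def classify(value, warning_above, error_above):
    """Map a measurement to "ok", "warning", or "error"."""
    if value > error_above:
        return "error"
    if value > warning_above:
        return "warning"
    return "ok"


def retry_folder_status(path):
    """Apply the "Retry" folder thresholds from the monitor table."""
    count, size = folder_stats(path)
    return {
        "file_count": classify(count, 30, 50),
        "size": classify(size, 1.5 * MB, 2 * MB),
    }
```

Calling retry_folder_status on a loader's Retry folder (the path depends on your installation) returns one status per measurement, which mirrors the two separate monitors defined for each folder.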

    Monitor Templates

    Monitors check the Topaz components according to the definitions in the *.mset files, which are located in the SiteScope\templates.sets.topazWatchdog folder. These files are compatible with SiteScope monitor sets. For details on defining monitor sets, see Monitor Sets.

    The Topaz Watchdog monitors can be customized either by changing one of the monitor sets or by adding monitor sets, and assigning them to a specific Topaz service.

    The file which maps monitor sets for each Topaz service (for example, Topaz Alert Server, Topaz Graph Server, and so forth) is named defaultTopazWatchdogMonitorSets.config, and is located in the SiteScope\classes folder. This file is copied to the SiteScope\templates.sets.topazWatchdog folder with the name topazWatchdogMonitorSets.config the first time that SiteScope starts. (It is not copied if topazWatchdogMonitorSets.config already resides in the SiteScope\templates.sets.topazWatchdog folder).

    The following is an example of an entry in this file:

    _descriptionForUi=Alert Server

    _topazHostTypeMask=128

    _monitorSets_Windows=Common.mset,TopazSupervisor.mset,AlertServer.mset

    _monitorSets_Unix=CommonUNIX.mset,TopazSupervisorUNIX.mset,AlertServerUNIX.mset

    where:

    _descriptionForUi is a description of the specific Topaz service.

    _topazHostTypeMask is an internal ID of the specific Topaz service. You must not change this value.

    _monitorSets_Windows is a comma-separated list of monitor set files that are associated with this Topaz service on a Windows platform (the specified monitor sets must reside in the SiteScope\templates.sets.topazWatchdog folder).

    _monitorSets_Unix is a comma-separated list of monitor set files that are associated with this Topaz service on a UNIX platform (the specified monitor sets must reside in the SiteScope\templates.sets.topazWatchdog folder).
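To check an entry before SiteScope reads it, the key=value layout shown above can be parsed with a short Python sketch. The function name parse_entry is hypothetical, and the sketch assumes that every setting is a single key=value line, as in the example entry. The complete file format is not described here, so treat this as an illustration only.

```python
EXAMPLE_ENTRY = """\
_descriptionForUi=Alert Server
_topazHostTypeMask=128
_monitorSets_Windows=Common.mset,TopazSupervisor.mset,AlertServer.mset
_monitorSets_Unix=CommonUNIX.mset,TopazSupervisorUNIX.mset,AlertServerUNIX.mset
"""


def parse_entry(text):
    """Parse one entry into a dict; monitor set lists become Python lists."""
    entry = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue  # skip blank lines and anything that is not key=value
        key, value = line.split("=", 1)
        key = key.strip()
        value = value.strip()
        if key.startswith("_monitorSets_"):
            # Comma-separated list of .mset file names.
            entry[key] = [name.strip() for name in value.split(",") if name.strip()]
        else:
            entry[key] = value
    return entry


if __name__ == "__main__":
    parsed = parse_entry(EXAMPLE_ENTRY)
    print(parsed["_descriptionForUi"], parsed["_monitorSets_Windows"])
```

A check like this makes it easy to confirm that each listed .mset file name is spelled as intended before you copy the file into place.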


    Important: Before making any direct modifications, make a complete backup of the SiteScope folders. After making any modifications, test that your monitors, alerts, and reports are functioning correctly before returning them to a production environment.


    Configuring Monitor Set Templates

    You can replicate monitors across multiple servers or locations using the SiteScope monitor set functionality. You work with Topaz Watchdog monitor templates in the same way as you work with SiteScope monitor set templates.

    To enable working with Topaz Watchdog monitor sets:

    1. In the SiteScope\templates.sets.topazWatchdog folder, choose the templates with which you want to work. Monitor template files have the .mset extension.

      Each template includes a list of variables, their descriptions and values, and the monitors that are configured by the template. For example, the Topaz Graph Server template includes the variable $TOPAZ_HOST_NAME$. Its description is Server_to_monitor (the underscores do not appear in the SiteScope page).

      For details on defining monitor templates, see Working with Monitor Set Templates.
    2. Copy the template .mset files to the SiteScope\templates.sets folder.
    3. To use the templates, click the Topaz Watchdog group in the SiteScope main view, then click Add Monitor Set. The Add Monitor Set to Group page opens. Select the monitor template that you want to configure, and click Configure.

      For details on defining monitor sets, see Monitor Sets.

    4. Click Submit to save the new monitor set.
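Template variables such as $TOPAZ_HOST_NAME$ and $TOPAZ_DRIVE_LETTER$ are placeholders that are filled in when a monitor set is configured. The Python sketch below illustrates the idea of the substitution only. It does not read or write the .mset format, and the function name expand_template and the sample monitor name are hypothetical. It follows the behavior described in note 6, where the drive variable yields one monitor per drive.

```python
def expand_template(template, host_name, drives):
    """Return one monitor name per drive, with variables substituted.

    template  -- text containing $TOPAZ_HOST_NAME$ and, optionally,
                 $TOPAZ_DRIVE_LETTER$
    host_name -- value for $TOPAZ_HOST_NAME$
    drives    -- drive identifiers found on the machine
    """
    with_host = template.replace("$TOPAZ_HOST_NAME$", host_name)
    if "$TOPAZ_DRIVE_LETTER$" not in with_host:
        # No per-drive variable: the template yields a single monitor.
        return [with_host]
    return [with_host.replace("$TOPAZ_DRIVE_LETTER$", drive) for drive in drives]


if __name__ == "__main__":
    name = 'Disk Space: $TOPAZ_DRIVE_LETTER$ on "$TOPAZ_HOST_NAME$"'
    for monitor in expand_template(name, "topaz01", ["C", "D"]):
        print(monitor)
```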

    Note: Topaz Watchdog templates are consistent with the Best Practices documents on how to monitor Topaz.

    Topaz Watchdog Troubleshooting

    Problem

    Solution(s)

    In a Windows 2000 installation, drive letters are replaced by HarddiskVolume1, HarddiskVolume2, and so forth.

    For the solution, refer to: http://support.microsoft.com/support/kb/articles/Q274/3/11.asp (Microsoft Knowledge Base article number Q274311).

    In a Windows 2000 or 2003 installation, no disks are monitored by SiteScope.

    This occurs when a Windows 2000 installation has disk monitoring disabled by default. To enable disk monitoring, open a command line window and enter the command diskperf -y. After you restart the computer, the disks are added to SiteScope.

    If the Topaz Admin Server is installed on Apache on a machine that automatically runs Microsoft IIS, SiteScope may not measure the correct process, and performance may be impeded.

    Make sure that only the Web server that Topaz uses is running. Some systems, for example, Windows 2000, have IIS automatically installed and running. Therefore, if Topaz is installed on Apache, on a Windows 2000 server, both IIS and Apache will be running, which will slow down performance and scalability.

    Stop the IIS service and disable it from running automatically. This will ensure that the machine runs faster, and that SiteScope measures the correct process.

    If a network drive to a server has been mapped with non-administrative credentials, you cannot open a new authenticated network connection with administrative permissions, or change the existing one.

    This is a Windows networking limitation. Disconnecting the mapped drive does not solve the problem, because the connection remains alive. The problem also occurs if you run Terminal Services to a machine or just browse it with Explorer. The connection can be closed only by rebooting the SiteScope machine. If you still cannot monitor a server even though you know the administrator user name and password, run netstat -a to see whether there is an active (established) connection to that machine.
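If you save the output of netstat -a to a file, a small Python sketch can pick out the established connections to a given machine. The function name established_connections and the sample output are hypothetical. The sketch assumes the usual four-column TCP layout (protocol, local address, foreign address, state) and an exact host name match, so a fully qualified name in the output would need the same form in the argument.

```python
def established_connections(netstat_output, remote_host):
    """Return the TCP lines that show an ESTABLISHED connection to remote_host."""
    found = []
    for line in netstat_output.splitlines():
        fields = line.split()
        # Expected columns: protocol, local address, foreign address, state.
        if len(fields) < 4 or fields[0].upper() != "TCP":
            continue
        foreign, state = fields[2], fields[3]
        host = foreign.rsplit(":", 1)[0]  # drop the port or service name
        if state.upper() == "ESTABLISHED" and host.lower() == remote_host.lower():
            found.append(line.strip())
    return found


SAMPLE_OUTPUT = """\
Active Connections

  Proto  Local Address          Foreign Address        State
  TCP    sitescope:1045         topaz01:microsoft-ds   ESTABLISHED
  TCP    sitescope:1046         topaz02:netbios-ssn    TIME_WAIT
  TCP    sitescope:135          sitescope:0            LISTENING
"""

if __name__ == "__main__":
    for line in established_connections(SAMPLE_OUTPUT, "topaz01"):
        print(line)
```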

    SiteScope cannot connect to a remote NT server.

    If a connection cannot be made, check the user access permissions that have been granted to the SiteScope account on the remote server. SiteScope requires remote registry permissions to be able to monitor server statistics. Try connecting to the remote machine using Perfmon.

    SiteScope is not allowed to use the permissions of a full administrator account.

    This may be for security reasons. SiteScope can be granted restricted monitoring access by editing certain Registry Keys. See the Enabling Non-Admin Users to Remotely Monitor with PERFMON support note on the Microsoft support site for more information.

    SiteScope cannot monitor a stand-alone server, or one that is part of a domain already visible to the SiteScope server.

    Try entering the machine name followed by a slash and the login name in the Login entry box. For example, type cats/administrator.

    Problems arise when monitoring Windows 2000 servers from SiteScope running on Windows NT 4.

    Check whether the problem involves incompatibility of the DLLs used by the operating system to communicate between the servers.

    SiteScope does not monitor the load balancer machine.

    SiteScope cannot display data about load balancer machines, because they are not Topaz hosts.

    You can set up a SiteScope monitor to monitor the load balancer machines, and to send an alert when the load balancer is in error. For details, refer to the SiteScope documentation.

    Monitors are returning errors on specific machines.

    • Check whether the machine has been added to the Remote NT Servers or Remote UNIX Servers list. For details, see "Defining Permissions for NT Servers" and "Defining Permissions for UNIX Servers".
    • Check whether remoteRegistryService is running on the Topaz machine.

      A Microsoft Windows problem causes the registry service RemoteRegistryService to hold too many handles that are not released. Every time SiteScope logs in to the Topaz server, another handle is added. To see how many handles are being held by a process, display the Processes tab in the Windows Task Manager, and look for the Handles column. (If the Handles column does not appear, choose View > Select Columns, and select Handles.)

      To release handles, restart the service, either manually:
      net stop "remote registry"
      net start "remote registry"
      or with a script that is set to run at regular intervals, for example, every six hours (the interval can depend on the specific setup), by using Windows Task Scheduler ("Scheduled Tasks" in Explorer).

      Example of the script:
      at 12:00 /every:Su,M,T,W,TH,F,S net stop RemoteRegistry
      at 12:01 /every:Su,M,T,W,TH,F,S net start RemoteRegistry
      at 18:00 /every:Su,M,T,W,TH,F,S net stop RemoteRegistry
      at 18:01 /every:Su,M,T,W,TH,F,S net start RemoteRegistry
      at 00:00 /every:Su,M,T,W,TH,F,S net stop RemoteRegistry
      at 00:01 /every:Su,M,T,W,TH,F,S net start RemoteRegistry
      at 06:00 /every:Su,M,T,W,TH,F,S net stop RemoteRegistry
      at 06:01 /every:Su,M,T,W,TH,F,S net start RemoteRegistry

    • Sometimes counters for perfmon objects are disabled on the Topaz machine (for example, for the Process object). To monitor such a machine, you must enable these objects. You can do this either with the help of the Windows Resource Kit, or via the registry.

      With the Windows Resource Kit, in the Extensible Counter Kit dialog box, check that the Performance Counters Enabled check box is selected for the PerfProc and PerfOS objects. Using the registry, check that the following values are set to 0:

      [HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\PerfProc\Performance]
      "Disable Performance Counters"=dword:00000000

      and

      [HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\PerfOS\Performance]
      "Disable Performance Counters"=dword:00000000
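If the Remote Registry service needs restarting at a different interval than every six hours, the list of at commands in the restart script for that service can be generated instead of typed by hand. The Python sketch below is only a convenience under stated assumptions: the function name restart_schedule is hypothetical, the service is started one minute after it is stopped (as in the example script), and the interval should divide 24 evenly to give equal gaps.

```python
def restart_schedule(interval_hours=6, service="RemoteRegistry"):
    """Build "at" command lines that stop and restart a service daily.

    The service is stopped on the hour and started one minute later,
    every interval_hours hours, on all days of the week.
    """
    days = "Su,M,T,W,TH,F,S"
    commands = []
    for hour in range(0, 24, interval_hours):
        commands.append(f"at {hour:02d}:00 /every:{days} net stop {service}")
        commands.append(f"at {hour:02d}:01 /every:{days} net start {service}")
    return commands


if __name__ == "__main__":
    # Print the commands so they can be pasted into a batch file.
    for command in restart_schedule(6):
        print(command)
```

With the default interval of six hours, the output contains the same eight commands as the example script, listed from 00:00 onward.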

    SiteScope sends false alerts.

    This is an integration issue, with two causes:

    • SiteScope has been disconnected from Topaz, and alerts are sent several minutes after the disconnection. To prevent this, clear the check box next to the SiteScope machines, or use the Remove Agent Service page, accessed from the Administration menu in the Topaz Reports and Diagnostics application, to remove the SiteScope that is no longer in use.
    • SiteScope has been moved from one profile to another, and alerts are sent during the period when SiteScope is in downtime. To prevent false alerts, delete the old SiteScope profile (preferred), or disable the alerts for this host.

    The same EMS host appears twice in the SiteScope main panel.

    During EMS probe definition, the probe's location is registered by its relationship to the location of the Business Process Monitor host that is running the probe. If the definition of this Business Process Monitor is changed, and if its new location is different from the previous one, the new registration and location for the probe is added to the Host table in the Topaz Management Database. The result is that the table includes two probe hosts with the same name but with different locations. This causes SiteScope to display the probe host twice in the main panel.

    To remove the extra probe host instance, see the procedure in "Displaying Topaz Machines".