ID |
Group |
Monitor Type |
Monitor Name |
Freq.
[min.]
(on OK) |
Freq.
[min.]
(on Error) |
Warning
Condition |
Error
Condition |
Error
Resolution |
1 |
Common
System Monitors
(see note 4) |
Ping |
Ping:
"<TOPAZ HOST NAME>" |
10 |
1 |
n/a |
Ping fails |
Check
Connectivity |
2 |
Common System Monitors
(see note 4) |
CPU Utilization (depends
on Ping)
|
CPU Utilization on "<TOPAZ HOST NAME>" |
10 |
1 |
CPU can't be measured |
CPU > 70% |
Check which process causes this. If it's a Topaz process,
restart Topaz |
3 |
Common System Monitors
(see note 4) |
Disk Space: $disk$
(depends on Ping)
|
Disk Space: <TOPAZ DRIVE LETTER> on
"<TOPAZ HOST NAME>" |
60 |
5 |
Disk Space can't be measured |
Disk Space > 85% |
Clean the disk |
4 |
Common System Monitors
(see note 4) |
Memory (depends on Ping) |
Memory on "<TOPAZ HOST NAME>" |
10 |
1 |
Memory can't be measured |
Memory > 85% |
Check which process causes this. If it's a Topaz process,
restart Topaz |
5 |
Common System Monitors
(see note 4) |
Memory: Available Mbytes
(depends on Ping) |
Memory: Available MBytes on "<TOPAZ HOST NAME>" |
10 |
1 |
Avail Mbytes can't be measured |
Avail Mbytes < 50Mb |
Check which process causes this. If it's a Topaz process,
restart Topaz |
6 |
Common Application Monitors
(see note 5) |
Service: TopazSupervisor (depends on Ping) |
Topaz Supervisor Service on "<TOPAZ HOST NAME>" |
10 |
1 |
Service is
down |
n/a |
Restart Topaz |
7 |
Common Application Monitors
(see note 5) |
Process: TopazSupervisor
(depends on Ping) |
Topaz Supervisor Process on "<TOPAZ HOST NAME>" |
10 |
1 |
n/a |
Process not running |
Restart
Topaz |
8 |
Graph/ Admin/ Agent Servers |
Service: IIS Admin Service (depends on Ping)
* Disabled by default |
Service: IIS Admin Service on "<TOPAZ HOST NAME>" |
10 |
1 |
n/a |
Service is
down |
Restart
Service / Process |
9 |
Graph/ Admin/ Agent Servers |
IIS Server (depends
on Service: IIS Admin Service)
* Disabled by default |
IIS Server on "<TOPAZ HOST NAME>" |
10 |
1 |
See SiteScope documentation for IIS Server monitor |
See SiteScope documentation for IIS Server monitor |
Restart Service / Process |
10 |
Graph/ Admin/ Agent Servers |
Process : Thread Count : inetinfo (depends on Service: IIS Admin
Service)
OR
Process : Thread Count : apache (depends on Service:
TopazSupervisor) |
Process Thread Count: inetinfo on "<TOPAZ HOST NAME>"
OR
Process Thread Count: apache on "<TOPAZ HOST NAME>" |
10 |
1 |
Process not running |
Thread Count > 160 |
If number of threads > 160, restart IIS |
11 |
Graph/ Admin/ Agent Servers |
Process : Working Set : inetinfo (depends on Service: IIS Admin
Service)
OR
Process : Working Set : apache (depends on Service:
TopazSupervisor) |
Process Working Set: inetinfo on "<TOPAZ HOST NAME>"
OR
Process Working Set: apache on "<TOPAZ HOST NAME>" |
10 |
1 |
Process not running |
Working Set > 200Mb |
If the working set grows toward the size of physical memory, restart
IIS |
12 |
Graph/ Admin/ Agent Servers |
Process CPU: topaz (depends on Service:
TopazSupervisor) |
Process CPU: topaz on "<TOPAZ HOST NAME>" |
10 |
1 |
n/a |
CPU > 85% or Process not running |
Restart Topaz if CPU usage stays above 85% for several minutes. |
13 |
Graph/ Admin/ Agent Servers |
Process Memory: topaz (depends on Service:
TopazSupervisor) |
Process Memory: topaz on "<TOPAZ HOST NAME>" |
10 |
1 |
Process Memory > 600Mb |
Process Memory > 1.5Gb |
Restart Topaz when process memory exceeds 75% of available physical memory |
14 |
Graph/Admin Servers |
URL: TopazVerify.jsp (depends on Service:
TopazSupervisor) |
URL: http://<TOPAZ HOST NAME>/topaz/topazVerify.jsp |
10 |
1 |
n/a |
URL not available |
Restart Topaz.
If this does not help, restart IIS
If this does not help, reboot the machine |
15 |
Admin Server |
Process CPU: aes_twd (depends on Service:
TopazSupervisor) |
Topaz Watchdog Aggregated Event Engine (CPU) on
"<TOPAZ HOST NAME>" |
10 |
1 |
n/a |
CPU > 85% or Process not running |
Kill this
process (TopazSupervisor will
run it again) |
16 |
Admin Server |
Process Memory: aes_twd (depends on Service:
TopazSupervisor) |
Topaz Watchdog Aggregated Event Engine (Memory) on
"<TOPAZ HOST NAME>" |
10 |
1 |
Process Memory > 600Mb |
Process Memory > 1.5Gb |
Kill this process (TopazSupervisor will
run it again) |
17 |
Admin Server |
URL: sample_dispatcher |
URL Test for Site Scope Configuration Changes in Topaz |
10 |
1 |
n/a |
URL results contain the word "filed" |
Verify that SiteScope configuration changes are reflected in
Topaz. |
18 |
Admin Server |
Log File: aims.ejb.log (depends on Service:
TopazSupervisor)
* search for regular expression: "exception" |
Check for exceptions in SiteScope integration logs |
10 |
10 |
n/a |
Log file
contains the word "exception" |
Verify that
SiteScope configuration changes are reflected in Topaz. |
19 |
Agent Server |
URL: getTopazServerTime |
URL:
http://<TOPAZ HOST NAME>/topaz/topaz_api/api_getservertime.asp |
10 |
1 |
n/a |
URL not
available |
Restart
Topaz.
If this does not help, restart IIS
If this does not help, reboot the machine |
20 |
Agent Server |
Directory Monitor for folder:
<TOPAZ MACHINE NAME>\<TOPAZ FOLDER>\.persist_dir\lnch_persistent\<TOPAZ HOST NAME>_web_driver\guarantee\131072_project_topaz\.msgs |
Check for too many files on Guaranteed Delivery Buffers folder |
10 |
1 |
n/a |
File Count > 2 |
Agent Server cannot pass messages to the
Topaz Bus. Check if the Topaz Bus process
is up and is not stuck. |
21 |
Agent Server |
Process: LoaderTX (depends on Service:
TopazSupervisor) |
Transaction Loader Process on "<TOPAZ HOST NAME>" |
10 |
1 |
n/a |
Process not running |
Kill this process (TopazSupervisor will
run it again) |
22 |
Agent Server |
Process: LoaderWT (depends on Service:
TopazSupervisor) |
WebTrace Loader Process on "<TOPAZ HOST NAME>" |
10 |
1 |
n/a |
Process not running |
Kill this process (TopazSupervisor will
run it again) |
23 |
Agent Server |
Process: LoaderSM (depends on Service:
TopazSupervisor) |
SiteScope Loader Process on "<TOPAZ HOST NAME>" |
10 |
1 |
n/a |
Process not running |
Kill this process (TopazSupervisor will
run it again) |
24 |
Agent Server |
Process: LoaderNMMT (depends on Service:
TopazSupervisor) |
EMS Loader Process on "<TOPAZ HOST NAME>" |
10 |
1 |
n/a |
Process not running |
Kill this process (TopazSupervisor will
run it again) |
25 |
Agent Server |
Process: LoaderABR (depends on Service:
TopazSupervisor) |
J2EE Breakdown Loader Process on "<TOPAZ HOST NAME>" |
10 |
1 |
n/a |
Process not running |
Kill this process (TopazSupervisor will
run it again) |
26 |
Agent Server |
Directory Monitor: Transaction Buffers (depends on Service: TopazSupervisor)
|
Too many Transaction Buffers are waiting to load on
"<TOPAZ HOST NAME>" |
10 |
10 |
n/a |
Size of "TransactionBuffers" folder > 2Mb |
Check whether the number of reports has changed significantly.
If there are no other problems, you may need to add an Agent Server. |
27 |
Agent Server |
Directory Monitor: Transaction Buffers (depends on Service: TopazSupervisor)
|
Files-Size of Transaction Buffers Read on
"<TOPAZ HOST NAME>" |
10 |
10 |
Size of "Read" folder > 2Mb |
Size of "Read" folder > 4Mb |
Check whether the number of reports has changed significantly.
If there are no other problems, you may need to add an Agent Server. |
28 |
Agent Server |
Directory Monitor: Transaction Buffers (depends on Service: TopazSupervisor)
|
Files-Age Transaction Buffers Read on
"<TOPAZ HOST NAME>" |
10 |
10 |
"Read" folder last modified more than 4 minutes ago |
"Read" folder last modified more than 8 minutes ago |
Not getting reports. Check connectivity
to Agent Server. |
29 |
Agent Server |
Directory Monitor: Transaction Buffers (depends on Service: TopazSupervisor)
|
Files-Count Transaction Buffers Failures on
"<TOPAZ HOST NAME>" |
10 |
10 |
n/a |
File Count of "Fail" folder > 1 |
Reprocess Failures |
30 |
Agent Server |
Directory Monitor: Transaction Buffers (depends on Service:
TopazSupervisor) |
Files-Count Transaction Buffers Retry on
"<TOPAZ HOST NAME>" |
10 |
10 |
File Count of "Retry" folder > 30 |
File Count of "Retry" folder > 50 |
Indicates a problem loading buffers to the database.
If the problem is not resolved automatically, the buffer files
are moved to the "Fail" folder and must be reprocessed. |
31 |
Agent Server |
Directory Monitor: Transaction Buffers (depends on Service:
TopazSupervisor) |
Files-Size of Transaction Buffers Retry on
"<TOPAZ HOST NAME>" |
10 |
10 |
File Size of "Retry" folder > 1.5Mb |
File Size of "Retry" folder > 2Mb |
Indicates a problem loading buffers to the database.
If the problem is not resolved automatically, the buffer files
are moved to the "Fail" folder and must be reprocessed. |
32 |
Agent Server |
Directory Monitor: WebTrace Buffers (depends on Service: TopazSupervisor) |
Too many WebTrace Buffers are waiting to load on
"<TOPAZ HOST NAME>" |
10 |
10 |
n/a |
Size of "TransactionBuffers" folder > 2Mb |
Check whether the number of reports has changed significantly.
If there are no other problems, you may need to add an Agent Server. |
33 |
Agent Server |
Directory Monitor: WebTrace Buffers (depends on Service: TopazSupervisor) |
Files-Size of WebTrace Buffers Read on
"<TOPAZ HOST NAME>" |
10 |
10 |
Size of "Read" folder > 2Mb |
Size of "Read" folder > 4Mb |
Check whether the number of reports has changed significantly.
If there are no other problems, you may need to add an Agent Server. |
34 |
Agent Server |
Directory Monitor: WebTrace Buffers (depends on Service: TopazSupervisor) |
Files-Age WebTrace Buffers Read on "<TOPAZ HOST NAME>" |
10 |
10 |
"Read"
folder last modified more than 4 minutes ago |
"Read"
folder last modified more than 8 minutes ago |
Not getting reports. Check connectivity
to Agent Server. |
35 |
Agent Server |
Directory Monitor: WebTrace Buffers (depends on Service: TopazSupervisor) |
Files-Count WebTrace Buffers Failures on
"<TOPAZ HOST NAME>" |
10 |
10 |
n/a |
File Count of "Fail" folder > 1 |
Reprocess Failures |
36 |
Agent Server |
Directory Monitor: WebTrace Buffers (depends on Service: TopazSupervisor) |
Files-Count WebTrace Buffers Retry on
"<TOPAZ HOST NAME>" |
10 |
10 |
File Count of "Retry" folder > 30 |
File Count of "Retry" folder > 50 |
Indicates a problem loading buffers to the database.
If the problem is not resolved automatically, the buffer files
are moved to the "Fail" folder and must be reprocessed. |
37 |
Agent Server |
Directory Monitor: WebTrace Buffers (depends on Service: TopazSupervisor) |
Files-Size of WebTrace Buffers Retry on
"<TOPAZ HOST NAME>" |
10 |
10 |
File Size of "Retry" folder > 1.5Mb |
File Size of "Retry" folder > 2Mb |
Indicates a problem loading buffers to the database.
If the problem is not resolved automatically, the buffer files
are moved to the "Fail" folder and must be reprocessed. |
38 |
Agent Server |
Directory Monitor: SiteScope Buffers (depends on Service: TopazSupervisor) |
Too many SiteScope Buffers are waiting to load on
"<TOPAZ HOST NAME>" |
10 |
10 |
n/a |
Size of "TransactionBuffers" folder > 2Mb |
Check whether the number of reports has changed significantly.
If there are no other problems, you may need to add an Agent Server. |
39 |
Agent Server |
Directory Monitor: SiteScope Buffers (depends on Service: TopazSupervisor) |
Files-Size of SiteScope Buffers Read on
"<TOPAZ HOST NAME>" |
10 |
10 |
Size of "Read" folder > 2Mb |
Size of "Read" folder > 4Mb |
Check whether the number of reports has changed significantly.
If there are no other problems, you may need to add an Agent Server. |
40 |
Agent Server |
Directory Monitor: SiteScope Buffers (depends on Service: TopazSupervisor)
|
Files-Age SiteScope Buffers Read on
"<TOPAZ HOST NAME>" |
10 |
10 |
"Read"
folder last modified more than 4 minutes ago |
"Read"
folder last modified more than 8 minutes ago |
Not getting reports. Check connectivity
to Agent Server. |
41 |
Agent Server |
Directory Monitor: SiteScope Buffers (depends on Service: TopazSupervisor)
|
Files-Count SiteScope Buffers Failures on
"<TOPAZ HOST NAME>" |
10 |
10 |
n/a |
File Count of "Fail" folder > 1 |
Reprocess Failures |
42 |
Agent Server |
Directory Monitor: SiteScope Buffers (depends on Service: TopazSupervisor) |
Files-Count SiteScope Buffers Retry on
"<TOPAZ HOST NAME>" |
10 |
10 |
File Count of "Retry" folder > 30 |
File Count of "Retry" folder > 50 |
Indicates a problem loading buffers to the database.
If the problem is not resolved automatically, the buffer files
are moved to the "Fail" folder and must be reprocessed. |
43 |
Agent Server |
Directory Monitor: SiteScope Buffers (depends on Service: TopazSupervisor) |
Files-Size of SiteScope Buffers Retry on
"<TOPAZ HOST NAME>" |
10 |
10 |
File Size of "Retry" folder > 1.5Mb |
File Size of "Retry" folder > 2Mb |
Indicates a problem loading buffers to the database.
If the problem is not resolved automatically, the buffer files
are moved to the "Fail" folder and must be reprocessed. |
44 |
Agent Server |
Directory Monitor: EMS Buffers (depends on Service:
TopazSupervisor)
* Disabled by default |
Too many EMS Buffers are waiting to load on
"<TOPAZ HOST NAME>" |
10 |
10 |
n/a |
Size of "TransactionBuffers" folder > 2Mb |
Check whether the number of reports has changed significantly.
If there are no other problems, you may need to add an Agent Server. |
45 |
Agent Server |
Directory Monitor: EMS Buffers (depends on Service:
TopazSupervisor)
* Disabled by default |
Files-Size of EMS Buffers Read on "<TOPAZ HOST NAME>" |
10 |
10 |
Size of "Read" folder > 2Mb |
Size of "Read" folder > 4Mb |
Check whether the number of reports has changed significantly.
If there are no other problems, you may need to add an Agent Server. |
46 |
Agent Server |
Directory Monitor: EMS Buffers (depends on Service:
TopazSupervisor)
* Disabled by default |
Files-Age EMS Buffers Read on "<TOPAZ HOST NAME>" |
10 |
10 |
"Read" folder last modified more than 4 minutes ago |
"Read" folder last modified more than 8 minutes ago |
Not getting reports. Check connectivity
to Agent Server. |
47 |
Agent Server |
Directory Monitor: EMS Buffers (depends on Service:
TopazSupervisor)
* Disabled by default |
Files-Count EMS Buffers Failures on
"<TOPAZ HOST NAME>" |
10 |
10 |
n/a |
File Count of "Fail" folder > 1 |
Reprocess Failures |
48 |
Agent Server |
Directory Monitor: EMS Buffers (depends on Service:
TopazSupervisor)
* Disabled by default |
Files-Count EMS Buffers Retry on "<TOPAZ HOST NAME>" |
10 |
10 |
File Count of "Retry" folder > 30 |
File Count of "Retry" folder > 50 |
Indicates a problem loading buffers to the database.
If the problem is not resolved automatically, the buffer files
are moved to the "Fail" folder and must be reprocessed. |
49 |
Agent Server |
Directory Monitor: EMS Buffers (depends on Service:
TopazSupervisor)
* Disabled by default |
Files-Size of EMS Buffers Retry on "<TOPAZ HOST NAME>" |
10 |
10 |
File Size of "Retry" folder > 1.5Mb |
File Size of "Retry" folder > 2Mb |
Indicates a problem loading buffers to the database.
If the problem is not resolved automatically, the buffer files
are moved to the "Fail" folder and must be reprocessed. |
50 |
Agent Server |
Directory Monitor: J2EE Buffers (depends on Service: TopazSupervisor)
* Disabled by default |
Too many J2EE Buffers are waiting to load on
"<TOPAZ HOST NAME>" |
10 |
10 |
n/a |
Size of "TransactionBuffers" folder > 2Mb |
Check whether the number of reports has changed significantly.
If there are no other problems, you may need to add an Agent Server. |
51 |
Agent Server |
Directory Monitor: J2EE Buffers (depends on Service: TopazSupervisor)
* Disabled by default |
Files-Size of J2EE Buffers Read on "<TOPAZ HOST NAME>" |
10 |
10 |
Size of "Read" folder > 2Mb |
Size of "Read" folder > 4Mb |
Check whether the number of reports has changed significantly.
If there are no other problems, you may need to add an Agent Server. |
52 |
Agent Server |
Directory Monitor: J2EE Buffers (depends on Service: TopazSupervisor)
* Disabled by default |
Files-Age J2EE Buffers Read on "<TOPAZ HOST NAME>" |
10 |
10 |
"Read"
folder last modified more than 4 minutes ago |
"Read"
folder last modified more than 8 minutes ago |
Not getting reports. Check connectivity
to Agent Server. |
53 |
Agent Server |
Directory Monitor: J2EE Buffers (depends on Service: TopazSupervisor)
* Disabled by default |
Files-Count J2EE Buffers Failures on
"<TOPAZ HOST NAME>" |
10 |
10 |
n/a |
File Count of "Fail" folder > 1 |
Reprocess Failures |
54 |
Agent Server |
Directory Monitor: J2EE Buffers (depends on Service: TopazSupervisor)
* Disabled by default |
Files-Count J2EE Buffers Retry on "<TOPAZ HOST NAME>" |
10 |
10 |
File Count of "Retry" folder > 30 |
File Count of "Retry" folder > 50 |
Indicates a problem loading buffers to the database.
If the problem is not resolved automatically, the buffer files
are moved to the "Fail" folder and must be reprocessed. |
55 |
Agent Server |
Directory Monitor: J2EE Buffers (depends on Service: TopazSupervisor)
* Disabled by default |
Files-Size of J2EE Buffers Retry on
"<TOPAZ HOST NAME>" |
10 |
10 |
File Size of "Retry" folder > 1.5Mb |
File Size of "Retry" folder > 2Mb |
Indicates a problem loading buffers to the database.
If the problem is not resolved automatically, the buffer files
are moved to the "Fail" folder and must be reprocessed. |
56 |
Alert Server |
Process CPU: AlertEngineMdrv (depends on Service:
TopazSupervisor) |
Alert Engine Process (CPU) on "<TOPAZ HOST NAME>" |
10 |
1 |
n/a |
CPU > 85%
or Process not running |
Kill this
process (TopazSupervisor will
run it again) |
57 |
Alert Server |
Process Memory: AlertEngineMdrv (depends on Service: TopazSupervisor) |
Alert Engine Process (Memory) on "<TOPAZ HOST NAME>" |
10 |
1 |
Process Memory > 600Mb |
Process Memory > 1.5Gb |
Kill this process (TopazSupervisor will
run it again) |
58 |
Scheduled Tasks Server |
Process CPU: EmailReportsMdr (depends on Service:
TopazSupervisor) |
Scheduled Reports Engine Process (CPU) on
"<TOPAZ HOST NAME>" |
10 |
1 |
n/a |
CPU > 85%
or Process not running |
Kill this
process (TopazSupervisor will
run it again) |
59 |
Scheduled Tasks Server |
Process Memory: EmailReportsMdr (depends on Service: TopazSupervisor) |
Scheduled Reports Engine Process (Memory) on
"<TOPAZ HOST NAME>" |
10 |
1 |
Process Memory > 600Mb |
Process Memory > 1.5Gb |
Kill this process (TopazSupervisor will
run it again) |
60 |
Scheduled Tasks Server |
Process CPU: topaz_pm (depends on Service:
TopazSupervisor) |
Topaz Partition Manager (CPU) on "<TOPAZ HOST NAME>" |
10 |
1 |
n/a |
CPU > 85% or Process not running |
Kill this process (TopazSupervisor will
run it again) |
61 |
Scheduled Tasks Server |
Process Memory: topaz_pm (depends on Service:
TopazSupervisor) |
Topaz Partition Manager (Memory) on
"<TOPAZ HOST NAME>" |
10 |
1 |
Process Memory > 600Mb |
Process Memory > 1.5Gb |
Kill this process (TopazSupervisor will
run it again) |
62 |
EMS Probe |
Process CPU: TopazEmsProbe (depends on Service:
TopazSupervisor) |
Topaz EMS Probe (CPU) on "<TOPAZ HOST NAME>" |
10 |
1 |
n/a |
CPU > 85%
or Process not running |
Kill this
process (TopazSupervisor will
run it again) |
63 |
EMS Probe |
Process Memory: TopazEmsProbe (depends on Service:
TopazSupervisor) |
Topaz EMS Probe (Memory) on "<TOPAZ HOST NAME>" |
10 |
1 |
Process Memory > 600Mb |
Process Memory > 1.5Gb |
Kill this process (TopazSupervisor will
run it again) |
64 |
Topaz Bus
Server |
Process CPU: dispatcher (depends on Service:
TopazSupervisor) |
Topaz Bus Process (CPU) on "<TOPAZ HOST NAME>" |
10 |
1 |
n/a |
CPU > 85%
or Process not running |
Kill this
process (TopazSupervisor will
run it again) |
65 |
Topaz Bus
Server |
Process Memory: dispatcher (depends on Service:
TopazSupervisor) |
Topaz Bus Process (Memory) on "<TOPAZ HOST NAME>" |
10 |
1 |
Process Memory > 600Mb |
Process Memory > 1.5Gb |
Kill this process (TopazSupervisor will
run it again) |
66 |
Topaz Bus
Server |
Process Working Set: dispatcher (depends on Service: TopazSupervisor) |
Process Working Set: dispatcher on "<TOPAZ HOST NAME>" |
10 |
1 |
Process not running |
Working Set > 150Mb |
Kill this process (TopazSupervisor will
run it again) |
67 |
Topaz Bus
Server |
Process Thread Count: dispatcher (depends on Service: TopazSupervisor) |
Process Thread Count: dispatcher on
"<TOPAZ HOST NAME>" |
10 |
1 |
Process not running |
Thread Count > 10 |
Kill this process (TopazSupervisor will
run it again) |
68 |
Topaz Bus
Server |
Log File Monitor:
<TOPAZ MACHINE NAME>\<TOPAZ FOLDER>\log\dispatcher_log.txt
* search for regular expression: "error" |
Log File:
<TOPAZ MACHINE NAME>\<TOPAZ FOLDER>\log\dispatcher_log.txt |
10 |
10 |
"error" appears in log file more than 1 time |
"error" appears in log file more than 1000 times |
The error indicates one of two things.
1. Topaz Bus cannot communicate with the Topaz Admin Server. This error is
identified by the string "ERROR [TMC]". Check that the Topaz Admin Server is up and running.
2. Topaz Bus cannot translate samples reported by some Business Process Monitors or
SiteScopes. This can occur when profile configuration data is deleted, but errors of this kind
should last only a few minutes. If the problem persists, verify that the data collection agents
report valid data. |
69 |
Topaz Bus
Server |
Directory Monitor:
<TOPAZ MACHINE NAME>\<TOPAZ FOLDER>\.persist_dir\lnch_persistent\<TOPAZ HOST NAME>_project_topaz\guarantee
(depends on Service: TopazSupervisor) |
Directory:
<TOPAZ MACHINE NAME>\<TOPAZ FOLDER>\.persist_dir\lnch_persistent\<TOPAZ HOST NAME>_project_topaz\guarantee |
10 |
10 |
Size of "guarantee" folder > 30Mb |
Size of "guarantee" folder > 50Mb |
Indicates that the Topaz Bus cannot pass messages to the Alert Server.
Check that the Alert Server process is up and not stuck. |
70 |
Topaz Bus
Server |
Directory Monitor:
<TOPAZ MACHINE NAME>\<TOPAZ FOLDER>\.persist_dir\dc_persist_queue (depends on
Service: TopazSupervisor)
|
Directory:
<TOPAZ MACHINE NAME>\<TOPAZ FOLDER>\.persist_dir\dc_persist_queue |
10 |
10 |
File Count of "dc_persist_queue" folder (including
subfolders) > 5 |
File Count of "dc_persist_queue" folder (including
subfolders) > 15 |
Indicates that the Topaz Bus cannot translate samples reported by
some Business Process Monitors or SiteScopes.
This can occur when profile configuration data is deleted, but errors of this kind
should last only a few minutes.
If the problem persists, verify that the data collection agents report valid
data. |
71 |
Business
Process Monitor/
SiteScope/Client Monitor |
Topaz Host Last Connection Time |
"<TOPAZ HOST NAME>" Last Connection Time |
10 |
5 |
See SiteScope documentation
for Topaz Host Last Connection Time monitor |
See SiteScope documentation
for Topaz Host Last Connection Time monitor |
Verify this data collection
agent is up and running. |
72 |
Business
Process Monitor/
SiteScope / Client Monitor |
Topaz Host Last Reported Data Time |
<TOPAZ HOST NAME> Last Reported Data Time |
10 |
5 |
See SiteScope
documentation for Topaz Host Last Reported Data Time monitor |
See SiteScope
documentation for Topaz Host Last Reported Data Time monitor |
Verify this
data collection agent is up and running. |
73 |
SiteScope |
SiteScope
Health Status |
SiteScope
Health Status on "<TOPAZ HOST NAME>" |
10 |
1 |
SiteScope Health monitoring indicates warning status |
SiteScope Health monitoring indicates error status |
Check the
Health page of the relevant SiteScope |
74 |
Database
Server: SQL |
Service:
MSSQLSERVER (depends on Ping)
* Disabled by default |
MSSQLSERVER
Service on "<TOPAZ HOST NAME>" |
10 |
1 |
Service is
down |
n/a |
Restart
MSSQLSERVER Service |
75 |
Database Server: SQL |
NT Event Viewer Errors |
NT Event Viewer Application Log |
10 |
10 |
n/a |
A new error appears in the Application Log viewer |
Examine the event viewer for errors and see if these errors are
related to the MSSQLSERVER service. |
76 |
Database Server: Oracle |
Process: Oracle (depends on Ping)
* Disabled by default |
Oracle Service on "<TOPAZ HOST NAME>" |
10 |
1 |
n/a |
Service is
down |
Restart
Service / Process |
77 |
Database Server: Oracle |
Process: TNS Listener (depends on Ping)
* Disabled by default |
Oracle TNS Listener Service on "<TOPAZ HOST NAME>" |
10 |
10 |
n/a |
Service is down |
Restart Service / Process |
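The columns of the table above map onto a small evaluation routine. The following is a minimal sketch, not SiteScope's implementation: `Monitor`, `evaluate`, and `next_interval` are hypothetical names, a measurement of `None` stands for "can't be measured" or "process not running", a dependent monitor is assumed to be skipped while its parent is not OK, and the "on Error" frequency is assumed to apply only in the error state. The example values come from row 13 (Process Memory: topaz), taking 1.5 GB as 1536 MB.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional

Check = Callable[[Optional[float]], bool]


@dataclass
class Monitor:
    name: str
    freq_ok_min: int                  # "Freq. [min.] (on OK)" column
    freq_error_min: int               # "Freq. [min.] (on Error)" column
    warning: Optional[Check] = None   # None models an "n/a" cell
    error: Optional[Check] = None
    depends_on: Optional[str] = None  # name of the parent monitor, if any


def evaluate(monitor: Monitor, value: Optional[float], statuses: Dict[str, str]) -> str:
    """Return 'skipped', 'error', 'warning', or 'ok' for one measurement."""
    # Assumption: a monitor is not evaluated while its parent is not OK.
    if monitor.depends_on is not None and statuses.get(monitor.depends_on) != "ok":
        return "skipped"
    if monitor.error is not None and monitor.error(value):
        return "error"
    if monitor.warning is not None and monitor.warning(value):
        return "warning"
    return "ok"


def next_interval(monitor: Monitor, status: str) -> int:
    """Minutes until the next run; the shorter interval applies only on error."""
    return monitor.freq_error_min if status == "error" else monitor.freq_ok_min


# Row 13: Process Memory: topaz (values in MB; 1.5 GB taken as 1536 MB).
TOPAZ_MEMORY = Monitor(
    name="Process Memory: topaz",
    freq_ok_min=10,
    freq_error_min=1,
    warning=lambda mb: mb is not None and mb > 600,
    error=lambda mb: mb is not None and mb > 1536,
    depends_on="Service: TopazSupervisor",
)

statuses = {"Service: TopazSupervisor": "ok"}
status = evaluate(TOPAZ_MEMORY, 700.0, statuses)
print(status, next_interval(TOPAZ_MEMORY, status))
```

Checking the error condition before the warning condition matters for rows where both thresholds are exceeded at once, such as a process above 1.5 GB.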
|
|
|
|
|
|
|
|
|
ID |
Group |
Monitor Type |
Monitor Name |
Freq.
[min.]
(on OK) |
Freq.
[min.]
(on Error) |
Warning
Condition |
Error
Condition |
Error
Resolution |
1 |
Common
System Monitors
(see note 4) |
Ping |
Ping:
"<TOPAZ HOST NAME>" |
10 |
1 |
n/a |
Ping fails |
Check
Connectivity |
2 |
Common System Monitors
(see note 4) |
CPU Utilization (depends
on Ping)
|
CPU Utilization on "<TOPAZ HOST NAME>" |
10 |
1 |
CPU can't be measured |
CPU > 70% |
Check which process causes this. If it's a Topaz process,
restart Topaz |
3 |
Common System Monitors
(see note 4) |
Disk Space: $disk$
(depends on Ping)
|
Disk Space: <TOPAZ DRIVE LETTER> on
"<TOPAZ HOST NAME>" |
60 |
5 |
Disk Space can't be measured |
Disk Space > 85% |
Clean the disk |
4 |
Common System Monitors
(see note 4) |
Memory (depends on Ping) |
Memory on "<TOPAZ HOST NAME>" |
10 |
1 |
Memory can't be measured |
Memory > 85% |
Check which process causes this. If it's a Topaz process,
restart Topaz |
5 |
Common
Application Monitors
(see note 5) |
Service:
TopazSupervisor (depends on Ping) |
n/a |
10 |
1 |
Service is
down |
n/a |
Restart Topaz |
6 |
Graph/Admin/Agent Servers |
Apache Web Server (depends on Service: IIS Admin
Service)
* Disabled by default |
Apache Web Server on "<TOPAZ HOST NAME>" |
10 |
1 |
See SiteScope documentation for IIS Server monitor |
See SiteScope documentation for IIS Server monitor |
Restart Service / Process |
7 |
Graph/Admin/Agent Servers |
Web Server Process |
Web Server Process on "<TOPAZ HOST NAME>" |
10 |
1 |
Process not running |
Thread Count > 160 |
If number of threads > 160, restart IIS |
8 |
Graph/Admin/Agent Servers |
Process: topaz |
Topaz Process on "<TOPAZ HOST NAME>" |
10 |
1 |
n/a |
CPU > 85% or Process not running |
Restart Topaz if CPU usage stays above 85% for several minutes. |
9 |
Graph/Admin Servers |
URL:
TopazVerify.jsp (depends on Service: TopazSupervisor) |
URL:
http://<TOPAZ HOST NAME>/topaz/TopazVerify.jsp |
10 |
1 |
n/a |
URL not available |
Restart Topaz.
If this does not help, restart IIS
If this does not help, reboot the machine |
10 |
Admin Server |
Process: aes_twd (depends on Service:
TopazSupervisor) |
Topaz
Watchdog Aggregated Event Engine on "<TOPAZ HOST NAME>" |
10 |
1 |
n/a |
CPU > 85% or Process not running |
Kill this
process (TopazSupervisor will
run it again) |
11 |
Admin Server |
URL: sample_dispatcher |
URL Test for Site Scope Configuration Changes in Topaz |
10 |
1 |
n/a |
URL result contains the word "filed" |
Verify that SiteScope configuration changes are reflected in
Topaz. |
12 |
Admin Server |
Log File: aims.ejb.log (depends on Service:
TopazSupervisor)
* search for regular expression: "exception" |
Check for exceptions in SiteScope integration logs |
10 |
10 |
n/a |
Log file
contains the word "exception" |
Verify that
SiteScope configuration changes are reflected in Topaz. |
13 |
Agent Server |
URL: getTopazServerTime |
URL:
http://<TOPAZ HOST NAME>/topaz/topaz_api/api_getservertime.asp |
10 |
1 |
n/a |
URL not
available |
Restart
Topaz.
If this does not help, restart IIS
If this does not help, reboot the machine |
14 |
Agent Server |
Directory Monitor for folder:
<TOPAZ MACHINE NAME>\<TOPAZ FOLDER>\.persist_dir\lnch_persistent\<TOPAZ HOST NAME>_web_driver\guarantee\131072_project_topaz\.msgs |
Check for too many files on Guaranteed Delivery Buffers folder |
10 |
1 |
n/a |
File Count > 2 |
Agent Server cannot pass messages to the
Topaz Bus. Check if the Topaz Bus process
is up and is not stuck. |
15 |
Agent Server |
Process: LoaderTX (depends on Service:
TopazSupervisor) |
Transaction Loader Process on "<TOPAZ HOST NAME>" |
10 |
1 |
n/a |
Process not running |
Kill this process (TopazSupervisor will
run it again) |
16 |
Agent Server |
Process: LoaderWT (depends on Service:
TopazSupervisor) |
WebTrace Loader Process on "<TOPAZ HOST NAME>" |
10 |
1 |
n/a |
Process not running |
Kill this process (TopazSupervisor will
run it again) |
17 |
Agent Server |
Process: LoaderSM (depends on Service:
TopazSupervisor) |
SiteScope Loader Process on "<TOPAZ HOST NAME>" |
10 |
1 |
n/a |
Process not running |
Kill this process (TopazSupervisor will
run it again) |
18 |
Agent Server |
Process: LoaderNMMT (depends on Service:
TopazSupervisor) |
EMS Loader Process on "<TOPAZ HOST NAME>" |
10 |
1 |
n/a |
Process not running |
Kill this process (TopazSupervisor will
run it again) |
19 |
Agent Server |
Process: LoaderABR (depends on Service:
TopazSupervisor) |
J2EE Breakdown Loader Process on "<TOPAZ HOST NAME>" |
10 |
1 |
n/a |
Process not running |
Kill this process (TopazSupervisor will
run it again) |
20 |
Alert Server |
Process:
AlertEngineMdrv (depends on Service: TopazSupervisor) |
Alert Engine
Process on "<TOPAZ HOST NAME>" |
10 |
1 |
n/a |
CPU > 85%
or Process not running |
Kill this
process (TopazSupervisor will
run it again) |
21 |
Scheduled Tasks Server |
Process:
EmailReportsMdrv (depends on Service: TopazSupervisor) |
Scheduled
Reports Engine Process on "<TOPAZ HOST NAME>" |
10 |
1 |
n/a |
CPU > 85% or Process not running |
Kill this process (TopazSupervisor will
run it again) |
22 |
Scheduled Tasks Server |
Process: topaz_pm (depends on Service:
TopazSupervisor) |
Topaz Partition Manager Process on "<TOPAZ HOST NAME>" |
10 |
1 |
n/a |
CPU > 85% or Process not running |
Kill this process (TopazSupervisor will
run it again) |
23 |
Topaz Bus Server |
Process:
dispatcher (depends on Service: TopazSupervisor) |
Topaz Bus
Process on "<TOPAZ HOST NAME>" |
10 |
1 |
n/a |
CPU > 85% or Process not running |
Kill this process (TopazSupervisor will
run it again) |
24 |
Topaz Bus
Server |
Log File: /opt/Topaz/log/dispatcher_log.txt
* search for regular expression: "error" |
Log File: /opt/Topaz/log/dispatcher_log.txt |
10 |
10 |
"error" appears in log file more than 1 time |
"error" appears in log file more than 1000 times |
The error indicates one of two things.
1. Topaz Bus cannot communicate with the Topaz Admin Server. This error is
identified by the string "ERROR [TMC]". Check that the Topaz Admin Server is up and running.
2. Topaz Bus cannot translate samples reported by some Business Process Monitors or
SiteScopes. This can occur when profile configuration data is deleted, but errors of this kind
should last only a few minutes. If the problem persists, verify that the data collection agents
report valid data. |
25 |
Business Process
Monitor/
SiteScope/Client Monitor |
Topaz Host
Last Connection Time |
"<TOPAZ HOST NAME>"
Last Connection Time |
10 |
5 |
See SiteScope documentation
for Topaz Host Last Connection Time monitor |
See SiteScope documentation
for Topaz Host Last Connection Time monitor |
Verify this data collection
agent is up and running. |
26 |
Business
Process Monitor/
SiteScope/Client Monitor |
Topaz Host Last Reported Data Time |
<TOPAZ HOST NAME> Last Reported Data Time |
10 |
5 |
See SiteScope
documentation for Topaz Host Last Reported Data Time monitor |
See SiteScope
documentation for Topaz Host Last Reported Data Time monitor |
Verify this
data collection agent is up and running. |
27 |
SiteScope |
SiteScope
Health Status |
SiteScope
Health Status on "<TOPAZ HOST NAME>" |
10 |
1 |
SiteScope Health monitoring indicates warning status |
SiteScope Health monitoring indicates error status |
Check the
Health page of the relevant SiteScope |
28 |
Database
Server: Oracle |
Process:
Oracle Checkpoint |
Oracle
Checkpoint Process on "<TOPAZ HOST NAME>" for SID
<TOPAZ ORACLE SID> |
10 |
1 |
n/a |
Process not
running |
Restart
Service / Process |
29 |
Database Server: Oracle |
Process: Oracle Process Monitor |
Oracle Process Monitor on "<TOPAZ HOST NAME>" for SID
<TOPAZ ORACLE SID> |
10 |
10 |
n/a |
Process not running |
Restart Service / Process |
30 |
Database Server: Oracle |
Process: Oracle Service Monitor |
Oracle Service Monitor on "<TOPAZ HOST NAME>" for SID
<TOPAZ ORACLE SID> |
10 |
1 |
n/a |
Process not running |
Restart Service / Process |
31 |
Database Server: Oracle |
Process: Oracle Database Writer |
Oracle Database Writer Process on "<TOPAZ HOST NAME>"
for SID <TOPAZ ORACLE SID> |
10 |
1 |
n/a |
Process not running |
Restart Service / Process |
32 |
Database Server: Oracle |
Process: Oracle Log Writer |
Oracle Log Writer Processes on "<TOPAZ HOST NAME>" for
SID <TOPAZ ORACLE SID> |
10 |
10 |
n/a |
Process not
running |
Restart
Service / Process |
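The Log File monitor in row 24 reduces to counting matches and comparing the count with two thresholds. The following is a minimal sketch under stated assumptions: matches are counted per line and case-insensitively (the table searches for "error" but cites the string "ERROR [TMC]"), and the whole file is scanned on each run. `count_matches`, `classify`, and `has_tmc_error` are hypothetical helper names, not part of SiteScope, and the sample log lines are invented for illustration.

```python
import re
from typing import Iterable


def count_matches(lines: Iterable[str], pattern: str = "error") -> int:
    """Count the lines that match the pattern, ignoring case (an assumption)."""
    regex = re.compile(pattern, re.IGNORECASE)
    return sum(1 for line in lines if regex.search(line))


def classify(count: int, warn_above: int = 1, error_above: int = 1000) -> str:
    """Apply the row's thresholds: warning above 1 match, error above 1000."""
    if count > error_above:
        return "error"
    if count > warn_above:
        return "warning"
    return "ok"


def has_tmc_error(lines: Iterable[str]) -> bool:
    """True if any line carries the "ERROR [TMC]" marker (cause 1 in the table)."""
    return any("ERROR [TMC]" in line for line in lines)


# Invented log lines, for illustration only.
sample_log = [
    "INFO dispatcher started",
    "ERROR [TMC] cannot reach admin server",
    "error translating sample",
]
status = classify(count_matches(sample_log))
print(status, has_tmc_error(sample_log))
```

Separating the "ERROR [TMC]" check from the count makes it possible to tell the two causes listed in the resolution column apart before deciding whether to check the Topaz Admin Server or the data collection agents.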