CSA Developer's Guide
Overall Design
The clickstream data aggregation (tracking) interface consists of a single session-scoped JavaBean (com.ths.csa.TrackedSession) and a demonstration JSP file (SessionTracker.jsp). Once the JavaBean is bound to a session and its initialize() method has been called, the bean tracks events that occur in the user's session through the methods of its interface. These events include page views, timeouts, logouts, errors, and user-defined events.
When the session is closed, the contents of the session are written to a unique XML file. The filename is the user's session ID, and the path is specified by an argument to TrackedSession's initialize() method. The tracking functionality may be used by itself, though the reporting functionality depends on the tracking functionality.
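A minimal sketch of how an output location could be derived, assuming the YYYY/MM/DD folder layout that the packaged ClickStream.jsp expects. The class SessionFilePaths, its method names, and the ".xml" extension are hypothetical and are not part of CSA.jar. The guide only states that the filename is the session ID.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.time.LocalDate;
import java.util.Locale;

// Hypothetical helper for illustration; not part of CSA.jar.
final class SessionFilePaths {

    // Folder for one day's sessions: <root>/YYYY/MM/DD.
    static Path dayFolder(Path root, LocalDate day) {
        return root
                .resolve(String.format(Locale.ROOT, "%04d", day.getYear()))
                .resolve(String.format(Locale.ROOT, "%02d", day.getMonthValue()))
                .resolve(String.format(Locale.ROOT, "%02d", day.getDayOfMonth()));
    }

    // Full path of one session's file. The ".xml" extension is an assumption.
    static Path sessionFile(Path root, LocalDate day, String sessionId) {
        return dayFolder(root, day).resolve(sessionId + ".xml");
    }

    public static void main(String[] args) {
        System.out.println(sessionFile(Paths.get("logs"), LocalDate.of(2001, 3, 7), "ABC123"));
    }
}
```

The value returned by dayFolder() is the kind of path that would be passed as the folder argument of initialize(), so that each day's sessions land in their own folder.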
The reporting interface of CSA consists of a single application-scoped JavaBean (com.ths.csa.ClickStreamAnalyzer) called by a single JSP file (ClickStream.jsp).
An initial call to ClickStream.jsp generates a form listing all available folders that contain tracked session XML files. The default ClickStream.jsp displays the folders in a date-wise hierarchical manner based on a YYYY/MM/DD directory structure.
When the user selects some number of folders and submits the form, ClickStreamAnalyzer's doAnalysis() method is called. All the session XML files in all the selected folders are then parsed and summarized. Because this is a CPU- and I/O-intensive process, it is suggested that the reporting tool be installed on a non-production server with network access to the filesystem containing the session XML files. doAnalysis() returns a CSAResultSet object from which summary values are pulled and displayed.
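To illustrate the folder discovery step, here is a rough sketch of collecting day folders under a log root that uses the YYYY/MM/DD structure. It is not how ClickStream.jsp is actually implemented. The class DayFolders and its methods are hypothetical, and the code uses current Java syntax for brevity.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration; not part of CSA.jar.
final class DayFolders {

    // Returns relative names such as "2001/03/07" for every <root>/YYYY/MM/DD folder, sorted.
    static List<String> list(File root) {
        List<String> found = new ArrayList<>();
        for (File year : numericDirs(root, 4)) {
            for (File month : numericDirs(year, 2)) {
                for (File day : numericDirs(month, 2)) {
                    found.add(year.getName() + "/" + month.getName() + "/" + day.getName());
                }
            }
        }
        return found;
    }

    // Sub-folders of parent whose names are exactly `digits` decimal digits, sorted by name.
    private static File[] numericDirs(File parent, int digits) {
        File[] dirs = parent.listFiles(
                f -> f.isDirectory() && f.getName().matches("\\d{" + digits + "}"));
        if (dirs == null) {
            return new File[0]; // parent is missing or is not a directory
        }
        Arrays.sort(dirs);
        return dirs;
    }

    public static void main(String[] args) {
        File root = new File(args.length > 0 ? args[0] : ".");
        for (String folder : list(root)) {
            System.out.println(folder);
        }
    }
}
```

Folders whose names do not match the expected digit pattern are skipped, so unrelated directories under the root do not appear in the form.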
All class files for both the tracking and reporting functionality are included in the CSA.jar file.
Installation
- Unzip the download file into a temporary directory.
- Move the CSA.jar file to an appropriate place in your directory structure for 3rd party JAR files.
- Add the CSA.jar file to the classpath of the servlet engine's VM.
- Move the JSP files in the /jsp folder to the location of your site's JSP files.
Tracking Integration
- To track all JSP pages, statically include SessionTracker.jsp into all JSP files or into any JSP already included by all other JSP files. To track only specific pages, statically include SessionTracker.jsp into those pages. Edit request parameter names and values in SessionTracker.jsp as needed.
- Inside the static initializer block of the declaration of the TrackedSession bean, call TrackedSession's initialize() method. The arguments to TrackedSession's initialize() method are:
- The HttpServletRequest object representing the first request of the user's session. TrackedSession extracts various information from the request object.
- The web application name
- The filesystem folder in which to write the XML file when the session is closed. As packaged, the reporting tool's ClickStream.jsp expects XML files to be arranged in a YYYY/MM/DD folder format.
- The request parameter name that will represent the request ID as used by the tracking system to detect use of the back button. If a received request ID is some number less than the expected request ID, the user has pressed the browser's back button that many times.
- A PrintStream object to be used for unexpected problems. During normal operation, nothing will be sent to this stream.
- Track any other pertinent session-related data in the initializer block (though session-related data may be recorded at any time during the session by calling TrackedSession's recordSessionAttribute() or recordSessionAttributes() methods). Recording the initial page referrer is demonstrated in SessionTracker.jsp. A convenience method, recordLoginId(), is provided to record the user's login ID as a session attribute.
- Page views are the most common event. Call recordNewPage() and pass it the HttpServletRequest object and the name of the page being viewed. Be careful to name viewed pages consistently, since the reporting tool uses the page name as a key String to track multiple page hits and clickstream paths; e.g. /Index.jsp != Index.jsp != Index.jsp?
- Any other custom data may be tracked and associated with the current page by calling recordPageData() after recordNewPage() has been called. Clicks to offsite URLs may be recorded in this manner, followed by a response.sendRedirect() call to the offsite URL. See SessionTracker.jsp for an example of recording clicks to offsite URLs.
- Session timeout events are generated by calls to valueUnbound(), a method in the HttpSessionBindingListener interface implemented by TrackedSession. When a session times out, the actual amount of time the user spent looking at the last requested page is indeterminate. The session duration is therefore calculated from the time of the last page request made by the user, and the viewing duration of the last requested page is set to zero. If only one page request was made before the session timed out, the session duration is also set to zero. Sessions with zero duration are not counted when calculating average session length; likewise, page views of zero duration are not counted when calculating average page view duration.
- Logout events are generated by calls to recordLogout(). Usually this call will be followed closely by a call to session.invalidate().
- The tracking system can detect the use of the browser's back button if a simple request ID is embedded as a request parameter in each link. For the parameter name, use the same String as passed to TrackedSession's initialize() method ('RID' in SessionTracker.jsp). Inserting that request parameter into each link by hand will likely become tedious, so writing the request ID parameter into each link using a link URL generator method is suggested.
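The link URL generator idea and the back-button rule described above might be sketched as follows. The class RequestIdLinks and both methods are hypothetical and are not part of CSA.jar. How a page obtains the request ID to embed is not covered here, and URL fragments ('#...') are not handled.

```java
// Hypothetical helper for illustration; not part of CSA.jar.
final class RequestIdLinks {

    // Must match the parameter name passed to initialize(), e.g. "RID".
    private final String paramName;

    RequestIdLinks(String paramName) {
        this.paramName = paramName;
    }

    // Appends <paramName>=<requestId> to url, using '?' or '&' as appropriate.
    String link(String url, int requestId) {
        char separator = url.indexOf('?') >= 0 ? '&' : '?';
        return url + separator + paramName + '=' + requestId;
    }

    // Back-button presses implied by a received ID that is lower than expected; 0 otherwise.
    static int backPresses(int expectedId, int receivedId) {
        return Math.max(0, expectedId - receivedId);
    }

    public static void main(String[] args) {
        RequestIdLinks links = new RequestIdLinks("RID");
        System.out.println(links.link("/Index.jsp", 4));     // prints /Index.jsp?RID=4
        System.out.println(links.link("/Item.jsp?id=9", 5)); // prints /Item.jsp?id=9&RID=5
        System.out.println(backPresses(7, 5));               // prints 2
    }
}
```

Routing every link through one method keeps the parameter name in a single place, so a later change to the name passed to initialize() only needs to be made once.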
Reporting Integration
Very little integration work is necessary to begin using the reporting tool once the tracking system is in place. As mentioned before, the reporting tool requires the XML files generated by the tracking tool, though the tracking tool may be used without the reporting tool. Reading and parsing a large number of XML files will tax the hosting server's CPU and I/O. For best results, host the reporting tool on a non-production server with access to the filesystem containing the XML files.
Because the report must be generated before the user's browser times out waiting for the response, more powerful host servers will allow larger numbers of tracked sessions to be included in a requested report. A feature is currently under development to allow the report to be e-mailed to the requester, thereby allowing very large numbers of tracked sessions to be included.
- The reporting tool makes use of the Collections API found in JDK version 1.2 and later (1.3 is suggested), and requires the JAXP 1.1 implementation for parsing the XML files.
Make sure the JAXP JAR files are in the servlet container's classpath.
- In ClickStream.jsp, edit the call to ClickStreamAnalyzer's initialize() method.
- The first argument is the path to the log files root folder. If the default directory structure is used (root/YYYY/MM/DD), this entry should match root.
- A PrintStream object to be used for unexpected problems. During normal operation, nothing will be sent to this stream.
- To better understand the different fields of the CSAResultSet object, see the included JavaDoc.
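As one illustration of how a summary value could be derived, the averaging rule described under Tracking Integration (zero-duration sessions and page views are left out of averages) can be sketched as below. This is not ClickStreamAnalyzer's actual code. The class and method names are hypothetical, and returning 0.0 when nothing is counted is an assumption.

```java
// Hypothetical helper for illustration; not part of CSA.jar.
final class DurationAverages {

    // Average of the non-zero durations; 0.0 when no duration is counted (an assumption).
    static double averageNonZero(long[] durationsMillis) {
        long total = 0;
        int counted = 0;
        for (long duration : durationsMillis) {
            if (duration > 0) { // zero-duration entries are skipped, not averaged in
                total += duration;
                counted++;
            }
        }
        return counted == 0 ? 0.0 : (double) total / counted;
    }

    public static void main(String[] args) {
        // Two timed-out entries (0) and two measured ones.
        System.out.println(averageNonZero(new long[] {0, 4000, 2000, 0})); // prints 3000.0
    }
}
```

Skipping the zero entries instead of averaging them in keeps timed-out sessions from dragging the reported averages down.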
Integration with analysis engines via XML
CSA writes tracked session data out to XML files. These XML files can be easily imported into the databases of high-end inference- and rules-based analysis engines. No DTD is currently published for the XML file format, though each file will contain one 'session' root element with various attributes, plus some number of 'event' sub-elements, also with various attributes.
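A minimal sketch of reading such a file with the JAXP DOM API, relying only on the structure stated above (a 'session' root with 'event' sub-elements). Since no DTD is published, the attribute names in the sample document are invented for illustration. The class SessionXmlReader is hypothetical, and the code uses current Java syntax.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.xml.sax.SAXException;

// Hypothetical helper for illustration; not part of CSA.jar.
final class SessionXmlReader {

    // Parses the XML text and returns its root, which must be a 'session' element.
    static Element parseSession(String xml) {
        Element root;
        try {
            root = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("could not parse session XML", e);
        }
        if (!"session".equals(root.getTagName())) {
            throw new IllegalArgumentException(
                    "expected a 'session' root element, found '" + root.getTagName() + "'");
        }
        return root;
    }

    // Number of 'event' elements found under the 'session' root.
    static int countEvents(String xml) {
        return parseSession(xml).getElementsByTagName("event").getLength();
    }

    public static void main(String[] args) {
        // The attribute names below are made up; the real files may differ.
        String sample = "<session id=\"ABC123\">"
                + "<event type=\"page\" name=\"/Index.jsp\"/>"
                + "<event type=\"logout\"/>"
                + "</session>";
        System.out.println(countEvents(sample)); // prints 2
    }
}
```

An import job for an analysis engine would follow the same pattern, reading the attributes of each 'event' element once the real attribute names are known.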
Modifying and Extending Functionality
CSA was designed to allow most required modifications to be made at the JSP level, though extending TrackedSession and ClickStreamAnalyzer is certainly an option. Please make contact if you need any assistance in that area.
Feature Requests and Bug Reports
Please don't hesitate to file feature requests, bug reports, or any other comments from the contact page.