Datasherpa Solo Core Components
Clickstream Datasherpa: The Key Data Collection Engine
The data collection engine is at the heart of all Datasherpa products. It consists of a number of automation features, behavioural metrics, and data recording and management tools, as well as a number of standard outputs, as listed below:
1. Automation
a. Automated Page Tagging, Automated Instrumentation of all Pages
b. Automatic Server Response and Error Page Tracking (HTTP status codes such as 200 OK, 304 Not Modified and 404 Not Found are logged)
c. Automated File Collation for Multiple Servers
d. Automated File Management, Automated File Transfer via HTTP or FTP
2. Behaviour Functions
a. Accurate Metrics for Visitor and Session Definition to ABCe standards
- Page Impressions
- Unique users/visitors
- Sessions
b. Unique, Collision Free, Accurate User ID
c. Simple Business User Identifier (data from one form field, e.g. a telephone number, bank account number or similar)
d. Off-line and In-Cache Tracking
Tracks 100% of cached and off-line data
e. Page View Duration
A unique metric developed by Clickstream that measures the exact time on page, defined as the interval between the end of page loading and the moment the page is unloaded.
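The page view duration definition above can be illustrated with a minimal Python sketch. It assumes, hypothetically, that the page tag reports two client-side timestamps per page view: one when loading finishes and one when the page is unloaded. The `PageViewEvent` record, its field names and the `page_view_duration_ms` function are illustrative only and are not part of Datasherpa.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PageViewEvent:
    # Hypothetical record: field names are illustrative, not Datasherpa's.
    url: str
    load_end_ms: int                 # when the page finished loading (epoch ms)
    unload_ms: Optional[int] = None  # when the page was unloaded, if reported


def page_view_duration_ms(event: PageViewEvent) -> Optional[int]:
    """Return time on page as unload time minus load-end time.

    Returns None when no unload timestamp was received, or when the
    timestamps are inconsistent (unload earlier than load end).
    """
    if event.unload_ms is None:
        return None
    duration = event.unload_ms - event.load_end_ms
    return duration if duration >= 0 else None


if __name__ == "__main__":
    view = PageViewEvent("/products.html", load_end_ms=1_000_000, unload_ms=1_042_500)
    print(page_view_duration_ms(view))  # 42500 (ms), i.e. 42.5 seconds
```

Because the interval starts at the end of loading rather than at the request, time spent waiting for a slow page to load is not counted as time on page.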
3. Data Recording and Management Tools
a. Data Configuration Overview
Datasherpa™ allows the user to define which data Datasherpa gathers and how that data is formatted.
b. File, page or file type exclusion
To avoid collecting unwanted data, the user can exclude text files, image files, Flash files, specific pages, files in certain directories, etc. By default, Datasherpa does not monitor style sheets, but it can be configured to do so.
c. Control File Types to be Logged
A list of file extensions that the user does not want included in the database. By default, this list excludes images, dynamic link libraries, style sheets, sound files, JavaScript files and Shockwave/Flash files. If any of these file types are valuable to the site statistics, they should be removed from the list.
d. Control Pages & Files to be Logged
The ability to control not only which page types are instrumented and monitored, but also which individual pages or sets of pages provide data.
e. Switchable Query Strings
Sometimes it is essential to remove query-string elements that would make otherwise identical pages appear unique, e.g. a user ID or a cache buster. Excluding query strings reduces the number of unique pages stored, but it can also mean that less information is available about the use of dynamic pages. By default, query strings are tracked.
f. File Downloads
Every file download is recorded, and completion of the download is confirmed.
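The file-type exclusion list and switchable query strings described in this section can be sketched as two small filters in Python. The default extension list is an assumption that mirrors the categories named above (images, dynamic link libraries, style sheets, sound files, JavaScript and Shockwave/Flash); the exact extensions, the example parameter names `userid` and `cb`, and the function names `should_log` and `strip_query_params` are hypothetical and do not describe Datasherpa's actual configuration.

```python
from pathlib import PurePosixPath
from typing import Collection
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical default exclusion list covering the categories named above.
DEFAULT_EXCLUDED_EXTENSIONS = frozenset({
    ".gif", ".jpg", ".jpeg", ".png",  # images
    ".dll",                           # dynamic link libraries
    ".css",                           # style sheets
    ".wav", ".mp3",                   # sound files
    ".js",                            # JavaScript
    ".swf",                           # Shockwave/Flash
})


def should_log(url: str, excluded: Collection[str] = DEFAULT_EXCLUDED_EXTENSIONS) -> bool:
    """Return False when the URL path ends in an excluded file extension."""
    suffix = PurePosixPath(urlsplit(url).path).suffix.lower()
    return suffix not in excluded


def strip_query_params(url: str, drop: Collection[str]) -> str:
    """Remove the named query-string parameters and keep the rest in order.

    Kept values are decoded and re-encoded, so escaping may change form
    (for example %20 becomes +).
    """
    parts = urlsplit(url)
    kept = [(key, value)
            for key, value in parse_qsl(parts.query, keep_blank_values=True)
            if key not in drop]
    return urlunsplit(parts._replace(query=urlencode(kept)))


if __name__ == "__main__":
    print(should_log("/images/logo.GIF"))  # False: .gif is excluded
    print(strip_query_params("/catalogue.asp?item=42&userid=981&cb=1699",
                             {"userid", "cb"}))  # /catalogue.asp?item=42
```

Keeping `item` while dropping `userid` and `cb` reflects the trade-off described for switchable query strings: parameters that identify content are worth storing, while per-user or cache-busting values only inflate the number of unique pages.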
4. Output Configuration: Output Files, Formats and Output Management
a. Datasherpa Filter Output
A log file in a proprietary format, which the Datasherpa Log Processor converts to emulate the native format of the reporting/analytics software that will read the log file.
b. Extended Log Format (ELF) feed, equivalent to an IIS or Apache log file
c. ABCe ELF Format
d. Management of Robot and Spider Traffic Data
Robot and spider traffic is captured and can be marked, deleted or written to a separate file for further analysis.
e. Log Processing Interval Definition
The time period for log file generation can be set to any whole division of 24 hours, down to a minimum of 15 minutes.
f. Cross Site Administration Tools
g. Management of Multiple Aliases, Domains and Sites on One or Multiple Servers
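The ELF output and the three robot-handling options (mark, delete, separate file) can be illustrated together with a minimal Python sketch. It writes lines in the W3C Extended Log File Format, that is, a `#Version` and `#Fields` header followed by space-separated values. The `Hit` record, the choice of fields, the user-agent substrings used to detect robots and the made-up `x-robot` marker field are all assumptions for illustration; they are not Datasherpa's actual output layout or robot detection rules.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical user-agent substrings used to flag robots and spiders.
ROBOT_MARKERS = ("bot", "crawler", "spider", "slurp")

# ELF field identifiers chosen for this example.
FIELDS = ["date", "time", "c-ip", "cs-method", "cs-uri-stem", "sc-status", "cs(User-Agent)"]


@dataclass
class Hit:
    date: str        # YYYY-MM-DD
    time: str        # HH:MM:SS
    client_ip: str
    method: str
    uri_stem: str
    status: int
    user_agent: str


def is_robot(user_agent: str) -> bool:
    ua = user_agent.lower()
    return any(marker in ua for marker in ROBOT_MARKERS)


def elf_line(hit: Hit) -> str:
    # Fields are space-separated, so embedded spaces become '+' (as IIS does
    # for the user agent) and empty values are written as '-'.
    values = [hit.date, hit.time, hit.client_ip, hit.method,
              hit.uri_stem, str(hit.status), hit.user_agent]
    return " ".join(v.replace(" ", "+") if v else "-" for v in values)


def to_elf(hits: List[Hit], robot_mode: str = "separate") -> Tuple[str, str]:
    """Return (main_log, robot_log) as ELF text.

    robot_mode: 'mark' appends a hypothetical x-robot field (1 or 0) to every
    line, 'delete' drops robot hits, 'separate' moves them into robot_log.
    """
    if robot_mode not in ("mark", "delete", "separate"):
        raise ValueError(f"unknown robot_mode: {robot_mode!r}")
    fields = FIELDS + (["x-robot"] if robot_mode == "mark" else [])
    header = ["#Version: 1.0", "#Fields: " + " ".join(fields)]
    main, robots = list(header), list(header)
    for hit in hits:
        robot = is_robot(hit.user_agent)
        line = elf_line(hit)
        if robot_mode == "mark":
            main.append(f"{line} {int(robot)}")
        elif not robot:
            main.append(line)
        elif robot_mode == "separate":
            robots.append(line)
        # In 'delete' mode robot hits fall through and are not written.
    robot_log = "\n".join(robots) + "\n" if robot_mode == "separate" else ""
    return "\n".join(main) + "\n", robot_log
```

In use, a browser hit such as `Mozilla/4.0 (compatible; MSIE 6.0)` stays in the main log, while a `Googlebot/2.1` hit is flagged by the `bot` substring and is marked, dropped or routed to the robot log depending on `robot_mode`.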
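The log processing interval rule ("any whole division of 24 hours, down to 15 minutes") can be checked with a short Python sketch. It assumes, as an interpretation rather than a documented Datasherpa setting, that intervals are expressed in whole minutes, so a valid interval must divide the 1,440 minutes of a day exactly and be at least 15 minutes long. The function names are hypothetical.

```python
MINUTES_PER_DAY = 24 * 60
MIN_INTERVAL_MINUTES = 15


def is_valid_interval(minutes: int) -> bool:
    """True if the interval divides a 24-hour day exactly and is at least 15 minutes."""
    return minutes >= MIN_INTERVAL_MINUTES and MINUTES_PER_DAY % minutes == 0


def rotation_start_times(minutes: int) -> list:
    """Start times (HH:MM) of each log file in one day for a valid interval."""
    if not is_valid_interval(minutes):
        raise ValueError(f"{minutes} min is not a whole division of 24 h of at least 15 min")
    return [f"{m // 60:02d}:{m % 60:02d}" for m in range(0, MINUTES_PER_DAY, minutes)]
```

For example, a 6-hour (360-minute) interval yields four log files per day starting at 00:00, 06:00, 12:00 and 18:00, and the 15-minute minimum yields 96 files per day, while 25 minutes is rejected because 1,440 is not a multiple of 25.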


