Datasherpa Omni Architecture

The Product

Datasherpa is automatic web data collection software that replaces web server log files and page tagging as data sources for web analytics. The software monitors 100% of content, integrates new content 'on the fly', and accurately records data for cached pages, because the data collection algorithm is cached along with each page. Data collection is completely automatic and does not require any page tags.

The primary output of Datasherpa is a 'superlog', which can be delivered either in .csv format to a database for business intelligence or in the logfile format required by leading web analytics suites. Clickstream provides a capability that converts its own proprietary format into a standard logfile format readable by web analytics software.
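
As an illustration of the kind of conversion described above, the following is a minimal sketch that turns CSV records into lines in the NCSA Common Log Format, one widely used standard logfile format. It is not Clickstream's converter: the column names (timestamp, client_ip, method, url, protocol, status, bytes) and the function names are hypothetical, since the actual superlog schema is not described here.

```python
import csv
import io
from datetime import datetime, timezone


def csv_row_to_clf(row: dict) -> str:
    """Format one CSV record as a Common Log Format line.

    The column names are assumptions made for this sketch only.
    """
    # Parse an ISO 8601 timestamp and normalise it to UTC.
    ts = datetime.fromisoformat(row["timestamp"]).astimezone(timezone.utc)
    stamp = ts.strftime("%d/%b/%Y:%H:%M:%S %z")
    request = f'{row["method"]} {row["url"]} {row["protocol"]}'
    # The two "-" fields are the identd and authenticated-user placeholders.
    return f'{row["client_ip"]} - - [{stamp}] "{request}" {row["status"]} {row["bytes"]}'


def convert(csv_text: str) -> list:
    """Convert CSV text with a header row into a list of log lines."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [csv_row_to_clf(row) for row in reader]


sample = (
    "timestamp,client_ip,method,url,protocol,status,bytes\n"
    "2005-03-14T09:26:53+00:00,192.0.2.10,GET,/index.html,HTTP/1.1,200,5120\n"
)
for line in convert(sample):
    print(line)
```

Each CSV row becomes one log line, so an analytics suite that reads standard web server logs could ingest the result without knowing anything about the original format.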

The Architecture

  • Datasherpa filter
    The Datasherpa software is installed on or in front of the web server. The filter becomes part of your web server during the initial install and works transparently to instrument pages and collect data.
    Standard web pages are then automatically instrumented with a data collection algorithm as they leave the server, so all pages are tracked even when content changes frequently or when there are many pages to tag. The Datasherpa filter both inserts the tracking algorithm and collects all the data.
  • Datasherpa Log Processor
    The log processing server takes in the web logfiles generated by the filter(s), then merges and processes them so that they can be read by the reporting tools in use. The transfer of files to and from the log processor, and on to the reporting tool, is fully automatic.
  • Data Transmission
    Two methods are provided for transferring files from the web servers to the server where the logs are processed for the target reporting tool or BI software:
    • Transfer can be via FTP or HTTP
    • FTP transfer can be compressed via GNU zip
    • FTP transfer can be encrypted via SFTP (Windows) or Blowfish
    • HTTP transfer can be encrypted via SSL
    • The FTP Service is used to push log files from the web server to the log processing server
    • The HTTP Service is used to pull log files from the web server
    • The FTP Service can also be used to transfer processed files to some other location where, for example, they may be ingested into a reporting tool
  • Merge Service
    The Merge Service is used in a load-balanced environment, where the web logs for a particular web site come from more than one server. It merges the logs into one combined log ready for the log processor.
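
The merge step can be pictured with a short sketch. It assumes, purely for illustration, that each server's log is already in time order and uses Common Log Format timestamps; the function names are hypothetical and this is not the Merge Service's actual implementation.

```python
import heapq
from datetime import datetime


def clf_timestamp(line: str) -> datetime:
    """Extract the bracketed timestamp from a Common Log Format line."""
    start = line.index("[") + 1
    end = line.index("]", start)
    return datetime.strptime(line[start:end], "%d/%b/%Y:%H:%M:%S %z")


def merge_logs(*logs):
    """Merge per-server logs, each already sorted by time, into one list.

    heapq.merge reads the inputs lazily and always yields the entry with
    the earliest timestamp next, so the combined log stays in time order.
    """
    return list(heapq.merge(*logs, key=clf_timestamp))


server_a = [
    '10.0.0.1 - - [14/Mar/2005:09:26:53 +0000] "GET /a HTTP/1.1" 200 10',
    '10.0.0.1 - - [14/Mar/2005:09:26:55 +0000] "GET /c HTTP/1.1" 200 10',
]
server_b = [
    '10.0.0.2 - - [14/Mar/2005:09:26:54 +0000] "GET /b HTTP/1.1" 200 10',
]
for line in merge_logs(server_a, server_b):
    print(line)
```

Because each input is already ordered, a k-way merge like this avoids re-sorting the full combined log, which matters when many load-balanced servers each produce large files.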