WWO Statistics

From Digital Scholarship Group

Overall usage statistics for WWO (from a WWP manager's perspective) are accessible at

http://wwo.wwp.neu.edu/usage/awstatsMain/

Subscribers should be directed to http://wwo.wwp.northeastern.edu/usage/awstats/, where they should log in with the following credentials:

wwostats !stats!


Interpreting AWStats data

  • pages are requests for new pages (e.g. a user clicking on something to go somewhere or to make a change to the page they're reading)
  • visits are continuous sequences of interactions by a single user
  • hits are any HTTP requests for a resource that AWStats hasn't been told to ignore. These are less relevant, because a single page is constructed out of multiple components. We've filtered out some basic things that are extraneous (e.g. stylesheets, image files, JavaScript), but we're not sure exactly what is still being recorded.

Note that this stats package uses outmoded ideas about how web sites and browsers work. Statistics about "unique visitors" will always dramatically undercount usage, since they count unique IP addresses: most universities use dynamic IP addressing, so a limited range of IP addresses gets reassigned to multiple users, and everyone behind a proxy server looks like a single visitor.
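To illustrate the undercounting, here is a minimal Python sketch using made-up request records. The user names and IP addresses are hypothetical, and this is not how AWStats itself is implemented; it only shows why counting distinct IP addresses reports fewer visitors than there are actual users.

```python
# Hypothetical request log: (actual user, IP address seen by the server).
# Real server logs have no "user" column; it is included only to show the gap.
requests = [
    ("alice", "192.0.2.10"),  # alice and bob reach the site through one proxy
    ("bob", "192.0.2.10"),
    ("carol", "192.0.2.11"),  # dynamic address, later reassigned to dave
    ("dave", "192.0.2.11"),
    ("erin", "192.0.2.12"),
]

actual_users = {user for user, _ in requests}
unique_ips = {ip for _, ip in requests}

print(len(actual_users))  # 5 people used the site
print(len(unique_ips))    # 3 "unique visitors" by IP address
```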


For IREL, if they request data on "searches" and "sessions":

  • "searches" = any access or page request; their notion of a 'search' is based on database access
  • searches = AWStats "pages"
  • sessions = AWStats "visits"


Some very basic things about awstats configuration

Awstats directory structure

Awstats uses two directories, ../awstats/ and ../awstats_priv, to store the configuration information that governs who gets to see which statistics and how those statistics are reported and displayed. I believe the former governs overall stats, while the latter governs the behavior of workstation-level stats.

Configuration files

Every subscribing institution has a unique configuration file that exists in both directories and has the general form awstats.[institution_name].conf (e.g. awstats.Brown_University.conf). These files are generated from a single template file, awstats.model.conf, which lives in ../awstats/, every time the IP export process is run. (I believe this happens as part of the cron job that runs following IP updates, though I'm not sure and need to find out more about that process.) This means that changes made to an individual conf file will be overwritten every time IP addresses are updated on papa.
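As a small illustration of the naming scheme, the following Python sketch builds a conf file name from an institution name and recovers the institution from a file name. The function names are hypothetical and are not part of awstats or of the WWP scripts; the only facts taken from this page are the awstats.[institution_name].conf pattern and the awstats.model.conf template.

```python
import re

# Naming scheme described above: awstats.[institution_name].conf
CONF_NAME = re.compile(r"^awstats\.(?P<institution>.+)\.conf$")


def conf_file_name(institution):
    """Return the conf file name for an institution (hypothetical helper)."""
    return f"awstats.{institution}.conf"


def institution_from_conf(file_name):
    """Return the institution name, or None for the template or other files."""
    match = CONF_NAME.match(file_name)
    if match is None:
        return None
    name = match.group("institution")
    # awstats.model.conf is the shared template, not an institution.
    return None if name == "model" else name


print(conf_file_name("Brown_University"))
print(institution_from_conf("awstats.Brown_University.conf"))
```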

Changing who can access institutional statistics

The makeConfFiles.pl script is responsible for actually generating configuration files from this template and setting the appropriate IP access for each individual institution. It pulls IP information from the textbase_ipaddrs file that is produced during the IP export process (see this page for details) and also creates the bash script and the perl script that process the statistics and run nightly as part of a cron job. [Actually, upon further inspection, it isn't clear that this process runs nightly. It is possible that makeConfFiles.pl needs to be invoked manually to generate a new set of configuration files, or that it only runs once each month when awstats collects monthly statistics.] Most of the script is devoted to parsing IP addresses and ranges properly and adding them to individual conf files, but it also contains a section that governs global IP addresses that you want to add to every configuration file, for instance the subnet that WWP/STG computers are on.

To update this global access information, go to the line where the $stgSubnet variable is assigned a value and change/add whatever IP you want to permit access. Multiple IP addresses can be added to the line, with a single space separating each IP or IP range. When you're done, just leave a comment in the file indicating the change.
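Before pasting a new value into that line, it can help to sanity-check the list. The Python sketch below is only an illustration under assumptions: the function name is hypothetical, and the "start-end" notation for ranges is assumed, since this page does not say how makeConfFiles.pl writes ranges. It splits a space-separated list and checks that every entry is a valid IP address or a valid range.

```python
import ipaddress


def check_ip_entries(value):
    """Validate a space-separated list of IPs and start-end ranges.

    Returns the list of entries; raises ValueError on the first bad entry.
    The "start-end" range notation is an assumption, not taken from the script.
    """
    entries = value.split()
    for entry in entries:
        parts = entry.split("-")
        if len(parts) > 2:
            raise ValueError(f"too many '-' in entry: {entry}")
        # ip_address() raises ValueError for anything that is not a valid IP.
        addrs = [ipaddress.ip_address(part) for part in parts]
        if len(addrs) == 2:
            if addrs[0].version != addrs[1].version:
                raise ValueError(f"mixed IPv4/IPv6 range: {entry}")
            if addrs[0] > addrs[1]:
                raise ValueError(f"range start is after range end: {entry}")
    return entries


print(check_ip_entries("192.0.2.1 192.0.2.10-192.0.2.20"))
```

The addresses shown come from a documentation-only range and are not the real WWP/STG subnet.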

Permitting full-year view of graphical statistics

By default, awstats disables viewing of yearly statistics for individual institutions. It's permitted via the command-line awstats interface, but since our subscribers don't have access to that, the default awstats installation doesn't provide any easy way for them to view information for a complete calendar year.

To permit full-year viewing, simply change the AllowFullYearView setting so that its value is "3" (rather than the default "2") in the appropriate configuration file. Remember, only changes made to the awstats.model.conf template will be reflected in all configuration files, and since configuration files are re-generated periodically, any changes made to individual conf files will be overwritten every time the makeConfFiles.pl script runs.

As of August 14, 2008, the default setting for newly generated conf files will be to permit all institutions to view full-year statistics from the main graphical awstats page. Should this prove to be too taxing on our servers, AllowFullYearView=3 should simply be changed back to AllowFullYearView=2 in awstats.model.conf.
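The change itself is a one-line edit that is normally made by hand. For anyone who wants to script it, here is a minimal Python sketch that rewrites the setting in the text of a conf file. The function name and the sample text are hypothetical; the sample is not the actual contents of awstats.model.conf.

```python
import re

# Matches an uncommented "AllowFullYearView=<value>" line.
SETTING = re.compile(r"^([ \t]*AllowFullYearView[ \t]*=[ \t]*)\S+", re.MULTILINE)


def set_full_year_view(conf_text, value):
    """Return conf_text with AllowFullYearView set to value (hypothetical helper)."""
    new_text, count = SETTING.subn(lambda m: m.group(1) + str(value), conf_text)
    if count == 0:
        raise ValueError("no AllowFullYearView line found")
    return new_text


# Hypothetical fragment standing in for the template file.
sample = "# full-year view setting\nAllowFullYearView=2\n"
print(set_full_year_view(sample, 3))
```

Calling the same function with 2 reverts the change if full-year views turn out to be too taxing.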

Where does awstats actually store its data?

When you view an awstats page in a web browser, the information that awstats is using to generate its graphical display, its lists of referring sites, pages viewed, etc., is pulled from a separate location on papa: /var/www/html/usage/awstats/[name_of_institution]. Each institution has its own directory, within which live a set of text files, one for each month for which awstats has collected statistics. These files are named awstats[mm][yyyy].[name_of_institution].txt.

For instance, Brown's statistics are stored in /var/www/html/usage/awstats/Brown_University/. Within this directory the individual stats file for, say, October of 2007 would be awstats102007.Brown_University.txt. This text file contains all of the information that awstats displays when you view the Brown University statistics for October 2007 via the awstats web interface.
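The naming scheme above can be sketched in Python as follows. The function name is hypothetical; the directory and file name pattern are the ones described on this page.

```python
from pathlib import PurePosixPath

# Location of the per-institution data directories on papa, as described above.
DATA_ROOT = PurePosixPath("/var/www/html/usage/awstats")


def stats_file_path(institution, year, month):
    """Return the path of the monthly stats file for an institution."""
    if not 1 <= month <= 12:
        raise ValueError(f"month must be 1-12, got {month}")
    # File names follow awstats[mm][yyyy].[name_of_institution].txt
    file_name = f"awstats{month:02d}{year:04d}.{institution}.txt"
    return DATA_ROOT / institution / file_name


print(stats_file_path("Brown_University", 2007, 10))
# /var/www/html/usage/awstats/Brown_University/awstats102007.Brown_University.txt
```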