Open Discussion about Data Incomparability Challenges, Successes and Ideas
See bottom for instructions about adding subjects or comments.
General comments?
Caryn Anderson - 3nov07
Feel free to post thoughts about how to improve data comparability, to discuss your challenges, or to respond to something someone else has said here.
Some data transfer and manipulation issues
Caryn Anderson - 3nov07
From Caryn Anderson's 2006 essay Electronic Resource Usage Statistics: Defining a Complex Problem (DOC)
Problems with data collection processes and data transfer are substantial, but they can be described as problems of labor (time and expertise). If you had enough people to do the work, you could collect all the data that was available. A much more fundamental problem with vendor-collected e-resource usage statistics is that they do not measure the same things, in the same ways, in the same increments. As a result, it is not possible to get a comprehensive view of usage of an institution’s entire electronic resources collection because the data are not comparable. This incomparability problem is a product of two main issues: diversity of resources and lack of standards in measurement. To aggregate and/or compare usage of different resources, at the very least there must be agreement on:
• definitions of elements and measures (e.g. what is a “resource,” a “session,” an “item,” etc.)
• scope of coverage (e.g. full-text article retrieval as a single number versus separate numbers for full-text HTML views, full-text PDF views; or article retrievals viewed vs. printed vs. e-mailed, etc.)
• temporal increments of measures (e.g. hourly, daily, weekly, monthly, quarterly)
• types of measures (i.e. how do/can you compare/measure accesses to citations/abstracts, datasets, books, book chapters, book pages, etc.)
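As a rough illustration of what agreeing on scope and temporal increments involves, the sketch below rolls two differently shaped vendor feeds into one monthly full-text figure per resource. The record keys (resource, day, full_text, html, pdf) and the sample numbers are hypothetical, not taken from any vendor's format, and simply adding HTML and PDF views is itself an assumption that may overcount users who open both formats.

```python
from collections import defaultdict
from datetime import date


def to_monthly_full_text(records):
    """Roll usage records up to {(resource, 'YYYY-MM'): full-text total}.

    Each record is a dict with hypothetical keys: 'resource', 'day'
    (a datetime.date), and either 'full_text' (one combined number)
    or 'html' and 'pdf' (separate numbers).
    """
    totals = defaultdict(int)
    for rec in records:
        if "full_text" in rec:
            # Vendor already reports a single full-text figure.
            count = rec["full_text"]
        else:
            # Assumption: HTML + PDF views approximate full-text retrievals.
            count = rec.get("html", 0) + rec.get("pdf", 0)
        month = rec["day"].strftime("%Y-%m")
        totals[(rec["resource"], month)] += count
    return dict(totals)


# Made-up sample data: one vendor reports daily HTML/PDF splits,
# another reports a single full-text number.
records = [
    {"resource": "Journal A", "day": date(2006, 3, 1), "html": 4, "pdf": 6},
    {"resource": "Journal A", "day": date(2006, 3, 15), "html": 1, "pdf": 2},
    {"resource": "Journal B", "day": date(2006, 3, 31), "full_text": 20},
]
monthly = to_monthly_full_text(records)
print(monthly)
```

Finer increments can always be summed into coarser ones, but not the reverse: a vendor that only reports quarterly totals cannot be broken down into months this way.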
There are additional issues related to the reliability of measurements that concern many vendors and institutions. One example is the double-counting of access when a user views an article and then decides to send it to themselves and/or colleagues. Should this “retrieval” be counted once or “n” times? Technical issues also complicate matters. If a user selects a PDF to view and is impatient with the speed of the download, they may click again, and some systems count this as another access. To facilitate better access for all users, some systems are designed to reallocate “sessions” based on activity. Users who have not recently interacted with a page (e.g. they are just reading) are timed out of their session so that another user can have access. When the first user returns to activity by clicking on their page, their session is reinitiated seamlessly, but this affects “session” counts.
Electronic journals, bibliographic citation databases and, very recently (March 2006), books and reference works have benefited greatly from the establishment of the COUNTER code of practice. Vendors certified as COUNTER-compliant provide usage data according to agreed definitions, scope and temporal increments for certain types of measures. As such, an institution can aggregate and compare statistics across vendors and resources for all COUNTER-compliant data. Unfortunately, there are significant limitations:
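The impatient-click case above can be made concrete with a small sketch that collapses repeated requests by the same user for the same item when they arrive within a short window. The event layout and the 10-second default are illustrative assumptions only; they are not a description of how any particular vendor or standard counts.

```python
from datetime import datetime, timedelta


def count_retrievals(events, window_seconds=10):
    """Count retrievals, ignoring rapid repeat clicks.

    events: iterable of (user_id, item_id, timestamp) tuples (hypothetical layout).
    A click is not counted if the same user requested the same item
    no more than `window_seconds` before it.
    """
    window = timedelta(seconds=window_seconds)
    last_seen = {}  # (user_id, item_id) -> timestamp of the previous click
    count = 0
    for user, item, ts in sorted(events, key=lambda e: e[2]):
        key = (user, item)
        prev = last_seen.get(key)
        if prev is None or ts - prev > window:
            count += 1
        # Always remember the latest click, counted or not.
        last_seen[key] = ts
    return count


t0 = datetime(2006, 3, 1, 9, 0, 0)
events = [
    ("u1", "article-1", t0),
    ("u1", "article-1", t0 + timedelta(seconds=4)),   # impatient second click
    ("u2", "article-1", t0 + timedelta(seconds=2)),   # different user
    ("u1", "article-1", t0 + timedelta(seconds=60)),  # later, separate retrieval
]
filtered = count_retrievals(events)
unfiltered = count_retrievals(events, window_seconds=0)
print(filtered, unfiltered)
```

Two systems that differ only in this window (or that apply no window at all) will report different totals for identical user behavior, which is one reason raw counts from different vendors cannot simply be added together.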
• Non-compliance
o Some vendors are simply not compliant yet, and others have no intention of becoming COUNTER-compliant. While these vendors may provide usage data labeled as “sessions” or “items,” there is no guarantee that the figures are calculated the same way as those from COUNTER-compliant vendors, and therefore they cannot be aggregated or compared.
• Low compliance
o Comparatively few vendors are COUNTER-compliant. The number is growing but is still small when the full scope of resources and providers is considered.
• Selective compliance
o Vendors can be selectively COUNTER-compliant. There are five types of COUNTER reports for electronic journals and databases, and a code of practice for books and reference works was released in March 2006. Vendors need not be compliant in all of them. This may require double work in data collection, and institutions must be careful not to mix compliant and non-compliant data from the same vendor.
• Data transfer and manipulation
o The COUNTER standards are primarily focused on definitions, calculation protocols and guidelines for reporting content. In recent years they have developed an XML DTD, but providing COUNTER reports via XML is not required for a vendor to be certified as COUNTER-compliant.
• Ineligible resources
o There remain some resources that are not covered by the COUNTER standard. These resources cannot be thoroughly integrated into any comprehensive evaluation process.
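One way to respect these limitations when combining data is to track compliance per vendor and per report type, total only the compliant rows, and set everything else aside for separate review. The sketch below assumes a hypothetical row layout and made-up vendor names, report labels and counts; it is not based on any actual COUNTER report format.

```python
def aggregate_comparable(rows, compliant):
    """Total only rows whose (vendor, report_type) pair is marked compliant.

    rows: list of (vendor, report_type, count) tuples (hypothetical layout).
    compliant: set of (vendor, report_type) pairs the institution has
    verified as COUNTER-compliant.
    Returns (totals_by_report_type, excluded_rows).
    """
    totals = {}
    excluded = []
    for vendor, report_type, count in rows:
        if (vendor, report_type) in compliant:
            totals[report_type] = totals.get(report_type, 0) + count
        else:
            # Kept apart so non-comparable figures are never mixed in.
            excluded.append((vendor, report_type, count))
    return totals, excluded


# Vendor B is compliant for journal reports only (selective compliance);
# Vendor C is not compliant at all. All names and numbers are made up.
compliant = {
    ("Vendor A", "journal"),
    ("Vendor A", "database"),
    ("Vendor B", "journal"),
}
rows = [
    ("Vendor A", "journal", 120),
    ("Vendor A", "database", 40),
    ("Vendor B", "journal", 80),
    ("Vendor B", "database", 15),
    ("Vendor C", "journal", 300),
]
totals, excluded = aggregate_comparable(rows, compliant)
print(totals)
print(excluded)
```

The excluded list makes the gap visible: any collection-wide figure built from the totals covers only the compliant portion of the collection.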
Add a topic/subject (by using a double exclamation point)
and/or add a horizontal line (by entering three dashes), put your name/date below and your comments/contributions below that.