Digital Continuity:
Record Classification and Retention on
Shared Drives and Email Vaults

VIDALIS, Stilianos; Angelopoulou, Olga; Emmanuel, Lesly

VIDALIS, Stilianos, Angelopoulou, Olga and Emmanuel, Lesly (2011) Digital Continuity: Record Classification and Retention on Shared Drives and Email Vaults. Project Report. Welsh Government, Cardiff, UK.

Abstract or description

In 2007 the UK government identified several objectives for improving the storage of public sector information. In particular, and of direct relevance to this project, it wanted to:
- improve the responsiveness to demands for public sector information
- ensure the most appropriate supply of information for reuse
- improve the supply of information for reuse
- promote the innovative use of public sector information.

The aim of this project was to mine, categorise and classify information from a heterogeneous large-scale computer infrastructure and then store the search results in a forensically sound manner. Duplicate information was to be identified for destruction, and the process was designed so that it could be implemented without disrupting staff operations.
The test data was a 217 GB sample (810,000 files) taken from the Welsh Government (WG) shared drives and email vault. The records largely related to the work of the Department of Education and Skills, though 25% of the sample was taken from the wider organisation to ensure that the classification system used was useful over a broad range of subjects. The test data was stored in an isolated test environment with virtualised structures. All development work within the project took place within the test environment.

De-duplication of the test data was achieved. Some 35.88% of the files were identified as duplicates. Removing these files resulted in a saving of 29.49% of physical space. After one pass of the data, it was possible to generate usable metadata for 75.7% of the de-duplicated data set. This became the rich data set. The retention policies of the WG were used to design queries and rules for analysing the rich data set.
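The abstract does not say how duplicates were detected. As an illustration only, a common approach is to compare content hashes: the minimal sketch below assumes SHA-256 over file contents and keeps the first copy encountered. The function names (`sha256_of`, `dedup_report`) and the report fields are hypothetical and are not taken from the project.

```python
import hashlib
import os


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files are not loaded into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def dedup_report(root):
    """Walk `root`, keep the first file seen per content hash, count the rest."""
    first_seen = {}   # content hash -> path of the copy that is kept
    duplicates = []   # paths whose content matches an earlier file
    total_files = total_bytes = duplicate_bytes = 0

    for folder, subdirs, names in os.walk(root):
        subdirs.sort()  # deterministic traversal order
        for name in sorted(names):
            path = os.path.join(folder, name)
            size = os.path.getsize(path)
            total_files += 1
            total_bytes += size
            key = sha256_of(path)
            if key in first_seen:
                duplicates.append(path)
                duplicate_bytes += size
            else:
                first_seen[key] = path

    return {
        "total_files": total_files,
        "duplicate_files": len(duplicates),
        "duplicate_paths": duplicates,
        # Share of files that are duplicates, and share of bytes they occupy.
        "file_pct": 100 * len(duplicates) / total_files if total_files else 0.0,
        "space_pct": 100 * duplicate_bytes / total_bytes if total_bytes else 0.0,
    }
```

The two percentages are reported separately because they need not agree: a file share (35.88%) higher than the space share (29.49%), as in this project, indicates that the duplicated files were smaller on average than the rest of the sample.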

It was possible to extract 65% of the files in the rich data set for long-term retention, together with their metadata, in a format that would allow transfer to the WG Electronic Document and Record Management System (EDRMS, known as iShare within the WG). This translates to 55% of the de-duplicated data set. Further analysis of the rich data set would have produced a better extraction rate. This would have been further facilitated by the use of knowledge extraction applications such as Pingar.
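The queries and rules derived from the WG retention policies are not reproduced in this record. Purely as an illustration of the idea, a rule can be modelled as a predicate over a file's metadata paired with a retention decision, with the first matching rule winning. Everything in the sketch below is hypothetical: the metadata fields (`doc_type`, `extension`), the rule names, the decisions and the `needs_review` default are invented and do not reflect actual WG policy.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Rule:
    name: str                                # label for audit purposes
    matches: Callable[[Dict[str, str]], bool]  # predicate over file metadata
    decision: str                            # e.g. "retain" or "destroy"


def classify(
    record: Dict[str, str],
    rules: List[Rule],
    default: str = "needs_review",
) -> Tuple[Optional[str], str]:
    """Return (rule name, decision) for the first matching rule."""
    for rule in rules:
        if rule.matches(record):
            return rule.name, rule.decision
    # No rule matched: leave the file for further analysis.
    return None, default


# Hypothetical rules for illustration only.
EXAMPLE_RULES = [
    Rule("policy-document", lambda r: r.get("doc_type") == "policy", "retain"),
    Rule("temp-file", lambda r: r.get("extension") in {".tmp", ".bak"}, "destroy"),
]
```

Returning the rule name alongside the decision keeps a record of why each file was selected, which is in keeping with the project's aim of storing results in a forensically sound manner.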

Item Type:	Monograph or Report (Project Report)
Additional Information:	The report was distributed to all the Governmental Departments and is currently being considered by the National Archives for further development.
Faculty:	Previous Faculty of Computing, Engineering and Sciences > Computing
Depositing User:	Stilianos VIDALIS
Date Deposited:	01 Jul 2013 08:29
Last Modified:	24 Feb 2023 13:39
Related URLs:	http://www.nationalarchives.gov.uk/infor...
URI:	https://eprints.staffs.ac.uk/id/eprint/1316
