Digital Preservation
Key Challenges in Sustainable Digital Preservation
Digital preservation faces mounting obstacles: growing data volumes, rising maintenance costs, and reliance on specialist staff. Traditional systems combine proprietary storage, specialised workflows, and high licensing and maintenance costs. These factors threaten long-term sustainability, especially when future budgets, staffing, and technologies are uncertain.
The sustainability problem for digital preservation is compounded by:
Dependency on niche expertise.
Costly, inflexible systems.
Unpredictable scalability and exit strategies.
Risks of data inaccessibility or loss over time.
Sustainability, in this context, means minimising dependence on uncontrollable variables, enabling organisations to adapt to change without risking preservation integrity.
A Sustainable Digital Preservation Workflow Model
A more inclusive, open, and scalable digital preservation model is required. The core strategy is to embed preservation activities for digital assets into normal data workflows, making them accessible to all stakeholders, regardless of their digital preservation expertise.
What Does a Digital Preservation Workflow Look Like?
Ingest
Files are submitted through a Deposit Service, which checks them for duplicates and viruses and identifies their file formats.
Metadata is extracted and verified.
Fedora can serve as the repository, storing files in OCFL (Oxford Common File Layout) in the cloud, which keeps versions and metadata alongside the binary content.
Fedora protects data integrity through checksums (SHA-256 and MD5) and transactional processes.
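The checksum and duplicate checks at ingest can be illustrated with a minimal Python sketch. It is not the Deposit Service's or Fedora's actual code; the function names compute_checksums and find_duplicates are hypothetical, and only the standard hashlib module is assumed. Each file is streamed once to produce both digests, and duplicates are treated as files sharing a SHA-256 value.

```python
import hashlib
from pathlib import Path


def compute_checksums(path: Path, chunk_size: int = 1024 * 1024) -> dict[str, str]:
    """Stream a file once and return its SHA-256 and MD5 hex digests."""
    sha256 = hashlib.sha256()
    md5 = hashlib.md5()
    with path.open("rb") as handle:
        # Read in fixed-size chunks so large files never sit fully in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            sha256.update(chunk)
            md5.update(chunk)
    return {"sha256": sha256.hexdigest(), "md5": md5.hexdigest()}


def find_duplicates(paths: list[Path]) -> dict[str, list[Path]]:
    """Group files by SHA-256 and keep only groups with more than one file."""
    by_digest: dict[str, list[Path]] = {}
    for path in paths:
        digest = compute_checksums(path)["sha256"]
        by_digest.setdefault(digest, []).append(path)
    return {d: group for d, group in by_digest.items() if len(group) > 1}
```

Recording both digests at deposit gives later stages a fixed reference value to compare against.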
Preservation
Verified files are then stored in a Fedora repository, with preservation copies sent to Cloud Cold Storage for the long term.
A Chain of Integrity then tracks and verifies checksums from source to preservation storage.
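One way to picture a chain of integrity is as an ordered list of checksum observations for a file, each compared with the value recorded at the source. The sketch below is an assumption about how such a check might be modelled, not a description of the real service; the class IntegrityChain and the stage names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class IntegrityChain:
    """Ordered checksum observations for one file, starting at the source."""
    file_id: str
    links: list[tuple[str, str]] = field(default_factory=list)  # (stage, sha256)

    def record(self, stage: str, sha256: str) -> None:
        # Normalise case so hex digests compare reliably.
        self.links.append((stage, sha256.lower()))

    def first_break(self) -> Optional[str]:
        """Return the first stage whose digest differs from the source digest."""
        if not self.links:
            return None
        _, expected = self.links[0]
        for stage, digest in self.links[1:]:
            if digest != expected:
                return stage
        return None

    def is_intact(self) -> bool:
        # An empty chain proves nothing, so it is not reported as intact.
        return bool(self.links) and self.first_break() is None
```

A caller would record a digest at each hand-off (for example "deposit", "repository", "cold-storage", all illustrative names) and flag the file for review as soon as first_break returns a stage.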
Access & Management
The Workbench UI offers a central access point for users to monitor workflows, appraise content, update metadata, flag issues, generate reports, and manage users.
Access platforms like CUDL (Cambridge University Digital Library) can provide public availability as needed.
Architecture & Tools
This approach to digital preservation could be built from microservices, allowing the system to scale up or down easily.
Components are open source and standards-compliant, ensuring transparency and flexibility. Storage hardware and platforms are designed for ease of access and costed for the long haul.
Key tools include:
Fedora 6 as the repository platform
OCFL on S3 for file structure
Workbench UI for management
Cloud Cold Storage for low-cost, sustainable, read-optimised long-term storage.
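To give a sense of what OCFL records for each object, the following sketch builds a minimal, single-version inventory in the general shape of an OCFL inventory.json. It is illustrative only: Fedora writes its own inventories, the function build_inventory and the example identifier are hypothetical, and the field layout and the choice of sha512 should be checked against the OCFL specification before any real use.

```python
import hashlib
import json
from datetime import datetime, timezone


def build_inventory(object_id: str, files: dict[str, bytes], user: str, message: str) -> dict:
    """Build a minimal single-version OCFL-style inventory for in-memory files."""
    manifest: dict[str, list[str]] = {}
    state: dict[str, list[str]] = {}
    for logical_path, content in sorted(files.items()):
        digest = hashlib.sha512(content).hexdigest()
        # The manifest maps each digest to content paths under the object root.
        manifest.setdefault(digest, []).append(f"v1/content/{logical_path}")
        # The version state maps each digest to logical paths in that version.
        state.setdefault(digest, []).append(logical_path)
    return {
        "id": object_id,
        "type": "https://ocfl.io/1.1/spec/#inventory",
        "digestAlgorithm": "sha512",
        "head": "v1",
        "manifest": manifest,
        "versions": {
            "v1": {
                "created": datetime.now(timezone.utc).isoformat(timespec="seconds"),
                "state": state,
                "message": message,
                "user": {"name": user},
            }
        },
    }


if __name__ == "__main__":
    inventory = build_inventory(
        "urn:example:object-1",  # hypothetical identifier
        {"report.txt": b"example content"},
        user="Depositor",
        message="Initial deposit",
    )
    print(json.dumps(inventory, indent=2))
```

Because every version's state refers to content by digest, the inventory itself carries the information needed to verify files and to reconstruct earlier versions.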
A Future-Proof, Inclusive Model
This model eliminates the bottleneck of requiring a “digital preservation person” to access or manage preserved content. Instead, by making the repository a shared workspace accessible through the Workbench, it democratises access and responsibility. This inclusive, automation-driven system, backed by Cloud Cold Storage, provides a resilient response to the sustainability challenge. By providing dedicated media for each workflow, tenant, or dataset, it supports the preservation of digital assets, and Cloud Cold Storage can recover preserved assets without extensive delay.
In summary, by prioritising openness, automation, shared responsibility, low-cost and environmentally aware storage, and adaptability, this approach to digital preservation infrastructure offers a blueprint for how institutions can sustainably manage and preserve digital assets for the long haul in the face of escalating scale, cost, and complexity.
This approach to a sustainable and open digital preservation model is similar to that deployed by Cambridge University Library, which has published details of its implementation.
Companies like Arkivum and Penwern can also set up managed digital preservation models.