Duplicate images are costing Canberra's major institutions serious time and money. Across the ACT's network of public repositories, digital libraries and government agencies, the unchecked accumulation of duplicate photographs, scanned documents and archived visual records has become a concrete operational headache — one that specialists say is getting harder to ignore as storage costs climb and federal digitisation programs expand.
The issue surfaced publicly in recent months as the National Archives of Australia, headquartered on Queen Victoria Terrace in Parkes, pushed further into its multi-year digitisation program. The Archives holds more than 12 kilometres of physical records, and as those materials move into digital form, duplicate image files are emerging as a systemic byproduct — the result of batch scanning errors, legacy software migrations and contributions from multiple agencies with no unified deduplication standard.
At the University of Canberra's library on Kirinari Street in Bruce, staff working with the institutional repository have flagged similar frustrations. The university's digital collections span research outputs, historical photographs and administrative records. Without automated hash-matching tools (software that computes a digest of each file's bytes and flags files with matching digests as exact copies), duplicates are typically found only during manual audits, which are labour-intensive and infrequent.
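Exact-copy detection of the kind described above can be sketched in a few lines. The example below is a minimal illustration, not any institution's actual tooling: it assumes SHA-256 as the digest and a plain directory tree as the collection, and the function names `sha256_of_file` and `find_exact_duplicates` are hypothetical.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_exact_duplicates(root: Path) -> dict[str, list[Path]]:
    """Group every file under root by digest and keep only groups of two or more."""
    groups: dict[str, list[Path]] = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if path.is_file():
            groups[sha256_of_file(path)].append(path)
    return {digest: paths for digest, paths in groups.items() if len(paths) > 1}
```

Two files land in the same group only if their bytes are identical, so a re-saved or re-compressed copy of the same photograph would not be flagged by this approach.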
Technology consultants advising ACT government agencies have pointed to perceptual hashing as the more sophisticated solution. Unlike exact-match tools, perceptual hashing can identify images that are visually identical but differ slightly in file format, compression or resolution — the kind of near-duplicates that often slip through standard checks and represent the bulk of the problem in large collections.
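To illustrate the idea, the sketch below implements one simple perceptual hash, the average hash, in plain Python. It is an assumption-laden simplification: it works on images already decoded into a grid of grayscale values (0 to 255), whereas a real pipeline would use an imaging library to decode and resize files, and the threshold of 5 differing bits is an arbitrary illustrative choice. All function names are hypothetical.

```python
def average_hash(pixels: list[list[int]], hash_size: int = 8) -> int:
    """Reduce a grayscale grid to hash_size x hash_size block averages,
    then emit one bit per block: 1 if the block is brighter than the mean."""
    height, width = len(pixels), len(pixels[0])
    cells = []
    for row in range(hash_size):
        y0 = row * height // hash_size
        y1 = max((row + 1) * height // hash_size, y0 + 1)
        for col in range(hash_size):
            x0 = col * width // hash_size
            x1 = max((col + 1) * width // hash_size, x0 + 1)
            block = [pixels[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            cells.append(sum(block) / len(block))
    mean = sum(cells) / len(cells)
    bits = 0
    for value in cells:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions at which two hashes differ."""
    return bin(a ^ b).count("1")


def is_near_duplicate(a: list[list[int]], b: list[list[int]], max_distance: int = 5) -> bool:
    """Treat two images as near-duplicates if their hashes differ in few bits."""
    return hamming_distance(average_hash(a), average_hash(b)) <= max_distance
```

Because the hash depends on coarse brightness patterns rather than exact bytes, the same picture stored at a different resolution or with slightly different brightness tends to produce the same or a very similar hash, while an unrelated image produces a distant one.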
The Cost Question
Storage is not abstract. Cloud storage pricing for large institutions typically runs between $20 and $50 per terabyte per month for enterprise-grade solutions, and government-accredited secure storage commands a premium above that. For an agency holding tens of thousands of duplicate image files — each potentially several megabytes — the redundant storage bill across a financial year becomes measurable in tens of thousands of dollars, specialists in the field have noted, though exact figures vary significantly by agency and contract.
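The raw storage component of that bill is simple arithmetic, sketched below. The function name and the example inputs are illustrative assumptions, not figures from any agency, and the calculation uses decimal units (1 TB = 1,000,000 MB).

```python
def annual_duplicate_storage_cost(
    duplicate_files: int,
    avg_file_mb: float,
    price_per_tb_month: float,
    months: int = 12,
) -> float:
    """Estimate the yearly cost of storing redundant files.

    Uses decimal units: 1 TB = 1,000,000 MB.
    """
    terabytes = duplicate_files * avg_file_mb / 1_000_000
    return terabytes * price_per_tb_month * months


# Hypothetical inputs: 50,000 duplicates averaging 5 MB at $50 per TB per month.
example = annual_duplicate_storage_cost(50_000, 5, 50)  # 0.25 TB -> 150.0 per year
```

The estimate scales linearly with file count and file size. On these illustrative inputs the raw storage line is small, so totals in the range quoted above would imply far larger file counts or file sizes, or costs beyond raw storage.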
The ACT Government's Digital Strategy, last updated in 2024, nominates data quality and interoperability as priority areas for the territory's public sector. Deduplication sits squarely within that agenda, though no dedicated funding line for image deduplication has been publicly announced as of July 2026.
At the Australian Institute of Aboriginal and Torres Strait Islander Studies on Lawson Crescent in Acton, the stakes carry an additional dimension. The institute holds one of the country's largest collections of Indigenous photographs, audio and film, and duplicates in that context raise not just efficiency questions but cultural ones — about provenance, consent records and which version of a record carries the authoritative metadata.
Practitioners broadly recommend a staged approach: first, an audit using automated tools to map the scale of duplication within a given collection; second, a policy decision on which instance of a duplicate becomes the canonical record and which is retired; and third, the implementation of ingest-level controls so new duplicates do not enter the system. Several consultants working with Canberra agencies have suggested the audit phase alone, for a mid-sized institutional collection, can take between three and six months depending on collection size and existing metadata quality.
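The three stages can be sketched as follows. This is a minimal illustration under stated assumptions: duplicates are exact copies identified by SHA-256, the canonical-record policy (keep the copy with the most metadata fields) is a hypothetical example of the policy decision rather than a recommendation, and the `Record`, `audit`, `choose_canonical` and `IngestGate` names are invented for this sketch.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class Record:
    identifier: str                      # hypothetical catalogue ID
    content: bytes                       # raw file bytes
    metadata: dict = field(default_factory=dict)

    @property
    def digest(self) -> str:
        return hashlib.sha256(self.content).hexdigest()


def audit(records: list[Record]) -> dict[str, list[Record]]:
    """Stage 1: group records by digest and report only groups with duplicates."""
    groups: dict[str, list[Record]] = {}
    for record in records:
        groups.setdefault(record.digest, []).append(record)
    return {d: g for d, g in groups.items() if len(g) > 1}


def choose_canonical(group: list[Record]) -> Record:
    """Stage 2 (assumed policy): keep the record with the most metadata fields,
    breaking ties by identifier so the choice is deterministic."""
    return min(group, key=lambda r: (-len(r.metadata), r.identifier))


class IngestGate:
    """Stage 3: refuse new content whose digest is already held."""

    def __init__(self, existing: list[Record] = ()):
        self._seen = {record.digest for record in existing}

    def admit(self, record: Record) -> bool:
        if record.digest in self._seen:
            return False
        self._seen.add(record.digest)
        return True
```

In practice the policy in `choose_canonical` is where institutional judgment enters, since it determines which copy's metadata is treated as authoritative.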
For Canberra's government agencies and research institutions, the practical next step is procurement. The National Archives is understood to be evaluating vendor options, though no contract has been publicly announced. Institutions still relying on manual workflows are being advised not to wait — the longer deduplication is deferred, the more expensive and complex the eventual clean-up becomes.