The Australian War Memorial holds more than 10 million catalogue records. The National Library on Parkes Place manages digitised collections running into the hundreds of terabytes. And like almost every major public archive built on legacy database systems over the past three decades, both face a version of the same unglamorous headache: duplicate image files eating storage, distorting search results, and quietly draining operational budgets.
Duplicate image replacement, the process of identifying, consolidating, and systematically removing redundant digital files from institutional repositories, has moved from an IT housekeeping task to a policy concern in cities where public institutions anchor the economy. Canberra, where the federal government is the dominant employer and public-sector digitisation programs have accelerated since 2022, handles the problem differently from its peer capitals, and not always to its advantage.
What the Problem Actually Costs
Cloud storage is cheap in absolute terms: Amazon Web Services and Microsoft Azure both price standard object storage at well under five cents per gigabyte per month. But institutional archives do not deal in gigabytes. The National Archives of Australia, whose preservation facility sits in Mitchell on the city's northern fringe, manages a digital repository measured in petabytes. At that scale, duplicated files are not a quirk; they are a budget line. Estimates from the International Council on Archives suggest that between 15 and 30 per cent of images held in large public repositories may exist in duplicate or near-duplicate form, though figures vary significantly by institution and by how aggressively records were migrated from older systems.
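A back-of-envelope calculation shows why the percentage matters. The Python sketch below is illustrative only. The repository size, the duplicate share, the per-gigabyte price and the number of preservation copies are all assumptions, not figures published by any Canberra institution. It also treats the share of duplicated images as a share of stored bytes, which is a simplification. The function name `duplicate_storage_cost` is hypothetical.

```python
def duplicate_storage_cost(repository_pb: float,
                           duplicate_share: float,
                           price_per_gb_month: float,
                           copies: int = 1) -> float:
    """Estimate the annual storage spend attributable to duplicate files.

    Every argument is an illustrative assumption: repository size in
    petabytes, fraction of stored bytes that are redundant, the cloud
    list price per GB per month, and how many preservation copies are kept.
    """
    gigabytes = repository_pb * 1_000_000  # decimal units: 1 PB = 1,000,000 GB
    duplicate_gb = gigabytes * duplicate_share
    return duplicate_gb * price_per_gb_month * copies * 12  # 12 months a year


# Hypothetical 1 PB repository, assumed US$0.023 per GB-month, three preservation copies
for share in (0.15, 0.30):
    cost = duplicate_storage_cost(1.0, share, 0.023, copies=3)
    print(f"{share:.0%} duplicated: about US${cost:,.0f} a year")
```

On those assumptions, duplicates alone would account for roughly US$124,000 to US$248,000 a year in storage. That figure excludes the staff time needed to manage them, and it grows in step with the repository.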
The ACT government's own digital records framework, updated under the Territory Records Act, requires agencies to maintain single authoritative copies of official documents. In practice, though, enforcement of that standard is patchy across the directorates and agencies that generate records daily.

The Australian National University runs one of the southern hemisphere's larger research data repositories through its Scholarly Communication team on the Acton campus. In 2024 it piloted a perceptual hashing deduplication tool across its image holdings. The pilot's results, presented at an internal research data management forum, showed meaningful redundancy in collections that had been migrated more than once over the previous decade.
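Perceptual hashing comes in several variants, and the ANU pilot's choice is not stated. To illustrate the idea, here is a minimal sketch of one common variant, the difference hash (dHash), written in plain Python. It assumes each image has already been decoded to rows of grayscale brightness values; in practice an imaging library would handle that step. The function names and the 10-bit match threshold are illustrative assumptions, not details of the ANU tool.

```python
from typing import Sequence


def dhash(pixels: Sequence[Sequence[float]], size: int = 8) -> int:
    """Difference hash: one bit per adjacent-cell comparison on a shrunken image.

    `pixels` is a grayscale image as rows of brightness values (0-255).
    The image is box-averaged down to `size` rows by `size + 1` columns,
    then each bit records whether a cell is brighter than its right neighbour.
    """
    height, width = len(pixels), len(pixels[0])
    cols = size + 1

    def span(i: int, parts: int, length: int) -> range:
        # Pixel indices of block i when `length` pixels are split into `parts`
        # blocks; always at least one pixel wide, even for tiny images.
        start = i * length // parts
        return range(start, max((i + 1) * length // parts, start + 1))

    bits = 0
    for r in range(size):
        rows = span(r, size, height)
        cells = []
        for c in range(cols):
            block = [pixels[y][x] for y in rows for x in span(c, cols, width)]
            cells.append(sum(block) / len(block))
        for c in range(size):
            bits = (bits << 1) | int(cells[c] > cells[c + 1])
    return bits  # a 64-bit fingerprint when size == 8


def hamming(a: int, b: int) -> int:
    """Count of differing bits; small distances suggest near-duplicates."""
    return bin(a ^ b).count("1")


def is_near_duplicate(a, b, threshold: int = 10) -> bool:
    # A threshold of 10 differing bits out of 64 is an illustrative assumption.
    return hamming(dhash(a), dhash(b)) <= threshold


# Synthetic 32x32 images: a left-to-right gradient, a uniformly brightened
# copy, and a left-right mirror of the original.
original = [[4 * x + y for x in range(32)] for y in range(32)]
brighter = [[v + 20 for v in row] for row in original]
mirrored = [row[::-1] for row in original]

print(is_near_duplicate(original, brighter))  # True
print(is_near_duplicate(original, mirrored))  # False
```

The hash records only whether brightness rises or falls between neighbouring regions, so it tolerates resizing, recompression and uniform brightness shifts. That makes it suited to matching copies that stopped being byte-identical after a migration. It is not robust to crops, rotations or mirroring.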
Wellington, New Zealand, a public-sector capital of similar size to Canberra, began a formal duplicate-suppression program across the holdings of Archives New Zealand in 2023. Singapore's National Heritage Board completed a similar audit of its digitised museum collections and reported measurable storage savings within eighteen months. Both cities benefit from having consolidated their archival infrastructure under fewer institutional roofs than Canberra, where federal and territory records systems sit on entirely separate stacks and rarely talk to each other.
Canberra's Fragmented Approach
That fragmentation is the core of Canberra's comparative disadvantage. A public servant at a Barton-based agency, a researcher at the University of Canberra's Bruce campus, and an archivist at the National Film and Sound Archive in Acton may all be dealing with versions of the same underlying digitisation infrastructure problem, but under different legislative frameworks, different procurement rules, and different vendor contracts.
The federal government's Digital Transformation Agency has published guidance on data quality for Commonwealth entities, but duplicate image management does not appear as a named priority in the agency's current 2025-26 work program, based on its publicly available delivery roadmap. That leaves individual agencies to solve the problem on their own schedules and budgets.
For institutions planning major collection migrations in the next two years (and several Canberra bodies have flagged exactly that in budget submissions tabled before Senate estimates committees), the practical advice from archival data managers is straightforward: run deduplication audits before migration, not after. Moving duplicates into a new system means paying once to migrate them and again to find and remove them, which can roughly double the remediation cost. Wellington learned that the hard way in 2021, when an early migration phase had to be partially reprocessed.
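One inexpensive first pass in a pre-migration audit is to find files that are byte-for-byte identical. The Python sketch below assumes the collection can be read as an ordinary directory tree, for example a staging copy. The function names are hypothetical and do not come from any agency's toolkit. The sketch groups files by size, then by SHA-256 digest. Near-duplicates, such as the same scan saved in a different format, need a perceptual method on top of this.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large images never load fully into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_exact_duplicates(root: Path) -> list[list[Path]]:
    """Return groups (two or more files each) of byte-identical files under `root`."""
    # Pass 1: bucket by size; files of different sizes cannot be identical.
    by_size: dict[int, list[Path]] = defaultdict(list)
    for path in root.rglob("*"):
        if path.is_file():
            by_size[path.stat().st_size].append(path)

    # Pass 2: hash only files that share a size with at least one other file.
    groups: list[list[Path]] = []
    for candidates in by_size.values():
        if len(candidates) < 2:
            continue
        by_digest: dict[str, list[Path]] = defaultdict(list)
        for path in candidates:
            by_digest[sha256_of(path)].append(path)
        groups.extend(sorted(group) for group in by_digest.values() if len(group) > 1)
    return groups
```

Because of the size pre-filter, only files that share a byte count with another file are ever hashed, which keeps the audit quick on large holdings. The sketch reports duplicate groups and deletes nothing. Choosing which copy in each group becomes the authoritative record is a records-management decision, not a technical one.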
Canberra has world-class archival institutions clustered in and around the Parliamentary Triangle, most of them within a few kilometres of each other. The coordination infrastructure to match them has not yet caught up. Whether it does before the next major digitisation spend hits the books is the question that archivists in Mitchell and Parkes are quietly asking each other right now.