The Daily Canberra

All of Canberra, every day

News

Canberra's Duplicate Image Problem: How the Capital Stacks Up Against Cities Tackling Digital Archive Bloat

As governments worldwide pour money into digitising public records, Canberra's institutions are wrestling with a surprisingly mundane crisis — thousands of duplicate images clogging databases and costing real money to store.

By Canberra News Desk · Published 5 July 2026, 5:00 am

Updated 5 July 2026, 12:58 pm

How we reported this

This article was generated by AI from the linked public sources. The Daily Canberra is independently owned and covers Canberra news free from advertiser or sponsor influence. Read our editorial standards →

The Australian War Memorial holds more than 10 million catalogue records. The National Library on Parkes Place manages digitised collections running into the hundreds of terabytes. And like almost every major public archive built on legacy database systems over the past three decades, both face a version of the same unglamorous headache: duplicate image files eating storage, distorting search results, and quietly draining operational budgets.

Duplicate image replacement — the process of identifying, consolidating, and systematically removing redundant digital files from institutional repositories — has moved from an IT housekeeping task to a policy concern in cities where public institutions anchor the economy. Canberra, where the federal government is the dominant employer and public-sector digitisation programs have accelerated since 2022, sits at an interesting crossroads in how it handles the problem compared to peer capitals.

What the Problem Actually Costs

Cloud storage is cheap in absolute terms — Amazon Web Services and Microsoft Azure both price standard object storage at well under five cents per gigabyte per month — but institutional archives do not deal in gigabytes. The National Archives of Australia, headquartered in Mitchell on the city's northern fringe, manages a digital repository measured in petabytes. At that scale, duplicated files are not a quirk; they are a budget line. Industry estimates from the International Council on Archives suggest that between 15 and 30 per cent of images held in large public repositories may exist in duplicate or near-duplicate form, though figures vary significantly by institution and how aggressively records were migrated from older systems.
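To put rough numbers on that, the sketch below works through the arithmetic in Python. The 1-petabyte repository size and the flat rate of 2 US cents per gigabyte per month are illustrative assumptions, not figures from any institution named here; the 15 and 30 per cent duplicate shares are the estimate range quoted above.

```python
GB_PER_PB = 1_000_000  # decimal convention, assumed here for simplicity


def annual_duplicate_cost(repository_pb: float,
                          duplicate_share: float,
                          price_per_gb_month: float) -> float:
    """Yearly storage spend attributable to duplicated files."""
    stored_gb = repository_pb * GB_PER_PB
    return stored_gb * duplicate_share * price_per_gb_month * 12


# Assumed inputs: a 1 PB repository at USD 0.02 per GB per month.
for share in (0.15, 0.30):
    cost = annual_duplicate_cost(1.0, share, 0.02)
    print(f"{share:.0%} duplicates: about USD {cost:,.0f} per year")
```

Under those assumptions, duplicates account for roughly USD 36,000 to 72,000 a year for each petabyte held. The figure covers storage alone and scales linearly with repository size.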

The ACT government's own digital records framework, updated under the Territory Records Act, requires agencies to maintain single authoritative copies of official documents — but enforcement of that standard across the dozens of directorates generating records daily is patchy in practice. The Australian National University, which runs one of the southern hemisphere's larger research data repositories through its Scholarly Communication team on Acton campus, piloted a perceptual hashing deduplication tool across its image holdings in 2024. The results, presented at an internal research data management forum, found meaningful redundancy in collections that had been migrated more than once over ten years.
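Perceptual hashing reduces each image to a short fingerprint that stays stable under small changes such as resizing or a slight brightness shift, so near-duplicates can be found by comparing fingerprints rather than raw bytes. The sketch below shows one simple variant, an average hash, in plain Python. It illustrates the general technique only and is not a description of the ANU tool; the function names, the 8-by-8 grid and the grayscale-grid input format are all assumptions made for illustration.

```python
from typing import Sequence

Gray = Sequence[Sequence[int]]  # rows of 0-255 grayscale values (assumed input format)


def average_hash(pixels: Gray, size: int = 8) -> int:
    """Shrink the image to a size x size grid of block averages, then set
    one bit per cell that is brighter than the grid's mean.

    Assumes the image is at least `size` pixels in each dimension.
    """
    height, width = len(pixels), len(pixels[0])
    cells = []
    for r in range(size):
        r0, r1 = r * height // size, (r + 1) * height // size
        for c in range(size):
            c0, c1 = c * width // size, (c + 1) * width // size
            block = [pixels[y][x] for y in range(r0, r1) for x in range(c0, c1)]
            cells.append(sum(block) / len(block))
    mean = sum(cells) / len(cells)
    bits = 0
    for value in cells:
        bits = (bits << 1) | int(value > mean)
    return bits


def hamming_distance(a: int, b: int) -> int:
    """Number of fingerprint bits on which two hashes disagree."""
    return bin(a ^ b).count("1")


# Hypothetical test images: a gradient, a slightly brightened copy, an inverted copy.
gradient = [[x * 16 for x in range(16)] for _ in range(16)]
brighter = [[min(255, v + 3) for v in row] for row in gradient]
inverted = [[255 - v for v in row] for row in gradient]

print(hamming_distance(average_hash(gradient), average_hash(brighter)))  # 0
print(hamming_distance(average_hash(gradient), average_hash(inverted)))  # 64
```

A small distance flags a likely near-duplicate for review; the cut-off is a tuning decision for each collection, and a human check before deletion remains the cautious default.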

Wellington, a public-sector capital of similar size to Canberra, is home to Archives New Zealand, which began a formal duplicate-suppression program across its holdings in 2023. Singapore's National Heritage Board completed a similar audit of its digitised museum collections and reported measurable storage savings within eighteen months. Both cities benefit from having consolidated their archival infrastructure under fewer institutional roofs than Canberra, where federal and territory records systems sit on entirely separate stacks and rarely talk to each other.

Canberra's Fragmented Approach

That fragmentation is the core of Canberra's comparative disadvantage. A public servant at a Barton-based agency, a researcher at the University of Canberra's Bruce campus, and an archivist at the National Film and Sound Archive in Acton may all be dealing with versions of the same underlying digitisation infrastructure problem, but under different legislative frameworks, different procurement rules, and different vendor contracts.

The federal government's Digital Transformation Agency has published guidance on data quality for Commonwealth entities, but duplicate image management does not appear as a named priority in the agency's current 2025-26 work program, based on its publicly available delivery roadmap. That leaves individual agencies to solve the problem on their own schedules and budgets.

For institutions planning major collection migrations in the next two years — and several Canberra bodies have flagged exactly that in budget submissions tabled before Senate estimates committees — the practical advice from archival data managers is straightforward: run deduplication audits before migration, not after. Moving duplicates into a new system doubles the remediation cost. Wellington learned that the hard way in 2021 when an early migration phase had to be partially reprocessed.
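A pre-migration audit of the kind described can start with the simplest case: files that are byte-for-byte identical. The Python sketch below groups files under a directory by their SHA-256 digest and totals the space the extra copies occupy. The function names and the directory-walk approach are assumptions for illustration, not any institution's actual procedure, and the check does not catch resized or re-encoded copies.

```python
import hashlib
from collections import defaultdict
from pathlib import Path
from typing import Dict, List


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large masters need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_exact_duplicates(root: Path) -> Dict[str, List[Path]]:
    """Map each digest shared by two or more files to the paths that share it."""
    groups: Dict[str, List[Path]] = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if path.is_file():
            groups[sha256_of(path)].append(path)
    return {d: paths for d, paths in groups.items() if len(paths) > 1}


def reclaimable_bytes(duplicates: Dict[str, List[Path]]) -> int:
    """Bytes freed if one copy per group is kept and the rest are removed."""
    return sum(paths[0].stat().st_size * (len(paths) - 1)
               for paths in duplicates.values())
```

Running the report against the source system first produces a list of redundant files that can be reviewed and excluded before anything is copied across.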

Canberra has world-class archival institutions clustered within a few kilometres of each other along the Parliamentary Triangle. The coordination infrastructure to match them has not yet caught up. Whether it does before the next major digitisation spend hits the books is the question that archivists in Mitchell and Parkes are quietly asking each other right now.

About this article

Published by The Daily Canberra

Covering news in Canberra. This article was generated by AI from the linked sources and was not reviewed by a human editor before publishing. See our editorial standards.
