The ACT's major public institutions moved this week to address a persistent and costly problem in their digital collections: thousands of duplicate images cluttering online archives, slowing search results, and confusing researchers who rely on clean, accurate records. The push, which involves the ACT Heritage Library in Civic and the Mildenhall Collection held at the Dickson community precinct, marks a practical shift in how the territory manages its digitised holdings.
The timing is not accidental. Canberra has seen a surge in online access to local historical records since the pandemic years, and institutions are now grappling with the legacy of rapid digitisation programs that prioritised volume over quality control. When staff scan documents or photographs in batches, duplicates slip through: sometimes dozens of versions of the same image at different resolutions, under different file names, or with slight crop variations. For a researcher at the Australian National University in Acton trying to trace a Gungahlin property boundary from the 1960s, or a public servant on Northbourne Avenue verifying a heritage listing, a cluttered database wastes hours.
What Actually Happened This Week
Staff from the ACT Heritage Library, working from the Mildura Street facility in Fyshwick where the library's physical archive storage is managed, flagged the issue formally at an internal working group meeting on July 1. The group, which includes representatives from Libraries ACT and the Canberra Museum and Gallery on London Circuit, agreed to a coordinated audit beginning in mid-July. The audit will use automated detection software to flag images sharing more than 90 percent pixel similarity, and a staff member must review each flagged match before any file is removed or replaced.
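For readers curious how a 90 percent pixel-similarity rule might work in practice, the following is a minimal Python sketch. It is not the software the institutions are using, which has not been named. The per-pixel tolerance, the function names, and the assumption that every image has already been converted to greyscale and resized to a common resolution are all illustrative choices, not details from the audit.

```python
from itertools import combinations

# Threshold taken from the reported audit rule: more than 90 percent similarity.
SIMILARITY_THRESHOLD = 0.90
# Hypothetical per-pixel tolerance: two greyscale values (0-255) count as
# matching if they differ by no more than this amount.
PIXEL_TOLERANCE = 8


def pixel_similarity(a, b, tolerance=PIXEL_TOLERANCE):
    """Fraction of matching pixels between two same-sized greyscale images.

    Images are lists of rows of ints in 0..255. This sketch assumes they have
    already been resized to the same dimensions; it does not resize them.
    """
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("images must be normalised to the same dimensions")
    total = 0
    matching = 0
    for row_a, row_b in zip(a, b):
        for pa, pb in zip(row_a, row_b):
            total += 1
            if abs(pa - pb) <= tolerance:
                matching += 1
    return matching / total if total else 0.0


def flag_for_review(images, threshold=SIMILARITY_THRESHOLD):
    """Compare every pair of images and return likely duplicates.

    Nothing is deleted here: the result is a queue of (id, id, score) tuples
    for a staff member to check, mirroring the human-review requirement.
    """
    queue = []
    for (id_a, img_a), (id_b, img_b) in combinations(images.items(), 2):
        score = pixel_similarity(img_a, img_b)
        if score > threshold:
            queue.append((id_a, id_b, round(score, 3)))
    return queue
```

In a toy run with two 10-by-10 scans of the same item that differ in only five pixels, the pair scores 0.95 and lands in the review queue, while an unrelated image scores 0.0 and is left alone. Comparing every pair is fine for a demonstration but scales poorly; a real collection of tens of thousands of images would more likely rely on compact image fingerprints to narrow the candidates first.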
Libraries ACT's digital team has been piloting similar deduplication tools since March 2026, first on a collection of roughly 14,000 photographs documenting Belconnen's suburban development from the 1970s onward. Early results from that pilot found a duplication rate of around 12 percent in the scanned batches, meaning roughly one in eight images in the tested subset was a functional copy of another file already in the system. That figure, which has not yet been formally published, informed the decision to expand the audit territory-wide.
The problem also carries a direct cost: storing and cataloguing digital assets is not free. The ACT Government's digital infrastructure budget for 2025-26, as published in the territory's budget papers, allocates funding for library and archive digitisation under the Cultural Facilities Corporation's capital program, and administrators argue that cleaning up redundant files frees both storage capacity and staff hours that would otherwise go toward manual cataloguing.
Researchers and the Public: What Changes
For people actually using these systems, the most visible change will be in search results on the ACT Heritage Library's online portal. Currently, a search for images of Manuka Oval, for example, can return multiple near-identical photographs from the same negative, catalogued separately because staff in different digitisation rounds scanned the same physical item. Once deduplication is complete, the portal should return a single canonical version with clear metadata about resolution and source.
University of Canberra researchers working in the Bruce campus library's special collections unit have separately raised the issue with Libraries ACT over the past 12 months, pointing out inconsistencies in the Canberra Times photograph archive donated to public holdings in the 2010s. That collection, which spans several decades of Canberra civic life, is among the collections flagged for priority review in the July audit.
The audit is expected to conclude its first phase by September 30, 2026, with a public report to follow. Institutions involved have indicated they plan to update their digitisation intake protocols to include automatic similarity checks before new images enter the live catalogue — a step that should prevent the backlog from growing again. Researchers who rely on these collections should expect some temporary search disruptions in late July as files are reviewed and recatalogued, but administrators have said the core public-access portal will remain operational throughout.