The problem has been building for years. Canberra's government agencies — federal and territory alike — are sitting on digital image archives riddled with duplicates, near-duplicates and mis-catalogued files, and the window for dealing with them cheaply is closing fast. Storage costs are rising, audit obligations are tightening, and the ACT Government's own Digital Strategy, which runs through to the end of 2026, has flagged records quality as a priority area requiring action before the new financial year budget cycle locks in spending.
For a city whose economy runs on public administration, this is not a niche IT issue. The National Archives of Australia on Queen Victoria Terrace has custody obligations for Commonwealth agency records, while the ACT Government's Territory Records Office, operating under the Territory Records Act 2002, carries parallel responsibilities for ACT public authority records. Both institutions face the same practical question: when two image files look identical but carry different metadata, different access classifications or different retention schedules, which one is the authoritative record — and what happens to the other?
Why the Decision Can't Wait Much Longer
The pressure has intensified because of storage economics. Cloud hosting costs for large unstructured data sets — the category that image libraries fall into — have climbed steadily, and agencies that deferred deduplication projects during the 2020-2023 period, when remote-working uploads surged, are now carrying libraries that are in some cases two to three times the size they were before the pandemic. The Australian Public Service Commission's workforce data shows the ACT still has the highest concentration of public servants of any jurisdiction in the country, and that concentration translates directly into the volume of digital material generated and stored locally.
The National Archives' Ditchley Road facility in Mitchell already operates under documented capacity constraints flagged in successive Senate estimates hearings. Adding unremediated duplicate image sets to that burden has downstream consequences for how quickly agencies can respond to Freedom of Information requests — a metric that Senate committees watch closely and that the Information Commissioner's office in Barton measures annually.
At the Australian National University in Acton, the library's digital collections team has been running its own deduplication program since early 2025, using perceptual hashing tools to identify visually similar images even when file names and formats differ. The approach is not unique to ANU — the University of Canberra at Bruce has explored similar workflows — but the ANU project is among the more mature implementations in the capital, and both federal and territory archivists have been watching its progress.
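Perceptual hashing comes in several variants. As an illustration only, and not a description of the ANU or University of Canberra tooling, the Python sketch below implements one common variant, difference hashing (dHash). It assumes each image has already been decoded into a grid of grayscale values; a production tool would use an imaging library for decoding and resizing. The function names and sample data are hypothetical.

```python
from typing import List

Grid = List[List[float]]  # grayscale pixel values, one inner list per row


def downscale(pixels: Grid, width: int, height: int) -> Grid:
    """Shrink a grayscale grid to width x height by averaging rectangular blocks."""
    src_h, src_w = len(pixels), len(pixels[0])
    result = []
    for ty in range(height):
        y0 = ty * src_h // height
        y1 = max(y0 + 1, (ty + 1) * src_h // height)
        row = []
        for tx in range(width):
            x0 = tx * src_w // width
            x1 = max(x0 + 1, (tx + 1) * src_w // width)
            block = [pixels[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            row.append(sum(block) / len(block))
        result.append(row)
    return result


def dhash(pixels: Grid) -> int:
    """Difference hash: 64 bits, one per horizontally adjacent pair in a 9x8 thumbnail.

    A bit is 1 when the left cell is brighter than its right-hand neighbour,
    so the hash reflects the picture's structure rather than its file name,
    format or overall brightness.
    """
    thumb = downscale(pixels, 9, 8)  # 9 columns give 8 comparisons per row
    bits = 0
    for row in thumb:
        for left, right in zip(row, row[1:]):
            bits = (bits << 1) | (1 if left > right else 0)
    return bits


# Hypothetical data: a synthetic 18x16 gradient standing in for a decoded photo.
original = [[10 * x + y for x in range(18)] for y in range(16)]
brighter = [[v + 20 for v in row] for row in original]  # same picture, brightened
enlarged = [[original[y // 2][x // 2] for x in range(36)] for y in range(32)]  # 2x resize
mirrored = [row[::-1] for row in original]  # a genuinely different layout

print(dhash(original) == dhash(brighter))  # True
print(dhash(original) == dhash(enlarged))  # True
print(dhash(original) == dhash(mirrored))  # False
```

Because each bit records only whether one region is brighter than its neighbour, a brightened or resized copy of the same picture produces the same or a very similar hash. That property lets such tools match files even when their names and formats differ.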
The Decisions That Will Define the Outcome
Three choices dominate the immediate agenda. First, agencies need to settle on a technical standard for what counts as a duplicate. Pixel-perfect matches are straightforward; perceptual similarity at, say, a 95 per cent threshold is not, and different thresholds produce dramatically different culling outcomes. A conservative threshold might flag 8 per cent of a library for review; an aggressive one could flag 30 per cent.
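To show why the threshold matters, the sketch below assumes 64-bit perceptual hashes (such as dHash) and treats similarity as the share of hash bits that agree. Under that assumption a 95 per cent threshold allows at most three differing bits (61 of 64 is about 95.3 per cent), and each loosening of the threshold pulls in more pairs. The file names and hash values are invented. Pairwise comparison is fine for a demonstration but grows quadratically, so a library of millions of images would need an index structure such as a BK-tree instead.

```python
from itertools import combinations
from typing import Dict, List, Tuple

HASH_BITS = 64  # assumption: 64-bit perceptual hashes, e.g. dHash


def similarity(a: int, b: int) -> float:
    """Share of hash bits that agree: 1.0 means identical hashes."""
    differing = bin(a ^ b).count("1")  # Hamming distance between the hashes
    return 1 - differing / HASH_BITS


def flag_for_review(
    hashes: Dict[str, int], min_similarity: float
) -> List[Tuple[str, str, float]]:
    """Return every pair of files at or above the threshold. Nothing is deleted."""
    flagged = []
    for (name_a, hash_a), (name_b, hash_b) in combinations(hashes.items(), 2):
        score = similarity(hash_a, hash_b)
        if score >= min_similarity:
            flagged.append((name_a, name_b, score))
    return flagged


# Invented sample: an exact copy, a copy differing in 3 bits, an unrelated image.
library = {
    "site_photo.jpg": 0x0000_0000_0000_0000,
    "site_photo_copy.png": 0x0000_0000_0000_0000,
    "site_photo_resaved.jpg": 0x0000_0000_0000_0007,
    "unrelated.jpg": 0x0000_0000_0000_FFFF,
}

for threshold in (0.99, 0.95, 0.75):
    print(threshold, len(flag_for_review(library, threshold)))
# 0.99 1
# 0.95 3
# 0.75 6
```

On this invented sample, a 99 per cent threshold catches only the exact copy, 95 per cent adds the re-saved variant, and 75 per cent flags every pair, including the unrelated image.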
Second, the question of human oversight versus automated deletion has to be resolved before any deduplication tool is deployed at scale. The Territory Records Act requires that destruction of records follow an approved disposal authority, so an automated delete function that bypasses that step would trade a storage problem for a legal one.
Third, agencies must decide whether to run deduplication as a one-off remediation project or embed it as an ongoing process triggered at ingest. The latter is technically cleaner but means procuring software that integrates with existing document management platforms, which across the ACT public service are mostly Microsoft SharePoint or TRIM (now Content Manager).
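An ingest-time check can be designed so that it never deletes anything, which keeps it consistent with the disposal requirement. In the minimal sketch below, a near-duplicate is still stored, and the match is recorded in a review queue for a records officer to assess against the relevant disposal authority. The class and method names are hypothetical and do not correspond to any SharePoint or TRIM API. The integration work with those platforms is what a procurement would actually need to cover. The 95 per cent default threshold and 64-bit hashes are assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

HASH_BITS = 64  # assumption: 64-bit perceptual hashes


def similarity(a: int, b: int) -> float:
    """Share of hash bits that agree between two perceptual hashes."""
    return 1 - bin(a ^ b).count("1") / HASH_BITS


@dataclass
class IngestIndex:
    """Hypothetical ingest-time duplicate check; not a SharePoint or TRIM API."""

    min_similarity: float = 0.95
    stored: Dict[str, int] = field(default_factory=dict)
    review_queue: List[Tuple[str, str, float]] = field(default_factory=list)

    def closest_match(self, phash: int) -> Optional[Tuple[str, float]]:
        """Most similar stored file at or above the threshold, if any."""
        best: Optional[Tuple[str, float]] = None
        for file_id, existing in self.stored.items():
            score = similarity(phash, existing)
            if score >= self.min_similarity and (best is None or score > best[1]):
                best = (file_id, score)
        return best

    def ingest(self, file_id: str, phash: int) -> str:
        match = self.closest_match(phash)
        # The incoming file is always kept: disposal is a human decision
        # taken under an approved disposal authority, not by this code.
        self.stored[file_id] = phash
        if match is None:
            return "accepted"
        # Record (new file, existing match, similarity) for a records officer.
        self.review_queue.append((file_id, match[0], match[1]))
        return "queued for review"


index = IngestIndex()
print(index.ingest("briefing_photo.jpg", 0x0F0F))  # accepted
print(index.ingest("briefing_photo_v2.png", 0x0F0F))  # queued for review
print(index.ingest("site_plan.tif", 0xFFFF_0000_0000_0000))  # accepted
print(index.review_queue)  # [('briefing_photo_v2.png', 'briefing_photo.jpg', 1.0)]
```

Leaving the final call with a person rather than the software is what keeps an ingest-time check inside an approved disposal process.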
The ACT Government's next Digital Strategy progress review is scheduled for the third quarter of 2026. That review will effectively set the funding baseline for any territory-agency deduplication program heading into the 2027-28 budget. Federal agencies face a parallel deadline: the National Archives is expected to issue updated General Disposal Authority guidance before December 2026, and how that guidance handles near-duplicate digital images will determine whether agencies have clear legal cover to act or spend another year waiting for certainty. The clock is running.