One of my earliest and most supportive mentors, Carolyn Hurley, published a review: “Naming HLA diversity: A review of HLA nomenclature” https://doi.org/10.1016/j.humimm.2020.03.005
This is a concise summary of one of the greatest stories in the history of genetics where, unlike so much individual achievement, success emerged from the community as a whole. Working together, we developed a language to speak about the most polymorphic and medically relevant part of our genomes.
The article is a solid statement of the facts and will serve as Cliff’s Notes for those entering the field who aren’t willing to plow through 17 volumes of International workshop proceedings that are out of print.
I love the paper, but I disagree with its final assessment that the nomenclature is ‘standing the test of time’.
When it comes to its most common clinical application, antibody recognition of HLA, the nomenclature has failed. It stopped naming serotypes 20 years ago. The resulting “language gap” comes at a cost: organ allocation systems are sub-optimal. The labs know more, but the “system” doesn’t carry the information.
When it comes to representation of DNA-based assignments, the nomenclature fails outright to address ambiguity, which is intrinsic to all DNA-based typing methods.
When it comes to keeping up with technology, it has failed. The description of gene feature enumeration in the paper misses the main point: it is a system for automated curation. Most new variants go unreported because the system for describing them is manual. It’s like comparing the internet and the postal service and calling them “two alternative ways to transmit digital information”. Sure, but one is a manual human process and the other is the internet.
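To make the “automated curation” point concrete, here is a toy sketch of the enumeration idea: every distinct sequence seen for a gene feature gets the next free number, with no human in the loop. This is not the actual gene feature enumeration service. The class and method names are hypothetical, the sequences are made-up placeholders, and the `w`-and-hyphen output format is only loosely modeled on the published notation.

```python
from collections import defaultdict


class FeatureEnumerator:
    """Toy registry: one integer per distinct sequence of a (locus, feature) pair."""

    def __init__(self):
        # (locus, feature name) -> {sequence: assigned number}
        self._numbers = defaultdict(dict)

    def number_for(self, locus, feature, sequence):
        """Return the existing number for this sequence, or assign the next one."""
        known = self._numbers[(locus, feature)]
        seq = sequence.upper()
        if seq not in known:
            known[seq] = len(known) + 1  # a new variant is registered automatically
        return known[seq]

    def enumerate_allele(self, locus, features):
        """features: ordered list of (feature_name, sequence) pairs."""
        numbers = [self.number_for(locus, name, seq) for name, seq in features]
        return f"{locus}w" + "-".join(str(n) for n in numbers)


# Placeholder sequences, not real HLA data.
registry = FeatureEnumerator()
first = registry.enumerate_allele(
    "HLA-A", [("exon_1", "ATGGCC"), ("intron_1", "GTGAGT"), ("exon_2", "GCTCCC")]
)
# Same allele except for a novel exon 2: only that feature gets a new number.
second = registry.enumerate_allele(
    "HLA-A", [("exon_1", "ATGGCC"), ("intron_1", "GTGAGT"), ("exon_2", "GCTCAC")]
)
print(first)   # HLA-Aw1-1-1
print(second)  # HLA-Aw1-1-2
```

The point of the sketch is that a novel variant is described the moment it is observed, rather than waiting on a manual submission and naming step.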
The HLA Nomenclature Committee last met in Fall 2017. The report of its decisions has yet to be published. The KIR Nomenclature Committee (which is a subcommittee of the HLA committee) met in Spring 2017. The report of its decisions has yet to be published.
The 5′ and 3′ untranslated regions are a new mess. The boundaries of the UTR change silently between releases.
Names are identifiers but:
Names change for the same sequence.
Sequences change for the same name.
Expression characters come and go from the name.
The link between names and accession numbers changes.
The accession numbers are “versioned”.
The machine-readable export of the database does not conform to any standard. ENA-like?
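The first two complaints in the list above are easy to check mechanically if you have two releases in hand. Below is a minimal sketch, assuming each release has already been parsed into a plain dict of allele name to sequence and that each sequence carries a single name within a release. The function name, the `X*` allele names, and the sequences are all hypothetical placeholders, not real database content.

```python
def compare_releases(old, new):
    """Compare two releases, each a dict mapping allele name -> sequence."""
    # Sequences change for the same name.
    changed_sequence = sorted(
        name for name in old.keys() & new.keys() if old[name] != new[name]
    )

    # Names change for the same sequence (this also catches an expression
    # character such as a trailing "N" being added to or dropped from a name).
    old_name_by_seq = {seq: name for name, seq in old.items()}
    renamed = sorted(
        (old_name_by_seq[seq], name)
        for name, seq in new.items()
        if seq in old_name_by_seq
        and old_name_by_seq[seq] != name
        and old_name_by_seq[seq] not in new  # the old name has disappeared
        and name not in old                  # the new name is genuinely new
    )
    return {"changed_sequence": changed_sequence, "renamed": renamed}


# Made-up example releases.
release_a = {"X*01:01": "ACGT", "X*01:02": "ACGA", "X*01:03N": "ACTT"}
release_b = {"X*01:01": "ACGTT", "X*01:02": "ACGA", "X*01:03": "ACTT"}

report = compare_releases(release_a, release_b)
print(report)
```

Anything that shows up in either list is a case where the name alone cannot be trusted as an identifier across releases.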
The system and its governance are dysfunctional.
That it has lasted this long is because we as bioinformaticists have not yet put viable alternatives in the hands of scientists and clinicians.
It’s time for the bioinformatics community to step up and do something useful.
Nobody else will.