HLA Nomenclature: Standing the test of time?


#1

One of my earliest and most supportive mentors, Carolyn Hurley, published a review: “Naming HLA diversity: A review of HLA nomenclature” https://doi.org/10.1016/j.humimm.2020.03.005

This is a concise summary of one of the greatest stories in the history of genetics, one where, unlike so many individual achievements, success emerged from the community as a whole. Working together, we developed a language for talking about the most polymorphic and medically relevant part of our genomes.

The article is a solid statement of the facts and will serve as the CliffsNotes for those entering the field who aren’t willing to plow through 17 volumes of out-of-print international workshop proceedings.

I love the paper, but I disagree with its final assessment that the nomenclature is ‘standing the test of time’:

  1. When it comes to its most common clinical application, antibody recognition of HLA, the nomenclature has failed. It stopped naming serotypes 20 years ago. The resulting “language gap” comes at a cost: organ allocation systems are sub-optimal. The labs know more, but the “system” doesn’t carry the information.

  2. When it comes to representing DNA-based assignments, the nomenclature fails outright to address ambiguity, which is intrinsic to all DNA-based typing methods.

  3. When it comes to keeping up with technology, it has failed. The description of gene-feature-enumeration in the paper misses the main point: it is a system for automated curation. Most new variants go unreported because the system for describing them is manual. It’s like comparing the internet and the postal service and calling them “two alternative ways to transmit digital information”. Sure, but one is a manual human process and the other is the internet.

  4. The HLA Nomenclature Committee last met in Fall 2017. The report of its decisions has yet to be published. The KIR Nomenclature Committee (which is a subcommittee of the HLA) met in Spring 2017. The report of the decisions has yet to be published.

  5. The 5′ and 3′ untranslated regions are a new mess. The boundaries of the UTR change silently between releases.

Names are identifiers but:
Names change for the same sequence.
Sequences change for the same name.
Expression characters come and go from the name.
The link between names and accession numbers changes.
The accession numbers are “versioned”.
The machine-readable export of the database does not conform to any standard. ENA-like?

The system and its governance are dysfunctional.

That it has lasted this long is because we as bioinformaticists have not yet put viable alternatives in the hands of scientists and clinicians.

It’s time for the Bioinformatics community to step up and do something useful.

Nobody else will.


#2

I was recently asked why, in a 3-field resolution dataset, the 2-field allele A*33:05 appeared. Here is my reply:

Welcome to HLA Nomenclature!

The name of an allele can have from 2 to 4 fields, and the number of fields indicates neither the resolution of typing nor the level of definition of the allele (partial exons, all exons, full genomic).

This is why MIRING was developed: it is a minimal information standard in which typing metadata accompanies the result and describes what the test targeted.

The allele name A*33:05 has only 2 fields because nobody has described a synonymous variant (which would be called A*33:05:02), which would prompt the renaming of A*33:05 to A*33:05:01. If somebody describes a non-CDS variant, it would be called A*33:05:01:02 and the allele A*33:05 would be renamed to A*33:05:01:01.

Luckily, in this case the allele A*33:05 is defined across all exons and introns so its future name is predictable.
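
The renaming rule described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the function names `parse_allele` and `rename_for_new_variant` are made up for this example, the regular expression accepts only plain 2- to 4-field names with an optional expression character, and it ignores the harder case of alleles defined across only a subset of exons.

```python
import re

# Gene, 2 to 4 colon-separated numeric fields, optional expression character.
# An optional "HLA-" prefix is accepted but dropped from the output.
ALLELE_RE = re.compile(r"^(?:HLA-)?([A-Z0-9]+)\*(\d+(?::\d+){1,3})([NLSCAQ]?)$")


def parse_allele(name):
    """Split an allele name into (gene, list of fields, expression character)."""
    match = ALLELE_RE.match(name)
    if match is None:
        raise ValueError(f"not a 2- to 4-field allele name: {name!r}")
    gene, fields, expression = match.groups()
    return gene, fields.split(":"), expression


def rename_for_new_variant(name, variant_field):
    """Name an existing allele takes once a sibling variant is described.

    variant_field is 3 for a synonymous (CDS) variant and 4 for a non-CDS
    variant. Missing fields are padded with "01"; names that already have
    enough fields are returned unchanged.
    """
    if variant_field not in (3, 4):
        raise ValueError("variant_field must be 3 or 4")
    gene, fields, expression = parse_allele(name)
    padded = fields + ["01"] * max(0, variant_field - len(fields))
    return f"{gene}*{':'.join(padded)}{expression}"


print(rename_for_new_variant("A*33:05", 3))  # synonymous variant described
print(rename_for_new_variant("A*33:05", 4))  # non-CDS variant described
```

The point of the sketch is that the number of fields in a name reflects what else has been described in the database so far, not how the sample was typed.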

A substantial number of alleles are/were only defined across a subset of exons, which can lead to complex renamings as the rest of the sequence of an already-named allele is revealed. This is why the MIRING standard requires that HLA genotyping results be accompanied by the IPD-IMGT/HLA database version used to assign them.


#3

The HLA#####.# accession numbers have an issue that makes them unusable as the basis for describing genomic polymorphism: the version number only changes if there are changes to the CDS sequence. Changes to introns or UTRs do not warrant an increase in the version number. There is a second issue: there is no single reference for all accession number versions. To find the sequence corresponding to a particular accession number version, the history of HLA.DAT/HLA.XML files must be scanned until one containing it is found. This is fixable with technology, but the first issue makes them unusable for genomics.
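
Both issues can be illustrated with a small Python sketch. It assumes the release history has already been parsed into an ordered list of `(release, records)` pairs; that in-memory layout, the function names, the release labels, the accession `HLA00000` and the sequences are all placeholders invented for this example, not the real HLA.DAT/HLA.XML structure.

```python
def find_first_release(releases, accession, version):
    """Return (release, sequence) for the earliest release holding accession.version."""
    for release, records in releases:
        record = records.get(accession)
        if record is not None and record["version"] == version:
            return release, record["sequence"]
    return None


def silent_changes(releases, accession):
    """List (earlier, later) release pairs where the sequence changed
    but the version number did not."""
    changes = []
    previous = None
    for release, records in releases:
        record = records.get(accession)
        if record is None:
            continue
        if (previous is not None
                and previous[1]["version"] == record["version"]
                and previous[1]["sequence"] != record["sequence"]):
            changes.append((previous[0], release))
        previous = (release, record)
    return changes


# Made-up history, oldest release first.
releases = [
    ("release-1", {"HLA00000": {"version": 1, "sequence": "ACGTACGT"}}),
    ("release-2", {"HLA00000": {"version": 1, "sequence": "ACGTACGTTT"}}),
    ("release-3", {"HLA00000": {"version": 2, "sequence": "ACGAACGTTT"}}),
]

print(find_first_release(releases, "HLA00000", 2))
print(silent_changes(releases, "HLA00000"))
```

In this toy history the linear scan recovers a sequence for each version, but version 1 maps to two different sequences depending on the release consulted, which is exactly why the identifier alone cannot anchor a genomic coordinate system.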

If anybody is interested, we are organizing a hackathon to address this by building a database of GenBank references corresponding to the reference alleles chosen for the 17th IHIW.