#1910: entry-metadata not encoding character entities

opened by njr2128

& (and maybe others?) is showing up as a character entity rather than as the literal character.

For example, the heading of 4v1: https://github.com/cu-mkp/m-k-manuscript-data/blob/4625a7ad236f37bca47f4bdc27fcd536e89bcbb0/ms-xml/tl/tlp004v_preTEI.xml#L8-L9 is returned as

Black varnish for sword guard, bands for trunks, &amp;c

It should be

Black varnish for sword guard, bands for trunks, &c


gschare commented:

Holding off on this fix until the next release of manuscript-object.


tcatapano commented:

Other character entities appearing in entry-metadata.csv are:

#224 #230 #231 #232 #233 #234 #8211 #8217 #8230 #9728 #9790


tcatapano commented:

Note that these appear in the XML files under allFolios, not under ms-xml.


gschare commented:

This is being handled as part of the work on the manuscript-object refactor.

It looks like the inconsistencies arise because:

- allFolios/ is generated by update.py, separately from how the other derivatives are generated
- entries/ is generated by recipe.py, which makes sure to decode character entities
- entry-metadata.csv is generated by one specific method in recipe.py, find_title(), which still uses regex (so it doesn't decode character entities)
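
To illustrate the regex point, here is a minimal sketch of a regex-based title lookup that decodes entities before returning. It is not the actual recipe.py code: the function name `find_title_decoded`, the assumption that the title sits in a `<head>` element, and the sample string are all hypothetical. The only library calls used are the standard `re` module and `html.unescape`.

```python
import html
import re

# Assumption: the entry title is the text of the first <head> element.
HEAD_RE = re.compile(r"<head>(.*?)</head>", re.DOTALL)
TAG_RE = re.compile(r"<[^>]+>")


def find_title_decoded(xml_text: str) -> str:
    """Hypothetical stand-in for find_title(): regex match, then decode."""
    match = HEAD_RE.search(xml_text)
    if match is None:
        return ""
    # Drop any nested markup inside the heading.
    raw = TAG_RE.sub("", match.group(1))
    # A regex match returns the serialized text, so entities such as
    # &amp; or &#233; are still escaped; html.unescape() decodes them.
    return html.unescape(raw).strip()


sample = "<div><head>Black varnish for sword guard, bands for trunks, &amp;c</head></div>"
print(find_title_decoded(sample))
# Black varnish for sword guard, bands for trunks, &c
```

Numeric references such as the ones listed earlier in the thread go through the same call: `html.unescape("&#233;")` returns `é`. An XML parser would decode entities on its own, so this step is only needed while the lookup stays regex-based.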