& (and maybe others?) showing up as character entity and not as literal character
For example, the heading of 4v1: https://github.com/cu-mkp/m-k-manuscript-data/blob/4625a7ad236f37bca47f4bdc27fcd536e89bcbb0/ms-xml/tl/tlp004v_preTEI.xml#L8-L9 is returned as
Black varnish for sword guard, bands for trunks, &amp;c
It should be
Black varnish for sword guard, bands for trunks, &c
Holding off on this fix until the next release of manuscript-object.
Other character entities appearing in entry-metadata.csv are:
#224 #230 #231 #232 #233 #234 #8211 #8217 #8230 #9728 #9790
Note that these appear in the XML files under allFolios, not under ms-xml.
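
For reference, the numeric codes listed above can be turned into their literal characters with Python's standard library. This is only an illustrative sketch: the `codes` list is copied from this comment, and nothing here is taken from manuscript-object itself.

```python
import html

# Numeric character references reported in entry-metadata.csv (copied from the list above).
codes = [224, 230, 231, 232, 233, 234, 8211, 8217, 8230, 9728, 9790]

# html.unescape turns a reference such as "&#224;" into the literal character.
decoded = {code: html.unescape(f"&#{code};") for code in codes}

for code, char in decoded.items():
    print(f"&#{code}; -> {char}")
```

Most of these are accented Latin letters and typographic punctuation, plus two symbol characters (9728 and 9790).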
This is being handled as part of the work on the manuscript-object refactor.
It looks like the inconsistencies arise because:
- allFolios/ is generated by update.py, separately from how other derivatives are generated
- entries/ is generated by recipe.py, which makes sure to decode character entities
- entry-metadata.csv is generated by one specific method in recipe.py, find_title(), which still uses regex (so it doesn't decode character entities)
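
To illustrate the difference, here is a minimal sketch of how a regex-only title lookup leaves entities in place, and how decoding afterwards would fix it. The function names `find_title_regex` and `find_title_decoded`, the `<head>` pattern, and the sample string are all assumptions made for this example; the actual find_title() in recipe.py may look quite different.

```python
import html
import re

# Hypothetical XML fragment modelled on the 4v1 heading; the real file may differ.
SAMPLE = "<div><head>Black varnish for sword guard, bands for trunks, &amp;c</head></div>"


def find_title_regex(xml_text: str) -> str:
    """Hypothetical regex-only extraction: returns the raw text, entities included."""
    match = re.search(r"<head>(.*?)</head>", xml_text, re.DOTALL)
    return match.group(1).strip() if match else ""


def find_title_decoded(xml_text: str) -> str:
    """Same extraction, followed by decoding of character entities."""
    return html.unescape(find_title_regex(xml_text))


raw_title = find_title_regex(SAMPLE)      # still contains the entity
clean_title = find_title_decoded(SAMPLE)  # entity replaced by the literal character
print(raw_title)
print(clean_title)
```

Extracting the title with an XML parser instead of regex would likely have the same effect, since parsers decode entities when returning text content.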