From @thuchacz
Extract:

1. The contents of all <la></la> tags from either the TC, TCN, or TL (there are 109 instances in each of the 3 versions, so I don't think it matters which you use). I need these because I need to assign to TT and CAG the translation of Latin words, phrases, and sentences, and also check them against existing <comment> tags. Extraction would help me immensely.
2. <fr></fr> tags from the TL. What I need here is trickier. Before the merges you just committed, there were 1660 <fr> tags used in the TL, 789 of which were <del><fr></fr></del> nests, which I'm not interested in at all. Is there a way for you to provide me the strings inside <fr></fr> but not in the <del><fr></fr></del>? If not, I can work around this (provided they are extracted in the order they appear in the folios), but it will simply take me longer.

@thuchacz here are the lists: french.csv.txt latin.csv.txt
FYI, I used these XPath expressions:

for $la in //la return concat(replace($la, '\n', ' '), '|', $la/preceding::page[1])

and

for $fr in //fr[not(parent::del)] return concat(replace($fr, '\n', ' '), '|', $fr/preceding::page[1])
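
For anyone without an XQuery processor to hand, roughly the same extraction can be approximated with Python's standard library. This is only a sketch, not what was actually run: the function name `extract`, the sample markup, and the assumption that each `<page>` element is a standalone marker whose text is the folio id are all hypothetical. It mirrors the two expressions above by skipping any `<fr>` whose direct parent is `<del>` and by pairing each match with the nearest `<page>` that closed before it.

```python
import xml.etree.ElementTree as ET


def extract(xml_text, tag, skip_parent=None):
    """Return 'text|page' rows for every <tag> element, in document order.

    Elements whose direct parent is <skip_parent> are left out, which
    approximates the XPath predicate [not(parent::del)].
    """
    root = ET.fromstring(xml_text)
    rows = []
    current_page = ""  # text of the most recently closed <page> element

    def walk(elem, parent_tag):
        nonlocal current_page
        wanted = elem.tag == tag and (
            skip_parent is None or parent_tag != skip_parent
        )
        if wanted:
            # Like replace($x, '\n', ' '): flatten line breaks inside the match.
            text = "".join(elem.itertext()).replace("\n", " ")
            rows.append(f"{text}|{current_page}")
        for child in elem:
            walk(child, elem.tag)
        if elem.tag == "page":
            # Update only once <page> has closed, so that, as with the
            # preceding:: axis, an enclosing <page> is never counted.
            current_page = "".join(elem.itertext())

    walk(root, None)
    return rows


# Hypothetical sample; the real TL files will be structured differently.
sample = """<root>
<page>001r</page>
<ab>Take <fr>eau de vie</fr> and <del><fr>sel</fr></del> with <la>aqua
vitae</la></ab>
<page>001v</page>
<ab><fr>vin</fr></ab>
</root>"""

french = extract(sample, "fr", skip_parent="del")
latin = extract(sample, "la")
print("\n".join(french + latin))
```

As with the XPath version, only a direct `<del>` parent is checked, so an `<fr>` nested more deeply inside a `<del>` would still be listed.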