RE: [EXTERNAL] [VuFind-Tech] Revival: MARC::Record Size exceeding 99999 Bytes


Demian Katz

Feb 18, 2022, 12:13:37 PM2/18/22
to Ahrens, Helge, vufin...@lists.sourceforge.net, solrma...@googlegroups.com

Helge,

 

I think there may be two dimensions to this problem, and I’m copying the solrmarc-tech list into the thread in case anyone there has further feedback/suggestions.

 

  1. It seems that SolrMarc may be converting MARC-XML to binary MARC even when length limits are exceeded. It might be useful to have a setting/function to retain XML format for records that exceed the 99999-byte limit. (Or maybe such a feature exists and I simply don’t know about it). In any case, VuFind can read either type of record from the index, so it’s okay to mix them if the indexing tool can handle it.
  2. It is possible that the new MARC reader introduced in VuFind 8 doesn’t handle these overflowing MARC records in the same way that the File_MARC library used to. Maybe we need to revise/loosen the validation you highlighted. Perhaps Ere Maijala can weigh in on this, since he has led the work on the new MARC code.
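To illustrate why the limit is exactly 99999 bytes: the ISO 2709 leader stores the total record length as five ASCII digits in positions 0–4, so no larger length can be encoded. A minimal sketch (illustrative only, not SolrMarc or VuFind code; the class and method names are invented):

```java
// Sketch: the ISO 2709 leader's record length is 5 ASCII digits, so the
// largest encodable record length is 99999 bytes.
public class LeaderLengthDemo {
    // Format a record length for leader positions 0-4. Lengths above
    // 99999 simply have no valid 5-digit representation.
    static String encodeLength(int recordLength) {
        if (recordLength > 99999) {
            throw new IllegalArgumentException(
                "Record length " + recordLength + " exceeds the 5-digit ISO 2709 limit");
        }
        return String.format("%05d", recordLength);
    }

    public static void main(String[] args) {
        System.out.println(encodeLength(1234));   // 01234
        System.out.println(encodeLength(99999));  // 99999
        // encodeLength(100000) would throw: the leader has no room for a 6th digit.
    }
}
```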

 

Thanks for raising this issue; I look forward to hearing more on the subject from others!

 

- Demian

 

From: Ahrens, Helge <Helge....@ulb.hhu.de>
Sent: Friday, February 18, 2022 9:17 AM
To: vufin...@lists.sourceforge.net
Subject: [EXTERNAL] [VuFind-Tech] Revival: MARC::Record Size exceeding 99999 Bytes

 

Dear group,

 

today we encountered an error concerning the MARC::Record size exceeding 99999 bytes. A search for information led me to conversations on this list and on the Koha forum from around 2012 (links below). Unfortunately, I feel there has not really been a solution for the case of existing records that are bigger than the given limit of 99999 bytes. Correct me if I’m wrong.

 

So far, if I comment out lines 86–88 in Iso2709.php ( https://github.com/vufind-org/vufind/blob/2969196c94df2f97b839debec20a2399c97e8f44/module/VuFind/src/VuFind/Marc/Serialization/Iso2709.php#L86-L88 ), the record is displayed without problems; if the check is left in place, the exception causes VuFind to crash.
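The check in question compares the length declared in the leader against the actual record length. One way such a validation could be loosened, sketched in Java with invented names (this is not VuFind's API, just the idea): treat a leader that declares exactly 99999 bytes as a saturated length field rather than an error.

```java
// Hypothetical sketch of a loosened leader-length check, in the spirit of
// the validation in VuFind's Iso2709.php. Names here are invented.
public class LenientLeaderCheck {
    static final int MAX_DECLARED = 99999;

    // Accept a record when the declared and actual lengths match, or when
    // the leader's length field has saturated at 99999 because the record
    // is oversized but otherwise readable.
    static boolean isAcceptable(int declaredLength, int actualLength) {
        if (declaredLength == actualLength) {
            return true; // well-formed record
        }
        return declaredLength == MAX_DECLARED && actualLength > MAX_DECLARED;
    }
}
```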

 

Now the question for me is: Did anyone try to overcome this size limit, and was anyone successful? If the search crashes simply because of such an error, it’s not really explainable to the user and therefore not acceptable.

This doesn’t work: https://katalog.ulb.hhu.de/Search/Results?lookfor=%22Rheinische+Post%22&type=Title&limit=10

This works: https://katalog.ulb.hhu.de/Search/Results?lookfor=Rheinische+Post&type=Title&limit=10

 

Best wishes and thank you!

Helge

 

https://groups.google.com/g/solrmarc-tech/c/TrFs6m3DW58

https://koha-devel.koha-community.narkive.com/iUaBE7PQ/marc-record-record-length-and-leader

 

Robert Haschart

Feb 25, 2022, 11:19:19 AM2/25/22
to solrma...@googlegroups.com
Demian,

The 5 ASCII byte record length (and 4 ASCII byte field length) are fundamental problems in the design of binary MARC records. It's possible to create and use MARC records that exceed this size limit, but strictly speaking they are illegal (and the library cops might come after you :-) ).

The programs marc4j and solrmarc are designed to handle illegal, oversized binary MARC records, and they use a convention to represent these records in a particular way such that some other permissive MARC readers can handle them even though they are invalid.

You can add a field to your Solr record containing either a binary MARC representation or an XML representation, but I'd have to think about how you could set it up so that it uses binary MARC when the record is not too big, and MARC-XML otherwise.
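The per-record decision Bob describes could be as simple as checking the serialized length before choosing a stored-field format. A sketch under invented names (not a marc4j or SolrMarc API):

```java
// Sketch: choose the stored-field format for one record based on the
// byte length of its binary MARC serialization. Names are invented.
public class FormatChooser {
    static final int ISO2709_MAX = 99999;

    // Returns "binary" when the record fits in the ISO 2709 limit,
    // "xml" when it must fall back to MARC-XML.
    static String chooseFormat(int binaryLength) {
        return binaryLength <= ISO2709_MAX ? "binary" : "xml";
    }
}
```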

-Bob Haschart


Demian Katz

Feb 28, 2022, 8:12:18 AM2/28/22
to solrma...@googlegroups.com

Thanks, Bob – I wonder if some kind of custom method that takes a list of encodings would be a solution: you could set a prioritized order of formats, and if one formatter fails, the code moves on to the next item in the list. That might cover not just the “too long for binary” scenario but also other oddball things like “data contains characters which cannot be included in XML.”
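The prioritized-fallback idea can be sketched as a chain of encoders that is tried in order, with each failure moving on to the next format. The `Encoder` interface and all names below are stand-ins, not an existing SolrMarc API:

```java
import java.util.List;

// Sketch: try a prioritized list of encoders; fall through to the next
// one when an encoder rejects the record. All names are invented.
public class EncoderChain {
    interface Encoder {
        // Returns the encoded record, or throws if the record can't be
        // represented in this format (too long, invalid XML chars, ...).
        String encode(String record) throws Exception;
    }

    static String encodeWithFallback(String record, List<Encoder> encoders) {
        for (Encoder e : encoders) {
            try {
                return e.encode(record);
            } catch (Exception ex) {
                // This format failed; try the next one in priority order.
            }
        }
        throw new IllegalStateException("No encoder could represent the record");
    }
}
```

With, say, a binary encoder first and an XML encoder second, short records would come out as binary MARC and oversized ones as MARC-XML, without the caller needing to know which path was taken.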

 

- Demian
