Pattern map question


Demian Katz

Sep 9, 2019, 5:00:37 PM
to solrma...@googlegroups.com

Hello,

 

Quick question about pattern maps: I want to change a small set of values in a field, while maintaining all the remaining values. I thought I could use a pattern map to match the things I want to rewrite, and then let everything else through by default… but the problem is that I end up duplicating the unwanted values in addition to rewriting them.

 

Here’s my configuration:

 

topic = custom, getAllSubfields(600:610:611:630:650:653:656, " "), (pattern_map.aliens)

pattern_map.aliens.pattern_0 = ^Alien criminal(.*)=>Noncitizen criminal$1

pattern_map.aliens.pattern_1 = ^Alien detention centers(.*)=>Detention centers$1

pattern_map.aliens.pattern_2 = ^Alien labor(.*)=>Noncitizen labor$1

pattern_map.aliens.pattern_3 = ^Alien property(.*)=>Foreign-owned property$1

pattern_map.aliens.pattern_4 = ^Aliens(.*)=>Noncitizens$1

pattern_map.aliens.pattern_5 = ^Children of alien laborers(.*)=>Children of noncitizen laborers$1

pattern_map.aliens.pattern_6 = ^Illegal alien children(.*)=>Children of undocumented immigrants$1

pattern_map.aliens.pattern_7 = ^Illegal aliens(.*)=>Undocumented immigrants$1

pattern_map.aliens.pattern_8 = (.*)=>$1

 

Is there a way to only apply pattern_8 when there has been no match in the preceding 7 patterns?
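
The duplication described above can be reproduced with a rough model in Python. This is only a sketch and not SolrMarc's actual code: it assumes that every pattern that matches a value emits its own output, which is consistent with the symptom. The function name apply_each_match is hypothetical, and Python writes the $1 back-reference as \g<1>.

```python
import re

# Two of the patterns from the configuration above, plus the catch-all pattern_8.
PATTERNS = [
    (r"^Aliens(.*)", r"Noncitizens\g<1>"),
    (r"^Illegal aliens(.*)", r"Undocumented immigrants\g<1>"),
    (r"(.*)", r"\g<1>"),  # pattern_8: matches every value
]

def apply_each_match(value, patterns=PATTERNS):
    """Assumed model: each matching pattern contributes one output value."""
    return [
        re.sub(pattern, replacement, value, count=1)
        for pattern, replacement in patterns
        if re.search(pattern, value)
    ]

print(apply_each_match("Aliens -- United States"))
# ['Noncitizens -- United States', 'Aliens -- United States']
print(apply_each_match("Electric eels"))
# ['Electric eels']
```

Under that assumption the catch-all always fires, so a rewritten heading comes out alongside its original form.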

 

thanks,

Demian

Haschart, Robert J (rh9ec)

Sep 9, 2019, 6:23:02 PM
to solrma...@googlegroups.com
I think this will work if you change the last line from:

pattern_map.aliens.pattern_8 = (.*)=>$1

to

pattern_map.aliens.pattern_8 = keepRaw

keepRaw is one of three special-case pattern mapping directives. It specifies that if no patterns have matched and generated output, the original value that was passed to the map is copied to the output.

The other directives are "matchAll", which applies each successive pattern to the output of the previous pattern, and "filter", which is more geared toward in-place replacement of one string with another.
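
As a rough illustration of the keepRaw behavior described above, here is a small Python model. It is a sketch under assumptions, not SolrMarc's implementation: the function name map_with_keep_raw is hypothetical, only two of the patterns are included, and Python writes the $1 back-reference as \g<1>.

```python
import re

# Two of the rewrite patterns from the map; the catch-all is no longer needed.
PATTERNS = [
    (r"^Aliens(.*)", r"Noncitizens\g<1>"),
    (r"^Illegal aliens(.*)", r"Undocumented immigrants\g<1>"),
]

def map_with_keep_raw(value, patterns=PATTERNS):
    """Assumed model of keepRaw: fall back to the raw value only if nothing matched."""
    outputs = [
        re.sub(pattern, replacement, value, count=1)
        for pattern, replacement in patterns
        if re.search(pattern, value)
    ]
    # keepRaw: no pattern produced output, so copy the original value through.
    return outputs if outputs else [value]

print(map_with_keep_raw("Aliens -- United States"))  # ['Noncitizens -- United States']
print(map_with_keep_raw("Electric eels"))            # ['Electric eels']
```

The fallback runs only when the output list is empty, so rewritten values are not duplicated.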


-Bob



Demian Katz

Sep 10, 2019, 7:43:14 AM
to solrma...@googlegroups.com

Thanks, Bob, I think this is exactly what I need! I’ve edited the documentation to include this information, since I’m sure it will be helpful to others:

 

https://github.com/solrmarc/solrmarc/wiki/Translation-maps

 

I wasn’t entirely clear on the purpose of filter (or at least on how it differs from the default behavior), so please feel free to improve upon my wording there if you can… but for now I assume something vague is better than nothing at all. 😊

 

- Demian

Demian Katz

Sep 10, 2019, 7:58:28 AM
to solrma...@googlegroups.com

…and maybe I spoke too soon, because I’m seeing some weird behavior. I’m using this configuration:

 

topic_facet = 600x:610x:611x:630x:648x:650a:650x:651x:655x, (pattern_map.aliens)

topic = custom, getAllSubfields(600:610:611:630:650:653:656, " "), (pattern_map.aliens)

pattern_map.aliens.pattern_0 = ^Alien criminal(.*)=>Noncitizen criminal$1

pattern_map.aliens.pattern_1 = ^Alien detention centers(.*)=>Detention centers$1

pattern_map.aliens.pattern_2 = ^Alien labor(.*)=>Noncitizen labor$1

pattern_map.aliens.pattern_3 = ^Alien property(.*)=>Foreign-owned property$1

pattern_map.aliens.pattern_4 = ^Aliens(.*)=>Noncitizens$1

pattern_map.aliens.pattern_5 = ^Children of alien laborers(.*)=>Children of noncitizen laborers$1

pattern_map.aliens.pattern_6 = ^Illegal alien children(.*)=>Children of undocumented immigrants$1

pattern_map.aliens.pattern_7 = ^Illegal aliens(.*)=>Undocumented immigrants$1

pattern_map.aliens.pattern_8 = keepRaw

 

with the attached MARC record. Everything seems to be working as expected for the topic_facet field, but the topic field is still dropping the fake “Zarklovian electric eels” subject heading I added for testing purposes. The only difference between topic and topic_facet is that one is a text field and one is a string field, but I don’t think that matters from SolrMarc’s perspective. I tried cloning the pattern map with a different name and using separate ones for each field in case that made a difference (but it didn’t). Any idea what might be going on here?

 

If you don’t have time to delve into this right now, I can do some deeper digging of my own – just checking to see if you have a quick idea!

 

- Demian

ia.mrc

Haschart, Robert J (rh9ec)

Sep 11, 2019, 3:53:18 PM
to solrma...@googlegroups.com
Demian,

That was a tough one. In SolrMarc, broadly speaking, an index specification consists of an extractor piece, a mapping piece (consisting of zero or more maps), and a collector piece.
But the DirectMultiValueExtractor (which is used for specifications like 600x:610x:611x:630x:648x:650a:650x:651x:655x) also allows maps to be specified that can affect individual subfield values before they are joined together. Unless you specify the "join" operator, a map should operate the same way whether it is considered part of the DirectMultiValueExtractor or part of the mapping piece that is applied to the output of the extractor. However, it seems that you have found a case where it actually does matter.

It seems that when it is expressed like this:

topic_facet = 600x:610x:611x:630x:648x:650a:650x:651x:655x, (pattern_map.aliens)

the map is considered to be a part of the DirectMultiValueExtractor.

However, when it is expressed as a custom method call (which is a MultiValueExtractorMethodCall):

topic_facet = custom, getAllSubfields(600:610:611:630:650:653:656, " "), (pattern_map.aliens)

the map is considered a part of the mapping piece.

This can be verified with the following specification:

topic_facet = 600x:610x:611x:630x:648x:650a:650x:651x:655x, join, (pattern_map.aliens)

which has a "join" operator that in this case does nothing except force the map to be considered a part of the mapping piece.
In this case the "Zarklovian electric eels" disappear from the generated output.

I'll have to investigate why the keepRaw operator works differently depending on where in the processing chain it is considered to be, which may take some time.

So instead I'll offer a different way of expressing the map that seems to operate as you desire.


pattern_map.aliens.pattern_0 = ^Alien criminal(.*)=>Noncitizen criminal$1
pattern_map.aliens.pattern_1 = ^Alien detention centers(.*)=>Detention centers$1
pattern_map.aliens.pattern_2 = ^Alien labor(.*)=>Noncitizen labor$1
pattern_map.aliens.pattern_3 = ^Alien property(.*)=>Foreign-owned property$1
pattern_map.aliens.pattern_4 = ^Aliens(.*)=>Noncitizens$1
pattern_map.aliens.pattern_5 = ^Children of alien laborers(.*)=>Children of noncitizen laborers$1
pattern_map.aliens.pattern_6 = ^Illegal alien children(.*)=>Children of undocumented immigrants$1
pattern_map.aliens.pattern_7 = ^Illegal aliens(.*)=>Undocumented immigrants$1
pattern_map.aliens.pattern_8 = matchAll

The name "matchAll" is misleading here; it probably should be called "applyAll" or "chainPatterns", or at least something that describes what it is doing.

Essentially, all of the patterns are applied to the field values that are passed to the map, with each subsequent pattern being applied to what was produced by the previous pattern if that pattern made any changes. It seems that "matchAll" implies "keepRaw", but as implemented it avoids whatever issue you encountered.
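
The chaining described above can be modeled roughly in Python. This is a sketch based only on that description, not SolrMarc's implementation: the function name map_match_all is hypothetical, only two of the patterns are included, and Python writes the $1 back-reference as \g<1>.

```python
import re

# pattern_4 and pattern_7 from the map, in their original order.
PATTERNS = [
    (r"^Aliens(.*)", r"Noncitizens\g<1>"),
    (r"^Illegal aliens(.*)", r"Undocumented immigrants\g<1>"),
]

def map_match_all(value, patterns=PATTERNS):
    """Assumed model of matchAll: feed each pattern the previous pattern's output."""
    for pattern, replacement in patterns:
        # re.sub returns the input unchanged when the pattern does not match,
        # so a value that matches nothing passes through as-is.
        value = re.sub(pattern, replacement, value, count=1)
    return value

print(map_match_all("Illegal aliens -- Fiction"))  # Undocumented immigrants -- Fiction
print(map_match_all("Zarklovian electric eels"))   # Zarklovian electric eels
```

Because every value passes through the whole chain, exactly one output is produced per input, which is why this form behaves like keepRaw for unmatched headings. One thing to watch with chaining: a later pattern could match the output of an earlier rewrite, so the order of the patterns can matter.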

-Bob

Save the Zarklovian electric eels!






Demian Katz

Sep 11, 2019, 4:22:26 PM
to solrma...@googlegroups.com

Thanks, Bob, that’s extremely helpful! I appreciate the in-depth explanation and the suggested workaround. This should definitely keep me in business for now.

 

- Demian

President, Zarklovian Electric Eel Preservation Society
