Ardanis (posted December 7, 2021):

In that case, the next best bet is to try including the separator string in the expression, outside of the match variable:

    SPRINT quote ~\("[^"]+"\)~   // anything encased within ""s that isn't a " itself
    SPRINT tilda "\(~[^~]+~\)"   // anything encased within ~~s that isn't a ~ itself
    SPRINT separator ~ , off , , hgj. ~   // without initial comma
    ~[%separator%]?[ ]*\(\(%quote%\|%tilda%\|[^,]\)+\)~
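Ardanis's idea can be tried outside WeiDU with the minimal Python sketch below. Python is used only because it is easy to run, and `split_fields` is a hypothetical helper. Python's `re` syntax also differs from WeiDU's `\( \) \|` style. The sketch makes one deliberate change. `[%separator%]` is a character class, so it matches any single character of the separator in any order. The sketch instead checks the separator as a whole sequence with a negative lookahead `(?!...)`. Python's `re` supports lookahead, but as far as I know WeiDU's regexps do not, so treat this as a demonstration of the matching logic rather than a drop-in replacement.

```python
import re


def split_fields(text, sep):
    """Return the fields of `text` separated by the literal string `sep`.

    Quoted runs ("...") and tilde runs (~...~) are consumed whole, so a `sep`
    inside them does not split. Empty fields are skipped.
    """
    quote = r'"[^"]+"'  # anything encased within ""s that isn't a " itself
    tilda = r'~[^~]+~'  # anything encased within ~~s that isn't a ~ itself
    sep_re = re.escape(sep)
    # Optional leading separator, optional spaces, then one or more "units".
    # A unit is a whole quoted run, a whole tilde run, or any single character
    # at which `sep` does not start. The lookahead keeps the separator's
    # character order, unlike a [...] character class.
    field = re.compile(rf'(?:{sep_re})?[ ]*((?:{quote}|{tilda}|(?!{sep_re}).)+)')
    return [m.group(1) for m in field.finditer(text)]


print(split_fields('a, "b, c", ~d, e~', ', '))  # ['a', '"b, c"', '~d, e~']
print(split_fields('x"ab"yabzbaw', 'ab'))       # ['x"ab"y', 'zbaw']
```

In the second call, "ba" is not treated as a separator, and the "ab" inside the quotes does not split either.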
Mike1072 (posted December 8, 2021):

> 16 hours ago, Luke said:
> @Mike1072 Unless I'm missing something, your solution does not work (in particular, it does not skip ", " inside ""s or ~~s)...?

My apologies; I didn't test it. There's definitely something wonky in my first regexp, but I see another problem that applies to both. The cases you mention might be caught by the first capture group ([^,]+), which would absorb everything up to the embedded comma and then ruin the subsequent matches. It might be possible to resolve that just by reordering the capture groups in the alternation, placing that one after the other two. I'll update the post with a hopefully working version, though I'm not testing that one either just yet.
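The effect of alternation order can be checked with a small Python experiment. `field_level` is a simplified stand-in and not Mike's actual expression, which isn't quoted in this thread. In `field_level`, each alternative has to cover a whole field. Putting the quote alternative first therefore only helps when the field *starts* with a quote, which may explain why reordering alone made no visible difference. `char_level` instead puts the alternation inside the repetition, as Ardanis's `\(\(%quote%\|%tilda%\|[^,]\)+\)` does.

```python
import re

QUOTE = r'"[^"]+"'  # a whole "..." run

# Alternatives at the field level: each alternative has to cover the field on
# its own, so [^,]+ can still swallow the first half of a quoted run when the
# field does not start with the quote.
field_level = re.compile(rf',?({QUOTE}|[^,]+)')

# Alternatives inside the repetition: at every position a whole quoted run is
# tried first, and only otherwise a single non-comma character is taken.
char_level = re.compile(rf',?((?:{QUOTE}|[^,])+)')


def fields(pattern, text):
    """Collect capture group 1 of every match of `pattern` in `text`."""
    return [m.group(1) for m in pattern.finditer(text)]


print(fields(field_level, '"a,b",y'))    # ['"a,b"', 'y']
print(fields(field_level, 'x "a,b",y'))  # ['x "a', 'b"', 'y']
print(fields(char_level, 'x "a,b",y'))   # ['x "a,b"', 'y']
```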
Luke (thread author, posted July 8, 2022, edited):

@Ardanis, @Mike1072 Sorry for the necro, but:

> On 12/7/2021 at 6:56 PM, Ardanis said:
> In that case, the next best bet is to try including the separator string in the expression, outside of the match variable:

This mostly works; the only issue is that it does not take the order of characters into account. That is to say, if my separator is "ab", then "ba" is also treated as a separator...

> On 12/8/2021 at 10:46 AM, Mike1072 said:
> The cases you mention might be caught by the first capture group ([^,]+), which would absorb everything up to the embedded comma and then ruin the subsequent matches. It might be possible to resolve that just by reordering the capture groups in the alternation, placing that one after the other two.

Unless I'm missing something, nothing changes when I reorder the capture groups... Do you have any other ideas? I mean, it is certainly possible to build a parser (function) that scans the input string character by character (byte by byte) and remembers when a quotation is open...
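The character-by-character approach can be sketched in a few lines of Python. This is for illustration only. `split_expr` is a hypothetical name that mirrors Luke's SPLIT_EXPR and is not a WeiDU function. The sketch tracks a single "open quotation" state instead of separate counters, so a `"` inside `~...~` is plain text. It does not handle WeiDU's five-tilde `~~~~~...~~~~~` strings. Unlike SPLIT_EXPR, which FAILs when the last field is empty, this sketch keeps empty fields.

```python
def split_expr(expr, pattern):
    """Split `expr` on every occurrence of `pattern` outside "..." and ~...~.

    Scans one character at a time and remembers which quotation, if any, is
    open. A " inside ~...~ (or a ~ inside "...") is treated as plain text.
    Empty fields are kept.
    """
    if not pattern:
        raise ValueError("pattern must not be empty")
    fields = []
    current = []        # characters of the field being built
    open_delim = None   # '"' or '~' while a quotation is open, else None
    i = 0
    while i < len(expr):
        if open_delim is None and expr.startswith(pattern, i):
            fields.append("".join(current))  # separator found outside quotes
            current = []
            i += len(pattern)
            continue
        ch = expr[i]
        if ch in '"~':
            if open_delim is None:
                open_delim = ch              # quotation opens
            elif ch == open_delim:
                open_delim = None            # matching quotation closes
        current.append(ch)
        i += 1
    fields.append("".join(current))          # last field (whole string if no split)
    return fields


print(split_expr('a,"b,c",~d,e~,f', ","))  # ['a', '"b,c"', '~d,e~', 'f']
print(split_expr('x"ab"yabzbaw', "ab"))     # ['x"ab"y', 'zbaw']
```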
Spoiler:

    DEFINE_DIMORPHIC_FUNCTION "SPLIT_EXPR"
    STR_VAR
        "expr" = ""
        "pattern" = ""
    RET_ARRAY
        "array"
    BEGIN
        // Initialize
        ACTION_CLEAR_ARRAY "array"
        OUTER_SET "count" = 0
        OUTER_SET "expr_length" = STRING_LENGTH "%expr%"
        OUTER_TEXT_SPRINT "temp" ""
        OUTER_SET "tilda_found" = 0
        OUTER_SET "quote_found" = 0
        OUTER_PATCH "%pattern%" BEGIN
            READ_ASCII 0x0 "1st_char" ELSE "" (1)
            READ_ASCII 0x1 "remaining_chars" ELSE "" (BUFFER_LENGTH - 1)
        END
        // Main
        OUTER_PATCH "%expr%" BEGIN
            WHILE ("%expr_length%") BEGIN
                READ_ASCII 0x0 "current_char" (1)
                PATCH_MATCH "%current_char%" WITH
                    "~" WHEN !("%quote_found%") BEGIN
                        SET "tilda_found" += 1
                        PATCH_IF ("%temp%" STRING_COMPARE_CASE "") BEGIN
                            TEXT_SPRINT "temp" "%temp%%current_char%"
                        END ELSE BEGIN
                            TEXT_SPRINT "temp" "%current_char%"
                        END
                        DELETE_BYTES 0x0 0x1
                        SET "expr_length" -= 1
                    END
                    ~"~ WHEN !("%tilda_found%") BEGIN
                        SET "quote_found" += 1
                        PATCH_IF ("%temp%" STRING_COMPARE_CASE "") BEGIN
                            TEXT_SPRINT "temp" "%temp%%current_char%"
                        END ELSE BEGIN
                            TEXT_SPRINT "temp" "%current_char%"
                        END
                        DELETE_BYTES 0x0 0x1
                        SET "expr_length" -= 1
                    END
                    "%1st_char%" BEGIN
                        READ_ASCII 0x1 "following_chars" ELSE "" (STRING_LENGTH "%pattern%" - 1)
                        PATCH_IF ("%remaining_chars%" STRING_EQUAL "%following_chars%") BEGIN
                            // Split only if no quotation is left open
                            // (10 presumably covers a ~~~~~...~~~~~ string: five opening + five closing tildes)
                            PATCH_IF ("%quote_found%" == 0 OR "%quote_found%" == 2) AND ("%tilda_found%" == 0 OR "%tilda_found%" == 2 OR "%tilda_found%" == 10) BEGIN
                                DEFINE_ASSOCIATIVE_ARRAY "array" BEGIN
                                    "%count%" => "%temp%"
                                END
                                SET "count" += 1
                                DELETE_BYTES 0x0 STRING_LENGTH "%pattern%"
                                SET "expr_length" -= STRING_LENGTH "%pattern%"
                                // Reset vars
                                SET "tilda_found" = 0
                                SET "quote_found" = 0
                                TEXT_SPRINT "temp" ""
                            END ELSE BEGIN
                                PATCH_IF ("%temp%" STRING_COMPARE_CASE "") BEGIN
                                    TEXT_SPRINT "temp" "%temp%%current_char%"
                                END ELSE BEGIN
                                    TEXT_SPRINT "temp" "%current_char%"
                                END
                                DELETE_BYTES 0x0 0x1
                                SET "expr_length" -= 1
                            END
                        END ELSE BEGIN
                            PATCH_IF ("%temp%" STRING_COMPARE_CASE "") BEGIN
                                TEXT_SPRINT "temp" "%temp%%current_char%"
                            END ELSE BEGIN
                                TEXT_SPRINT "temp" "%current_char%"
                            END
                            DELETE_BYTES 0x0 0x1
                            SET "expr_length" -= 1
                        END
                    END
                DEFAULT
                    PATCH_IF ("%temp%" STRING_COMPARE_CASE "") BEGIN
                        TEXT_SPRINT "temp" "%temp%%current_char%"
                    END ELSE BEGIN
                        TEXT_SPRINT "temp" "%current_char%"
                    END
                    DELETE_BYTES 0x0 0x1
                    SET "expr_length" -= 1
                END
            END
        END
        // Store the last field (the whole string if ~%pattern%~ was never found).
        // "count" already holds the next free index, so it is used as-is
        // (incrementing it here would leave a gap in the keys).
        ACTION_IF ("%temp%" STRING_COMPARE_CASE "") BEGIN
            ACTION_DEFINE_ASSOCIATIVE_ARRAY "array" BEGIN
                "%count%" => "%temp%"
            END
        END ELSE BEGIN
            FAIL "SPLIT_EXPR: ~temp~ is empty (~expr~=~%expr%~, ~pattern~=~%pattern%~). Wut???"
        END
    END

However, if multiple separators are valid, how should I use it? I guess I should check them one by one, e.g.:

Spoiler:

    // Suppose separators "ab", "cfb89" and ">><<8677vdf2" are all valid
    OUTER_TEXT_SPRINT "mystring" ""   // your test string
    OUTER_SET "found" = 0   // boolean
    ACTION_FOR_EACH "separator" IN "ab" "cfb89" ">><<8677vdf2" BEGIN
        ACTION_IF !("%found%") BEGIN
            LAF "SPLIT_EXPR" STR_VAR "expr" = "%mystring%" "pattern" = "%separator%" RET_ARRAY "array" END
            LAF ~ARRAY_LENGTH~ STR_VAR "array" RET "length" END
            ACTION_IF ("%length%" >= 2) BEGIN
                OUTER_SET "found" = 1
            END
        END
    END

    // where function ~ARRAY_LENGTH~ is
    DEFINE_DIMORPHIC_FUNCTION ~ARRAY_LENGTH~
    STR_VAR
        "array" = ""   // array name
    RET
        "length"
    BEGIN
        // Initialize
        OUTER_SET "length" = 0
        // Main
        ACTION_PHP_EACH "%array%" AS "key" => "value" BEGIN
            OUTER_SET "length" += 1
        END
    END

You see, it is a bit inelegant, but it should work... Having said that, I still don't understand whether this is possible with a regexp or not...

Edited July 9, 2022 by Luke
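The one-separator-at-a-time loop can be expressed the same way. This is a Python sketch only, and `split_outside_quotes` and `split_on_first_valid` are hypothetical helpers, not WeiDU functions. Separators are tried in list order. The first one that yields at least two fields wins, meaning the first one that occurs at least once outside quotes. This plays the role of the `found` flag in the ACTION_FOR_EACH loop.

```python
def split_outside_quotes(expr, sep):
    """Split on `sep`, skipping occurrences inside "..." or ~...~ (no nesting)."""
    fields, start, i, open_delim = [], 0, 0, None
    while i < len(expr):
        if open_delim is None and expr.startswith(sep, i):
            fields.append(expr[start:i])
            i += len(sep)
            start = i
            continue
        ch = expr[i]
        if ch in '"~':
            if open_delim is None:
                open_delim = ch      # quotation opens
            elif ch == open_delim:
                open_delim = None    # matching quotation closes
        i += 1
    fields.append(expr[start:])
    return fields


def split_on_first_valid(expr, separators):
    """Return (separator, fields) for the first separator that splits `expr`.

    Separators are tried in list order; the first one yielding at least two
    fields wins. Returns (None, [expr]) if none of them applies.
    """
    for sep in separators:
        if not sep:
            continue  # an empty separator would never advance the scan
        fields = split_outside_quotes(expr, sep)
        if len(fields) >= 2:
            return sep, fields
    return None, [expr]


print(split_on_first_valid('1"ab"2cfb893', ["ab", "cfb89", ">><<8677vdf2"]))
# ('cfb89', ['1"ab"2', '3'])
```

Selection here is by list order, not by which separator appears first in the string. If different separators can be mixed within one string, every separator would have to be checked at each position instead.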