Add | (pipe) as a potential CSV cell separator. #1348

Closed
amfine-soft-drault wants to merge 0 commits from master into master
amfine-soft-drault commented 2018-11-09 14:37:28 +00:00 (Migrated from github.com)

Hi
A number of financial file format standards use the pipe as a CSV cell separator.
It would be nice if you could accept this feature.
I used 0 as the weight, thinking it was the lowest priority (I hope I didn't misread the code).

Let me know if you need more info or any changes to accept the pull request.
Thanks
David

amfine-soft-drault commented 2019-10-17 08:15:10 +00:00 (Migrated from github.com)

Hi guys
Any chance this patch/feature could be approved?

SheetJSDev commented 2019-10-17 08:29:25 +00:00 (Migrated from github.com)

Even though Excel doesn't normally handle pipe separators, this PR is mostly ok. It's better to set the weights to 4/3/2/1.

Related questions:

  1. What standards use the pipe character? (pick a relatively important standard we can note in the README)

  2. Should ASCII 0x01, used as a delimiter in FIX, also be supported? (your opinion)

  3. Should ASCII 0x1F (field sep) and ASCII 0x1E (row sep) also be supported? (your opinion)
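
To illustrate how priority weights such as 4/3/2/1 can interact with separator counts, here is a minimal sketch. It is not the library's actual implementation: the function name `guessSeparator`, the table `SEP_WEIGHTS`, and the mapping of each weight to a specific character are assumptions made for this example.

```javascript
// Hypothetical sketch, not SheetJS's actual code.
// Which character gets which of the 4/3/2/1 weights is an assumption.
const SEP_WEIGHTS = { ",": 4, "\t": 3, ";": 2, "|": 1 };

// Count each candidate separator outside double-quoted fields, then pick the
// most frequent one. The weight is only used to break ties between counts.
function guessSeparator(text) {
  const counts = {};
  let inQuotes = false;
  for (const ch of text) {
    if (ch === '"') inQuotes = !inQuotes;
    else if (!inQuotes && ch in SEP_WEIGHTS) counts[ch] = (counts[ch] || 0) + 1;
  }
  let best = null;
  for (const sep of Object.keys(counts)) {
    const moreFrequent = best !== null && counts[sep] > counts[best];
    const tieWithHigherWeight =
      best !== null &&
      counts[sep] === counts[best] &&
      SEP_WEIGHTS[sep] > SEP_WEIGHTS[best];
    if (best === null || moreFrequent || tieWithHigherWeight) best = sep;
  }
  // No candidate seen at all: fall back to the highest-weighted one (comma).
  return best === null ? "," : best;
}
```

With this scheme the pipe is chosen whenever it is the most frequent candidate, and the lowest weight only matters when two candidates occur equally often.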

amfine-soft-drault commented 2019-10-17 09:27:13 +00:00 (Migrated from github.com)

Thanks for your reply.
My answers:

  • I just committed the 4/3/2/1 priority weights.
  • Examples of standards using the pipe character: EPT (European PRIIPs Template) and EMT (European MiFID Template), see https://findatex.eu/
    Please note that the pipe is not specified by those standards, but it is the de facto cell separator that the industry uses to exchange those files.
  • As for 0x01 and 0x1F as separators, I don't think I really have an opinion to offer: unless I'm mistaken those are invisible characters, and I've never seen a use case in which they were used or needed (even tab-separated CSV files are, in my experience, really not that common).
amfine-soft-drault commented 2019-10-22 09:01:48 +00:00 (Migrated from github.com)

Hello again
I found a surprising behaviour. I don't know whether it's related to my patch or not,
so here it is; maybe it will be meaningful to you.

When parsing (valid) CSV files with the pipe as separator, XLSX sometimes does not detect the pipe as the separator, even though it is by far the most frequent of the separators.
If I replace all pipes with ; it detects the ; as separator.
If I replace all pipes with , it detects the , as separator.
I finally found what seems to cause the issue: it happens when the whole file does NOT contain any , or ; !!
If this is the case, it seems the detection falls back to an incorrect separator.
I tried a (dirty) workaround in my own code using XLSX:
I added an extra column (inserted a pipe and a comma just before the first end-of-line),
and then, hurray, the parsing actually uses the pipe as separator.
Meaning

col 1|col 2|col 3
val 1|val 2|val 3

will fail to detect the pipe whereas

col 1|col 2|col 3|fa,ke
val 1|val 2|val 3

would!
If any cell value on any line of the file contained a , or a ; the issue would not happen either
(this is what made it harder to narrow down what was wrong).

I tried to track the issue in the code but got lost :(
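
A less fragile alternative to the fake-column trick is to bypass separator detection in the calling code. The sketch below is only an illustration under simplifying assumptions: `splitDelimited` is a hypothetical helper, not part of the XLSX API, and it assumes no quoted fields and no separators or line breaks inside cell values.

```javascript
// Hypothetical helper, not part of the XLSX API: split delimited text into an
// array of rows using an explicit separator, so nothing is auto-detected.
// Assumes no quoted fields and no separators or newlines inside cell values.
function splitDelimited(text, sep = "|") {
  return text
    .split(/\r?\n/)                     // handle both LF and CRLF line endings
    .filter((line) => line.length > 0)  // drop empty lines (e.g. trailing newline)
    .map((line) => line.split(sep));    // one array of cell strings per row
}

const rows = splitDelimited("col 1|col 2|col 3\nval 1|val 2|val 3\n");
console.log(rows);
```

The resulting array of arrays could then be handed to `XLSX.utils.aoa_to_sheet` to build a worksheet; that step is left out so the sketch stays runnable without the library.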

SheetJSDev commented 2021-09-30 08:48:48 +00:00 (Migrated from github.com)
0a0244cdfa92674b93504189961f0a97a5a06177

Pull request closed
