java - Why do CSVWriter and CSVReader use different default escape characters?


Here is the code snippet I am using:

    // Write two values containing a backslash to an in-memory CSV
    StringWriter writer = new StringWriter();
    CSVWriter csvWriter = new CSVWriter(writer);
    String[] originalValues = new String[2];
    originalValues[0] = "t\\est";
    originalValues[1] = "t\\est";
    System.out.println("original values: " + originalValues[0] + "," + originalValues[1]);
    csvWriter.writeNext(originalValues);
    csvWriter.close();

    // Read the same CSV back with the default reader
    CSVReader csvReader = new CSVReader(new StringReader(writer.toString()));
    String[] resultingValues = csvReader.readNext();
    System.out.println("resulting values: " + resultingValues[0] + "," + resultingValues[1]);

The output of the above snippet is:

    original values: t\est,t\est
    resulting values: test,test

The backslash ('\') character is gone after the round trip!

From some basic analysis I figured out that this happens because CSVReader uses the backslash ('\') as its default escape character, whereas CSVWriter uses the double quote ('"') as its default escape character.
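To make the effect concrete, here is a minimal stand-alone sketch of a reader that treats one character as an escape. It is a simplified model that reproduces the output shown above, not opencsv's actual implementation; the class `EscapeSketch` and the method `readField` are hypothetical names used only for illustration.

```java
final class EscapeSketch {
    /**
     * Simplified model (an assumption, not library code): an escape character
     * followed by the quote or escape character yields that character;
     * an escape character followed by anything else is dropped.
     */
    static String readField(String raw, char escape, char quote) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            if (c == escape) {
                boolean hasNext = i + 1 < raw.length();
                if (hasNext && (raw.charAt(i + 1) == escape || raw.charAt(i + 1) == quote)) {
                    out.append(raw.charAt(++i)); // keep the escaped character
                }
                // otherwise the escape character is silently dropped
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(readField("t\\est", '\\', '"'));   // prints test
        System.out.println(readField("t\\\\est", '\\', '"')); // prints t\est
    }
}
```

Under this model a writer that never doubles the backslash and a reader that consumes it cannot round-trip a value such as `t\est`, which matches the behavior described above.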

What is the reason behind this inconsistency in default behavior?

To fix the above problem I managed to find the following 2 solutions:

1) Overwriting the default escape character of CSVReader with the null character:

    CSVParser csvParser = new CSVParserBuilder().withEscapeChar('\0').build();
    CSVReader csvReader = new CSVReaderBuilder(new StringReader(writer.toString())).withCSVParser(csvParser).build();

2) Using RFC4180Parser, which strictly follows the RFC 4180 standard:

    RFC4180Parser rfc4180Parser = new RFC4180ParserBuilder().build();
    CSVReader csvReader = new CSVReaderBuilder(new StringReader(writer.toString())).withCSVParser(rfc4180Parser).build();
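To illustrate why an RFC 4180-style parser keeps the backslash, here is a minimal stand-alone sketch of that style of parsing, in which the only special sequence is a doubled quote inside a quoted field. It is a toy written for this post, not the RFC4180Parser source; `Rfc4180Sketch` and `parseLine` are hypothetical names, and the sketch assumes each record fits on one line.

```java
import java.util.ArrayList;
import java.util.List;

final class Rfc4180Sketch {
    /** Splits one CSV line; only "" inside a quoted field is treated specially. */
    static List<String> parseLine(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                        field.append('"'); // "" inside quotes is a literal quote
                        i++;
                    } else {
                        inQuotes = false;  // closing quote
                    }
                } else {
                    field.append(c);       // a backslash is ordinary data here
                }
            } else if (c == '"') {
                inQuotes = true;           // opening quote
            } else if (c == ',') {
                fields.add(field.toString());
                field.setLength(0);
            } else {
                field.append(c);
            }
        }
        fields.add(field.toString());
        return fields;
    }

    public static void main(String[] args) {
        // The line as a writer that quotes every field would produce it
        System.out.println(parseLine("\"t\\est\",\"t\\est\"")); // prints [t\est, t\est]
    }
}
```

Because the backslash has no special meaning in this scheme, the values survive the round trip unchanged.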

Can using either of the above approaches cause side effects on other characters?

Also, why is RFC4180Parser not the default parser? Is it to maintain backward compatibility, since RFC4180Parser was introduced in later versions?

I think we are looking at 2 types of escaping here.

1) Escaping a double quote in CSV:

    test,"monitor 24"", samsung"
    test,"monitor 24\", samsung"  // linux style

Since we have a comma in the second field, the field has to be surrounded by double quotes. Double quotes inside the field have to be escaped, either as "" or as \".
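The two quoting styles above can be sketched on the writing side as follows. This is only an illustration of the two conventions, not code taken from any CSV library; `QuoteSketch`, `doubledStyle` and `backslashStyle` are hypothetical names.

```java
final class QuoteSketch {
    /** Wraps a field in double quotes and doubles every quote it contains. */
    static String doubledStyle(String value) {
        return "\"" + value.replace("\"", "\"\"") + "\"";
    }

    /** Wraps a field in double quotes and backslash-escapes backslashes and quotes. */
    static String backslashStyle(String value) {
        String escaped = value.replace("\\", "\\\\").replace("\"", "\\\"");
        return "\"" + escaped + "\"";
    }

    public static void main(String[] args) {
        String field = "monitor 24\", samsung";
        System.out.println("test," + doubledStyle(field));   // prints test,"monitor 24"", samsung"
        System.out.println("test," + backslashStyle(field)); // prints test,"monitor 24\", samsung"
    }
}
```

Note that the backslash style has to escape the backslash itself as well, otherwise a reader using the same convention cannot tell a literal backslash from an escape.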

2) \ as a general escape character, for example \t (tab) or \n (newline).

And since 'e' is not in the list of characters to escape, the \ is ignored and removed.

So if you write "t\\\\est", the file will contain "t\\est" (an escaped backslash) and it will show "t\est" after reading. Or, writing "\\test" will show a tab and "est" after reading.

To keep the \ after reading, you indeed have to tell the parser somehow to ignore these sequences, but the current behaviour doesn't look inconsistent to me - both are treating \ as an escape character.

