Escape hell: handling string values

Hi there,
I am ingesting a large corpus of data that includes a string field containing ASCII text. The text is highly varied and includes all sorts of commas, single quotes, double quotes, backslashes, forward slashes and so on.
I get a lot of failures when inserting because it is unclear how to escape even a simple double quote (I escape it with a backslash, but then I get the backslash back in the stored value too).
I am thinking there could be two ways to solve this:

  • allow string values to be supplied Base64-encoded, together with a declared encoding (ASCII/UTF)
  • provide a kind of raw insertion mode through a data API instead of the query API

I think the first option is simpler to implement.
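
To illustrate the first option, here is a minimal client-side sketch in Python, assuming the application encodes the text itself before building the insert and decodes it after reading. The helper names encode_text and decode_text are hypothetical and not part of any client library; only the standard base64 module is used.

```python
import base64


def encode_text(value: str, encoding: str = "utf-8") -> str:
    """Return the Base64 form of value (hypothetical helper).

    Standard Base64 output uses only A-Z, a-z, 0-9, '+', '/' and '=',
    so it never contains quotes or backslashes.
    """
    return base64.b64encode(value.encode(encoding)).decode("ascii")


def decode_text(stored: str, encoding: str = "utf-8") -> str:
    """Recover the original text from its Base64 form (hypothetical helper)."""
    return base64.b64decode(stored.encode("ascii")).decode(encoding)


# A made-up value mixing the characters that cause trouble in this thread.
raw = 'https://example.com/a\\b?x=1&y="quoted"'
stored = encode_text(raw)

# Nothing in the encoded form needs escaping inside a string literal.
assert '"' not in stored and "\\" not in stored
# The round trip returns the exact original text.
assert decode_text(stored) == raw
```

The trade-off is that the stored value is opaque: it can no longer be matched by substring or read directly in query results.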
@james.williams

Hi, thank you for reporting this.

We’re looking at this internally and aiming to align with the standard SQL and NoSQL ways of handling strings (i.e. backslashes used to escape characters aren’t stored in the database).
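
For reference, the conventional rule described above can be sketched in Python as follows. This assumes a double-quoted string literal in which a backslash escapes the next character; whether a given server version currently honors that rule is exactly what is in question in this thread. The function names are hypothetical.

```python
def escape_string_literal(value: str) -> str:
    """Escape backslashes and double quotes for a double-quoted literal.

    Backslashes are doubled first, so the backslashes added in front of
    double quotes in the second step are not doubled again.
    """
    return value.replace("\\", "\\\\").replace('"', '\\"')


def quote_string_literal(value: str) -> str:
    """Wrap the escaped value in double quotes (hypothetical helper)."""
    return '"' + escape_string_literal(value) + '"'


# Single quotes, ampersands and forward slashes pass through unchanged.
assert escape_string_literal("it's a/b & c") == "it's a/b & c"
# A double quote gains a leading backslash; a backslash is doubled.
assert escape_string_literal('say "hi"') == 'say \\"hi\\"'
assert escape_string_literal("C:\\dir") == "C:\\\\dir"
```

Under this convention the escape backslashes exist only in the query text, and the stored value is the original unescaped string.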

Based on what you’ve said in your post, only double quotes and backslashes that aren’t intended to escape characters should be an issue for you right now. Have you had a look at using our Concept API? Here’s a link: Concept API | Vaticle.

Hello James,
that is correct. For now, my most common use cases are URL strings (backslashes, ampersands, etc.) and text strings with single/double quotes.
Can you elaborate more on the Concept API?
I don’t see an entity method that would allow me to insert a thing programmatically?

There are some examples of inserting data using the Concept API in the Python client tests: see test_get_attributes_by_value and the tests below it.

If you have some example pieces of data that cause problems on insertion, sharing them would help greatly.