dev

Numeric Data Types In MongoDb (Azure DocumentDb Comparison)

Outline

In the previous post I wrote about possible data loss while working with numbers in Azure DocumentDb. Because I was quite surprised by the behavior which started to make sense after I read the specification exhaustively I wanted to test it in MongoDb too. Because I was never issued while using this db.

MongoDb Data Types

Compare to Azure DocumentDb, MongoDb uses BSON serialization format that defines different data types. Especially, it defines multiple numeric types – int (int32), long (int64), double and decimal (decimal128). This means that the problem issued in DocumentDb should not occur. So let’s try it.

Conversion Test

For the test, I used MongoDb (v 3.4.3) in Docker and RoboMongo for a simple visualization. Next, I created three documents with property FacebookId and stored the value 10208580988747499 as a string, long and double. As you can see in Figure 1, the value was stored correctly as a long. When storing as a double (the value 10208580988747499.0), it was stored as 1.02085809887475e+016.0 that is the same value that stored DocumentDb in my previous test.

mongo.PNG
Figure 1: MongoDb long number storage

Conclusion

Because the issue described in previous post didn’t let me sleep and I was surprised that this never happen to me with different NoSql Db, I wanted to test the similar situation in MongoDb too. Because MongoDb has more rich set of data types, it does not behave as DocumentDb in this case which is caused by use of different data types.

Advertisements
dev

Azure DocumentDb and Long Data Type

Outline

I have been using Azure Table Storage and MongoDb for a few years for various purposes and projects based on their advantages. Then I started to use DocumentDb instead of MongoDb because it is the “official” NoSql (document based) database in Azure and it can be used just as a service.

Basically, I’m storing some geological, climate and statistics data from my IoT devices, web scrapers and public APIs. So a quite different data with various structures and data types. And until now, I had not issue, it was working as I expected, until…

The Problem

Note: The used FacebookId is a sample, I don’t know if it really exists, it is just used as an example.

Before a few weeks I was asked to update some project using DocumentDb for storing some user data in it. One of the property to be stored in it was a user’s FacebookId. The value returned by the Facebook SDK in C# is of type long. Ok, so lets store it – update the code, update tests. Nothing complex, it works and tests are passing. Done.

Not at all! Suddenly I found there are duplicates in the test database – same users with same FacebookId. What? There are tests for it, what is wrong? After debugging I found that there must be problem with  Azure DocumentDB .NET SDK. A property with value 10208580988747499 was stored as 10208580988747500. I found a GitHub issue too. It should be repaired in new version. But I already have newer version. Ok. This SDK uses Newton.JSON library for JSON (de)serialization. It can be the problem, but it wasn’t. A simple test of serialization returns expected value. So where is the problem? When I tried to stored values directly in Azure Portal – first as a string and secondly as a number, I got following result (see Figure 1 and Figure 2):

as-string
Figure 1: FacebookId stored as string
as-number
Figure 2: FacebookId stored as number

DocumentDb supports data types Null, Whitespace, Object, Value, Array, String, Boolean and Number in IEEE754 double precision definition. And it is the problem. The value is rounded. So a user with FacebookId 10208580988747499 was never stored and that’s why it was creating duplicate records (and FacebookId used in tests was lower and was not rounded and they were passing. So, If you are using long datatype, use long values in tests!).

1
Figure 3: Supported data types (source: https://azure.microsoft.com/cs-cz/blog/working-with-dates-in-azure-documentdb-4/)

Conversion Test

For a test you can use a converter on this page on the bottom. Number 10208580988747499 is converted to 1.02085809887475e16 and back, number 1.02085809887475e16 is converted to 10208580988747500. Which is exactly the same value that DocumentDb converted and stored.

Solution (for Azure DocumentDB .NET SDK and C#)

So how to solve this problem? Because mentioned Azure DocumentDB .NET SDK uses Newton.JSON internally, the solution is quite straightforward – use a custom JsonConverter  attribute that will serialize long value to string (that will be stored) and deserialize it back to long.

public class LongConverter : JsonConverter
{
    public override void WriteJson(JsonWriter writer, object value, JsonSerializer serializer)
    {
        var val = JToken.FromObject(value.ToString());
        val.WriteTo(writer);
    }

    public override object ReadJson(JsonReader reader, Type objectType, object existingValue, JsonSerializer serializer)
    {
        return reader.TokenType != JsonToken.String ? 0 : long.Parse(reader.Value.ToString());
    }

    public override bool CanConvert(Type objectType)
    {
        return true;
    }
}

And the usage:

public class UserProfile : Microsoft.Azure.Documents.Resource
{
    [JsonConverter(typeof(LongConverter))]
    public long FacebookId { get; set; }
    ...
}

Now, the FacebookId property will be stored in Db as a string without any rounding and converted back to long while getting data from Db.

Conclusion

DocumentDb is a good NoSql Db that is very easy to use. But even you face up some issue. So what I recommend – read some specs, especially about types before you start to use some new Db. Because data is what you store and data has a type. It can save you time while inspecting strange behavior.

Update 15.04.2017: Numeric Data Types In MongoDb (Azure DocumentDb Comparison)