GraphQL and 200 Not OK

The GraphQL specification for errors when resolving a field state the following.

If an error is thrown while resolving a field, it should be treated as though the field returned null, and an error must be added to the errors list in the response.

As an example imagine we the query below and that for whatever reason we couldn’t resolve the reviews.

query {
    product(id: "42") {
        name
        price {
            amount
            currency
        }
        reviews {
            title
            content
        }
    }
}

This would lead to a response that looks like the JSON below.

{
    "data": {
        "product": {
            "name": "Olympic Barbell",
            "price": {
                "amount": "199.95",
                "currency": NZD
            },
            "reviews": null
        }
    },
    "errors": [
        {
            "message": "Could not resolve reviews for product:42",
            "path": [ "product", "reviews" ],
            "locations": [ { "line": 8, "column": 9  }]
        }
    ]
}

What the specification doesn’t state is anything about the HTTP response code (the whole GraphQL specification is pretty light on HTTP details intentionally), most GraphQL frameworks will use the response code 200 OK for the above response.

When reviewing this through the above example it does feel roughly correct, we’re still getting most of the data we requested and hopefully the client can react to the null field and the error accordingly. But what about the more extreme example?

Imagine the same query but our database is currently a pile of molten metal during an air conditioner failure. Our response would look like:

{
    "data": {
        "product": null
    },
    "errors": [
        {
            "message": "Could not resolve product:42",
            "path": [ "product" ],
            "locations": [ { "line": 2, "column": 5  }]
        }
    ]
}

Now we don’t have any useful information, but still our response code will be 200 OK.

It’s common when operating a web service / API to have metrics / alerts based on the outgoing HTTP status code. However in thes above examples we could be failing completely but from that metrics point of view all is well.

Interally at Pushpay we call this scenario 200 Not OK and have some strategies to deal with it and in later posts I’ll start to dig into these.

For now please just be aware that when moving from a REST based API to a GraphQL one you’ll need to think differently about how you monitor for and measure errors.