Sunday 7 October 2012

Probable significant bug in ColdFusion 10's (and Railo's) RESTful web services

G'day:
Whilst waiting in the virtual queue to book my Glastonbury ticket this morning, I decided to review the new bugs raised for ColdFusion 10 overnight. There's nothing better than sitting in front of my PC at 9am on a Sunday pressing refresh-refresh-refresh on six different browsers trying to buy a ticket for a festival that hasn't even announced its line-up yet, and doesn't even happen for another eight months. Still: after 90min of hammering my "reload" buttons, I had parted with my £50 deposit, and I'm all sorted for Glasto next year.

Anyway, this is not a post about mud and drugs and a looming sense of "am I too old for this?", it's about the safer territory of bugs in ColdFusion. And Railo.



Ray raised this bug overnight, the gist of which is:

When I run this via REST and ask for XML, I get an error because a character is not escaped:

falls for his fianc�'s

It's probably a high ascii character in the database. But it should be handled by ColdFusion natively when serializing to XML.

My initial suspicion was that Ray had slightly the wrong end of the stick here, because "é" has no special significance as far as XML goes, so it doesn't need to be escaped. But what the problem was likely to be was that ColdFusion was not encoding the response properly. The Adobe ColdFusion dev team have an almost ubiquitous problem in that whenever they add something to CFML that involves file / text handling, they always forget about character encoding. I have no idea how many times I've had to raise this with them, and I don't recall a single time they've got it right first time without prompting. I think it's fairly true to say that everything that involves text processing needs to consider the text's encoding too! It's an easy rule!
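To illustrate the sort of mismatch that produces a "�", here's a minimal sketch. It is only an assumption that this is what happens inside the REST layer: the sketch supposes the text is written out in a single-byte encoding (ISO-8859-1 is picked purely for illustration) and then read back as UTF-8. The variable names are made up for the example.

```cfml
<cfscript>
// chr(233) is "é"; building it with chr() keeps this file pure ASCII,
// so the file's own encoding cannot interfere with the demonstration
original = "S" & chr(233) & "bastien";

// ASSUMPTION: the text is written out in a single-byte encoding.
// In ISO-8859-1, "é" is the single byte 0xE9
bytes = charsetDecode(original, "iso-8859-1");

// Reading those same bytes as UTF-8: 0xE9 on its own is not a valid
// UTF-8 sequence, so the decoder cannot turn it back into "é"
misread = charsetEncode(bytes, "utf-8");

// Reading them with the encoding they were written in recovers the name
reread = charsetEncode(bytes, "iso-8859-1");

writeOutput("Bytes (hex): " & binaryEncode(bytes, "hex") & "<br />");
writeOutput("Read as UTF-8: " & htmlEditFormat(misread) & "<br />");
writeOutput("Read as ISO-8859-1: " & htmlEditFormat(reread) & "<br />");
</cfscript>
```

The point is that no XML escaping is involved anywhere: the character only breaks when the bytes are written in one encoding and read in another.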

Anyway, my suspicion was borne out by a small amount of experimentation.

Here's a little RESTful application (this is my first foray into the RESTful stuff in CF, so forgive me if it's not exactly model code).

Application.cfc just sets things up so REST stuff will work:

// Application.cfc
component {

    this.name               = "restEncodingTest";
    this.restSettings = {
        cfclocation         = "services",
        skipCfcWithError    = true
    };
    this.webAdminPassword   = "123456";    // needed for Railo (which SUX)
    
    public void function onApplicationStart(){
        restInitApplication(getDirectoryFromPath(getCurrentTemplatePath()) & "services", "services");
    }
    
}

I am underwhelmed with Railo that it requires me to hard-code my admin pwd in my code for it to work. I will be raising this with them.

Person.cfc is my REST services stuff. It's got two methods - one which returns a record in XML, one in JSON (just to demonstrate it's not an XML problem).

// Person.cfc

component rest=true restPath="person"  {

    pageencoding "UTF-8";

    remote query function getAsXml(required numeric id restargsource="path") httpmethod="get" restpath="{id}" produces="text/xml" {
        return getData(id);
    }

    remote query function getAsJson(required numeric id restargsource="path") httpmethod="get" restpath="{id}" produces="application/json" {
        return getData(id);
    }
    
    private query function getData(required numeric id){
        return queryNew(
            "id,lastName,firstName",
            "Integer,Varchar,Varchar",
            [[arguments.id, "Chabal", "Sébastien"]]
        );
    }

}

Sébastien Chabal, btw, is a monster who plays rugby for France. I believe he was also in Quest for Fire ;-). I picked him for this example solely because he's got a diacritic mark in his name. And it gave me an "in" for the Quest for Fire joke.

Anyway...

Note that the pageencoding line in there is not supposed to be telling the web services to deal with the response in a UTF-8 sort of way, it's simply so the file compiles properly with Chabal's name in there (CF would mess it up, otherwise... I've raised a bug for this: 3342141).

getPerson.cfm calls both these methods, very carefully making sure to tell the requests to use UTF-8, as well as outputting the response in UTF-8 as well. This is to make sure that it's the returned data that is bung, not the way I'm requesting or outputting it.

<!---getPerson.cfm --->
<cfset id = randRange(1,1000)>
<cfhttp method="get" url="http://#CGI.http_host#/rest/services/person/#id#" result="xmlResponse" charset="UTF-8">
    <cfhttpparam type="header" name="accept" value="text/xml">
</cfhttp>
<cfhttp method="get" url="http://#CGI.http_host#/rest/services/person/#id#" result="jsonResponse" charset="UTF-8">
    <cfhttpparam type="header" name="accept" value="application/json">
</cfhttp>


<!doctype html>
<html lang="en">
<head>
    <meta charset="utf-8">
    <title></title>
</head>
<body>
    <cfoutput>
    <h2>XML</h2>
    Raw: #xmlResponse.filecontent#<br />
    Escaped: #htmlEditFormat(xmlResponse.filecontent)#<br />
    Parsed:
    <cfdump var="#xmlParse(xmlResponse.filecontent)#">
    
    <h2>JSON</h2>
    Raw: #jsonResponse.filecontent#<br />
    Deserialised:
    <cfdump var="#deserializeJson(jsonResponse.filecontent)#"> 
    </cfoutput>
</body>
</html>
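A further check that could sit alongside getPerson.cfm is to look at what the response says about its own encoding, rather than only at the body. This is a sketch, not part of the original test rig: the `probe` variable and the hard-coded id of 1 are made up for the example, and it relies on the `charset`, `mimeType` and `responseHeader` keys that cfhttp puts in its result struct.

```cfml
<!--- probeEncoding.cfm (hypothetical file name) --->
<cfhttp method="get" url="http://#CGI.http_host#/rest/services/person/1" result="probe" charset="UTF-8">
    <cfhttpparam type="header" name="accept" value="text/xml">
</cfhttp>

<cfoutput>
    Charset reported by cfhttp: #htmlEditFormat(probe.charset)#<br />
    MIME type: #htmlEditFormat(probe.mimeType)#<br />
</cfoutput>

<!--- Dump all the response headers, so the Content-Type header (if any) can be read as sent --->
<cfdump var="#probe.responseHeader#" label="Response headers">
```

If the reported charset is empty, or is something other than UTF-8, that would suggest the problem is in how the response is written, not in how it is requested or displayed.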

And as a control, I call the getAsXml() method via the normal URL method, to demonstrate that it's a REST problem rather than anything else:

<!---standardRemoteCall.cfm --->
<cfscript>
id = randRange(1,1000);
thisUrlPath = CGI.script_name;
thisUrlDir = thisUrlPath.listDeleteAt(thisUrlPath.listLen("/"), "/");
targetUrl = thisUrlDir &  "/services/Person.cfc";
</cfscript>

<cfhttp method="get" url="http://#CGI.http_host##targetUrl#?method=getAsXml&id=#id#" result="response">

<!doctype html>
<html lang="en">
<head>
    <meta charset="utf-8">
    <title></title>
</head>
<body>
    <cfoutput>#htmlEditFormat(response.filecontent)#</cfoutput>
</body>
</html>

The output for the first test is thus:

XML

Raw: 732ChabalS�bastien
Escaped: <QUERY ID="1"><COLUMNNAMES><COLUMN NAME="id"></COLUMN><COLUMN NAME="lastName"></COLUMN><COLUMN NAME="firstName"></COLUMN></COLUMNNAMES><ROWS><ROW><COLUMN TYPE="NUMBER">732</COLUMN><COLUMN TYPE="STRING">Chabal</COLUMN><COLUMN TYPE="STRING">S�bastien</COLUMN></ROW></ROWS></QUERY>
Parsed:
xml document

JSON

Raw: {"COLUMNS":["ID","LASTNAME","FIRSTNAME"],"DATA":[[732,"Chabal","S�bastien"]]}
Deserialised:
struct
COLUMNS
DATA
array
1
array
1: 732
2: Chabal
3: S�bastien

(I've squished the dumps slightly, but the corruption is borne out in both).

The output of the control test was this:

<wddxPacket version='1.0'><header/><data><recordset rowCount='1' fieldNames='id,lastName,firstName' type='coldfusion.sql.QueryTable'><field name='id'><number>673.0</number></field><field name='lastName'><string>Chabal</string></field><field name='firstName'><string>Sébastien</string></field></recordset></data></wddxPacket>


IE: it worked fine.

Running the same code on Railo gave the same results (bug raised as RAILO-2096, Google Groups thread here).

My first instinct was that both Ray and I were not doing something correctly, and there was a way to coerce the data into being encoded properly. I hasten to add that this should not be our job: the engine should do it for us. But still, if it was possible to make it work properly, then that would be something. Try as I might, I could not find any way of doing this. And I did google around for an hour or so (I had a lot of time on my hands whilst waiting to book my Glasto ticket). So if there is a way of doing it, Adobe and Railo have hidden it very well.

Now if yer in the USA, you might be thinking "this is no big deal", but I assure you it's bloody serious. It basically renders RESTful web services unusable in any non-English-speaking country (which is to say: almost all of them), or in any situation in which you're dealing with data from any language other than English. I mean I work in England - which is about as linguistically jingoistic as the States is - but our sites a) deal with geographical locations from around the world, so intrinsically require proper character encoding support, b) are in 11 (soon to be 16) different languages. All of which - other than English - require the use of diacritic marks on the character data. So we could not use CF10's RESTful web services to serve our data.

I think I will raise a different bug from Ray's one regarding this (done: 3342142). And - Adobe - you better crack on with fixing it!

Update:

This has been fixed in CF11, Railo 4.2 and Lucee 4.5. Nice work, everyone.

Having said all that... if anyone knows how to make this work properly, please do let me know! It'd be great if this was a case of PEBKAC. But I suspect it isn't.

Now... I was gonna be feeding back on that "Who's using CF10?" survey today, and I might still do that. But I need to eat first. Can't have me wasting away...

--
Adam

PS: oh, Ray, btw... unless the text you're quoting is pleasingly liberal in its notion of the gender mix in a pending marriage, the word you're after is fiancée, not fiancé. IE: it's likely that in the context you're using it, you're after the feminine word, not the masculine one.