Thursday, 12 December 2013

invalid byte sequence in UTF-8

We often come across this #FiveWordTechHorror, "invalid byte sequence in UTF-8", while working on Ruby projects or projects built on a Ruby framework (Ex : Rails).

This error occurs when we try to decode a string that contains non-ASCII characters (anything beyond plain English letters) and that was submitted through JavaScript or a Rails form without being properly encoded.
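
To make the failure mode concrete, here is a small Ruby sketch (an illustration only, not taken from any particular app). It assumes the bad input is the single Latin-1 byte 0xE9 for "é" arriving where UTF-8 is expected; the variable names good and bad are made up for the example.

```ruby
# "café" as raw bytes, tagged as UTF-8 in both cases.
# In UTF-8, "é" is the two bytes 0xC3 0xA9; the single byte 0xE9 is its Latin-1 form.
good = [0x63, 0x61, 0x66, 0xC3, 0xA9].pack('C*').force_encoding('UTF-8')
bad  = [0x63, 0x61, 0x66, 0xE9].pack('C*').force_encoding('UTF-8')

puts good.valid_encoding?  # true
puts bad.valid_encoding?   # false: the kind of string that later triggers the error
```

Checking valid_encoding? on incoming data is a cheap way to find out where the badly encoded string enters the app.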

So, the question is: how do we properly encode and decode?

1) When submitted through JavaScript :-
Use the encodeURI() or encodeURIComponent() method to encode.
Ex : Suppose name contains Unicode characters.
       encoded_name = encodeURI(name)

On the Ruby side, decode it using the CGI::unescape method.
Ex : Suppose the encoded name is received as params[:encoded_name]
      name = CGI::unescape(params[:encoded_name])
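
The Ruby half of this step can be tried outside Rails. The sketch below is only an illustration: the literal "Jos%C3%A9" stands in for what encodeURI("José") would send from the browser, and it takes the place of params[:encoded_name].

```ruby
require 'cgi'

# Hypothetical value: the percent-encoded UTF-8 form of "José".
encoded_name = "Jos%C3%A9"

name = CGI.unescape(encoded_name)
puts name                  # José
puts name.valid_encoding?  # true
```

One thing to watch: CGI.unescape also turns "+" into a space, while encodeURI() leaves "+" untouched. encodeURIComponent() encodes a literal "+" as %2B, so it is probably the safer choice for values that may contain "+".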

2) When submitted through a Rails form :-
i) Use the CGI::escape method to encode it and the CGI::unescape method to decode it.
   Ex : encoded_name = CGI::escape(name)
          name = CGI::unescape(params[:encoded_name])


ii) Use the unpack method of the String class and then pack the result using the pack method of the Array class.
     Ex : unpacked_name = name.unpack('U*')
            # The statement above returns an array of integers: the decimal (base-10) Unicode code point of each character.
            name = unpacked_name.pack('U*')
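
Both options can be checked in irb. The following is a minimal sketch with a made-up sample name, "José Núñez", written with \u escapes so it does not depend on the file's encoding.

```ruby
require 'cgi'

name = "Jos\u00E9 N\u00FA\u00F1ez"   # "José Núñez"

# i) CGI.escape / CGI.unescape round trip
encoded_name = CGI.escape(name)
puts encoded_name                         # Jos%C3%A9+N%C3%BA%C3%B1ez
puts CGI.unescape(encoded_name) == name   # true

# ii) unpack / pack round trip through Unicode code points
code_points = "Jos\u00E9".unpack('U*')
p code_points                                # [74, 111, 115, 233]
puts code_points.pack('U*') == "Jos\u00E9"   # true
```

Note that the escaped form contains only ASCII characters, which is why it survives the trip through the form unchanged.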