Saturday, June 28, 2008

Okapi Framework... TXT to XLIFF

I came across the Okapi Framework when I was writing scripts to extract plain text and convert it to XLIFF format. So far, I have only tried the text extraction and text rewriting tools, and I find the application quite versatile. It can extract translatable text from various formats and produce XLIFF files. It also has a fairly easy-to-use interface that lets users create and test filters using regular expressions. These filters can extract translatable text even from odd formats you have never seen before.
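To give an idea of what a regex filter does, here is a minimal sketch in plain Python. It is not Okapi's filter syntax or API. It assumes a hypothetical "key = text" file format where only the text after the "=" is translatable, applies a regular expression to each line, and wraps every match in an XLIFF 1.2 trans-unit.

```python
import re
import xml.etree.ElementTree as ET

XLIFF_NS = "urn:oasis:names:tc:xliff:document:1.2"

# Hypothetical filter for "key = text" lines: the key becomes the trans-unit id,
# the text after "=" is the translatable part. Not Okapi's own filter format.
FILTER = re.compile(r"^\s*(?P<key>[\w.]+)\s*=\s*(?P<text>.+?)\s*$")


def extract_to_xliff(lines, original="input.txt", source_lang="en"):
    """Apply the regex filter to each line and wrap every match in a trans-unit."""
    ET.register_namespace("", XLIFF_NS)  # serialize with a default xmlns
    root = ET.Element(f"{{{XLIFF_NS}}}xliff", version="1.2")
    file_el = ET.SubElement(root, f"{{{XLIFF_NS}}}file", {
        "original": original,
        "source-language": source_lang,
        "datatype": "plaintext",
    })
    body = ET.SubElement(file_el, f"{{{XLIFF_NS}}}body")
    for line in lines:
        match = FILTER.match(line)
        if match is None:
            continue  # comments, blank lines, etc. are not translatable
        unit = ET.SubElement(body, f"{{{XLIFF_NS}}}trans-unit", id=match.group("key"))
        ET.SubElement(unit, f"{{{XLIFF_NS}}}source").text = match.group("text")
    return ET.tostring(root, encoding="unicode")


sample = [
    "# menu labels",
    "menu.open = Open file",
    "menu.quit = Quit",
]
print(extract_to_xliff(sample))
```

Changing the regular expression is all it takes to point the same script at a different format, which is roughly the idea behind testing filters interactively.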



The goal of the Okapi Framework is to allow tools developers and localizers to build new localization processes or enhance existing ones to best meet their needs, while preserving a level of compatibility and interoperability. It also provides them with a way to share (and re-use) components across different solutions. The project uses and promotes open standards, where they exist. For the aspects where open standards are not defined yet, the framework offers its own. The ultimate goal is to adopt the industry standards when they are defined and useable.

Personally, I like Okapi because it uses a common file format, XLIFF, for localization. In my work, I often receive materials in different formats from various departments, and as a result the translation database became messy. Using one common format makes life easier for translators, reviewers and linguists, since they only need to learn to handle one type of file, and for engineers, converting the files back to their native formats is just a button click away.
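The "convert back" step is essentially a merge: each translated target goes back into the spot its source came from. Below is a rough sketch of that idea, again assuming the hypothetical "key = text" format with trans-unit ids equal to the keys. It is only an illustration, not how Okapi implements it. Units without a target are left in the source language.

```python
import re
import xml.etree.ElementTree as ET

XLIFF_NS = "urn:oasis:names:tc:xliff:document:1.2"
NS = {"x": XLIFF_NS}

# Hypothetical "key = text" line; "prefix" keeps the key and "=" exactly as written.
LINE = re.compile(r"^(?P<prefix>\s*(?P<key>[\w.]+)\s*=\s*)(?P<text>.*?)\s*$")


def merge_back(xliff_text, original_lines):
    """Replace each translatable value with the <target> of the matching trans-unit."""
    root = ET.fromstring(xliff_text)
    targets = {}
    for unit in root.iterfind(".//x:trans-unit", NS):
        target = unit.find("x:target", NS)
        if target is not None and target.text:
            targets[unit.get("id")] = target.text
    merged = []
    for line in original_lines:
        match = LINE.match(line)
        if match and match.group("key") in targets:
            merged.append(match.group("prefix") + targets[match.group("key")])
        else:
            merged.append(line)  # untranslated or non-translatable lines stay as-is
    return merged


translated = f"""<xliff xmlns="{XLIFF_NS}" version="1.2">
  <file original="input.txt" source-language="en" target-language="fr" datatype="plaintext">
    <body>
      <trans-unit id="menu.open"><source>Open file</source><target>Ouvrir le fichier</target></trans-unit>
      <trans-unit id="menu.quit"><source>Quit</source></trans-unit>
    </body>
  </file>
</xliff>"""

source_lines = ["# menu labels", "menu.open = Open file", "menu.quit = Quit"]
print("\n".join(merge_back(translated, source_lines)))
```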

At the time of writing, Okapi is still under development, and its support for the XLIFF specification is still quite limited: it only supports a handful of XLIFF tags. Furthermore, support for some standard file types such as *.doc is not in place yet. I think it will be a great tool once these are all implemented, and I am looking forward to it.

2 comments:

Alex Ho said...

--- Someone asked me whether I could convert XLS files with Okapi, whether this is something TRADOS can't do, and how the two tools compare in doing it. ---

-------------------------------
The answer is yes and no.

Okapi does not support XLS files out of the box, so you need to convert the file from XLS to CSV format first. Then you can extract the text and create the XLIFF files. Personally, I think it is a good idea to use XLIFF for localization. It promotes one format for everything, which is easy to manage and makes the work processes more effective and efficient.
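Once the sheet has been saved as CSV, the extraction step can look something like the sketch below. It assumes hypothetical column names "id" and "text" (your sheet will differ) and skips empty cells. Since a CSV export only keeps cell values, nothing about formatting or formulae reaches the XLIFF file.

```python
import csv
import io
import xml.etree.ElementTree as ET

XLIFF_NS = "urn:oasis:names:tc:xliff:document:1.2"


def csv_to_xliff(csv_text, id_column="id", text_column="text", source_lang="en"):
    """Turn each non-empty row of a CSV export into an XLIFF 1.2 trans-unit.

    The column names are assumptions for this sketch, not a fixed convention.
    """
    ET.register_namespace("", XLIFF_NS)
    root = ET.Element(f"{{{XLIFF_NS}}}xliff", version="1.2")
    file_el = ET.SubElement(root, f"{{{XLIFF_NS}}}file", {
        "original": "sheet.csv",
        "source-language": source_lang,
        "datatype": "plaintext",
    })
    body = ET.SubElement(file_el, f"{{{XLIFF_NS}}}body")
    for row in csv.DictReader(io.StringIO(csv_text)):
        text = (row.get(text_column) or "").strip()
        if not text:
            continue  # empty cells have nothing to translate
        unit = ET.SubElement(body, f"{{{XLIFF_NS}}}trans-unit", id=row[id_column])
        ET.SubElement(unit, f"{{{XLIFF_NS}}}source").text = text
    return ET.tostring(root, encoding="unicode")


# The quoted cell shows why a real CSV parser beats splitting on commas.
sheet = 'id,text\nR1,"Hello, world"\nR2,\nR3,Save changes?\n'
print(csv_to_xliff(sheet))
```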

Trados can convert XLS files to TTX for the localization process. However, using Trados on XLS files usually gives you lots of headaches, especially when the XLS files contain tons of entries.

Converting from XLS to TTX format can take a long time. In my experience, the longest was around 1 hour 30 minutes or more to convert a single file. Splitting the files into smaller ones is a better option. However, the analysis of one file or of multiple small files can take up to 4 hours to complete. And that's not the worst! In my worst case it took more than 24 hours!! AND guess what happens if the computer crashes??? *shake head* So for huge XLS files, my usual advice is to use SDLX to work on and analyze them.

Alex Ho said...

---- Someone asked me on my previous post: As for SDLX, I understand it can deal with XLS better, but do you know of any feature to turn XLS into XLIFF (provided you have the format information for the XLIFF)? ----


Personally, I have not tried converting any complex XLS files to XLIFF format. Furthermore, Okapi can't convert them directly: you have to convert them to CSV format before they can be converted to XLIFF. With this method, however, formatting and formulae will be lost during the conversion to CSV.

If you are using SDLX for translation, it is better to use SDLX to work on the files directly instead of going through XLIFF. Alternatively, you can try the Heartsome CAT tools, which can convert XLS to XLIFF, but you need to buy them.