NoteWorthy Software, Inc.
2014-04-21 12:32 AM *
Topic: What is the character encoding of .nwctxt files?
jdorocak
NWC2 User
« on: 2009-07-17 08:19 PM »

Hi Folks,

I'm trying to write a pair of Python-based programs for NWC2, nwctxt2xml.py and xml2nwctxt.py. I have been reasonably successful so far, but more extensive round-trip testing shows problems with characters like '' and ''. So what is the character encoding of .nwctxt files? Is it UTF-8, UTF-16, ISO-8859-1, ISO-8859-2, ...? Thanks in advance for the help.

All the best.

Joe :)
Rick G.
Virtuoso
« Reply #1 on: 2009-07-27 02:31 PM »

Hi Joe,

I don't think that the concept of "character encoding" applies to nwctxt files.

Everything outside of quoted text seems to be restricted to letters, numbers and the punctuation set |:,= (vertical bar, colon, comma, equals), which are used as separators.

Inside quoted text:
  • CR, LF, TAB are represented as: \r \n \t
  • Single quote, double quote, vertical bar and backslash appear as: \' \" \| \\
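The escape rules in the list above map naturally onto a pair of Python 3 helpers. This is a minimal sketch, assuming the list is the complete set of escapes NWC2 uses (not verified against NWC2 itself); the function names escape_nwc_text and unescape_nwc_text are hypothetical.

```python
# Escape table taken from the list above (assumed complete; not verified
# against NWC2 itself).
_ESCAPES = {
    "\r": "\\r",
    "\n": "\\n",
    "\t": "\\t",
    "'": "\\'",
    '"': '\\"',
    "|": "\\|",
    "\\": "\\\\",
}
# Reverse map: character after the backslash -> original character.
_UNESCAPES = {seq[1]: ch for ch, seq in _ESCAPES.items()}


def escape_nwc_text(text: str) -> str:
    """Escape a string for use inside nwctxt quoted text."""
    return "".join(_ESCAPES.get(ch, ch) for ch in text)


def unescape_nwc_text(text: str) -> str:
    """Undo escape_nwc_text; unknown or dangling escapes are kept verbatim."""
    out = []
    chars = iter(text)
    for ch in chars:
        if ch != "\\":
            out.append(ch)
            continue
        nxt = next(chars, "")  # "" if the backslash is the last character
        if nxt in _UNESCAPES:
            out.append(_UNESCAPES[nxt])
        else:
            out.append("\\" + nxt)
    return "".join(out)


if __name__ == "__main__":
    print(escape_nwc_text('He said "hi"|'))  # He said \"hi\"\|
```

Keeping unknown escapes verbatim, rather than dropping the backslash, means an unexpected sequence survives a read/write round trip unchanged.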

How quoted text appears is governed by the charmap of the font selected.

I do not believe that it is possible to query a font as to its encoding, e.g., how would one query NWC2STDA.ttf as to what character is the Treble Clef or BOXMARKS2.ttf as to what character is the Breve?

This probably isn't much help, but I don't like to see a question about an important topic go a week without some response.

Registered user since 1996
BobPetty (S. Glos -UK)
Dormant Member
« Reply #2 on: 2009-08-03 04:40 PM »

Hi Everyone
I think the basis of the question may have been to ascertain whether "extended characters", that is, those above code 127 (beyond the ASCII range), would be correctly translated. As (even) DOS supported 8-bit characters up to code 255, I would suggest, although I have not checked it for myself, that NWCTXT files will support characters up to 255. A quick and dirty check on the actual encoding is to try the British pound sign (£): in the DOS character set (code page 437) it appears at character 156, while in the Windows set (Windows-1252) it is at 163.
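Bob's pound-sign check can be automated by looking for the bytes that £ becomes under each candidate encoding. A rough Python 3 sketch, assuming the file is already known to contain a £ (byte 0xA3 is also "ú" in code page 437, and 0x9C is "œ" in Windows-1252, so on arbitrary text this is only a heuristic); guess_from_pound_sign is a hypothetical name.

```python
def guess_from_pound_sign(data: bytes) -> str:
    """Guess the encoding of text known to contain a pound sign (£)."""
    if b"\xc2\xa3" in data:   # £ in UTF-8 (two bytes)
        return "utf-8"
    if b"\xa3" in data:       # £ in Windows-1252 (character 163)
        return "cp1252"
    if b"\x9c" in data:       # £ in DOS code page 437 (character 156)
        return "cp437"
    return "unknown"


if __name__ == "__main__":
    print(guess_from_pound_sign("£5".encode("cp437")))  # cp437
```

The UTF-8 test comes first because its second byte is 0xA3, which would otherwise be mistaken for the Windows-1252 pound sign.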

I hope I have understood the nature of the question correctly, and not thrown in a red herring!

Bob
jdorocak
NWC2 User
« Reply #3 on: 2009-08-08 12:36 AM »

Dear Rick and Bob,

Thanks for the help. Character encoding indeed has to do with how all the characters, including the £ sign, are coded in bits. Not all characters that we want to encode these days fit in a single byte (0-255). Here's a link to an excellent Wikipedia article on the subject: http://en.wikipedia.org/wiki/Character_encoding
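The single-byte limit is easy to see from Python 3 by encoding a few characters both ways. A small sketch; Windows-1252 (cp1252) is used here as an assumed stand-in for a typical Western "ANSI" code page, and encoded_length is a hypothetical helper.

```python
from typing import Optional


def encoded_length(ch: str, encoding: str) -> Optional[int]:
    """Bytes needed for ch in the given encoding, or None if it cannot be encoded."""
    try:
        return len(ch.encode(encoding))
    except UnicodeEncodeError:
        return None


if __name__ == "__main__":
    for ch in "A£中":
        print(ch, encoded_length(ch, "cp1252"), encoded_length(ch, "utf-8"))
    # A 1 1
    # £ 1 2
    # 中 None 3
```

A single-byte code page spends one byte per character but can only hold 256 of them; UTF-8 can represent every Unicode character at the cost of two to four bytes for anything outside ASCII.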

My purpose in asking the question was to fix a bug I was getting while testing my Python-based nwctxt2xml.py and xml2nwctxt.py programs for NWC2.

I recently converted from Python 2.5 to Python 3, whose strings are Unicode throughout. I have run a dozen "round trip" tests on the following nwctxt files, and all tested OK; some of these tested bad using Python 2.6. My, possibly mistaken, conclusion is that nwctxt files use UTF-8 encoding.
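Passing round trips do not by themselves show that the files are UTF-8: a file containing only ASCII bytes decodes identically as UTF-8 and as any Western code page, and a single-byte "ANSI" character such as é (0xE9 in Windows-1252) is not valid UTF-8 at all. A small Python 3 check illustrating this; the sample clip lines and the decodes_as helper are illustrative only.

```python
def decodes_as(data: bytes, encoding: str) -> bool:
    """True if data is valid text in the given encoding."""
    try:
        data.decode(encoding)
    except UnicodeDecodeError:
        return False
    return True


ascii_clip = b'|Text|Text:"Hello"|Font:StaffBold|Pos:0'
ansi_clip = '|Text|Text:"Café"|Font:StaffBold|Pos:0'.encode("cp1252")

if __name__ == "__main__":
    print(decodes_as(ascii_clip, "utf-8"), decodes_as(ascii_clip, "cp1252"))  # True True
    print(decodes_as(ansi_clip, "utf-8"), decodes_as(ansi_clip, "cp1252"))    # False True
```

So a test set whose text happens to be pure ASCII will pass under either assumption; only files with characters above 127 can tell the encodings apart.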

Here is a round trip test:
  1. nwctxt2xml.py converts mySong.nwctxt to mySong.xml
  2. xml2nwctxt.py converts mySong.xml to mySong-out.nwctxt
  3. If fileCompare(mySong.nwctxt, mySong-out.nwctxt) reports "files are identical" -> OK
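The three steps above can be sketched in Python 3 with the standard filecmp module doing the byte-for-byte comparison of step 3. The real converters are not shown in the thread, so copy_via_text is a hypothetical stand-in that simply decodes and re-encodes the file with an explicit encoding; Windows-1252 is an assumed choice for the sample.

```python
import filecmp
import tempfile
from pathlib import Path


def copy_via_text(src: Path, dst: Path, encoding: str) -> None:
    """Hypothetical stand-in for nwctxt2xml.py followed by xml2nwctxt.py:
    decode the file to str with an explicit encoding, then write it back."""
    # newline="" turns off newline translation, so CRLF line endings survive.
    with open(src, encoding=encoding, newline="") as f:
        text = f.read()
    with open(dst, "w", encoding=encoding, newline="") as f:
        f.write(text)


def round_trip_ok(src: Path, dst: Path) -> bool:
    """Step 3: byte-for-byte comparison of the original and the output."""
    return filecmp.cmp(src, dst, shallow=False)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        src = Path(d) / "mySong.nwctxt"
        dst = Path(d) / "mySong-out.nwctxt"
        src.write_bytes('|Text|Text:"Café"\r\n'.encode("cp1252"))
        copy_via_text(src, dst, "cp1252")
        print(round_trip_ok(src, dst))  # True
```

With encoding="utf-8" the same file fails at the read step with a UnicodeDecodeError, which is one way an encoding mismatch shows up in round-trip testing.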

Here is the output log of my dozen tests:
test: 001 time: Fri Aug 07 13:48:51 2009 file: ambeaut.nwctxt       -> OK 
test: 002 time: Fri Aug 07 13:48:54 2009 file: bosgbaa.nwctxt       -> OK   
test: 003 time: Fri Aug 07 13:48:55 2009 file: hillhbr.nwctxt       -> OK     
test: 004 time: Fri Aug 07 13:48:56 2009 file: hussaut.nwctxt       -> OK     
test: 005 time: Fri Aug 07 13:49:39 2009 file: JesusLM.nwctxt       -> OK   
test: 006 time: Fri Aug 07 13:49:39 2009 file: mozhall.nwctxt       -> OK   
test: 007 time: Fri Aug 07 14:01:02 2009 file: ocanada.nwctxt       -> OK   
test: 008 time: Fri Aug 07 14:01:02 2009 file: wamr01-intrt.nwctxt  -> OK   
test: 009 time: Fri Aug 07 14:01:22 2009 file: wamr02s-dies.nwctxt  -> OK 
test: 010 time: Fri Aug 07 14:01:43 2009 file: wamr07-lacri.nwctxt  -> OK   
test: 011 time: Fri Aug 07 14:01:49 2009 file: wamr10-sanct.nwctxt  -> OK   
test: 012 time: Fri Aug 07 14:01:55 2009 file: wamr12-agnus.nwctxt  -> OK   

I am going to post my code as open source in another thread so folks can use it. If I made a mistake in my UTF-8 assumption, they'll find it and hopefully let me know so I can fix the code. Thanks for the help.

Joe
Rick G.
Virtuoso
« Reply #4 on: 2009-08-08 01:36 AM »

Quote from jdorocak: "My, possibly mistaken, conclusion is that nwctxt files use UTF-8 encoding."
I believe you are mistaken. UTF-8 is a Unicode encoding and NWC2 does not support Unicode.
Where this matters is within quoted text, such as:
Quote
!NoteWorthyComposerClip(2.0,Single)
|Text|Text:"\|----\|"|Font:StaffBold|Pos:0
!NoteWorthyComposerClip-End
The "\|----\|" part will display as |----| with most fonts. If Webdings or Marlett is the font, it will look quite different.
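The clip line above can be taken apart field by field in Python 3: split on "|" (skipping escaped bars and bars inside quotes), then split each field at its first ":". This is a sketch inferred from this one example, not from a published nwctxt grammar; split_fields, parse_clip_line and display_text are hypothetical names.

```python
import re

# Control-character escapes in quoted text (per the list earlier in the thread).
_CONTROL = {"r": "\r", "n": "\n", "t": "\t"}


def split_fields(line: str) -> list[str]:
    """Split a clip line on '|', ignoring escaped bars and bars inside quotes."""
    fields: list[str] = []
    current: list[str] = []
    in_quotes = escaped = False
    for ch in line:
        if escaped:
            current.append(ch)
            escaped = False
        elif ch == "\\":
            current.append(ch)
            escaped = True
        elif ch == '"':
            current.append(ch)
            in_quotes = not in_quotes
        elif ch == "|" and not in_quotes:
            fields.append("".join(current))
            current = []
        else:
            current.append(ch)
    fields.append("".join(current))
    return fields


def parse_clip_line(line: str) -> tuple[str, dict[str, str]]:
    """Return (object type, {attribute: raw value}) for one clip line."""
    parts = split_fields(line)[1:]  # the line starts with '|', so field 0 is empty
    attrs: dict[str, str] = {}
    for part in parts[1:]:
        key, _, value = part.partition(":")
        attrs[key] = value
    return parts[0], attrs


def display_text(raw: str) -> str:
    """Strip the surrounding quotes and undo the backslash escapes."""
    if len(raw) >= 2 and raw[0] == raw[-1] == '"':
        raw = raw[1:-1]
    return re.sub(r"\\(.)", lambda m: _CONTROL.get(m.group(1), m.group(1)), raw, flags=re.DOTALL)


if __name__ == "__main__":
    obj, attrs = parse_clip_line(r'|Text|Text:"\|----\|"|Font:StaffBold|Pos:0')
    print(obj, attrs["Font"], display_text(attrs["Text"]))  # Text StaffBold |----|
```

The result is only the character sequence; which glyphs appear still depends on the font named in the Font field, as Rick notes.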

NoteWorthy Online
Admin


« Reply #5 on: 2009-09-02 06:37 PM »

Currently, all text in NWC is ANSI text. The exact character encoding is dictated by the creator's machine. Although this should not create any problems for a user who creates and later works with the same files, a user could experience difficulty when sharing files with another user who has a different default system character encoding.

This area is the basis for one of the criteria that we have been using for future development.
Copyright © 2014 Noteworthy Software, Inc.