Reading binary data
Paul
2008-04-18 08:16:43 UTC
Hi,

I have some binary data with a 2 byte word described as follows:

LSB 08 09 10 11 12 13 14 00 MSB
LSB 00 01 02 03 04 05 06 07 MSB

I know this is supposed to be an unsigned integer value, but when I try
reading it from the file as a 2 byte int (e.g. short or unsigned short) I do
not get the correct value (I know what the range of values is).

It was suggested that each byte may have been written back to front! Is
there any way I can swap the order of the bits to try to get what looks like
a reasonable value? (I think the data may have been written using some old
HP proprietary format.)

Any suggestions most welcome
Paul
Vaclav Cechura
2008-04-18 11:48:57 UTC
Post by Paul
Is
there any way I can swap the order of the bits to try to get what looks like
a reasonable value? (I think the data may have been written using some old
HP proprietary format.)
Any suggestions most welcome
Paul
I am not sure if you want to reverse the bits in the whole
2 byte number or reverse the bits byte by byte. The first two
functions do one each, using std::bitset. The third function does
the same as the first one, using shifts only.

Vaclav

#include <bitset>
#include <cstddef>

const unsigned n = 16;  // bits in the 2 byte word
const unsigned nh = 8;  // bits in one byte
const unsigned nq = 4;  // bits in half a byte

unsigned short reverse_bits(unsigned short num)
// returns a 16 bit number in reverse bit order
// 0|1|2|3|4|5|6|7|8|9|10|11|12|13|14|15 ->
// 15|14|13|12|11|10|9|8|7|6|5|4|3|2|1|0
{
    std::bitset<n> bnum(num);

    // swap bit i with its mirror image n-1-i
    for (size_t i = 0; i < nh; i++)
    {
        bool bit = bnum[i];
        bnum[i] = bnum[n-1-i];
        bnum[n-1-i] = bit;
    } // end for

    return bnum.to_ulong();
}

unsigned short reverse_bits_in_bytes(unsigned short num)
// returns a 16 bit number, each byte in reversed bit order
// 0|1|2|3|4|5|6|7|8|9|10|11|12|13|14|15 ->
// 7|6|5|4|3|2|1|0|15|14|13|12|11|10|9|8
{
    std::bitset<n> bnum(num);

    // low byte: swap bit i with bit 7-i
    for (size_t i = 0; i < nq; i++)
    {
        bool bit = bnum[i];
        bnum[i] = bnum[nh-1-i];
        bnum[nh-1-i] = bit;
    } // end for

    // high byte: swap bit 8+i with bit 15-i
    for (size_t i = 0; i < nq; i++)
    {
        bool bit = bnum[i+nh];
        bnum[i+nh] = bnum[n-1-i];
        bnum[n-1-i] = bit;
    } // end for

    return bnum.to_ulong();
}

unsigned short reverse_bits2(unsigned short num)
// returns a 16 bit number in reverse bit order (does not use bitset)
// 0|1|2|3|4|5|6|7|8|9|10|11|12|13|14|15 ->
// 15|14|13|12|11|10|9|8|7|6|5|4|3|2|1|0
{
    unsigned short ret = 0;
    // move the lowest bit of num into ret, then shift both along
    for (size_t i = 0; i < n-1; i++, ret <<= 1, num >>= 1)
    {
        ret |= num & 0x0001;
    } // end for

    // last bit, without a trailing shift
    ret |= num & 0x0001;

    return ret;
}
Vaclav Cechura
2008-04-18 12:06:25 UTC
Post by Vaclav Cechura
I am not sure if you want to reverse the bits in the whole
2 byte number or reverse bits byte by byte. The first two
functions do both using std::bitset. The third function does
the same as the first one, using shifts only.
And here's the byte by byte function using shifts only:

unsigned short reverse_bits_in_bytes2(unsigned short num)
// returns a 16 bit number, each byte in reversed bit order (does not use bitset)
// 0|1|2|3|4|5|6|7|8|9|10|11|12|13|14|15 -> 7|6|5|4|3|2|1|0|15|14|13|12|11|10|9|8
{
    unsigned short ret = 0;
    // each pass copies the lowest bit of both bytes (bit 0 and bit 8),
    // then shifts ret up and num down
    for (size_t i = 0; i < nh-1; i++, ret <<= 1, num >>= 1)
    {
        ret |= (num & 0x0001) | (num & 0x0100);
    } // end for

    // last bit of each byte, without a trailing shift
    ret |= (num & 0x0001) | (num & 0x0100);

    return ret;
}
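For comparison, the same byte-by-byte reversal can be done without a loop by swapping progressively larger groups of bits with masks. This is only a sketch and not one of the functions posted above; the helper names are made up for illustration, and it assumes a compiler that accepts C++11 constexpr.

```cpp
#include <cstdint>

// Swap every bit with its neighbour: 01 <-> 10 within each pair.
constexpr std::uint16_t swap_adjacent_bits(std::uint16_t v)
{
    return static_cast<std::uint16_t>(((v & 0x5555u) << 1) | ((v >> 1) & 0x5555u));
}

// Swap neighbouring 2-bit groups within each nibble.
constexpr std::uint16_t swap_bit_pairs(std::uint16_t v)
{
    return static_cast<std::uint16_t>(((v & 0x3333u) << 2) | ((v >> 2) & 0x3333u));
}

// Swap the two nibbles within each byte.
constexpr std::uint16_t swap_nibbles(std::uint16_t v)
{
    return static_cast<std::uint16_t>(((v & 0x0F0Fu) << 4) | ((v >> 4) & 0x0F0Fu));
}

// Reverses the bit order inside each byte; the bytes themselves stay put.
// (hypothetical name, equivalent in effect to reverse_bits_in_bytes2)
constexpr std::uint16_t reverse_bits_in_bytes_masks(std::uint16_t v)
{
    return swap_nibbles(swap_bit_pairs(swap_adjacent_bits(v)));
}
```

With this, reverse_bits_in_bytes_masks(0x20EA) gives 0x0457: the high byte 0x20 becomes 0x04 and the low byte 0xEA becomes 0x57.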
Vaclav Cechura
2008-04-18 12:09:02 UTC
Post by Vaclav Cechura
for (size_t i = 0; i < nh-1; i++, ret <<= 1, num >>=1 )
And nh is:

const unsigned nh = 8;

Vaclav
dhoke
2008-04-18 13:22:26 UTC
Post by Paul
Hi,
LSB 08 09 10 11 12 13 14 00 MSB
LSB 00 01 02 03 04 05 06 07 MSB
I know this is supposed to be an unsigned integer value, but when I try
reading it from the file as a 2 byte int e.g. short or unsigned short I do
not get the correct value (I know what the range of values is).
It was suggested that each byte may have been written back to front! Is
there any way I can swap the order of the bits to try to get what looks
like a reasonable value? (I think the data may have been written using
some old HP proprietary format.)
You probably don't need to swap the bits, just the bytes...

Winsock provides htons, htonl, ntohs, and ntohl, which do this kind of byte
swapping on systems where network byte order differs from host byte order.

(The names mean host-to-network and network-to-host. Network byte order is big
endian, and host order on the Intel x86 family is little endian. So on Intel
systems (home to the Borland language tools I know about) these functions
convert values from big to little endian or the other way 'round.)

Of course you could convert it yourself too...
#include <stdio.h>

// scratch union used by the hand-written loop in main()
union swapdata
{
    int ival ;
    __int64 i64val ;
    char bytes[sizeof(__int64)] ;
} uswap ;

// generic version: reverses the byte order of any plain integer type
template< typename vartype >
vartype swapbytes(vartype p_val)
{
    union swapdata
    {
        vartype dataval ;
        char bytes[sizeof(vartype)] ;
    } uswap ;

    size_t i ;
    uswap.dataval = p_val ;
    // swap byte i with its mirror image, working in from both ends
    for( i = 0 ; i < sizeof(vartype)/2 ; i++)
    {
        char tmpbyte ;
        tmpbyte = uswap.bytes[i] ;
        uswap.bytes[i] = uswap.bytes[sizeof(uswap.dataval)-i-1] ;
        uswap.bytes[sizeof(uswap.dataval)-i-1] = tmpbyte ;
    }
    return uswap.dataval ;
}

int main(int argc, char *argv[])
{
    // same swap written out by hand for an int
    int yourint = 1 ;
    size_t i ;
    uswap.ival = yourint ;
    for( i = 0 ; i < sizeof(yourint)/2 ; i++)
    {
        char tmpbyte ;
        tmpbyte = uswap.bytes[i] ;
        uswap.bytes[i] = uswap.bytes[sizeof(yourint)-i-1] ;
        uswap.bytes[sizeof(yourint)-i-1] = tmpbyte ;
    }
    printf("orig %08x, swapped %08x\n", 1, uswap.ival) ;

    // and via the template
    printf("orig %08x, swapped %08x\n", yourint, swapbytes( yourint ) ) ;
    yourint = 255 ;
    printf("orig %08x, swapped %08x\n", yourint, swapbytes( yourint ) ) ;

    return 0 ;
}
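For the 2-byte fields in question the swap reduces to two shifts and a mask, with no union needed. A minimal sketch, assuming C++11 constexpr is available; the name swap_bytes16 is hypothetical and not taken from Winsock or from the code above.

```cpp
#include <cstdint>

// Exchanges the high and low byte of a 16-bit value.
// Unlike htons, this swaps unconditionally, whatever the host byte order is.
constexpr std::uint16_t swap_bytes16(std::uint16_t v)
{
    return static_cast<std::uint16_t>(((v & 0x00FFu) << 8) | ((v >> 8) & 0x00FFu));
}
```

For example, swap_bytes16(0x2010) is 0x1020, and applying it twice returns the original value.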
Remy Lebeau (TeamB)
2008-04-18 16:30:48 UTC
Those look like 8-byte integers, not 2-byte words.
Post by Paul
I know this is supposed to be an unsigned integer value
More like an unsigned __int64.
Post by Paul
but when I try reading it from the file as a 2 byte int e.g. short
or unsigned short I do not get the correct value (I know what
the range of values is).
You are not reading enough bytes.
Post by Paul
It was suggested that each byte may have been written back to front!
Little endian versus big endian, yes.
Post by Paul
Is there any way I can swap the order of the bits to try
to get what looks like a reasonable value?
You don't swap bits, only bytes.

For 2-byte and 4-byte values, you can use the hton...() and ntoh...()
functions. But for 8-byte values, you will have to declare an 8-byte array,
copy the integer to it, move the array elements around as needed, then copy
the array back to the integer.
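The array approach for 8-byte values might look roughly like the sketch below. It assumes a compiler with <cstdint> (std::uint64_t stands in for unsigned __int64), and the function names are made up for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <cstring>

// Reverses the byte order of an 8-byte integer by going through a byte array.
inline std::uint64_t swap_bytes64(std::uint64_t value)
{
    unsigned char bytes[sizeof value];
    std::memcpy(bytes, &value, sizeof value);   // copy the integer to the array
    std::reverse(bytes, bytes + sizeof value);  // move the array elements around
    std::memcpy(&value, bytes, sizeof value);   // copy the array back to the integer
    return value;
}

// Sanity check: swapping twice must give back the original value.
inline void check_swap64_round_trip(std::uint64_t v)
{
    assert(swap_bytes64(swap_bytes64(v)) == v);
}
```

Using memcpy rather than a pointer cast keeps the copy well defined for any integer type of that size.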


Gambit
Remy Lebeau (TeamB)
2008-04-18 16:32:18 UTC
Post by Remy Lebeau (TeamB)
Those look like 8-byte integers, not 2-byte words.
Wait, was that diagram supposed to be displaying bits or bytes? The way you
wrote it looks like bytes. If it is displaying bits instead, then I can see
now how it would be representing a 2-byte word.


Gambit
Paul Dowd
2008-05-01 10:46:25 UTC
Guys,

Many thanks for all the help. I'm very busy but will get to it soon and
report back; it looks like the solution is there somewhere.

Yes the diagram is bits & it is 2 byte because I can read other ascii stuff
further down the file where expected.

Paul
Paul Dowd
2008-05-06 01:01:17 UTC
Assuming the bytes in the 2 byte int are written swapped, I thought I could
use this, but it does not seem to produce the right results:

char var[3];
Byte b1, b2;
read(infile, &b1, 1);
read(infile, &b2, 1);
var[0] = b2;
var[1] = b1;
var[2] = '\0';
short val = (atoi)var;

Is there something I have misunderstood here?

Paul
Chris Uzdavinis (TeamB)
2008-05-06 12:42:37 UTC
Post by Paul Dowd
Assuming the bytes in the 2 byte int are written swapped I thought I could
Why are they swapped? Does this have to do with network byte ordering?
If so, there are standard ways to accommodate that without doing
the kind of swaps you're doing. (See htons, htonl, ntohs, and ntohl,
which convert between host and network order for short and long
values.)
Post by Paul Dowd
char var[3];
Byte b1, b2;
read(infile, &b1, 1);
read(infile, &b2, 1);
var[0] = b2;
var[1] = b1;
var[2] = '\0';
In terms of speed, two "read" calls for 1 byte each is twice as bad
as 1 read of 2 bytes.
Post by Paul Dowd
short val = (atoi)var;
This can't be what you really are trying, right? It looks like you're
trying to cast an array of char into a function, and then assign the
function to a short integer.

If this does compile, what result are you getting, and what are the
inputs you're reading?
--
Chris (TeamB);
Paul Dowd
2008-05-06 14:17:21 UTC
It has nothing to do with networks. I have some archived data I am trying to
read from disk. The format description says that these particular fields are
2-byte ints, but I do not get sensible values. Here is a sample output from
the first few fields in the file:

file=-130
day=-130
min=-130
sec=-130
line=CC80-7M
sp=-130
datum=-130

Fields 1-4 and 6-7 are all supposed to be 2-byte ints. Field 5 is an 8 byte
string and is the only one that has been correctly read. The ints were read
from the file using
short s;
read(infile, &s, 2);

The ints all appear to be the same but should not be, and they certainly
shouldn't be negative. It was suggested to me that the system that
wrote the data may have used a convention whereby the bytes were written in
reverse. But when I try to swap them I still don't get sensible results.

I have attached the first 427 bytes of the file.

Paul
begin 666 sample.bin
M(! @ZB F("Q#***@P+3=-(/]@(!'W#EZS87E=> T****@T*S" ^(" 7. 85]P82
M!6%VJMT@%" @("#_YR P`7P@(" @,RPR[) @( $@(" @(" @(" @(" @(" @
M(" @(" @(" @(" @____________$#,E_Q%X%_\@("#_______________^[
MBJN*(%RP(" 0(" ",R=$(" @(" @(" @(" @(" @(" @(" @(" @(" @(" @
M(" @(" @(" @(!<X!A4@(" @(" @(!=3"<X@(" @(" @(" @(" @(" @(" @
M(&=32PYUTF .(" @(" @("#W#E[R87E=GO<)E/=A=^)Q(" @(" @("#W#E?/
M87EK)R @(" @(" @(" @(" @(" @(" @(" @(" @("!44DY%(" @(#0@(" @
M(" @(" @(" @(" @(" @(" @(" @(" @(" @(" @(" @(" @(" @(" @(" @
M(" @(" @(" @(" @(" @(" @(" @(" @(" @(" @(" @(" @(" @(" @(" @
6("4E(" @!&$Y(/___R (@FD@")1D# ``
`
end
Vaclav Cechura
2008-05-06 15:07:15 UTC
Post by Paul Dowd
It is nothing to do with networks. I have some archived data I am trying to
read from disk. The format description says that these particular fields are
2-byte ints, but I do not get sensible values.
Do you use the O_BINARY flag in the call to open when opening
the file? Something like this:

infile = open(filename, O_BINARY | O_RDONLY);

The file should be opened in binary mode.
Post by Paul Dowd
Here is a sample output from
file=-130
day=-130
min=-130
sec=-130
line=CC80-7M
sp=-130
datum=-130
The beginning of the attached file, written as hexadecimal values, is:

20 10 20 EA 20 26 20 2C 43 43 38 30 2D 37 4D...

the sequence 43 43 38 30 2D 37 4D is the string CC80-7M

the first eight bytes (20 10 20 EA 20 26 20 2C) if read as two
byte integers give two possible sequences of numbers (depending
on the byte order):
a) 8208, 8426, 8230, 8236
or
b) 4128, -5600 (or 59936 if unsigned), 9760, 11296

Just an idea:
All four pairs of bytes start with the same hexadecimal
value 20. If you take it as an offset (that is, subtract
8192 from the values in sequence a), or just ignore it for now,
you get a modified sequence:
a) 16, 234, 38, 44

Could these be file, day, min, sec?
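The arithmetic above can be reproduced with a few lines, as a sketch. The eight bytes are copied from the hex listing in this post; the function names are hypothetical, and C++11 constexpr is assumed.

```cpp
#include <cstdint>

// First eight bytes of the attached sample, as listed above.
constexpr std::uint8_t header[8] = {0x20, 0x10, 0x20, 0xEA, 0x20, 0x26, 0x20, 0x2C};

// Field `index` (0-based), taking the first byte of the pair as the high byte.
constexpr std::uint16_t field_be(const std::uint8_t* bytes, int index)
{
    return static_cast<std::uint16_t>((bytes[2 * index] << 8) | bytes[2 * index + 1]);
}

// Field `index` (0-based), taking the first byte of the pair as the low byte.
constexpr std::uint16_t field_le(const std::uint8_t* bytes, int index)
{
    return static_cast<std::uint16_t>((bytes[2 * index + 1] << 8) | bytes[2 * index]);
}

// Drops the constant 0x20 high byte, i.e. subtracts 8192 (0x2000).
constexpr std::uint16_t without_offset(std::uint16_t v)
{
    return static_cast<std::uint16_t>(v - 0x2000);
}
```

field_be yields sequence a) 8208, 8426, 8230, 8236; field_le yields sequence b) 4128, 59936, 9760, 11296; and without_offset applied to sequence a) yields 16, 234, 38, 44.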

Vaclav
Alan Bellingham
2008-05-06 15:18:06 UTC
Post by Vaclav Cechura
Do you use the O_BINARY flag in call to open when opening
infile = open(filename, O_BINARY | O_RDONLY);
The file should be opened in binary mode.
A very good point. It doesn't change the first few values, but it does
have an effect further down the file. Though I'm a little suspicious
that the \r\n sequence occurs twice, and no raw newlines happen.

So it may well be that the original file was written in text mode(!)

Alan Bellingham
--
Team Browns
ACCU Conference 2009: to be announced
Vaclav Cechura
2008-05-06 15:34:25 UTC
Post by Vaclav Cechura
All the four pairs of bytes start with the same hexadecimal
value 20, if you take it as an offset (means that you subtract
8192 from the values in sequence a) or just ignore it for now
a) 16, 234, 38, 44
May this be file?, day?, min?, sec?
There are so many bytes with value 20 (space character) in your
sample data, even in the middle of possibly binary values, that
it makes me wonder whether whoever wrote the file preinitialized
the buffer with spaces (byte 20) and then wrote only the data
needed (some of it as one-byte numbers at two-byte offsets).
Just thoughts...

Vaclav

Alan Bellingham
2008-05-06 15:09:04 UTC
Post by Paul Dowd
It is nothing to do with networks. I have some archived data I am trying to
read from disk. The format description says that these particular fields are
2-byte ints, but I do not get sensible values. Here is a sample output from
On what machine did the program that created this file actually run?
Because, if it wasn't an Intel processor (or x86 compatible), then your
rejection of the byte-order explanation may be premature. A network is a
way of transferring binary data from one machine to another. A binary file
on disc is *also* a way of transferring binary data from one machine to
another.
Post by Paul Dowd
file=-130
day=-130
min=-130
sec=-130
line=CC80-7M
sp=-130
datum=-130
Fields 1-4 and 6-7 are all supposed to be 2-byte ints. Field 5 is an 8 byte
string and is the only one that has been correctly read. The ints were read
from the file using
short s;
read(infile, &s, 2);
Really, really odd - because the first 4 fields are either

4128 -5600 9760 11296

or

8208 8426 8230 8236

depending on whether the bytes are to be reversed.
Post by Paul Dowd
The ints appear to be all the same but should not be - and certainly
shouldn't be negative anyway.
They certainly aren't in the file. My code is as follows:

#include <iostream>
#include <fstream>
#include <Winsock2.h>

int main()
{
    std::ifstream is;
    is.open("c:\\sample.bin", std::ios_base::in | std::ios::binary);
    short i = 0;
    while (is.read(reinterpret_cast<char*>(&i), sizeof(i)))
    {
        i = htons(i); // Comment out to remove byte reversal
        std::cout << i << " ";
    }
}

Alan Bellingham
--
Team Browns
ACCU Conference 2009: to be announced
Vaclav Cechura
2008-05-06 14:35:34 UTC
Post by Paul Dowd
char var[3];
Byte b1, b2;
read(infile, &b1, 1);
read(infile, &b2, 1);
var[0] = b2;
var[1] = b1;
var[2] = '\0';
short val = (atoi)var;
I think the last line should have been:

short val = atoi(var);

But this won't work either. atoi expects a string holding a
human-readable representation of a number, that is, a sequence
of its digits (if the number is 14152, then the 'var' array must
contain the characters {'1', '4', '1', '5', '2', '\0'}). You are
reading two bytes that are a binary representation of the
number: it is either b2*256+b1 or b1*256+b2, depending on the
byte order in the file. So b1 and b2 won't be the decimal digits
of the number, and you cannot use atoi to convert them.

Try this instead:

Byte b1, b2;
read(infile, &b1, 1);
read(infile, &b2, 1);
short val = (b2<<8)+b1; // b2<<8 is equal to b2*256
//or val =(b1<<8)+b2;
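One caveat, offered as a sketch: the shift-and-add above relies on b1 and b2 being unsigned (Byte is, as far as I know, an unsigned char in Borland's headers). If the bytes are read into plain char on a compiler where char is signed, any byte of 0x80 or above turns negative and corrupts the sum. The two function names below are hypothetical and only illustrate the difference.

```cpp
// Correct: both bytes are treated as values 0..255.
constexpr int combine_unsigned(unsigned char high, unsigned char low)
{
    return high * 256 + low;
}

// Pitfall: with signed bytes, 0xEA is stored as -22 and is subtracted
// from the result instead of being added.
constexpr int combine_signed(signed char high, signed char low)
{
    return high * 256 + low;
}
```

For the second field of the sample (bytes 20 EA), combine_unsigned(0x20, 0xEA) gives 8426, while combine_signed(0x20, -22) gives 8170.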

Vaclav