Encoding

This topic is 2567 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.


Does anyone know what encoding is used for the strings of characters in the property fields of a file under the summary tab?

Basically, if you right-click a file, choose Properties, and select the Summary tab, you can type something into one of those fields and click Apply. What is that data encoded as: Unicode, ASCII, UTF-32, UTF-7, UTF-8? It would save me a lot of time if someone knows the answer, or knows a faster way of finding out than trial and error.

Thank you.

I don't know if I understand your question. Take a file, any file really. Right-click it, select Properties, and go to the Summary tab. You will see fields that you can write to.

My question is what encoding is applied to that string when I press Apply. I am using Windows, so the file system is NTFS. Does that help?

Quote:
Original post by RisingForce
I don't know if I understand your question. Take a file, any file really. Right-click it, select Properties, and go to the Summary tab. You will see fields that you can write to.

My question is what encoding is applied to that string when I press Apply. I am using Windows, so the file system is NTFS. Does that help?

The answer to rip-off's question (and indeed mine, and probably that of everyone else who had no idea what you were talking about) is Windows. Not everyone here even runs Windows, and those of us who do probably mostly didn't guess what you meant.
That tab does not appear for just any file. It only appears for Word docs, MP3s (I think), and maybe a couple of other things. Most files do not have it.

Sorry, I don't have an answer to your question at the moment.

They are stored in an alternate data stream.

Many modern file systems (NTFS, UDF, etc.) allow a file to have multiple data streams beyond the default primary stream you store your data in. Others, such as ext2/3/4, offer extended attributes for a similar purpose.

Windows has a few NTFS data streams that it uses for this information. You can see a little bit about how to access them in this MSDN example.

The two streams that Windows uses to fill in those tabs are the SummaryInformation and DocumentSummaryInformation streams, which have been reverse engineered a few times; Google for them.
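
As a rough illustration of how such streams are addressed (a sketch, not taken from the MSDN example mentioned above): on NTFS an alternate stream is named as file:streamname, and the OLE property-set stream names conventionally begin with the control character 0x05. The helper alternateStreamPath and the two constants are made-up names for this example, and whether a given file actually carries these streams depends on the Windows version and the file type.

```cpp
#include <cassert>
#include <string>

// Stream names as conventionally used for OLE property sets. Note the leading
// 0x05 byte. Octal escapes are used on purpose: "\x05D..." would absorb the
// 'D' as a hex digit.
const std::string kSummaryInfoStream = "\005SummaryInformation";
const std::string kDocSummaryInfoStream = "\005DocumentSummaryInformation";

// Hypothetical helper: builds the "file:stream" form used on NTFS to name an
// alternate data stream of a file.
std::string alternateStreamPath(const std::string& file, const std::string& stream) {
    assert(!file.empty() && !stream.empty());
    return file + ":" + stream;
}
```

On Windows, a path built this way can be handed to CreateFile like an ordinary file name; on other platforms the colon has no special meaning.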


Quote:

The two streams that Windows uses to fill in those tabs are the SummaryInformation and DocumentSummaryInformation streams, which have been reverse engineered a few times; Google for them.


I am working on a program that lets the user read and write to those fields in a file, using the IPropertySetStorage and IPropertyStorage interfaces. I can write to any of those fields in ANY file whatever I want through code. I can also read any file. Here is the problem....

I am using Windows Forms, keep that in mind.
Here is where things get screwy....

Example:
// write title to textbox
String^ title = gcnew String( propRead_title.pwszVal );
textBox1->Text = title;

These interfaces use PROPVARIANT structures to hold the pointer to the string of characters of a field. Here it is an LPWSTR (wchar_t*).
propRead_title.pwszVal is the LPWSTR.

Through debugging it has been determined that after my read operation propRead_title.pwszVal contains the correct value. IT DOES!

Basically, there are two cases that differ for a reason unknown to me.

Case 1 is the case where I have already written to the file's field previously.

Case 1 works, and works beautifully. Once I have done a write through code, everything is fine from then on. Even after closing the program, reopening it, selecting the file again, and only reading, textBox1 contains the correct string for the field. You can even edit the field normally through Windows, and my program can then read that value with no problems WITHOUT writing.

Case 2 is the case where I try to read the initial data in that field before the file was "altered" with a write.

Case 2 fails, and here is the weird part. At first I thought it was just junk: I got weird bold vertical bars and such written to textBox1. However, when I examined it further with the debugger, I saw that dereferencing propRead_title.pwszVal does give the correct value (I can't stress that enough), but the value the debugger showed for the pointer itself was the same weird character string that showed up in textBox1. That was the value shown for both the propRead_title.pwszVal pointer and the String^ title.

This is why I thought that writing to the fields first changes the field's encoding somehow. I believe I am doing the conversion from wchar_t* to String^ correctly, because otherwise Case 1 would have failed too, right?
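
One thing the two cases could differ in (an assumption, not something the thread confirms) is the type tag of the value rather than its text encoding. PROPVARIANT is a tagged union: pszVal (VT_LPSTR, narrow) and pwszVal (VT_LPWSTR, wide) occupy the same storage, so reading pwszVal from a value whose vt says VT_LPSTR reinterprets narrow bytes as wchar_t. The sketch below uses a simplified stand-in struct, PropValue, and made-up helpers, toWide and selfCheck, instead of the real Windows types so that it compiles anywhere; only the tag values 30 and 31 are taken from the Windows headers.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Same numeric values as VT_LPSTR and VT_LPWSTR in the Windows headers.
constexpr std::uint16_t kVtLpstr = 30;
constexpr std::uint16_t kVtLpwstr = 31;

// Hypothetical, simplified stand-in for PROPVARIANT: a tag plus a union.
struct PropValue {
    std::uint16_t vt;
    union {
        const char* pszVal;      // meaningful only when vt == kVtLpstr
        const wchar_t* pwszVal;  // meaningful only when vt == kVtLpwstr
    };
};

// Returns the text as a wide string, reading only the union member that the
// tag says is active. Unknown tags yield an empty string.
std::wstring toWide(const PropValue& v) {
    if (v.vt == kVtLpwstr) {
        return v.pwszVal ? std::wstring(v.pwszVal) : std::wstring();
    }
    if (v.vt == kVtLpstr) {
        // Naive widening, correct for ASCII only. Real code would convert
        // using the code page the property set declares.
        std::wstring out;
        for (const char* p = v.pszVal; p && *p; ++p) {
            out.push_back(static_cast<wchar_t>(static_cast<unsigned char>(*p)));
        }
        return out;
    }
    return std::wstring();
}

// Usage: the title "a" stored either way decodes to the same wide string.
void selfCheck() {
    PropValue narrow{};
    narrow.vt = kVtLpstr;
    narrow.pszVal = "a";
    PropValue wide{};
    wide.vt = kVtLpwstr;
    wide.pwszVal = L"a";
    assert(toWide(narrow) == toWide(wide));
}
```

With the real API, the equivalent check would be to look at propRead_title.vt before touching propRead_title.pwszVal.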

OK, here are some images showing the debugger.
A lowercase "a" is the value in the title field (see the image showing that the dereference has the correct value).

http://img593.imageshack.us/img593/571/image1su.png
http://img839.imageshack.us/img839/6240/image2gq.png
http://img508.imageshack.us/img508/6195/image3lg.png


So....haha... any ideas?

I am kind of stuck on this..
