Archived

This topic is now archived and is closed to further replies.

goodfella412

decompiling: is it possible?


If I have an executable, is it possible to 'decompile' it so I end up with the source code? If it is possible, how do I do it? Can it be done with MSVS 6.0? Any help is greatly appreciated.

quote:
Original post by goodfella412
If I have an executable, is it possible to 'decompile' it so I end up with the source code? If it is possible, how do I do it? Can it be done with MSVS 6.0? Any help is greatly appreciated.


You won't get back the original high-level source code. Decompilers will go from executable to assembly.

Guest Anonymous Poster
There are apps out there that will create C++ source from an executable (first asm, then C++), but all the symbol names will be garbage made up by the decompiler. If you were really clever you could turn it into something usable, but it'd probably be easier to code it from scratch.

that was me up there, oops... (as if you couldn't tell)

[edited by - grizwald on November 5, 2002 9:38:53 PM]

It depends on what language the executable was written in. And what compiler was used. And what compiler settings were used when the executable was built. And the version number of the compiler. Other than that, decompiling is easy.

AFAIK, Java can be decompiled pretty easily, because it is basically an interpreted language. C++, on the other hand, is almost impossible to decompile. Even if you manage to decompile the .exe into C++, none of the function/variable/type names will make any sense, as that stuff is not stored in the executable.

-Mike

Unless you are decompiling something like Java, you won't get back the original source. However, I have used/reviewed several advanced decompilers that can do a pretty good job of reverse engineering code into a mostly C format (they insert the original assembly at the places they don't understand, and use numeric variable names and addresses). Of course, even this type of intelligent decompiler is still not very good at decoding OO programs, as the number of pointers used is much higher.


This post qualifies for 100 per cent Canadian Content under the rulings of the Canadian Internet Commission and the Federal Ministry of Communication. There are four Americans who worked on this post, but they all have landed immigrant status, and have signed CRTC affidavits swearing that they drink beer, eat back bacon, drive snowmobiles and wear toques. Any resemblance between the Content of this post and the content of any American post is purely coincidental and not the intention of the poster or the various Internet Agencies of the Canadian Government who have screened these posts prior to bulk erasing in accordance with the policies of the Federal Internet Identity Board.

Although I've never had any use for a decompiler, this thread left me wondering... Are there any decompilers out there that make use of the debug info stored in MSVC debug executables, or the RTTI information (which could at least be used to restore the names of all the types, IMHO)? Or is such a thing completely impossible?

quote:
Original post by Michalson
Unless you are decompiling something like Java you won't get back the original source

You won't get back the original source from Java bytecode either - there's no direct mapping. You have more "metainformation" than you do in a C++ binary, but not enough to get the original source.
quote:
Original post by Kippesoep
or the RTTI information (which could at least be used to restore the names of all the types, IMHO)?

There's no guarantee that RTTI provides the names of types as they are given in the source code.

Actually, there is. How else would typeid (...).name () be getting the name? It won't take into account typedefs or macros, obviously, but that's hardly relevant.
I'm not sure how RTTI information is stored on a binary level, but it would make sense if even an entire inheritance diagram could be reconstructed from it (given the way dynamic_cast works).
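
A minimal sketch of the dynamic_cast point, assuming a compiler with RTTI enabled. The class names (Animal, Bird, Penguin, Fish) and the helper functions are made up for the illustration and do not come from any real program. It only shows that the running program can answer "is this object a kind of Bird?", which means the hierarchy has to be recorded in the binary in some form. How it is laid out is compiler-specific and not shown here.

```cpp
#include <cassert>
#include <typeinfo>

// Hypothetical classes, invented for this illustration.
struct Animal { virtual ~Animal() {} };  // virtual destructor makes the type polymorphic
struct Bird : Animal {};
struct Penguin : Bird {};
struct Fish : Animal {};

// True if the dynamic type of 'a' is Bird or something derived from Bird.
// The runtime answers this from type information emitted by the compiler.
bool is_bird(Animal& a) {
    return dynamic_cast<Bird*>(&a) != nullptr;
}

// Small self-check; aborts via assert if an expectation fails.
void demo_dynamic_cast() {
    Penguin p;
    Fish f;
    assert(is_bird(p));    // Penguin -> Bird -> Animal
    assert(!is_bird(f));   // Fish -> Animal only

    // typeid on a reference to a polymorphic type reports the dynamic type.
    Animal& ref = p;
    assert(typeid(ref) == typeid(Penguin));
}
```

Calling demo_dynamic_cast() from main should run silently if the compiler behaves as the language requires.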

quote:
Original post by Kippesoep
Actually, there is. How else would typeid (...).name () be getting the name?

I'm sorry, but this was so funny to me. You state the first part authoritatively, and then resort to a "How else..." argument to substantiate your previous statement.

I don't know, for the record. And lots of people don't use RTTI because they view it as slow.

quote:
Original post by Kippesoep
Actually, there is.

No, there isn't. From section 18.5.1 of the C++ Standard (my emphasis):

quote:
The class type_info describes type information generated by the implementation. Objects of this class effectively store a pointer to a name for the type, and an encoded value suitable for comparing two types for equality or collating order. The names, encoding rule, and collating sequence for types are all unspecified and may differ between programs.

quote:

How else would typeid (...).name () be getting the name?

I'm telling you it doesn't get *the name*, it gets *a name*, which might be different from the name as it appears in the source.
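
A small sketch of the "a name" versus "the name" distinction. CPenguin, Tux and the helper functions are hypothetical, invented for the example. The one thing asserted is what the language guarantees: a typedef is the same type, so it shares one type_info. What name() actually returns is deliberately not asserted, because it is left to the implementation. Commonly GCC and Clang give a mangled string such as "8CPenguin" and MSVC gives "class CPenguin", but treat those as observations, not promises.

```cpp
#include <cassert>
#include <string>
#include <typeinfo>

// Hypothetical class, invented for this illustration.
class CPenguin {
public:
    virtual ~CPenguin() {}
};

typedef CPenguin Tux;  // an alias that exists only in the source text

// Returns whatever string the implementation chose for T.
template <typename T>
std::string implementation_name() {
    return typeid(T).name();
}

// Small self-check; aborts via assert if an expectation fails.
void demo_type_names() {
    // Guaranteed: the alias and the class are one type with one type_info.
    assert(typeid(Tux) == typeid(CPenguin));

    // Not guaranteed: that the string is exactly "CPenguin".
    // It may be mangled, prefixed, or otherwise decorated.
    std::string name = implementation_name<CPenguin>();
    (void)name;  // inspect or print it to see what your compiler produces
}
```

So the alias Tux never shows up in the binary at all, and the spelling of CPenguin survives only to the extent that a particular compiler chooses to keep it.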

Guest Anonymous Poster
To answer the RTTI/debug info question:

I don't know if there are decompilers that will use them, but there is at least one disassembler that will: IDA Pro [ http://www.datarescue.com ]

It will recognize most formats for both of them (Borland, MSVC, GCC and lots of stuff I've never heard about).

Since it also recognizes the common library functions (for most compiler systems) statically linked into the code, I'd say it's the best way to get a grip on some "foreign" compiled code. Way better than all the decompilers I have seen.

(No, I don't work there.)

quote:
Original post by SabreMan
I'm telling you it doesn't get *the name*, it gets *a name*, which might be different from the name as it appears in the source.


As you may recall: I specifically mentioned MSVC. There, it gets the name.

Also, even though the standard may allow it, how many compilers will actually show you something that is different from the name used in the source code? It would be pretty much useless to the programmer to have something like "abcjda" when the actual class is "CPenguin". Decoration might be used, of course (and in MSVC, it is: the stored symbol is the decorated name, accessed through type_info::raw_name; type_info::name is slower because it undecorates), but even that is human-readable.

The C++ standard is, BTW, completely irrelevant when it comes to decompiling. The only standard that matters is the layout of the executable.

quote:
Original post by Oluseyi
I'm sorry, but this was so funny to me. You state the first part authoritatively, and then resort to a "How else..." argument to substantiate your previous statement.



Hmmm, the "how else" is indeed a poor choice of words. Probably due to the fact that English isn't my first language.

The point still stands, though: the type_info::name function does (at least in MSVC, most likely in most other compilers as well) return the name used in the source code.

quote:
Original post by Kippesoep
The C++ standard is, BTW, completely irrelevant when it comes to decompiling.

That's true. Particularly as it's not possible to reconstitute the original source from the binary, which renders the RTTI issue largely irrelevant anyway.

It could help in making the reconstituted source more readable. That is, assuming that the person who programmed it originally chose some sensible names. But then again, I doubt anyone would choose CChicken as a class for a texture manager and derive CPenguin from CTelevision to wrap around joystick input.

Still, I find reading any such "rebuilt" code almost impossible in many cases, just like it can take (me) ages to actually understand what's going on from a standard disassembly.

quote:
Original post by SabreMan
You won't get back the original source from Java bytecode either - there's no direct mapping. You have more "metainformation" than you do in a C++ binary, but not enough to get the original source.


Ever done it? There's little difference between the original source code and a decompiled version: original type information, identifier names, everything.
I've done it to programs of mine for the heck of it, and to others for educational purposes. I was pretty surprised. It really makes obfuscation necessary for commercial products.

Heck, I just opened a .class file in Notepad and I could more or less see all the identifiers in plaintext.
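
The Notepad trick can be automated. Here is a rough sketch, in C++ to match the rest of the thread, of a scanner that pulls runs of printable ASCII out of a blob of bytes. The function names and the sample identifiers (TextureManager, loadTexture) are made up for the example, and the sample bytes only imitate a binary file; they are not a real .class layout.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Collects every run of printable ASCII that is at least min_len characters long.
std::vector<std::string> printable_runs(const std::string& bytes, std::size_t min_len) {
    std::vector<std::string> runs;
    std::string current;
    for (std::size_t i = 0; i < bytes.size(); ++i) {
        unsigned char c = static_cast<unsigned char>(bytes[i]);
        if (c >= 0x20 && c <= 0x7E) {
            current += static_cast<char>(c);   // still inside a readable run
        } else {
            if (current.size() >= min_len) runs.push_back(current);
            current.clear();                   // a non-printable byte ends the run
        }
    }
    if (current.size() >= min_len) runs.push_back(current);  // run that reaches end of data
    return runs;
}

// Small self-check on invented sample data; aborts via assert on failure.
void demo_printable_runs() {
    std::string bytes;
    bytes += '\x01';
    bytes += '\0';
    bytes += "TextureManager";   // hypothetical identifier
    bytes += '\0';
    bytes += '\x07';
    bytes += "loadTexture";      // hypothetical identifier
    bytes += '\0';
    bytes += "ab";               // too short to be reported with min_len = 4

    std::vector<std::string> found = printable_runs(bytes, 4);
    assert(found.size() == 2);
    assert(found[0] == "TextureManager");
    assert(found[1] == "loadTexture");
}
```

To try it on a real file, read the file in binary mode into a std::string and pass it in. A minimum length of 4 is just a guess at a useful noise filter.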

[edited by - solson16 on November 6, 2002 10:25:27 PM]

quote:
Original post by prh99
quote:
Original post by goodfella412
If I have an executable, is it possible to 'decompile' it so I end up with the source code? If it is possible, how do I do it? Can it be done with MSVS 6.0? Any help is greatly appreciated.

You won't get back the original high-level source code. Decompilers will go from executable to assembly.



Actually, you're completely wrong. A lot of disassemblers will decompile an exe to the source language of your choice. The only problem is that the code usually has a lot of gotos, since decompilers have problems recreating functions.
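
To give an idea of what that looks like, here is a hand-written sketch. It is not the output of any real tool. sum_to stands in for what an author might have written, and sub_401000 (with a1, v1, v2 and the loc_ labels) imitates the invented names and goto-based control flow a decompiler might plausibly produce for it. All the names are made up.

```cpp
#include <cassert>

// What the original author might have written.
int sum_to(int n) {
    int total = 0;
    for (int i = 1; i <= n; ++i) {
        total += i;
    }
    return total;
}

// A hand-made imitation of decompiler-style output for the same routine:
// same behavior, but names are invented and the loop is spelled with gotos.
int sub_401000(int a1) {
    int v1 = 0;   // was 'total'
    int v2 = 1;   // was 'i'
loc_check:
    if (v2 > a1) goto loc_done;
    v1 += v2;
    ++v2;
    goto loc_check;
loc_done:
    return v1;
}

// Small self-check; aborts via assert if the two versions ever disagree.
void demo_goto() {
    for (int n = -2; n <= 10; ++n) {
        assert(sum_to(n) == sub_401000(n));
    }
    assert(sub_401000(10) == 55);
}
```

Both functions compute the same thing, which is the point: the behavior survives, while the names and the structure that made the original readable do not.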



[Cyberdrek | the last true sorcerer | Spirit Mage - mutedfaith.com]
