Tips on FileLoader code

Started by
8 comments, last by Tispe 9 years, 12 months ago

Hi

I wanted to show you a simple file loader class that can read both Asynchronously and Synchronously and have you let me know if its "good and clean".

Also, could the buffer be public or should I have some method that returns a pointer to the vector or something for when I want to process it later form outside?


class FileLoader
{
public:
	FileLoader(std::string Filename, bool Asynchronous);
	~FileLoader();

	DWORD GetStatus();

private:
	void Tick();
	void ReadAsynch();
	void ReadSynch();

	HANDLE file;
	DWORD Size;
	DWORD Status;
	DWORD NumberOfBytesTransfered;
	OVERLAPPED OverlappedInfo;
	std::vector<BYTE> buffer;
};

FileLoader::FileLoader(std::string Filename, bool Asynchronous)
{
	Size = 0;
	Status = 0;
	NumberOfBytesTransfered = 0;
	ZeroMemory(&OverlappedInfo, sizeof(OverlappedInfo));

	if (Asynchronous)
	{
		file = CreateFileA(Filename.c_str(), GENERIC_READ, FILE_SHARE_READ, NULL, OPEN_EXISTING, FILE_FLAG_SEQUENTIAL_SCAN | FILE_FLAG_OVERLAPPED, NULL);
	}
	else {
		file = CreateFileA(Filename.c_str(), GENERIC_READ, FILE_SHARE_READ, NULL, OPEN_EXISTING, FILE_FLAG_SEQUENTIAL_SCAN, NULL);
	}

	if (file == INVALID_HANDLE_VALUE)
	{
		Status = FileLoaderErrorInvalidHandle;
	}
	else {
		Size = GetFileSize(file, NULL);
		if (Size == INVALID_FILE_SIZE)
		{
			Status = FileLoaderErrorInvalidSize;
			CloseHandle(file);
		}
		else {
			buffer.resize(Size);

			if (Asynchronous)
			{
				ReadAsynch();
			}
			else {
				ReadSynch();
			}

		}
	}
}

FileLoader::~FileLoader()
{
	CloseHandle(file);
}

DWORD FileLoader::GetStatus()
{
	Tick();
	return Status;
}

void FileLoader::ReadAsynch()
{
	OverlappedInfo.hEvent = CreateEvent(NULL, true, false, NULL);

	if (ReadFile(file, &buffer[0], Size, NULL, &OverlappedInfo))
	{
		// Operation has completed immediately.
		Status = FileLoaderComplete;
	}
	else {
		if (GetLastError() != ERROR_IO_PENDING)
		{
			// Some other error occurred while reading the file.
			CloseHandle(file);
			buffer.clear();
			Status = FileLoaderErrorX;
		}
		else {
			//Operation has been queued and will complete in the future. 
			Status = FileLoaderInTransfer;
		}

	}
}

void FileLoader::ReadSynch()
{
	if (ReadFile(file, &buffer[0], Size, &NumberOfBytesTransfered, NULL))
	{
		// Operation has completed.
		if (Size == NumberOfBytesTransfered){
			Status = FileLoaderComplete;
		}
		else {
			Status = FileLoaderErrorReadSize;
		}
	}
	else {
		// Some other error occurred while reading the file.
		CloseHandle(file);
		buffer.clear();
		Status = FileLoaderErrorX;
	}
}

void FileLoader::Tick()
{
	if (Status == FileLoaderInTransfer)
	{
		if (HasOverlappedIoCompleted(&OverlappedInfo))
		{
			if (GetOverlappedResult(file, &OverlappedInfo, &NumberOfBytesTransfered, false) != 0)
			{
				// Operation has completed.
				CloseHandle(file);
				if (Size == NumberOfBytesTransfered){
					Status = FileLoaderComplete;
				} else {
					Status = FileLoaderErrorReadSize;
				}

			}
		}
	}
}

//FileLoader defs
#define	FileLoaderUnknown		0
#define	FileLoaderComplete		1
#define	FileLoaderInTransfer		2
#define	FileLoaderErrorInvalidHandle	10
#define	FileLoaderErrorInvalidSize	11
#define	FileLoaderErrorX		12
#define	FileLoaderErrorReadSize		13
#define	FileLoaderError4		14
#define	FileLoaderError5		15
Advertisement

//FileLoader defs
#define FileLoaderUnknown 0
#define FileLoaderComplete 1
#define FileLoaderInTransfer 2
#define FileLoaderErrorInvalidHandle 10
#define FileLoaderErrorInvalidSize 11
#define FileLoaderErrorX 12
#define FileLoaderErrorReadSize 13
#define FileLoaderError4 14
#define FileLoaderError5 15


First suggestion would be to convert these to an enum, instead of processor macros.


// C++11 enum, doesn't pollute the entire namespaceenum
class FileLoaderCodes
{
	Unknown,
	Complete,
	InTransfer,
	ErrorInvalidHandle,
	ErrorInvalidSize,
	ErrorX,
	ErrorReadSize,
	Error4,
	Error5
};

// The new GetStatus
FileLoaderCodes GetStatus();
// The member variable
FileLoaderCodes Status;
// In code example
Status = FileLoaderCodes::Unknown;

This version is type safe and you can add more codes without needing to mess with what the numeric value actually is. There is no chance of accidental matching values as well.

"I can't believe I'm defending logic to a turing machine." - Kent Woolworth [Other Space]

I wouldn't, in any way, have a file loader class allocate it's buffer the way your code is doing it.

1) Code may not want the entire file in memory.

2) Code cannot control where the memory is allocated from.

You could, of-course, change the code to take an allocator object of some description. But I personally prefer the buffer pointer to be handed to the read operation.

For similar reasons I wouldn't try to hide the Read method away, but also wouldn't provide async and non-async operations into the same object, which confuses the API and can lead to incorrect usage. You could make the read* functions protected, give the object a protected constructor and derive 'AsyncFileReader' from it or take another route.

I personally prefer the route of obtaining a stream object and the user creates operations using that object. Allowing them to wait on that particular operation.

For example (pseudo code):


InputStream *stream = OpenInputStream( uri );
if ( NULL != stream )
{
      StreamOp *op = stream->Read( myBuffer, dataLength );
      if ( NULL == op )
      {
          // Error
      }
      else
      {
          // Move into wait state
      }
}
else
{
     // Unable to open stream
}

I actually don't think non-async operations are necessary at all, if someone wants to stall they can just call the operation->Wait() method for development purposes though nothing should be loaded like that in the end, and even then it should be frowned on as it's much harder to patch code to be async down the line.

[edit] I can think of many, many different ways of implementing the above code, so please don't take it that I'm suggesting it should be done exactly like that.

n!

Just from a coding standard view why are you placing an enter after a function or if conditional and the { and not on an else clause.

I would store the GetLastError code too to be honest or at least trace it to the output log so you can figure out why a file didn't load instead of having error X. This code is also very platform specific, this is ok if you never intend to support another platform. To get a way from this you could use fstreams, but they have no async reads in the way the windows shell file loader does it.

Other than that it looks like a good file loader.

Worked on titles: CMR:DiRT2, DiRT 3, DiRT: Showdown, GRID 2, theHunter, theHunter: Primal, Mad Max, Watch Dogs: Legion


Just from a coding standard view why are you placing an enter after a function or if conditional and the { and not on an else clause.

Well, I just ported to Visual Studio 2013 Express for C++11 support, and the IDE forces that behaviour.... It also wants a space between if and (. Prob someone at MS wants ifs and elses in the same vertical....

I thouched up my code, please let me know if it is unsafe:


class FileLoader
{
public:
	FileLoader(std::string Filename, bool Asynchronous);
	~FileLoader();

	FileLoaderCodes GetStatus();

private:
	void Tick();
	void ReadAsynch();
	void ReadSynch();

	HANDLE file;
	FileLoaderCodes Status;
	DWORD Size;
	DWORD NumberOfBytesTransfered;
	DWORD LastError;
	OVERLAPPED OverlappedInfo;
	std::vector<BYTE> buffer;
};

FileLoader::FileLoader(std::string Filename, bool Asynchronous)
{
	Status = FileLoaderCodes::Unknown;
	Size = 0;
	NumberOfBytesTransfered = 0;
	LastError = 0;
	ZeroMemory(&OverlappedInfo, sizeof(OverlappedInfo));

	if (Asynchronous)
	{
		file = CreateFileA(Filename.c_str(), GENERIC_READ, FILE_SHARE_READ, NULL, OPEN_EXISTING, FILE_FLAG_SEQUENTIAL_SCAN | FILE_FLAG_OVERLAPPED, NULL);
	}
	else {
		file = CreateFileA(Filename.c_str(), GENERIC_READ, FILE_SHARE_READ, NULL, OPEN_EXISTING, FILE_FLAG_SEQUENTIAL_SCAN, NULL);
	}

	if (file == INVALID_HANDLE_VALUE)
	{
		LastError = GetLastError();
		Status = FileLoaderCodes::InvalidHandle;
		return;
	}

	Size = GetFileSize(file, NULL);
	if (Size == INVALID_FILE_SIZE)
	{
		LastError = GetLastError();
		Status = FileLoaderCodes::InvalidSize;
		CloseHandle(file);
		return;
	}

	buffer.resize(Size);
	if (buffer.capacity() < Size)
	{
		Status = FileLoaderCodes::AllocError;
		CloseHandle(file);
		return;
	}

	if (Asynchronous)
	{
		ReadAsynch();
	}
	else {
		ReadSynch();
	}

}

FileLoader::~FileLoader()
{
	CloseHandle(file);
}

FileLoaderCodes FileLoader::GetStatus()
{
	Tick();
	return Status;
}

void FileLoader::ReadAsynch()
{
	OverlappedInfo.hEvent = CreateEvent(NULL, true, false, NULL);

	if (OverlappedInfo.hEvent == NULL)
	{
		LastError = GetLastError();
		Status = FileLoaderCodes::CreateEventError;
		CloseHandle(file);
		return;
	}

	if (ReadFile(file, buffer.data(), Size, NULL, &OverlappedInfo))
	{
		// Operation has completed immediately.
		Status = FileLoaderCodes::Complete;
	}
	else {
		LastError = GetLastError();
		if (LastError != ERROR_IO_PENDING)
		{
			// Some other error occurred while reading the file.
			CloseHandle(file);
			Status = FileLoaderCodes::ReadError;
		}
		else {
			//Operation has been queued and will complete in the future. 
			Status = FileLoaderCodes::InTransfer;
		}

	}
}

void FileLoader::ReadSynch()
{
	if (ReadFile(file, buffer.data(), Size, &NumberOfBytesTransfered, NULL))
	{
		// Operation has completed.
		CloseHandle(file);
		if (Size == NumberOfBytesTransfered){		//Might be 0 if EOF???
			Status = FileLoaderCodes::Complete;
		}
		else {
			Status = FileLoaderCodes::ReadSizeError;
		}
	}
	else {
		// Some other error occurred while reading the file.
		LastError = GetLastError();
		CloseHandle(file);
		Status = FileLoaderCodes::ReadError;
	}
}

void FileLoader::Tick()
{
	if (Status == FileLoaderCodes::InTransfer)
	{
		if (HasOverlappedIoCompleted(&OverlappedInfo))
		{
			if (GetOverlappedResult(file, &OverlappedInfo, &NumberOfBytesTransfered, false) != 0)
			{
				// Operation has completed.
				CloseHandle(file);
				if (Size == NumberOfBytesTransfered){
					Status = FileLoaderCodes::Complete;
				}
				else {
					Status = FileLoaderCodes::ReadSizeError;
				}
			}
			else{
				// Operation has failed? / HasOverlappedIoCompleted is true
				LastError = GetLastError();
				CloseHandle(file);
				Status = FileLoaderCodes::OverlappedResultError;
			}
		}
	}
}

enum class FileLoaderCodes
{
	Unknown,
	Complete,
	InTransfer,
	InvalidHandle,
	InvalidSize,
	ReadError,
	ReadSizeError,
	AllocError,
	CreateEventError,
	OverlappedResultError
};
The most glaring problem is that the class doesn't provide a mechanism for getting to the loaded bytes, so I would guess you haven't really thought about the client is going to interact with this class. It would appear that you've designed this class in isolation from how it will eventually be used. This is a bad idea, as often when you then go to write the client code, you might find the interface somewhat awkward to use - or worse that some critical functionality is not available. Designing a general API in a vacuum is hard.

I prefer to try write my ideal client code, use this to specify an ideal interface, and then attempt to implement this. Sometimes, you'll need to modify the client code to accommodate implementation considerations, or special cases you didn't consider earlier. But either way, the end result is something that will work in for at least one client.

You have more or less re-implemented the relevant Windows error codes in your code. This has a couple of consequences.

As noted by NightCreature83, this ties your implementation heavily to Windows. If you want to create an alternative implementation on another platform, this will present problems - there might be two errors where Windows only has InvalidHandle or there is no obvious substitute for CreateEventError. You probably don't want an exhaustive list of all errors that could occur on all platforms, that is putting a lot of burden on the client code to be updated when the code is ported. If you're happy that porting the code is not a requirement, consider exposing the Windows error code directly, rather than maintaining a parallel version.

However, I'd argue that most client code probably doesn't care too much for some of these very specific statuses. Again, this may be a symptom of designing the class in isolation. The status code should probably be "actionable" to the client, otherwise why go into such exhaustive detail?

For example, an actionable status is finding the file is missing or unreadable due to a permission error. The client code could prompt the user that their install is corrupt, alternatively it might substitute place-holder content of the appropriate type, or perhaps it could fetch the file from a remote server (e.g. custom maps in a multi-player game). Either way, this is an interesting status as it enables the client to potentially take some corrective or mitigating action. Most clients will probably want to treat read errors in the same way, so exposing these as different statuses complicates things for the client. Perhaps a general I/O error status with a detailed message code for diagnostic logging would be an improved design.

The other status codes seem to be less directly actionable by client code. Arguably the client could react to an allocation error by freeing memory (perhaps discarding some cached resources) and trying again. However, as mentioned another approach to allocation is to allow the client code to control it. This gives the option that the client's allocator can centralise such cache-discard handling - or other interesting behaviour such as tracking memory leaks or logging for memory budgets, etc.

The other statuses seem to be more or less internal details. I'd consider not exposing the "In Transfer" or "Complete" as a status, but instead have dedicated member functions for this happy path, and then exposing a detailed error status, and maybe even a detailed error string function, for when things have gone wrong.

You seem to misunderstand std::vector::resize() and std::vector::capacity(). std::vector::resize() will always change the vector's length() unless the allocator throws an exception (the default allocator will). std::vector::capacity() will always be >= length(), so this test is not useful.

The class does not respect the Rule of Three.

A related note is that most games have more than one file they wish to load. As such, you may need to be able to store FileLoader instances in a container, or perhaps the class should be modified to support loading multiple files.

The most glaring problem is that the class doesn't provide a mechanism for getting to the loaded bytes, so I would guess you haven't really thought about the client is going to interact with this class

Yes, this pertains to my second question in the original post. I am considering a method GetBuffer() which returns a shared_ptr to the vector. So I will change the code:


std::shared_ptr<std::vector<BYTE> > spBuffer;

Optionally I can have some move semantics when I want to process it, which will invalidate the object for further use.

This is a simple class that should do exactly what it is supposed to do. I won't need it for other platforms then Windows.


You seem to misunderstand std::vector::resize() and std::vector::capacity(). std::vector::resize() will always change the vector's length() unless the allocator throws an exception (the default allocator will). std::vector::capacity() will always be >= length(), so this test is not useful.

I thought this was an elegant way of avoiding new/delete[]. If the resize throws then capacity will tell me if it did right?


A related note is that most games have more than one file they wish to load. As such, you may need to be able to store FileLoader instances in a container, or perhaps the class should be modified to support loading multiple files.

My previous version had containers for this. But I wanted to go more OOP on this class and have every object manage their own buffers. I will spawn hundreds of these during init to saturate the bandwidth for fast loading times, all asynchronously.

Regarding the Status codes I think the client should be able to tell what's going on without having to go through methods. A simple switch(Status) should allow the client to decide what to do next.

The most glaring problem is that the class doesn't provide a mechanism for getting to the loaded bytes, so I would guess you haven't really thought about the client is going to interact with this class


Yes, this pertains to my second question in the original post. I am considering a method GetBuffer() which returns a shared_ptr to the vector.


My thoughts:
1) It's a reading a file, not writing a file, so the buffer should be read-only (i.e. const).
2) std::vector is an internal implementation detail. Do you really want that exposed as part of your permanent public interface? Maybe yes, maybe no, but it's worth considering.
3) I don't want the overhead of a shared_ptr when just trying to access the buffer. Example:


blah = file.GetBuffer()[20];
blah = file.GetBuffer()[23];


Each call to GetBuffer() would allocate a new shared_ptr, which wouldn't be nice.

Instead, GetBuffer() should just return a const-ref. My personal opinion would be a const-ref to the bytes directly, but a const-ref to the vector itself would be fine.

If someone wants to copy the buffer, they can explicitly do so. In this case, an explicit copy is a good thing.
You could also provide a function to explicitly take *ownership* of the data, perhaps returning the vector by move semantics and giving up ownership of it. If you find this is needed in your code.

I prefer using a custom buffer class ([header][source]) which handles ownership issues, and then having single standalone functions to load or save a file entirely into memory and return the buffer class entirely - which I use for files I know will be small in size, and separate classes that handles streaming files to and from buffers.

You seem to misunderstand std::vector::resize() and std::vector::capacity(). std::vector::resize() will always change the vector's length() unless the allocator throws an exception (the default allocator will). std::vector::capacity() will always be >= length(), so this test is not useful.

I thought this was an elegant way of avoiding new/delete[]. If the resize throws then capacity will tell me if it did right?

If resize() fails the default allocator throws, and you aren't catching it anywhere, so it'd be propagated up your callstack. i.e. if it throws, your code won't ever reach the if() statement. You could change that behavior with a custom allocator for std::vector.

Yes, this pertains to my second question in the original post. I am considering a method GetBuffer() which returns a shared_ptr to the vector.

Sorry, I didn't see that part. Still, the idea holds - write some ideal client code and you'll learn something about how the buffer could be used.

Here is a simple example:

// Elsewhere...
FileLoader fileLoader(/* ... */);

// In some loop
if(fileLoader.isComplete()) {
    if(fileLoader.isSuccess()) {
        const std::vector<BYTE> &data = fileLoader.getContent();
        createResource(/* ... */, data.begin(), data.end());
    } else {
        // Log failure details
        createResource(/* ... */, defaultData.begin(), defaultData.end());
    }
}
If this was our ideal client code, then shared ownership is not a concern as we don't need the buffer to outlive the file loader. Here is another alternative:

// Elsewhere...
FileLoader fileLoader(/* ... */);

// In some loop
if(fileLoader.isComplete()) {
    if(fileLoader.isSuccess()) {
        std::vector<BYTE> data;
        fileLoader.swapContent(data);
        createResource(/* ... */, data.begin(), data.end());
    } else {
        // Log failure details
        createResource(/* ... */, defaultData.begin(), defaultData.end());
    }
}
Here, we get the benefit that the memory for the file content will be cleared as soon as the createResource() is finished, the fileLoader will not be hanging on to a potentially large vector.

My previous version had containers for this. But I wanted to go more OOP on this class and have every object manage their own buffers.

I'm not sure I understand. What I'm proposing is that a single file loader instance might be capable of handling multiple file loading requests. This is a design decision that is orthogonal from object orientation.

Something like the following:

FileLoader fileLoader;

// During startup

fileLoader.submit("foo.png");
fileLoader.submit("bar.png");
// ... more files

// Some loop
PendingResource pendingResource;
while(fileLoader.pollCompletedResource(pendingResource)) {
    if(pendingResource.good()) {
        createResource(pendingResource.name(), /* ... */, pendingResource.beginData(), pendingResource.endData());
    } else {
        // Log error
        createResource(pendingResource.name(), /* ... */, defaultData.begin(), defaultData.end());
    }
}

I will spawn hundreds of these during init to saturate the bandwidth for fast loading times, all asynchronously.

How, if you cannot store them in a container? The standard containers require objects to be copyable, and yours is not. Will you require dynamic allocation, too?

Regarding the Status codes I think the client should be able to tell what's going on without having to go through methods. A simple switch(Status) should allow the client to decide what to do next.

Sure, that is certainly an option. Personally, I prefer for success and error cases to be quite separate, as you'll almost certainly want to handle them very differently. But the real question was whether you really need all those distinct statuses - will the client really care about the differences. I'd refer you to my "ideal client code" above, if the client is going to be more or less just logging "some error occurred", then having several detailed error states isn't particularly useful. Instead the focus might be on having a detailed error string for logging.

If resize() fails the default allocator throws, and you aren't catching it anywhere, so it'd be propagated up your callstack. i.e. if it throws, your code won't ever reach the if() statement. You could change that behavior with a custom allocator for std::vector.

I don't think there is much wiggle room in the behaviour here, even a custom allocator must throw or allocate the specified memory.

I'm not sure I understand. What I'm proposing is that a single file loader instance might be capable of handling multiple file loading requests. This is a design decision that is orthogonal from object orientation.

In my previous design I had a class that did exactly this, I could submit a request and then later pick up completed files. In this design I want to do this sort of management outside because I think it will be simpler.


How, if you cannot store them in a container? The standard containers require objects to be copyable, and yours is not. Will you require dynamic allocation, too?

I want an AssetManager to handle the lifetime of these objects:


struct GameResources
{
	std::vector<std::shared_ptr<Mesh> > Meshes;
	std::vector<std::shared_ptr<Texture> > Textures;
	std::vector<std::shared_ptr<Animation> > AnimationTracks;
	std::vector<std::shared_ptr<Skeleton> > Skeletons;
};

class AssetManager
{
public:
AssetManager();
~AssetManager();


void Tick(/*.....*/);
std::shared_ptr<GameResources> GetResource(DWORD ID); //Calls GetFile if ID is not in Resources or in Transfer, return pointer is NULL if not ready.


private:
bool GetFile(DWORD ID); //table lookup ID -> Path, creates a FileLoader(Path, true) and places it on FilesInTransfer List.
void CreateResourceFromCompletedFiles(/*.....*/); //Called by Tick(), traverses FilesInTransfer and creates Resources


std::list<std::shared_ptr<FileLoader> > FilesInTransfer;
std::map<DWORD, std::shared_ptr<GameResources> > Resources;
};

This topic is closed to new replies.

Advertisement