Ectara

Battery of Tests For IO Library

11 posts in this topic

I've been having a hard time thinking up a battery of tests for an IO library, and any advice or ideas would be appreciated. For a point of reference, it shares a lot in common with the C standard IO library. Ideally, I'd like to test that writing and reading works, files can be grown when written past the end if permissions allowed, file read/write permissions, file EOF flags are set properly, and file seeking and telling work as expected.

One of the problems I run into is that it is hard to test reading or writing without using the opposite operation; I can't verify that what I've written is correct unless I read it back, and I can't see a way around that. One important point is that though the IO stream can be implemented in any number of ways, every implementation exposes the same interface, so the same battery of tests has to run against every stream type. I do expect at least one of these tests to fail for some of the streams, so getting the tests right is important.

What I have so far, in terms of tests:
-Create a new file
-Write a sequence of bytes (or a few sequences, to rule out a fluke)
-Seek to the beginning
-Read in the written data (if a problem occurs here, how can I prove whether the fault was in the reading or the writing?)
-Try to read again, to make sure that I get the EOF flag
-Explicitly check to see if stream has the EOF flag set, similarly to feof()
-Put a single byte
-Get a single byte (same caveat with the reading and the writing)
-Try getting the current file position
-Try seeking to the beginning
-Try seeking to the stored position
-Try to get the file position again, and ensure that they're the same
-Try to seek to the end of the stream
-Get the file position at the end, and try seeking to the beginning and to the end's stored position to see if I still wind up at the end
-Seek to the first stored position and flush the stream, and ensure that the position did not change

I'm drawing a blank on the rest so far.

Additionally, should the function for getting a string from the stream explicitly call the function to get a single character until a newline is encountered, or should the string getting function be independent of the character get function for speed, and depend upon the stream's implementation?
I might try adding some more "error case" tests, like attempting to read/write a negative number of bytes or something for example. It's good to make sure your libraries behave nicely even when given bad input.
I personally enjoy randomized testing against a reference library (such as C's stdio or C++'s iostream) when I have the luxury of having such a reference. The idea is to perform a series of completely random operations, even operations which are meaningless and should result in errors, in parallel on both implementations, and as soon as you get divergent results which are not a feature of your library (hopefully you can enumerate those), you've failed and you go to the code to fix it. This can be executed fairly quickly and eventually the entire search space will be covered. When everything works, you're probably ready to launch the library into the real world and iron out any remaining issues through bug reports.
[quote name='Samith' timestamp='1354419741' post='5006201']
I might try adding some more "error case" tests, like attempting to read/write a negative number of bytes or something for example. It's good to make sure your libraries behave nicely even when given bad input.
[/quote]
Hm... all of my quantities are unsigned, both to prevent that issue and to allow larger quantities to be read and written at once without overflow. I could try passing zeroes and null pointers, and values that would seek past the beginning or end of the stream. However, most of these would be caught by the highest level of the stream, during input validation, before they ever reach the actual implementation of the stream. These tests will still be useful, but unless I change the interface, most would no longer fail even if the underlying implementation is faulty.
[quote name='Bacterius' timestamp='1354429453' post='5006221']
I personally enjoy randomized testing against a reference library (such as C's stdio or C++'s iostream) when I have the luxury of having such a reference. The idea is to perform a series of completely random operations, even operations which are meaningless and should result in errors, in parallel on both implementations, and as soon as you get divergent results which are not a feature of your library (hopefully you can enumerate those), you've failed and you go to the code to fix it. This can be executed fairly quickly and eventually the entire search space will be covered. When everything works, you're probably ready to launch the library into the real world and iron out any remaining issues through bug reports.
[/quote]
Randomized tests sound good. I've more than learned by now that a test program passing does not mean the code is bug free; my previous memory allocator worked perfectly until the program allocated a certain amount of memory relative to the size of the pool, and I only found the bug when actually using it in an application that requested large amounts of memory. However, this does leave open the features that C's standard IO does not support but mine does.

I will implement some sort of randomized tests; I never really thought of that. Most of the tests will be aimed at making sure that the underlying stream's implementation performs as expected, because there are a lot of stream types, and testing all of them bit by bit would be troublesome.
[quote name='Ectara' timestamp='1354484610' post='5006400']
Additionally, should the function for getting a string from the stream explicitly call the function to get a single character until a newline is encountered, or should the string getting function be independent of the character get function for speed, and depend upon the stream's implementation?
[/quote]
Well, in my opinion I'd mark the single-character function inline (as it's probably quite small) so that calling it in a loop won't be so costly, instead of duplicating code, but it really depends on how the streams are implemented. It may make more sense to buffer and seek the newline in memory if you're not already caching I/O operations (and it might be faster, though working with textual data is always going to be slow in any case).

And yes, randomized testing won't help you test stuff that the reference library doesn't do, so you'll need to resort to conventional testing for those features.
[quote name='Bacterius' timestamp='1354485334' post='5006403']
Well, in my opinion I'd mark the single-character function inline (as it's probably quite small) so that calling it in a loop won't be so costly, instead of duplicating code,
[/quote]
I can't guarantee that I have inline support, since it is written in C89. For this reason, they are all normal functions; the parts that absolutely must be inlined are safely written macros.

[quote name='Bacterius' timestamp='1354485334' post='5006403']
but it really depends on how the streams are implemented. It may make more sense to buffer and seek the newline in memory if you're not already caching I/O operations (and it might be faster, though working with textual data is always going to be slow in any case).
[/quote]
This is one of the major reasons that I have it done through the stream back-end itself; for instance, the memory map stream can do it much faster by iterating through a block of memory than a general function that reads character by character can. However, it seems like I'd save a lot of time and strain by making the get-string function higher level, doing it as explicitly stated in the C standard. Caching sounds like a great choice, but for a buffer size of n bytes it gives me an (n - 1) / n chance of reading too far and needing to seek the file position backward; if the newline is not at the end of the buffer, I have to seek the file pointer back, and seeking backward is something I avoid at all costs.

On the other hand, streams that cannot function without seeking backward to maintain the appearance of a normal stream (like an encrypted stream with a block cipher) would be horrendously slow to read one character at a time, since the current implementation decrypts the whole block to get the requested character on every call. (Over time, I could optimize it to not reread the encrypted block before decrypting if the file position is still within the block.)

Basically, I'm torn between being fast, or being simple and easy to debug.

[quote name='Bacterius' timestamp='1354485334' post='5006403']
And yes, randomized testing won't help you test stuff that the reference library doesn't do, so you'll need to resort to conventional testing for those features.
[/quote]

To resolve this issue, I plan to ensure that the most basic file stream is bug free, and use that as a control in the test; the file stream calls whichever OS's file routines in its back-end, so it should be pretty safe.
[quote]This is one of the major reasons that I have it done through the stream back-end itself; for instance, the memory map stream would do it much faster by iterating through a block of memory, as opposed to a general function that reads character by character. However, it seems like I'd be saving a lot of time and strain by having the get string function be higher level, by doing it as explicitly stated in the C standard. Caching sounds like a great choice, but it would require me having a (n - 1) / n chance of reading too far and needing to seek the file position backward for a buffer size of n bytes; if the newline is not at the end of the buffer, I'd have to seek the file pointer back, and seeking backward is something I avoid at all costs.[/quote]
Well, then since you don't want to seek backward, I don't see a solution other than incrementing the file position by one byte repeatedly until you hit the newline. Is it really that bad to seek backwards? It does cut out a large class of possible optimizations and design methods.

[quote]On the other hand, streams that cannot function without seeking backward to maintain normal stream appearance (like an encrypted stream with a block cipher) would be horrendously slow to read one character at a time, as the current implementation decrypts the block to get the character requested (over time, I could optimize it to not reread the encrypted block before decrypting if the file position is still within the block) every time you call the function to get a character.[/quote]
This is off-topic but just out of curiosity, how are you implementing encrypted streams? Are you using CTR mode for random-access encryption/decryption? And yes, I think in this situation caching would be worthwhile, as it is common to conditionally read binary streams (e.g. read first byte, then read the next four if the first byte has some value). But of course, too much caching is very bad as well - it's difficult to make the perfect library suited to every task at such a low level.

[quote name='Ectara' timestamp='1354488175' post='5006420']
Basically, I'm torn between being fast, or being simple and easy to debug.
[/quote]
Honestly, I recommend a bit of both, leaning towards maintainability. Having something completely self-documenting but pathetically slow is just as bad as having a super-fast but unreadable implementation. Code naturally gets faster with time, but it doesn't get any cleaner.
[quote name='Bacterius' timestamp='1354489706' post='5006430']
Well, then since you don't want to seek backward, I don't see a solution other than incrementing the file position by one byte repeatedly until you hit the newline.
[/quote]
That would be the way explicitly specified by the C standard; the other way would have the underlying implementation bend the rules a little bit behind the scenes, like a memory stream being able to read through the entire buffer without moving the file pointer, or a buffered stream being able to scan the buffer without performing any actions. Each stream knows the fastest way to scan its data without seeking backwards (or, if the stream knows it can seek backwards, it can do so without the caller knowing).

[quote name='Bacterius' timestamp='1354489706' post='5006430']
Is it really that bad to seek backwards? It does cut out a large class of possible optimizations and design methods.
[/quote]
I know that it won't always be possible to avoid seeking backwards; a lot of my streams require that the target stream beneath it be backward seekable for the functionality to work, but I'd like to have the parts of code that actually use the stream try to avoid seeking backward to allow things like pipes to work.
[quote name='Bacterius' timestamp='1354489706' post='5006430']
This is off-topic but just out of curiosity, how are you implementing encrypted streams? Are you using CTR mode for random-access encryption/decryption?
[/quote]
It's implemented in such a way that the encryption algorithm and mode don't make a difference. The stream makes use of my encryption module, and makes sure that block ciphers and stream ciphers can both work through the same code path. For instance, AES ECB and RC4 can both be used through the same stream implementation (not at the same time); all it requires is that you set up the encryption information correctly and use it to attach an encrypted stream to the data. The encryption stream requires seeking backward, so if the target stream is not seekable, the data must be cached to a seekable stream type.

Granted, I don't have all of the block cipher modes implemented, but once I do, it wouldn't be much trouble to make them function properly; code for different ciphers doesn't affect the general encryption stream, which only calls functions for optional bookkeeping, plus general encrypt and decrypt functions, based on the information it has.

[quote name='Bacterius' timestamp='1354489706' post='5006430']
Honestly I recommend a bit of both, leaning towards maintainability. Like having something completely self-documenting, but pathetically slow is just as bad as having a super-fast but unreadable implementation. Code naturally gets faster with time, but it doesn't get any cleaner.
[/quote]
Yeah. In the name of reasonable speed, it may be best to leave it until I can find a way to make all of the streams fast enough that the generic version is a reasonable option.
[quote name='Ectara' timestamp='1354416189' post='5006192']
One of the problems that I run into is that it is hard to test whether reading or writing works without using the opposite functionality; it is hard to test whether or not what I've written is correct if I don't read it back in to test it.
[/quote]
Refactor to allow reading from and writing to abstract sources and sinks. A simple implementation of the source interface can then be trivially implemented to allow reading from character buffers for testing purposes. The same can be done for sinks/writing.

You can also have sources/sinks that fail at specific points, for testing failure cases deterministically, or at random points for fuzz testing, should that be required.
[quote name='Ectara' timestamp='1354548828' post='5006638']
It's implemented in such a way that the encryption algorithm and mode don't make a difference. The stream makes use of my encryption module, and makes sure that block ciphers and stream ciphers both can work through the same code path. For instance, AES ECB and RC4 can both be used through the same stream implementation (not at the same time); all it requires is that you set up the encryption information correctly, and use it to attach an encrypted stream to the data. The encryption stream requires seeking backward, so if it cannot be seeked, it must be cached to a seekable stream type.
[/quote]
Wouldn't that be relatively inefficient, though? Consider a stateful mode like CBC, to encrypt at position N you'd need to encrypt everything before position N - how does your stream deal with this, do you forbid seeking ahead?
[quote name='e?dd' timestamp='1354557383' post='5006693']
Refactor to allow reading from and writing to abstract sources and sinks. A simple implementation of the source interface can then be trivially implemented to allow reading from character buffers for testing purposes. The same can be done for sinks/writing.
[/quote]
I'm not quite sure what you mean, as my streams are already as abstract as it gets. I have a stream type that reads from and writes to a block of memory. The only problem is, I need to test that, too.

The streams are not always simple file descriptors. If I'm testing an AES or compression stream, it's incredibly time consuming to check every written byte in memory for correctness by hand. For the most part the streams are working, but I need to test the edge cases.

[quote name='e?dd' timestamp='1354557383' post='5006693']
You can also have sources/sinks that fail at specific points for testing failure cases deterministically, or at random points for fuzz testing should that be required.
[/quote]

Having streams fail on purpose would be useful, though. Perhaps a target stream that deliberately returns values that don't make sense, to see how the higher-level parts of the code deal with it.

[quote name='Bacterius' timestamp='1354582469' post='5006871']
Wouldn't that be relatively inefficient, though?
[/quote]
Inefficient, yes. However, the goal of the stream interface is to abstract away the implementation and provide one uniform way of accessing and storing data. Doing it this way ensures that one code path works for all sources and destinations (like being able to load an image from anywhere through one interface). If callers need the utmost efficiency, they can use the encryption module directly; their code then assumes the input is encrypted in the way they expect, and they can handle the data and the encryption any way they choose. The goal of the encryption stream is just to get the job done in a way that conforms to the stream interface.

[quote name='Bacterius' timestamp='1354582469' post='5006871']
Consider a stateful mode like CBC, to encrypt at position N you'd need to encrypt everything before position N - how does your stream deal with this, do you forbid seeking ahead?
[/quote]
Well, it's fairly simple. The streams function similarly to C's standard IO, as that was my inspiration when I started. To write at position N in a C FILE, the preceding bytes must already exist; files can't be grown without writing data to extend them, so to write at position N you must have written or seeked past the existing bytes before it. The data in my encrypted streams is already encrypted; if you attach an encryption stream to another stream, you are assuming that the existing data is encrypted in the way you expect. If the data is not yet encrypted, you need to read it and write it back through the encryption stream in order to encrypt it.

So, first, you'd find the block where byte N resides. Then, you'd take the previous ciphertext (or the IV, if it is the first block), decrypt the current block, and XOR the two to get the plaintext. You'd then modify the plaintext, then XOR it with the previous ciphertext or the IV, and encrypt it again. It will, however, require re-encryption of all following blocks, due to the nature of CBC.

Since CBC uses the ciphertext of the previous block, reading or writing at an arbitrary point requires examining only the ciphertext of the previous block. However, modifying the plaintext requires re-encrypting all following blocks until the end of the stream (or until you find one that encrypts to the same ciphertext, which is extraordinarily unlikely). When I do implement different block cipher modes, it wouldn't be much trouble to keep this agnostic to the actual cipher and add more optional bookkeeping depending on the mode.

As another example, when seeking backwards, it has optional bookkeeping to reset stream ciphers to their initial states, then quickly burn through iterations until it reaches the new position, so that the higher-level code can seek an RC4 stream backwards without knowing the difference, for instance.

There will be inefficiencies, but people that need to take care to avoid them would never need to hit them, because these inefficiencies are only caused to emulate functionality that other stream types take for granted. Of course, these inefficiencies don't really occur when you read or write straight through from beginning to end without changing direction or modifying the existing data, which is the most likely action to serialize data through a stream. Again, the stream's goal is to provide an interface to simply get the job done the way the caller needs it, which is immensely useful.
