Jump to content
  • Advertisement
Sign in to follow this  
SiS-Shadowman

Beginning with c#

This topic is 3029 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

If you intended to correct an error in the post then please contact us.

Recommended Posts

I've recently started to learn c# (coming from cpp) and rewrote a small program in c#, in order to learn to use the language. It worked pretty well, however I receive an out of memory exception pretty soon (it consumes about 1gb, reported by process explorer).

I assume this is because I am not used to managed applications, so maybe you guys could help me out, identifying bad code (I'm sure there's lots of it).

I'll attach the actual sources below (only 8 small files) and start explaining the program a bit. The main class (Bot) is responsible for requesting resources (through Crawler), and forwarding those to DataProcessors (through DataDistributor), with the core processor being UriProcessor, that tries to treat the content of those resources as text (if possible) and looks for more addresses that are given to the bot again.

Bot => Reply to DataProcessor
DataProvider => Uri to Bot
UriProcessor implements both interfaces, to create the loop.

The exception happens at about 700 uris (although not all the time), which should not take up so much space, so I guess the GC is not able to free stuff fast enough (which means I'm doing too much allocations).

I hope you can give me general advices (for example am I handling this IDisposible stuff correct?), as well as how to avoid needless allocations inside the process method (for example, can I somehow avoid recreating those streams every time?).

DataDistributor.cs


using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

namespace Core
{
/**
* This class implements the DataProcessor interface and merely
* forwards all process calls to all attached DataProcessors.
*/

class DataDistributor
: DataProcessor
{
public string name() { return "DataDistributor"; }
public string author() { return "Shadowman"; }
public int version() { return 1; }


/**
* Forwards processing to all added processors.
*/

public void process(Reply reply)
{
lock (processors)
foreach (DataProcessor processor in processors)
processor.process(reply);
}

public void add(DataProcessor processor)
{
lock (processors)
processors.Add(processor);
}

private List<DataProcessor> processors = new List<DataProcessor>();
}
}





DataProcessor.cs


using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

namespace Core
{
public interface DataProcessor
{
/**
* @returns the name of this processor
*/

string name();

/**
* @returns the author of this processor
*/

string author();

/**
* @returns the version of this processor
*/

int version();



/**
* Processes a reply.
*/

void process(Reply reply);
}
}





DataProvider.cs


using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

namespace Core
{
public interface DataProvider
{
/**
* @returns the name of this processor
*/

string name();

/**
* @returns the author of this processor
*/

string author();

/**
* @returns the version of this processor
*/

int version();



/**
* Returns the next Uri to fetch and process.
* Shall return null when nothing needs to be processed anymore.
*/

Uri data();
}
}





UriProvider.cs


using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.IO;
using System.Text.RegularExpressions;

namespace Core
{
public class UriProcessor
: DataProcessor
, DataProvider
{
public UriProcessor()
{
StringBuilder builder = new StringBuilder();
builder.Append("(http?://)"); // Protocol //< http[s] could be used
builder.Append("(([0-9]{1,3}\\.){3}[0-9]{1,3}"); // Ip address
builder.Append("|"); // OR
builder.Append("([0-9a-z_!~*'()-]+\\.)*"); // Teriary domain (www)
builder.Append("([0-9a-z][0-9a-z-]{0,61})?[0-9a-z]\\."); // Secondary domain (deviantart)
builder.Append("[a-z]{2,6})"); // primary domain (com)
builder.Append("(:[0-9]{1,4})?"); // port number
builder.Append("((/?)|"); // possible filename
builder.Append("(/[0-9a-z_!~*'().;?:@&=+$,%#-]+)+/?)"); // complete filename

regex = new Regex(builder.ToString(), RegexOptions.Compiled);
}

/**
* @returns the name of this processor
*/

public string name() { return "UriProcessor"; }

/**
* @returns the author of this processor
*/

public string author() { return "Shadowman"; }

/**
* @returns the version of this processor
*/

public int version() { return 1; }



/**
* Processes a reply.
*/

public void process(Reply reply)
{
if(!reply.contentType.Contains("text"))
return;

Stream stream = new MemoryStream(reply.content);
StreamReader reader = new StreamReader(stream);
while (reader.Peek() != -1)
builder.Append(reader.ReadLine());

stream.Close();
reader.Close();

string content = builder.ToString();
builder.Clear();

MatchCollection matches = regex.Matches(content);
foreach(Match match in matches)
{
Uri address = new Uri(match.Value);
if (address.IsWellFormedOriginalString() && visited.Contains(address) == false)
{
addresses.Enqueue(address);
visited.Add(address);
}
}
}

public Uri data()
{
return addresses.Dequeue();
}

private ThreadQueue<Uri> addresses = new ThreadQueue<Uri>();
private HashSet<Uri> visited = new HashSet<Uri>();
private Regex regex;
private StringBuilder builder = new StringBuilder();
}
}





Bot.cs


using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading;

namespace Core
{
public class Bot
{
public Bot()
{
UriProcessor proc = new UriProcessor();
add((DataProcessor)proc);
add((DataProvider)proc);

distribution_thread = new Thread(this.distribute);
crawler_thread = new Thread(this.crawl);

distribution_thread.Start();
crawler_thread.Start();
}

public void add(DataProvider provider)
{
crawler.add(provider);
}

public void add(Uri address)
{
crawler.get(address);
}

public void add(string address)
{
crawler.get(address);
}

public void add(DataProcessor processor)
{
distributor.add(processor);
}

/**
* @returns the number of currently active requests
*/

public int numRequests()
{
return crawler.numRequests();
}

/**
* Stops the bot and blocks until all threads have been stopped.
*/

public void stop()
{
running = false;
distribution_thread.Join();
crawler_thread.Join();
}



private void distribute()
{
while (running)
{
Reply reply = crawler.reply();
if (reply != null)
distributor.process(reply);
else
Thread.Sleep(1);
}
}

private void crawl()
{
while (running)
if (!crawler.run_once())
Thread.Sleep(1);
}

private bool running = true;

private Thread distribution_thread;
private Thread crawler_thread;

private DataDistributor distributor = new DataDistributor();
private Crawler crawler = new Crawler();
}
}





Crawler.cs

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Net;
using System.IO;

namespace Core
{
class Request
: IDisposable
{
public Uri address { get; private set; }
public WebRequest request { get; private set; }
public WebResponse response { get; private set; }
public Stream stream;
public MemoryStream memory;
public string contentType;
public byte[] content;

public Request(Uri address, WebRequest request)
{
this.address = address;
this.request = request;
}

public void connected(IAsyncResult ar)
{
response = request.EndGetResponse(ar);
}

public void Dispose()
{
if (stream != null)
stream.Dispose();
if (memory != null)
memory.Dispose();
if (response != null)
response.Close();
}
}

public class Crawler
{
/**
* Adds the given provider to the list of providers.
*/

public void add(DataProvider provider)
{
lock (providers)
providers.Add(provider);
}

/**
* Requests the resource from the given address.
* Once the request is finished, it can be fetched
* from the crawler by calling next.
*
* @throws FormatException when the address is not well formed
*/

public void get(string address)
{
Uri uri = new Uri(address);
if (!uri.IsWellFormedOriginalString())
throw new FormatException("The given address is not well formed");

get(uri);
}

/**
* Requests the resource from the given address.
* Once the request is finished, it can be fetched
* from the crawler by calling next.
*/

public void get(Uri address)
{
lock (addresses)
{
addresses.Enqueue(address);
}
}

/**
* Fetches the next reply from the list of replies.
*
* @returns null when there is no reply
*/

public Reply reply()
{
return replies.Dequeue();
}

/**
* @returns the number of currently active requests
*/

public int numRequests()
{
return requests.Count();
}



/**
* Runs as long as there are addresses in the queue
* and fetches the appropriate replies.
*
* @returns when there are no more addresses
*/

public int run()
{
int n = 0;
while (run_once())
++n;

return n;
}

/**
* Pops one address from the queue, fetches
* the reply and then returns.
*/

public bool run_once()
{
if (requests.Count() >= maxRequests)
return false;

Uri address = next();
if (address == null)
return false;

request(address);
return true;
}


/**
* Requests the content behind the given address.
*/

private void request(Uri address)
{
WebRequest webRequest = WebRequest.Create(address);
Request request = new Request(address, webRequest);
webRequest.BeginGetResponse(new AsyncCallback(this.connected), request);
requests.Add(request);
}

/**
* This method is called once the connection with the resource has been established.
* As a result, the content will be downloaded.
*/

private void connected(IAsyncResult ar)
{
Request request = (Request)ar.AsyncState;

try
{
request.connected(ar);
WebResponse response = request.response;
request.content = new byte[32 * 1024];
request.memory = new MemoryStream();

Stream stream = response.GetResponseStream();
request.stream = stream;
request.contentType = response.ContentType;

stream.BeginRead(request.content, 0, request.content.Length, new AsyncCallback(read), request);
}
catch (WebException)
{
requests.Remove(request);
request.Dispose();
}
}

private void read(IAsyncResult ar)
{
Request request = (Request)ar.AsyncState;

int read = request.stream.EndRead(ar);
if (read > 0)
{
request.memory.Write(request.content, 0, request.content.Length);
request.stream.BeginRead(request.content, 0, request.content.Length, new AsyncCallback(this.read), request);
}
else
{
request.content = request.memory.ToArray();
requests.Remove(request);
replies.Enqueue(new Reply(request.address, request.contentType, request.content));
request.Dispose();
}
}

/**
* @returns the next uri to look up
*/

private Uri next()
{
Uri address = addresses.Dequeue();
if (address == null)
{
// Check the providers for addresses...
lock (providers)
{
if (providers.Count() > 0)
{
int i = 0;
while(i < providers.Count())
{
Uri addr = providers.data();
if (addr != null)
{
address = addr;
break;
}

++i;
}

if (address != null)
{
// Reinsert the provider at the end of the list, so that
// other providers are asked before him the next time
DataProvider provider = providers;
providers.RemoveAt(i);
providers.Add(provider);
}
}
}
}

return address;
}

private List<DataProvider> providers = new List<DataProvider>();
private ThreadQueue<Uri> addresses = new ThreadQueue<Uri>();
private ThreadQueue<Reply> replies = new ThreadQueue<Reply>();

private List<Request> requests = new List<Request>();
private int maxRequests = 10;
}
}




Reply.cs


using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Net;
using System.IO;

namespace Core
{
public class Reply
{
public Uri address { get; private set; }
public string contentType { get; private set; }
public byte[] content { get; private set; }



/**
* Initializes a Reply with the given address and content.
*/

public Reply(Uri address, string contentType, byte[] content)
{
this.address = address;
this.contentType = contentType;
this.content = content;
}
}
}





ThreadQueue.cs


using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

namespace Core
{
/**
* Threadsafe queue.
*/

class ThreadQueue<T>
{
/**
* Enqueues the given element into the queue.
*/

public void Enqueue(T value)
{
lock (queue)
queue.Enqueue(value);
}

/**
* Removes the first item from queue.
* Returns a default T in case there is no item in the queue
*/

public T Dequeue()
{
lock (queue)
{
if (queue.Count() > 0)
return queue.Dequeue();
else
return default(T);
}
}

private System.Collections.Generic.Queue<T> queue = new Queue<T>();
}
}





Thanks for reading my mess :)

*edit*
Added code for the crawler class.

[Edited by - SiS-Shadowman on August 23, 2010 11:38:05 PM]

Share this post


Link to post
Share on other sites
Advertisement
What happens when you run it in the debugger? Where does it break when the exception occurs?

(You've given us a second copy of your bot class instead of your crawler class, by the way...)

Share this post


Link to post
Share on other sites
I would expect that the crawler is being super effective and pulling in tons of links (which just gets more and more memory until it dies), or your 'have I visited this site' constraint is broken.

Advice:

- eventually you want to learn the using keyword.
- I recommend using braces {} even when they're not required.
- Doing the thread setup in Bot's constructor (or similar complex logic in constructors) is generally poor form and not very C#-like.
- Just use ReadToEnd on the stream. No need to push it into the StringBuilder. (though understandable if being ported from C++)
- And there's likely thread utils to help more, but I don't know of them offhand.

Share this post


Link to post
Share on other sites
You should not explicitly upcast. Don't say
add((DataProcessor)proc);
Say
add(proc);
instead.

Casting operators are for downcasting -- which is undesirable and risks throwing an exception. When reading code, your reaction to seeing a cast should be "that looks unsafe." If you also write upcasting operators, you are being the boy who cries wolf.

Edit: Also, capitalize your method names. Welcome to C#.

Edit: Dequeuing should really throw an exception (or block?) instead of returning the default value. Silently returning null is for Javacoders. If necessary, add this method:

        public T DequeueWithDefault(T defaultValue)
{
lock (queue)
{
if (queue.Count > 0)
return queue.Dequeue();
else
return defaultValue;
}
}


Another good idea is as follows:

        public bool TryDeque(out T value)
{
lock (queue)
{
if (queue.Count > 0) {
value = queue.Dequeue();
return true;
}
else
return false;
}
}


Finally, note that you definitely should not call .Count(). That's an extension method defined in System.Linq.Enumerable that will enumerate over your queue in O(n) time and count how many values there are in it. You want to use the property .Count, which is part of Queue<T>, which (presumably) returns in constant time.

Share this post


Link to post
Share on other sites
In addition to what others have said, consider using Xml documentation comments. They will allow you to easily generate api docs as well as being automatically parsed by the IDE.


public interface DataProcessor
{
/**
* @returns the name of this processor
*/
string name();

/**
* @returns the author of this processor
*/
string author();

/**
* @returns the version of this processor
*/
int version();


These should really be properties.

Share this post


Link to post
Share on other sites
Other random comments:

I usually order classes like so (I'm not sure if most C# coders do this or not - just the few I've worked with), rather than the privates-last style common to C++:

member variables
properties
constructor
Dispose/OnClosing
public and private methods intermixed



Interfaces: The only real prefix I see used in C# is the capital I in front of interface types. IDataProcessor, for example. C# writers don't typically use 'C' in front of classes though.

Interfaces can specify properties in addition to methods:


public interface IDataProcessor
{
string Name { get; }
// etc...
}

class DataDistributor : IDataProcessor
{
public string Name { get { return "DataDistributor"; } }
}




For your out-of-memory error, my best guess is that your URL fetching code (in the Crawler class, which is not posted?) needs to be disposing something from the web response. Go through and see if any of the classes you get returned have 'Dispose' methods.

My second guess is that maybe you have too many threads running and the thread stacks are taking up too much memory. You could try using a ThreadPool to cleanly limit the number of active threads at a time.

Don't worry about the GC "not able to free stuff fast enough" - the GC will suspend your program when it can't make an allocation, attempt a collection to free up space and compact the heap, then it'll try to make the allocation again. If THAT fails, then you get the OOM exception. So basically, the only way to run out of memory is to actually run out of memory. I usually only get out of memory errors after hitting the 2 or 3 GB per-process limit (in 32-bit mode anyway) or the system-wide swap-space limit (usually 16GB or so).

There are only a couple 'gotchas' that lead to memory/resource leaks in .Net:

- Forgetting to Dispose things. Whenever you start using a .Net type, check to see if it has the 'Dispose' method. Unfortunately, Dispose is like 'delete' in C++; you have to figure out where to call it yourself.

- Accidentally using events in a way that keeps short-lived objects alive. (or at least you EXPECT them to be short-lived). This usually involves having a long-lived mediator class provide some kind of global event that tons of temporary objects add handlers to, but you forget to unregister those handlers when you get rid of the temporary objects. The handlers have a reference to the temporary objects which keeps them alive.

[Edited by - Nypyren on August 23, 2010 8:37:09 PM]

Share this post


Link to post
Share on other sites
Thanks for the detailed replies.
I'll try to answer everyone in order.

@mhagain
The last time the exception occured in UriProcessor when creating a new StreamReader.
(I also added the crawler code to the OP).

@Telastyn
- eventually you want to learn the using keyword.
- Just use ReadToEnd on the stream. No need to push it into the StringBuilder. (though understandable if being ported from C++)
From what I gathered from the msdn, it auto inserts some code that eventually calls Dispose on the object I'm using. I guess I should get used to stuff like this then?

string content;
using (Stream stream = new MemoryStream(reply.content))
{
using (StreamReader reader = new StreamReader(stream))
{
content = reader.ReadToEnd();
}
}



- I recommend using braces {} even when they're not required.
I'll do that.

- Doing the thread setup in Bot's constructor (or similar complex logic in constructors) is generally poor form and not very C#-like.
So should I add a Start() method for this job?

@sam_hughes
I added the upcast because of the 2 possible overloads: add(DataProcessor) and add(DataProvider), which UriProcessor implements both. I changed the methods to AddProcessor and AddProvider instead, which lets me get rid of the upcast.

I'm getting used to capitalized method names, although it seems a bit weird to me :D

About Dequeue: I don't want this method throw since the queue being empty is expected to happen, throwing in every case seems unappropriate to me. However I do like your other idea, so I'll add a blocking Dequeue and non-blocking TryDequeue method.

Thanks for the hint about Count(), I didn't know about it. I changed my code to use the property instead.

@ChaosEngine
Is there a way to have Visual Studio auto generate comment stubs for me?
I changed the interfaces to use properties for that instead.

@Nypyren
I hacked a quick 'n dirty Dispose method, however I'm really unsure if I'm doing it the right way (besides missing the recommended disposed property for the whole IDisposable-msdn-recommended pattern). WebRequest implements the IDisposable interface, however the compiler complains about a missing Dispose implementation. This doesn't make sense, the compiler should realize that Dispose is implemented elsewhere and make a virtual call. I called Close instead, but I don't know if this is correct.

// Taken from Request, inside Crawler.cs
public void Dispose()
{
if (stream != null)
stream.Dispose();
if (memory != null)
memory.Dispose();
if (response != null)
response.Close();
}



I'll upload the rest of my sources later.

Share this post


Link to post
Share on other sites
Quote:
Original post by SiS-Shadowman
@ChaosEngine
Is there a way to have Visual Studio auto generate comment stubs for me?
I changed the interfaces to use properties for that instead.

@Nypyren
I hacked a quick 'n dirty Dispose method, however I'm really unsure if I'm doing it the right way (besides missing the recommended disposed property for the whole IDisposable-msdn-recommended pattern). WebRequest implements the IDisposable interface, however the compiler complains about a missing Dispose implementation. This doesn't make sense, the compiler should realize that Dispose is implemented elsewhere and make a virtual call. I called Close instead, but I don't know if this is correct.


For #1: In visual studio, just type three forward slashes in a row /// on the line before whatever you want to comment to auto-generate an XML comment stub for it. If it's a method, it will auto-generate a stub for each argument and the return type as well. Most other things just generate a 'summary' tag. All of the XML comment information will appear in the intellisense tooltips, too.

For #2: WebResponse.Dispose is private (yes, you can implement interface methods as private). You actually CAN call Dispose, but only if you cast to IDisposable first. Awkward, right?

From looking at WebResponse.Dispose in Reflector, it just calls 'Close' followed by 'OnDispose'. Close cleans up the object. OnDispose doesn't really do anything important. So you're correct that you can call 'Close' to clean up. (Streams work in a similar way - in their case, Stream.Dispose() calls Stream.Close() which calls Stream.Dispose(bool), which then actually does cleanup.)

Share this post


Link to post
Share on other sites
Quote:

From what I gathered from the msdn, it auto inserts some code that eventually calls Dispose on the object I'm using. I guess I should get used to stuff like this then?


Not eventually, of course, but deterministically. It gets directly translated to a try-finally block. If using blocks didn't exist, then everybody would end up writing and using a function like this:

static class Util {
public static void Using(T value, Action<T> func)
where T : IDisposable
{
try {
func(value);
}
finally {
value.Dispose();
}
}
}


And calling it in the typical fashion...

    Util.Using(resource.Open(), handle => {
var x = handle.ReadOrWhatever();
...
});


A more useful variant is the kind that takes a Func<T, U> and passes the return value along.

Another handy thing is to make an IDisposable class that takes an Action in the constructor, that exists just to do arbitrary computation in the Dispose method.

Quote:

@Nypyren
I hacked a quick 'n dirty Dispose method, however I'm really unsure if I'm doing it the right way (besides missing the recommended disposed property for the whole IDisposable-msdn-recommended pattern).


If one of the Dispose calls throws an exception, the other resources won't be disposed. You might want to consider that. Regarding the MSDN recommendations, it's pretty normal to ignore them and assume your object will get disposed exactly once. Of course, you then have to write properly exception-safe code.

Share this post


Link to post
Share on other sites
Quote:


string content;
using (Stream stream = new MemoryStream(reply.content))
{
using (StreamReader reader = new StreamReader(stream))
{
content = reader.ReadToEnd();
}
}



Actually, using doesn't need to take a block. This can be written as

string content;
using (Stream stream = new MemoryStream(reply.content))
using (StreamReader reader = new StreamReader(stream))
{
content = reader.ReadToEnd();
}


Which often makes things a bit cleaner looking (and doesn't suffer from some of the confusion that other optional-block statements).

Though the exception handling issues sadly still remain. That any Dispose method throws is one of the definite warts on the standard library.

Share this post


Link to post
Share on other sites
Sign in to follow this  

  • Advertisement
×

Important Information

By using GameDev.net, you agree to our community Guidelines, Terms of Use, and Privacy Policy.

GameDev.net is your game development community. Create an account for your GameDev Portfolio and participate in the largest developer community in the games industry.

Sign me up!