
Antheus

Member Since 03 Mar 2006
Offline Last Active May 31 2012 04:03 PM

#4938066 Task/Job Scheduler code review

Posted by Antheus on 07 May 2012 - 08:13 AM

I'm not a fan of this design, or of the current implementation of it.

There are still race conditions. They occur when two operations are individually atomic but happen one after another, even though they need to be atomic as a unit (a transaction).

This, for example:
m_workerMutex.lock();
m_tasks.push(t);
m_taskCount++;
m_workerMutex.unlock();

<-- I steal a task here

m_idleMutex.lock();
m_idling = false;
m_idleMutex.unlock();
<-- I have no tasks, but I'm not idle

There is too much interlocked or duplicated state: m_taskCount and m_tasks, for example. Why keep a separate counter when the queue already knows its size?

Then there are m_workerMutex, m_idleMutex, m_wakeUp, m_done, m_idling, m_dependencyBlocked.

A thread is a state machine. All of the above can be collapsed into a single state variable or synchronization primitive. Draw a graph on paper of how the worker works: which states it can be in and under which conditions it transitions. Such formalization also allows a proof of correctness.

Querying for idle() is highly undesirable. It's a serial action ("if (idle()) ..."), and conditions plus serial execution harm throughput. The worker already knows exactly when it's idle: it's either executing a task or waiting on a condition variable.

ready() is a bad idea. A thread is either executing a task or it's waiting on a condition variable.
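
Roughly, the whole worker collapses to something like this (a minimal sketch, not your code; the names are made up):

#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>

struct WorkerPool {
  std::mutex m;
  std::condition_variable cv;
  std::queue<std::function<void()>> tasks;
  bool done = false;

  void submit(std::function<void()> t) {
    { std::lock_guard<std::mutex> lock(m); tasks.push(std::move(t)); }
    cv.notify_one();                                  // wake exactly one waiting worker
  }

  void shutdown() {
    { std::lock_guard<std::mutex> lock(m); done = true; }
    cv.notify_all();
  }

  void worker() {
    for (;;) {
      std::function<void()> task;
      {
        std::unique_lock<std::mutex> lock(m);
        cv.wait(lock, [&]{ return done || !tasks.empty(); });  // state: idle
        if (done && tasks.empty()) return;                     // state: terminated
        task = std::move(tasks.front());
        tasks.pop();
      }
      task();                                         // state: running, outside the lock
    }
  }
};

One mutex, one condition variable, one flag; there is no separate idle state to query or to get out of sync.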

To manage dependencies, look into topological sort. Single responsibility principle and all that: workers work and nothing more. Deciding what to run and when is the scheduler's job; it only submits tasks once it knows they can run.

Reason:
Task a, b;

a::ready() {
  return b->ready();   // a is ready only if b is
}
b::ready() {
  return a->ready();   // b is ready only if a is; infinite recursion
}

As far as concurrency goes, here's a hard rule: always acquire shared resources in the same order.

If the scheduler is supposed to resolve dependencies, it should do so at the task level. Or, instead of an opaque ready(), do something like this:
struct Task {
  vector<Task *> depends_on();
}
Using topological sort, the scheduler can now determine the order in which to execute tasks, or abort if a cycle is detected.
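
Something along these lines (a rough sketch, assuming depends_on is a plain member rather than a function):

#include <stdexcept>
#include <unordered_map>
#include <vector>

struct Task {
  std::vector<Task*> depends_on;
};

// Depth-first topological sort; throws if a dependency cycle is found.
void visit(Task* t, std::unordered_map<Task*, int>& state, std::vector<Task*>& order) {
  int& s = state[t];               // 0 = unvisited, 1 = in progress, 2 = done
  if (s == 1) throw std::runtime_error("dependency cycle");
  if (s == 2) return;
  s = 1;
  for (Task* dep : t->depends_on) visit(dep, state, order);
  s = 2;
  order.push_back(t);              // every dependency of t comes before t
}

std::vector<Task*> schedule(const std::vector<Task*>& tasks) {
  std::unordered_map<Task*, int> state;
  std::vector<Task*> order;
  for (Task* t : tasks) visit(t, state, order);
  return order;                    // submit to workers in this order
}

Workers never see ready() at all; they just pull already-runnable tasks.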


#4937794 java serializable class, really usefull?

Posted by Antheus on 06 May 2012 - 09:45 AM

Mmh, I'm not getting results as good as yours, Antheus:


You are.

Explicit sync makes sure that each chunk is really absolutely positively written to disk. So it's the slowest possible case.


You'll notice two cases: one set of times is around 17 ms, which is very close to what I get. The other is in the 50 ms range.

9-17 ms is fairly easy to explain: disk seek time (around 8-10 ms) plus the write.
50 ms happens when the OS needs to force a flush and wait; maybe it has something else going on, maybe another process is doing disk IO, so it takes longer.


Improvements from deferred writes vary. On a laptop they might be disabled altogether for increased reliability (in case the battery goes out), or the OS/disk cache might be full, too small, or too slow. Deferred writes may improve things, but they aren't magic; they merely let your thread run ahead while the OS does the work in the background. If that isn't possible, it won't be any faster.

Deferred writes also do not magically increase throughput. If enough data is written, times will settle at the limits of disk IO, since the OS cannot afford to buffer too much data it has claimed was written to disk. Otherwise I could save 500 MB, then turn off the machine thinking it's safe, while the OS would still need 2 minutes to flush everything from memory.


For the second case, you're using text serialization. You write roughly 2.5 times as much data: the first example uses exactly 2 bytes per value, the second uses 4-5. The timing is consistent with that.

The numbers above give hard limits on how long the disk IO takes.


#4937600 java serializable class, really usefull?

Posted by Antheus on 05 May 2012 - 07:25 AM

But my write time is about 65 ms and read time 135 ms, which is quite long since a scene is composed of 900 sectors...


1 sector is 64kB. 900 will be ~56MB. That takes time either way.

To write a sector:
byte[] sector = new byte[65536];
FileOutputStream fs = new FileOutputStream("sector413.dat");
fs.write(sector); // to load, use new FileInputStream("sector413.dat").read(sector)
fs.close();
No magic here.

How long this takes depends on the disk and the file system, especially if there are many files. 64/130 ms sounds like a lot, but it's not impossible.

64 kB in a single file is very convenient since it causes essentially no overhead; storing the coordinates causes just about the maximum overhead possible (roughly 99.7% overhead for the coordinates, since a few bytes end up occupying a typical 4 kB disk page).

Serializable could be used for the above, but it doesn't bring much to the table beyond that.


One way is to keep the index separate, as a file which contains only:
class Index {
  float x, y;
  int filename;
}

List<Index> indices;
Each file is represented by a number, so '485.dat', '0.dat', and so on.


#4937586 Number of arrays CPU can prefetch on

Posted by Antheus on 05 May 2012 - 06:27 AM

From here, 2.1.5.4 (pdf).

Up to 32 streams, but only 1 forward/backward stream per memory page, and it depends on the number of requests and many other factors. So if all your data is located close together and iterated in the same direction, you get one stream.

But it's not something you should optimize for; it's painfully model/microcode specific, and a CPU revision may change it.

structure of 25 arrays for SSE processing?


Loads/stores are by far the most expensive part, so minimize those.

My guess would be that two arrays, one for input and one for output, would work best.
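
Something like this (a trivial sketch of the layout, not your actual computation):

#include <cstddef>

// One contiguous input stream, one contiguous output stream, both walked in
// the same direction: two simple streams for the prefetcher.
void process(const float* in, float* out, std::size_t n) {
  for (std::size_t i = 0; i < n; ++i)
    out[i] = in[i] * 2.0f;   // placeholder for the real per-element work
}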


#4937131 Building an Engine: Rough Schedule of Learning

Posted by Antheus on 03 May 2012 - 09:57 AM

10 years ago, there was no Facebook, Gmail, quad cores, Twitter, let alone Kickstarter, ...

Don't worry about 10 years from now.

Instead, make sure you complete something every week.

building a somewhat simpler 2D engine


It's called Flash. It's a painfully solved problem. Why not try that? Build a game in one week. Then another.

Then move to Unity3D and do the same.

And in one month, you'll either have four games, or you'll realize that what you really want is to build engines, something that has no real-world use (because nobody builds an engine first; everyone builds a game, then perhaps reuses it for a second one and calls it an engine).

Engines as a technical challenge were relevant in the 90s, with the lack of drivers and GPUs and whatnot. These days there's a library for everything. Look at Unreal: 30 or so external libraries.

There are exceptions, but they don't apply to someone starting off.

To take your painting example, I would compare it rather with a new painter buying a canvas and slowly learning the techniques, each time painting a different picture, maybe one with a certain brush stroke, one with perspective etc.


That means making games.

Building an engine is akin to growing and chopping wood for frame and shearing sheep to make canvas.


#4936915 Constructing a 3D terrain using map services API.

Posted by Antheus on 02 May 2012 - 05:22 PM

'0'. High speed.

If it's cloud based, then for any non-casual amount of data you'll need to buy a commercial license or be banned after a few requests.

If it's downloadable, speed doesn't matter.

I noticed that Google Maps resolution is quite poor

10-15m is about as good as it gets. Individual features may be available at slightly higher resolutions.

I know some people who built a drone with LIDAR; they can scan the surface at 5 cm resolution. But it's been a while and I'm not aware of them getting any further. One would need hundreds of drones, along with an appropriate budget and manpower, as well as lots of permissions, to scan any meaningful area. There's definitely nothing at global scale.

Keep the data size in mind. The only figure about Google Maps I can find is from 2006, and they claim 70 TB.


Then there's the usual disclaimer: data like this is heavily restricted, so read the licensing terms.

I also prefer to work with C++ or C APIs,


API for what?


#4936530 Confused as to why this wont compile

Posted by Antheus on 01 May 2012 - 02:51 PM

Source code (a.cpp, x.h) --(preprocessor)--> compilation_unit ---(compiler)--> a.o/.obj
Source code (b.cpp, x.h) --(preprocessor)--> compilation_unit ---(compiler)--> b.o/.obj --(linker)--> executable
Source code (c.cpp, x.h) --(preprocessor)--> compilation_unit ---(compiler)--> c.o/.obj

A .cpp file #includes header (.h) files. The compiler then converts what's in them into code and symbols. nAssets is a symbol; at that point it's just a reference to something, and the compiler doesn't care about it beyond the name.

Since each .cpp is processed independently, we end up with 3 object files, each containing its own copy of nAssets. When the linker puts all 3 together, it has 3 conflicting copies of nAssets.

By putting the definition of nAssets into a single .cpp file (such as a.cpp), only one symbol is generated, avoiding the conflict.
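
Something like this (a sketch, assuming the counter is a static member of Asset as in your code):

// x.h -- declaration only; every .cpp that includes this just sees a symbol name
#ifndef X_H
#define X_H
struct Asset {
  static int nAssets;   // declared here, defined elsewhere
};
#endif

// a.cpp -- the one and only definition; storage is reserved here
#include "x.h"
int Asset::nAssets = 0;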

A header guard (or pragma once) only prevents repeated inclusion into the same compilation unit. If x.h is included multiple times while preprocessing a.cpp, it will only appear the first time. But it will be included fully again when preprocessing b.cpp and c.cpp.

int Asset::nID=0;

This line actually says: "reserve 4 bytes inside the executable at precisely this location; when someone needs this symbol, give them this address".

Unlike in most other languages, this is more than a semantic annotation; where you put it is a fairly important choice.

it's best practice to wrap the code in a header with "#ifndef" directives.


It is, but pragma once is fairly well supported these days and does the same. I prefer header guards myself.


#4936472 Assigning lambdas

Posted by Antheus on 01 May 2012 - 11:53 AM

Visual Studio, even the latest 2011 beta, is next to useless for C++0x.

2010 is not even in the contest.

To mess with C++0x, use GCC, at least for the next several years. Clang also appears to be a good choice.

A different comparison (note the partial support; even for VS12.0, almost everything is way behind GCC).


#4936461 Ideas for how to improve code

Posted by Antheus on 01 May 2012 - 11:10 AM

bl = (boolRep != 0) ? true : false;


An idiomatic approach, accepted in C (the standard for embedded work) as well as many other languages:

bl = !!boolRep;
! obviously stands for negation.

The result of the expression is a true boolean value, and it works even without an explicit bool type. As a common idiom, it also reads intuitively.


#4935899 Music\Sounds over Network (SDL_Net)

Posted by Antheus on 29 April 2012 - 03:28 PM

I would not say that a blocking operation on send is any trouble, I mean how often can data not be sent?


If the socket has too many unacked frames, it will block, in addition to the usual case where the kernel's send buffer fills up.

However:
send(socket, buffer, 1024 * 1024);
and you're pretty much stuck for at least one round trip. While not necessarily a big deal on its own, this is the inner server loop, so for those 75 ms your server is idling, waiting for an ack. 14 clients and you're capped.

It's less of a problem on LAN, but the problem doesn't go away.

Slow start is less of a problem for persistent connections, but add some packet loss and always-blocking sockets simply aren't an option.

I found it possible to have non-blocking receives in SDL_Net:


Receives are fine, it's sending that's the problem.


#4935876 Optimizing floating point computations (simulating electric motors)

Posted by Antheus on 29 April 2012 - 01:47 PM

from roughly 60 seconds to about 50 seconds. Not an astronomical improvement but nothing to be sneezed at either.


Is this the worst case? How many such test cases will be run? How many hours/days/years of computational work are we talking about here?

For comparison: one month of compute time on Amazon costs $70. For the lowest-end software engineer salary, one can keep 10-20 instances running full-time, giving you a 2000% improvement, compared to the 17% improvement you've made so far.

Another option is buying more machines. Weigh the cost of development vs. cost of an i7 (~$250).

What are the targets here?

Accuracy? SIMD and GPUs may be fast, but they have accuracy problems. GPUs can suffer from overheating and corrupt computations undetectably; no big deal for graphics, but for number crunching it can be a problem. Fast math is likely a no-go for this reason as well; if results need to be accurate, then it'll sooner be /fp:precise or whatever the most robust setting is.


Since a 60 s computation time is being mentioned, it doesn't seem this has to be interactive. In that case the range of solutions becomes very large, but as far as cost factors go, optimizing the algorithm is not one of them; hardware is simply too cheap.

it's pretty tenuous work and I have to take great care not to accidentally mess up any of the rather complicated calculations.


You do run the full test suite after each change, right? You do have an unoptimized reference implementation against which the values are compared?

If not, then write that instead of optimizing.

- One is a table of hand-calculated or textbook results
- The second is an unoptimized reference implementation: slow, compiled for maximum reliability, perhaps in debug mode
- Then come incremental improvements; if results differ, an RMS or similar statistic is measured against the first two to determine the deviation

Having that, optimization can proceed incrementally, with only a few seconds needed to test validity. For complex computations and simulations, any mistake will quickly cause a large divergence.
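
The deviation check itself is a couple of lines (a sketch; the run_* functions in the comment stand in for whatever your real entry points are):

#include <cmath>
#include <cstddef>
#include <vector>

// Root-mean-square deviation between reference and optimized results.
double rms_deviation(const std::vector<double>& reference,
                     const std::vector<double>& optimized) {
  double sum = 0.0;
  for (std::size_t i = 0; i < reference.size(); ++i) {
    double d = optimized[i] - reference[i];
    sum += d * d;
  }
  return std::sqrt(sum / reference.size());
}

// e.g. fail the test if rms_deviation(run_reference(), run_optimized()) > tolerance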


#4935114 Code management

Posted by Antheus on 26 April 2012 - 10:18 AM

antheus - thanks for the response.

NuGet seems to be closest to what I'm looking for. Unfortunately, I don't write in C#. It would be nice if there were something like this for C++.


{
  std::vector<int> foo;
}

Congratulations, you just included the entire Linux or Windows kernel, all 50 million lines of it.


#4935078 Refrences and Temporaries, or Why Does This Work?

Posted by Antheus on 26 April 2012 - 08:32 AM

I was expecting an AV, or some sort of exception.


Both of those are artefacts of the OS or compiler; they aren't required in the general case.

An access violation is triggered when a process attempts to access memory it doesn't have sufficient permissions for. As far as C and C++ are concerned, there is absolutely nothing wrong with simply writing bits and bytes anywhere into addressable memory.

Exceptions are likewise a compiler implementation detail; as far as the language goes, the memory access itself is allowed.

To cover all uses, C and C++ must allow for this:
int * p = (int*)0x22005a9c;
*p = 42;
Here p happens to point to a memory-mapped device, and writing 42 performs some action. Stuff like that is common when working with hardware directly. Before DX, in the era of VESA and such, it's how graphics were done. There's something to be said for the simplicity:
char (*screen)[80] = (char (*)[80])0xB8000; // text buffer (0xA0000 for graphics)

screen[17][25] = '#'; // draw # at row 17, column 25

C++ would obviously prefer this be encapsulated somewhere, but at some level, one runs out of abstractions.

Neither C nor C++ creates a perfect abstraction layer above that, meaning the above functionality is always present. Java or C#, however, simply do not have syntactic or other constructs to make the above possible.


#4934576 Java - How to stop or pause a thread?

Posted by Antheus on 24 April 2012 - 05:37 PM

"Blocks the calling thread until a thread terminates, while continuing to perform standard COM and SendMessage pumping."


That is .Net.

AsyncTask


That is Android.

stop() is deprecated,


And that is JDK/JEE's Thread class.

So the original question mixes three different platforms.


For which of those is the question intended?

Is there any better solution?


For what exactly? Threads are created, they run, and then they terminate. Stopping or otherwise interfering with them has long been abandoned due to the problems it causes.

What are you really trying to accomplish?


#4934467 Problem: turning a bit shift into a division

Posted by Antheus on 24 April 2012 - 09:04 AM

I really don't understand the goal here.

It should duplicate the original behavior, yet it doesn't matter if the algorithm isn't correct.
It should be portable, yet a perfect match for the generated assembly opcode.
It should be compiler-independent and standard-compliant, yet perfectly optimized for any compiler, even ones not yet written and ones running on exotic architectures.
Performance is secondary to portability, yet performance comes first.

A bitshift is a perfectly fine solution if performance matters. Otherwise, there's a so-called reference implementation available on Stack Overflow.

Implement that, then profile. Is it good enough? Then it's fine. Otherwise, use an ifdef and a bitshift.
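
For illustration, the ifdef-plus-bitshift idea might look like this (a sketch; FAST_SHIFT_OK is a made-up flag, and the portable branch rounds toward negative infinity like an arithmetic shift does, ignoring the INT_MIN edge case):

inline int div_pow2(int x, int shift) {
#if defined(FAST_SHIFT_OK)     // we trust >> on negative values on this target
  return x >> shift;
#else
  int d = 1 << shift;
  return x >= 0 ? x / d : (x - (d - 1)) / d;   // floor division, no shift needed
#endif
}

// e.g. div_pow2(-3, 1) == -2, matching -3 >> 1 on two's-complement hardware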

Especially since, for this type of code, peephole optimizations won't matter much; 5 lines of assembly are frequently faster than a single instruction.

And if performance is crucial, there's SIMD, for which either approach looks good.



