Jump to content
  • Advertisement
Sign in to follow this  
Tomaz Kristan

ArtificialSort

This topic is 4183 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

If you intended to correct an error in the post then please contact us.

Recommended Posts

Advertisement
This is interesting. I am wondering if you could write up some information about the algorithm itself, for those who are not familiar with C/C++?

Share this post


Link to post
Share on other sites
Lots of pictures...

But where's the analysis of the algorithm, proving it performs better than QS.

Best case, worst case, average, typical case...

Better than which QS. How is the boundary element for QS chosen? Single, Random? Median? Average? How many samples? etc...

Also, how was QS for comparison implemented? From a very fast and cursory look, it seems that Artificial sort is simply some form of quick sort with inner loop unrolled using insertion sort.

The reason I ask is because asymptotic performance is AS is exactly the same as QS, with exception of a constant.

Also, AS doesn't seem to unroll for array sizes of 3 elements or less, when the insertion sort is completely redundant.

These details are what give the huge running time differences for small number ranges, where QS is known to be suboptimal.

Share this post


Link to post
Share on other sites
The C++ code is on the site I gave. You can cut off the insertion portion if you like. But it gives some additional performance to Artificial, just as it gives some additional performance to Quick sort. That for sorting smaller chunks only.

But both can do almost as well without the Insertion part.

The main difference between those two sorts - Artificial and Quick is the fact, that the Artificial checks the segment's order before performs the sort. That gives it a huge edge quite often. See the table and try it yourself. Use the best known QuickSort you can get.

Share this post


Link to post
Share on other sites
And how does it compare to for example introsort?

And when I watched the animation example quicksort won :P

Share this post


Link to post
Share on other sites
I don't get it. Your "random data" animation shows QuickSort finishing first, and the QuickSort algorithm in your "stair data" animation seems to be using a silly partitioning algorithm (as opposed to Dutch Flag). Could you provide more information about the version of QuickSort you used?

Also, small nitpick: the provided source code is C, not C++.

Bigger nitpick: try to make your web site sound less like a snake oil salesman, and more like a serious computer science discussion. You are, after all, addressing to developers, which have enough knowledge about the problem field to understand what you have to say.

Share this post


Link to post
Share on other sites
For the small random files, Quick may seldom win. Also at the example shown. Facts are facts.

But you have the table for larger files, how they behave. That's important.

You are free to choose any known QuickSort algorithm in C/C++ and compare it to ArtificialSort given.

If you think it's a snake oil, just test it, it's easy to do.

Share this post


Link to post
Share on other sites
I think it should be interesting to see the complexity of this algorithm...

Probably the order is O(nlogn), but the complet complexity can prouve effectively it's faster than the quicksort (in average case).

And I have a question, with your variable "meja", you have this:
meja = ( temp1 + temp2 + 1 ) >> 1;
if ( array[ (left + right) >> 1 ] > meja )
{
meja = array[ (left + right) >> 1 ];
}

If I want to sort... clients by Lastname/Firstname... how, can I do the "( temp1 + temp2 + 1 ) >> 1" ??

Base on this fact, this algorithm, is only good for numbers... the Quicksort, work for every kind of data... and the section above, looks pretty important because it act like the pivot in the quicksort...

It's like the Counting Sort, it's work in linear time, but it's work only for number, and you have to know the range of your data entries...

The ArtificialSort has this problem... unless, someone has a suggestion to adapt the algorithm for every kind of data...

[Edited by - widggman on April 7, 2007 9:54:51 AM]

Share this post


Link to post
Share on other sites
First, a disclaimer: I'm not criticizing your algorithm at all. As a matter of fact, I don't have an opinion about it yet. What I am criticizing, however, is your web site, and the methodology you use to present information to us. It is essential, if you are serious about advertising your algorithm, or the program that generated it, that you make an effort to improve the credibility of your web site in general, and of this article in particular.

Quote:
But you have the table for larger files, how they behave. That's important.


It would have been important, had it made any sense. As it is presented here, the table provides us with:
  • The speed at which your algorithm runs on unspecified hardware.
  • The speed at which an unspecified algorithm (some flavour of QuickSort) runs on unspecified hardware.


So, we do not have access to any absolute measure of performance (such as complexity, average number of swaps, reads, writes or comparisons), which makes it hard to judge your algorithm. But, even worse, we do not have access to any relative measure of performance either, since we do not know which particular algorithm you compared it to. Was it a suboptimal quicksort? A standard library generalistic one not optimized for integers? The seed for your evolution?

Your table provides no relevant information, it's just hand-waving. If you're serious about your algorithm, the first step would be to make the information relevant by adding information about:
  • Hardware (processor, memory size) and OS
  • Compiler, optimization options
  • The source code of the quicksort algorithm used.


Without these, the table is useless. It would be even more useful if it included statistical significance information (p-value for the hypothesis, at least), and compared several more algorithms instead of just two.

Quote:
You are free to choose any known QuickSort algorithm in C/C++ and compare it to ArtificialSort given.


This is an extremely bad thing to say. Remember, you're trying to convince people that your algorithm is faster than QuickSort. So far, you've provided people with your source code, and told them to do the rest themselves. Which 90% of people will not do, because:
  • They are too lazy.
  • They can't do it right now, and will forget about it by the time they can.
  • They wait for someone else to do it.
  • They don't have access to resources for a relevant benchmark.
  • They don't care.


In the current state of your presentation and web site, most people will read over the page, think "Ok." and move on to something else. When convincing people, it is essential that you come up with clear, detailed and cleanly presented facts, and an explanation of why these facts justify your initial assertion. Counting on the addressee to prove your pont himself is argumentative suicide.

Would you be sending incomplete resumes to companies, with the footnote "you are free to examine my background yourself" ? This would certainly not get you hired. Apply the same principle here: provide complete information yourself, don't wait for people to find it on their own.

Quote:
If you think it's a snake oil, just test it, it's easy to do.


Your web site sounds like a snake oil salesman. This is most unfortunate, since it makes you lose your credibility, and even though the statements may be entirely correct, people will tend to disbelieve you.

For instance, if you were to ask a girl out, a good idea would be to explain where you will be going, what movie you will see, what kind of dinner you will eat, and other practical details about the date itself. Would you explain to her that you're not a rapist, a transsexual or already married? Keep your discourse to the bare facts, without meta-reasoning about the state of mind of the reader, philosophical subjects about how the universe works, tangent remarks about trivia, gloating about how clever and great and excellent your approach is, and anything that is not a justification of your initial assertion.

Now, does any serious site mention that the reader might not believe his own eyes? No. Most of them act as if what they were saying was obvious and self-evident truth, and then present justification in a clear and transparent way. Who mentions the reader's disbelief?
  • People who try to build up hype. "So good you won't believe your eyes." This will get the average Joe's attention, but is easily recognizable by people in the same field of competence.
  • People who don't have arguments to justify themselves, and resort to meta-reasoning to distract the reader from the point (the point being the stated facts, and not the user's belief in them).

So, when people read such remarks, they will automatically associate you with these shady types, even if you are in fact honest and innocent.

Reading your main page for Critticall, half of the page is dedicated to information not directly related to the problem at hand. This means that, upon reading a new sentence, there is a 50% chance for me not to care about what that sentence says. This is unusually high. I'm not saying that you're a snake oil salesman, but your site looks like one, and it would be best for you to move away from that image. Again, this is friendly advice about how to improve your communication, and not dismissal of what you have to say.

A great structure for your algorithm presentation would be:
  • Abstract: present in a few broad strokes what your algorithm is about. For instance, "A divide-and-conquer high-performance sort algorithm: presentation, comparison with existing solutions, source code".

  • The algorithm: provide a pseudocode explanation of your algorithm. Something that does not have the readability issues of C code. For instance, a divide-and-conquer algorithm is usually best presented as a recursive solution, not as an imperative-with-stack optimization. Also, the insertion sort optimization is not relevant to the algorithm: its rightful place is in a later section of the document, along with production code. This section is about explaining how the algorithm works, and why. It's also a great place to explain why your algorithm deserves a brand new name, even though the code provided on your page is, in fact, a typical QuickSort algorithm with an unusual partitioning routine.

    This section should not contain even a single allusion to the performance of other algorithms. You're explaining how your algorithm works, don't drown the reader in unstructured information. Performance comes in the next section, not here.

  • The results: this is where the results are. This generally involves one or two quick overviews (graphical animations, your demonstration section), followed by the actual benchmark. The benchmark should state what the hardware, OS, compiler and algorithms compared are. It should also describe the statistical significance of your results (the average time isn't enough, you will need at least the standard deviation, with p-values for the null hypothesis being expected in many cases). Provide links to the appendix.

    If at all possible, provide several benchmark cases: there is more to the world than just integer arrays, performance on string arrays (for example) would be great to have as well. Also, provide more than two algorithms. Try out several quicksort variants, along with bin sort, radix sort and other unrelated sort algorithms. And how does the algorithm compare, on nearly-sorted sets, to insertion sort or bubble sort? To quicksort with median-of-three selection?

  • The conclusion: wrap up your argument. Here, it would be that you have a new algorithm which performs faster than several typical implementations of other known fast algorithms, although it does have some documented weak points.

  • The appendix: the source code for all algorithms included in the benchmark is a must. A makefile and benchmark scaffold would be a definite plus. If you're developer-friendly, providing a generic version of your algorithm in an easily usable form (header file with inlined template implementation) would be much welcome. Make sure you give credit where credit is due, for instance for the code of the other sort algorithms in the benchmarks (including crediting yourself, if you're the author).


If you follow a similar structure, and keep to the point, it would improve credibility, provide the necessary information for judging your algorithm, and be a useful resource to everyone interested.

[Edited by - ToohrVyk on April 7, 2007 9:33:19 AM]

Share this post


Link to post
Share on other sites
Between 'Jones' and 'Lindberg' - "meja" is 'K'. Between 'ABBA' and 'ABEL', "meja" is 'ABC'. And so on. This way, the Artificial is a general sort.

Share this post


Link to post
Share on other sites
Sign in to follow this  

  • Advertisement
×

Important Information

By using GameDev.net, you agree to our community Guidelines, Terms of Use, and Privacy Policy.

We are the game development community.

Whether you are an indie, hobbyist, AAA developer, or just trying to learn, GameDev.net is the place for you to learn, share, and connect with the games industry. Learn more About Us or sign up!

Sign me up!