
Cyrus Script is now open source

This topic is 2535 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

Hi all
Today I decided to make my scripting language available to others as open source.

About Cyrus Script
Cyrus Script is a scripting language similar to C, written in C++. Its main feature is speed: in an early test it was just 15% slower than C++. Its design is unique, and you can bind it to your project very easily.

It has no virtual machine; it uses pointers to functions, pointers to member functions and pointers to members to run your script commands, so it doesn't need a stack :D or anything else that slows scripting languages down.
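
To illustrate the idea described above, here is a minimal C++ sketch of running script commands through pre-bound pointers to member functions instead of a bytecode interpreter. It is not taken from the Cyrus Script sources: Command, MemberCall, Counter and RunDemoScript are hypothetical names invented for this example, and the real engine may be organised quite differently.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical base class: one pre-bound script command.
struct Command {
    virtual ~Command() = default;
    virtual void Run() = 0;
};

// Binds an object, a member function and a pointer to the argument once,
// when the script is loaded. Run() is then a single indirect call.
template <typename T, typename Arg>
class MemberCall : public Command {
public:
    MemberCall(T* obj, void (T::*fun)(Arg), Arg* arg)
        : m_pObj(obj), m_pFun(fun), m_pArg(arg) {
        assert(obj != nullptr && fun != nullptr && arg != nullptr);
    }

    void Run() override { (m_pObj->*m_pFun)(*m_pArg); }

private:
    T* m_pObj;               // object the script command operates on
    void (T::*m_pFun)(Arg);  // pointer to member function
    Arg* m_pArg;             // pointer to the script variable used as argument
};

// Hypothetical host class exposed to the script.
struct Counter {
    int total = 0;
    void Add(int n) { total += n; }
};

// Builds a two-command "script" that calls counter.Add(step) twice.
int RunDemoScript() {
    Counter counter;
    int step = 5;

    std::vector<std::unique_ptr<Command>> script;
    script.push_back(std::unique_ptr<Command>(
        new MemberCall<Counter, int>(&counter, &Counter::Add, &step)));
    script.push_back(std::unique_ptr<Command>(
        new MemberCall<Counter, int>(&counter, &Counter::Add, &step)));

    // "Executing" the script is just walking the command list.
    for (auto& cmd : script) {
        cmd->Run();
    }
    return counter.total;
}
```

Calling RunDemoScript() runs both commands and returns the counter's total. The design choice shown is that all name lookup and binding happens once at load time, so the run loop contains no decoding step.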

Its language features are still limited, but I open sourced it in the hope of getting some help to improve this scripting language.

Here is the project page on sf.net
https://sourceforge.net/projects/cyrusscript/

The only way to get it is from svn:
svn co https://cyrusscript.svn.sourceforge.net/svnroot/cyrusscript cyrusscript

Here is the official site for the script language.
http://cyrusscript.com/

[Edited by - Kochol on December 28, 2010 2:41:03 AM]

On your cyrusscript page it says:
Quote:

Taken from http://cyrusscript.com/about/
It has not any virtual machine and it use pointer to functions, pointer to member functions and pointer to members to run your script commands so it doesn't need stack or anything that slow the script languages.

You may not have a virtual machine class, but you do have a lot of its functionality: you have instructions and you have data. Most scripting languages have a virtual machine class. In Cyrus Script, the script/instruction stream is the VM itself. It's quite an interesting idea.

You say 'In the early test it was just 15% slower than C++.' Not to criticize, but those are early tests; wait until you implement script functions, structures, classes, etc. The performance will drop. Still, it's very good that you're at such a performance level.

But then how do you keep track of data if you don't use a stack? I can't seem to find this in your source code. How do you pass variables/operands to functions?

I must say, some of the implementation techniques used in cyrusscript are very nice.

assainator

Thanks for your interest in Cyrus Script

The first thing I want to say is that Cyrus Script is at an early stage of development; it's an idea that just works :D

For my current game project I started to search for a scripting language to bind to it, but the bindings were difficult and the speed was low.

So I started to code my own scripting language that uses pointers to functions, and that is how I created Cyrus Script.

The problem I had was with functions whose return type is not a pointer.
To solve this problem I added a temp buffer: the returned value is copied and the copy is stored in the buffer. So if your functions return pointers, your scripts can run even faster.


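//! Call the function and return a pointer to its return value
virtual void* InCallRet()
{
    // Call the bound member function and keep a temporary copy of the result
    Treturn t = (*m_pObj.*m_pFun)(*_arg1);

    // Copy the temporary into the pool so the returned pointer outlives this call
    return m_pPool->PushBack((void*)&t, m_iSize);
} // InCallRet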
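
The temp buffer described above can be sketched as a small fixed-size byte pool. This is only an assumption about how such a pool might look: ReturnPool, PopBack, Used, Vec2, MakeVec2 and DemoReturnByValue are hypothetical names for this example, and only PushBack mirrors a call that appears in the snippet above. The sketch copies raw bytes, so it is only valid for trivially copyable types.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical stand-in for the pool behind m_pPool; the real class may differ.
class ReturnPool {
public:
    // The buffer is allocated once and never resized, so pointers handed
    // out by PushBack stay valid until the bytes are popped again.
    explicit ReturnPool(std::size_t capacity) : m_buffer(capacity), m_top(0) {}

    // Copy `size` bytes to the top of the pool and return their new address.
    void* PushBack(const void* data, std::size_t size) {
        assert(m_top + size <= m_buffer.size());
        void* dst = m_buffer.data() + m_top;
        std::memcpy(dst, data, size);
        m_top += size;
        return dst;
    }

    // Release the most recently pushed `size` bytes (last in, first out).
    void PopBack(std::size_t size) {
        assert(size <= m_top);
        m_top -= size;
    }

    std::size_t Used() const { return m_top; }

private:
    std::vector<unsigned char> m_buffer;
    std::size_t m_top;
};

// A host function that returns by value instead of returning a pointer.
struct Vec2 { float x, y; };
Vec2 MakeVec2() { return Vec2{1.0f, 2.0f}; }

// Stores the by-value result in the pool, reads it back, then releases it.
float DemoReturnByValue() {
    ReturnPool pool(64);
    Vec2 tmp = MakeVec2();
    void* slot = pool.PushBack(&tmp, sizeof(tmp));

    Vec2 copy;
    std::memcpy(&copy, slot, sizeof(copy));  // memcpy avoids alignment issues
    pool.PopBack(sizeof(Vec2));
    assert(pool.Used() == 0);
    return copy.x + copy.y;
}
```

Because values are pushed and released in last-in, first-out order, a pool like this behaves as a stack, even if it is not named one.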


For now my script language does not support functions.
But I added a syntax that lets you call another script from the current script; a script can't call itself, though.

Here is an example.
We have a script scriptTest1; we set its n parameter, which is an integer, to 15, run it, and read n back.

scriptTest1.set("n", 15);
scriptTest1.Run();
int a = scriptTest1.get("n");

Quote:
Original post by Kochol

The problem that I had was when you have a function that the return type is not a pointer.
for solving this problem I add a temp buffer that copy an instance from returned function value and store it to the buffer so if your functions returns pointers then your scripts can run even faster.


//! Call the function and return its return
virtual void* InCallRet()
{
Treturn t = (*m_pObj.*m_pFun)(*_arg1);

return m_pPool->PushBack((void*)&t, m_iSize);
} // InCallRet



I'm sorry, but this is a stack: you temporarily store data which is pushed and popped in a last-in, first-out manner.

May I ask what books/articles you have read about the subject?

assainator

Yes you are right.
It is a stack :D

I used this article for creating the script:
http://www.flipcode.com/archives/Implementing_A_Scripting_Engine-Part_1_Overview.shtml

Quote:
Original post by mind in a box
Hey, I just downloaded and compiled your package. I thought I inform you about the error you get because of the missing "unistd.h" on windows.
It gets #included in lex.cpp, line 26.
Commenting that line out seems to have fixed the problem.


That is not always a guaranteed fix. You just stuff it away to see if it works; unistd.h is the header for accessing the POSIX API.

This has been discussed on Stack Overflow as well:

http://stackoverflow.com/questions/341817/is-there-a-replacement-for-unistd-h-for-windows-visual-c

Quote:
Original post by mind in a box
Hey, I just downloaded and compiled your package. I thought I inform you about the error you get because of the missing "unistd.h" on windows.
It gets #included in lex.cpp, line 26.
Commenting that line out seems to have fixed the problem.


I added an empty unistd.h file to the folder to solve this problem.
lex.cpp is generated by flex, and you don't need the unistd.h file under Windows.
You can comment the include out.

If you have any questions, I will be happy to help you.

Cyrus Script becomes very slow when you do math calculations in a script. I think something disables the CPU cache or another CPU feature while it evaluates math commands, and it becomes much slower than C++.

I have started researching OpenCL, to attach it to Cyrus Script and boost the performance of math calculations.

Do you think it's a good idea to add OpenCL to Cyrus Script?

Well, here you run into multiple problems.
One: You need an ATI HD 5xxx GPU or better to be capable of running OpenCL. And as there are a lot of users that have a 4xxx or 3xxx card, this might be a problem.

Two: Calling OpenCL adds overhead. At a certain moment you call a function to start an OpenCL function. The OpenCL driver needs to do some work, then it needs to send the parameters to the GPU where the OpenCL program starts, then the result is returned, some more work is done, and THEN you have your result. I think the overhead is too large to be of use; maybe you would gain some performance if you have a vector (3/4 components, single/double precision) type.

I might be wrong though. I suggest you run some benchmarks: execute 5,000,000 calculations on single and double precision floating point numbers, measure the time it takes on the CPU and on the GPU, and then compare. Make these single calculations, as this is what is probably used most. You get something like:


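float flt_num = 324.234f;
float temp = 0.0f;

// Plain C++ multiplication on the CPU
for(unsigned int i = 0; i < 5000000; i++)
{
    temp = i * flt_num;
}

// Same multiplication through OpenCL; call_opencl_multiply_2f is a placeholder
for(unsigned int i = 0; i < 5000000; i++)
{
    temp = call_opencl_multiply_2f((float)i, flt_num);
}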


Three: This means diverting resources from the rendering engine to the scripting engine. The effect might be so small that you won't even notice a difference, but it could also slow down rendering by a large percentage.

The two main problems are support and performance.
If you don't care about supporting older or low-end cards, you only need to run some benchmarks and then decide whether you want to use OpenCL or not.

A side note: if mathematics proves to have such a large performance impact, try to find the bottleneck and fix that first. It is better to remove or shrink the bottleneck than to add another huge part to your scripting engine.
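
The CPU half of the benchmark suggested above could be timed roughly as follows. This is a sketch under assumptions: TimeMultiplyLoopMs is a hypothetical helper name, std::chrono requires C++11, and the volatile variable is there because an optimizing compiler may otherwise remove a loop whose result is never used, which would make the C++ timing meaninglessly small.

```cpp
#include <cassert>
#include <chrono>

// Runs `iterations` float multiplications and returns the elapsed time in
// milliseconds. Writing to a volatile variable forces the compiler to keep
// every multiplication instead of discarding the loop as dead code.
double TimeMultiplyLoopMs(unsigned int iterations, float factor) {
    volatile float sink = 0.0f;

    const auto start = std::chrono::steady_clock::now();
    for (unsigned int i = 0; i < iterations; ++i) {
        sink = static_cast<float>(i) * factor;
    }
    const auto end = std::chrono::steady_clock::now();

    assert(end >= start);  // steady_clock never goes backwards
    (void)sink;
    return std::chrono::duration<double, std::milli>(end - start).count();
}
```

For example, TimeMultiplyLoopMs(5000000, 324.234f) gives the CPU timing for the loop above; the OpenCL and script variants would need the same treatment and a release build to be comparable.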

assainator

Quote:
Original post by assainator
One: You need a ATI HD 5xxx gpu or better to be capable of running opencl. And as there are a lot of users that have a 4xxx or 3xxx card, this might be a problem.
Please check your facts next time. OpenCL runs just fine on all 4xxx series ATI cards, all NVidia cards from the 8xxx series onwards, *and* on any x86/x64 CPU.

Quote:
Original post by assainator
Well here you run into multiple problems.

Thanks for your feedback

One: OpenCL can run on any x86/x64 CPU that supports SSE3, and CPUs have supported SSE3 since 2005.
See this link for more info and a benchmark:
http://www.streamcomputing.eu/blog/2010-12-08/opencl-on-the-cpu-avx-and-sse

Two and Three: I have only researched OpenCL for one day, but I think that when you run OpenCL on the CPU you can use the CL_MEM_USE_HOST_PTR flag to tell OpenCL to use your own array buffer when running your code, so there is no overhead for sending data from RAM to GPU RAM.

Mathematics is a problem for any scripting language.

If I run the code below in both C++ and Cyrus Script,
the script is 40 times slower.

float flt_num = 324.234f;

for(unsigned int i = 0; i < 5000000; i++)
{
    float temp = i * flt_num;
}


But when I use this code (note the s.Print(); call) the script is only 15% slower than C++. So I think calling a function in the loop does something to the CPU that reduces performance; maybe it disables the CPU cache or something else.


float flt_num = 324.234f;
string s = "Hello";
for(unsigned int i = 0; i < 5000000; i++)
{
    float temp = i * flt_num;
    s.Print();
}


So I think I can improve the performance of math calculations with OpenCL.

Quote:
Original post by assainator
Info was taken from:
http://www.amd.com/us/products/desktop/graphics/ati-radeon-hd-4000/hd-4350/Pages/ati-radeon-hd-4300-specifications.aspx
Those pages don't make any mention of OpenCL, because they were written *before* AMD/ATI supported OpenCL.

@swiftcoder: I thought I could trust AMD's pages as they produce the cards. I'm sorry that I posted wrong information.

@swiftcoder & Kochol: Sorry, the articles I read only mentioned the GPU and not the CPU with the SSE3 and/or AVX instruction set(s). I'm going to read up on this tonight.

@Kochol: It's kind of strange that the code runs faster when you ADD code (s.Print())...

You could add specific type handling for ints and floats, so that when the engine finds an instruction that operates on ints or floats, it doesn't call a function as you are doing now but performs the calculation right there. This might improve the execution speed as there is less calling. If it is indeed the cache that is creating the problem, this (possible) solution might shrink it.
You can also try setting the optimization mode in Visual Studio to minimize size (Properties->Configuration Properties->C++->Optimizations->Optimization, choose 'Minimize Size').

One small side question: Have you also tested the speed in release mode?

I hope this post is more helpful than my previous one.


assainator


EDIT:
One other question that popped up after I posted.
Why do you want to use OpenCL for calculating on the CPU? Isn't it faster to code this yourself, as OpenCL means more function calls before the actual calculation starts?
If you want to use SSE (2/3/s3/4/4.1), just google 'C++ sse tutorial' and you'll find a lot of tutorials and references to use.

assainator

I tested the code below as a benchmark:


for (int j = 0; j < 1000; j++)
{
for (int i = 0; i < 1000; i++)
{
c[i] = a[i] * b[i] + a[i] * b[i] + a[i] * b[i] + a[i] * b[i] + a[i] * b[i];
c[i] = a[i] * b[i] + a[i] * b[i] + a[i] * b[i] + a[i] * b[i] + a[i] * b[i];
c[i] = a[i] * b[i] + a[i] * b[i] + a[i] * b[i] + a[i] * b[i] + a[i] * b[i];
c[i] = a[i] * b[i] + a[i] * b[i] + a[i] * b[i] + a[i] * b[i] + a[i] * b[i];
c[i] = a[i] * b[i] + a[i] * b[i] + a[i] * b[i] + a[i] * b[i] + a[i] * b[i];
c[i] = a[i] * b[i] + a[i] * b[i] + a[i] * b[i] + a[i] * b[i] + a[i] * b[i];
c[i] = a[i] * b[i] + a[i] * b[i] + a[i] * b[i] + a[i] * b[i] + a[i] * b[i];
c[i] = a[i] * b[i] + a[i] * b[i] + a[i] * b[i] + a[i] * b[i] + a[i] * b[i];
c[i] = a[i] * b[i] + a[i] * b[i] + a[i] * b[i] + a[i] * b[i] + a[i] * b[i];
c[i] = a[i] * b[i] + a[i] * b[i] + a[i] * b[i] + a[i] * b[i] + a[i] * b[i];
c[i] = a[i] * b[i] + a[i] * b[i] + a[i] * b[i] + a[i] * b[i] + a[i] * b[i];
c[i] = a[i] * b[i] + a[i] * b[i] + a[i] * b[i] + a[i] * b[i] + a[i] * b[i];
c[i] = a[i] * b[i] + a[i] * b[i] + a[i] * b[i] + a[i] * b[i] + a[i] * b[i];
c[i] = a[i] * b[i] + a[i] * b[i] + a[i] * b[i] + a[i] * b[i] + a[i] * b[i];
c[i] = a[i] * b[i] + a[i] * b[i] + a[i] * b[i] + a[i] * b[i] + a[i] * b[i];
c[i] = a[i] * b[i] + a[i] * b[i] + a[i] * b[i] + a[i] * b[i] + a[i] * b[i];
c[i] = a[i] * b[i] + a[i] * b[i] + a[i] * b[i] + a[i] * b[i] + a[i] * b[i];
c[i] = a[i] * b[i] + a[i] * b[i] + a[i] * b[i] + a[i] * b[i] + a[i] * b[i];
c[i] = a[i] * b[i] + a[i] * b[i] + a[i] * b[i] + a[i] * b[i] + a[i] * b[i];
c[i] = a[i] * b[i] + a[i] * b[i] + a[i] * b[i] + a[i] * b[i] + a[i] * b[i];
c[i] = a[i] * b[i] + a[i] * b[i] + a[i] * b[i] + a[i] * b[i] + a[i] * b[i];
c[i] = a[i] * b[i] + a[i] * b[i] + a[i] * b[i] + a[i] * b[i] + a[i] * b[i];
c[i] = a[i] * b[i] + a[i] * b[i] + a[i] * b[i] + a[i] * b[i] + a[i] * b[i];
c[i] = a[i] * b[i] + a[i] * b[i] + a[i] * b[i] + a[i] * b[i] + a[i] * b[i];
c[i] = a[i] * b[i] + a[i] * b[i] + a[i] * b[i] + a[i] * b[i] + a[i] * b[i];
c[i] = a[i] * b[i] + a[i] * b[i] + a[i] * b[i] + a[i] * b[i] + a[i] * b[i];
c[i] = a[i] * b[i] + a[i] * b[i] + a[i] * b[i] + a[i] * b[i] + a[i] * b[i];
c[i] = a[i] * b[i] + a[i] * b[i] + a[i] * b[i] + a[i] * b[i] + a[i] * b[i];
c[i] = a[i] * b[i] + a[i] * b[i] + a[i] * b[i] + a[i] * b[i] + a[i] * b[i];
c[i] = a[i] * b[i] + a[i] * b[i] + a[i] * b[i] + a[i] * b[i] + a[i] * b[i];
c[i] = a[i] * b[i] + a[i] * b[i] + a[i] * b[i] + a[i] * b[i] + a[i] * b[i];
}
}


Here is the result

time with OpenCL = 119 ms
time with c++ = 329 ms
time with Cyrus script = 84748 ms

With simpler code C++ was 23 times faster than OpenCL, but with more complicated code C++ becomes much slower while OpenCL does not change much.
For example, the C++ time goes from 5 ms to 329 ms, but the OpenCL time only goes from 115 ms to 119 ms.

Well, it could be that the OpenCL compiler does more optimizations than the C++ one.
Essentially you are making the same calculation 20~25 times (I didn't count them).
And within these calculations you repeat the same sub-calculation, so the compiler could boil this down to:


for(int j = 0; j < 1000; j++)
{
for(int i = 0; i < 1000; i++)
{
c[i] = (a[i] * b[i]) * 5;
}
}


It can even optimize it further to the following but I don't think that it will happen:


for(int i = 0; i < 1000; i++)
{
c[i] = (a[i] * b[i]) * 5;
}


You should try to find a way in which you never do the same calculation twice. You could try:

for(int j = 0; j < 1000; j++)
{
    for(int i = 1; i < 1000; i++) // start at 1: a[i] / i would divide by zero at i == 0
    {
        c[i] = i - (j + 500) / ((a[i] / i) * b[i]) + (j - i) * 2 - ((i * 3) / 4);
    }
}

This would also give you a more complete benchmark, as divisions are more difficult for a CPU/GPU than additions.

I'm not saying that OpenCL can't be faster; I'm just trying to point out that I see some problems in your way of doing the benchmark.
The thing is, OpenCL uses SSE3, which allows it to do multiple calculations at once: up to 4 calculations at once. Therefore, OpenCL is faster for large calculations in which you basically do the same thing repeatedly. But in complex calculations where you can barely do the same arithmetic at once, SSE3 can't be used that much anymore.

And (yet again) another question. Do you plan on running whole scripts in OpenCL or only the calculations? If you only want to do calculations in OpenCL you should try something like this:
unsigned int openclstart = GetTime(); // fill in your method of getting the time here
for(unsigned int i = 0; i < 1000; i++)
{
    // do addition
    c[i] = call_opencl_add_func(a[i], b[i]);

    // do multiply
    c[i] = call_opencl_mul_func(a[i], b[i]);

    // do divide
    c[i] = call_opencl_div_func(a[i], b[i]);

    // do subtract
    c[i] = call_opencl_sub_func(a[i], b[i]);
}

unsigned int openclend, cppstart;
openclend = cppstart = GetTime(); // again

for(unsigned int i = 0; i < 1000; i++)
{
    c[i] = a[i] + b[i];
    c[i] = a[i] * b[i];
    c[i] = a[i] / b[i];
    c[i] = a[i] - b[i];
}

unsigned int cppend = GetTime();

// Same for Cyrus Script here

unsigned int opencl_time = openclend - openclstart;
unsigned int cpp_time = cppend - cppstart;
// same for Cyrus Script

This is because chances are small that you will ever do calculations on whole buffers in Cyrus Script.

I hope this helped.

assainator

[Edited by - assainator on January 4, 2011 1:03:56 AM]

Quote:
Original post by assainator
@kochel: It's kinda strange that when you ADD code (s.print()) that the code will run faster...

It does not run faster. Actually, C++ becomes much slower, and the speed ratio drops: C++ goes from 20 times faster than the script to only 0.15x (15%) faster.

Quote:

You could add specific type handling for ints and floats. That when it finds a instruction that operates on ints and floats, it won't call the function as you are doing now, but it will perform the calculation there. This might improve the execution speed as there is less calling.

I don't get you. Can you please explain more how I can perform the calculation there?

Quote:
If it is indeed the cache that is creating the problem, this (possible) solution might shrink the problem.
You can also try to set the optimization mode in Visual Studio to minimize size (Properties->Configuration Properties->C++->Optimizations->Optimization Choose 'Minimize Size')

I'm not sure if caching is the problem. The only thing I know is that every scripting language is slow at calculations. I tested Cyrus Script against another scripting language and C++.
Cyrus Script was 40 times slower than C++ in calculations and 6 times faster than the other scripting language. This is not only a Cyrus Script problem, but I want to find a way to solve it or at least improve the speed.

Quote:

One small side question: Have you also tested the speed in release mode?

No not yet :D

Quote:

I hope this post is more helpful then my previous one.


asssaintor

Thank you very much. Your posts are very helpful to me and they have helped me a lot.

Quote:
EDIT:
One other question that popped up after I posted.
Why do you want to use opencl for calculating on the cpu? Isn't it faster to code this yourself as this means more function calling before the actual calculation starts?
If you want to use sse(2/3/s3/4/4.1), just google 'C++ sse tutorial' And you'll find a lot of tutorials and references to use.

assainator

I want to make an interface that lets Cyrus Script users use OpenCL in their scripts, so if they want to do some calculations in a script they have a faster way to do it.

Thank you for your posts one more time.

[Edited by - Kochol on January 4, 2011 3:40:13 PM]

Yes, you are right, my benchmark was wrong. I benchmarked OpenCL with your code 100000000 times: OpenCL takes 10 sec and C++ takes 2 sec to calculate it.
Maybe there is a way to speed it up in OpenCL that I haven't discovered yet.

But it is still much faster than the script :D

Thanks for your help

Quote:
Original post by Kochol
Quote:
Original post by assainator
@kochel: It's kinda strange that when you ADD code (s.print()) that the code will run faster...

It dose not run faster. Actually c++ becomes very slower and the speed ratio becomes 0.15x faster than script from 20 times faster than before.

Sorry, then I misunderstood.

Quote:

Quote:

You could add specific type handling for ints and floats. That when it finds a instruction that operates on ints and floats, it won't call the function as you are doing now, but it will perform the calculation there. This might improve the execution speed as there is less calling.

I don't get you. Can you please explain more how I can perform the calculation there?

Well, create another commandType like 'ETC_CALCULATION', with a way to indicate the kind of calculation (add, sub, etc.) and two pointers to the parameters. When the command gets processed, just calculate the result from the parameters and store it in memory or in the pool.
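
A rough C++ sketch of such a calculation command is shown below. Only the name ETC_CALCULATION comes from the thread; CalcOp, CalcCommand, RunCalc and DemoCalc are hypothetical names, and the sketch handles floats only, whereas a real engine would also need an int variant and a way to store results in its pool.

```cpp
#include <cassert>

// Which arithmetic operation a calculation command performs.
enum CalcOp { CALC_ADD, CALC_SUB, CALC_MUL, CALC_DIV };

// One calculation command: an operation, two operand pointers and a
// pointer to where the result should be stored.
struct CalcCommand {
    CalcOp op;
    const float* lhs;
    const float* rhs;
    float* result;
};

// Performs the arithmetic directly in the interpreter loop, with no
// call through a function pointer per operation.
inline void RunCalc(const CalcCommand& cmd) {
    assert(cmd.lhs != nullptr && cmd.rhs != nullptr && cmd.result != nullptr);
    switch (cmd.op) {
        case CALC_ADD: *cmd.result = *cmd.lhs + *cmd.rhs; break;
        case CALC_SUB: *cmd.result = *cmd.lhs - *cmd.rhs; break;
        case CALC_MUL: *cmd.result = *cmd.lhs * *cmd.rhs; break;
        case CALC_DIV: *cmd.result = *cmd.lhs / *cmd.rhs; break;
    }
}

// Runs a tiny two-command program: tmp = a * b, then out = tmp - b.
float DemoCalc() {
    float a = 6.0f, b = 4.0f, tmp = 0.0f, out = 0.0f;
    const CalcCommand program[] = {
        {CALC_MUL, &a, &b, &tmp},
        {CALC_SUB, &tmp, &b, &out},
    };
    for (const CalcCommand& cmd : program) {
        RunCalc(cmd);
    }
    return out;
}
```

The switch replaces one indirect call per arithmetic operation with a branch that the compiler can inline, which is the saving the suggestion above is aiming for.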


Quote:

Quote:

One small side question: Have you also tested the speed in release mode?


No not yet :D

This could be THE reason Cyrus Script slows down so much: in debug mode the compiler or linker can add a lot of extra code to allow debugging. Without this code (thus in release mode) Cyrus Script could run a lot faster.

Quote:

Quote:

I hope this post is more helpful then my previous one.


asssaintor

Thank you very much your posts are very helpful to me and they helped me too much

Quote:
EDIT:
One other question that popped up after I posted.
Why do you want to use opencl for calculating on the cpu? Isn't it faster to code this yourself as this means more function calling before the actual calculation starts?
If you want to use sse(2/3/s3/4/4.1), just google 'C++ sse tutorial' And you'll find a lot of tutorials and references to use.

assainator

I want to make an interface to let Cyrus script users use OpenCL in their scripts so if they want to do some calculations in script they have a faster way to do it.

Why do you want to allow users to access OpenCL? All the major calculations should happen inside your engine/framework/game. You could add scripting engine support for vectors and matrices to speed up these types of calculations.

Quote:
Yes you are right my benchmark was wrong I bench marked OpenCL with your code 100000000 times and OpenCL takes 10 sec and c++ takes 2 sec to calculate it.
Maybe there is a way to speed up it in OpenCL that I didn't discover it yet

The thing is, even though the data/parameters don't have to be sent to the GPU, there is still some overhead involved, as you need quite a lot of (OpenCL internal) function calls to set the parameters, run the OpenCL kernel and then get the results back.

A good approach is to first get everything basic running in your scripting engine; once this is done, you can start optimizing/improving it.
By everything basic I mean things like functions and conditional jumps. If you start optimizing now, there is a chance you will have a super-optimized PIECE of your scripting engine, but nothing whole.

Quote:


I want to thank you for your posts one more time

Thanks!

assainator
