Finalspace

Members
  • Content count
    318

Community Reputation
  1135 Excellent

About Finalspace
  • Rank
    Member

Personal Information
  • Location
    Germany
  • Interests
    |programmer|
  1. C++

     You guessed right -> I renamed AddWork to AddTask, but forgot to update the implementation. Debugging threaded code is usually hard, but in this case it was easy: the main thread was waiting forever for the tasks to be finished, but even when all tasks were finished (pendingCount == 0) it never got signaled properly. I tried a lot, like always decrementing pendingCount and signaling unconditionally (the waiting is in a loop anyway), but that did not work either. Now I use a spinlock instead, and this works perfectly:

```cpp
void ThreadPool::WaitUntilDone() {
    queueSignal.notify_all();
    while (pendingCount > 0) {
        std::this_thread::sleep_for(std::chrono::microseconds(100));
    }
}

void ThreadPool::WorkerThreadProc() {
    ThreadPoolTask task;
    while (!stopped) {
        {
            std::unique_lock<std::mutex> lock(queueMutex);
            while (queue.empty()) {
                queueSignal.wait(lock);
            }
            task = queue.front();
            queue.pop_front();
        }
        task.func(task.startIndex, task.endIndex, task.deltaTime);
        --pendingCount;
    }
}
```
  2. I have written a math-heavy simulation and introduced parallel computing across the available CPU cores, based on a simple thread pool written in C++11. At the very start it seemed to work, but then it reproducibly starts to freeze after some frames - and sometimes it works just fine. I have absolutely no clue why it won't work... Can someone look into it? It's not that much code. I have been looking at the code for hours and cannot figure out what is wrong... Here is the code:

```cpp
#include <thread>
#include <mutex>
#include <condition_variable>
#include <vector>
#include <deque>
#include <atomic>
#include <functional>
#include <algorithm>
#include <cstdint>

struct ThreadPoolTask {
    size_t startIndex;
    size_t endIndex;
    float deltaTime;
    uint8_t padding0[4];
    std::function<void(const size_t, const size_t, const float)> func;
};

struct ThreadPool {
    std::atomic<bool> stopped;
    std::vector<std::thread> threads;
    std::atomic<int> pendingCount;
    std::deque<ThreadPoolTask> queue;
    std::mutex queueMutex;
    std::mutex completeMutex;
    std::condition_variable queueSignal;
    std::condition_variable completeSignal;

    ThreadPool(const size_t threadCount = std::thread::hardware_concurrency());
    ~ThreadPool();
    void AddTask(const ThreadPoolTask &task);
    void WaitUntilDone();
    void WorkerThreadProc();
    void CreateTasks(const size_t itemCount, const std::function<void(const size_t, const size_t, const float)> &function, const float deltaTime);
};

ThreadPool::ThreadPool(const size_t threadCount) :
    stopped(false),
    pendingCount(0) {
    for (size_t workerIndex = 0; workerIndex < threadCount; ++workerIndex) {
        threads.push_back(std::thread(&ThreadPool::WorkerThreadProc, this));
    }
}

ThreadPool::~ThreadPool() {
    stopped = true;
    queueSignal.notify_all();
    for (size_t workerIndex = 0; workerIndex < threads.size(); ++workerIndex)
        threads[workerIndex].join();
}

void ThreadPool::AddWork(const ThreadPoolTask &entry) {
    {
        std::unique_lock<std::mutex> lock(queueMutex);
        queue.push_back(entry);
    }
    pendingCount++;
}

void ThreadPool::WaitUntilDone() {
    queueSignal.notify_all();
    std::unique_lock<std::mutex> lock(completeMutex);
    while (pendingCount > 0) {
        completeSignal.wait(lock);
    }
}

void ThreadPool::WorkerThreadProc() {
    ThreadPoolTask group;
    while (!stopped) {
        {
            std::unique_lock<std::mutex> lock(queueMutex);
            while (queue.empty()) {
                queueSignal.wait(lock);
            }
            group = queue.front();
            queue.pop_front();
        }
        group.func(group.startIndex, group.endIndex, group.deltaTime);
        if (--pendingCount == 0) {
            completeSignal.notify_one();
        }
    }
}

void ThreadPool::CreateTasks(const size_t itemCount, const std::function<void(const size_t, const size_t, const float)> &function, const float deltaTime) {
    if (itemCount > 0) {
        const size_t itemsPerTask = std::max((size_t)1, itemCount / threads.size());
        ThreadPoolTask task = {};
        task.func = function;
        task.deltaTime = deltaTime;
        for (size_t itemIndex = 0; itemIndex < itemCount; itemIndex += itemsPerTask) {
            task.startIndex = itemIndex;
            task.endIndex = std::min(itemIndex + itemsPerTask - 1, itemCount - 1);
            AddTask(task);
        }
    }
}

// (excerpted) usage from the simulation:
void main() {
    workerPool.CreateTasks(itemCount, [=](const size_t startIndex, const size_t endIndex, const float deltaTime) {
        this->DoingSomeMathyWork(startIndex, endIndex, deltaTime);
    }, deltaTime);
    workerPool.WaitUntilDone();
}
```
  3. Well, if you want to simulate stars and such, pulling and pushing each other, there is a well-known technique for that which was originally designed for astrophysics: https://en.wikipedia.org/wiki/Smoothed-particle_hydrodynamics There are dozens of papers on how to implement it, but the core idea is always the same:
     - Each particle uses its surrounding neighbor particles to compute a density from their accumulated weighted distances, using a distance approximation (the smoothing kernel).
     - Based on that particle density you can compute a force to pull particles together or push them apart.
     To get reasonable performance for that N-body simulation, you may have to use a spatial grid to sort the particles into, so the neighbors can be determined much faster. You can also parallelize it to improve performance more drastically, but the best situation is when you can compute everything on the GPU.
  4. No settings, I was just building the target "Development Editor", so you can run the editor directly from the IDE. And I most likely won't use the UE4 source again, except for looking up some things; I was just trying out the NVIDIA Flex thing. Unity builds look like exactly what I am talking about: putting multiple source files into bigger translation units and compiling those instead. Next time I am working on a bigger project, I will definitely look into this. Oh my gosh, one day for a fully optimized build? Insanity. But I am not a professional game developer, so maybe that's normal these days?
  5. Yes, it's true that it may only be usable for small to medium-sized projects, because it only works with full recompiles. Also, while developing your software architecture you may end up recompiling everything anyway - especially when you don't know the full architecture yet and fiddle with the entire code base. I agree that proven, stable code which does not change a lot can live in its own translation units. But I still think that a full compilation taking half an hour on a modern computer is unacceptable - even for large projects like Unreal Engine 4.
  6. Yes, it is true that the language does not force you into a particular organization scheme, but 99% of all C++ applications are composed of thousands of small .cpp files which all get compiled into separate translation units. This process is much, much slower than compiling just a couple of translation units. Compiling one giant translation unit that includes tons of other .cpp files directly is much faster than compiling each file separately. It's the same as when you upload thousands of small image files to your web storage - painfully slow, even when you upload 3-4 images at once - while uploading a single compressed archive containing all the image files is a lot quicker. The only reason to avoid large translation units would be some size limitation of the compiler itself, but I am not sure about this. I am pretty confident that you can build applications much, much faster when you have just one translation unit per library/executable.
     - Guard all .cpp files with an ifndef block like this:

```cpp
#include "physics.h"

#ifndef PHYSICS_IMPLEMENTATION
#define PHYSICS_IMPLEMENTATION

// ...

#endif // PHYSICS_IMPLEMENTATION
```

     - In the main translation unit for the executable or library:

```cpp
// All source files are included directly in this translation unit, once and only once.
// The order is important: if physics, for example, uses rendering, you have to include rendering first.
// If rendering in turn requires physics, you have to add another layer between rendering and physics.
#include "rendering.cpp"
#include "physics.cpp"
#include "audio.cpp"

// STB TrueType does not include the implementation automatically; you have to set
// this constant before including the header file.
#define STB_TRUETYPE_IMPLEMENTATION
#include "stb_truetype.h"

#include "assets.cpp"
// ...
```

     - Set up your IDE/editor so that it compiles the main translation unit only. In Visual Studio you change the item type of every other .cpp file to "C/C++ header".
     That is all you may need to get compilation done much faster. Try it out. The only downside of this method: you have to keep the order, and you must not include .cpp files in other .cpp files directly - except in the main translation unit. And yes, making heavy use of templates also increases compile time drastically. That's the reason why I use them very rarely - mostly for containers like pools and hash tables, or to replace nasty macros.
  7. I hear a lot about Qt Creator, especially on Linux. For me it was crashing all the time while I was trying to port Fluid Sandbox over to Linux, but maybe it runs fine on your side? But to solve your actual problem: please use git - it's great for source code files. Nowadays it is highly important to use some sort of version control system, even for private projects! Just because you never know when your IDE or operating system will crash. Also, manual or automatic backups are much slower and require more space than git, because git only stores the deltas. To get git working, you just need the git command line - that's it. It's not that hard, and it will help you prevent data loss in the future.
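     A minimal sketch of that command-line workflow, assuming git is installed; the directory and file names are placeholders, not from the post:

```shell
# Start tracking a project from scratch -- just the git command line, nothing else.
mkdir demo-project && cd demo-project
git init -q

# Identity is required for commits; skip these two if already set globally.
git config user.email "you@example.com"
git config user.name "You"

# Add a file and record the first snapshot.
echo 'int main() { return 0; }' > main.cpp
git add main.cpp
git commit -q -m "Initial commit"

# From here on, "git add" + "git commit" after every working change
# gives you a restorable history even if the IDE or OS crashes.
git log --oneline
```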
  8. I am talking about compiling the entire engine of course, which is required when you want to use NVIDIA Flex or other NVIDIA stuff in it. Sure, it's a complicated thing, but any application which requires more than ~3-5 minutes for a full compile is a no-go. Complexity is no excuse for this. The only thing I would accept as a reason for increased compile times is some kind of asset preprocessing going on, but when I look at this compilation output - it's just .cpp everywhere.
  9. Hmm, that's weird - I had compiled it on my dev rig too, but it was just 50% faster. Also, the 28.72 seconds are wrong... it's minutes!
     34>Total build time: 28,72 seconds (Local executor: 27,69 seconds)
  10. I cannot edit the initial post, so here is the actual question: why is it so slow? My answer:
      - There are just too many C++ files (11213 .cpp and 36520 .h files in the Unreal Engine 4 Flex edition), resulting in too many translation units.
      - I am not even sure whether the .obj file caching works; I see a lot of files being compiled multiple times...
      - The Visual Studio IDE slows down the compiler.
      - The media center I compiled it on is very slow (i5, source was stored on a non-SSD drive).
  11. Today I created a new thread and wanted to change some text in it - but I cannot edit it, even though I am logged in... Is that intended? *Edit: It seems to work for this thread, but not for my other one... weird.
  12. This is insane! I have been compiling Unreal Engine 4 in Visual Studio 2017 (NVIDIA Flex for 4.16) for an hour now and it's still not done. Sure, it's not the fastest computer in the world (i5 2500K running at 3.4 GHz with 4 cores, 16 GB RAM, GTX 1060 6 GB). I don't understand why modern applications are not built in a way where you compile one translation unit per library/application and that's it - just include the .cpp files directly and use an "implementation" scheme. As I can see from the compilation output, it is that slow even with parallel compiling and include file caching -.- Seriously, C++ is a great language, but the way C++ sources are composed (header and source splitting) is totally broken in every way. Look at the compile output. It's absolutely nuts, including the fact that this output is larger than pastebin allows -.- http://root.xenorate.com/final/ue4_16_flex_first_compile_insane.txt Done:
      48>Total build time: 52,91 seconds (Local executor: 51,45 seconds)
      Insane... nothing more to say. Just to see, I will compile it on my i7 rig (4 GHz, 8 cores, 16 GB RAM, GTX 970) too.