
CulDeVu

  1. CulDeVu

    Learning Global Illumination

    I've been slowly getting my feet wet with global illumination and other, non-realtime rendering methods for a change.
  2. You can find the original post here. This version has been formatted to fit your scr-- I mean, formatted for GameDev.net.

In 2009, engineers from the University of Wisconsin released a piece of software called PEAT. It has since been used in medicine as a tool to diagnose patients with photosensitive epilepsy. This is a disease where strobing lights cause seizures, and it currently has no cure. Things that trigger it include lightning, flashing sirens, flickering lights, and other similar strobe effects. Other triggers that seem unrelated, but are also very cyclic in nature, include rapidly changing images and even certain stationary stripe and checkerboard patterns. This particular form of epilepsy affects roughly 1 in every 10,000 people, around 3.5 million people worldwide.

Despite this number, however, sufferers have to take precautions on their own. Nearly every movie has lightning in it, or camera flashes, and videos online are often a lot worse. With the exception of UK television broadcasts (which are regulated), there are no real preventative measures to combat epileptic events.

The next little mini-series of posts talks about my development of a sort of real-time epilepsy protector: a program that watches the current output of a computer monitor, detects when epileptic events may be possible, and blocks the offending region of screen from further flashing. During development, all suggestions are very welcome, as I'm probably going to go about everything in the exact wrong way.

Without further ado, let's jump right in.

W3C definition

The W3C spec for Web Accessibility is the spec that PEAT is implemented from. It defines two ways that an epileptic event can occur: general flashes and red flashes. For the rest of this post I'll only be dealing with general flashes, as I want to get the general case out of the way first.

So, the actual spec is kinda confusing. It mixes units, defines thresholds for passing instead of failing, and tries to mix approximations and measurements. I'm going to take creative liberties to fix these issues and add new terms for clarity, without losing the actual meaning of the spec.

Taking it all apart and extracting the actual meaning, an epileptic event can be triggered when:

- There are more than 3 "general flashes" in a 1 second interval.

A general flash is defined as:

- 25% of the pixels in a 0.024 steradian area on the screen, relative to the viewer, all experience the same "frame flash" at the same time.

A frame flash is defined per pixel as what happens when:

- There is a decrease followed by an increase, or an increase followed by a decrease, in the histogram of luminance for a single pixel.
- Each increase and decrease satisfies the following:
  - The minimum luminance in the histogram over the time-span of the increase or decrease is less than 0.8.
  - The amount of increase or decrease of luminance in the histogram is at least 10% of the maximum luminance in the "range" of the increase or decrease.

So an epileptic event can be triggered when many pixels agree that a flash happened at the same time. The W3C spec says "concurrently," so I'm taking that to mean "the same frame," even though differing by 1 frame probably isn't discernible to the eye.

So you could create a function that takes only a histogram of luminance for a single pixel and get out the frame flashes for that pixel.
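A quick aside on where those histograms come from: the spec works in terms of relative luminance, so each sampled pixel keeps a rolling one-second history of its luminance. Here's a minimal sketch of that bookkeeping, assuming 8-bit sRGB input, a 60 Hz capture rate, and the standard WCAG relative-luminance formula. The histogram names match the detection code further down; the rest of the names are mine, not Nopelepsy's.

#include <cmath>

const int NUM_SAMPLES = 60;   // one second of history, assuming 60 Hz capture

// W3C/WCAG relative luminance of an 8-bit sRGB sample
float relativeLuminance(unsigned char r8, unsigned char g8, unsigned char b8)
{
    auto linearize = [](unsigned char c8) {
        float c = c8 / 255.0f;
        return (c <= 0.03928f) ? c / 12.92f : std::pow((c + 0.055f) / 1.055f, 2.4f);
    };
    return 0.2126f * linearize(r8) + 0.7152f * linearize(g8) + 0.0722f * linearize(b8);
}

// per-pixel rolling histogram of luminance over the past second
struct LuminanceHistory
{
    float lumHistogram[NUM_SAMPLES] = {};
    int sampleCounter = 0;

    void push(unsigned char r, unsigned char g, unsigned char b)
    {
        lumHistogram[sampleCounter] = relativeLuminance(r, g, b);
        sampleCounter = (sampleCounter + 1) % NUM_SAMPLES;
    }
};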
In fact, all we need to know about the frame flashes of each pixel is when they occur, so that we can determine how many happen concurrently. That is the only thing the program stores about them.

There are other details in the W3C spec that I'm ignoring:

- Very "fine" transitions, like 1px x 1px checkerboards alternating colors every frame, are fine, because the individual transitions are not discernible to the human eye. This is a complicated one to deal with, for many reasons. However, if you ignore it, you fall prey to detecting flashes in videos of light fog moving, where a lot of the transparency comes from stochastic opacity, or from grain filters. I might address this later.
- It gives some approximations of the areas on the screen you should be looking at, for a viewing distance of 22-26 inches from a 15-17 inch screen at a resolution of 1024x768. This is far too specific to try to generalize to any screen, so I'm going to be doing all the calculations on my own.

I'm also imposing my own restrictions on the definitions above. I have no idea if they're correct, but they make the algorithms more approachable (or rather, they make the problem less ambiguous):

- The time-spans of the increases and decreases of each frame flash cannot overlap between frame flashes in a single pixel.
- A general flash is counted if 25% or more of a 0.024 steradian viewing area experiences a common frame flash. The spec didn't specify, but this is the only thing that makes sense.
- I'm only going to be testing fixed areas of 0.024 steradians, instead of trying to find every area that satisfies the conditions in the spec.

So, just as a review of everything, the general pipeline for figuring out if an epileptic event is possible in the past second:

- (Sample phase) Create a histogram of luminance over the past second for every sampled pixel on the screen.
- (Frame Flash phase) For every pixel: create a list of all frame flashes by looking at the peaks of its histogram.
- (Gather phase) For each frame: try to find a 0.024 steradian area where 25% or more pixels experienced a frame flash on that frame. If such an area is found, increment the general flash counter by 1.
- If the general flash counter is greater than 3, an epileptic event is possible.

Since epileptic events are possible when the general flash counter is greater than 3, protective measures are going to be triggered when the counter is equal to 3. Also, I don't necessarily just want to know if an epileptic event is possible; what I want to do is block it out.

In order to break everything up into multiple parts, I'll tackle each stage of the algorithm in order of what needs clarification the most. I'll cover the Frame Flash Detection phase here, and the Sample and Gather phases in another post.

Frame Flash Detection

So what we need to do is, given a histogram of luminance for a single pixel over the course of 1 second, find every frame that contains a frame flash. However, the big problem is that, most of the time, flashes happen over the course of several frames. So we need to find a way to not only find the flashes but also associate an entire flash with a single frame number. Recall the definition of a frame flash, re-written here:

- There is a decrease followed by an increase, or an increase followed by a decrease, in the histogram of luminance for a single pixel.
- Each increase and decrease satisfies the following:
  - The minimum luminance in the histogram over the time-span of the increase or decrease is less than 0.8.
  - The amount of increase or decrease of luminance in the histogram is at least 10% of the maximum luminance in the "range" of the increase or decrease.

This definition took me 2 or 3 different tries to implement correctly, and I may still be wrong. I blame how cryptic the W3C spec is about this.

So this is going to be implemented exactly how it's stated. We are going to keep a record of all of the increases and decreases that each satisfy the above constraints, and then match pairs of adjacent increases and decreases. For an example, consider a histogram of luminance:

[example histogram of luminance for a single pixel]

In order to find increases and decreases, we're going to look at every local extremum, compare it to the first sample, and see if the difference fulfills the above constraints. If it does, we record the result and use that sample as the one to compare future samples against. The code goes something like this:

int extrema[NUM_SAMPLES];
transition pairs[NUM_SAMPLES];
int numPairs = 0;

float lastExtrema = lumHistogram[(sampleCounter + 1) % NUM_SAMPLES];

for (int i = 1; i < NUM_SAMPLES - 1; ++i)
{
    int prevI = ...;
    int currI = ...;
    int postI = ...;

    float x_prev = lumHistogram[prevI];
    float x_curr = lumHistogram[currI];
    float x_post = lumHistogram[postI];

    float dx1 = x_curr - x_prev;
    float dx2 = x_curr - x_post;

    // test for local extrema
    if ((dx1 > 0) == (dx2 > 0) && (dx1 != 0) && (dx2 != 0))
    {
        float dx = x_curr - lastExtrema;

        // skip transitions that swing less than 10% luminance, or whose
        // darker end isn't below 0.8 luminance
        if (fabs(dx) < 0.1f || fminf(x_curr, lastExtrema) > 0.8f)
            continue;

        lastExtrema = x_curr;

        if (dx1 < 0) pairs[numPairs] = DECREASE;
        else         pairs[numPairs] = INCREASE;
        extrema[numPairs] = i;
        ++numPairs;
    }
}

The extrema[numPairs] = i; part is what stores the "frame" of each transition. When it comes time to look for adjacent increases and decreases, you can just pick one of the two frames and call that the exact frame of the flash, taking care of that particular problem.

Now that we have the frame flashes found, we need to store them in a way that makes it easy on the next stage, the Gather stage, to use the info. I'm going to cover that next time, but the general way I'm doing it is by making a giant array of distributions, with each distribution corresponding to a single Gather stage chunk. The relevant part of the array is passed to the function so that each sampled pixel can contribute to it. This makes the Gather stage really simple.

The nice thing about this method of finding frame flashes is that it doesn't matter if you traverse backwards or forwards through the histogram; as long as you're consistent, it'll work well.
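As a rough picture of that storage idea (this is just a sketch, not the actual Nopelepsy code, and numChunks is a made-up placeholder): each Gather chunk owns an array of per-frame flash counts, and the frame-flash pass bumps the count for whichever frame a flash landed on.

#include <vector>

const int numChunks = 64; // however many 0.024 sr regions the screen is split into (made-up number)

// one flash-count "distribution" per Gather-stage chunk:
// chunkFlashes[chunk][frame] = how many sampled pixels in that chunk
// saw a frame flash on that frame
std::vector<std::vector<int>> chunkFlashes(numChunks, std::vector<int>(NUM_SAMPLES, 0));

// the frame-flash pass gets handed its chunk's distribution and bumps it
void recordFrameFlash(std::vector<int>& chunkDistribution, int frame)
{
    chunkDistribution[frame] += 1;
}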
Conclusion

Next time I'll go over how the Gather and Sample stages work. Until then, I'll be continuously working on the project. There are still some to-dos, even with just the Frame Flash phase. Some of these include:

- Multithreading each sampled pixel, or maybe porting to the GPU. I don't want to port to CUDA because that technology isn't very universal.
- Analyzing the pixels adjacent to the current pixel, so as to maybe combat things like grain filters and clouds triggering the detector.
- More stability improvements. The program doesn't pick up on a lot of really obvious cases, especially if they happen in small areas, like epilepsy gifs on Google. I feel like if I could sample at a finer granularity, that problem would go away. This ties back into multithreading, because that's what would provide enough throughput to process the extra granularity.
- For the whole program: minimizing to the tray and some other polishing touches.

Hopefully these will get resolved in some way or another sometime soon. The current project will be hosted as soon as I'm confident that Nopelepsy v0.1.0 is okay to show the world. Thanks for reading, and stay tuned for next time!
  3. Despite how important a topic good importance sampling is in the area of global illumination, it's usually left out of all common-English conversations and tutorials about path tracing (with the exception of the Lambertian case, which is a good introduction to importance sampling). If you want to get into microfacet importance sampling, multiple importance sampling, or even, God forbid, Metropolis light transport, you're on your own. And probably for a good reason: at some point in a path tracing project, you WILL have to butt heads with the math. And believe me, the head that the math rears is ugly. Beautiful in design, but large and ugly all the same.

I'm going to try to correct that, at least for microfacet importance sampling, and present just enough math to get across what *needs* to be understood to implement everything correctly, with minimal pain.

Choosing the Microfacet Terms

The Cook-Torrance microfacet BRDF for specular reflections comes in the form

$$ f_r = \frac{D \, F \, G}{4 \, (n \cdot l)(n \cdot v)} $$

where F is the Fresnel term, D is the microfacet distribution term, and G is the microfacet shadowing term. This is great and all, as it allows you to basically stop and swap different terms in and out as much as you want. Realtime graphics use that property to simplify a BRDF into something pretty but efficient, by mixing and matching models with terms that cancel out.

However, there is danger in this. Choosing terms purely because they are less complex, and therefore more efficient, can lead to completely wrong results. As a case in point, let me show the difference between the implicit Smith G and the Beckmann Smith G. For reference, the implicit G is just $$ (n \cdot l)(n \cdot v) $$, which exists mostly to cancel the denominator above, while the Beckmann G1 is an approximation to the actual Beckmann shadowing term (its concrete form is in the sketch below). The alpha_b in it is a tunable parameter that controls the roughness of the material; I usually just set that term to the Cook-Torrance roughness value.

These are the results, after importance sampling for 500 samples:

[Top: implicit G, roughness 0.05, 500 samples of importance sampling. Bottom: Beckmann G1, roughness 0.05, 500 samples of importance sampling]

As you can see, the implicit G suffers greatly for its simplicity. While it calculates color noticeably faster, it does so by ignoring the cases that need extra work to represent, like really low roughness values. It approximates the mid-range fairly well, but in doing so butchers the tails. So take this example to heart: for a path tracer, don't be afraid to sacrifice a little speed for correct color, especially because people use path tracers precisely because they're willing to wait for good images.

So. What terms should be chosen? Well, my personal favorite $$ D $$ and $$ G $$ are the Beckmann ones, so this post is going to be based off of those. A more complete description of the *whys* of all of this can be found here: https://www.cs.cornell.edu/~srm/publications/EGSR07-btdf.pdf. But as titled, this is the "For Dummies" version, and re-stating everything found there would be a massive waste of time.

The Fresnel term is taken from Cook-Torrance, and the Distribution term is from Beckmann. The Geometry term is a little trickier: the exact Beckmann shadowing term involves erf(x), which can be very expensive to calculate, but there's a very good approximation of it. The only things left to clarify are the alpha_b term, which is just the roughness of the material, and the chi^+ function, which is 1 for positive arguments and 0 otherwise (it just enforces positivity). Concrete forms for all of these are spelled out in the sketch below.
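To make those terms concrete, here's a minimal sketch of how they might be evaluated. The D and G1 forms follow the Walter et al. paper linked above; I've swapped the full Cook-Torrance Fresnel for Schlick's approximation for brevity, and all of the function and parameter names here are mine rather than anything from the original post.

#include <cmath>
#include <algorithm>

const float PI = 3.14159265f;

// chi^+: 1 for positive arguments, 0 otherwise
float chiPlus(float x) { return x > 0.0f ? 1.0f : 0.0f; }

// Beckmann distribution D(m); cosThetaM = dot(n, m), alphaB = roughness
float beckmannD(float cosThetaM, float alphaB)
{
    if (chiPlus(cosThetaM) == 0.0f) return 0.0f;
    float c2   = cosThetaM * cosThetaM;
    float tan2 = (1.0f - c2) / c2;
    float a2   = alphaB * alphaB;
    return std::exp(-tan2 / a2) / (PI * a2 * c2 * c2);
}

// rational approximation to the Beckmann-Smith G1 term for one direction,
// where cosThetaV = dot(n, v)
float beckmannG1(float cosThetaV, float alphaB)
{
    if (cosThetaV <= 0.0f) return 0.0f;
    float tanThetaV = std::sqrt(1.0f - cosThetaV * cosThetaV) / cosThetaV;
    if (tanThetaV <= 0.0f) return 1.0f;
    float a = 1.0f / (alphaB * tanThetaV);
    if (a >= 1.6f) return 1.0f;
    return (3.535f * a + 2.181f * a * a) / (1.0f + 2.276f * a + 2.577f * a * a);
}

// Schlick's Fresnel approximation, standing in for the full Cook-Torrance Fresnel
float fresnelSchlick(float cosTheta, float f0)
{
    float m = 1.0f - cosTheta;
    return f0 + (1.0f - f0) * m * m * m * m * m;
}

// putting it together: D * F * G1(l) * G1(v) / (4 (n.l)(n.v))
float cookTorranceSpecular(float NdotL, float NdotV, float NdotH, float VdotH,
                           float alphaB, float f0)
{
    float D = beckmannD(NdotH, alphaB);
    float G = beckmannG1(NdotL, alphaB) * beckmannG1(NdotV, alphaB);
    float F = fresnelSchlick(VdotH, f0);
    return D * F * G / std::max(4.0f * NdotL * NdotV, 1e-6f);
}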
Importance Sampling

So, importance sampling the microfacet model shown above involves a lot of integrals. So many that I'll probably end up getting LaTeX-pinky from typing all the brackets, so I'm just going to give the short version.

Because it'd be damn near impossible to importance sample the entire microfacet model, with all of its terms together, the strategy is to importance sample the distribution term to get a microfacet normal, and then transform everything from being relative to that normal back to being relative to the surface. Given 2 uniform random variables, zeta1 and zeta2, generate a microfacet normal with the angles (relative to the surface normal):

$$ \theta_m = \arctan\left( \sqrt{ -\alpha_b^2 \ln(1 - \zeta_1) } \right), \qquad \phi_m = 2 \pi \zeta_2 $$

Note how the phi_m term can spin freely around the normal. This is the power that sampling only the D function has. It follows, then, that by reversing the age-old vector-reflectance equation, you get the outbound (or inbound, or whatever) light ray:

$$ \omega_o = 2 \left| \omega_i \cdot m \right| m - \omega_i $$

Now all that's left to do, with the exception of a few caveats that I will get to in a minute, is to define the probability density function for this setup:

$$ p(\omega_o) = \frac{ D(m) \, (m \cdot n) }{ 4 \left| \omega_o \cdot m \right| } $$

There! That's all that's needed! If all you want to do is render really shiny metals, at least. A sketch of the whole sampling routine is below.
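In code, that sampling step might look something like the following. This is a minimal sketch using glm and the beckmannD helper from the earlier sketch; the tangent-frame construction and the names are mine, and sign/orientation conventions will differ between renderers.

#include <cmath>
#include <algorithm>
#include <glm/glm.hpp>

// importance sample the Beckmann D term to get a microfacet normal;
// zeta1, zeta2 are uniform random numbers in [0, 1), n is the surface normal
glm::vec3 sampleBeckmannNormal(float zeta1, float zeta2, float alphaB, const glm::vec3& n)
{
    const float PI = 3.14159265f;
    float thetaM = std::atan(std::sqrt(-alphaB * alphaB * std::log(1.0f - zeta1)));
    float phiM   = 2.0f * PI * zeta2;

    // microfacet normal in the tangent frame around n
    glm::vec3 mLocal(std::sin(thetaM) * std::cos(phiM),
                     std::sin(thetaM) * std::sin(phiM),
                     std::cos(thetaM));

    // build an arbitrary tangent frame and rotate mLocal into it
    glm::vec3 up = std::fabs(n.z) < 0.999f ? glm::vec3(0, 0, 1) : glm::vec3(1, 0, 0);
    glm::vec3 t  = glm::normalize(glm::cross(up, n));
    glm::vec3 b  = glm::cross(n, t);
    return glm::normalize(mLocal.x * t + mLocal.y * b + mLocal.z * n);
}

// reflect the incoming direction wi about the sampled microfacet normal m,
// and report the pdf of the resulting outbound direction wo
glm::vec3 sampleSpecularBounce(const glm::vec3& wi, const glm::vec3& n,
                               float alphaB, float zeta1, float zeta2, float& pdf)
{
    glm::vec3 m  = sampleBeckmannNormal(zeta1, zeta2, alphaB, n);
    glm::vec3 wo = 2.0f * std::fabs(glm::dot(wi, m)) * m - wi;

    // p(wo) = D(m) (m . n) / (4 |wo . m|), with beckmannD from the earlier sketch
    float cosThetaM = glm::dot(n, m);
    pdf = beckmannD(cosThetaM, alphaB) * cosThetaM /
          std::max(4.0f * std::fabs(glm::dot(wo, m)), 1e-6f);
    return wo;
}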
However, there's some unfinished business we have to attend to concerning what to do with the un-reflected light, the amount that the Fresnel term takes away. First, though, let's look at some images.

Intermediate Results

[Above: Sponza atrium with a floor with Cook-Torrance roughness of 0.001, 500 samples per pixel]

What's impressive about the above image is that that isn't a mirror BRDF. It's the microfacet BRDF described above, importance sampled. This amount of image clarity, for a floor with a roughness that small and at only 500 samples, is something that would otherwise only have been achieved with a special case in the code for materials like that. With importance sampling, you get it all for free!

Extending it With Lambertian Diffuse

The Cook-Torrance BRDF is really nice and all, but it's only a specular BRDF, after all. We need to fuse it with a Lambertian BRDF to finish everything up nicely. The form the BRDF takes after the fuse is the microfacet specular term plus a (1 - F) weighted Lambertian term. This makes a lot of sense, too: the Fresnel term describes the ratio between reflected and transmitted light, and a diffuse reflection is just a really short-range refraction.

Now there's an issue, and it's a big one. The density function generated above describes the distribution of outbound rays based off of the D microfacet distribution. To convert between a microfacet distribution and an outbound ray distribution, it undergoes a change of variables via a Jacobian. That Jacobian is analytically derived from the microfacet BRDF definition itself, so to integrate the diffuse term into it (a term which has no microfacets but still changes the shape of the BRDF drastically), it'd have to be re-calculated.

And this is a recurring problem. There are papers describing every way to couple this with that. Some of them are about coupling matte diffuse with microfacet specular, some describe how to couple transmissive plus specular, some specular with specular. At the end of the day, these are all specialized cases of different material properties coexisting on a surface, and until that's taken care of, the problem of mixing and matching BRDFs will never go away. I don't know how other path tracers deal with this issue, but I won't stand for it. For my path tracer, I chose to go the way that Arnold does, by using the idea of infinitesimal layers of materials.

Each layer of the material can, by using the Fresnel function we already have, reflect an amount of light, and then transmit the rest down to the next layer. This happens for every layer down until it hits the final layer: either a diffuse or transmission layer, which uses the rest of the remaining energy. Thus, the problem of coupling BRDFs is solved by thinking about the problem a little differently. If you think about it, the above algorithm can be simplified using Russian roulette, and so the final algorithm becomes something like this:

Generate random number r1
for each layer in material:
    determine the new half-vector (or microfacet normal)
    f = Fresnel function using that half-vector
    if (r1 > f):
        r1 -= f
        continue
    calculate outbound ray
    generate BRDF for current layer and inbound and outbound rays
    generate pdf for outbound ray
    sample and accumulate as normal
    break

The only thing left to do is get rid of the Fresnel term from the microfacet BRDF and the (1 - F) from in front of the Lambertian term: it won't be of any more use to us right there, as that calculation is handled implicitly now. That function now serves as a Russian roulette mechanism to switch between layers, and is no longer needed to scale the energy of the microfacet BRDF, as the D and G terms are perfectly capable of doing exactly that on their own.

One more pic to show it in action:

[Above: Floor with layered specular and diffuse. 500 samples per pixel]

That concludes my write-up of Microfacet Importance Sampling: For Dummies edition. With luck, hopefully implementing it will be as enjoyable for you as it was for me. Happy rendering, and may your dividends be forever non-zero! :D
  4. So. Back at this. First things first: you can safely disregard most everything I said last post. All of that was very observation-based, and is actually incorrect in almost every way. Which might seem strange, cause most of it made sense. For clarity and completeness, though, here's a list of why everything last post was wrong:

- The idea that a fold is a collection of fold segments in order starts falling apart after you do certain folds, like petal folds.
- The idea that you can keep track of where they are and which way they're going would be way too hard, and you could conceivably connect everything under one fold, since all origami crease diagrams are Hamiltonian.
- The idea that folds only reflect is another screwy one. It falls apart when you have more than 4 fold segments touching a vertex.
- The fold tree was a nice idea, but constructions using it can only create a subset of all origami, so it's not what I'm looking for.

Notable things that I'm keeping un-debunked:

- Fold segments are straight lines that represent part of a fold. They come in 2 flavors: mountain and valley. The ends of the segments either end at an edge of the paper or at the end of another fold segment. These places of union are called vertexes, and the graph of all fold segments on the paper, including the edges of the paper, is called a fold diagram.
- All flat folded origami is non-directed and complete.
- If you sum up the number of mountain and valley fold segments coming out of any vertex, they will always differ by two. I don't really know why it's like that, but it definitely is.

Aight. I think that's enough talking about the past. Math time.

Flat Folding

First of all: flat folding. In a physical sense, this is a type of origami that can be folded flat, pressed between the pages of a book. So, for example, a paper crane without the wings.

If you make a flat folded origami, take a pin and poke a hole completely through it, through every layer, and then open it up, you'll see that there are multiple holes in the paper. In essence, many points on the origami paper map to the same point in 3D space. These points also map to each other by reflections along the axes of the creases themselves. Here's a picture to illustrate:

[in this image, the red points all map to the same point in 3D space, and the green points all map to the same point as well]

Though that was probably pretty obvious. However, I discovered a couple of facts that are not so obvious, I think. The first is that, for flat folded origami, when you reflect a point around creases like in the above image, one by one, if the path you take ever ends up back in the face that you started with, you'll have the exact same point you started with. In other words, no matter what face you're viewing from, the original point you start out with will always be equal to itself, relative to its starting face. Second is that this property is not present in all fold diagrams. See the images below for further explanation.

[as you can see in this image, even though the point takes two very different paths back to its original face, it still ends up at the same point again]

[in this example, however, the path that the point took back to the original face did not end at the same starting point. Likewise, you cannot possibly fold a piece of paper along these lines]
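Since everything that follows is built out of exactly this reflect-a-point-across-a-crease operation, here's a minimal sketch of it, using glm's dvec2 like the rest of the code in this post. The function name is mine, not from the actual project.

#include <cmath>
#include <glm/glm.hpp>
using glm::dvec2;

// Reflect point p across the crease line through `origin` at angle
// `creaseAngle` (radians from the +x axis). Radial distance from the
// crease's vertex is preserved; only the angle changes.
dvec2 reflectAcrossCrease(dvec2 p, dvec2 origin, double creaseAngle)
{
    dvec2 rel = p - origin;
    double angle = std::atan2(rel.y, rel.x);
    double rad = glm::length(rel);

    // reflecting across a line through the origin sends angle a to 2*creaseAngle - a
    double reflected = 2.0 * creaseAngle - angle;
    return origin + rad * dvec2(std::cos(reflected), std::sin(reflected));
}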
Well, if only certain fold diagrams have this property, how do we tell which ones do? How do we prove correctness?

It's hard to tell just by looking at all the folds without physically folding it yourself, so it's useful to look at a really simple case: paths going clockwise around a single point where fold segments intersect. Because reflections around multiple creases rooted at the same point preserve radial distance [see image], you only have to keep track of the angles of the fold segments and the angle of the point v. In this case, the reflections around each fold segment, applied in order, have to bring you back to the same point, i.e. the same angle plus a multiple of 2*pi.

Working that out: reflecting an angle theta across a crease at angle a sends it to 2a - theta, so reflecting across creases a1, a2, ..., an in order sends theta to theta + 2(an - a(n-1) + ... - a1) when n is even. Requiring that to equal theta (mod 2*pi) for every starting angle gives:

a1 - a2 + a3 - ... - an = 0 (mod pi)

The case of n being odd can be thrown out because, as you can check, it's only true for specific angles (specific offsets epsilon from a1), but the definition above states that it should be true for EVERY angle around the vertex.

(As a side note, this is a variant of Kawasaki's Theorem, but since I came up with it independently, along with everything else described here, I'm claiming all credit for my own variant.)

This is a very nice equation. It allows us to solve for any missing angle if one is needed, and it also acts as a really simple validator: you can check each vertex one by one with this equation to check the simple case of correctness-by-rotating-around. However, that is a very simple case. Paths can take any form, and they all need to exhibit the property listed above, not just paths spinning around vertexes.

However, each vertex is connected to the others by the very fold segments that we're rotating around. When you transfer a point from one vertex to another by just connecting the two via a fold segment, the point doesn't change (it stays in the same face; it's just the origin that changes). Since all fold diagrams are complete anyway, you can connect any vertex to any other by following the path from one to the next, rotating around each vertex as you go. Thus, you can check whether a fold diagram can be flat folded by simply checking each vertex and seeing if it conforms to the above theorem.

As a caveat, this doesn't ensure that there are no self-intersections. I don't think there is a way to both achieve interactive framerates and also ensure no self-intersection, but I'll get back to you on that.

Yup. So that's that. A sketch of the per-vertex check is below.
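This is my own minimal sketch of that per-vertex validator, written for interior vertices and assuming the sorted angle list from the vertex struct in the next section. The function name and tolerance handling are mine.

#include <cmath>
#include <vector>

// Per-vertex flat-foldability check described above: with an even number of
// fold segments at the vertex, the alternating sum of their angles must be a
// multiple of pi. `angles` is the sorted list of segment angles in [0, 2*PI).
bool vertexIsFlatFoldable(const std::vector<double>& angles, double epsilon = 1e-9)
{
    const double PI = 3.14159265358979323846;
    if (angles.size() % 2 != 0)
        return false; // the odd case was thrown out above

    double alternatingSum = 0.0;
    for (size_t i = 0; i < angles.size(); ++i)
        alternatingSum += (i % 2 == 0) ? angles[i] : -angles[i];

    // accept if the alternating sum is within epsilon of a multiple of pi
    double r = std::fmod(std::fabs(alternatingSum), PI);
    return r < epsilon || PI - r < epsilon;
}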
Implementation

So the implementation of this is fairly easy.

It's tempting to try to implement a flat folded origami interpreter by trying to represent each fold segment. However, if you travel along this path, it leads to lots of trees connecting faces (that you have to build from the edges) and a lot of searching over spaces, and it's just not fun. Believe me, that was what my first implementation was like.

The way I'm doing everything now is by having a soup of vertexes, each with a vector of angles, sorted counter-clockwise, of all the edges connecting them to other vertexes. Finding the other end of a fold segment given one end requires a raycast by the fold segment's angle relative to the vertex. For faster use, this can be accelerated using nice structures, or, since the origami only changes rarely in our case, we can just spend the time to bake the connections into each vertex and then go about our merry way.

struct vertex
{
    dvec2 pt;
    vector<double> angles;          // sorted over [0, 2*PI)
    vector<int> bakedConnections;
    int angleToTraverseTree = 0;

    vertex(dvec2 p) { pt = p; }

    void sortAngles()
    {
        std::sort(angles.begin(), angles.end());
    }

    void removeDuplicates()
    {
        // get rid of duplicates. im not worried about performance here
        for (int a = 1; a < angles.size(); ++a)
        {
            double diff = abs(angles[a] - angles[a - 1]);
            if (diff < epsilon)
            {
                angles.erase(angles.begin() + a);
                --a;
            }
        }

        // catch the 0-2pi wrap-around case
        double diff = abs(angles[angles.size() - 1] - angles[0]);
        if (abs(diff - 2 * PI) < epsilon)
        {
            angles.erase(angles.begin() + angles.size() - 1);
        }
    }
};

vector<vertex> verts;

Besides the manual labor of adding, and sometimes iteratively adding, angles to vertexes, there's only one more thing that's critical to the implementation: taking this structure and actually folding the paper. In order to do this, you first plot out the direction any individual point will take across vertexes (because this only allows rotations and trading off of origins, which preserves the origami property). If you do this and follow each point on the piece of paper to the same vertex and the same face, you'll have the completed origami.

void creaseTraversalTree()
{
    vector<int> processed;
    vector<int> leftToProcess;

    for (vertex& v : verts)
        v.angleToTraverseTree = 0;
    for (int i = 1; i < verts.size(); ++i)
        leftToProcess.push_back(i);
    processed.push_back(0);

    while (leftToProcess.size() != 0)
    {
        vector<int>::iterator leftIter;
        for (leftIter = leftToProcess.begin(); leftIter != leftToProcess.end(); ++leftIter)
        {
            int vId = *leftIter;
            bool found = false;

            for (int angleId = 0; angleId < verts[vId].angles.size(); ++angleId)
            {
                int closest = getClosestVertexAlongDirection(verts[vId].pt, verts[vId].angles[angleId]);
                if (closest == -1)
                    continue;

                vector<int>::iterator iter = std::find(leftToProcess.begin(), leftToProcess.end(), closest);
                if (iter == leftToProcess.end())
                {
                    verts[vId].angleToTraverseTree = angleId;
                    leftToProcess.erase(leftIter);
                    found = true;
                    break;
                }
            }

            if (found)
                break;
        }
    }
}

dvec2 fold(dvec2 ptOriginal)
{
    dvec2 pt = ptOriginal;
    int vId = getAnyVertexInFace(pt);

    double angle, rad;
    int angleAt = getBiggestAngleLessThan(vId, pt);

    while (1)
    {
        {
            dvec2 ptRelV = pt - verts[vId].pt;
            angle = atan2(ptRelV.y, ptRelV.x);
            if (angle < 0)
                angle += 2 * PI;
            rad = length(ptRelV);
        }

        vertex& curV = verts[vId];

        // reflect around each fold segment between angleAt and the traversal angle
        int shiftSign = (curV.angleToTraverseTree > angleAt) ? 1 : -1;
        for (int i = angleAt; i != curV.angleToTraverseTree; i += shiftSign)
        {
            int correctedI = i % curV.angles.size();
            angle = 2 * curV.angles[correctedI] - angle;
        }

        pt = rad * dvec2(cos(angle), sin(angle)) + curV.pt;

        if (vId == 0)
            break;

        assert(curV.bakedConnections.size() != 0);
        int newVert = curV.bakedConnections[curV.angleToTraverseTree];

        // find the new angleAt to preserve the rotation
        double angleToFind = rotClamp(curV.angles[curV.angleToTraverseTree], PI);
        int i;
        bool found = false;
        for (i = 0; i < verts[newVert].angles.size(); ++i)
        {
            if (abs(angleToFind - verts[newVert].angles[i]) < epsilon)
            {
                found = true;
                break;
            }
        }
        if (!found)
            printf("something funky going on with vertex %d and angle %d\n", vId, angleAt);

        angleAt = (i - 1 + verts[newVert].angles.size()) % verts[newVert].angles.size();
        vId = newVert;
    }

    return pt;
}

Conclusion

So, here's some screenshots of everything in action:

[creases in a piece of paper with the corresponding folded points, generated as described above]

[the internal data structure of the above paper]

[a finished folded crane, without the folded wings]

Also, since I'm not that good at explaining things clearly, here's the link to the github for the project. This, if you're interested in looking at it, probably explains everything a bit more clearly, if you can get past the messy 1-file codebase.

Well, that should be everything!
    Next step: autocompleting origami, and maybe non-planar folds! :D
  5. CulDeVu

    Origami and the associated maths

    'ey.

So this is going to be another one of those write-it-out-so-that-I-can-better-understand-it sort of things. Just a heads up. This IS a journal, after all, and functions as advertisement second.

So. Origami and mathematics. We know that the two are related because, if you fold a piece of paper the same way twice, you'll get two of the same shape, so there are rules governing it. And some smart people have tried figuring out those rules. There's a bunch of them, but they all fall short for what I'm wanting to do. So here it goes:

Together with research in 1992, and again in 2002, a total of 7 axioms for flat folding paper were created, detailing exactly which basic operations can be performed with paper. These were laid out in much the same way that Euclid's axioms of geometry were laid out. This is basically the way most research I've been able to turn up on the internet goes: interesting constructions that, using the axioms, prove something about something else.

What I'm wanting to do, however, is a lot different. I want to be able to take a collection of creases and their associated information, and turn it into a model of the finished product. A kind of crease-instructions-to-finished-origami-model converter, if you will. To do this I've tried a couple of methods, like physics based, relaxation and constraint based, and they've failed for one reason or another. I've settled on doing something sort of finite-element based: given a point on the original piece of paper and all the folds, where will that point be located in 3D space?

(turn up for trackpad art)

Unfortunately, there doesn't seem to be any information on the topic, so I guess I'll have to do it myself. Which is good, because I really wanted an interesting problem to work on. The rest of the entry will be about the stuff I've concluded so far, and the questions I have.

Types of folds

Folds are what happen when you crease the paper in a straight line, and either fold the other side down or turn it some arbitrary angle around the fold. I'll be using the terms "fold" and "crease" interchangeably.

Folds can be created that do one of two things:

- start at one edge of a piece of paper and end at another
- loop back in on themselves

(A fold can also start and stop on the same point on the edge of a piece of paper, but I'll ignore that case and say it's case #1.)

As you can see in the above examples, folds can be segmented into multiple parts going different directions, but are still connected end to end. I'll call these fold segments. They come in 2 flavors:

- mountain
- valley

These types mean essentially nothing by themselves. They represent a direction to fold relative to your viewing angle. What's important is that they are opposites, and reversing every single fold segment will result in the exact same origami (this is equivalent to turning the piece of paper over). In this journal, all mountain folds are shown as dotted lines in red, and all valley folds are shown as dashed lines in blue, unless I'm trying to point something out.

Things you can do with folds

Origami is all about layering paper folds, one after another, in order, to create something. The "in order" part is important. You can't create the head and tail of a paper crane (the last steps) before you make all the other folds. However, folds are not always dependent on all of the previous ones. Sometimes folds are made purely to get a reference point for some other fold later on down the line.
Some folds cancel out the symmetry of previous folds (more on that later). Further, some steps can be done in parallel, like the 2 wings of a paper plane, because they are independent of each other.

When you make a fold, sometimes the mountain fold that you thought you made is actually a bunch of mountain-valley segments. To clear this up, I'll call the starting fold segment the fold seed.

(the green fold segment is the fold seed. The purple fold segments were calculated from other information about symmetry and dependency)

There are 4 types of fold seeds that I'll support for right now. They are:

- mountain
- valley
- inside-reverse (more on this one later)
- outside-reverse (the opposite of inside-reverse)

(from left to right: mountain, valley, inside-reverse, outside-reverse. The green is the seed, and this shows what happens when each fold interacts with a single dependent)

I chose these specifically because these are the types of fold seeds that are needed (with the exception of #4) to create a paper crane. I figured that if my origami interpreter can't interpret the quintessential piece of origami, I've failed. The first 2 are really obvious, and the latter two I'm saving for a later time; there are certain issues with them that make them more complicated.

You've likely noticed in all of the pictures so far that there's a lot of reflection about lines going on. I'll call this reflecting about lines "dependency," because of its ordering properties. When a fold segment is dependent on another fold segment, it will reflect about the other line and change from mountain to valley or vice versa. Here's a pic:

An important thing to note about this dependency is that the line that is being reflected about *has* to be done first. The only exception to the rule is fold segments at 90 degree angles on the paper, which can be done interchangeably (which I think says something very interesting about these types of folds).

All of the above info has led me to use a tree to talk about origami folds. The parent links in the tree represent dependency, while the nodes represent fold seeds.

For mountain and valley fold seeds, this representation encompasses everything that needs to be said about an origami: from parallelism to dependency to ordering to direction. (A rough sketch of what such a tree could look like in code is below.)
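Here's a minimal sketch of what that dependency tree could look like as a data structure. This is just my guess at a representation consistent with the description above; the names and fields are mine, not from the actual project.

#include <vector>
#include <glm/glm.hpp>

// the four fold seed types supported so far
enum class SeedType { Mountain, Valley, InsideReverse, OutsideReverse };

// One node of the dependency tree: a fold seed, plus the folds that depend on it.
// A child is reflected about its parent's crease line (and flips mountain/valley),
// and a parent must always be folded before its children.
struct FoldNode
{
    SeedType type;
    glm::dvec2 creaseStart;           // endpoints of the fold seed segment
    glm::dvec2 creaseEnd;
    std::vector<FoldNode> children;   // folds dependent on this one
};

// the whole model is then just the root folds, which don't depend on anything
struct FoldTree
{
    std::vector<FoldNode> roots;
};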
Other fun facts

Here are just some neat things that happen to be true. They're essential for a complete implementation of an origami interpreter, but aren't essential for a complete understanding of what's happening. In no particular order:

- Everything talked about above has to do with planar folds, folds that create something that ends up lying flat (180 degree folds). Non-planar folds act the exact same way but have an additional constraint: they can have no children in the dependency tree.
- For planar folds, at every vertex where fold segments intersect, the number of valley fold segments and the number of mountain ones always differ by 2. For vertexes where fold segments and paper edges intersect, all bets are off. For non-planar fold segments, you can treat them like paper edges for the purpose of counting mountain-valley folds. This is an easy way to tell if a fold is valid or not.
- For mountain and valley fold seeds only: if a fold seed is dependent on a parent, which is dependent on a grandparent, then the fold seed is also dependent on the grandparent. This is not true for inside-reverse or outside-reverse folds.
- Folds that do not touch their parent fold in the dependency tree can be turned into 2 separate folds without any issues.
- For non-planar folds, it gets a bit more complicated. This is the only time self-intersection of paper is a problem, and I think I can safely ignore this case as it's fairly easy to work around. I think.
- Nodes on the same layer of the dependency tree that are not dependent on each other can be done in parallel, or in any order.
- Each node on the dependency tree must be unique, otherwise it'd be kind of hard to do anything with it or conclude anything from it.

Yup.

Well, that's about it. That's all I'm really semi-confident about at the moment. There are a bunch of things I still have to look into and figure out before all is said and done, though.

Tune in next time to find out what any of this has to do with finite element transformations, what the deal is with inside- and outside-reverse fold seeds and why they throw a kink in everything, and how any of this will help create a fold-to-origami interpreter.

See yah!
  6. CulDeVu

    Fluid Dynamics update

    Oioioioioioioioioioioioi

I'm not really going to talk about much math-y stuff this time. I've covered too much ground since my last entry to be able to remember all the pitfalls I ran into. So this time I'm only really going to be talking about the general algorithm, how I deal with the pitfalls of the marker particle method, and the ongoing translation to CUDA. And probably share some of my Mario Maker levels. I'm addicted to stars.

So! Let's do this!

[video of the fluid simulation]

Sorry for the video being kinda shaky. OBS is having a hard time capturing details, and YouTube thinks everything ever is a camcorder, so it looks a little strange. It gets across the idea, though. Note that there is still no surface tension, which would account for just about everything else strange-looking with the simulation.

The General Algorithm

The algorithm for simulating a Eulerian fluid goes as follows:

init particles
init velocity field, type field, etc
for each frame:
    apply external forces to velocity field
    clear type field
    update particle positions and apply to type field
    extrapolate the velocity field
    find particles to re-position
    re-position those particles
    build pressure matrices
    solve the pressure equation
    apply the pressure to the velocity field
    enforce boundary conditions
    advect the velocity field

The velocity field is the standard MAC staggered grid, cause why not, and because it's useful for pressure computation and velocity extrapolation. The extrapolation and pressure steps are straight out of academic papers, so it's standard stuff.

The marker particles are the part I don't really know about. I just advect the marker particles through the velocity field like anything else, and wherever the marker particles end up defines where the fluid exists or doesn't. This is pretty much the simplest method of defining the interior vs exterior of a fluid, so it has a lot of pitfalls, but I'll get to those in a minute. The issue is, though, that most everyone doesn't talk about this method (because of the many, many issues it has), and so they use something called level sets instead. I've tried implementing level sets several times, and marker particles are just so much simpler in every way.

Marker Particles

So the biggest pitfall of marker particles is that, due to numerical inaccuracy and a bunch of other issues, they tend to bunch up a lot in certain places. Namely, places where the divergence of the velocity field is still positive even after the incompressibility pressure solve. You'd think that, since the pressure is positive in some places, it'd be negative in the same area and cancel out, but it's not. The fact that gravity is always pulling down on the fluid makes that not always true, so what ends up happening is a net loss of volume, from full to nearly 0, in a couple of minutes.

So, instead of using level sets like a sane person, I decided to force the fluid to conserve volume. The way I did this is pretty straightforward, and is the "find/re-position particles" part of the above algorithm. Basically:

1) Iterate over every particle on the grid and see if there are already more than a certain number of particles (max density, if you will) in that grid cell. If there are, mark it to be re-positioned.
2) Iterate over every grid cell and see if there are fewer than some other number of particles (min density), and if there are, start pulling from the re-position pool and fill in those spots.

(A rough sketch of these two passes is below.)
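This is only a minimal sketch of the idea, not the actual code from the project: it assumes a flat per-cell particle count and made-up names, and a real version would also check the cell type so only fluid cells get refilled.

#include <vector>

// hypothetical marker particle: just a position
struct Particle { float x, y; };

int cellOf(const Particle& p, int mapW, float cellSize)
{
    return int(p.y / cellSize) * mapW + int(p.x / cellSize);
}

// 1) mark particles in over-full cells for re-positioning
// 2) spend the marked particles on cells that are under-full
void repositionParticles(std::vector<Particle>& particles,
                         int mapW, int mapH, float cellSize,
                         int minDensity, int maxDensity)
{
    std::vector<int> count(mapW * mapH, 0);
    for (const Particle& p : particles)
        count[cellOf(p, mapW, cellSize)]++;

    // pass 1: anything beyond maxDensity in its cell goes into the pool
    std::vector<int> pool;
    std::vector<int> seen(mapW * mapH, 0);
    for (int i = 0; i < (int)particles.size(); ++i)
    {
        int c = cellOf(particles[i], mapW, cellSize);
        if (++seen[c] > maxDensity)
            pool.push_back(i);
    }

    // pass 2: top up cells that have fewer than minDensity particles
    // (the real version would also check the cell's type so only fluid
    //  cells get refilled)
    for (int c = 0; c < mapW * mapH && !pool.empty(); ++c)
    {
        while (count[c] < minDensity && !pool.empty())
        {
            int i = pool.back();
            pool.pop_back();
            particles[i].x = (c % mapW + 0.5f) * cellSize; // drop at the cell center
            particles[i].y = (c / mapW + 0.5f) * cellSize;
            ++count[c];
        }
    }
}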
This is rather finicky in practice. For example, if I set minDensity to maxDensity, you see a lot of artifacts from there not being enough particles in the re-position pool to accommodate (because of the overall positive divergence I mentioned). A happy median, I found, was setting maxDensity to anywhere between 2 and 3 times the minDensity. Sure, this DOES lead to volume loss by a factor of whatever multiple you choose, but it's much better than having visually disgusting simulations.

To be fair, the simulation without re-positioning looks a lot more fluid and water-like. However, conserving volume is too important to be avoided. Oh well.

CUDA

I've been translating small pieces of the simulation to use the GPGPU via CUDA. I've gotten the "update particle" portion completely ported to the GPU, which is just a simple advection and setting the type fields. The really neat one is the pressure solver.

In order to find the pressure in each cell, long story short, you need to construct a big sparse matrix equation relating each cell's pressure to its neighbors', and then solve it for pressure. On the CPU I used Gauss-Seidel to solve it, but I have no idea how to do it on the GPU. Luckily, there's a library called cusp that implemented everything for me!

struct cuspTriple
{
    int row, col;
    float amount;
};

cusp::array1d<float, cusp::device_memory> pressure(mapW * mapH);

void project()
{
    // right-hand side: scaled divergence of the velocity field
    cusp::array1d<float, cusp::device_memory> b(mapW * mapH);
    {
        float scale = rho / dt;
        for (int y = 0; y < mapH; ++y)
        {
            for (int x = 0; x < mapW; ++x)
            {
                int index = y * mapW + x;
                b[index] = scale * (u->at(x + 1, y) - u->at(x, y) +
                                    v->at(x, y + 1) - v->at(x, y));
            }
        }
    }

    // one row per cell: +1 for each WATER neighbour, -n on the diagonal
    vector<cuspTriple> data;
    {
        for (int y = 0; y < mapH; ++y)
        {
            for (int x = 0; x < mapW; ++x)
            {
                int n = 0;

                if (x > 0)
                {
                    if (type[y * mapW + x - 1] != SOLID)
                    {
                        if (type[y * mapW + x - 1] == WATER)
                        {
                            cuspTriple t;
                            t.row = y * mapW + x;
                            t.col = y * mapW + x - 1;
                            t.amount = 1;
                            data.push_back(t);
                        }
                        ++n;
                    }
                }
                if (y > 0)
                {
                    if (type[(y - 1) * mapW + x] != SOLID)
                    {
                        if (type[(y - 1) * mapW + x] == WATER)
                        {
                            cuspTriple t;
                            t.row = y * mapW + x;
                            t.col = (y - 1) * mapW + x;
                            t.amount = 1;
                            data.push_back(t);
                        }
                        ++n;
                    }
                }
                if (x < mapW - 1)
                {
                    if (type[y * mapW + x + 1] != SOLID)
                    {
                        if (type[y * mapW + x + 1] == WATER)
                        {
                            cuspTriple t;
                            t.row = y * mapW + x;
                            t.col = y * mapW + x + 1;
                            t.amount = 1;
                            data.push_back(t);
                        }
                        ++n;
                    }
                }
                if (y < mapH - 1)
                {
                    if (type[(y + 1) * mapW + x] != SOLID)
                    {
                        if (type[(y + 1) * mapW + x] == WATER)
                        {
                            cuspTriple t;
                            t.row = y * mapW + x;
                            t.col = (y + 1) * mapW + x;
                            t.amount = 1;
                            data.push_back(t);
                        }
                        ++n;
                    }
                }

                cuspTriple t;
                t.row = y * mapW + x;
                t.col = y * mapW + x;
                t.amount = -n;
                data.push_back(t);
            }
        }
    }

    cusp::coo_matrix<int, float, cusp::device_memory> A(mapW * mapH, mapW * mapH, data.size());
    {
        for (int i = 0; i < data.size(); ++i)
        {
            A.row_indices[i]    = data[i].row;
            A.column_indices[i] = data[i].col;
            A.values[i]         = data[i].amount;
        }
    }

    cusp::default_monitor<float> monitor(b, 600, 0.01, 0);
    cusp::precond::diagonal<float, cusp::device_memory> M(A);
    cusp::krylov::cg(A, pressure, b, monitor, M);
}

void applyPressure()
{
    float scale = dt / rho;
    for (int y = 0; y < mapH; y++)
    {
        for (int x = 0; x < mapW; x++)
        {
            if (type[y * mapW + x] != WATER)
                continue;

            float p = pressure[y * mapW + x];
            u->at(x, y)     -= scale * p;
            u->at(x + 1, y) += scale * p;
            v->at(x, y)     -= scale * p;
            v->at(x, y + 1) += scale * p;
        }
    }
}

This is the version of the GPU solver I have right now. It has a ton of theoretical std::vector overhead (I don't know how well the CUDA compiler does at optimizing things out), so the next thing I'm going to be testing is whether or not over-approximating the number of non-zero elements in the sparse matrix will still be efficient.
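For comparison, the CPU-side solve this replaced was plain Gauss-Seidel. Here's a minimal sketch of what that kind of iteration looks like for this system. It's not the actual code from the project; the cell naming (type, WATER, SOLID) follows the snippet above, and b is the same right-hand side.

// Gauss-Seidel iteration for the same pressure system: for each fluid cell,
// re-solve its row of the matrix using the latest neighbouring pressures.
// Row is: sum(WATER neighbours' pressure) - n * p[idx] = b[idx]
void solvePressureGaussSeidel(std::vector<float>& pressure,
                              const std::vector<float>& b,
                              const std::vector<int>& type,
                              int mapW, int mapH, int iterations)
{
    for (int it = 0; it < iterations; ++it)
    {
        for (int y = 0; y < mapH; ++y)
        {
            for (int x = 0; x < mapW; ++x)
            {
                int idx = y * mapW + x;
                if (type[idx] != WATER) continue;

                float sum = 0.0f;
                int n = 0;
                if (x > 0        && type[idx - 1]    != SOLID) { if (type[idx - 1]    == WATER) sum += pressure[idx - 1];    ++n; }
                if (x < mapW - 1 && type[idx + 1]    != SOLID) { if (type[idx + 1]    == WATER) sum += pressure[idx + 1];    ++n; }
                if (y > 0        && type[idx - mapW] != SOLID) { if (type[idx - mapW] == WATER) sum += pressure[idx - mapW]; ++n; }
                if (y < mapH - 1 && type[idx + mapW] != SOLID) { if (type[idx + mapW] == WATER) sum += pressure[idx + mapW]; ++n; }

                if (n > 0)
                    pressure[idx] = (sum - b[idx]) / n;
            }
        }
    }
}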
Also, the GPU version is around 3-4 times faster than the CPU version! Now, on the other hand, it's only 3 or 4 times faster than the CPU version. That's with copying data back and forth from the GPU, so I give it a little bit of a break, but not much. I'm extremely confident that I could easily squeeze much more efficiency out of it, so that will have to be something I deal with at a later date, after I port a bit more to the GPU.

In short, kill me now.

Conclusion

Could someone please give me stars in Super Mario Maker? I'm addicted and I need all you have. One of my levels' IDs is 04B7-0000-0069-DB69, and from there you should be able to access the 3 levels I've uploaded. They have < 20% clear rates on each of them for some strange reason, even though I think they're pretty easy (and fun).

Anyway, thanks for reading!