
Advertisement

Content Count
192 
Joined

Last visited
Community Reputation
183 NeutralAbout jorgander

Rank
Member
Personal Information

Interests
Art
Business
Education
Programming

With some help I've been able to work through the math part and am implementing it now. Apparently there is a another way to determine the global maximum of quadratic form mathematical functions, and the one I have now is a quadratic. The details are in this mathematica stackexhange post I created. If it works it will be quite significant, as it means that except for the loop to build the constant values, the time to determine split planes is constant regardless of the number of points. But there are potential problems with any solution: For the constant (summation) values and ensuing math, I am using the {{long double}} data type. I don't think this will overflow for any practical uses of the implementation (it would have to mean billions or trillions of points, I am guessing), but I suppose it is possible. If I ever am able to combine the 3rd heuristic function, it might change the form of the function from quadratic and I would have to revisit the math part. To answer your questions: Yes, the the end use is visualization of animated weather data over a set of parameters. For example, rainfall between now and 7 days from now, for a particular geographic region, etc. Again, the idea is that if I can render weather data at one point in time fast enough (say, less than a second), animating over time will be feasible. It won't run at the FPS of video games, but fast enough to look pretty. As the data structure will be pushed up to GPU memory and used by a pixel shader, the calculation will be perpixel which will create images/animation at the appropriate resolution for the screen it's running on. To compose images/animation for other resolutions, the implementation could be modified to instead use a compute shader, which is not that different and in fact it's what I though of first  instead of writing to the frame buffer, we would instead write to a texture. Simple enough.

One of the reasons why I want to stick with a BSP tree is that queries are fast and simple... one test per level of hierarchy using just a dot product. Also since the data points are represented by unit vectors, and querying compares these against another unit vector using the dot product (e.g. to find the closest N points to another arbitrary point), the implementation translates well to graphics hardware, specifically a pixel shader. I've already got a shader implemented for calculating and displaying land/coastlines  rendering these at 2560x1440 on the CPU takes a few minutes but on the GPU takes a tenth of a second. So I'm hoping the difficult in building/maintaining pays off down the road. But we'll see. Yeah this is my hobby project, so it takes a back seat to other things.

It's been a while since my last blog entry, but the problem I posed on my earlier blog entry still persists  how to efficiently choose a good split plane for an nvector data structure. To summarize the structure, geographic points are stored as nvectors (unit vectors) in a binary tree. Branch nodes of this tree define a plane that bisects/halves the unit sphere  one child of the branch contains points that are below the plane, the other child contains points that are above. Like all trees, leaf nodes are turned into branch nodes as they are filled beyond a threshold. That is the basic idea of it. Split plane determination must occur when a leaf becomes a branch or when a branch becomes unbalanced. In either case, the method called to determine the split plane is agnostic as to why it was called  it simply receives a bunch of points that it must determine a good split for. My initial implementation was naive: bestsplit = ... bestscore = infinite; for (i=0; i<numpoints; i++) { for (j=i+1; j<numpoints; j++) { split = (points[i]points[j]).normalize(); score = 0; for (k=0; k<numpoints; k++) { dot = split.dot(points[k]); // accumulate some scoring heuristics, such as is each point above/below the split, how far above/below, etc. score += ...; } if ( bestscore > score ) { bestscore = score; bestsplit = split; } } } So basically  for all points, determine the point between it and every other point, normalize it, and consider this difference as the normal to the split plane, then test it to see how it "scores" using some heuristics. Probably you can see how this will perform miserably for any significant number of points. The bigO notation is something like n^3 which doesn't include the expensive normalization that is called n^2 times. Several other methods were tested, such as fixing the normal of the split plane to be perpendicular to the x, y, or z axis, but these also proved too expensive and/or also had test cases where the split determination was unsatisfactory. Heuristics Enter calculus. If we can represent each of the heuristics as a mathematical function, we can determine when the function reaches what is called a "critical point". Specifically we are interested in the critical point that is the global maximum or minimum, depending on which heuristic it is for. So far we have three of these. 1. Similar Distance We don't want a split plane where, for example, all points above are nearby and all points below are far away. The points should be as evenly distributed as possible on either side. Given that the dot product of a plane normal and any other point is negative when the point is below the plane and positive when the point is above, and the absolute value of the dot product increases as distance to the plane increases, the sum of all dot products for a good split will be at or close to zero. If we let \(P\) be the array of points, \(N\) be the number of points in the array, and \(S\) the split plane, the following function will add all dot products: \(SumOfDots = \displaystyle\sum_{i=1}^{N} P_i \cdot S\) The summation here is not really part of a mathematical function, at least not one we can perform meaningful calculus on, since the calculation must be done in the code. We don't know ahead of time how many points there will be or what their values are, so the function should be agnostic in this regard. As written we cannot use it without inordinate complication, but consider that it is really doing this: \(SumOfDots = \displaystyle\sum_{i=1}^{N} P_{ix} * S_x + P_{iy} * S_y + P_{iz} * S_z\) This summation expanded will look something like: \(SumOfDots = P_{1x} * S_x + P_{1y} * S_y + P_{1z} * S_z + P_{2x} * S_x + P_{2y} * S_y + P_{2z} * S_z + P_{3x} * S_x + P_{3y} * S_y + P_{3z} * S_z + \cdots\) We can then rewrite this as: \(SumOfDots = S_x*(P_{1x} + P_{2x} + P_{3x} + \cdots) + S_y*(P_{1y} + P_{2y} + P_{3y} + \cdots) + S_z*(P_{1z} + P_{2z} + P_{3z} + \cdots)\) Or similarly: \(SumOfDots = S_x*(\displaystyle\sum_{i=1}^{N} P_{ix}) + S_y*(\displaystyle\sum_{i=1}^{N} P_{iy}) + S_z*(\displaystyle\sum_{i=1}^{N} P_{iz})\) As far as the mathematical function is concerned, the sums are constants and we can replace them with single characters to appear concise: \(A = \displaystyle\sum_{i=1}^{N} P_{ix}\) \(B = \displaystyle\sum_{i=1}^{N} P_{iy}\) \(C = \displaystyle\sum_{i=1}^{N} P_{iz}\) We can precalculate these in the code as such: double A = 0, B = 0, C = 0; for (i=0; i<N; i++) { A += points[i].x; B += points[i].y; C += points[i].z; } We will rewrite the function with these constants: \(SumOfDots = S_x*A + S_y*B + S_z*C\) This is great so far. We are interested in when this function reaches zero. To make it simpler, we can square it which makes the negative values positive, and then we become interested in when this function reaches a global minimum: \(SquaredSumOfDots = (S_x*A + S_y*B + S_z*C)^2\) So again, when this function reaches zero it means that the points on either side of the split plane  as denoted by \(S\)  are spread apart evenly. This does not mean that \(S\) is a good split plane overall, as the points could all lie on the plane, or some other undesirable condition occurs. For that we have other heuristics. As a final note, since the vector \(S\) represents a unit normal to a plane, any determination of it must be constrained to space of the unit sphere: \(S_x^2 + S_y^2 + S_z^2 = 1\) 2. Large Distance Practically speaking, the points will not be random and will originate from a grid of points or combination of grids. But random or not, if the points form a shape that is not equilateral  in other words if they form a rectangular shape instead of a square or an ellipse instead of a circle  the larger axis of the shape should be split so that child areas are not even less equilateral. To achieve this we can emphasize that the sum of the absolute value of all the dot products is large, meaning that the points are farther away from the split plane. To do this, we want to find the global maximum of a function that calculates the sum: \(SumOfAbsoluteDots=\displaystyle\sum_{i=1}^{N} P_i \cdot S\) Unfortunately there is no way, that I know of, to handle absolute values and still determine critical points, so we need to rewrite this function without the absolute value operator. Really all we are interested in is when this function reaches a maximum, so we can replace it with a square: \(SumOfSquaredDots=\displaystyle\sum_{i=1}^{N} (P_i \cdot S)^2\) Not unlike the previous heuristic function, we need to rewrite this so that \(S\) is not contained in the summation, and we can extract the constant values. If we skip some of the expanding and reducing the squared dot product we will arrive at this step: \(SumOfSquaredDots=(S_x^2*\displaystyle\sum_{i=1}^{N} P_{ix}^2) + (S_y^2*\displaystyle\sum_{i=1}^{N} P_{iy}^2) + (S_z^2*\displaystyle\sum_{i=1}^{N} P_{iz}^2) + (S_x * S_y * 2 * \displaystyle\sum_{i=1}^{N} P_{ix}*P_{iy})+ (S_x * S_z * 2 * \displaystyle\sum_{i=1}^{N} P_{ix}*P_{iz})+ (S_y * S_z * 2 * \displaystyle\sum_{i=1}^{N} P_{iy}*P_{iz})\) Again we will create some named constants to be concise: \(D = \displaystyle\sum_{i=1}^{N} P_{ix}^2\) \(E = \displaystyle\sum_{i=1}^{N} P_{iy}^2\) \(F = \displaystyle\sum_{i=1}^{N} P_{iz}^2\) \(G = 2 * \displaystyle\sum_{i=1}^{N} P_{ix}*P_{iy}\) \(H = 2 * \displaystyle\sum_{i=1}^{N} P_{ix}*P_{iz}\) \(I = 2 * \displaystyle\sum_{i=1}^{N} P_{iy}*P_{iz}\) As before, these can be precalculated: double D = 0, E = 0, F = 0, G = 0, H = 0, I = 0; for (i=0; i<N; i++) { D += points[i].x * points[i].x; E += points[i].y * points[i].y; F += points[i].z * points[i].z; G += points[i].x * points[i].y; H += points[i].x * points[i].z; I += points[i].y * points[i].z; } G *= 2.0; H *= 2.0; I *= 2.0; And then the function becomes: \(SumOfSquaredDots=(S_x^2*D) + (S_y^2*E) + (S_z^2*F) + (S_x * S_y * G)+ (S_x * S_z * H) + (S_y * S_z * I)\) Again when this function maximizes, the points are farthest away from the split plane, which is what we want. 3. Similar number of points A good split plane will also have the same number of points on either side. We can again use the dot product since it is negative for points below the plane and positive for points above. But we cannot simply sum the dot products themselves, since a large difference for a point on one side will cancel out several smaller differences on the other. To account for this we normalize the distance to be either +1 or 1: \(SumOfNormalizedDots=\displaystyle\sum_{i=1}^{N} \frac{P_i \cdot S}{\sqrt{(P_i \cdot S)^2}}\) Expanding this becomes: \(SumOfNormalizedDots=\displaystyle\sum_{i=1}^{N} \frac{P_{ix} * S_x + P_{iy} * S_y + P_{iz} * S_z}{\sqrt{(S_x^2*P_{ix}^2) + (S_y^2*P_{iy}^2) + (S_z^2*P_{iz}^2) + (S_x*S_y*2*P_{ix}*P_{iy})+ (S_x*S_z*2*P_{ix}*P_{iz})+ (S_y*S_z*2*P_{iy}*P_{iz})}}\) Unfortunately there is no way to reduce this function so that we can extract all \(S\) references out of the summation, and as with the previous heuristics, put all \(P\) references into constants that we precalculate and use to simplify the function. So at present, this heuristic is not able to be used. I am still working on it. The problem is that each iteration of the sum is dependent on the values of \(S\) in such a way that it cannot be extracted. Putting it all together What we ultimately want to do is combine the heuristic functions into one function, then use this one function find the critical point  either the global minimum or global maximum depending on how we combine them. The issue is that for \(SquaredSumOfDots\) we want the global minimum and for \(SumOfSquaredDots\) we want the global maximum. We can account for this by inverting the former so that we instead want the global maximum for it. The combined function then becomes: \(Combined = SumOfSquaredDots  SquaredSumOfDots\) Applying the terms from each function, we get: \(Combined = (S_x^2*D) + (S_y^2*E) + (S_z^2*F) + (S_x * S_y * G)+ (S_x * S_z * H) + (S_y * S_z * I)  (S_x*A + S_y*B + S_z*C)^2\) Expanding the square on the right side and combining some terms will give us: \(Combined = (S_x^2*(DA^2)) + (S_y^2*(E  B^2)) + (S_z^2*(F  C^2)) + (S_x * S_y * (G  (2 * A * B)))+ (S_x * S_z * (H  (2 * A * C))) + (S_y * S_z * (I  (2 * B * C)))\) Again we can combine/precalculate some constants to simplify: \(J = DA^2\) \(K = EB^2\) \(L = FC^2\) \(M = G  (2 * A * B)\) \(N = H  (2 * A * C)\) \(O = I  (2 * B * C)\) double J = D  (A * A); double K = E  (B * B); double L = F  (C * C); double M = G  (2 * A * B); double N = H  (2 * A * C); double O = I  (2 * B * C); And then apply these to the function: \(Combined = (S_x^2*J) + (S_y^2*K) + (S_z^2*L) + (S_x * S_y * M)+ (S_x * S_z * N) + (S_y * S_z * O)\) Because we are lazy, we then plug this into a tool that does the calculus for us. And this is where I am currently blocked on this issue, as I have found no program that can perform this computation. I've actually purchased Wolfram Mathematica and attempted it with the following command: Maximize[{((x^2)*j) + ((y^2)*k) + ((z^2)*l) + (x*y*m) + (x*z*n) + (y*z*o), ((x^2) + (y^2) + (z^2)) == 1}, {x, y, z}] After 5 or 6 days this had not finished and I had to restart the computer it was running on. I assumed that if it took that long, it would not complete in a reasonable amount of time. I will update this blog entry if I make any progress on this problem as well as the 3rd heuristic function. Further Optimizations While I have not gotten this far yet, it may be ultimately necessary to emphasize (or deemphasize) one heuristic over the others in order to further optimize the split determination. This could be done by simply multiplying each heuristic function by a scalar value to either increase or decrease it's emphases on the final result. The reason I haven't researched this yet is because if I cannot find the global maximum without these extra variables, I certainly cannot do it with them.

Nvector data structure to store and query for scatter points
jorgander commented on jorgander's blog entry in Jorgander's journal
Thanks for your comment! The actual use case is a bit more involved than I describe, but not much different than what I use as examples; it will be for rasterizing averaged values of weather data over a geographic area. The weather data comes in predefined grids, where each point of the grid is defined by latitude/longitude and a data value, and the grid may or may not be the same format as what is rendered (i.e. the weather data grid may be Polar Stereographic, while the rendered grid may be Lambert Conformal). As such, the internal structure should be agnostic. Another layer of complication is that the weather data also comes in one or more forecasts, where there is a grid for each forecasted unit of time X number of times. For example, 72 grids spaced out an hour apart from each other, containing wave height data for the Caribbean Sea. To keep the implementation simplified, each BSP only contains one type of weather data, at one point in time. Since the time is the same for all points in a BSP, it is not stored for every point, but once for the whole BSP. I mention it because even though it is not stored with each point, it is taken into consideration as another "dimension" when finding nearby points. The end result looks something like this: float target_time = ...; // render time float max_distance = ...; // points outside this distance are not considered int N = ...; // number of nearby points to consider foreach (pixel) { float target_geo[3] = ...; // unit vector (nvector) of pixel // [0] = distance, [1] = value float distance_and_value[N][2] = ...; // initialize to max_distance // check BSPs foreach (BSP close enough to target_time) { // modifies distance_and_value parameter as smaller distances are found; keeps the array sorted based on distance BSP.findClosest(target_geo, target_time, &distance_and_value); } float average = ...; // calculate using distance_and_value, stopping when >= max_distance pixel = ...; // calculate using average } But I didn't mention all of this because it is tangential to the problem of selecting a partition for each BSP tree branch. how many points are we talking about  it varies from grid to grid. Some are as small as 350x150, and others as large as 4400x2200. It could be larger as more meteorological data is added. For now I'm only using freely available data from NOAA, but could eventually use proprietary data from other sources. what is the distance (relative to the sphere) over which to make a query for nearest neighbours  this is a parameter passed to the query function, and may differ based on application. For most cases, the end result should look "pretty" and show smooth gradients, whereas accuracy is preferred for "serious" uses such as maritime navigation. what are the access patterns (are the queries random, or are you, for instance looking for the closest points to a moving object, where you can reuse the results of a previous search)?  (see above code snippet) The best scheme may depend on how fast adding a point to the data structure needs to be, versus the speed of the queries.  you are right, it may be, although not for my current use case. Regardless of which is best, I would certainly implement faster BSP insertion if I knew how. That makes sense for rendering multiple images of the same geographic area, but not for the case where the area may be panned or zoomed, as the list would have to be rebuilt. Also how would the list of nearby neighbors be efficiently built without some partitioning scheme to begin with? I think this is also not effective for the same reasons. Precalculating the angular distance between data points is not necessary, as that is never needed (only need to determine distance between data points and another arbitrary point), and as in the previous paragraph, precalculating angular distance between data points and another point is only used once. 
Nvector data structure to store and query for scatter points
jorgander posted a blog entry in Jorgander's journal
Introduction The impetus for this research topic is concerned with a data structure that efficiently stores and queries scatter point data. These points may originally be neatly arranged on a grid, or randomly scattered across the surface of a sphere (e.g. the earth), but it makes no assumption in that regard. The data structure could have member functions such as add(point) or add(points, amount). The points themselves contain three scalar values: the longitude, the latitude, and the data value itself. This data value could be geographical data such as land height or weather data such as air temperature, but again, the data structure is agnostic about the data itself other than as a scalar value. Data Structure The purpose of the data structure is ultimately to efficiently and accurately query for point(s) that is(are) nearby another point, the reason being to compute the average value at that point. Suppose a requirement is to create an image representing the air temperature (or rainfall, cloud cover, whatever the scalar value of each point represents)  this data structure could be queried for each point (i.e. pixel) in the image. Once nearby points are queried, the algorithm to average the scalar values of each point is outside the scope of this, but a good candidate is the inverse distance weighed algorithm. Within the data structure, the data points will not be stored in spherical coordinates, but as nvectors. The reason for this is that the sampling algorithm  finding N closest data points  is more efficient and accurate using vector algebra. While it is outside the scope of this, to summarize: it is easier to determine how close two points are on a sphere if these points are represented by unit vectors than by latitude and longitude. For example, two points sharing the same longitude are farther apart at the equator than above or below the equator. Even if we are just concerned with relative distances, and do not need to compute the square root of the distance formula, for this reason it is still not as accurate. Also the international date line, where longitude "resets", presents a problem that would have to be dealt with. Also either pole. But you get the idea  vectors are more stable, accurate, and efficient. I've always been a fan of Binary Space Partitions  each branch node of a BSP has two areas, one below and one above (or one left and one right, or one in and one out, however you like to think of it). To go along with storing points as vectors, branch nodes within the structure partitions their space also using a vector, which in this case forms the normal of a plane that bisects the unit sphere. All data points in a node (or children of the node) will either be above or below this plane (points that are on the plane can be allocated to either side as is appropriate). Points that are below the plane are placed in the first child of the node, and points that are above placed in the second child. We call this plane normal the split axis of the node. Querying the structure for the closest N points then becomes trivial. For any branch node, compute the dot product of the point in question and the split axis of the node. If it is negative (the point is below the split plane), recursively query with the first child node, and if positive (the point is above the split plane), with the second child node. For a leaf node, compute the dot product of the point in question with each data point contained in the node, keeping a sorted list of the closest N points. The one caveat is that in branch nodes, after recursing into one child node, it may be necessary to recurse into the other child if the farthest point found so far is farther than the other child node, since there may be closer points in the other child node. But this is trivial as we are comparing dot products. No expensive computations are necessary to compute the N closest points  just a bunch of dot products. As dot products of unit vectors range from 1 to 1 (1 being farthest apart and 1 being equal), two points are closer if their dot product is higher. Once complete, and the list of N points found, the actual distances can be calculated if necessary, which in this case is done by calculating the angles using the alreadycalculated dot products. This angle can then be multiplied by the radius of the earth to get an exact distance, again only if necessary. But for averaging, these extra calculations are not necessary. As a final note, the data structure lends itself well to graphics hardware. In my particular case, rendering an image using such a data structure on the CPU may take several minutes, but on the GPU takes a fraction of a second. Problem The problem  common to any space partitioning tree  is how to initially create the data structure. Again, the points are not assumed to be arranged in any specific way, and as far as we are concerned, are a "point soup". They can be specified one at a time  addPoint(point)  or all at once  addPoints(points)  or both. Specifically, how can the determination of the split axis of any branch be efficient and provide an even split (the same or similar number of points on either side). The problem is unique here because the points are not arranged on a twodimensional surface or in threedimensional space, but lie on a unit sphere. My current solution to the problem is not terribly elegant. For each two data points, compute the axis that splits them evenly. This is done by computing the point between the two (subtract one from the other) and normalizing it, then crossing this with the cross product of the two points. This cross product comprises the normal of the plane that evenly splits the two points. This normal is then tested against each other point in the node to get 1) the number of points on either side of the plane and 2) the distances of the points to the plane. A good split plane is one that 1) splits points evenly, 2) has the same/similar distances on other side, and 3) has large distances on other side. As this test is performed for each two data points, some bigO notation shows that determination of the split plane for nodes containing a large number of points will become prohibitive. However, I have not been able to determine a better solution. But the advantages of such a data structure still outweigh this expense. In my particular case, the time spent initially creating the tree is worth the time saved during queries. I should mention that if the data points are known ahead of time, it is faster to specify them all at once, so the tree can rebuild itself once, rather than one at a time which may cause the tree to rebuild itself multiple times. 
Efficient thread synchronization
jorgander commented on jorgander's blog entry in Jorgander's journal
Thanks for the info! I knew there must be something I'm missing. I've read the article you linked, as well as other related articles, and will adjust my code to account for it. However, my "real" job has taken priority for now and I will update this when I am able to. 
*See the EDIT below* The impetus for this research topic is the necessity for threads to exchange data in an efficient manner, namely between the network and main threads. Within the context of networking, we want data to be quickly available to, and dispensable from, the main thread, and that means calling the corresponding system calls (e.g. sendto/recvfrom) as often as possible. In addition to this, there is necessarily some translation done between native and network formats, as well lookups and other operations that could be offloaded to the network thread if possible. However, this introduces a problem; as data arrives and is picked up by the network thread, or as logic in the main thread requires data be sent out to the network, how do the separate threads share this data? If we create a mutex for it, we introduce an expensive system call that causes delays and is generally unacceptable in a tight loop. Whatever solution we implement should not involve systems calls or any significant pause in execution. If our shared data is simple enough to be just one variable  just one *instance* of something, if you will  then nothing fancy is needed. As an example, if the network thread needs to notify the main thread that a player has sent a QUIT message, and that this player be removed from all operations and deleted, a "player * quitting" member can be added, which is updated as necessary. The main thread will watch this member, and take appropriate action when it sees that it is nonnull. So, the course of action for both threads is: Network thread: 1. QUIT message received by player 2. update member "quitting" to refer to this player Main thread: 1. check member "quitting" 2. if nonnull, delete all associated resourced and updating member "quitting" to null But there is a problem with this: what if another QUIT message is received by the network thread before the main thread has processed the first one? The network thread may update member "quitting" before the main thread has seen the first one. To address this, we can add a safety check in step 2 of the Network thread  only update member "quitting" if it is null. The reasoning behind this is that after the Network thread has updated this member, it should not update it again until the Main thread has processed it, and the Network thread can tell this by whether or not it is null. And this introduces yet another problem, as you can probably guess: what should the Network thread do with second QUIT message that it has received before the Main thread has finished processing the first one? It can simply queue this second message, and any others it receives, in a buffer. Each loop iteration, it will check to see if the member "quitting" is null, and if so, assign it to one of the items in the buffer (also removing this item from the buffer). This solution works and is great in that it allows each thread to produce or consume data without waiting for the other. If data arrives in the producer, and the consumer is busy with previously shared data, it is simply queued, and the producer continues execution. The only cost incurred is one conditional in each thread, plus a buffer in the producer thread for queueing purposes. The drawback of this approach is that only one item can be shared at a time. While this may be ok for QUIT messages  it would be uncommon for many players to quit in the few milliseconds it takes per each loop iteration  it would not be efficient for messages that are transferred more often. For this approach, two buffers can be used that swap their contents. Instead of explaining all the details, I will simply post the code I've written to support this://// Thread synchronization w/out semaphores//// the data producer can always insert data, and should call Commit() once during it's event loop// (the Commit() call will only commit produced data if the consumer has processed existing data)//// the data consumer will process data if there is any has been committed by the producer//template class Queue{private: std::vector m_clsIn; std::vector m_clsOut; bool m_blnData; bool m_blnProcessed;public: Queue() : m_blnData(false), m_blnProcessed(true) {} void Add(T clsData) { //ASSERT(thread == producer) m_clsIn.push_back(clsData); } void Commit() { //ASSERT(thread == producer) if ( m_blnProcessed && m_clsIn.size() > 0 ) { m_clsIn.swap(m_clsOut); m_blnProcessed = false; m_blnData = true; } } template void Process(Functor & refFunctor) { //ASSERT(thread == consumer) if ( m_blnData ) { for (size_t i=0; im_ptrNext; m_ptrInactive>m_ptrNext = NULL; delete ptrNode; } template void Put(Functor functor) { //ASSERT(thread == producer) // // Check if the queue is full, in which case we must add a node // if ( m_ptrInactive>m_ptrNext == m_ptrActive ) { // // Between the above conditional and the following statement, the consumer can process // data and move the 'active' pointer, making the following node addition unnecessary. // However, there is no way to avoid it except with a mutex lock or other expensive // operation, and the extra node will get used anyway in the next call to Put() // // In the unlikely case that the 'active' pointer is advanced between the preceding // conditional and the Node c'tor below, we use 'inactive>next' as a parameter instead // of 'active' // m_ptrInactive>m_ptrNext = new Node(m_ptrInactive>m_ptrNext); } functor(static_cast(*m_ptrInactive)); std::atomic_signal_fence(std::memory_order_release); m_ptrInactive = m_ptrInactive>m_ptrNext; } template void Get(Functor refFunctor) { //ASSERT(thread == consumer) // // These is no [good] reason to fence here, as it is optimal to let the CPU handle memory as it // naturally does. The producer thread may not see our memory updates right away and subsequently // add extra nodes when it doesn't have to, but any extra nodes will be used anyway in subsequent // calls to Put(). // while ( m_ptrActive != m_ptrInactive ) { refFunctor(static_cast(*m_ptrActive)); m_ptrActive = m_ptrActive>m_ptrNext; } } // // For accurate results, call this from the producer thread // size_t SizeAll() { // Count from inactive > inactive size_t count = 1; for (Node * ptrNode = m_ptrActive>m_ptrNext; ptrNode != m_ptrActive; ptrNode = ptrNode>m_ptrNext) ++count; return count; } // // Results may be inaccurate regardless of which thread calls this // size_t SizeActive() { // Count from active > inactive size_t count = 0; for (Node * ptrNode = m_ptrActive; ptrNode != m_ptrInactive; ptrNode = ptrNode>m_ptrNext) ++count; return count; }}; The calls to [font='courier new']std::atomic_signal_fence(...)[/font] will have to be rewritten for compilers that do not support C++11. In my code I do not directly call these, but instead call a library that mimics the code in the article linked by Josh. I changed them here as I do not want to include my entire library. An assumption made is that a [font='courier new']Queue[/font] object will not be destroyed without coordination between threads (e.g. one waits for the other to finish). In other words, the destructor will clobber concurrent calls to [font='courier new']Put()[/font]/[font='courier new']Get()[/font]. A caveat is that T objects are not destroyed until the [font='courier new']Queue[/font] object is destroyed. This is by design, since my particular implementation benefits from it. However, it can be trivially changed by: In [font='courier new']Node[/font]: adding [font='courier new']char object[sizeof(T)][/font] and removing the T inheritance in [font='courier new']Put()[/font]: calling [font='courier new']new (node>object) T()[/font] and changing the functor parameter to [font='courier new']static_cast(node>object)[/font] in [font='courier new']Get()[/font]: changing the functor parameter to [font='courier new']static_cast(node>object)[/font] and calling [font='courier new']static_cast(node>object)>T::~T()[/font] (syntax may not be correct, but you get the idea)

C++  legal for a template to have a nontemplate inner class?
jorgander replied to jorgander's topic in General and Gameplay Programming
I've been under the impression for the longest time that a template class cannot have virtual methods, and that MS VStudio allowed it like it allows other illegal/bad practices. I'm not really sure where I picked it up, but a bit of research has shed some light on what I was wrong about: A member function template shall not be virtual. Example: template <class T> struct AA { template <class C> virtual void g(C); // error virtual void f(); // OK }; This section of the function pointer tutorials also shows something I thought was illegal. Thanks for the replies 
C++  legal for a template to have a nontemplate inner class?
jorgander posted a topic in General and Gameplay Programming
... that overrides virtual functions in its parent? I know that this is illegal: class foo { public: virtual dostuff(); }; template <typename T> class bar : public foo { public: virtual dostuff(); }; but is this illegal: class foo { public: virtual dostuff(); }; template <typename T> class bar { private: class inner : public foo { T _t inner(T) {...} // and possibly other members/functions with T virtual dostuff(); }; inner _foo; public: ... }; I know that MS VStudio lets you do it, but it's not legal for a template class to derive from a class that has virtual methods. However, can a template class contain a nontemplate class that does it, as shown above? 
C++ template friend class and inheritance
jorgander replied to jorgander's topic in General and Gameplay Programming
Upgrading to VC++ Express 2010 does not fix it, so I'm assuming they fixed it only in the paid version. Unfortunately I'll have to go with composition for now. 
C++ template friend class and inheritance
jorgander replied to jorgander's topic in General and Gameplay Programming
I did try that, and I got the same error as in my second post. I also removed the forward declaration and tried it both ways  same errors. Interestingly enough, changing the relationship from "isa" to "hasa" compiles without error: template <typename T, typename GlobalTraits> class Global { private: Registry::GlobalInternal internal; ... }; Anyone know why? I would think the same access rules apply. Perhaps it has something to do with when the compiler is loading the class  with "hasa" relationship the compiler has already read the class declaration and checked access rules, but with "isa" it is still reading the class declaration (i.e. the class name and bases) when it encounters the base class name. 
C++ template friend class and inheritance
jorgander replied to jorgander's topic in General and Gameplay Programming
Quote:Original post by alvaro template <typename T, typename GlobalTraits> friend class Script::Global; Is that what you were looking for? I originally thought so, but upon adding class I am presented with this compiler error: 1>d:\development\cpp\assware\common\include\asw\script.h(315) : error C2248: 'ASW::Script::Registry::GlobalInternal' : cannot access private class declared in class 'ASW::Script::Registry' 1> d:\development\cpp\assware\common\include\asw\script.h(271) : see declaration of 'ASW::Script::Registry::GlobalInternal' 1> d:\development\cpp\assware\common\include\asw\script.h(237) : see declaration of 'ASW::Script::Registry' ... The error leads me to believe that the compiler is not matching the friend declaration with prior forward declaration, although I'm not sure of this. 
C++ template friend class and inheritance
jorgander posted a topic in General and Gameplay Programming
I don't see why what I'm trying to do shouldn't be possible, but for some reason I can't get VC++ 2008 Express to like it. Here's the basic idea: namespace Script { template <typename T, typename GlobalTraits> class Global; class Registry { private: class GlobalInternal { ... }; template <typename T, typename GlobalTraits> friend Script::Global; }; template <typename T, typename GlobalTraits> class Global : private Registry::GlobalInternal { ... }; } With the above code I get this error: 1>d:\development\cpp\assware\common\include\asw\script.h(291) : error C2955: 'ASW::Script::Global' : use of class template requires template argument list 1> d:\development\cpp\assware\common\include\asw\script.h(234) : see declaration of 'ASW::Script::Global' Does anyone know the correct syntax? In case anyone asks, Registry::GlobalInternal is supposed to be inaccessible from any class that isn't supposed to have access to it, but still have it separate from Global so that source code that uses it can be compiled on its own (i.e. not use a template class). If anyone knows of a cleaner way to accomplish that I would like to know of it. 
[C++] ambiguous call to overloaded function
jorgander replied to jorgander's topic in General and Gameplay Programming
That compiles alright, thanks for the suggestion. I'll have to test and make sure the correct version is called. 
[C++] ambiguous call to overloaded function
jorgander posted a topic in General and Gameplay Programming
I have these two string conversion functions: template <typename T> const T * UTIL::Convert(const T *, const size_t, size_t &); template <typename T, typename U> const T * UTIL::Convert(const U *, const size_t, size_t &); The first simply returns its parameter, since they are the same type and no conversion is necessary. The second performs a conversion using wcstombs, mbstowcs, or some other method. This line of code const UTIL::Char * ptrText = UTIL::Convert<UTIL::Char>(p_ptrText, p_intLength, uintLength); Produces this error 1>c:\development\assware\common\src\gui\factory.cpp(220) : error C2668: 'UTIL::Convert' : ambiguous call to overloaded function 1> c:\development\util\include\util\impl\traits.h(47): could be 'const T *UTIL::Convert<UTIL::Char,XML_Char>(const U *,const size_t,size_t &)' 1> with 1> [ 1> T=UTIL::Char, 1> U=XML_Char 1> ] 1> c:\development\util\include\util\impl\traits.h(14): or 'const T *UTIL::Convert<UTIL::Char>(const T *,const size_t,size_t &)' 1> with 1> [ 1> T=UTIL::Char 1> ] 1> while trying to match the argument list '(const XML_Char *, int, size_t)' I'm wondering how to achieve the effect of automatically calling the correct Covert method using templates. That is to say, the compiler should know when the types are the same and call the first method.

Advertisement