About this blog

Hardware Design and a Little Software Too

Entries in this blog

 

Ouchy

Shortly after I posted the thread concerning my shortness of memory :p a friend of mine, who regularly emails me funny or painful videos, sent me this one.

So that pretty much put a damper on my night since I was in a similar accident, but worse, like 12 years ago while off roading with my father (he used to race sprint cars and baja before he went into the Air Force, and now regularly goes to NASCAR races to work with pit crews behind the wall or something like that). The last thing I remember was the ground coming at my window.

Anyway, before my change in mood, I was working on some simple power controls for the GPC expansion slots.



Still needs work but it is a start.

Caitlin


 

Wait, Not so Fast!

Fortunately, I have never needed a wait state generator for anything I have designed. Now that I am using an expansion system, I need one, because the CPU can access memory on an expansion board and complete transfers to and from that memory. The CPU writes a control word to a register that selects which expansion slot it wants to communicate with (not to be confused with an I/O address), the starting addresses for reading and writing, and the address increment/decrement controls. It then accesses the memory on the card as a single I/O address through a "BIOS" port.

For this to work, the expansion circuitry needs to keep track of which addresses the CPU is reading and writing. This is where the wait state generator comes into play: it tells the CPU to wait until the circuitry is ready. When the CPU accesses the BIOS port, the address counters automatically increment or decrement, hence the need for two wait states while the counters and output circuitry set up for the access.

This BIOS memory on each card is how the plug and use system works. When a card is plugged into a slot, the system recognizes the change and automatically tries to identify the card type, hardware, drivers, system resource requirements, etc.
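To make the access pattern above concrete, here is a rough software model of one card's memory seen through a single auto-stepping port. It is only a sketch: the class name `BiosPort`, its methods, and the wrap-around at the end of memory are illustrative assumptions, not part of the actual hardware design. The only figure taken from the entry is the two wait states per access.

```python
from dataclasses import dataclass

WAIT_STATES_PER_ACCESS = 2  # figure quoted in the entry


@dataclass
class BiosPort:
    """Toy model of one card's memory seen through a single I/O port (hypothetical)."""
    memory: bytearray
    read_addr: int = 0
    write_addr: int = 0
    step: int = 1    # +1 = auto-increment, -1 = auto-decrement
    waits: int = 0   # total wait states inserted so far

    def configure(self, read_start, write_start, step):
        # Stands in for the control word the CPU writes before a transfer.
        self.read_addr = read_start
        self.write_addr = write_start
        self.step = step

    def read(self):
        # The CPU is held while the counter and output circuitry settle.
        self.waits += WAIT_STATES_PER_ACCESS
        value = self.memory[self.read_addr]
        # Wrap-around is an assumption made to keep the model simple.
        self.read_addr = (self.read_addr + self.step) % len(self.memory)
        return value

    def write(self, value):
        self.waits += WAIT_STATES_PER_ACCESS
        self.memory[self.write_addr] = value & 0xFF
        self.write_addr = (self.write_addr + self.step) % len(self.memory)
```

Reading the same port repeatedly walks through the card's memory, which is roughly how an identification block could be pulled off a freshly inserted card without the CPU supplying an address each time.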





The top picture is the timing that I will need to follow while the bottom picture is the basic wait state generator (with some test parts added) in a simulation program I use.

Caitlin


 

DMA is Good

I am currently putting the finishing touches on the DMA control portion of the GPC. I have decided to use superior yet harder to find Intel DMA chips rather than the Zilog types. Why? Because they are much better to interface hardware to. Here is the schematic of the DMA section:



This circuit takes eight DMA request lines from expansion slots plus two from the main board, prioritizes them, and presents them to the CPU in accordance with system timing specifications. Automatic address control and bank switching for 16 banks of 64 KB of memory is also provided.
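As a rough illustration of the two jobs described above, the following sketch picks a winner among ten request lines and forms an address from a bank number and an offset. The fixed priority order (line 0 highest) and both function names are assumptions made for the example; the entry does not say which priority scheme the hardware uses.

```python
BANK_SIZE = 64 * 1024   # 64 KB per bank, from the entry
NUM_BANKS = 16          # from the entry
NUM_REQUESTS = 10       # eight slot lines plus two main-board lines


def highest_priority_request(request_bits):
    """Return the index of the winning request line, or None if idle.

    Assumption: fixed priority with line 0 highest.
    """
    for line in range(NUM_REQUESTS):
        if request_bits & (1 << line):
            return line
    return None


def physical_address(bank, offset):
    """Combine a 4-bit bank number with a 16-bit offset into a 20-bit address."""
    if not 0 <= bank < NUM_BANKS:
        raise ValueError("bank out of range")
    if not 0 <= offset < BANK_SIZE:
        raise ValueError("offset out of range")
    return bank * BANK_SIZE + offset
```

Sixteen banks of 64 KB add up to 1 MB, so the bank number effectively supplies the top four bits of a 20-bit address.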

Here is the timer section, which is pretty simple:



This section includes six individual counter/timers on two Intel 8253 CTC chips. One counter/timer from each chip is connected to multiplexing/demultiplexing circuitry to allow the selection of one counter/timer input from an array of eight inputs. Both of these timer/counter circuits can be chained together or looped through software control. Again (see my reply concerning PIN5 and /EO), pins are left floating in the schematic for now because there is circuitry which belongs to the CTC block but is routed through the interrupt block for proper function.

As is my custom in designing circuits, this can be slightly modified to work with most older Intel based designs (which includes the Z80). If you are familiar with older Intel hardware you will notice that the circuit uses Intel control signals. These signals are routed through the "system control" portion of the main board in order to interface directly with the CPU being used.

The expansion ports of this system pose several problems which I will enjoy solving. The most significant is the ability to move data between expansion ports without involving the CPU or system buses. So the CPU could be doing calculations, such as figuring out how to traverse an obstacle, while data is being streamed from a video sensor/converter card to a vision processor card (this is just an example), all without interfering with each other. Currently I'm thinking about using a microcontroller to control the expansion buses. This is a good idea because, unlike DMA, it allows adaptive streaming of data between ports.

Caitlin


 

Much better

I have started to modify the Z80 based system I mentioned in my last entry, and it feels much more satisfying than coding at a PC for me.


That is the PCB layout of the system I am modifying. It was a general purpose controller with 2 serial I/O ports, 4 timer/counter interfaces, and between 56 and 64 bits of I/O depending on configuration. A lot needs to be done to allow memory bank switching, task control, "protected mode", plug and go, and several other things. If you use controllers in your hobby projects, feel free to give me some input on what additions you would find useful. One thing I have been thinking about is ADC/DAC interfaces.

Caitlin


 

Switching to Hardware (Again)

So I have decided to start designing hardware again. I like a lot of the threads here at GameDev and would like to continue to offer what little help I can to people here, so I will pretty much keep visiting and posting.

My current project is a more powerful version of a Z80 based multitasking GPC that I helped a guy I work with design and build. I will also be designing an 8085 version. Both will have support for plug and go (plug and play sounds silly considering the thing is not for playing games) using eight AT-esque expansion slots.

Why use CPUs that are so old, you ask? Because they're simple to design systems for, and hobbyists still use 8-bit systems quite often.

Caitlin


 

Sigh

So I have been dabbling with programming PCs for several years now, doing it in spurts of about 1-2 months then 4-5 months of not.

I just can't get much satisfaction from it. I'm happier programming a homebrew system using DIP switches, LEDs, and toggle switches. It just seems more "fun" that way, interfacing directly with a computer's bus and control lines. Of course the best part is building the hardware from the circuit board up.

Right now I just don't have any motivation to program anything for PCs. Maybe it's because the weather turned crappy, who knows.

Caitlin


 

Too much to do, too little time to do it

Work has kept me pretty busy since my last entry, between mission planning, flying, and resting.

One thing I have been thinking about is implementing a multiple line drawing algorithm using SIMD, but it seems like it won't work too well. Another thing I have been working on is my orbital mechanics function library. Most of the functions seem too simple to put in a library, so I will probably end up just writing each one inline where it is needed instead of calling it as a subroutine.

I have also been reading some of the background stories that were written about the project I inherited. They are quite interesting and I might rewrite them while adding some of my own thoughts and imagination.

Caitlin


 

Brain Doesn't Sleep :((

I have been awoken at some odd hour yet again by my ever problem generating/solving brain. Happily asleep, then poof, I'm awake thinking about the problem of raising m to the n power. Big deal, you may say, so she wants to do simple math. As usual, my brain was cranking away in another one of my problem generating/solving half-dreams. When I went to bed, programming and anything related to it was not even close to being on my mind, yet I was pretty much solving the problem of doing power calculations using SIMD instructions in a "dream". I know that I won't be able to get back to sleep unless I get the problem out of my head....

Ah, good for me, it looks like this might be silly to try. Now hopefully I will be able to sleep without getting woken up again. If only Intel had made a horizontal multiply instruction when they made the horizontal add instruction, I might have been in business....

a LOT OF HOURS, ERRANDS, AND ONLY ONE BEER LATER i CAN'T FEEL MY TONGUE. oH i THINK i WON'T BE GETTING MUCH PROGRAMMING DONE TONIGHT BUT i'LL DEFINATELY TRY TO. sINCE i ALREADY CAN'T FEEL MY TONGUE WITH ONE BERR i MIGHT AS WELL HAVE ANOTHER :)) AND CONMTINUE WORKING ON THIS ROUTINE. bASICALLY WHAT i', TRYING TO DO IS TEXTURE RESIZING/MIP-MAPPING FOR SQUARE TEXTURES WHICH MEASURE N PIXELS ON EACH SIDE (N MUST BE POWERS OF 3 THOUGH). YAY! :))

Now I'm trying to figure out how RGBA is stored. Is it sored as RBGA in a single 32-bit dword, or is the alpha value stored aserately? Off to Goole to do a search and figur it out. BTE, beer count = 3.
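For what it is worth, one common arrangement keeps all four channels in a single 32-bit value, but the channel order differs between APIs and pixel formats, so it has to be checked for whatever format is actually in use. The sketch below assumes an ARGB order (alpha in the top byte, blue in the bottom byte); the function names are made up for the example.

```python
def pack_argb(r, g, b, a):
    """Pack four 8-bit channels into one 32-bit value as 0xAARRGGBB (assumed order)."""
    for channel in (r, g, b, a):
        if not 0 <= channel <= 255:
            raise ValueError("channel out of range")
    return (a << 24) | (r << 16) | (g << 8) | b


def unpack_argb(pixel):
    """Split a 0xAARRGGBB value back into (r, g, b, a)."""
    return (
        (pixel >> 16) & 0xFF,  # red
        (pixel >> 8) & 0xFF,   # green
        pixel & 0xFF,          # blue
        (pixel >> 24) & 0xFF,  # alpha
    )
```

With this layout a whole pixel moves as one dword, so four pixels fit in a single XMM register.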

Caitlin


 

4 x 4 ASM Matrix Multiplication

So I spent a fair amount of time yesterday and this morning trying to get around the fact that it appeared that I needed to do a lot of data shuffling to multiply two 4x4 matrices using SIMD instructions. I didn't want to accept that, because shuffling data around is a waste of time to me unless it absolutely HAS to be done, so I looked up some things on Intel's web site and, unfortunately, I was right. I have to do a ridiculous amount of data shuffling. OMG it's horrible, so ugly it makes me want to [bawling]. Oh well, it HAS to be done, so I'll have to live with it.

Here is the unedited Intel code:

multiply_4x4_matrix: ;multiplies two 4 x 4 matrices

mov edx, dword ptr [esp+4] ; src1
mov eax, dword ptr [esp+0Ch] ; dst
mov ecx, dword ptr [esp+8] ; src2
movss xmm0, dword ptr [edx]
movaps xmm1, xmmword ptr [ecx]
shufps xmm0, xmm0, 0
movss xmm2, dword ptr [edx+4]
mulps xmm0, xmm1
shufps xmm2, xmm2, 0
movaps xmm3, xmmword ptr [ecx+10h]
movss xmm7, dword ptr [edx+8]
mulps xmm2, xmm3
shufps xmm7, xmm7, 0
addps xmm0, xmm2
movaps xmm4, xmmword ptr [ecx+20h]
movss xmm2, dword ptr [edx+0Ch]
mulps xmm7, xmm4
shufps xmm2, xmm2, 0
addps xmm0, xmm7
movaps xmm5, xmmword ptr [ecx+30h]
movss xmm6, dword ptr [edx+10h]
mulps xmm2, xmm5
movss xmm7, dword ptr [edx+14h]
shufps xmm6, xmm6, 0
addps xmm0, xmm2
shufps xmm7, xmm7, 0
movlps qword ptr [eax], xmm0
movhps qword ptr [eax+8], xmm0
mulps xmm7, xmm3
movss xmm0, dword ptr [edx+18h]
mulps xmm6, xmm1
shufps xmm0, xmm0, 0
addps xmm6, xmm7
mulps xmm0, xmm4
movss xmm2, dword ptr [edx+24h]
addps xmm6, xmm0
movss xmm0, dword ptr [edx+1Ch]
movss xmm7, dword ptr [edx+20h]
shufps xmm0, xmm0, 0
shufps xmm7, xmm7, 0
mulps xmm0, xmm5
mulps xmm7, xmm1
addps xmm6, xmm0
shufps xmm2, xmm2, 0
movlps qword ptr [eax+10h], xmm6
movhps qword ptr [eax+18h], xmm6
mulps xmm2, xmm3
movss xmm6, dword ptr [edx+28h]
addps xmm7, xmm2
shufps xmm6, xmm6, 0
movss xmm2, dword ptr [edx+2Ch]
mulps xmm6, xmm4
shufps xmm2, xmm2, 0
addps xmm7, xmm6
mulps xmm2, xmm5
movss xmm0, dword ptr [edx+34h]
addps xmm7, xmm2
shufps xmm0, xmm0, 0
movlps qword ptr [eax+20h], xmm7
movss xmm2, dword ptr [edx+30h]
movhps qword ptr [eax+28h], xmm7
mulps xmm0, xmm3
shufps xmm2, xmm2, 0
movss xmm6, dword ptr [edx+38h]
mulps xmm2, xmm1
shufps xmm6, xmm6, 0
addps xmm2, xmm0
mulps xmm6, xmm4
movss xmm7, dword ptr [edx+3Ch]
shufps xmm7, xmm7, 0
addps xmm2, xmm6
mulps xmm7, xmm5
addps xmm2, xmm7
movaps xmmword ptr [eax+30h], xmm2
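Reading through the listing, each `movss` plus `shufps ..., 0` pair broadcasts one element of src1 across a register, and each `mulps`/`addps` pair multiplies that broadcast by one four-float row of src2 and adds it to a running total. A plain reference model of that pattern, useful for checking results, might look like the sketch below. The function name and the list-of-rows representation are assumptions made for the example.

```python
def multiply_4x4(src1, src2):
    """Return dst where dst[i] = sum over k of src1[i][k] * (row k of src2)."""
    dst = []
    for i in range(4):
        acc = [0.0, 0.0, 0.0, 0.0]
        for k in range(4):
            scalar = src1[i][k]  # movss + shufps: one element broadcast to four lanes
            # mulps + addps: scale row k of src2 and accumulate
            acc = [a + scalar * b for a, b in zip(acc, src2[k])]
        dst.append(acc)
    return dst
```

Seen this way, the shuffles are all broadcasts: every element of src1 has to be copied into all four lanes before it can scale a row of src2.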

Caitlin


 

Getting into Windows

Ok, so I've been writing ASM code for all kinds of processors since I was like 14. Most of them have been from the x86 family, and when I have programmed them in a computer system it was usually DOS. As much as I don't like it I think it is about time to get into Windows programming.

What I'm currently working on is an ASM program that's composed of objects. It might sound strange at first, but it's a really simple concept to understand, and has probably already been done by someone.

Basically each object has code that controls it. Its shape, texture, movement, reactions to other objects, etc. are all controlled by the object itself rather than a traditional engine. I have been working on this for about two years now, mostly when I have nothing else to do. The only major thing that an engine is in control of is integrating all of the interactions of different objects together to form a single output stream to the user. Other things include inputs, constant forces, timekeeping, networking, etc.

While browsing the site I have noticed a number of posts regarding the problem of calculating the distance between two points, so I decided I should write a quick ASM routine to do this while it was still fresh in my mind. I don't have a library of 3D and graphics related functions yet, so this would be a good place to start. So here is the code I threw together in about 10 minutes, unoptimized, untested, and basically unchecked. I will do all that later, but it just occurred to me that I also need a routine that computes the distances between multiple sets of 2D points.

dist_between_points: ;calculates the distance between two points in 3D space

MOVAPS XMM0, [point2] ;load XMM0 with point 2 (packed as n, x2, y2, z2; must be 16-byte aligned)
MOVAPS XMM1, [point1] ;load XMM1 with point 1 (packed as n, x1, y1, z1; must be 16-byte aligned)
SUBPS XMM0, XMM1 ;subtract XMM1 from XMM0 (n must be the same in both points so its lane becomes 0)
MULPS XMM0, XMM0 ;multiply XMM0 by itself (square x, y, and z)
HADDPS XMM0, XMM0 ;add FP values in XMM0 horizontally (n + x, y + z); HADDPS requires SSE3
HADDPS XMM0, XMM0 ;add horizontally again ((n + x) + (y + z))
SQRTPS XMM0, XMM0 ;calculate the square root to get our distance (same value in all four lanes)
MOVAPS [distance], XMM0 ;store the distance to use later
RET ;return to caller (no EMMS needed: it only resets MMX state, which SSE does not touch)
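Since the routine is untested, a lane-by-lane model is a handy way to check what it should produce. The sketch below mirrors the SUBPS, MULPS, HADDPS, HADDPS, SQRTPS sequence on four-element lists; the helper names are made up for the example.

```python
import math


def hadd(v):
    """Model HADDPS with the same register as both operands."""
    return [v[0] + v[1], v[2] + v[3], v[0] + v[1], v[2] + v[3]]


def dist_between_points(p1, p2):
    """p1 and p2 are four-lane lists packed as (n, x, y, z)."""
    diff = [b - a for a, b in zip(p1, p2)]   # SUBPS
    squared = [d * d for d in diff]          # MULPS
    total = hadd(hadd(squared))              # two HADDPS: every lane holds the full sum
    return [math.sqrt(t) for t in total]     # SQRTPS
```

One thing the model makes visible: the n lane is summed along with x, y, and z, so the result is only the 3D distance when n is the same in both points.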



...and the code for two sets of 2D points....

dist_two_points_2d: ;calculates the distances between two sets of points in 2D space

MOVAPS XMM0, [set2] ;load XMM0 with point set 2 (packed as x2a, y2a, x2b, y2b)
MOVAPS XMM1, [set1] ;load XMM1 with point set 1 (packed as x1a, y1a, x1b, y1b)
SUBPS XMM0, XMM1 ;subtract XMM1 from XMM0 (x2a - x1a, y2a - y1a, x2b - x1b, y2b - y1b)
MULPS XMM0, XMM0 ;multiply XMM0 by itself (square xa, ya, xb, yb)
HADDPS XMM0, XMM0 ;add FP values in XMM0 horizontally (xa + ya, xb + yb, xa + ya, xb + yb)
SQRTPS XMM0, XMM0 ;calculate the square roots to get distances (a, b, a, b)
MOVAPS XMM1, XMM0 ;copy the distances
SHUFPS XMM0, XMM0, 00h ;broadcast distance a to all four lanes
SHUFPS XMM1, XMM1, 55h ;broadcast distance b to all four lanes
MOVAPS [distance_a], XMM0 ;store distance a
MOVAPS [distance_b], XMM1 ;store distance b
RET ;return to caller (no EMMS needed for SSE code)
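A small model of what the two-pair routine is meant to compute, under the packing given in its comments (x and y of pair a in the first two lanes, x and y of pair b in the last two). The function name is an assumption made for the example.

```python
import math


def dist_two_point_sets_2d(set1, set2):
    """set1 = (x1a, y1a, x1b, y1b), set2 = (x2a, y2a, x2b, y2b)."""
    diff = [b - a for a, b in zip(set1, set2)]   # SUBPS
    squared = [d * d for d in diff]              # MULPS
    dist_a = math.sqrt(squared[0] + squared[1])  # HADDPS lane 0, then SQRTPS
    dist_b = math.sqrt(squared[2] + squared[3])  # HADDPS lane 1, then SQRTPS
    return dist_a, dist_b
```

After the horizontal add, the two sums sit in lanes 0 and 1 (and are repeated in lanes 2 and 3), so one square root instruction yields both distances.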



After messing around with trying to multiply two matrices together I decided to take a break and write something easy, a routine to scale a matrix by a scaling value:

scale_matrix: ;scales a matrix by scaling value xs, ys, zs, n

MOVAPS XMM4, [scalar] ;load XMM4 with the scaling value (packed as xs, ys, zs, n)
MOVAPS XMM0, [column0] ;load XMM0 with first column (packed as x, y, z, n)
MOVAPS XMM1, [column1] ;load XMM1 with second column (packed same as above)
MULPS XMM0, XMM4 ;multiply first column by scalar
MOVAPS XMM2, [column2] ;load XMM2 with third column (packed same as above)
MULPS XMM1, XMM4 ;multiply second column by scalar
MOVAPS XMM3, [column3] ;load XMM3 with fourth column (packed same as above)
MULPS XMM2, XMM4 ;multiply third column by scalar
MOVAPS [column0], XMM0 ;store first column
MULPS XMM3, XMM4 ;multiply fourth column by scalar
MOVAPS [column1], XMM1 ;store second column
MOVAPS [column2], XMM2 ;store third column
MOVAPS [column3], XMM3 ;store fourth column
RET ;return to caller (no EMMS needed for SSE code)


...and naturally the translation of a matrix....

translate_matrix: ;translates a matrix by translation value xt, yt, zt, n

MOVAPS XMM4, [translate] ;load XMM4 with the translation value (packed as xt, yt, zt, n)
MOVAPS XMM0, [column0] ;load XMM0 with first column (packed as x, y, z, n)
MOVAPS XMM1, [column1] ;load XMM1 with second column (packed same as above)
ADDPS XMM0, XMM4 ;add translation value to first column
MOVAPS XMM2, [column2] ;load XMM2 with third column (packed same as above)
ADDPS XMM1, XMM4 ;add translation value to second column
MOVAPS XMM3, [column3] ;load XMM3 with fourth column (packed same as above)
ADDPS XMM2, XMM4 ;add translation value to third column
MOVAPS [column0], XMM0 ;store first column
ADDPS XMM3, XMM4 ;add translation value to fourth column
MOVAPS [column1], XMM1 ;store second column
MOVAPS [column2], XMM2 ;store third column
MOVAPS [column3], XMM3 ;store fourth column
RET ;return to caller (no EMMS needed for SSE code)
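Both the scaling and translation routines apply the same four-lane value to each of the four columns, one lane at a time. A quick model for checking results, with made-up function names and each column held as a four-element list packed as (x, y, z, n):

```python
def scale_columns(columns, scale):
    """Multiply every column lane-wise by scale = (xs, ys, zs, n), as MULPS does."""
    return [[c * s for c, s in zip(column, scale)] for column in columns]


def translate_columns(columns, offset):
    """Add offset = (xt, yt, zt, n) lane-wise to every column, as ADDPS does."""
    return [[c + t for c, t in zip(column, offset)] for column in columns]
```

Because the fourth lane takes part like any other, the n lane of the scaling value needs to be 1, and the n lane of the translation value needs to be 0, if the fourth component of each column is meant to stay unchanged.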


So after running a couple of short errands and wasting some time I thought of what I could add to my routines. The next step I thought about was to make 'streaming' versions of the scaling and translation routines:

scale_matrix_stream: ;scales a number of matrices by scaling values xs, ys, zs, n

MOVAPS XMM7, [scalar] ;load XMM7 with the scaling value (packed as xs, ys, zs, n)
MOV EDX, [matrices] ;assumed: EDX points to the first matrix (four 16-byte columns each, stored back to back)
MOV ECX, [repetitions] ;load ECX with the number of matrices to process
SHR ECX, 1 ;ECX = number of pairs; LSB of the count goes into CF
JNC checkpairs ;the number of matrices is even; skip the single-matrix block
MOVAPS XMM0, [EDX] ;load XMM0 with first column (packed as x, y, z, n)
MOVAPS XMM1, [EDX+10h] ;load XMM1 with second column (packed same as above)
MULPS XMM0, XMM7 ;multiply first column by scalar
MOVAPS XMM2, [EDX+20h] ;load XMM2 with third column (packed same as above)
MULPS XMM1, XMM7 ;multiply second column by scalar
MOVAPS XMM3, [EDX+30h] ;load XMM3 with fourth column (packed same as above)
MULPS XMM2, XMM7 ;multiply third column by scalar
MOVAPS [EDX], XMM0 ;store first column
MULPS XMM3, XMM7 ;multiply fourth column by scalar
MOVAPS [EDX+10h], XMM1 ;store second column
MOVAPS [EDX+20h], XMM2 ;store third column
MOVAPS [EDX+30h], XMM3 ;store fourth column
ADD EDX, 40h ;advance to the next matrix
checkpairs:
TEST ECX, ECX ;see if ECX = 0
JZ streamdone ;finish if there are no pairs to process
evennumber:
MOVAPS XMM0, [EDX] ;load XMM0 with set 0, first column (packed as x, y, z, n)
MOVAPS XMM1, [EDX+10h] ;load XMM1 with set 0, second column (packed same as above)
MULPS XMM0, XMM7 ;multiply set 0, first column by scalar
MOVAPS XMM2, [EDX+20h] ;load XMM2 with set 0, third column (packed same as above)
MULPS XMM1, XMM7 ;multiply set 0, second column by scalar
MOVAPS XMM3, [EDX+30h] ;load XMM3 with set 0, fourth column (packed same as above)
MULPS XMM2, XMM7 ;multiply set 0, third column by scalar
MOVAPS XMM4, [EDX+40h] ;load XMM4 with set 1, first column (packed same as above)
MULPS XMM3, XMM7 ;multiply set 0, fourth column by scalar
MOVAPS XMM5, [EDX+50h] ;load XMM5 with set 1, second column (packed same as above)
MULPS XMM4, XMM7 ;multiply set 1, first column by scalar
MOVAPS XMM6, [EDX+60h] ;load XMM6 with set 1, third column (packed same as above)
MULPS XMM5, XMM7 ;multiply set 1, second column by scalar
MOVAPS [EDX], XMM0 ;store set 0, first column to free XMM0 for use
MULPS XMM6, XMM7 ;multiply set 1, third column by scalar
MOVAPS XMM0, [EDX+70h] ;load XMM0 with set 1, fourth column
MOVAPS [EDX+10h], XMM1 ;store set 0, second column
MULPS XMM0, XMM7 ;multiply set 1, fourth column by scalar
MOVAPS [EDX+20h], XMM2 ;store set 0, third column
MOVAPS [EDX+30h], XMM3 ;store set 0, fourth column
MOVAPS [EDX+40h], XMM4 ;store set 1, first column
MOVAPS [EDX+50h], XMM5 ;store set 1, second column
MOVAPS [EDX+60h], XMM6 ;store set 1, third column
MOVAPS [EDX+70h], XMM0 ;store set 1, fourth column
ADD EDX, 80h ;advance past both matrices
DEC ECX ;decrement counter
JNZ evennumber ;process two more matrices if ECX is not 0
streamdone:
RET ;return to caller (no EMMS needed for SSE code)

If you understand ASM enough to read that, you may ask why I have put two separate calculation blocks in the routine. It's simple - I want to process as many elements at one time as I can without any branching. So the first block is processed if the number of matrices to process is odd, then the code continues to the next block where two matrices are processed before branching.
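The control flow described above (one leftover matrix first if the count is odd, then two matrices per loop pass) can be sketched in a few lines. This is only a model of the structure, with made-up names; it also returns how many loop branches were taken so the saving is easy to see.

```python
def scale_stream(matrices, scale):
    """Scale every matrix in place; return the number of loop passes taken.

    Each matrix is a list of four columns, each column a four-element list.
    """
    def scale_one(matrix):
        for column in matrix:
            for lane in range(4):
                column[lane] *= scale[lane]

    # SHR ECX, 1 does the same split: quotient stays in ECX, remainder lands in CF.
    pairs, odd = divmod(len(matrices), 2)
    index = 0
    if odd:
        scale_one(matrices[index])      # the single-matrix block
        index += 1
    passes = 0
    while pairs:
        scale_one(matrices[index])      # set 0
        scale_one(matrices[index + 1])  # set 1
        index += 2
        pairs -= 1
        passes += 1
    return passes
```

For five matrices this takes two loop passes instead of five, which is the whole point of processing two per pass. The `while` test also covers an empty list, a case worth guarding against in the assembly version as well.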



So that's all for now. Feel free to comment as you like, but please keep comments constructive, otherwise I will just ignore them. Bye bye :)

Caitlin
