# Chess Engine - search development priority

This topic is 1726 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

## Recommended Posts

hi, I'm currently developing a chess engine in c++ (here is the github repository: https://github.com/crybot/NapoleonPP), it is called NapoleonPP (probably i will change the name...) and it is based on magic bitboards move generation.

I've recently completed move generation which i think is fast enough (it can calculate perft 6 in about 1,2 seconds without special improvements like threading or transposition tables) . So I started working on search and evaluation, my primary goal was to build a nice search function so i just created a very simple evaluation function that takes care only of material, movement of some pieces and center control.

Now i'm stuck...

I tried to play with the program, and i must say it plays a discrete chess, but it is sooo slow... at some point it takes more than 10 seconds to search under ply 6...

what i need now is to know what to do first to improve my engine speed.

I need to implement move ordering (by now i search first captures and then non captures), Null move pruning, pv-search, principal variation extraction, iterative deeping, transposition tables(actually i've implemented transposition tables, but i didn't see benefits on speed, maybe i did something wrong? ), ecc.ecc.

If someone could direct me to the right way (maybe with description and some resources on the argument) I would be very happy.

Here is my actual search function:

int Search::search(int depth, int alpha, int beta, Board &board)
{
int bound = Alpha;
int score;

if((score = board.Table.Probe(board.zobrist, depth, alpha, beta)) != TranspositionTable::Unknown)
return score;

if( depth == 0 )
{
score = quiesce(alpha, beta, board);
board.Table.Save(HashEntry(board.zobrist, depth, score, 0, Exact)); // should i save leaf nodes??
return score;
}

int pos = 0;
Move moves[Constants::MaxMoves];

BitBoard attackers = board.KingAttackers(board.KingSquare[board.SideToMove], board.SideToMove);

if (attackers)
else
MoveGenerator::GetAllMoves(moves, pos, board);

BitBoard pinned = board.GetPinnedPieces();

for (int i=0; i<pos; i++)
{
if (board.IsMoveLegal(moves[i], pinned))
{
board.MakeMove(moves[i]);
score = -search(depth - 1, -beta, -alpha, board);
board.UndoMove(moves[i]);

if( score >= beta )
{
board.Table.Save(HashEntry(board.zobrist, depth, beta, 0, Beta));
return beta;   //  fail hard beta-cutoff
}
if( score > alpha )
{
bound = Exact;
alpha = score; // alpha acts like max in MiniMax
}
}
}

board.Table.Save(HashEntry(board.zobrist, depth, alpha, 0, bound));
return alpha;
}


Here is Transposition Table Code:

    TranspositionTable::TranspositionTable(unsigned long size)
{
Size = size;
Table = new HashEntry[size];
}

void TranspositionTable::Save(HashEntry entry)
{
HashEntry* hash = &Table[entry.Hash % Size];

hash->Hash = entry.Hash;
hash->Score = entry.Score;
hash->Depth = entry.Depth;
hash->Bound = entry.Bound;
}

int TranspositionTable::Probe(ZobristKey key, int depth, int alpha, int beta)
{
HashEntry* hash = &Table[key % Size];

if (hash->Hash == key)
{
if (hash->Depth >= depth)
{
if (hash->Bound == Exact)
return hash->Score;
if (hash->Bound == Alpha && hash->Score <= alpha)
return alpha;
if (hash->Bound == Beta && hash->Score >= beta)
return beta;
}
}

return TranspositionTable::Unknown;
}



I made some test, in particular i calculated kNodes per second by incrementing a variable every time i call alphabeta and quiesce:

I got about 3000-4000 kn/s using 32 MB for the transposition table (even if it does not make almost any difference).

My system configuration:

O.S.: Arch linux 64 bit (xfce)

CPU: Intel i5 760 2.80 Ghz (Quad core)

It is fast enough? I'm average with other non professional engines?

thanks all

Edited by crybot

##### Share on other sites
Move ordering is probably the most important part, especially in the quiescence search. MVV-LVA is a simple and effective order for captures. For non-captures, search killer moves first and then everything else in history-heuristic order. You can then try to move captures with negative SEE to the end of the move list, so they are searched after non-captures.

Ah, and more important than most of those things, you should store the best move in the hash tables, and try it first in case of a hash hit where you cannot just return a score (because of not enough depth or a useless bound).

By the way, perft 6 in 1.2 seconds is pretty impressive, if it's from the starting position. My own engine fits a similar description to yours, and my perft is about 6 times slower than that (although we are not using the same hardware, of course).

##### Share on other sites

Thanks for your answer, now I see all more clear,  but I still have some doubts:

regarding Transposition tables, should i save leaf nodes during alphabeta (and maybe also during quiescence search)? I tried to run the program without saving them and it seems to work faster.

For storing the best move, can i just use the index of the move inside the array?

That's all, thank you, again.

P.S. i forgot one important thing: How can i prove my transposition table is working correctly?

Edited by crybot

##### Share on other sites

Thanks for your answer, now I see all more clear,  but I still have some doubts:
regarding Transposition tables, should i save leaf nodes during alphabeta (and maybe also during quiescence search)? I tried to run the program without saving them and it seems to work faster.

That's a minor detail. In my own engine, I don't use the transposition tables at all in quiescence search, but I know others do it. You will just have to test it. But it's not critical.

For storing the best move, can i just use the index of the move inside the array?

I used to do that in my old engine (Ruy-López), but that requires you to generate all the moves before you even try the move from the transposition tables. Since this move very often produces a beta cutoff, it's worth writing a function that checks validity of a move on a given position and try to avoid generating the move list altogether.

I use a 16-bit format for the move, as described here: http://chessprogramming.wikispaces.com/Encoding+Moves

P.S. i forgot one important thing: How can i prove my transposition table is working correctly?

That's a very difficult question. A few ideas:
* You can run a search up to depth D (I assume you are using iterative deepening) and then rerun it without clearing the transposition tables: It should be almost instantaneous.
* You can run a long search on a position with and without transposition tables, and the version with transposition tables should generally get to the same depth faster.
* You can match your program against itself, with and without transposition tables, at a fixed depth, and the results should be even.

I recommend you spend some time trying better replacement schemes. I use buckets of size 4, which seems to work very well.

##### Share on other sites

I used to do that in my old engine (Ruy-López), but that requires you to generate all the moves before you even try the move from the transposition tables. Since this move very often produces a beta cutoff, it's worth writing a function that checks validity of a move on a given position and try to avoid generating the move list altogether.

I use a 16-bit format for the move, as described here: http://chessprogramming.wikispaces.com/Encoding+Moves

That's sound cheaper than what I implemented now, I'll try.

You said that you can encode a move in 16 bit, but as far as I know the minimum is 32 bit (how do you manage castle and en passants?). Omitting this, I made some test when i was working on the move generator and it came out that my move encoding (which is a class with 4 byte fields ) is faster than using a single 32 bit integer for representing it. Should I change encode method for saving some memory in the transposition table?

That's a very difficult question. A few ideas:
* You can run a search up to depth D (I assume you are using iterative deepening) and then rerun it without clearing the transposition tables: It should be almost instantaneous.
* You can run a long search on a position with and without transposition tables, and the version with transposition tables should generally get to the same depth faster.
* You can match your program against itself, with and without transposition tables, at a fixed depth, and the results should be even.

for trying  point one I should first implement iterative deeping (when should i do that?).

I will try point 2 and 3 right now.

P.S. I've just implemented MVV-LVA and the search is now MUCH faster! thank you.

##### Share on other sites

I used to do that in my old engine (Ruy-López), but that requires you to generate all the moves before you even try the move from the transposition tables. Since this move very often produces a beta cutoff, it's worth writing a function that checks validity of a move on a given position and try to avoid generating the move list altogether.[/size]

I use a 16-bit format for the move, as described here: [/size]http://chessprogramming.wikispaces.com/Encoding+Moves

That's sound cheaper than what I implemented now, I'll try.
You said that you can encode a move in 16 bit, but as far as I know the minimum is 32 bit (how do you manage castle and en passants?).

Castling and en-passants are encoded in the 4-bit "move type" described in the page I linked to. That plus 6-bit "to" and "from" gets you to 16 bits.

Omitting this, I made some test when i was working on the move generator and it came out that my move encoding (which is a class with 4 byte fields ) is faster than using a single 32 bit integer for representing it. Should I change encode method for saving some memory in the transposition table?

No, you can always convert to and from the 16-bit format just for transposition-table access. That shouldn't be too slow.

That's a very difficult question. A few ideas:[/size]
* You can run a search up to depth D (I assume you are using iterative deepening) and then rerun it without clearing the transposition tables: It should be almost instantaneous.[/size]
* You can run a long search on a position with and without transposition tables, and the version with transposition tables should generally get to the same depth faster.[/size]
* You can match your program against itself, with and without transposition tables, at a fixed depth, and the results should be even.[/size]

for trying  point one I should first implement iterative deeping (when should i do that?).

Now would be a good time. It's basically impossible to implement time control without it, and it's very hard to work with an engine that doesn't control time properly.

P.S. I've just implemented MVV-LVA and the search is now MUCH faster! thank you.

Yup, that one is a biggie. Edited by Álvaro

##### Share on other sites

Now would be a good time. It's basically impossible to implement time control without it, and it's very hard to work with an engine that doesn't control time properly.

I've just implemented a sort of iterative deeping (without time control, just for testing the engine) and started checking correctness of the transposition table.

I have found a few bugs and the engine seems to work properly now:

* You can run a search up to depth D (I assume you are using iterative deepening) and then rerun it without clearing the transposition tables: It should be almost instantaneous.
* You can run a long search on a position with and without transposition tables, and the version with transposition tables should generally get to the same depth faster.
* You can match your program against itself, with and without transposition tables, at a fixed depth, and the results should be even.

point 1 and 2 are ok, but point at point 3 i noticed that then engine doesn't produce the same final state after a "self-play" with and without transposition table.

I also tried to build a perft routine with the transposition table and, probably for some kind of collision error, i got, at depth 6, a node count of 119060316 instead of 119060324 (that i can obtain if I reduce the table size...).

I think zobrist keys are ok, so what's going on?

The last thing I have doubts about is principal variation: I'm triyng to improve move ordering and it seems that principal variation moves are the most important moves to try during alpha-beta search, but, with transposition table implemented, aren't them already the first moves tried (best moves)?

Thanks.

Edited by crybot

##### Share on other sites
You need to debug the difference in perft with and without transposition tables. Can you post the definition of the type of your TT entries?

##### Share on other sites

You need to debug the difference in perft with and without transposition tables. Can you post the definition of the type of your TT entries?

namespace Napoleon
{
enum ScoreType { Exact, Alpha, Beta };

class HashEntry
{
public:
ZobristKey Hash; // unsigned long long
Byte Depth; // unsigned char
Byte Bound; // unsigned char
Move BestMove; // 32 bit integer wrapper
int Score;

HashEntry();
HashEntry(ZobristKey, Byte, int, Move, Byte);
};
}


##### Share on other sites

The data structure seems correct. The only comment I have is that if you switch to using a 16-bit representation for the move, the size of a hash entry will go from 24 bytes to 16 bytes (this depends on the compiler, of course).

##### Share on other sites

The data structure seems correct. The only comment I have is that if you switch to using a 16-bit representation for the move, the size of a hash entry will go from 24 bytes to 16 bytes (this depends on the compiler, of course).

I know, I planned to change move encoding not just for the hashentry, but also for move generation. It will take some time, so i'll do it when i can.

Anyway now a HashEntry is a 18 bytes object.

Can you tell me something about what i wrote in my previous post?

The last thing I have doubts about is principal variation: I'm triyng to improve move ordering and it seems that principal variation moves are the most important moves to try during alpha-beta search, but, with transposition table implemented, aren't them already the first moves tried (best moves)?

Edited by crybot

##### Share on other sites

Please, print sizeof(Napoleon::HashEntry) and see if you get 18 (you won't).

I agree that there is nothing special to do about principal-variation moves when sorting, because using the move from the hash should be about the same thing as picking it from the PV.

##### Share on other sites

Please, print sizeof(Napoleon::HashEntry) and see if you get 18 (you won't).

you are right... I didn't know that even if a char should be of size 1 byte the compiler allocate 4 byte for it(I still don't get why)...

##### Share on other sites

It doesn't... but what it does do (EDIT: by default) is align 32 bit ints on a 4 byte boundary, and 16 bit ints on a 2 byte boundary. So there is 2 bytes wasted after you have 2 bytes followed by an int.

You get the best packing in classes and structs by having the largest members first in the struct/class. Changing the order shouldn't require changes to any client code.

##### Share on other sites

In 64-bit code your struct will probably get a size of 24 bytes, because the 64-bit integer will be aligned to 8 bytes.

http://en.wikipedia.org/wiki/Data_structure_alignment

##### Share on other sites

I understand.

I finally completely debugged my engine and found an other few bugs and now the transposition table is bug free (tested with my perft suite).

Despite this, when i make my program match itself it still produce a different end state with and without transposition table.

What could be the problem? Are we sure that it should produce the same result?

for clarity i post my actual code:

alpha-beta search:

    int Search::search(int depth, int alpha, int beta, Board& board)
{
int bound = Alpha;
int pos = 0;
int score;
Move best = Constants::NullMove;
Move moves[Constants::MaxMoves];

if((score = board.Table.Probe(board.zobrist, depth, alpha, &best, beta)) != TranspositionTable::Unknown)
return score;

if(depth == 0)
return quiesce(alpha, beta, board);

BitBoard pinned = board.GetPinnedPieces();

// tries best move first (if there is one)
if(best != Constants::NullMove)
{
assert(board.IsMoveLegal(best, pinned));

legal++;
board.MakeMove(best);
score = -search(depth - 1, -beta, -alpha, board);
board.UndoMove(best);

if( score >= beta )
return beta;

if( score > alpha )
{
bound = Exact;
alpha = score;
}
}

MoveGenerator::GetPseudoLegalMoves<false>(moves, pos, board); // get captures and non-captures

for (int i=0; i<pos; i++)
{
if (board.IsMoveLegal(moves[i], pinned))
{
board.MakeMove(moves[i]);
score = -search(depth - 1, -beta, -alpha, board);
board.UndoMove(moves[i]);

if( score >= beta )
{
board.Table.Save(board.zobrist, depth, beta, best, Beta);
return beta;   //  fail hard beta-cutoff
}
if( score > alpha )
{
bound = Exact;
alpha = score; // alpha acts like max in MiniMax
best = moves[i];
}
}
}

board.Table.Save(board.zobrist, depth, alpha, best, bound);
return alpha;
}


transposition table class:

    TranspositionTable::TranspositionTable(unsigned long size)
{
Size = size;
Table = new HashEntry[size]; // dynamic allocation is fine during program initialization
}

void TranspositionTable::Save(ZobristKey key, Byte depth, int score, Move move, Byte bound)
{
HashEntry* hash = &Table[key % Size];

if (depth >= hash->Depth) // it runs faster than depth > hash->Depth
{
hash->Hash = key;
hash->Score = score;
hash->Depth = depth;
hash->Bound = bound;
hash->BestMove = move;
}
}

int TranspositionTable::Probe(ZobristKey key, Byte depth, int alpha, Move* move, int beta)
{
HashEntry* hash = &Table[key % Size];

if (hash->Hash == key)
{
if (hash->Depth >= depth)
{
if (hash->Bound == Exact)
return hash->Score;
if (hash->Bound == Alpha && hash->Score <= alpha)
return alpha;
if (hash->Bound == Beta && hash->Score >= beta)
return beta;
}

*move = hash->BestMove; // get best move on this position
}
return TranspositionTable::Unknown;
}


could it be a replacement scheme problem or maybe a hash collision one?

Edited by crybot

##### Share on other sites
You can't expect the state to be the same (see [url="http://chessprogramming.wikispaces.com/Search+Instability"search instability). But you can expect playing at a fixed depth should produce similar playing strength with and without transposition tables, which is what I suggested you test.

##### Share on other sites

You can't expect the state to be the same (see [url="http://chessprogramming.wikispaces.com/Search+Instability"search instability). But you can expect playing at a fixed depth should produce similar playing strength with and without transposition tables, which is what I suggested you test.

sorry then, I misunderstood what you said.

Now I think i have no more problems and dubts with search, I just have to improve move ordering and try to apply more advanced search strategies.

Thank you for all

Edited by crybot