\documentclass[b5paper]{book}
\usepackage{hyperref}
\usepackage{makeidx}
\usepackage{amssymb}
\usepackage{color}
\usepackage{alltt}
\usepackage{graphicx}
\usepackage{layout}
\def\union{\cup}
\def\intersect{\cap}
\def\getsrandom{\stackrel{\rm R}{\gets}}
\def\cross{\times}
\def\cat{\hspace{0.5em} \| \hspace{0.5em}}
\def\catn{$\|$}
\def\divides{\hspace{0.3em} | \hspace{0.3em}}
\def\nequiv{\not\equiv}
\def\approx{\raisebox{0.2ex}{\mbox{\small $\sim$}}}
\def\lcm{{\rm lcm}}
\def\gcd{{\rm gcd}}
\def\log{{\rm log}}
\def\ord{{\rm ord}}
\def\abs{{\mathit abs}}
\def\rep{{\mathit rep}}
\def\mod{{\mathit\ mod\ }}
\renewcommand{\pmod}[1]{\ ({\rm mod\ }{#1})}
\newcommand{\floor}[1]{\left\lfloor{#1}\right\rfloor}
\newcommand{\ceil}[1]{\left\lceil{#1}\right\rceil}
\def\Or{{\rm\ or\ }}
\def\And{{\rm\ and\ }}
\def\iff{\hspace{1em}\Longleftrightarrow\hspace{1em}}
\def\implies{\Rightarrow}
\def\undefined{{\rm ``undefined"}}
\def\Proof{\vspace{1ex}\noindent {\bf Proof:}\hspace{1em}}
\let\oldphi\phi
\def\phi{\varphi}
\def\Pr{{\rm Pr}}
\newcommand{\str}[1]{{\mathbf{#1}}}
\def\F{{\mathbb F}}
\def\N{{\mathbb N}}
\def\Z{{\mathbb Z}}
\def\R{{\mathbb R}}
\def\C{{\mathbb C}}
\def\Q{{\mathbb Q}}
\definecolor{DGray}{gray}{0.5}
\newcommand{\emailaddr}[1]{\mbox{$<${#1}$>$}}
\def\twiddle{\raisebox{0.3ex}{\mbox{\tiny $\sim$}}}
\def\gap{\vspace{0.5ex}}
\makeindex
\begin{document}
\frontmatter
\pagestyle{empty}
\title{Multi--Precision Math}
\author{\mbox{
%\begin{small}
\begin{tabular}{c}
Tom St Denis \\
Algonquin College \\
\\
Mads Rasmussen \\
Open Communications Security \\
\\
Greg Rose \\
QUALCOMM Australia \\
\end{tabular}
%\end{small}
}
}
\maketitle
This text has been placed in the public domain. This text corresponds to the v0.39 release of the
LibTomMath project.

\begin{alltt}
Tom St Denis
111 Banning Rd
Ottawa, Ontario
K2L 1C3
Canada

Phone: 1-613-836-3160
Email: tomstdenis@gmail.com
\end{alltt}

This text is formatted to the international B5 paper size of 176mm wide by 250mm tall using the \LaTeX{}
{\em book} macro package and the Perl {\em booker} package.

\tableofcontents
\listoffigures
\chapter*{Prefaces}
When I tell people about my LibTom projects and that I release them as public domain they are often puzzled.
They ask why I did it and especially why I continue to work on them for free. The best I can explain it is ``Because I can.''
Which seems odd and perhaps too terse for adult conversation. I often qualify it with ``I am able, I am willing.'' which
perhaps explains it better. I am the first to admit there is not anything that special with what I have done. Perhaps
others can see that too and then we would have a society to be proud of. My LibTom projects are what I am doing to give
back to society in the form of tools and knowledge that can help others in their endeavours.

I started writing this book because it was the most logical task to further my goal of open academia. The LibTomMath source
code itself was written to be easy to follow and learn from.
There are times, however, where pure C source code does not
explain the algorithms properly. Hence this book. The book literally starts with the foundation of the library and works
itself outwards to the more complicated algorithms. The use of both pseudo--code and verbatim source code provides a duality
of ``theory'' and ``practice'' that the computer science students of the world shall appreciate. I never deviate too far
from relatively straightforward algebra and I hope that this book can be a valuable learning asset.

This book and indeed much of the LibTom projects would not exist in their current form if it were not for a plethora
of kind people donating their time, resources and kind words to help support my work. Writing a text of significant
length (along with the source code) is a tiresome and lengthy process. Currently the LibTom project is four years old,
comprises literally thousands of users and over 100,000 lines of source code, TeX and other material. People like Mads and Greg
were there at the beginning to encourage me to work well. It is amazing how timely validation from others can boost morale to
continue the project. Definitely my parents were there for me by providing room and board during the many months of work in 2003.

To my many friends whom I have met through the years I thank you for the good times and the words of encouragement. I hope I
honour your kind gestures with this project.

Open Source. Open Academia. Open Minds.

\begin{flushright} Tom St Denis \end{flushright}

\newpage
I found the opportunity to work with Tom appealing for several reasons: not only could I broaden my own horizons, but I could also
contribute to educating others facing the problem of having to handle big number mathematical calculations.

This book is Tom's child and he has been caring and fostering the project ever since the beginning with a clear mind of
how he wanted the project to turn out. I have helped by proofreading the text and we have had several discussions about
the layout and language used.

I hold a master's degree in cryptography from the University of Southern Denmark and have always been interested in the
practical aspects of cryptography.

Having worked in the security consultancy business for several years in S\~{a}o Paulo, Brazil, I have been in touch with a
great deal of work in which multiple precision mathematics was needed. Understanding the possibilities for speeding up
multiple precision calculations is often very important since we deal with outdated machine architecture where modular
reductions, for example, become painfully slow.

This text is for people who stop and wonder when examining algorithms such as RSA for the first time and ask
themselves, ``You tell me this is only secure for large numbers, fine; but how do you implement these numbers?''

\begin{flushright}
Mads Rasmussen

S\~{a}o Paulo - SP

Brazil
\end{flushright}

\newpage
It's all because I broke my leg. That just happened to be at about the same time that Tom asked for someone to review the section of the book about
Karatsuba multiplication. I was laid up, alone and immobile, and thought ``Why not?'' I vaguely knew what Karatsuba multiplication was, but not
really, so I thought I could help, learn, and stop myself from watching daytime cable TV, all at once.

At the time of writing this, I've still not met Tom or Mads in meatspace. I've been following Tom's progress since his first splash on the
sci.crypt Usenet news group. I watched him go from a clueless newbie, to the cryptographic equivalent of a reformed smoker, to a real
contributor to the field, over a period of about two years. I've been impressed with his obvious intelligence, and astounded by his productivity.
Of course, he's young enough to be my own child, so he doesn't have my problems with staying awake.

When I reviewed that single section of the book, in its very earliest form, I was very pleasantly surprised.
So I decided to collaborate more fully,
and at least review all of it, and perhaps write some bits too. There's still a long way to go with it, and I have watched a number of close
friends go through the mill of publication, so I think that the way to go is longer than Tom thinks it is. Nevertheless, it's a good effort,
and I'm pleased to be involved with it.

\begin{flushright}
Greg Rose, Sydney, Australia, June 2003.
\end{flushright}

\mainmatter
\pagestyle{headings}
\chapter{Introduction}
\section{Multiple Precision Arithmetic}

\subsection{What is Multiple Precision Arithmetic?}
When we think of long-hand arithmetic such as addition or multiplication we rarely consider the fact that we instinctively
raise or lower the precision of the numbers we are dealing with. For example, in decimal we can almost immediately
reason that $7$ times $6$ is $42$. However, $42$ has two digits of precision as opposed to the one digit we started with.
A further multiplication by, say, $3$ yields the larger precision result $126$. In these few examples we have multiple
precisions for the numbers we are working with. Despite the various levels of precision a single subset\footnote{With the occasional optimization.}
 of algorithms can be designed to accommodate them.

By way of comparison a fixed or single precision operation would lose precision on various operations.
For example, in
the decimal system with a fixed precision of one digit, $6 \cdot 7 = 2$ because the leading digit of $42$ is lost.

Essentially at the heart of computer based multiple precision arithmetic are the same long-hand algorithms taught in
schools to manually add, subtract, multiply and divide.

\subsection{The Need for Multiple Precision Arithmetic}
The most prevalent need for multiple precision arithmetic, often referred to as ``bignum'' math, is within the implementation
of public-key cryptography algorithms. Algorithms such as RSA \cite{RSAREF} and Diffie-Hellman \cite{DHREF} require
integers of significant magnitude to resist known cryptanalytic attacks. For example, at the time of this writing a
typical RSA modulus would be at least greater than $10^{309}$. However, modern programming languages such as ISO C \cite{ISOC} and
Java \cite{JAVA} only provide intrinsic support for integers which are relatively small and single precision.
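
To give a rough sense of scale (an illustrative calculation, not a figure taken from any standard), an integer just above
$10^{309}$ needs
\[ \lceil 309 \cdot \log_2 10 \rceil = \lceil 1026.47\ldots \rceil = 1027 \]
bits, whereas a 64-bit type only holds values below $2^{64}$, which have at most $20$ decimal digits. Stored as an array of
64-bit words (a word size assumed only for this example), such an integer would occupy $\lceil 1027 / 64 \rceil = 17$ words.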

\begin{figure}[!h]
\begin{center}
\begin{tabular}{|r|c|}
\hline \textbf{Data Type} & \textbf{Range} \\
\hline char & $-128 \ldots 127$ \\
\hline short & $-32768 \ldots 32767$ \\
\hline long & $-2147483648 \ldots 2147483647$ \\
\hline long long & $-9223372036854775808 \ldots 9223372036854775807$ \\
\hline
\end{tabular}
\end{center}
\caption{Typical Data Types for the C Programming Language}
\label{fig:ISOC}
\end{figure}

The largest data type guaranteed to be provided by the ISO C programming
language\footnote{As per the ISO C standard. However, each compiler vendor is allowed to augment the precision as they
see fit.} can only represent values up to roughly $10^{19}$ as shown in figure \ref{fig:ISOC}. On its own the C language is
insufficient to accommodate the magnitude required for the problem at hand. An RSA modulus of magnitude $10^{19}$ could be
trivially factored\footnote{A Pollard-Rho factoring would take only $2^{16}$ time.} on the average desktop computer,
rendering any protocol based on the algorithm insecure. Multiple precision algorithms solve this very problem by
extending the range of representable integers while using single precision data types.

Most advancements in fast multiple precision arithmetic stem from the need for faster and more efficient cryptographic
primitives.
Faster modular reduction and exponentiation algorithms, such as Barrett's reduction algorithm, which have appeared in
various cryptographic journals, can render algorithms such as RSA and Diffie-Hellman more efficient. In fact, several
major companies such as RSA Security, Certicom and Entrust have built entire product lines on the implementation and
deployment of efficient algorithms.

However, cryptography is not the only field of study that can benefit from fast multiple precision integer routines.
Another auxiliary use of multiple precision integers is high precision floating point data types.
The basic IEEE \cite{IEEE} standard floating point type is made up of an integer mantissa $q$, an exponent $e$ and a sign bit $s$.
Numbers are given in the form $n = q \cdot b^e \cdot (-1)^s$ where $b = 2$ is the most common base for IEEE. Since IEEE
floating point is meant to be implemented in hardware the precision of the mantissa is often fairly small
(\textit{23, 52 and 64 bits}). The mantissa is merely an integer and a multiple precision integer could be used to create
a mantissa of much larger precision than hardware alone can efficiently support. This approach could be useful where
scientific applications must minimize the total output error over long calculations.

Yet another use for large integers is within arithmetic on polynomials of large characteristic (i.e. $GF(p)[x]$ for large $p$).
In fact the library discussed within this text has already been used to form a polynomial basis library\footnote{See \url{http://poly.libtomcrypt.org} for more details.}.
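
As a small worked instance of the floating point form given earlier in this subsection (the values are chosen purely for
illustration), take $b = 2$, $q = 13$, $e = -1$ and $s = 1$:
\[ n = q \cdot b^e \cdot (-1)^s = 13 \cdot 2^{-1} \cdot (-1)^1 = -6.5 \]
Here the mantissa $13 = 1101_2$ needs only four bits. A value whose mantissa needs more bits than the hardware provides
must be rounded, which is the loss a multiple precision mantissa is intended to avoid.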

\subsection{Benefits of Multiple Precision Arithmetic}
\index{precision}
The benefit of multiple precision representations over single or fixed precision representations is that
no precision is lost while representing the result of an operation which requires excess precision. For example,
the product of two $n$-bit integers requires at least $2n$ bits of precision to be represented faithfully. A multiple
precision algorithm would augment the precision of the destination to accommodate the result while a single precision system
would truncate excess bits to maintain a fixed level of precision.

It is possible to implement algorithms which require large integers with fixed precision algorithms. For example, elliptic
curve cryptography (\textit{ECC}) is often implemented on smartcards by fixing the precision of the integers to the maximum
size the system will ever need. Such an approach can lead to vastly simpler algorithms which can accommodate the
integers required even if the host platform cannot natively accommodate them\footnote{For example, the average smartcard
processor has an 8 bit accumulator.}. However, as efficient as such an approach may be, the resulting source code is not
normally very flexible. It cannot, at runtime, accommodate inputs of higher magnitude than the designer anticipated.

Multiple precision algorithms have the most overhead of any style of arithmetic.
For the most part the
overhead can be kept to a minimum with careful planning, but overall, this style of arithmetic is not well suited for most memory starved
platforms. However, multiple precision algorithms do offer the most flexibility in terms of the magnitude of the
inputs. That is, the same algorithms based on multiple precision integers can accommodate any reasonable size input
without the designer's explicit forethought. This leads to lower cost of ownership for the code as it only has to
be written and tested once.

\section{Purpose of This Text}
The purpose of this text is to instruct the reader regarding how to implement efficient multiple precision algorithms.
That is, to explain not only a limited subset of the core theory behind the algorithms but also the various ``house keeping''
elements that are neglected by authors of other texts on the subject. Several well renowned texts \cite{TAOCPV2,HAC}
give considerably detailed explanations of the theoretical aspects of algorithms but often very little information
regarding the practical implementation aspects.

In most cases how an algorithm is explained and how it is actually implemented are two very different concepts. For
example, the Handbook of Applied Cryptography (\textit{HAC}), algorithm 14.7 on page 594, gives a relatively simple
algorithm for performing multiple precision integer addition. However, the description lacks any discussion concerning
the fact that the two integer inputs may be of differing magnitudes.
As a result the implementation is not as simple
as the text would lead people to believe. Similarly the division routine (\textit{algorithm 14.20, p. 598}) does not
discuss how to handle sign or handle the dividend's decreasing magnitude in the main loop (\textit{step \#3}).

Neither text discusses several key optimal algorithms that are required, such as ``Comba'' and Karatsuba multipliers
and fast modular inversion, which we consider practical oversights. These optimal algorithms are vital to achieve
any form of useful performance in non-trivial applications.

To solve this problem the focus of this text is on the practical aspects of implementing a multiple precision integer
package. As a case study the ``LibTomMath''\footnote{Available at \url{http://math.libtomcrypt.com}} package is used
to demonstrate algorithms with real implementations\footnote{In the ISO C programming language.} that have been field
tested and work very well. The LibTomMath library is freely available on the Internet for all uses and this text
discusses a very large portion of the inner workings of the library.

The algorithms that are presented will always include at least one ``pseudo-code'' description followed
by the actual C source code that implements the algorithm. The pseudo-code can be used to implement the same
algorithm in other programming languages as the reader sees fit.

This text shall also serve as a walkthrough of the creation of multiple precision algorithms from scratch.
It shows
the reader how the algorithms fit together as well as where to start on various tasks.

\section{Discussion and Notation}
\subsection{Notation}
A multiple precision integer of $n$-digits shall be denoted as $x = (x_{n-1}, \ldots, x_1, x_0)_{ \beta }$ and represent
the integer $x \equiv \sum_{i=0}^{n-1} x_i\beta^i$. The elements of the array $x$ are said to be the radix $\beta$ digits
of the integer. For example, $x = (1,2,3)_{10}$ would represent the integer
$1\cdot 10^2 + 2\cdot10^1 + 3\cdot10^0 = 123$.

\index{mp\_int}
The term ``mp\_int'' shall refer to a composite structure which contains the digits of the integer it represents, as well
as auxiliary data required to manipulate the data. These additional members are discussed further in section
\ref{sec:MPINT}. For the purposes of this text a ``multiple precision integer'' and an ``mp\_int'' are assumed to be
synonymous. When an algorithm is specified to accept an mp\_int variable it is assumed the various auxiliary data members
are present as well. An expression of the type \textit{variablename.item} implies that it should evaluate to the
member named ``item'' of the variable. For example, a string of characters may have a member ``length'' which would
evaluate to the number of characters in the string. If the string $a$ equals ``hello'' then it follows that
$a.length = 5$.
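
As a further illustration of the digit notation (the radix $\beta = 16$ and the digits are chosen arbitrarily for this
example), $x = (2, 10, 15)_{16}$ represents
\[ x = 2 \cdot 16^2 + 10 \cdot 16^1 + 15 \cdot 16^0 = 512 + 160 + 15 = 687 \]
The same value can be accumulated from the most significant digit downwards, which is one way a program might evaluate
the sum without computing powers of $\beta$ explicitly:
\[ x = (2 \cdot 16 + 10) \cdot 16 + 15 = 42 \cdot 16 + 15 = 687 \]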

For certain discussions more generic algorithms are presented to help the reader understand the final algorithm used
to solve a given problem. When an algorithm is described as accepting an integer input it is assumed the input is
a plain integer with no additional multiple-precision members. That is, algorithms that use integers as opposed to
mp\_ints as inputs do not concern themselves with the housekeeping operations required such as memory management. These
algorithms will be used to establish the relevant theory which will subsequently be used to describe a multiple
precision algorithm to solve the same problem.

\subsection{Precision Notation}
The variable $\beta$ represents the radix of a single digit of a multiple precision integer and
must be of the form $q^p$ for $q, p \in \Z^+$. A single precision variable must be able to represent integers in
the range $0 \le x < q \beta$ while a double precision variable must be able to represent integers in the range
$0 \le x < q \beta^2$. The extra radix-$q$ factor allows additions and subtractions to proceed without truncation of the
carry. Since all modern computers are binary, it is assumed that $q$ is two.

\index{mp\_digit} \index{mp\_word}
Within the source code that will be presented for each algorithm, the data type \textbf{mp\_digit} will represent
a single precision integer type, while the data type \textbf{mp\_word} will represent a double precision integer type.
In
several algorithms (notably the Comba routines) temporary results will be stored in arrays of double precision mp\_words.
For the purposes of this text $x_j$ will refer to the $j$'th digit of a single precision array and $\hat x_j$ will refer to
the $j$'th digit of a double precision array. Whenever an expression is to be assigned to a double precision
variable it is assumed that all single precision variables are promoted to double precision during the evaluation.
Expressions that are assigned to a single precision variable are truncated to fit within the precision of a single
precision data type.

For example, if $\beta = 10^2$ a single precision data type may represent a value in the
range $0 \le x < 10^3$, while a double precision data type may represent a value in the range $0 \le x < 10^5$. Let
$a = 23$ and $b = 49$ represent two single precision variables. The single precision product shall be written
as $c \leftarrow a \cdot b$ while the double precision product shall be written as $\hat c \leftarrow a \cdot b$.
In this particular case, $\hat c = 1127$ and $c = 127$. The most significant digit of the product would not fit
in a single precision data type and as a result $c \ne \hat c$.

\subsection{Algorithm Inputs and Outputs}
Within the algorithm descriptions all variables are assumed to be scalars of either single or double precision
as indicated. The only exception to this rule is when variables have been indicated to be of type mp\_int.
This
distinction is important as scalars are often used as array indices and various other counters.

\subsection{Mathematical Expressions}
The $\lfloor \mbox{ } \rfloor$ brackets imply an expression truncated to an integer not greater than the expression
itself. For example, $\lfloor 5.7 \rfloor = 5$. Similarly the $\lceil \mbox{ } \rceil$ brackets imply an expression
rounded to an integer not less than the expression itself. For example, $\lceil 5.1 \rceil = 6$. Typically when
the $/$ division symbol is used the intention is to perform an integer division with truncation. For example,
$5/2 = 2$ which will often be written as $\lfloor 5/2 \rfloor = 2$ for clarity. When an expression is written as a
fraction a real value division is implied, for example ${5 \over 2} = 2.5$.

The norm of a multiple precision integer, for example $\vert \vert x \vert \vert$, will be used to represent the number of digits in the representation
of the integer. For example, $\vert \vert 123 \vert \vert = 3$ and $\vert \vert 79452 \vert \vert = 5$.

\subsection{Work Effort}
\index{big-Oh}
To measure the efficiency of the specified algorithms, a modified big-Oh notation is used. In this system all
single precision operations are considered to have the same cost\footnote{Except where explicitly noted.}.
That is, a single precision addition, multiplication and division are assumed to take the same time to
complete. While this is generally not true in practice, it will simplify the discussions considerably.
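
As a sketch of how this cost model might be applied (an illustrative count that assumes the long-hand method and ignores
loop overhead), consider adding two decimal integers digit by digit. With $\beta = 10$,
\[ \vert \vert 79452 \vert \vert = \vert \vert 12345 \vert \vert = 5 \quad \mbox{and} \quad 79452 + 12345 = 91797 \]
so the addition visits five digit positions and is counted as roughly five single precision operations. In general the
long-hand addition of two $n$-digit integers is counted as $O(n)$ work under this model.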

Some algorithms have slight advantages over others which is why some constants will not be removed in
the notation. For example, a normal baseline multiplication (section \ref{sec:basemult}) requires $O(n^2)$ work while a
baseline squaring (section \ref{sec:basesquare}) requires $O({{n^2 + n}\over 2})$ work. In standard big-Oh notation these
would both be said to be equivalent to $O(n^2)$. However,
in the context of this text this is not the case as the magnitude of the inputs will typically be rather small. As a
result small constant factors in the work effort will make an observable difference in algorithm efficiency.

All of the algorithms presented in this text have a polynomial time work level. That is, of the form
$O(n^k)$ for $n, k \in \Z^{+}$. This will help make useful comparisons in terms of the speed of the algorithms and how
various optimizations will help pay off in the long run.

\section{Exercises}
Within the more advanced chapters a section will be set aside to give the reader some challenging exercises related to
the discussion at hand. These exercises are not designed to be prize winning problems, but instead to be thought
provoking. Wherever possible the problems are forward minded, stating problems that will be answered in subsequent
chapters. The reader is encouraged to finish the exercises as they appear to get a better understanding of the
subject material.

That being said, the problems are designed to affirm knowledge of a particular subject matter. Students in particular
are encouraged to verify they can answer the problems correctly before moving on.

Similar to the exercises of \cite[pp. ix]{TAOCPV2} these exercises are given a scoring system based on the difficulty of
the problem. However, unlike \cite{TAOCPV2} the problems do not get nearly as hard. The scoring of these
exercises ranges from one (the easiest) to five (the hardest). The following table summarizes the
scoring system used.

\begin{figure}[here]
\begin{center}
\begin{small}
\begin{tabular}{|c|l|}
\hline $\left [ 1 \right ]$ & An easy problem that should only take the reader a matter of \\
                            & minutes to solve. Usually does not involve much computer time \\
                            & to solve. \\
\hline $\left [ 2 \right ]$ & An easy problem that involves a marginal amount of computer \\
                            & time usage. Usually requires a program to be written to \\
                            & solve the problem. \\
\hline $\left [ 3 \right ]$ & A moderately hard problem that requires a non-trivial amount \\
                            & of work. Usually involves trivial research and development of \\
                            & new theory from the perspective of a student.
\\
\hline $\left [ 4 \right ]$ & A moderately hard problem that involves a non-trivial amount \\
                            & of work and research, the solution to which will demonstrate \\
                            & a higher mastery of the subject matter. \\
\hline $\left [ 5 \right ]$ & A hard problem that involves concepts that are difficult for a \\
                            & novice to solve. Solutions to these problems will demonstrate a \\
                            & complete mastery of the given subject. \\
\hline
\end{tabular}
\end{small}
\end{center}
\caption{Exercise Scoring System}
\end{figure}

Problems at the first level are meant to be simple questions that the reader can answer quickly without programming a solution or
devising new theory. These problems are quick tests to see if the material is understood. Problems at the second level
are also designed to be easy but will require a program or algorithm to be implemented to arrive at the answer. These
two levels are essentially entry level questions.

Problems at the third level are meant to be a bit more difficult than the first two levels. The answer is often
fairly obvious but arriving at an exacting solution requires some thought and skill. These problems will almost always
involve devising a new algorithm or implementing a variation of another algorithm previously presented. Readers who can
answer these questions will feel comfortable with the concepts behind the topic at hand.

Problems at the fourth level are meant to be similar to those of the level three questions except they will require
additional research to be completed. The reader will most likely not know the answer right away, nor will the text provide
the exact details of the answer until a subsequent chapter.

Problems at the fifth level are meant to be the hardest
problems relative to all the other problems in the chapter. People who can correctly answer fifth level problems have a
mastery of the subject matter at hand.

Often problems will be tied together. The purpose of this is to start a chain of thought that will be discussed in future chapters. The reader
is encouraged to answer the follow-up problems and to work out how the problems relate to one another.

\section{Introduction to LibTomMath}

\subsection{What is LibTomMath?}
LibTomMath is a free and open source multiple precision integer library written entirely in portable ISO C. By portable it
is meant that the library does not contain any code that is computer platform dependent or otherwise problematic to use on
any given platform.

The library has been successfully tested under numerous operating systems including Unix\footnote{All of these
trademarks belong to their respective rightful owners.}, MacOS, Windows, Linux, PalmOS and on standalone hardware such
as the Gameboy Advance.
The library is designed to contain enough functionality to be able to develop applications such
as public key cryptosystems and still maintain a relatively small footprint.

\subsection{Goals of LibTomMath}

Libraries which obtain the most efficiency are rarely written in a high level programming language such as C. However,
even though this library is written entirely in ISO C, considerable care has been taken to optimize the algorithm implementations within the
library. Specifically the code has been written to work well with the GNU C Compiler (\textit{GCC}) on both x86 and ARM
processors. Wherever possible, highly efficient algorithms, such as Karatsuba multiplication, sliding window
exponentiation and Montgomery reduction have been provided to make the library more efficient.

Even with the nearly optimal and specialized algorithms that have been included the Application Programming Interface
(\textit{API}) has been kept as simple as possible. Often generic placeholder routines will make use of specialized
algorithms automatically without the developer's specific attention. One such example is the generic multiplication
algorithm \textbf{mp\_mul()} which will automatically use Toom--Cook, Karatsuba, Comba or baseline multiplication
based on the magnitude of the inputs and the configuration of the library.

Making LibTomMath as efficient as possible is not the only goal of the LibTomMath project.
Ideally the library should
be source compatible with another popular library which makes it more attractive for developers to use. In this case the
MPI library was used as an API template for all the basic functions. MPI was chosen because it is another library that fits
in the same niche as LibTomMath. Even though LibTomMath uses MPI as the template for the function names and argument
passing conventions, it has been written from scratch by Tom St Denis.

The project is also meant to act as a learning tool for students, the logic being that no easy-to-follow ``bignum''
library exists which can be used to teach computer science students how to perform fast and reliable multiple precision
integer arithmetic. To this end the source code has been given quite a few comments and algorithm discussion points.

\section{Choice of LibTomMath}
LibTomMath was chosen as the case study of this text not only because the author of both projects is one and the same but
for more worthy reasons. Other libraries such as GMP \cite{GMP}, MPI \cite{MPI}, LIP \cite{LIP} and OpenSSL
\cite{OPENSSL} have multiple precision integer arithmetic routines but would not be ideal for this text for
reasons that will be explained in the following subsections.

\subsection{Code Base}
The LibTomMath code base is all portable ISO C source code. This means that there are no platform dependent conditional
segments of code littered throughout the source.
This clean and uncluttered approach to the library means that a
developer can more readily discern the true intent of a given section of source code without trying to keep track of
what conditional code will be used.

The code base of LibTomMath is well organized. Each function is in its own separate source code file
which allows the reader to find a given function very quickly. On average there are $76$ lines of code per source
file which makes the source very easy to follow. By comparison MPI and LIP are single file projects making code tracing
very hard. GMP has many conditional code segments which also hinder tracing.

When compiled with GCC for the x86 processor and optimized for speed the entire library is approximately $100$KiB\footnote{The notation ``KiB'' means $2^{10}$ octets, similarly ``MiB'' means $2^{20}$ octets.}
which is fairly small compared to GMP (over $250$KiB). LibTomMath is slightly larger than MPI (which compiles to about
$50$KiB) but LibTomMath is also much faster and more complete than MPI.

\subsection{API Simplicity}
LibTomMath is designed after the MPI library and shares the API design. Quite often programs that use MPI will build
with LibTomMath without change. The function names correlate directly to the action they perform. Almost all of the
functions share the same parameter passing convention. The learning curve is fairly shallow with the API provided
which is an extremely valuable benefit for the student and developer alike.

The LIP library is an example of a library with an API that is awkward to work with. LIP uses function names that are often ``compressed'' to
illegible shorthand. LibTomMath does not share this characteristic.

The GMP library also does not return error codes. Instead it uses a POSIX.1 \cite{POSIX1} signal system where errors
are signaled to the host application. This happens to be the fastest approach but definitely not the most versatile. In
effect a math error (e.g. invalid input, heap error, etc.) can cause a program to stop functioning which is definitely
undesirable in many situations.

\subsection{Optimizations}
While LibTomMath is certainly not the fastest library (GMP often beats LibTomMath by a factor of two) it does
feature a set of optimal algorithms for tasks such as modular reduction, exponentiation, multiplication and squaring. GMP
and LIP also feature such optimizations while MPI only uses baseline algorithms with no optimizations. GMP lacks a few
of the additional modular reduction optimizations that LibTomMath features\footnote{At the time of this writing GMP
only had Barrett and Montgomery modular reduction algorithms.}.

LibTomMath is almost always an order of magnitude faster than the MPI library at computationally expensive tasks such as modular
exponentiation. In the grand scheme of ``bignum'' libraries LibTomMath is faster than the average library and usually
slower than the best libraries such as GMP and OpenSSL by only a small factor.

\subsection{Portability and Stability}
LibTomMath will build ``out of the box'' on any platform equipped with a modern version of the GNU C Compiler
(\textit{GCC}). This means that without changes the library will build without configuration or setting up any
variables. LIP and MPI will build ``out of the box'' as well but have numerous known bugs. Most notably the author of
MPI has recently stopped working on his library and LIP has long since been discontinued.

GMP requires a configuration script to run and will not build out of the box. GMP and LibTomMath are still in active
development and are very stable across a variety of platforms.

\subsection{Choice}
LibTomMath is a relatively compact, well documented, highly optimized and portable library which seems only natural for
the case study of this text. Various source files from the LibTomMath project will be included within the text. However,
the reader is encouraged to download their own copy of the library to actually be able to work with the library.

\chapter{Getting Started}
\section{Library Basics}
The trick to writing any useful library of source code is to build a solid foundation and work outwards from it. First,
a problem along with allowable solution parameters should be identified and analyzed. In this particular case the
inability to accommodate multiple precision integers is the problem.
Furthermore, the solution must be written
as portable source code that is reasonably efficient across several different computer platforms.

After a foundation is formed the remainder of the library can be designed and implemented in a hierarchical fashion.
That is, to implement the lowest level dependencies first and work towards the most abstract functions last. For example,
before implementing a modular exponentiation algorithm one would implement a modular reduction algorithm.
By building outwards from a base foundation instead of using a parallel design methodology the resulting project is
highly modular. Being highly modular is a desirable property of any project as it often means the resulting product
has a small footprint and updates are easy to perform.

Usually when I start a project I will begin with the header files. I define the data types I think I will need and
prototype the initial functions that are not dependent on other functions (within the library). After I
implement these base functions I prototype more dependent functions and implement them. The process repeats until
I implement all of the functions I require. For example, in the case of LibTomMath I implemented functions such as
mp\_init() well before I implemented mp\_mul() and even further before I implemented mp\_exptmod(). As an example as to
why this design works note that the Karatsuba and Toom--Cook multipliers were written \textit{after} the
dependent function mp\_exptmod() was written.
Adding the new multiplication algorithms did not require changes to the
mp\_exptmod() function itself and lowered the total cost of ownership (\textit{so to speak}) and of development
for new algorithms. This methodology allows new algorithms to be tested in a complete framework with relative ease.

FIGU,design_process,Design Flow of the First Few Original LibTomMath Functions.

Only after the majority of the functions were in place did I pursue a less hierarchical approach to auditing and optimizing
the source code. For example, one day I may audit the multipliers and the next day the polynomial basis functions.

It only makes sense to begin the text with the preliminary data types and support algorithms required as well.
This chapter discusses the core algorithms of the library which are the dependencies of every other algorithm.

\section{What is a Multiple Precision Integer?}
Recall that most programming languages, in particular ISO C \cite{ISOC}, only have fixed precision data types that on their own cannot
be used to represent values larger than their precision will allow. The purpose of multiple precision algorithms is
to use fixed precision data types to create and manipulate multiple precision integers which may represent values
that are very large.

As a well known analogy, school children are taught how to form numbers larger than nine by prepending more radix ten digits. In the decimal system
the largest single digit value is $9$.
However, by concatenating digits together larger numbers may be represented. Newly prepended digits
(\textit{to the left}) are said to be in a different power of ten column. That is, the number $123$ can be described as having a $1$ in the hundreds
column, $2$ in the tens column and $3$ in the ones column. Or more formally $123 = 1 \cdot 10^2 + 2 \cdot 10^1 + 3 \cdot 10^0$. Computer based
multiple precision arithmetic is essentially the same concept. Larger integers are represented by adjoining fixed
precision computer words with the exception that a different radix is used.

What most people probably do not think about explicitly are the various other attributes that describe a multiple precision
integer. For example, the integer $154_{10}$ has two immediately obvious properties. First, the integer is positive,
that is the sign of this particular integer is positive as opposed to negative. Second, the integer has three digits in
its representation. There is an additional property that the integer possesses that does not concern pencil-and-paper
arithmetic. The third property is how many digit placeholders are available to hold the integer.

The human analogy of this third property is ensuring there is enough space on the paper to write the integer. For example,
if one starts writing a large number too far to the right on a piece of paper they will have to erase it and move left.
Similarly, computer algorithms must maintain strict control over memory usage to ensure that the digits of an integer
will not exceed the allowed boundaries.
These three properties make up what is known as a multiple precision
integer or mp\_int for short.

\subsection{The mp\_int Structure}
\label{sec:MPINT}
The mp\_int structure is the ISO C representation of a multiple precision integer. The ISO C standard does not provide for
any such data type but it does provide for making composite data types known as structures. The following is the structure definition
used within LibTomMath.

\index{mp\_int}
\begin{figure}[here]
\begin{center}
\begin{small}
%\begin{verbatim}
\begin{tabular}{|l|}
\hline
typedef struct \{ \\
\hspace{3mm}int used, alloc, sign;\\
\hspace{3mm}mp\_digit *dp;\\
\} \textbf{mp\_int}; \\
\hline
\end{tabular}
%\end{verbatim}
\end{small}
\caption{The mp\_int Structure}
\label{fig:mpint}
\end{center}
\end{figure}

The mp\_int structure (fig. \ref{fig:mpint}) can be broken down as follows.

\begin{enumerate}
\item The \textbf{used} parameter denotes how many digits of the array \textbf{dp} contain the digits used to represent
a given integer. The \textbf{used} count must be positive (or zero) and may not exceed the \textbf{alloc} count.

\item The \textbf{alloc} parameter denotes how
many digits are available in the array for use by functions before it has to increase in size. When the \textbf{used} count
of a result would exceed the \textbf{alloc} count all of the algorithms will automatically increase the size of the
array to accommodate the precision of the result.

\item The pointer \textbf{dp} points to a dynamically allocated array of digits that represent the given multiple
precision integer. It is padded with $(\textbf{alloc} - \textbf{used})$ zero digits. The array is maintained in a least
significant digit order. As a pencil and paper analogy the array is organized such that the right most digits are stored
first starting at the location indexed by zero\footnote{In C all arrays begin at zero.} in the array. For example,
if \textbf{dp} contains $\lbrace a, b, c, \ldots \rbrace$ where \textbf{dp}$_0 = a$, \textbf{dp}$_1 = b$, \textbf{dp}$_2 = c$, $\ldots$ then
it would represent the integer $a + b\beta + c\beta^2 + \ldots$

\index{MP\_ZPOS} \index{MP\_NEG}
\item The \textbf{sign} parameter denotes the sign as either zero/positive (\textbf{MP\_ZPOS}) or negative (\textbf{MP\_NEG}).
\end{enumerate}

\subsubsection{Valid mp\_int Structures}
Several rules are placed on the state of an mp\_int structure and are assumed to be followed for reasons of efficiency.
The only exceptions are when the structure is passed to initialization functions such as mp\_init() and mp\_init\_copy().

\begin{enumerate}
\item The value of \textbf{alloc} may not be less than one. That is \textbf{dp} always points to a previously allocated
array of digits.
\item The value of \textbf{used} may not exceed \textbf{alloc} and must be greater than or equal to zero.
\item The value of \textbf{used} implies the digit at index $(used - 1)$ of the \textbf{dp} array is non-zero. That is,
leading zero digits in the most significant positions must be trimmed.
   \begin{enumerate}
   \item Digits in the \textbf{dp} array at and above the \textbf{used} location must be zero.
   \end{enumerate}
\item The value of \textbf{sign} must be \textbf{MP\_ZPOS} if \textbf{used} is zero;
this represents the mp\_int value of zero.
\end{enumerate}

\section{Argument Passing}
A convention of argument passing must be adopted early on in the development of any library. Making the function
prototypes consistent will help eliminate many headaches in the future as the library grows to significant complexity.
In LibTomMath the multiple precision integer functions accept parameters from left to right as pointers to mp\_int
structures. That means that the source (input) operands are placed on the left and the destination (output) on the right.
Consider the following examples.

\begin{verbatim}
   mp_mul(&a, &b, &c);   /* c = a * b */
   mp_add(&a, &b, &a);   /* a = a + b */
   mp_sqr(&a, &b);       /* b = a * a */
\end{verbatim}

The left to right order is a fairly natural way to implement the functions since it lets the developer read aloud the
functions and make sense of them. For example, the first function would read ``multiply a and b and store in c''.

Certain libraries (\textit{LIP by Lenstra for instance}) accept parameters the other way around, to mimic the order
of assignment expressions. That is, the destination (output) is on the left and arguments (inputs) are on the right. In
truth, it is entirely a matter of preference. In the case of LibTomMath the convention from the MPI library has been
adopted.

Another very useful design consideration, provided for in LibTomMath, is whether to allow argument sources to also be a
destination. For example, the second example (\textit{mp\_add}) adds $a$ to $b$ and stores in $a$. This is an important
feature to implement since it allows a calling function to cut down on the number of variables it must maintain.
However, to implement this feature specific care has to be given to ensure the destination is not modified before the
source is fully read.
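
The aliasing rule can be made concrete with a small self-contained sketch. The names demo\_int, demo\_set(), demo\_get(), demo\_add() and demo\_mul() are hypothetical and exist only for this illustration; they are not LibTomMath functions, and the fixed eight digit decimal representation is an assumption chosen to keep the example short. The sketch follows the left to right convention and shows two ways a routine can tolerate a destination that is also a source.

```c
#include <assert.h>
#include <string.h>

#define DEMO_DIGITS 8     /* fixed capacity, an assumption of this sketch */
#define DEMO_RADIX  10u   /* decimal digits keep the example readable */

/* Hypothetical stand-in for an mp_int: least significant digit first. */
typedef struct {
   unsigned dp[DEMO_DIGITS];
} demo_int;

/* a = v */
void demo_set(demo_int *a, unsigned long v)
{
   int i;
   for (i = 0; i < DEMO_DIGITS; i++) {
      a->dp[i] = (unsigned)(v % DEMO_RADIX);
      v /= DEMO_RADIX;
   }
   assert(v == 0);        /* the value must fit in DEMO_DIGITS digits */
}

/* return the value of a */
unsigned long demo_get(const demo_int *a)
{
   unsigned long v = 0;
   int i;
   for (i = DEMO_DIGITS - 1; i >= 0; i--) {
      v = v * DEMO_RADIX + a->dp[i];
   }
   return v;
}

/* c = a + b, sources on the left and destination on the right.  Digit i of
 * both sources is read before digit i of c is written, so c may alias a or b.
 * A carry out of the top digit is dropped in this sketch. */
void demo_add(const demo_int *a, const demo_int *b, demo_int *c)
{
   unsigned carry = 0;
   int i;
   for (i = 0; i < DEMO_DIGITS; i++) {
      unsigned t = a->dp[i] + b->dp[i] + carry;
      c->dp[i] = t % DEMO_RADIX;
      carry    = t / DEMO_RADIX;
   }
}

/* c = a * b, truncated to DEMO_DIGITS digits.  Each output digit depends on
 * many input digits, so the product is built in a temporary and copied to c
 * last; this keeps the routine correct when c aliases a or b. */
void demo_mul(const demo_int *a, const demo_int *b, demo_int *c)
{
   demo_int t;
   int i, j;
   memset(&t, 0, sizeof(t));
   for (i = 0; i < DEMO_DIGITS; i++) {
      unsigned carry = 0;
      for (j = 0; i + j < DEMO_DIGITS; j++) {
         unsigned v = t.dp[i + j] + a->dp[i] * b->dp[j] + carry;
         t.dp[i + j] = v % DEMO_RADIX;
         carry       = v / DEMO_RADIX;
      }
   }
   *c = t;
}
```

With these definitions a call such as demo\_add(\&a, \&b, \&a) or demo\_mul(\&a, \&a, \&a) behaves as expected. The addition needs no extra storage because of the order in which it reads and writes digits, whereas the multiplication pays for a temporary in exchange for the same guarantee.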

\section{Return Values}
A well implemented application, no matter what its purpose, should trap as many runtime errors as possible and return them
to the caller. By catching runtime errors a library can be guaranteed to prevent undefined behaviour. However, the end
developer can still manage to cause a library to crash. For example, by passing an invalid pointer an application may
fault by dereferencing memory not owned by the application.

In the case of LibTomMath the only errors that are checked for are related to inappropriate inputs (division by zero for
instance) and memory allocation errors. It will not check that the mp\_int passed to any function is valid nor
will it check pointers for validity. Any function that can cause a runtime error will return an error code as an
\textbf{int} data type with one of the following values (fig. \ref{fig:errcodes}).

\index{MP\_OKAY} \index{MP\_VAL} \index{MP\_MEM}
\begin{figure}[here]
\begin{center}
\begin{tabular}{|l|l|}
\hline \textbf{Value} & \textbf{Meaning} \\
\hline \textbf{MP\_OKAY} & The function was successful \\
\hline \textbf{MP\_VAL} & One of the input values was invalid \\
\hline \textbf{MP\_MEM} & The function ran out of heap memory \\
\hline
\end{tabular}
\end{center}
\caption{LibTomMath Error Codes}
\label{fig:errcodes}
\end{figure}

When an error is detected within a function it should free any memory it allocated, often during the initialization of
temporary mp\_ints, and return as soon as possible. The goal is to leave the system in the same state it was in when the
function was called. Error checking with this style of API is fairly simple.

\begin{verbatim}
   int err;
   if ((err = mp_add(&a, &b, &c)) != MP_OKAY) {
      printf("Error: %s\n", mp_error_to_string(err));
      exit(EXIT_FAILURE);
   }
\end{verbatim}

The GMP \cite{GMP} library uses C style \textit{signals} to flag errors which is of questionable use. Not all errors are fatal
and it was not deemed ideal by the author of LibTomMath to force developers to have signal handlers for such cases.
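
The clean-up discipline described in this section can be sketched without LibTomMath itself. Everything below is hypothetical: the EX\_ codes only mirror the roles of MP\_OKAY, MP\_VAL and MP\_MEM (their numeric values are arbitrary), and ex\_alloc() is a test hook invented here so that an allocation failure can be simulated. The function ex\_sumsq\_div() stands in for a routine that needs two temporaries; it frees whatever it has already allocated before reporting an error and writes its output only on success.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical codes mirroring the roles of MP_OKAY, MP_VAL and MP_MEM;
 * the numeric values are arbitrary. */
enum { EX_OKAY = 0, EX_VAL = 1, EX_MEM = 2 };

int ex_allocs_until_failure = -1;  /* -1 means "never fail" */
int ex_live_blocks = 0;            /* blocks allocated and not yet freed */

/* malloc() wrapper that can be told to fail, so the error path is testable. */
void *ex_alloc(size_t n)
{
   void *p;
   if (ex_allocs_until_failure == 0) {
      return NULL;
   }
   if (ex_allocs_until_failure > 0) {
      ex_allocs_until_failure--;
   }
   p = malloc(n);
   if (p != NULL) {
      ex_live_blocks++;
   }
   return p;
}

void ex_free(void *p)
{
   if (p != NULL) {
      assert(ex_live_blocks > 0);
      ex_live_blocks--;
      free(p);
   }
}

/* *out = (a*a + b*b) / d, using two heap temporaries in the way a multiple
 * precision routine would use temporary integers. */
int ex_sumsq_div(long a, long b, long d, long *out)
{
   long *t1, *t2;

   if (d == 0) {
      return EX_VAL;            /* inappropriate input, nothing allocated */
   }
   t1 = ex_alloc(sizeof(*t1));
   if (t1 == NULL) {
      return EX_MEM;
   }
   t2 = ex_alloc(sizeof(*t2));
   if (t2 == NULL) {
      ex_free(t1);              /* release what was already allocated */
      return EX_MEM;
   }
   *t1 = a * a;
   *t2 = b * b;
   *out = (*t1 + *t2) / d;      /* the output is written only on success */
   ex_free(t2);
   ex_free(t1);
   return EX_OKAY;
}
```

A caller checks the return value exactly as in the mp\_add() fragment above. Setting ex\_allocs\_until\_failure to $1$ before a call makes the second allocation fail, which is a convenient way to confirm that no block is leaked and that the output is left untouched.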

\section{Initialization and Clearing}
The logical starting point when actually writing multiple precision integer functions is the initialization and
clearing of the mp\_int structures. These two algorithms will be used by the majority of the higher level algorithms.

Given the basic mp\_int structure an initialization routine must first allocate memory to hold the digits of
the integer. Often it is optimal to allocate a sufficiently large pre-set number of digits even though
the initial integer will represent zero. If only a single digit were allocated quite a few subsequent re-allocations
would occur when operations are performed on the integers. There is a tradeoff between how many default digits to allocate
and how many re-allocations are tolerable. Obviously allocating an excessive number of digits initially will waste
memory and become unmanageable.

If the memory for the digits has been successfully allocated then the rest of the members of the structure must
be initialized. Since the initial state of an mp\_int is to represent the zero integer, the allocated digits must be set
to zero. The \textbf{used} count is set to zero and \textbf{sign} is set to \textbf{MP\_ZPOS}.

\subsection{Initializing an mp\_int}
An mp\_int is said to be initialized if it is set to a valid, preferably default, state such that all of the members of the
structure are set to valid values. The mp\_init algorithm will perform such an action.

\index{mp\_init}
\begin{figure}[here]
\begin{center}
\begin{tabular}{l}
\hline Algorithm \textbf{mp\_init}. \\
\textbf{Input}.   An mp\_int $a$ \\
\textbf{Output}.  Allocate memory and initialize $a$ to a known valid mp\_int state.  \\
\hline \\
1.  Allocate memory for \textbf{MP\_PREC} digits. \\
2.  If the allocation failed return(\textit{MP\_MEM}) \\
3.  for $n$ from $0$ to $MP\_PREC - 1$ do  \\
\hspace{3mm}3.1  $a_n \leftarrow 0$\\
4.  $a.sign \leftarrow MP\_ZPOS$\\
5.  $a.used \leftarrow 0$\\
6.  $a.alloc \leftarrow MP\_PREC$\\
7.  Return(\textit{MP\_OKAY})\\
\hline
\end{tabular}
\end{center}
\caption{Algorithm mp\_init}
\end{figure}

\textbf{Algorithm mp\_init.}
The purpose of this function is to initialize an mp\_int structure so that the rest of the library can properly
manipulate it.  It is assumed that the input may not have had any of its members previously initialized, which is certainly
a valid assumption if the input resides on the stack.

Before any of the members such as \textbf{sign}, \textbf{used} or \textbf{alloc} are initialized the memory for
the digits is allocated.  If this fails the function returns before setting any of the other members.
The \textbf{MP\_PREC}
name represents a constant\footnote{Defined in the ``tommath.h'' header file within LibTomMath.}
used to dictate the minimum precision of newly initialized mp\_int integers.  Ideally, it is at least as large as the
precision of the smallest number you will be working with.

Allocating a block of digits at first instead of a single digit has the benefit of lowering the number of usually slow
heap operations that later functions will have to perform.  If \textbf{MP\_PREC} is set correctly both the slack
memory and the number of heap operations will be negligible.

Once the allocation has been made the digits have to be set to zero and the \textbf{used}, \textbf{sign} and
\textbf{alloc} members initialized.  This ensures that the mp\_int will always represent the default state of zero regardless
of the original condition of the input.

\textbf{Remark.}
This function introduces the idiosyncrasy that all iterative loops, commonly initiated with the ``for'' keyword, iterate incrementally
when the ``to'' keyword is placed between two expressions.  For example, ``for $a$ from $b$ to $c$ do'' means that
a subsequent expression (or body of expressions) is to be evaluated up to $c - b + 1$ times so long as $b \le c$.  In each
iteration the variable $a$ is substituted for a new integer that lies inclusively between $b$ and $c$.  If $b > c$ occurred
the loop would not iterate.  By contrast if the ``downto'' keyword were used in place of ``to'' the loop would iterate
decrementally.
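
The loop convention can be made concrete with a short C sketch.  The functions toy\_zero\_to and toy\_zero\_downto are
hypothetical helpers written for this illustration only; they are not library routines.  Each one zeroes an inclusive
range of digits, in the manner of step 3 of mp\_init, and returns the number of times the loop body was evaluated.
Since both bounds are included, a range from $b$ to $c$ with $b \le c$ covers $c - b + 1$ digits.

```c
#include <assert.h>
#include <stddef.h>

/* ``for n from b to c do  dp_n <- 0'': zeroes dp[b..c] inclusive, counting
   upwards, and returns how many digits were written. */
long toy_zero_to(unsigned long *dp, long b, long c)
{
   long n, count = 0;

   assert(dp != NULL);
   for (n = b; n <= c; n++) {
      dp[n] = 0;
      count++;
   }
   return count;
}

/* ``for n from b downto c do  dp_n <- 0'': zeroes dp[c..b] inclusive,
   counting downwards, and returns how many digits were written. */
long toy_zero_downto(unsigned long *dp, long b, long c)
{
   long n, count = 0;

   assert(dp != NULL);
   for (n = b; n >= c; n--) {
      dp[n] = 0;
      count++;
   }
   return count;
}
```

Calling toy\_zero\_to with the bounds $0$ and $3$ writes four digits, while the bounds $5$ and $4$ write none because the
lower bound exceeds the upper bound.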

EXAM,bn_mp_init.c

One immediate observation of this initialization function is that it does not return a pointer to an mp\_int structure.  It
is assumed that the caller has already allocated memory for the mp\_int structure, typically on the application stack.  The
call to mp\_init() is used only to initialize the members of the structure to a known default state.

Here we see (line @23,XMALLOC@) the memory allocation is performed first.  This allows us to exit cleanly and quickly
if there is an error.  If the allocation fails the routine will return \textbf{MP\_MEM} to the caller to indicate there
was a memory error.  The function XMALLOC is what actually allocates the memory.  Technically XMALLOC is not a function
but a macro defined in ``tommath.h''.  By default, XMALLOC will evaluate to malloc() which is the C library's built-in
memory allocation routine.

In order to ensure the mp\_int is in a known state the digits must be set to zero.  On most platforms this could have been
accomplished by using calloc() instead of malloc().  However, to correctly initialize an integer type to a given value in a
portable fashion you have to actually assign the value.  The for loop (line @28,for@) performs this required
operation.

After the memory has been successfully initialized the remainder of the members are initialized
(lines @29,used@ through @31,sign@) to their respective default states.
At this point the algorithm has succeeded and
a success code is returned to the calling function.  If this function returns \textbf{MP\_OKAY} it is safe to assume the
mp\_int structure has been properly initialized and is safe to use with other functions within the library.

\subsection{Clearing an mp\_int}
When an mp\_int is no longer required by the application, the memory that has been allocated for its digits must be
returned to the application's memory pool with the mp\_clear algorithm.

\begin{figure}[here]
\begin{center}
\begin{tabular}{l}
\hline Algorithm \textbf{mp\_clear}. \\
\textbf{Input}.   An mp\_int $a$ \\
\textbf{Output}.  The memory for $a$ shall be deallocated.  \\
\hline \\
1.  If $a$ has been previously freed then return(\textit{MP\_OKAY}). \\
2.  for $n$ from 0 to $a.used - 1$ do \\
\hspace{3mm}2.1  $a_n \leftarrow 0$ \\
3.  Free the memory allocated for the digits of $a$. \\
4.  $a.used \leftarrow 0$ \\
5.  $a.alloc \leftarrow 0$ \\
6.  $a.sign \leftarrow MP\_ZPOS$ \\
7.  Return(\textit{MP\_OKAY}). \\
\hline
\end{tabular}
\end{center}
\caption{Algorithm mp\_clear}
\end{figure}

\textbf{Algorithm mp\_clear.}
This algorithm accomplishes two goals.
First, it clears the digits and the other mp\_int members.  This ensures that
if a developer accidentally re-uses a cleared structure it is less likely to cause problems.  The second goal
is to free the allocated memory.

The logic behind the algorithm is extended by marking cleared mp\_int structures so that subsequent calls to this
algorithm will not try to free the memory multiple times.  A cleared mp\_int is detectable because its digit pointer
\textbf{dp} holds a pre-defined invalid value.

Once an mp\_int has been cleared the mp\_int structure is no longer in a valid state for any other algorithm
with the exception of algorithms mp\_init, mp\_init\_copy, mp\_init\_size and mp\_clear.

EXAM,bn_mp_clear.c

The algorithm only operates on the mp\_int if it hasn't been previously cleared.  The if statement (line @23,a->dp != NULL@)
checks to see if the \textbf{dp} member is not \textbf{NULL}.  If the mp\_int is a valid mp\_int then \textbf{dp} cannot be
\textbf{NULL}, in which case the if statement will evaluate to true.

The digits of the mp\_int are cleared by the for loop (line @25,for@) which assigns a zero to every digit.  Similar to mp\_init()
the digits are assigned zero instead of using block memory operations (such as memset()) since this is more portable.

The digits are deallocated off the heap via the XFREE macro.  Similar to XMALLOC the XFREE macro actually evaluates to
a standard C library function.
In this case it is the free() function.  Since free() only deallocates the memory the pointer
still has to be reset to \textbf{NULL} manually (line @33,NULL@).

Now that the digits have been cleared and deallocated the other members are set to their final values (lines @34,= 0@ and @35,ZPOS@).

\section{Maintenance Algorithms}

The previous sections describe how to initialize and clear an mp\_int structure.  To further support operations
that are to be performed on mp\_int structures (such as addition and multiplication) the dependent algorithms must be
able to augment the precision of an mp\_int and
initialize mp\_ints with differing initial conditions.

These algorithms complete the set of low level algorithms required to work with mp\_int structures in the higher level
algorithms such as addition, multiplication and modular exponentiation.

\subsection{Augmenting an mp\_int's Precision}
When storing a value in an mp\_int structure, a sufficient number of digits must be available to accommodate the entire
result of an operation without loss of precision.  Quite often the size of the array given by the \textbf{alloc} member
is large enough to simply increase the \textbf{used} digit count.  However, when the size of the array is too small it
must be re-sized appropriately to accommodate the result.  The mp\_grow algorithm will provide this functionality.

\newpage\begin{figure}[here]
\begin{center}
\begin{tabular}{l}
\hline Algorithm \textbf{mp\_grow}. \\
\textbf{Input}.   An mp\_int $a$ and an integer $b$. \\
\textbf{Output}.  $a$ is expanded to accommodate $b$ digits. \\
\hline \\
1.  if $a.alloc \ge b$ then return(\textit{MP\_OKAY}) \\
2.  $u \leftarrow b\mbox{ (mod }MP\_PREC\mbox{)}$ \\
3.  $v \leftarrow b + 2 \cdot MP\_PREC - u$ \\
4.  Re-allocate the array of digits $a$ to size $v$ \\
5.  If the allocation failed then return(\textit{MP\_MEM}). \\
6.  for $n$ from $a.alloc$ to $v - 1$ do  \\
\hspace{+3mm}6.1  $a_n \leftarrow 0$ \\
7.  $a.alloc \leftarrow v$ \\
8.  Return(\textit{MP\_OKAY}) \\
\hline
\end{tabular}
\end{center}
\caption{Algorithm mp\_grow}
\end{figure}

\textbf{Algorithm mp\_grow.}
It is ideal to prevent re-allocations from being performed if they are not required (step one).  This is useful to
prevent mp\_ints from growing excessively in code that erroneously calls mp\_grow.

The requested digit count is padded up to the next multiple of \textbf{MP\_PREC} plus an additional \textbf{MP\_PREC} (steps two and three).
This helps prevent many trivial reallocations that would grow an mp\_int by trivially small values.
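
The padding arithmetic of steps two and three is easy to check by hand.  The sketch below uses a hypothetical helper,
toy\_padded\_size, that takes the precision as a parameter; the value 32 used in the discussion afterwards is chosen purely
for illustration and is not claimed to be the library's setting of \textbf{MP\_PREC}.

```c
#include <assert.h>

/* Steps two and three of the mp_grow pseudo-code:
      u <- b mod prec
      v <- b + 2 * prec - u
   The result is always a multiple of prec that exceeds b by more than prec. */
int toy_padded_size(int b, int prec)
{
   int u;

   assert(b >= 0 && prec > 0);
   u = b % prec;
   return b + 2 * prec - u;
}
```

With a precision of 32, requests for 1, 33 and 100 digits are padded to 64, 96 and 160 digits respectively, and a
request for exactly 32 digits is padded to 96.  In every case the padded size leaves more than one full block of spare
digits, so a run of slightly larger requests is satisfied without another heap operation.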

It is assumed that the reallocation (step four) leaves the lower $a.alloc$ digits of the mp\_int intact.  This is
akin to how the \textit{realloc} function from the standard C library works.  Since the newly allocated digits are
assumed to contain undefined values they are initially set to zero.

EXAM,bn_mp_grow.c

A quick optimization is to first determine if a memory re-allocation is required at all.  The if statement (line @24,alloc@) checks
if the \textbf{alloc} member of the mp\_int is smaller than the requested digit count.  If the count is not larger than \textbf{alloc}
the function skips the re-allocation part thus saving time.

When a re-allocation is performed it is turned into an optimal request to save time in the future.  The requested digit count is
padded upwards to the second multiple of \textbf{MP\_PREC} above the requested count (line @25, size@).  The XREALLOC function is used
to re-allocate the memory.  As per the other functions XREALLOC is actually a macro which evaluates to realloc by default.  The realloc
function leaves the base of the allocation intact which means the first \textbf{alloc} digits of the mp\_int are the same as before
the re-allocation.  All that is left is to clear the newly allocated digits and return.

Note that the re-allocation result is actually stored in a temporary pointer $tmp$.  This allows the function to return
an error while $a$ still holds a valid pointer.  Earlier releases of the library stored the result of XREALLOC into the mp\_int $a$.
That would
result in a memory leak if XREALLOC ever failed, since the only pointer to the original digits would have been overwritten.

\subsection{Initializing Variable Precision mp\_ints}
Occasionally the number of digits required will be known in advance of an initialization, based on, for example, the size
of input mp\_ints to a given algorithm.  Algorithm mp\_init\_size is similar to mp\_init except that it
will allocate \textit{at least} a specified number of digits.

\begin{figure}[here]
\begin{small}
\begin{center}
\begin{tabular}{l}
\hline Algorithm \textbf{mp\_init\_size}. \\
\textbf{Input}.   An mp\_int $a$ and the requested number of digits $b$. \\
\textbf{Output}.  $a$ is initialized to hold at least $b$ digits. \\
\hline \\
1.  $u \leftarrow b \mbox{ (mod }MP\_PREC\mbox{)}$ \\
2.  $v \leftarrow b + 2 \cdot MP\_PREC - u$ \\
3.  Allocate $v$ digits. \\
4.  for $n$ from $0$ to $v - 1$ do \\
\hspace{3mm}4.1  $a_n \leftarrow 0$ \\
5.  $a.sign \leftarrow MP\_ZPOS$\\
6.  $a.used \leftarrow 0$\\
7.  $a.alloc \leftarrow v$\\
8.
Return(\textit{MP\_OKAY})\\
\hline
\end{tabular}
\end{center}
\end{small}
\caption{Algorithm mp\_init\_size}
\end{figure}

\textbf{Algorithm mp\_init\_size.}
This algorithm will initialize an mp\_int structure $a$ like algorithm mp\_init with the exception that the number of
digits allocated can be controlled by the second input argument $b$.  The input size is padded upwards so it is a
multiple of \textbf{MP\_PREC} plus an additional \textbf{MP\_PREC} digits.  This padding is used to prevent trivial
allocations from becoming a bottleneck in the rest of the algorithms.

Like algorithm mp\_init, the mp\_int structure is initialized to a default state representing the integer zero.  This
particular algorithm is useful if the approximate size of the input is known ahead of time.  If the approximation is
correct no further memory re-allocations are required to work with the mp\_int.

EXAM,bn_mp_init_size.c

The number of digits $b$ requested is padded (line @22,MP_PREC@) by first augmenting it to the next multiple of
\textbf{MP\_PREC} and then adding \textbf{MP\_PREC} to the result.  If the memory can be successfully allocated the
mp\_int is placed in a default state representing the integer zero.  Otherwise, the error code \textbf{MP\_MEM} will be
returned (line @27,return@).

The digits are allocated and set to zero at the same time with the calloc() function (line @25,XCALLOC@).  The
\textbf{used} count is set to zero, the \textbf{alloc} count is set to the padded digit count and the \textbf{sign} flag is set
to \textbf{MP\_ZPOS} to achieve a default valid mp\_int state (lines @29,used@, @30,alloc@ and @31,sign@).  If the function
returns successfully then it is correct to assume that the mp\_int structure is in a valid state for the remainder of the
functions to work with.

\subsection{Multiple Integer Initializations and Clearings}
Occasionally a function will require a series of mp\_int data types to be made available simultaneously.
The purpose of algorithm mp\_init\_multi is to initialize a variable length array of mp\_int structures in a single
statement.  It is essentially a shortcut to multiple initializations.

\newpage\begin{figure}[here]
\begin{center}
\begin{tabular}{l}
\hline Algorithm \textbf{mp\_init\_multi}. \\
\textbf{Input}.   Variable length array $V_k$ of mp\_int variables of length $k$. \\
\textbf{Output}.  The array is initialized such that each mp\_int of $V_k$ is ready to use. \\
\hline \\
1.  for $n$ from 0 to $k - 1$ do \\
\hspace{+3mm}1.1.  Initialize the mp\_int $V_n$ (\textit{mp\_init}) \\
\hspace{+3mm}1.2.  If initialization failed then do \\
\hspace{+6mm}1.2.1.
for $j$ from $0$ to $n - 1$ do \\
\hspace{+9mm}1.2.1.1.  Free the mp\_int $V_j$ (\textit{mp\_clear}) \\
\hspace{+6mm}1.2.2.   Return(\textit{MP\_MEM}) \\
2.  Return(\textit{MP\_OKAY}) \\
\hline
\end{tabular}
\end{center}
\caption{Algorithm mp\_init\_multi}
\end{figure}

\textbf{Algorithm mp\_init\_multi.}
The algorithm will initialize the array of mp\_int variables one at a time.  If a runtime error has been detected
(\textit{step 1.2}) all of the previously initialized variables are cleared.  The goal is an ``all or nothing''
initialization which allows for quick recovery from runtime errors.

EXAM,bn_mp_init_multi.c

This function initializes a variable length list of mp\_int structure pointers.  However, instead of having the mp\_int
structures in an actual C array they are simply passed as arguments to the function.  This function makes use of the
``...'' argument syntax of the C programming language.  The list is terminated with a final \textbf{NULL} argument
appended on the right.

The function uses the ``stdarg.h'' \textit{va} functions to step portably through the arguments to the function.  A count
$n$ of successfully initialized mp\_int structures is maintained (line @47,n++@) such that if a failure does occur,
the algorithm can backtrack and free the previously initialized structures (lines @27,if@ to @46,}@).
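
The ``all or nothing'' behaviour can be observed with a self-contained sketch.  Everything in it is hypothetical and
written only for this illustration: the reduced toy\_int type, the routines toy\_init, toy\_clear and toy\_init\_multi,
and the toy\_fail\_after switch that simulates a failed allocation.  It is a simplified model of the technique and
should not be read as the library source.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdlib.h>

enum { TOY_OKAY = 0, TOY_MEM = -2 };       /* hypothetical codes */
typedef struct { int *dp; } toy_int;       /* hypothetical, reduced structure */

/* Test switch: number of toy_init calls that may succeed before a simulated
   failure.  A negative value means "never fail". */
int toy_fail_after = -1;

int toy_init(toy_int *a)
{
   if (toy_fail_after == 0) {
      return TOY_MEM;
   }
   if (toy_fail_after > 0) {
      toy_fail_after--;
   }
   a->dp = calloc(4, sizeof *a->dp);
   return (a->dp == NULL) ? TOY_MEM : TOY_OKAY;
}

void toy_clear(toy_int *a)
{
   assert(a != NULL);
   free(a->dp);
   a->dp = NULL;
}

/* Initializes every toy_int in a NULL terminated argument list, or none. */
int toy_init_multi(toy_int *first, ...)
{
   va_list args;
   toy_int *cur = first;
   int n = 0;                      /* count of successful initializations */

   va_start(args, first);
   while (cur != NULL) {
      if (toy_init(cur) != TOY_OKAY) {
         va_list undo;

         /* Walk the list again from the start and clear the n entries
            that were initialized before the failure. */
         va_end(args);
         cur = first;
         va_start(undo, first);
         while (n-- > 0) {
            toy_clear(cur);
            cur = va_arg(undo, toy_int *);
         }
         va_end(undo);
         return TOY_MEM;
      }
      n++;
      cur = va_arg(args, toy_int *);
   }
   va_end(args);
   return TOY_OKAY;
}
```

A call such as toy\_init\_multi(\&a, \&b, \&c, NULL) either leaves all three structures initialized or, if the third
initialization fails, leaves $a$ and $b$ cleared again so that the caller has nothing to release.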


\subsection{Clamping Excess Digits}
When a function anticipates a result will be $n$ digits it is simpler to assume this is true within the body of
the function instead of checking during the computation.  For example, a multiplication of an $i$ digit number by a
$j$ digit number produces a result of at most $i + j$ digits.  It is entirely possible that the result is $i + j - 1$
digits though, with no final carry into the last position.  However, suppose the destination had to be first expanded
(\textit{via mp\_grow}) to accommodate $i + j - 1$ digits then further expanded to accommodate the final carry.
That would be a considerable waste of time since heap operations are relatively slow.

The ideal solution is to always assume the result is $i + j$ digits and fix up the \textbf{used} count once the
computation is complete.  This way a single heap operation (\textit{at most}) is required.  However, if the result was not checked
there would be an excess high order zero digit.

For example, suppose the product of two integers was $x_n = (0x_{n-1}x_{n-2}\ldots x_0)_{\beta}$.  The leading zero digit
will not contribute to the precision of the result.  In fact, through subsequent operations more leading zero digits would
accumulate to the point the size of the integer would be prohibitive.  As a result even though the precision is very
low the representation is excessively large.

The mp\_clamp algorithm is designed to solve this very problem.
It will trim high-order zeros by decrementing the
\textbf{used} count until a non-zero most significant digit is found.  Also in this system, zero is considered to be a
positive number which means that if the \textbf{used} count is decremented to zero, the sign must be set to
\textbf{MP\_ZPOS}.

\begin{figure}[here]
\begin{center}
\begin{tabular}{l}
\hline Algorithm \textbf{mp\_clamp}. \\
\textbf{Input}.   An mp\_int $a$ \\
\textbf{Output}.  Any excess leading zero digits of $a$ are removed \\
\hline \\
1.  while $a.used > 0$ and $a_{a.used - 1} = 0$ do \\
\hspace{+3mm}1.1  $a.used \leftarrow a.used - 1$ \\
2.  if $a.used = 0$ then do \\
\hspace{+3mm}2.1  $a.sign \leftarrow MP\_ZPOS$ \\
\hline
\end{tabular}
\end{center}
\caption{Algorithm mp\_clamp}
\end{figure}

\textbf{Algorithm mp\_clamp.}
As can be expected this algorithm is very simple.  The loop on step one is expected to iterate only once or twice at
the most.  For example, this will happen in cases where there is not a carry to fill the last position.  Step two fixes the sign
when all of the digits are zero to ensure that the mp\_int is valid at all times.
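
A small worked example may help make the two steps concrete.  The sketch below is an assumption-laden model and not
the library routine: toy\_int is a hypothetical structure that only loosely mirrors the members discussed in this
chapter, and the TOY\_ZPOS and TOY\_NEG values are invented for the illustration.

```c
#include <assert.h>
#include <stddef.h>

enum { TOY_ZPOS = 0, TOY_NEG = 1 };        /* hypothetical sign values */

typedef struct {
   int used, alloc, sign;
   unsigned long *dp;                      /* digits, least significant first */
} toy_int;

/* Trims leading zero digits and normalizes the sign of zero. */
void toy_clamp(toy_int *a)
{
   assert(a != NULL);

   /* Step one.  The used > 0 test comes first so that dp[used - 1] is
      never read when used is zero. */
   while (a->used > 0 && a->dp[a->used - 1] == 0) {
      --(a->used);
   }

   /* Step two.  Zero is always stored as a positive number. */
   if (a->used == 0) {
      a->sign = TOY_ZPOS;
   }
}
```

For a negative value stored in the digits $(0\,0\,5)$ with a \textbf{used} count of three, the loop runs twice and leaves
a \textbf{used} count of one with the sign unchanged.  If every digit is zero the \textbf{used} count drops to zero and
the sign is reset to positive.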

EXAM,bn_mp_clamp.c

Note on line @27,while@ how the test for the \textbf{used} count is made on the left of the \&\& operator.  In the C programming
language the terms to \&\& are evaluated left to right with a boolean short-circuit if any condition fails.  This is
important since if the \textbf{used} count is zero the test on the right would read below the start of the array.  That is obviously
undesirable.  The parentheses on line @28,a->used@ are used to make sure the \textbf{used} count is decremented and not
the pointer ``a''.

\section*{Exercises}
\begin{tabular}{cl}
$\left [ 1 \right ]$ & Discuss the relevance of the \textbf{used} member of the mp\_int structure. \\
                     & \\
$\left [ 1 \right ]$ & Discuss the consequences of not using padding when performing allocations.  \\
                     & \\
$\left [ 2 \right ]$ & Estimate an ideal value for \textbf{MP\_PREC} when performing 1024-bit RSA \\
                     & encryption when $\beta = 2^{28}$.  \\
                     & \\
$\left [ 1 \right ]$ & Discuss the relevance of the algorithm mp\_clamp.  What does it prevent? \\
                     & \\
$\left [ 1 \right ]$ & Give an example of when the algorithm mp\_init\_copy might be useful.
\\
                     & \\
\end{tabular}


%%%
% CHAPTER FOUR
%%%

\chapter{Basic Operations}

\section{Introduction}
In the previous chapter a series of low level algorithms was established that dealt with initializing and maintaining
mp\_int structures.  This chapter will discuss another set of seemingly non-algebraic algorithms which will form the low
level basis of the entire library.  While these algorithms are relatively trivial it is important to understand how they
work before proceeding since these algorithms will be used almost intrinsically in the following chapters.

The algorithms in this chapter deal primarily with more ``programmer'' related tasks such as creating copies of
mp\_int structures, assigning small values to mp\_int structures and comparisons of the values mp\_int structures
represent.

\section{Assigning Values to mp\_int Structures}
\subsection{Copying an mp\_int}
Assigning the value that a given mp\_int structure represents to another mp\_int structure shall be known as making
a copy for the purposes of this text.  The copy of the mp\_int will be a separate entity that represents the same
value as the mp\_int it was copied from.  The mp\_copy algorithm provides this functionality.

\newpage\begin{figure}[here]
\begin{center}
\begin{tabular}{l}
\hline Algorithm \textbf{mp\_copy}. \\
\textbf{Input}.  Two mp\_ints $a$ and $b$. \\
\textbf{Output}.  Store a copy of $a$ in $b$. \\
\hline \\
1.  If $b.alloc < a.used$ then grow $b$ to $a.used$ digits.  (\textit{mp\_grow}) \\
2.  for $n$ from 0 to $a.used - 1$ do \\
\hspace{3mm}2.1  $b_{n} \leftarrow a_{n}$ \\
3.  for $n$ from $a.used$ to $b.used - 1$ do \\
\hspace{3mm}3.1  $b_{n} \leftarrow 0$ \\
4.  $b.used \leftarrow a.used$ \\
5.  $b.sign \leftarrow a.sign$ \\
6.  return(\textit{MP\_OKAY}) \\
\hline
\end{tabular}
\end{center}
\caption{Algorithm mp\_copy}
\end{figure}

\textbf{Algorithm mp\_copy.}
This algorithm copies the mp\_int $a$ such that upon successful termination of the algorithm the mp\_int $b$ will
represent the same integer as the mp\_int $a$.  The mp\_int $b$ shall be a complete and distinct copy of the
mp\_int $a$ meaning that the mp\_int $a$ can be modified and it shall not affect the value of the mp\_int $b$.

If $b$ does not have enough room for the digits of $a$ it must first have its precision augmented via the mp\_grow
algorithm.
The digits of $a$ are copied over the digits of $b$ and any excess digits of $b$ are set to zero (steps two
and three). The \textbf{used} and \textbf{sign} members of $a$ are finally copied over the respective members of
$b$.

\textbf{Remark.} This algorithm also introduces a new idiosyncrasy that will be used throughout the rest of the
text. The error return codes of other algorithms are not explicitly checked in the pseudo-code presented. For example, in
step one of the mp\_copy algorithm the return of mp\_grow is not explicitly checked to ensure it succeeded. Text space is
limited so it is assumed that if an algorithm fails it will clear all temporarily allocated mp\_ints and return
the error code itself. However, the C code presented will demonstrate all of the error handling logic required to
implement the pseudo-code.

EXAM,bn_mp_copy.c

Occasionally a dependent algorithm may copy an mp\_int effectively into itself such as when the input and output
mp\_int structures passed to a function are one and the same. For this case it is optimal to return immediately without
copying digits (line @24,a == b@).

The mp\_int $b$ must have enough digits to accommodate the used digits of the mp\_int $a$. If $b.alloc$ is less than
$a.used$ the algorithm mp\_grow is used to augment the precision of $b$ (lines @29,alloc@ to @33,}@).
In order to
simplify the inner loop that copies the digits from $a$ to $b$, two aliases $tmpa$ and $tmpb$ point directly at the digits
of the mp\_ints $a$ and $b$ respectively. These aliases (lines @42,tmpa@ and @45,tmpb@) allow the compiler to access the digits without first dereferencing the
mp\_int pointers and then subsequently the pointer to the digits.

After the aliases are established the digits from $a$ are copied into $b$ (lines @48,for@ to @50,}@) and then the excess
digits of $b$ are set to zero (lines @53,for@ to @55,}@). Both ``for'' loops make use of the pointer aliases and in
fact the alias for $b$ is carried through into the second ``for'' loop to clear the excess digits. This optimization
allows the alias to stay in a machine register fairly easily between the two loops.

\textbf{Remarks.} The use of pointer aliases is an implementation methodology first introduced in this function that will
be used considerably in other functions. Technically, a pointer alias is simply a shorthand alias used to lower the
number of pointer dereferencing operations required to access data.
For example, a for loop may resemble

\begin{alltt}
for (x = 0; x < 100; x++) \{
    a->num[4]->dp[x] = 0;
\}
\end{alltt}

This could be re-written using aliases as

\begin{alltt}
mp_digit *tmpa;
tmpa = a->num[4]->dp;
for (x = 0; x < 100; x++) \{
    *tmpa++ = 0;
\}
\end{alltt}

In this case an alias is used to access the
array of digits within an mp\_int structure directly. It may seem that a pointer alias is strictly not required
as a compiler may optimize out the redundant pointer operations. However, there are two dominant reasons to use aliases.

The first reason is that most compilers will not effectively optimize pointer arithmetic. For example, some optimizations
may work for the Microsoft Visual C++ compiler (MSVC) and not for the GNU C Compiler (GCC). Also some optimizations may
work for GCC and not MSVC. As such it is ideal to find a common ground for as many compilers as possible. Pointer
aliases optimize the code considerably before the compiler even reads the source code which means the end compiled code
stands a better chance of being faster.

The second reason is that pointer aliases often can make an algorithm simpler to read.
Consider the first ``for''
loop of the function mp\_copy() re-written to not use pointer aliases.

\begin{alltt}
  /* copy all the digits */
  for (n = 0; n < a->used; n++) \{
    b->dp[n] = a->dp[n];
  \}
\end{alltt}

Whether this code is harder to read depends strongly on the individual. However, it is quantifiably slightly more
complicated as there are four variables within the statement instead of just two.

\subsubsection{Nested Statements}
Another commonly used technique in the source routines is that certain sections of code are nested. This is used in
particular with the pointer aliases to highlight code phases. For example, a Comba multiplier (discussed in chapter six)
will typically have three different phases. First the temporaries are initialized, then the columns calculated and
finally the carries are propagated. In this example the middle column production phase will typically be nested as it
uses temporary variables and aliases the most.

The nesting also simplifies the source code as variables that are nested are only valid for their scope. As a result
the various temporary variables required do not propagate into other sections of code.


\subsection{Creating a Clone}
Another common operation is to make a local temporary copy of an mp\_int argument.
To initialize an mp\_int
and then copy another existing mp\_int into the newly initialized mp\_int will be known as creating a clone. This is
useful within functions that need to modify an argument but do not wish to actually modify the original copy. The
mp\_init\_copy algorithm has been designed to help perform this task.

\begin{figure}[here]
\begin{center}
\begin{tabular}{l}
\hline Algorithm \textbf{mp\_init\_copy}. \\
\textbf{Input}. An mp\_int $a$ and $b$\\
\textbf{Output}. $a$ is initialized to be a copy of $b$. \\
\hline \\
1. Init $a$. (\textit{mp\_init}) \\
2. Copy $b$ to $a$. (\textit{mp\_copy}) \\
3. Return the status of the copy operation. \\
\hline
\end{tabular}
\end{center}
\caption{Algorithm mp\_init\_copy}
\end{figure}

\textbf{Algorithm mp\_init\_copy.}
This algorithm will initialize an mp\_int variable and copy another previously initialized mp\_int variable into it. As
such this algorithm will perform two operations in one step.

EXAM,bn_mp_init_copy.c

This will initialize \textbf{a} and make it a verbatim copy of the contents of \textbf{b}.
Note that
\textbf{a} will have its own memory allocated which means that \textbf{b} may be cleared after the call
and \textbf{a} will be left intact.

\section{Zeroing an Integer}
Resetting an mp\_int to the default state is a common step in many algorithms. The mp\_zero algorithm will be the algorithm used to
perform this task.

\begin{figure}[here]
\begin{center}
\begin{tabular}{l}
\hline Algorithm \textbf{mp\_zero}. \\
\textbf{Input}. An mp\_int $a$ \\
\textbf{Output}. Zero the contents of $a$ \\
\hline \\
1. $a.used \leftarrow 0$ \\
2. $a.sign \leftarrow$ MP\_ZPOS \\
3. for $n$ from 0 to $a.alloc - 1$ do \\
\hspace{3mm}3.1 $a_n \leftarrow 0$ \\
\hline
\end{tabular}
\end{center}
\caption{Algorithm mp\_zero}
\end{figure}

\textbf{Algorithm mp\_zero.}
This algorithm simply resets an mp\_int to the default state.

EXAM,bn_mp_zero.c

After the function is completed, all of the digits are zeroed, the \textbf{used} count is zeroed and the
\textbf{sign} variable is set to \textbf{MP\_ZPOS}.
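
The three steps of algorithm mp\_zero can be sketched in isolation as follows. This is only a minimal illustration under stated assumptions: the structure layout, the digit type \texttt{uint32\_t}, the numeric values of the sign constants and the name \texttt{mp\_zero\_sketch} are all chosen for this example and are not taken from the library source.

```c
#include <stdint.h>

typedef uint32_t mp_digit;   /* assumed digit type for this sketch */
#define MP_ZPOS 0            /* assumed encoding: zero or positive */
#define MP_NEG  1            /* assumed encoding: negative         */

/* simplified stand-in for the mp_int structure */
typedef struct {
    int       used;   /* digits currently in use        */
    int       alloc;  /* digits allocated in dp         */
    int       sign;   /* MP_ZPOS or MP_NEG              */
    mp_digit *dp;     /* least significant digit first  */
} mp_int;

/* hypothetical helper: reset a to the default state */
void mp_zero_sketch(mp_int *a)
{
    int       n;
    mp_digit *tmp = a->dp;      /* pointer alias into the digit array */

    a->used = 0;                /* step 1 */
    a->sign = MP_ZPOS;          /* step 2 */
    for (n = 0; n < a->alloc; n++) {
        *tmp++ = 0;             /* step 3: clear every allocated digit */
    }
}
```

Note that the loop runs over \textbf{alloc} digits and not merely \textbf{used} digits, which matches step three of the pseudo-code.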

\section{Sign Manipulation}
\subsection{Absolute Value}
With the mp\_int representation of an integer, calculating the absolute value is trivial. The mp\_abs algorithm will compute
the absolute value of an mp\_int.

\begin{figure}[here]
\begin{center}
\begin{tabular}{l}
\hline Algorithm \textbf{mp\_abs}. \\
\textbf{Input}. An mp\_int $a$ \\
\textbf{Output}. Computes $b = \vert a \vert$ \\
\hline \\
1. Copy $a$ to $b$. (\textit{mp\_copy}) \\
2. If the copy failed return(\textit{MP\_MEM}). \\
3. $b.sign \leftarrow MP\_ZPOS$ \\
4. Return(\textit{MP\_OKAY}) \\
\hline
\end{tabular}
\end{center}
\caption{Algorithm mp\_abs}
\end{figure}

\textbf{Algorithm mp\_abs.}
This algorithm computes the absolute value of an mp\_int input. First it copies $a$ over $b$. This is an example of an
algorithm where the check in mp\_copy that determines if the source and destination are equal proves useful. This allows,
for instance, the developer to pass the same mp\_int as the source and destination to this function without additional
logic to handle it.
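
The interplay between the copy step and the case where source and destination coincide can be sketched as below. This is a simplified illustration and not the library routine: the structure layout, the constant values and the names \texttt{copy\_sketch} and \texttt{abs\_sketch} are assumptions made for this example, and instead of growing $b$ the copy simply reports \textbf{MP\_MEM} when $b$ lacks room.

```c
#include <stdint.h>

typedef uint32_t mp_digit;   /* assumed digit type for this sketch */
#define MP_OKAY 0            /* assumed return code values         */
#define MP_MEM  (-2)
#define MP_ZPOS 0            /* assumed sign encodings             */
#define MP_NEG  1

/* simplified stand-in for the mp_int structure */
typedef struct { int used, alloc, sign; mp_digit *dp; } mp_int;

/* hypothetical simplified copy: no growing, fails if b lacks room */
int copy_sketch(const mp_int *a, mp_int *b)
{
    int n;

    if (a == b) {
        return MP_OKAY;                 /* same object: nothing to copy */
    }
    if (b->alloc < a->used) {
        return MP_MEM;                  /* stand-in for a failed mp_grow */
    }
    for (n = 0; n < a->used; n++) {
        b->dp[n] = a->dp[n];            /* copy the used digits */
    }
    for (; n < b->used; n++) {
        b->dp[n] = 0;                   /* clear the excess digits of b */
    }
    b->used = a->used;
    b->sign = a->sign;
    return MP_OKAY;
}

/* hypothetical helper: b = |a| */
int abs_sketch(const mp_int *a, mp_int *b)
{
    int res = copy_sketch(a, b);        /* step 1 */
    if (res != MP_OKAY) {
        return res;                     /* step 2 */
    }
    b->sign = MP_ZPOS;                  /* step 3 */
    return MP_OKAY;                     /* step 4 */
}
```

Calling \texttt{abs\_sketch(\&x, \&x)} is valid in this sketch: the copy returns immediately and only the sign of $x$ is changed.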

EXAM,bn_mp_abs.c

This fairly trivial algorithm first eliminates non--required duplications (line @27,a != b@) and then sets the
\textbf{sign} flag to \textbf{MP\_ZPOS}.

\subsection{Integer Negation}
With the mp\_int representation of an integer, calculating the negation is also trivial. The mp\_neg algorithm will compute
the negative of an mp\_int input.

\begin{figure}[here]
\begin{center}
\begin{tabular}{l}
\hline Algorithm \textbf{mp\_neg}. \\
\textbf{Input}. An mp\_int $a$ \\
\textbf{Output}. Computes $b = -a$ \\
\hline \\
1. Copy $a$ to $b$. (\textit{mp\_copy}) \\
2. If the copy failed return(\textit{MP\_MEM}). \\
3. If $a.used = 0$ then return(\textit{MP\_OKAY}). \\
4. If $a.sign = MP\_ZPOS$ then do \\
\hspace{3mm}4.1 $b.sign = MP\_NEG$. \\
5. else do \\
\hspace{3mm}5.1 $b.sign = MP\_ZPOS$. \\
6. Return(\textit{MP\_OKAY}) \\
\hline
\end{tabular}
\end{center}
\caption{Algorithm mp\_neg}
\end{figure}

\textbf{Algorithm mp\_neg.}
This algorithm computes the negation of an input. First it copies $a$ over $b$.
If $a$ has no used digits then
the algorithm returns immediately. Otherwise it flips the sign flag and stores the result in $b$. Note that if
$a$ had no digits then it must be positive by definition. Had step three been omitted then the algorithm would return
zero as negative.

EXAM,bn_mp_neg.c

Like mp\_abs() this function avoids non--required duplications (line @21,a != b@) and then sets the sign. We
have to make sure that only non--zero values get a \textbf{sign} of \textbf{MP\_NEG}. If the mp\_int is zero
then the \textbf{sign} is hard--coded to \textbf{MP\_ZPOS}.

\section{Small Constants}
\subsection{Setting Small Constants}
Often an mp\_int must be set to a relatively small value such as $1$ or $2$. For these cases the mp\_set algorithm is useful.

\newpage\begin{figure}[here]
\begin{center}
\begin{tabular}{l}
\hline Algorithm \textbf{mp\_set}. \\
\textbf{Input}. An mp\_int $a$ and a digit $b$ \\
\textbf{Output}. Make $a$ equivalent to $b$ \\
\hline \\
1. Zero $a$ (\textit{mp\_zero}). \\
2. $a_0 \leftarrow b \mbox{ (mod }\beta\mbox{)}$ \\
3.
$a.used \leftarrow \left \lbrace \begin{array}{ll}
                              1 & \mbox{if }a_0 > 0 \\
                              0 & \mbox{if }a_0 = 0
                              \end{array} \right .$ \\
\hline
\end{tabular}
\end{center}
\caption{Algorithm mp\_set}
\end{figure}

\textbf{Algorithm mp\_set.}
This algorithm sets an mp\_int to a small single digit value. Step number 1 ensures that the integer is reset to the default state. The
single digit is set (\textit{modulo $\beta$}) and the \textbf{used} count is adjusted accordingly.

EXAM,bn_mp_set.c

First we zero (line @21,mp_zero@) the mp\_int to make sure that the other members are initialized for a
small positive constant. mp\_zero() ensures that the \textbf{sign} is positive and the \textbf{used} count
is zero. Next we set the digit and reduce it modulo $\beta$ (line @22,MP_MASK@). After this step we have to
check if the resulting digit is zero or not. If it is not then we set the \textbf{used} count to one, otherwise
to zero.

We can quickly reduce modulo $\beta$ since it is of the form $2^k$ and a quick binary AND operation with
$2^k - 1$ will perform the same operation.

One important limitation of this function is that it will only set one digit. The size of a digit is not fixed, meaning source that uses
this function should take that into account.
Only trivially small constants can be set using this function.

\subsection{Setting Large Constants}
To overcome the limitations of the mp\_set algorithm the mp\_set\_int algorithm is ideal. It accepts a ``long''
data type as input and will always treat it as a 32-bit integer.

\begin{figure}[here]
\begin{center}
\begin{tabular}{l}
\hline Algorithm \textbf{mp\_set\_int}. \\
\textbf{Input}. An mp\_int $a$ and a ``long'' integer $b$ \\
\textbf{Output}. Make $a$ equivalent to $b$ \\
\hline \\
1. Zero $a$ (\textit{mp\_zero}) \\
2. for $n$ from 0 to 7 do \\
\hspace{3mm}2.1 $a \leftarrow a \cdot 16$ (\textit{mp\_mul\_2d}) \\
\hspace{3mm}2.2 $u \leftarrow \lfloor b / 2^{4(7 - n)} \rfloor \mbox{ (mod }16\mbox{)}$\\
\hspace{3mm}2.3 $a_0 \leftarrow a_0 + u$ \\
\hspace{3mm}2.4 $a.used \leftarrow a.used + 1$ \\
3. Clamp excess used digits (\textit{mp\_clamp}) \\
\hline
\end{tabular}
\end{center}
\caption{Algorithm mp\_set\_int}
\end{figure}

\textbf{Algorithm mp\_set\_int.}
The algorithm performs eight iterations of a simple loop where in each iteration four bits from the source are added to the
mp\_int. Step 2.1 will multiply the current result by sixteen making room for four more bits in the less significant positions.
In step 2.2 the
next four bits from the source are extracted and are added to the mp\_int. The \textbf{used} digit count is
incremented to reflect the addition. The \textbf{used} digit counter is incremented since if any of the leading digits were zero the mp\_int would have
zero digits used and the newly added four bits would be ignored.

Excess zero digits are trimmed in steps 2.1 and 3 by using higher level algorithms mp\_mul\_2d and mp\_clamp.

EXAM,bn_mp_set_int.c

This function sets four bits of the number at a time to handle all practical \textbf{DIGIT\_BIT} sizes. The weird
addition on line @38,a->used@ ensures that the newly added in bits are added to the number of digits. While it may not
seem obvious as to why the digit counter does not grow exceedingly large it is because of the shift on line @27,mp_mul_2d@
as well as the call to mp\_clamp() on line @40,mp_clamp@. Both functions will clamp excess leading digits which keeps
the number of used digits low.

\section{Comparisons}
\subsection{Unsigned Comparisons}
Comparing a multiple precision integer is performed with the exact same algorithm used to compare two decimal numbers. For example,
to compare $1,234$ to $1,264$ the digits are extracted by their positions.
That is we compare $1 \cdot 10^3 + 2 \cdot 10^2 + 3 \cdot 10^1 + 4 \cdot 10^0$
to $1 \cdot 10^3 + 2 \cdot 10^2 + 6 \cdot 10^1 + 4 \cdot 10^0$ by comparing single digits at a time starting with the highest magnitude
positions. If any leading digit of one integer is greater than a digit in the same position of another integer then obviously it must be greater.

The first comparison routine that will be developed is the unsigned magnitude compare which will perform a comparison based on the digits of two
mp\_int variables alone. It will ignore the sign of the two inputs. Such a function is useful when an absolute comparison is required or if the
signs are known to agree in advance.

To facilitate working with the results of the comparison functions three constants are required.

\begin{figure}[here]
\begin{center}
\begin{tabular}{|r|l|}
\hline \textbf{Constant} & \textbf{Meaning} \\
\hline \textbf{MP\_GT} & Greater Than \\
\hline \textbf{MP\_EQ} & Equal To \\
\hline \textbf{MP\_LT} & Less Than \\
\hline
\end{tabular}
\end{center}
\caption{Comparison Return Codes}
\end{figure}

\begin{figure}[here]
\begin{center}
\begin{tabular}{l}
\hline Algorithm \textbf{mp\_cmp\_mag}. \\
\textbf{Input}. Two mp\_ints $a$ and $b$.
\\
\textbf{Output}. Unsigned comparison results ($a$ to the left of $b$). \\
\hline \\
1. If $a.used > b.used$ then return(\textit{MP\_GT}) \\
2. If $a.used < b.used$ then return(\textit{MP\_LT}) \\
3. for n from $a.used - 1$ to 0 do \\
\hspace{+3mm}3.1 if $a_n > b_n$ then return(\textit{MP\_GT}) \\
\hspace{+3mm}3.2 if $a_n < b_n$ then return(\textit{MP\_LT}) \\
4. Return(\textit{MP\_EQ}) \\
\hline
\end{tabular}
\end{center}
\caption{Algorithm mp\_cmp\_mag}
\end{figure}

\textbf{Algorithm mp\_cmp\_mag.}
By saying ``$a$ to the left of $b$'' it is meant that the comparison is with respect to $a$, that is if $a$ is greater than $b$ it will return
\textbf{MP\_GT} and similarly with respect to when $a = b$ and $a < b$. The first two steps compare the number of digits used in both $a$ and $b$.
Obviously if the digit counts differ there would be an imaginary zero digit in the smaller number where the leading digit of the larger number is.
If both have the same number of digits then the actual digits themselves must be compared starting at the leading digit.

By step three both inputs must have the same number of digits so it is safe to start from either $a.used - 1$ or $b.used - 1$ and count down to
the zero'th digit. If after all of the digits have been compared, no difference is found, the algorithm returns \textbf{MP\_EQ}.
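
The pseudo-code above maps almost line for line onto C. The following is a minimal sketch under stated assumptions: the structure layout, the digit type, the numeric values of the return codes and the name \texttt{cmp\_mag\_sketch} are chosen for this example only, and both inputs are assumed to carry no leading zero digits so that the \textbf{used} counts can be compared directly.

```c
#include <stdint.h>

typedef uint32_t mp_digit;   /* assumed digit type for this sketch   */
#define MP_LT (-1)           /* assumed values of the return codes   */
#define MP_EQ 0
#define MP_GT 1

/* simplified stand-in for the mp_int structure */
typedef struct { int used, alloc, sign; mp_digit *dp; } mp_int;

/* hypothetical helper: compare |a| to |b|, ignoring the sign members */
int cmp_mag_sketch(const mp_int *a, const mp_int *b)
{
    int n;

    /* steps 1 and 2: more used digits means a larger magnitude */
    if (a->used > b->used) {
        return MP_GT;
    }
    if (a->used < b->used) {
        return MP_LT;
    }

    /* step 3: equal digit counts, so scan from the leading digit down */
    for (n = a->used - 1; n >= 0; n--) {
        if (a->dp[n] > b->dp[n]) {
            return MP_GT;
        }
        if (a->dp[n] < b->dp[n]) {
            return MP_LT;
        }
    }

    /* step 4: no difference found */
    return MP_EQ;
}
```

With the digit arrays $\lbrace 4, 3, 2, 1 \rbrace$ and $\lbrace 4, 6, 2, 1 \rbrace$ (least significant digit first) the loop agrees on the two leading digits and decides on the third, mirroring the $1,234$ versus $1,264$ example.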

EXAM,bn_mp_cmp_mag.c

The two if statements (lines @24,if@ and @28,if@) compare the number of digits in the two inputs. These two are
performed before all of the digits are compared since it is a very cheap test to perform and can potentially save
considerable time. The implementation given is also not valid without those two statements. $b.alloc$ may be
smaller than $a.used$, meaning that undefined values will be read from $b$ past the end of the array of digits.



\subsection{Signed Comparisons}
Comparing with sign considerations is also fairly critical in several routines (\textit{division for example}). Based on an unsigned magnitude
comparison a trivial signed comparison algorithm can be written.

\begin{figure}[here]
\begin{center}
\begin{tabular}{l}
\hline Algorithm \textbf{mp\_cmp}. \\
\textbf{Input}. Two mp\_ints $a$ and $b$ \\
\textbf{Output}. Signed Comparison Results ($a$ to the left of $b$) \\
\hline \\
1. if $a.sign = MP\_NEG$ and $b.sign = MP\_ZPOS$ then return(\textit{MP\_LT}) \\
2. if $a.sign = MP\_ZPOS$ and $b.sign = MP\_NEG$ then return(\textit{MP\_GT}) \\
3.
if $a.sign = MP\_NEG$ then \\
\hspace{+3mm}3.1 Return the unsigned comparison of $b$ and $a$ (\textit{mp\_cmp\_mag}) \\
4. Otherwise \\
\hspace{+3mm}4.1 Return the unsigned comparison of $a$ and $b$ \\
\hline
\end{tabular}
\end{center}
\caption{Algorithm mp\_cmp}
\end{figure}

\textbf{Algorithm mp\_cmp.}
The first two steps compare the signs of the two inputs. If the signs do not agree then it can return right away with the appropriate
comparison code. When the signs are equal the digits of the inputs must be compared to determine the correct result. In step
three the unsigned comparison flips the order of the arguments since they are both negative. For instance, if $-a > -b$ then
$\vert a \vert < \vert b \vert$. Step number four will compare the two when they are both positive.

EXAM,bn_mp_cmp.c

The two if statements (lines @22,if@ and @26,if@) perform the initial sign comparison. If the signs are not equal then whichever
has the positive sign is larger. The inputs are compared (line @30,if@) based on magnitudes. If the signs were both
negative then the unsigned comparison is performed in the opposite direction (line @31,mp_cmp_mag@). Otherwise, the signs are assumed to
be both positive and a forward direction unsigned comparison is performed.
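
The sign handling described above can be sketched in a self-contained form as follows. As before this is an illustration under stated assumptions and not the library source: the structure layout, the constant values and the names \texttt{cmp\_mag\_sketch} and \texttt{cmp\_sketch} are hypothetical, and a zero value is assumed to always carry the sign \textbf{MP\_ZPOS}.

```c
#include <stdint.h>

typedef uint32_t mp_digit;   /* assumed digit type for this sketch */
#define MP_LT (-1)           /* assumed values of the return codes */
#define MP_EQ 0
#define MP_GT 1
#define MP_ZPOS 0            /* assumed sign encodings             */
#define MP_NEG  1

/* simplified stand-in for the mp_int structure */
typedef struct { int used, alloc, sign; mp_digit *dp; } mp_int;

/* hypothetical helper: unsigned magnitude comparison */
int cmp_mag_sketch(const mp_int *a, const mp_int *b)
{
    int n;

    if (a->used > b->used) return MP_GT;
    if (a->used < b->used) return MP_LT;
    for (n = a->used - 1; n >= 0; n--) {
        if (a->dp[n] > b->dp[n]) return MP_GT;
        if (a->dp[n] < b->dp[n]) return MP_LT;
    }
    return MP_EQ;
}

/* hypothetical helper: signed comparison built on the unsigned one */
int cmp_sketch(const mp_int *a, const mp_int *b)
{
    /* steps 1 and 2: differing signs decide the result immediately */
    if (a->sign != b->sign) {
        return (a->sign == MP_NEG) ? MP_LT : MP_GT;
    }

    /* step 3: both negative, so the larger magnitude is the smaller value */
    if (a->sign == MP_NEG) {
        return cmp_mag_sketch(b, a);
    }

    /* step 4: both positive */
    return cmp_mag_sketch(a, b);
}
```

Swapping the arguments in step three, rather than negating the result, keeps the return codes free to take any three distinct values.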

\section*{Exercises}
\begin{tabular}{cl}
$\left [ 2 \right ]$ & Modify algorithm mp\_set\_int to accept as input a variable length array of bits. \\
                     & \\
$\left [ 3 \right ]$ & Give the probability that algorithm mp\_cmp\_mag will have to compare $k$ digits \\
                     & of two random integers (of equal magnitude) before a difference is found. \\
                     & \\
$\left [ 1 \right ]$ & Suggest a simple method to speed up the implementation of mp\_cmp\_mag based \\
                     & on the observations made in the previous problem. \\
                     &
\end{tabular}

\chapter{Basic Arithmetic}
\section{Introduction}
At this point algorithms for initialization, clearing, zeroing, copying, comparing and setting small constants have been
established. The next logical set of algorithms to develop are addition, subtraction and digit shifting algorithms. These
algorithms make use of the lower level algorithms and are the crucial building block for the multiplication algorithms. It is very important
that these algorithms are highly optimized. On their own they are simple $O(n)$ algorithms but they can be called from higher level algorithms
which easily places them at $O(n^2)$ or even $O(n^3)$ work levels.

MARK,SHIFTS
All of the algorithms within this chapter make use of the logical bit shift operations denoted by $<<$ and $>>$ for left and right
logical shifts respectively.
A logical shift is analogous to sliding the decimal point of radix-10 representations. For example, the real
number $0.9345$ is equivalent to $93.45\%$ which is found by sliding the decimal point two places to the right (\textit{multiplying by $\beta^2 = 10^2$}).
Algebraically a binary logical shift is equivalent to a division or multiplication by a power of two.
For example, $a << k = a \cdot 2^k$ while $a >> k = \lfloor a/2^k \rfloor$.

One significant difference between a logical shift and the way decimals are shifted is that digits below the zero'th position are removed
from the number. For example, consider $1101_2 >> 1$. Using decimal point notation this would produce $110.1_2$. However, with a logical shift the
result is $110_2$.

\section{Addition and Subtraction}
In common two's complement fixed precision arithmetic negative numbers are easily represented by subtraction from the modulus. For example, with 32-bit integers
$a - b\mbox{ (mod }2^{32}\mbox{)}$ is the same as $a + (2^{32} - b) \mbox{ (mod }2^{32}\mbox{)}$ since $2^{32} \equiv 0 \mbox{ (mod }2^{32}\mbox{)}$.
As a result subtraction can be performed with a trivial series of logical operations and an addition.

However, in multiple precision arithmetic negative numbers are not represented in the same way. Instead a sign flag is used to keep track of the
sign of the integer. As a result signed addition and subtraction are actually implemented as conditional usage of lower level addition or
subtraction algorithms with the sign fixed up appropriately.
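
The fixed precision facts used above, namely the shift identities and the two's complement form of subtraction, can be checked with a few lines of C. This is a small sketch for illustration only: the function names are made up for this example, the shift count $k$ is assumed to be less than 32, and the left shift equals $a \cdot 2^k$ only modulo $2^{32}$ because the type has fixed width.

```c
#include <stdint.h>

/* a << k: equals a * 2^k reduced modulo 2^32 (requires k < 32) */
uint32_t shift_left(uint32_t a, unsigned k)
{
    return a << k;
}

/* a >> k: equals floor(a / 2^k); bits shifted below position zero are lost
   (requires k < 32) */
uint32_t shift_right(uint32_t a, unsigned k)
{
    return a >> k;
}

/* a - b (mod 2^32) computed as a + (2^32 - b) (mod 2^32), where 2^32 - b is
   formed by inverting every bit of b and adding one */
uint32_t sub_via_add(uint32_t a, uint32_t b)
{
    uint32_t neg_b = (uint32_t)(~b + 1u);
    return (uint32_t)(a + neg_b);
}
```

For instance \texttt{shift\_right(13, 1)} drops the low bit of $1101_2$ and yields $110_2$, while \texttt{sub\_via\_add(5, 7)} wraps around to the same value as the unsigned expression \texttt{5u - 7u}.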

The lower level algorithms will add or subtract integers without regard to the sign flag.  That is they will add or subtract the magnitude of
the integers respectively.

\subsection{Low Level Addition}
An unsigned addition of multiple precision integers is performed with the same long-hand algorithm used to add decimal numbers.  That is to add the
trailing digits first and propagate the resulting carry upwards.  Since this is a lower level algorithm the name will have an ``s\_'' prefix.
Historically that convention stems from the MPI library where ``s\_'' stood for static functions that were hidden from the developer entirely.

\newpage
\begin{figure}[!here]
\begin{center}
\begin{small}
\begin{tabular}{l}
\hline Algorithm \textbf{s\_mp\_add}. \\
\textbf{Input}.   Two mp\_ints $a$ and $b$ \\
\textbf{Output}.  The unsigned addition $c = \vert a \vert + \vert b \vert$. \\
\hline \\
1.  if $a.used > b.used$ then \\
\hspace{+3mm}1.1  $min \leftarrow b.used$ \\
\hspace{+3mm}1.2  $max \leftarrow a.used$ \\
\hspace{+3mm}1.3  $x   \leftarrow a$ \\
2.  else  \\
\hspace{+3mm}2.1  $min \leftarrow a.used$ \\
\hspace{+3mm}2.2  $max \leftarrow b.used$ \\
\hspace{+3mm}2.3  $x   \leftarrow b$ \\
3.
If $c.alloc < max + 1$ then grow $c$ to hold at least $max + 1$ digits (\textit{mp\_grow}) \\
4.  $oldused \leftarrow c.used$ \\
5.  $c.used \leftarrow max + 1$ \\
6.  $u \leftarrow 0$ \\
7.  for $n$ from $0$ to $min - 1$ do \\
\hspace{+3mm}7.1  $c_n \leftarrow a_n + b_n + u$ \\
\hspace{+3mm}7.2  $u \leftarrow c_n >> lg(\beta)$ \\
\hspace{+3mm}7.3  $c_n \leftarrow c_n \mbox{ (mod }\beta\mbox{)}$ \\
8.  if $min \ne max$ then do \\
\hspace{+3mm}8.1  for $n$ from $min$ to $max - 1$ do \\
\hspace{+6mm}8.1.1  $c_n \leftarrow x_n + u$ \\
\hspace{+6mm}8.1.2  $u \leftarrow c_n >> lg(\beta)$ \\
\hspace{+6mm}8.1.3  $c_n \leftarrow c_n \mbox{ (mod }\beta\mbox{)}$ \\
9.  $c_{max} \leftarrow u$ \\
10.  if $oldused > max$ then \\
\hspace{+3mm}10.1  for $n$ from $max + 1$ to $oldused - 1$ do \\
\hspace{+6mm}10.1.1  $c_n \leftarrow 0$ \\
11.  Clamp excess digits in $c$.  (\textit{mp\_clamp}) \\
12.  Return(\textit{MP\_OKAY}) \\
\hline
\end{tabular}
\end{small}
\end{center}
\caption{Algorithm s\_mp\_add}
\end{figure}

\textbf{Algorithm s\_mp\_add.}
This algorithm is loosely based on algorithm 14.7 of HAC \cite[pp. 594]{HAC} but has been extended to allow the inputs to have different magnitudes.
Coincidentally the description of algorithm A in Knuth \cite[pp.
266]{TAOCPV2} shares the same deficiency as the algorithm from \cite{HAC}.  Even the
MIX pseudo machine code presented by Knuth \cite[pp. 266-267]{TAOCPV2} is incapable of handling inputs which are of different magnitudes.

The first thing that has to be accomplished is to sort out which of the two inputs is the larger.  The addition logic
will simply add all of the smaller input to the larger input and store that first part of the result in the
destination.  Then it will apply a simpler addition loop to the excess digits of the larger input.

The first two steps will handle sorting the inputs such that $min$ and $max$ hold the digit counts of the two
inputs.  The variable $x$ will be an mp\_int alias for the larger input or the second input $b$ if they have the
same number of digits.  After the inputs are sorted the destination $c$ is grown as required to accommodate the sum
of the two inputs.  The original \textbf{used} count of $c$ is copied and set to the new used count.

At this point the first addition loop will go through as many digit positions as both inputs have in common.  The carry
variable $u$ is set to zero outside the loop.  Inside the loop an ``addition'' step requires three statements to produce
one digit of the sum.  First
two digits from $a$ and $b$ are added together along with the carry $u$.  The carry of this step is extracted and stored
in $u$ and finally the digit of the result $c_n$ is truncated within the range $0 \le c_n < \beta$.
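
The two addition loops described above can be sketched on plain digit arrays.  This is a simplified illustration rather than the LibTomMath source: the name \texttt{add\_digits}, the fixed choice $lg(\beta) = 28$ and the use of \texttt{uint32\_t} digits are assumptions made for the example, and growing, zeroing of old digits and clamping are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DIGIT_BIT  28                                   /* lg(beta), assumed */
#define DIGIT_MASK ((((uint32_t)1) << DIGIT_BIT) - 1u)  /* beta - 1 */

/* Hypothetical helper: c = |a| + |b| on raw digit arrays (least significant
 * digit first).  c must have room for max(na, nb) + 1 digits.  Returns the
 * number of digits written, before any clamping. */
static size_t add_digits(const uint32_t *a, size_t na,
                         const uint32_t *b, size_t nb,
                         uint32_t *c)
{
    const uint32_t *x;      /* alias for the input with more digits */
    size_t min, max, n;
    uint32_t u = 0;         /* carry */

    assert(a != NULL && b != NULL && c != NULL);

    if (na > nb) { min = nb; max = na; x = a; }
    else         { min = na; max = nb; x = b; }

    /* Digit positions that both inputs have in common. */
    for (n = 0; n < min; n++) {
        c[n] = a[n] + b[n] + u;     /* at most 2*beta - 1, fits in 32 bits */
        u = c[n] >> DIGIT_BIT;      /* extract the carry */
        c[n] &= DIGIT_MASK;         /* reduce modulo beta */
    }

    /* Remaining digits of the longer input, plus the carry. */
    for (; n < max; n++) {
        c[n] = x[n] + u;
        u = c[n] >> DIGIT_BIT;
        c[n] &= DIGIT_MASK;
    }

    c[max] = u;             /* final carry */
    return max + 1;
}
```

Each digit is read before the corresponding result digit is written, so the sketch also tolerates $c$ aliasing one of the inputs provided the array is large enough.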

Now all of the digit positions that both inputs have in common have been exhausted.  If $min \ne max$ then $x$ is an alias
for one of the inputs that has more digits.  A simplified addition loop is then used to essentially copy the remaining digits
and the carry to the destination.

The final carry is stored in $c_{max}$ and digits above $max$ up to $oldused$ are zeroed which completes the addition.


EXAM,bn_s_mp_add.c

We first sort (lines @27,if@ to @35,}@) the inputs based on magnitude and determine the $min$ and $max$ variables.
Note that $x$ is a pointer to an mp\_int assigned to the largest input, in effect it is a local alias.  Next we
grow the destination (@37,init@ to @42,}@) to ensure that it can accommodate the result of the addition.

Similar to the implementation of mp\_copy this function uses the braced code and local aliases coding style.  The three aliases that are on
lines @56,tmpa@, @59,tmpb@ and @62,tmpc@ represent the two inputs and destination variables respectively.  These aliases are used to ensure the
compiler does not have to dereference $a$, $b$ or $c$ (respectively) to access the digits of the respective mp\_int.

The initial carry $u$ will be cleared (line @65,u = 0@), note that $u$ is of type mp\_digit which ensures type
compatibility within the implementation.  The initial addition (line @66,for@ to @75,}@) adds digits from
both inputs until the smallest input runs out of digits.
Similarly the conditional addition loop
(line @81,for@ to @90,}@) adds the remaining digits from the larger of the two inputs.  The addition is finished
with the final carry being stored in $tmpc$ (line @94,tmpc++@).  Note the ``++'' operator within the same expression.
After line @94,tmpc++@, $tmpc$ will point to the $c.used$'th digit of the mp\_int $c$.  This is useful
for the next loop (line @97,for@ to @99,}@) which sets any old upper digits to zero.

\subsection{Low Level Subtraction}
The low level unsigned subtraction algorithm is very similar to the low level unsigned addition algorithm.  The principal difference is that the
unsigned subtraction algorithm requires the result to be positive.  That is when computing $a - b$ the condition $\vert a \vert \ge \vert b\vert$ must
be met for this algorithm to function properly.  Keep in mind this low level algorithm is not meant to be used in higher level algorithms directly.
This algorithm as will be shown can be used to create functional signed addition and subtraction algorithms.

MARK,GAMMA

For this algorithm a new variable is required to make the description simpler.  Recall from section 1.3.1 that a mp\_digit must be able to represent
the range $0 \le x < 2\beta$ for the algorithms to work correctly.  However, it is allowable that a mp\_digit represent a larger range of values.  For
this algorithm we will assume that the variable $\gamma$ represents the number of bits available in a
mp\_digit (\textit{this implies $2^{\gamma} > \beta$}).

For example, the default for LibTomMath is to use an ``unsigned long'' for the mp\_digit ``type'' while $\beta = 2^{28}$.  In ISO C an ``unsigned long''
data type must be able to represent $0 \le x < 2^{32}$ meaning that in this case $\gamma \ge 32$.

\newpage\begin{figure}[!here]
\begin{center}
\begin{small}
\begin{tabular}{l}
\hline Algorithm \textbf{s\_mp\_sub}. \\
\textbf{Input}.   Two mp\_ints $a$ and $b$ ($\vert a \vert \ge \vert b \vert$) \\
\textbf{Output}.  The unsigned subtraction $c = \vert a \vert - \vert b \vert$. \\
\hline \\
1.  $min \leftarrow b.used$ \\
2.  $max \leftarrow a.used$ \\
3.  If $c.alloc < max$ then grow $c$ to hold at least $max$ digits.  (\textit{mp\_grow}) \\
4.  $oldused \leftarrow c.used$ \\
5.  $c.used \leftarrow max$ \\
6.  $u \leftarrow 0$ \\
7.  for $n$ from $0$ to $min - 1$ do \\
\hspace{3mm}7.1  $c_n \leftarrow a_n - b_n - u$ \\
\hspace{3mm}7.2  $u   \leftarrow c_n >> (\gamma - 1)$ \\
\hspace{3mm}7.3  $c_n \leftarrow c_n \mbox{ (mod }\beta\mbox{)}$ \\
8.
if $min < max$ then do \\
\hspace{3mm}8.1  for $n$ from $min$ to $max - 1$ do \\
\hspace{6mm}8.1.1  $c_n \leftarrow a_n - u$ \\
\hspace{6mm}8.1.2  $u   \leftarrow c_n >> (\gamma - 1)$ \\
\hspace{6mm}8.1.3  $c_n \leftarrow c_n \mbox{ (mod }\beta\mbox{)}$ \\
9. if $oldused > max$ then do \\
\hspace{3mm}9.1  for $n$ from $max$ to $oldused - 1$ do \\
\hspace{6mm}9.1.1  $c_n \leftarrow 0$ \\
10. Clamp excess digits of $c$.  (\textit{mp\_clamp}). \\
11. Return(\textit{MP\_OKAY}). \\
\hline
\end{tabular}
\end{small}
\end{center}
\caption{Algorithm s\_mp\_sub}
\end{figure}

\textbf{Algorithm s\_mp\_sub.}
This algorithm performs the unsigned subtraction of two mp\_int variables under the restriction that the result must be positive.  That is when
passing variables $a$ and $b$ the condition that $\vert a \vert \ge \vert b \vert$ must be met for the algorithm to function correctly.  This
algorithm is loosely based on algorithm 14.9 \cite[pp. 595]{HAC} and is similar to algorithm S in \cite[pp. 267]{TAOCPV2} as well.  As was the case
with algorithm s\_mp\_add, both references lack discussion concerning various practical details such as when the inputs differ in magnitude.

The initial sorting of the inputs is trivial in this algorithm since $a$ is guaranteed to have at least the same magnitude as $b$.
Steps 1 and 2
set the $min$ and $max$ variables.  Unlike the addition routine there is guaranteed to be no carry which means that the final result can be at
most $max$ digits in length as opposed to $max + 1$.  Similar to the addition algorithm the \textbf{used} count of $c$ is copied locally and
set to the maximal count for the operation.

The subtraction loop that begins on step seven is essentially the same as the addition loop of algorithm s\_mp\_add except single precision
subtraction is used instead.  Note the use of the $\gamma$ variable to extract the carry (\textit{also known as the borrow}) within the subtraction
loops.  Under the assumption that two's complement single precision arithmetic is used this will successfully extract the desired carry.

For example, consider subtracting $0101_2$ from $0100_2$ where $\gamma = 4$ and $\beta = 2$.  The least significant bit will force a carry upwards to
the third bit which will be set to zero after the borrow.  After the very first bit has been subtracted $4 - 1 \equiv 0011_2$ will remain.  When the
third bit of $0101_2$ is subtracted from the result it will cause another carry.  In this case though the carry will be forced to propagate all the
way to the most significant bit.

Recall that $\beta < 2^{\gamma}$.  This means that if a carry does occur just before the $lg(\beta)$'th bit it will propagate all the way to the most
significant bit.  Thus, the high order bits of the mp\_digit that are not part of the actual digit will either be all zero, or all one.
All that
is needed is a single zero or one bit for the carry.  Therefore a single logical shift right by $\gamma - 1$ positions is sufficient to extract the
carry.  This method of carry extraction may seem awkward but the reason for it becomes apparent when the implementation is discussed.

If $b$ has a smaller magnitude than $a$ then step 8 will force the carry and copy operation to propagate through the larger input $a$ into $c$.  Step
9 will ensure that any leading digits of $c$ above the $max$'th position are zeroed.

EXAM,bn_s_mp_sub.c

Like low level addition we ``sort'' the inputs.  Except in this case the sorting is hardcoded
(lines @24,min@ and @25,max@).  In reality the $min$ and $max$ variables are only aliases and are only
used to make the source code easier to read.  Again the pointer alias optimization is used
within this algorithm.  The aliases $tmpa$, $tmpb$ and $tmpc$ are initialized
(lines @42,tmpa@, @43,tmpb@ and @44,tmpc@) for $a$, $b$ and $c$ respectively.

The first subtraction loop (lines @47,u = 0@ through @61,}@) subtracts digits from both inputs until the smaller of
the two inputs has been exhausted.  As remarked earlier there is an implementation reason for using the ``awkward''
method of extracting the carry (line @57, >>@).  The traditional method for extracting the carry would be to shift
by $lg(\beta)$ positions and logically AND the least significant bit.
The AND operation is required because all of
the bits above the $lg(\beta)$'th bit will be set to one after a carry occurs from subtraction.  This method
requires two relatively cheap operations to extract the carry.  The other method is to simply shift the
most significant bit to the least significant bit thus extracting the carry with a single cheap operation.  This
optimization only works on two's complement machines which is a safe assumption to make.

If $a$ has a larger magnitude than $b$ an additional loop (lines @64,for@ through @73,}@) is required to propagate
the carry through $a$ and copy the result to $c$.

\subsection{High Level Addition}
Now that both lower level addition and subtraction algorithms have been established an effective high level signed addition algorithm can be
established.  This high level addition algorithm will be what other algorithms and developers will use to perform addition of mp\_int data
types.

Recall from section 5.2 that an mp\_int represents an integer with an unsigned mantissa (\textit{the array of digits}) and a \textbf{sign}
flag.  A high level addition is actually performed as a series of eight separate cases which can be optimized down to three unique cases.

\begin{figure}[!here]
\begin{center}
\begin{tabular}{l}
\hline Algorithm \textbf{mp\_add}. \\
\textbf{Input}.   Two mp\_ints $a$ and $b$  \\
\textbf{Output}.
The signed addition $c = a + b$. \\
\hline \\
1.  if $a.sign = b.sign$ then do \\
\hspace{3mm}1.1  $c.sign \leftarrow a.sign$  \\
\hspace{3mm}1.2  $c \leftarrow \vert a \vert + \vert b \vert$ (\textit{s\_mp\_add})\\
2.  else do \\
\hspace{3mm}2.1  if $\vert a \vert < \vert b \vert$ then do (\textit{mp\_cmp\_mag})  \\
\hspace{6mm}2.1.1  $c.sign \leftarrow b.sign$ \\
\hspace{6mm}2.1.2  $c \leftarrow \vert b \vert - \vert a \vert$ (\textit{s\_mp\_sub}) \\
\hspace{3mm}2.2  else do \\
\hspace{6mm}2.2.1  $c.sign \leftarrow a.sign$ \\
\hspace{6mm}2.2.2  $c \leftarrow \vert a \vert - \vert b \vert$ (\textit{s\_mp\_sub}) \\
3.  Return(\textit{MP\_OKAY}). \\
\hline
\end{tabular}
\end{center}
\caption{Algorithm mp\_add}
\end{figure}

\textbf{Algorithm mp\_add.}
This algorithm performs the signed addition of two mp\_int variables.  There is no reference algorithm to draw upon from
either \cite{TAOCPV2} or \cite{HAC} since they both only provide unsigned operations.  The algorithm is fairly
straightforward but restricted since subtraction can only produce positive results.

\begin{figure}[here]
\begin{small}
\begin{center}
\begin{tabular}{|c|c|c|c|c|}
\hline \textbf{Sign of $a$} & \textbf{Sign of $b$} & \textbf{$\vert a \vert > \vert b \vert $} & \textbf{Unsigned Operation} & \textbf{Result Sign Flag} \\
\hline $+$ & $+$ & Yes & $c = a + b$ & $a.sign$ \\
\hline $+$ & $+$ & No  & $c = a + b$ & $a.sign$ \\
\hline $-$ & $-$ & Yes & $c = a + b$ & $a.sign$ \\
\hline $-$ & $-$ & No  & $c = a + b$ & $a.sign$ \\
\hline &&&&\\

\hline $+$ & $-$ & No  & $c = b - a$ & $b.sign$ \\
\hline $-$ & $+$ & No  & $c = b - a$ & $b.sign$ \\

\hline &&&&\\

\hline $+$ & $-$ & Yes & $c = a - b$ & $a.sign$ \\
\hline $-$ & $+$ & Yes & $c = a - b$ & $a.sign$ \\

\hline
\end{tabular}
\end{center}
\end{small}
\caption{Addition Guide Chart}
\label{fig:AddChart}
\end{figure}

Figure~\ref{fig:AddChart} lists all of the eight possible input combinations and is sorted to show that only three
specific cases need to be handled.  The return codes of the unsigned operations at steps 1.2, 2.1.2 and 2.2.2 are
forwarded to step three to check for errors.
This simplifies the description of the algorithm considerably and best
follows how the implementation actually was achieved.

Also note how the \textbf{sign} is set before the unsigned addition or subtraction is performed.  Recall from the descriptions of algorithms
s\_mp\_add and s\_mp\_sub that the mp\_clamp function is used at the end to trim excess digits.  The mp\_clamp algorithm will set the \textbf{sign}
to \textbf{MP\_ZPOS} when the \textbf{used} digit count reaches zero.

For example, consider performing $-a + a$ with algorithm mp\_add.  By the description of the algorithm the sign is set to \textbf{MP\_NEG} which would
produce a result of $-0$.  However, since the sign is set first and the unsigned subtraction is performed afterwards, the subsequent usage of algorithm mp\_clamp
within algorithm s\_mp\_sub will force $-0$ to become $0$.

EXAM,bn_mp_add.c

The source code follows the algorithm fairly closely.  The most notable new source code addition is the usage of the $res$ integer variable which
is used to pass the result of the unsigned operations forward.  Unlike in the algorithm, the variable $res$ is merely returned as is without
explicitly checking it and returning the constant \textbf{MP\_OKAY}.  The observation is this algorithm will succeed or fail only if the lower
level functions do so.  Returning their return code is sufficient.
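
The three cases of the signed addition, together with the borrow extraction by a shift of $\gamma - 1$ bits, can be sketched as follows.  This is a toy model under stated assumptions and not the LibTomMath code: the \texttt{num} type, its fixed capacity \texttt{MAXD}, and the helper names are all hypothetical, digits are held in \texttt{uint32\_t} (so $\gamma = 32$) with $lg(\beta) = 28$, and the destination is assumed not to alias either input.

```c
#include <assert.h>
#include <stdint.h>

#define DIGIT_BIT  28                                   /* lg(beta), assumed */
#define DIGIT_MASK ((((uint32_t)1) << DIGIT_BIT) - 1u)
#define MAXD       8                                    /* toy capacity */
enum { ZPOS = 0, NEG = 1 };

typedef struct { uint32_t dp[MAXD]; int used; int sign; } num;

/* Trim leading zero digits; a zero result is always made positive. */
static void clamp(num *c)
{
    while (c->used > 0 && c->dp[c->used - 1] == 0) c->used--;
    if (c->used == 0) c->sign = ZPOS;
}

/* Compare magnitudes: returns -1, 0 or 1. */
static int cmp_mag(const num *a, const num *b)
{
    int n;
    if (a->used != b->used) return a->used > b->used ? 1 : -1;
    for (n = a->used - 1; n >= 0; n--)
        if (a->dp[n] != b->dp[n]) return a->dp[n] > b->dp[n] ? 1 : -1;
    return 0;
}

/* c = |a| + |b|; the sign of c is set by the caller beforehand. */
static void mag_add(const num *a, const num *b, num *c)
{
    int n, max = a->used > b->used ? a->used : b->used;
    uint32_t u = 0;
    assert(max < MAXD);
    for (n = 0; n < max; n++) {
        uint32_t t = (n < a->used ? a->dp[n] : 0u)
                   + (n < b->used ? b->dp[n] : 0u) + u;
        u = t >> DIGIT_BIT;
        c->dp[n] = t & DIGIT_MASK;
    }
    c->dp[max] = u;
    c->used = max + 1;
    clamp(c);
}

/* c = |a| - |b|, requires |a| >= |b|. */
static void mag_sub(const num *a, const num *b, num *c)
{
    int n;
    uint32_t u = 0;
    assert(cmp_mag(a, b) >= 0);
    for (n = 0; n < a->used; n++) {
        uint32_t t = a->dp[n] - (n < b->used ? b->dp[n] : 0u) - u;
        u = t >> 31;                /* borrow: shift by gamma - 1 */
        c->dp[n] = t & DIGIT_MASK;
    }
    c->used = a->used;
    clamp(c);
}

/* c = a + b with sign-magnitude operands; the sign is set first. */
static void signed_add(const num *a, const num *b, num *c)
{
    if (a->sign == b->sign)      { c->sign = a->sign; mag_add(a, b, c); }
    else if (cmp_mag(a, b) < 0)  { c->sign = b->sign; mag_sub(b, a, c); }
    else                         { c->sign = a->sign; mag_sub(a, b, c); }
}
```

Computing $-a + a$ with this sketch first marks the result negative, after which the clamp inside the magnitude subtraction resets the sign once the digit count reaches zero.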

\subsection{High Level Subtraction}
The high level signed subtraction algorithm is essentially the same as the high level signed addition algorithm.

\newpage\begin{figure}[!here]
\begin{center}
\begin{tabular}{l}
\hline Algorithm \textbf{mp\_sub}. \\
\textbf{Input}.   Two mp\_ints $a$ and $b$  \\
\textbf{Output}.  The signed subtraction $c = a - b$. \\
\hline \\
1.  if $a.sign \ne b.sign$ then do \\
\hspace{3mm}1.1  $c.sign \leftarrow a.sign$ \\
\hspace{3mm}1.2  $c \leftarrow \vert a \vert + \vert b \vert$ (\textit{s\_mp\_add}) \\
2.  else do \\
\hspace{3mm}2.1  if $\vert a \vert \ge \vert b \vert$ then do (\textit{mp\_cmp\_mag}) \\
\hspace{6mm}2.1.1  $c.sign \leftarrow a.sign$ \\
\hspace{6mm}2.1.2  $c \leftarrow \vert a \vert  - \vert b \vert$ (\textit{s\_mp\_sub}) \\
\hspace{3mm}2.2  else do \\
\hspace{6mm}2.2.1  $c.sign \leftarrow  \left \lbrace \begin{array}{ll}
                              MP\_ZPOS &  \mbox{if }a.sign = MP\_NEG \\
                              MP\_NEG  &  \mbox{otherwise} \\
                              \end{array} \right .$ \\
\hspace{6mm}2.2.2  $c \leftarrow \vert b \vert  - \vert a \vert$ (\textit{s\_mp\_sub}) \\
3.  Return(\textit{MP\_OKAY}).
\\
\hline
\end{tabular}
\end{center}
\caption{Algorithm mp\_sub}
\end{figure}

\textbf{Algorithm mp\_sub.}
This algorithm performs the signed subtraction of two inputs.  Similar to algorithm mp\_add there is no reference in either \cite{TAOCPV2} or
\cite{HAC}.  Also this algorithm is restricted by algorithm s\_mp\_sub.  Figure~\ref{fig:SubChart} lists the eight possible inputs and
the operations required.

\begin{figure}[!here]
\begin{small}
\begin{center}
\begin{tabular}{|c|c|c|c|c|}
\hline \textbf{Sign of $a$} & \textbf{Sign of $b$} & \textbf{$\vert a \vert \ge \vert b \vert $} & \textbf{Unsigned Operation} & \textbf{Result Sign Flag} \\
\hline $+$ & $-$ & Yes & $c = a + b$ & $a.sign$ \\
\hline $+$ & $-$ & No  & $c = a + b$ & $a.sign$ \\
\hline $-$ & $+$ & Yes & $c = a + b$ & $a.sign$ \\
\hline $-$ & $+$ & No  & $c = a + b$ & $a.sign$ \\
\hline &&&& \\
\hline $+$ & $+$ & Yes & $c = a - b$ & $a.sign$ \\
\hline $-$ & $-$ & Yes & $c = a - b$ & $a.sign$ \\
\hline &&&& \\
\hline $+$ & $+$ & No  & $c = b - a$ & $\mbox{opposite of }a.sign$ \\
\hline $-$ & $-$ & No  & $c = b - a$ & $\mbox{opposite of }a.sign$ \\
\hline
\end{tabular}
\end{center}
\end{small}
\caption{Subtraction Guide Chart}
\label{fig:SubChart}
\end{figure}

Similar to the case of algorithm mp\_add the \textbf{sign} is set first before the unsigned addition or subtraction.  That is to prevent the
algorithm from producing $-a - (-a) = -0$ as a result.

EXAM,bn_mp_sub.c

Much like the implementation of algorithm mp\_add the variable $res$ is used to catch the return code of the unsigned addition or subtraction operations
and forward it to the end of the function.  On line @38, != MP_LT@ the ``not equal to'' \textbf{MP\_LT} expression is used to emulate a
``greater than or equal to'' comparison.

\section{Bit and Digit Shifting}
MARK,POLY
It is quite common to think of a multiple precision integer as a polynomial in $x$, that is $y = f(\beta)$ where $f(x) = \sum_{i=0}^{n-1} a_i x^i$.
This notation arises within discussion of Montgomery and Diminished Radix Reduction as well as Karatsuba multiplication and squaring.

In order to facilitate operations on polynomials in $x$ as above a series of simple ``digit'' algorithms have to be established.  That is to shift
the digits left or right as well as to shift individual bits of the digits left and right.  It is important to note that not all ``shift'' operations
are on radix-$\beta$ digits.
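
The polynomial view $y = f(\beta)$ can be made concrete by evaluating the digit array with Horner's rule.  The sketch below is illustrative only: the function name \texttt{eval\_at\_beta} is hypothetical, $lg(\beta) = 28$ is assumed, and the result is limited to what fits in a 64-bit word, so at most two digits are accepted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DIGIT_BIT 28    /* lg(beta), assumed */

/* Hypothetical helper: evaluates f(x) = sum of a[i] * x^i at x = beta using
 * Horner's rule.  Two 28-bit digits need 56 bits, so n is limited to 2. */
static uint64_t eval_at_beta(const uint32_t *a, size_t n)
{
    uint64_t y = 0;

    assert(n <= 2);
    while (n-- > 0) {
        /* y * beta + a[n]; multiplying by beta is a left shift by lg(beta) */
        y = (y << DIGIT_BIT) + a[n];
    }
    return y;
}
```

Multiplying the running value by $\beta$ is a shift by a whole digit, which is the digit shift referred to above, as opposed to a shift of individual bits within a digit.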

\subsection{Multiplication by Two}

In a binary system where the radix is a power of two, multiplication by two not only arises often in other algorithms, it is also a fairly efficient
operation to perform.  A single precision logical shift left is sufficient to multiply a single digit by two.

\newpage\begin{figure}[!here]
\begin{small}
\begin{center}
\begin{tabular}{l}
\hline Algorithm \textbf{mp\_mul\_2}. \\
\textbf{Input}.   One mp\_int $a$ \\
\textbf{Output}.  $b = 2a$. \\
\hline \\
1.  If $b.alloc < a.used + 1$ then grow $b$ to hold $a.used + 1$ digits.  (\textit{mp\_grow}) \\
2.  $oldused \leftarrow b.used$ \\
3.  $b.used \leftarrow a.used$ \\
4.  $r \leftarrow 0$ \\
5.  for $n$ from 0 to $a.used - 1$ do \\
\hspace{3mm}5.1  $rr \leftarrow a_n >> (lg(\beta) - 1)$ \\
\hspace{3mm}5.2  $b_n \leftarrow (a_n << 1) + r \mbox{ (mod }\beta\mbox{)}$ \\
\hspace{3mm}5.3  $r \leftarrow rr$ \\
6.  If $r \ne 0$ then do \\
\hspace{3mm}6.1  $b_{n + 1} \leftarrow r$ \\
\hspace{3mm}6.2  $b.used \leftarrow b.used + 1$ \\
7.  If $b.used < oldused - 1$ then do \\
\hspace{3mm}7.1  for $n$ from $b.used$ to $oldused - 1$ do \\
\hspace{6mm}7.1.1  $b_n \leftarrow 0$ \\
8.  $b.sign \leftarrow a.sign$ \\
9.
Return(\textit{MP\_OKAY}).\\
\hline
\end{tabular}
\end{center}
\end{small}
\caption{Algorithm mp\_mul\_2}
\end{figure}

\textbf{Algorithm mp\_mul\_2.}
This algorithm will quickly multiply a mp\_int by two provided $\beta$ is a power of two.  Neither \cite{TAOCPV2} nor \cite{HAC} describe such
an algorithm despite the fact it arises often in other algorithms.  The algorithm is set up much like the lower level algorithm s\_mp\_add since
it is for all intents and purposes equivalent to the operation $b = \vert a \vert + \vert a \vert$.

Step 1 grows the destination $b$ as required to accommodate the maximum number of \textbf{used} digits in the result and step 2 saves its
previous \textbf{used} count.  The initial \textbf{used} count
is set to $a.used$ at step 3.  Only if there is a final carry will the \textbf{used} count require adjustment.

Step 5 is an optimized implementation of the addition loop for this specific case.  That is since the two values being added together
are the same there is no need to perform two reads from the digits of $a$.  Step 5.1 performs a single precision shift on the current digit $a_n$ to
obtain what will be the carry for the next iteration.  Step 5.2 calculates the $n$'th digit of the result as single precision shift of $a_n$ plus
the previous carry.  Recall from ~SHIFTS~ that $a_n << 1$ is equivalent to $a_n \cdot 2$.  An iteration of the addition loop is finished with
forwarding the carry to the next iteration.
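
The doubling loop, together with the handling of a final carry, can be sketched on a plain digit array.  The following is an illustration under assumptions rather than the library routine: the name \texttt{mul\_2\_digits} is hypothetical, $lg(\beta) = 28$ with \texttt{uint32\_t} digits is assumed, and the sign, the growing of the destination and the zeroing of old digits are omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DIGIT_BIT  28                                   /* lg(beta), assumed */
#define DIGIT_MASK ((((uint32_t)1) << DIGIT_BIT) - 1u)  /* beta - 1 */

/* Hypothetical helper: b = 2a on raw digit arrays (least significant digit
 * first).  b must have room for n + 1 digits.  Returns the digit count of
 * the result. */
static size_t mul_2_digits(const uint32_t *a, size_t n, uint32_t *b)
{
    uint32_t r = 0;     /* carry from the previous digit */
    uint32_t rr;        /* carry for the next digit */
    size_t i;

    assert(a != NULL && b != NULL);

    for (i = 0; i < n; i++) {
        rr = a[i] >> (DIGIT_BIT - 1);           /* top bit of the digit */
        b[i] = ((a[i] << 1) | r) & DIGIT_MASK;  /* shift in the old carry */
        r = rr;                                 /* forward the new carry */
    }

    if (r != 0) {       /* a final carry adds one more digit */
        b[n] = r;
        n++;
    }
    return n;
}
```

Since the low bit of the shifted digit is always zero, combining it with the carry by OR gives the same value as the addition used in the algorithm description.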

Step 6 takes care of any final carry by setting the $a.used$'th digit of the result to the carry and augmenting the \textbf{used} count of $b$.
Step 7 clears any leading digits of $b$ in case it originally had a larger magnitude than $a$.

EXAM,bn_mp_mul_2.c

This implementation is essentially an optimized implementation of s\_mp\_add for the case of doubling an input. The only noteworthy difference
is the use of the logical shift operator on line @52,<<@ to perform a single precision doubling.

\subsection{Division by Two}
A division by two can just as easily be accomplished with a logical shift right as multiplication by two can be with a logical shift left.

\newpage\begin{figure}[!here]
\begin{small}
\begin{center}
\begin{tabular}{l}
\hline Algorithm \textbf{mp\_div\_2}. \\
\textbf{Input}. One mp\_int $a$ \\
\textbf{Output}. $b = a/2$. \\
\hline \\
1. If $b.alloc < a.used$ then grow $b$ to hold $a.used$ digits. (\textit{mp\_grow}) \\
2. If the reallocation failed return(\textit{MP\_MEM}). \\
3. $oldused \leftarrow b.used$ \\
4. $b.used \leftarrow a.used$ \\
5. $r \leftarrow 0$ \\
6. 
for $n$ from $b.used - 1$ to $0$ do \\
\hspace{3mm}6.1 $rr \leftarrow a_n \mbox{ (mod }2\mbox{)}$\\
\hspace{3mm}6.2 $b_n \leftarrow (a_n >> 1) + (r << (lg(\beta) - 1)) \mbox{ (mod }\beta\mbox{)}$ \\
\hspace{3mm}6.3 $r \leftarrow rr$ \\
7. If $b.used < oldused - 1$ then do \\
\hspace{3mm}7.1 for $n$ from $b.used$ to $oldused - 1$ do \\
\hspace{6mm}7.1.1 $b_n \leftarrow 0$ \\
8. $b.sign \leftarrow a.sign$ \\
9. Clamp excess digits of $b$. (\textit{mp\_clamp}) \\
10. Return(\textit{MP\_OKAY}).\\
\hline
\end{tabular}
\end{center}
\end{small}
\caption{Algorithm mp\_div\_2}
\end{figure}

\textbf{Algorithm mp\_div\_2.}
This algorithm will divide an mp\_int by two using logical shifts to the right. Like mp\_mul\_2 it uses a modified low level addition
core as the basis of the algorithm. Unlike mp\_mul\_2 the shift operations work from the leading digit to the trailing digit. The algorithm
could be written to work from the trailing digit to the leading digit; however, it would have to stop one short of $a.used - 1$ digits to prevent
reading past the end of the array of digits.

Essentially the loop at step 6 is similar to that of mp\_mul\_2 except the logical shifts go in the opposite direction and the carry is at the
least significant bit, not the most significant bit.
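
The right-shift loop can be sketched in C in the same spirit. This is only an illustration under stated assumptions: \texttt{DIGIT\_BIT} and \texttt{halve\_digits} are hypothetical names, digits are assumed to be 28 bits wide and already reduced modulo $\beta$, and clamping the result is left to the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption for this sketch: lg(beta) = 28, digits held in uint32_t. */
#define DIGIT_BIT 28

/* Halves the n-digit number a[] (least significant digit first) and
 * writes n digits to b[].  a and b may point to the same array.
 * Returns the bit shifted out of digit 0, i.e. a mod 2. */
uint32_t halve_digits(const uint32_t *a, uint32_t *b, size_t n)
{
    uint32_t r = 0;                                   /* carry from the digit above */

    assert(a != NULL && b != NULL);
    for (size_t i = n; i-- > 0; ) {                   /* leading digit down to digit 0 */
        uint32_t rr = a[i] & 1u;                      /* low bit is the next carry */
        b[i] = (a[i] >> 1) | (r << (DIGIT_BIT - 1));  /* carry enters at the top bit */
        r = rr;
    }
    return r;
}
```

Working downward means each digit receives its carry from the digit above it, which has already been processed, so the loop never has to look ahead in the array.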

EXAM,bn_mp_div_2.c

\section{Polynomial Basis Operations}
Recall from ~POLY~ that any integer can be represented as a polynomial in $x$ as $y = f(\beta)$. Such a representation is also known as
the polynomial basis \cite[pp. 48]{ROSE}. Given such a notation, a multiplication or division by $x$ amounts to shifting whole digits a single
place. The need for such operations arises in several other higher level algorithms such as Barrett and Montgomery reduction, integer
division and Karatsuba multiplication.

Converting from an array of digits to polynomial basis is very simple. Consider the integer $y \equiv (a_2, a_1, a_0)_{\beta}$ and recall that
$y = \sum_{i=0}^{2} a_i \beta^i$. Simply replace $\beta$ with $x$ and the expression is in polynomial basis. For example, $f(x) = 8x + 9$ is the
polynomial basis representation for $89$ using radix ten. That is, $f(10) = 8(10) + 9 = 89$.

\subsection{Multiplication by $x$}

Given a polynomial in $x$ such as $f(x) = a_n x^n + a_{n-1} x^{n-1} + \ldots + a_0$, multiplying by $x$ amounts to shifting the coefficients up one
degree. In this case $f(x) \cdot x = a_n x^{n+1} + a_{n-1} x^n + \ldots + a_0 x$. From a scalar basis point of view multiplying by $x$ is equivalent to
multiplying by the integer $\beta$.
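
To make the coefficient shift concrete, here is a small C sketch that multiplies an array of digits by $x^b$ in place. The function name \texttt{shift\_digits\_left} is hypothetical, the caller is assumed to have already reserved room for the extra digits, and leaving a zero-length input untouched is a choice of this sketch rather than something the text prescribes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Multiplies the number held in a[0..used-1] (least significant digit
 * first) by x^b, i.e. moves every digit up by b places.
 * Assumes a[] has room for used + b digits.  Returns the new digit count. */
size_t shift_digits_left(uint32_t *a, size_t used, size_t b)
{
    assert(a != NULL);
    if (b == 0 || used == 0) {
        return used;                      /* nothing to move */
    }
    /* Copy from the top down so no digit is overwritten before it is read. */
    for (size_t i = used + b; i-- > b; ) {
        a[i] = a[i - b];
    }
    /* The b lowest coefficients become zero. */
    for (size_t i = 0; i < b; i++) {
        a[i] = 0;
    }
    return used + b;
}
```

With radix ten and the digits of $89$ stored as $(9, 8)$, shifting by two places yields $(0, 0, 9, 8)$, which is $8900 = 89 \cdot 10^2$.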

\newpage\begin{figure}[!here]
\begin{small}
\begin{center}
\begin{tabular}{l}
\hline Algorithm \textbf{mp\_lshd}. \\
\textbf{Input}. One mp\_int $a$ and an integer $b$ \\
\textbf{Output}. $a \leftarrow a \cdot \beta^b$ (equivalent to multiplication by $x^b$). \\
\hline \\
1. If $b \le 0$ then return(\textit{MP\_OKAY}). \\
2. If $a.alloc < a.used + b$ then grow $a$ to at least $a.used + b$ digits. (\textit{mp\_grow}). \\
3. If the reallocation failed return(\textit{MP\_MEM}). \\
4. $a.used \leftarrow a.used + b$ \\
5. $i \leftarrow a.used - 1$ \\
6. $j \leftarrow a.used - 1 - b$ \\
7. for $n$ from $a.used - 1$ to $b$ do \\
\hspace{3mm}7.1 $a_{i} \leftarrow a_{j}$ \\
\hspace{3mm}7.2 $i \leftarrow i - 1$ \\
\hspace{3mm}7.3 $j \leftarrow j - 1$ \\
8. for $n$ from 0 to $b - 1$ do \\
\hspace{3mm}8.1 $a_n \leftarrow 0$ \\
9. Return(\textit{MP\_OKAY}). \\
\hline
\end{tabular}
\end{center}
\end{small}
\caption{Algorithm mp\_lshd}
\end{figure}

\textbf{Algorithm mp\_lshd.}
This algorithm multiplies an mp\_int by the $b$'th power of $x$. This is equivalent to multiplying by $\beta^b$.
The algorithm differs
from the other algorithms presented so far as it performs the operation in place instead of storing the result in a separate location. The
motivation behind this change is the way this function is typically used. Algorithms such as mp\_add store the result in an optionally
different third mp\_int because the original inputs are often still required. Algorithm mp\_lshd (\textit{and similarly algorithm mp\_rshd}) is
typically used on values where the original value is no longer required. The algorithm will return success immediately if
$b \le 0$ since the rest of the algorithm is only valid when $b > 0$.

First the destination $a$ is grown as required to accommodate the result. The counters $i$ and $j$ are used to form a \textit{sliding window} over
the digits of $a$ of length $b$. The head of the sliding window is at $i$ (\textit{the leading digit}) and the tail at $j$ (\textit{the trailing digit}).
The loop on step 7 copies the digit from the tail to the head. In each iteration the window is moved down one digit. The last loop on
step 8 sets the lower $b$ digits to zero.

\newpage
FIGU,sliding_window,Sliding Window Movement

EXAM,bn_mp_lshd.c

The if statement (line @24,if@) ensures that the $b$ variable is greater than zero since we do not interpret negative
shift counts properly. The \textbf{used} count is incremented by $b$ before the copy loop begins. This eliminates
the need for an additional variable in the for loop.
The variable $top$ (line @42,top@) is an alias
for the leading digit while $bottom$ (line @45,bottom@) is an alias for the trailing edge. The aliases form a
window of exactly $b$ digits over the input.

\subsection{Division by $x$}

Division by powers of $x$ is easily achieved by shifting the digits right and removing any that will end up to the right of the zero'th digit.

\newpage\begin{figure}[!here]
\begin{small}
\begin{center}
\begin{tabular}{l}
\hline Algorithm \textbf{mp\_rshd}. \\
\textbf{Input}. One mp\_int $a$ and an integer $b$ \\
\textbf{Output}. $a \leftarrow a / \beta^b$ (Divide by $x^b$). \\
\hline \\
1. If $b \le 0$ then return. \\
2. If $a.used \le b$ then do \\
\hspace{3mm}2.1 Zero $a$. (\textit{mp\_zero}). \\
\hspace{3mm}2.2 Return. \\
3. $i \leftarrow 0$ \\
4. $j \leftarrow b$ \\
5. for $n$ from 0 to $a.used - b - 1$ do \\
\hspace{3mm}5.1 $a_i \leftarrow a_j$ \\
\hspace{3mm}5.2 $i \leftarrow i + 1$ \\
\hspace{3mm}5.3 $j \leftarrow j + 1$ \\
6. for $n$ from $a.used - b$ to $a.used - 1$ do \\
\hspace{3mm}6.1 $a_n \leftarrow 0$ \\
7. $a.used \leftarrow a.used - b$ \\
8. Return. 
\\
\hline
\end{tabular}
\end{center}
\end{small}
\caption{Algorithm mp\_rshd}
\end{figure}

\textbf{Algorithm mp\_rshd.}
This algorithm divides the input in place by the $b$'th power of $x$. It is analogous to dividing by $\beta^b$ but much quicker since
it does not require single precision division. This algorithm does not actually return an error code as it cannot fail.

If the input $b$ is less than one the algorithm quickly returns without performing any work. If the \textbf{used} count is less than or equal
to the shift count $b$ then it will simply zero the input and return.

After the trivial cases of inputs have been handled the sliding window is set up. Much like the case of algorithm mp\_lshd a sliding window that
is $b$ digits wide is used to copy the digits. Unlike mp\_lshd the window slides in the opposite direction, from the trailing to the leading digit.
Also the digits are copied from the leading to the trailing edge.

Once the window copy is complete the upper digits must be zeroed and the \textbf{used} count decremented.

EXAM,bn_mp_rshd.c

The only noteworthy element of this routine is the lack of a return type since it cannot fail. Like mp\_lshd() we
form a sliding window except we copy in the other direction.
After the window copy (line @59,for (;@) we then zero
the upper digits of the input to make sure the result is correct.

\section{Powers of Two}

Now that algorithms for moving single bits as well as whole digits exist, algorithms for moving the ``in between'' distances are required. For
example, the ability to quickly multiply by $2^k$ for any $k$ without using a full multiplier algorithm would prove useful. Instead of performing single
shifts $k$ times to achieve a multiplication by $2^{\pm k}$ a mixture of whole digit shifting and partial digit shifting is employed.

\subsection{Multiplication by Power of Two}

\newpage\begin{figure}[!here]
\begin{small}
\begin{center}
\begin{tabular}{l}
\hline Algorithm \textbf{mp\_mul\_2d}. \\
\textbf{Input}. One mp\_int $a$ and an integer $b$ \\
\textbf{Output}. $c \leftarrow a \cdot 2^b$. \\
\hline \\
1. $c \leftarrow a$. (\textit{mp\_copy}) \\
2. If $c.alloc < c.used + \lfloor b / lg(\beta) \rfloor + 2$ then grow $c$ accordingly. \\
3. If the reallocation failed return(\textit{MP\_MEM}). \\
4. If $b \ge lg(\beta)$ then \\
\hspace{3mm}4.1 $c \leftarrow c \cdot \beta^{\lfloor b / lg(\beta) \rfloor}$ (\textit{mp\_lshd}). \\
\hspace{3mm}4.2 If step 4.1 failed return(\textit{MP\_MEM}). \\
5. $d \leftarrow b \mbox{ (mod }lg(\beta)\mbox{)}$ \\
6. 
If $d \ne 0$ then do \\
\hspace{3mm}6.1 $mask \leftarrow 2^d$ \\
\hspace{3mm}6.2 $r \leftarrow 0$ \\
\hspace{3mm}6.3 for $n$ from $0$ to $c.used - 1$ do \\
\hspace{6mm}6.3.1 $rr \leftarrow c_n >> (lg(\beta) - d) \mbox{ (mod }mask\mbox{)}$ \\
\hspace{6mm}6.3.2 $c_n \leftarrow (c_n << d) + r \mbox{ (mod }\beta\mbox{)}$ \\
\hspace{6mm}6.3.3 $r \leftarrow rr$ \\
\hspace{3mm}6.4 If $r > 0$ then do \\
\hspace{6mm}6.4.1 $c_{c.used} \leftarrow r$ \\
\hspace{6mm}6.4.2 $c.used \leftarrow c.used + 1$ \\
7. Return(\textit{MP\_OKAY}). \\
\hline
\end{tabular}
\end{center}
\end{small}
\caption{Algorithm mp\_mul\_2d}
\end{figure}

\textbf{Algorithm mp\_mul\_2d.}
This algorithm multiplies $a$ by $2^b$ and stores the result in $c$. The algorithm uses algorithm mp\_lshd and a derivative of algorithm mp\_mul\_2 to
quickly compute the product.

First the algorithm will multiply $a$ by $x^{\lfloor b / lg(\beta) \rfloor}$ which will ensure that the remaining multiplicand is less than
$\beta$. For example, if $b = 37$ and $\beta = 2^{28}$ then this step will multiply by $x$ leaving a multiplication by $2^{37 - 28} = 2^{9}$
left.

After the digits have been shifted appropriately at most $lg(\beta) - 1$ shifts are left to perform.
Step 5 calculates the number of remaining shifts
required. If it is non-zero a modified shift loop is used to calculate the remaining product.
Essentially the loop is a generic version of algorithm mp\_mul\_2 designed to handle any shift count in the range $1 \le d < lg(\beta)$. The $mask$
variable is used to extract the upper $d$ bits to form the carry for the next iteration.

This algorithm is loosely measured as a $O(2n)$ algorithm which means that if the input is $n$ digits long it takes $2n$ ``time'' to
complete. It is possible to optimize this algorithm down to a $O(n)$ algorithm at a cost of making the algorithm slightly harder to follow.

EXAM,bn_mp_mul_2d.c

The shifting is performed in place which means the first step (line @24,a != c@) is to copy the input to the
destination. We avoid calling mp\_copy() unnecessarily by first checking that the mp\_ints are different. The destination then
has to be grown (line @31,grow@) to accommodate the result.

If the shift count $b$ is at least $lg(\beta)$ then a call to mp\_lshd() is used to handle all of the multiples
of $lg(\beta)$, leaving only a shift of $lg(\beta) - 1$ or fewer bits. Inside the actual shift
loop (lines @45,if@ to @76,}@) we make use of precomputed values $shift$ and $mask$. These are used to
extract the carry bit(s) to pass into the next iteration of the loop. The $r$ and $rr$ variables form a
chain between consecutive iterations to propagate the carry.
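
The two phases, whole digit shifting followed by a partial shift with a carry chain, can be sketched in C as follows. This is a simplified illustration, not the library routine: \texttt{shift\_bits\_left}, \texttt{DIGIT\_BIT} and \texttt{DIGIT\_MASK} are hypothetical names, 28-bit digits in a \texttt{uint32\_t} are assumed, the operation is done in place, and the caller is assumed to have reserved enough digits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption for this sketch: lg(beta) = 28, digits held in uint32_t. */
#define DIGIT_BIT  28
#define DIGIT_MASK ((((uint32_t)1) << DIGIT_BIT) - 1)

/* Multiplies c[0..used-1] (least significant digit first) by 2^b in place.
 * Assumes c[] has room for used + b / DIGIT_BIT + 1 digits.
 * Returns the new digit count. */
size_t shift_bits_left(uint32_t *c, size_t used, unsigned b)
{
    size_t   whole = b / DIGIT_BIT;          /* whole digits to shift */
    unsigned d     = b % DIGIT_BIT;          /* leftover bit count    */

    assert(c != NULL);
    if (used == 0) {
        return 0;                            /* zero stays zero */
    }
    if (whole > 0) {                         /* multiply by x^whole */
        for (size_t i = used + whole; i-- > whole; ) {
            c[i] = c[i - whole];
        }
        for (size_t i = 0; i < whole; i++) {
            c[i] = 0;
        }
        used += whole;
    }
    if (d != 0) {                            /* multiply by 2^d, 1 <= d < DIGIT_BIT */
        uint32_t mask  = (((uint32_t)1) << d) - 1;
        unsigned shift = DIGIT_BIT - d;
        uint32_t r     = 0;
        for (size_t i = 0; i < used; i++) {
            uint32_t rr = (c[i] >> shift) & mask;    /* upper d bits: next carry */
            c[i] = ((c[i] << d) | r) & DIGIT_MASK;   /* shift in the previous carry */
            r = rr;
        }
        if (r != 0) {
            c[used++] = r;                   /* final carry becomes a new digit */
        }
    }
    return used;
}
```

For the example in the text, shifting the value $1$ by $b = 37$ bits moves it up one whole digit and then nine bits, leaving $2^9 = 512$ in the second digit.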

\subsection{Division by Power of Two}

\newpage\begin{figure}[!here]
\begin{small}
\begin{center}
\begin{tabular}{l}
\hline Algorithm \textbf{mp\_div\_2d}. \\
\textbf{Input}. One mp\_int $a$ and an integer $b$ \\
\textbf{Output}. $c \leftarrow \lfloor a / 2^b \rfloor, d \leftarrow a \mbox{ (mod }2^b\mbox{)}$. \\
\hline \\
1. If $b \le 0$ then do \\
\hspace{3mm}1.1 $c \leftarrow a$ (\textit{mp\_copy}) \\
\hspace{3mm}1.2 $d \leftarrow 0$ (\textit{mp\_zero}) \\
\hspace{3mm}1.3 Return(\textit{MP\_OKAY}). \\
2. $c \leftarrow a$ \\
3. $d \leftarrow a \mbox{ (mod }2^b\mbox{)}$ (\textit{mp\_mod\_2d}) \\
4. If $b \ge lg(\beta)$ then do \\
\hspace{3mm}4.1 $c \leftarrow \lfloor c/\beta^{\lfloor b/lg(\beta) \rfloor} \rfloor$ (\textit{mp\_rshd}). \\
5. $k \leftarrow b \mbox{ (mod }lg(\beta)\mbox{)}$ \\
6. If $k \ne 0$ then do \\
\hspace{3mm}6.1 $mask \leftarrow 2^k$ \\
\hspace{3mm}6.2 $r \leftarrow 0$ \\
\hspace{3mm}6.3 for $n$ from $c.used - 1$ to $0$ do \\
\hspace{6mm}6.3.1 $rr \leftarrow c_n \mbox{ (mod }mask\mbox{)}$ \\
\hspace{6mm}6.3.2 $c_n \leftarrow (c_n >> k) + (r << (lg(\beta) - k))$ \\
\hspace{6mm}6.3.3 $r \leftarrow rr$ \\
7. Clamp excess digits of $c$. (\textit{mp\_clamp}) \\
8. 
Return(\textit{MP\_OKAY}). \\
\hline
\end{tabular}
\end{center}
\end{small}
\caption{Algorithm mp\_div\_2d}
\end{figure}

\textbf{Algorithm mp\_div\_2d.}
This algorithm will divide an input $a$ by $2^b$ and produce the quotient and remainder. The algorithm is designed much like algorithm
mp\_mul\_2d by first using whole digit shifts then single precision shifts. This algorithm will also produce the remainder of the division
by using algorithm mp\_mod\_2d.

EXAM,bn_mp_div_2d.c

The implementation of algorithm mp\_div\_2d differs slightly from what the algorithm specifies. The remainder $d$ may be optionally
ignored by passing \textbf{NULL} as the pointer to the mp\_int variable. The temporary mp\_int variable $t$ is used to hold the
result of the remainder operation until the end. This allows $d$ and $a$ to represent the same mp\_int without modifying $a$ before
the quotient is obtained.

The rest of the source code is essentially the same as the source code for mp\_mul\_2d. The only significant difference is
the direction of the shifts.

\subsection{Remainder of Division by Power of Two}

The last algorithm in the series of polynomial basis power of two algorithms is calculating the remainder of division by $2^b$.
This
algorithm benefits from the fact that in two's complement arithmetic $a \mbox{ (mod }2^b\mbox{)}$ is the same as $a$ AND $2^b - 1$.

\begin{figure}[!here]
\begin{small}
\begin{center}
\begin{tabular}{l}
\hline Algorithm \textbf{mp\_mod\_2d}. \\
\textbf{Input}. One mp\_int $a$ and an integer $b$ \\
\textbf{Output}. $c \leftarrow a \mbox{ (mod }2^b\mbox{)}$. \\
\hline \\
1. If $b \le 0$ then do \\
\hspace{3mm}1.1 $c \leftarrow 0$ (\textit{mp\_zero}) \\
\hspace{3mm}1.2 Return(\textit{MP\_OKAY}). \\
2. If $b > a.used \cdot lg(\beta)$ then do \\
\hspace{3mm}2.1 $c \leftarrow a$ (\textit{mp\_copy}) \\
\hspace{3mm}2.2 Return the result of step 2.1. \\
3. $c \leftarrow a$ \\
4. If step 3 failed return(\textit{MP\_MEM}). \\
5. for $n$ from $\lceil b / lg(\beta) \rceil$ to $c.used$ do \\
\hspace{3mm}5.1 $c_n \leftarrow 0$ \\
6. $k \leftarrow b \mbox{ (mod }lg(\beta)\mbox{)}$ \\
7. $c_{\lfloor b / lg(\beta) \rfloor} \leftarrow c_{\lfloor b / lg(\beta) \rfloor} \mbox{ (mod }2^{k}\mbox{)}$. \\
8. Clamp excess digits of $c$. (\textit{mp\_clamp}) \\
9. Return(\textit{MP\_OKAY}). 
\\
\hline
\end{tabular}
\end{center}
\end{small}
\caption{Algorithm mp\_mod\_2d}
\end{figure}

\textbf{Algorithm mp\_mod\_2d.}
This algorithm will quickly calculate the value of $a \mbox{ (mod }2^b\mbox{)}$. First if $b$ is less than or equal to zero the
result is set to zero. If $b$ is greater than the number of bits in $a$ then it simply copies $a$ to $c$ and returns. Otherwise, $a$
is copied to $c$, leading digits are removed and the remaining leading digit is trimmed to the exact bit count.

EXAM,bn_mp_mod_2d.c

We first avoid cases of $b \le 0$ by simply mp\_zero()'ing the destination in such cases. Next if $2^b$ is larger
than the input we just mp\_copy() the input and return right away. After this point we know we must actually
perform some work to produce the remainder.

Recalling that reducing modulo $2^k$ and a binary ``and'' with $2^k - 1$ are numerically equivalent we can quickly reduce
the number. First we zero any digits above the last digit in $2^b$ (line @41,for@). Next we reduce the
remaining leading digit (line @45,&=@) and then call mp\_clamp().

\section*{Exercises}
\begin{tabular}{cl}
$\left [ 3 \right ] $ & Devise an algorithm that performs $a \cdot 2^b$ for generic values of $b$ \\
                      & in $O(n)$ time. 
\\
                      &\\
$\left [ 3 \right ] $ & Devise an efficient algorithm to multiply by small low Hamming \\
                      & weight values such as $3$, $5$ and $9$. Extend it to handle all values \\
                      & up to $64$ with a Hamming weight less than three. \\
                      &\\
$\left [ 2 \right ] $ & Modify the preceding algorithm to handle values of the form \\
                      & $2^k - 1$ as well. \\
                      &\\
$\left [ 3 \right ] $ & Using only algorithms mp\_mul\_2, mp\_div\_2 and mp\_add create an \\
                      & algorithm to multiply two integers in roughly $O(2n^2)$ time for \\
                      & any $n$-bit input. Note that the time of addition is ignored in the \\
                      & calculation. \\
                      & \\
$\left [ 5 \right ] $ & Improve the previous algorithm to have a working time of at most \\
                      & $O \left (2^{(k-1)}n + \left ({2n^2 \over k} \right ) \right )$ for an appropriate choice of $k$. Again ignore \\
                      & the cost of addition. \\
                      & \\
$\left [ 2 \right ] $ & Devise a chart to find optimal values of $k$ for the previous problem \\
                      & for $n = 64 \ldots 1024$ in steps of $64$. \\
                      & \\
$\left [ 2 \right ] $ & Using only algorithms mp\_abs and mp\_sub devise another method for \\
                      & calculating the result of a signed comparison. 
\\
                      &
\end{tabular}

\chapter{Multiplication and Squaring}
\section{The Multipliers}
For most number theoretic problems including certain public key cryptographic algorithms, the ``multipliers'' form the most important subset of
algorithms of any multiple precision integer package. The set of multiplier algorithms includes integer multiplication, squaring and modular reduction
where in each of the algorithms single precision multiplication is the dominant operation performed. This chapter will discuss integer multiplication
and squaring, leaving modular reductions for the subsequent chapter.

The importance of the multiplier algorithms is for the most part driven by the fact that certain popular public key algorithms are based on modular
exponentiation, that is computing $d \equiv a^b \mbox{ (mod }c\mbox{)}$ for some arbitrary choice of $a$, $b$ and $c$. During a modular
exponentiation the majority\footnote{Roughly speaking a modular exponentiation will spend about 40\% of the time performing modular reductions,
35\% of the time performing squaring and 25\% of the time performing multiplications.} of the processor time is spent performing single precision
multiplications.

For centuries general purpose multiplication has required a lengthy $O(n^2)$ process, whereby each digit of one multiplicand has to be multiplied
against every digit of the other multiplicand.
Traditional long-hand multiplication is based on this process; while the techniques can differ the
overall algorithm used is essentially the same. Only ``recently'' have faster algorithms been studied. First Karatsuba multiplication was discovered in
1962. This algorithm can multiply two numbers with considerably fewer single precision multiplications when compared to the long-hand approach.
This technique led to the discovery of polynomial basis algorithms (\textit{good reference?}) and subsequently Fourier Transform based solutions.

\section{Multiplication}
\subsection{The Baseline Multiplication}
\label{sec:basemult}
\index{baseline multiplication}
Computing the product of two integers in software can be achieved using a trivial adaptation of the standard $O(n^2)$ long-hand multiplication
algorithm that school children are taught. The algorithm is considered an $O(n^2)$ algorithm since for two $n$-digit inputs $n^2$ single precision
multiplications are required. More specifically, for an $m$ and $n$ digit input $m \cdot n$ single precision multiplications are required. To
simplify most discussions, it will be assumed that the inputs have a comparable number of digits.

The ``baseline multiplication'' algorithm is designed to act as the ``catch-all'' algorithm, only to be used when the faster algorithms cannot be
used. This algorithm does not use any particularly interesting optimizations and should ideally be avoided if possible.
One important
facet of this algorithm is that it has been modified to only produce a certain number of output digits as resolution. The importance of this
modification will become evident during the discussion of Barrett modular reduction. Recall that for an $n$ and $m$ digit input the product
will be at most $n + m$ digits. Therefore, this algorithm can be reduced to a full multiplier by having it produce $n + m$ digits of the product.

Recall from ~GAMMA~ the definition of $\gamma$ as the number of bits in the type \textbf{mp\_digit}. We shall now extend the variable set to
include $\alpha$ which shall represent the number of bits in the type \textbf{mp\_word}. This implies that $2^{\alpha} > 2 \cdot \beta^2$. The
constant $\delta = 2^{\alpha - 2lg(\beta)}$ will represent the maximal weight of any column in a product (\textit{see ~COMBA~ for more information}).

\newpage\begin{figure}[!here]
\begin{small}
\begin{center}
\begin{tabular}{l}
\hline Algorithm \textbf{s\_mp\_mul\_digs}. \\
\textbf{Input}. mp\_int $a$, mp\_int $b$ and an integer $digs$ \\
\textbf{Output}. $c \leftarrow \vert a \vert \cdot \vert b \vert \mbox{ (mod }\beta^{digs}\mbox{)}$. \\
\hline \\
1. If min$(a.used, b.used) < \delta$ then do \\
\hspace{3mm}1.1 Calculate $c = \vert a \vert \cdot \vert b \vert$ by the Comba method (\textit{see algorithm~\ref{fig:COMBAMULT}}).
\\
\hspace{3mm}1.2 Return the result of step 1.1 \\
\\
Allocate and initialize a temporary mp\_int. \\
2. Init $t$ to be of size $digs$ \\
3. If step 2 failed return(\textit{MP\_MEM}). \\
4. $t.used \leftarrow digs$ \\
\\
Compute the product. \\
5. for $ix$ from $0$ to $a.used - 1$ do \\
\hspace{3mm}5.1 $u \leftarrow 0$ \\
\hspace{3mm}5.2 $pb \leftarrow \mbox{min}(b.used, digs - ix)$ \\
\hspace{3mm}5.3 If $pb < 1$ then goto step 6. \\
\hspace{3mm}5.4 for $iy$ from $0$ to $pb - 1$ do \\
\hspace{6mm}5.4.1 $\hat r \leftarrow t_{iy + ix} + a_{ix} \cdot b_{iy} + u$ \\
\hspace{6mm}5.4.2 $t_{iy + ix} \leftarrow \hat r \mbox{ (mod }\beta\mbox{)}$ \\
\hspace{6mm}5.4.3 $u \leftarrow \lfloor \hat r / \beta \rfloor$ \\
\hspace{3mm}5.5 if $ix + pb < digs$ then do \\
\hspace{6mm}5.5.1 $t_{ix + pb} \leftarrow u$ \\
6. Clamp excess digits of $t$. \\
7. Swap $c$ with $t$ \\
8. Clear $t$ \\
9. Return(\textit{MP\_OKAY}).
\\
\hline
\end{tabular}
\end{center}
\end{small}
\caption{Algorithm s\_mp\_mul\_digs}
\end{figure}

\textbf{Algorithm s\_mp\_mul\_digs.}
This algorithm computes the unsigned product of two inputs $a$ and $b$, limited to an output precision of $digs$ digits. While it may seem
a bit awkward to modify the function from its simple $O(n^2)$ description, the usefulness of partial multipliers will arise in a subsequent
algorithm. The algorithm is loosely based on algorithm 14.12 from \cite[pp. 595]{HAC} and is similar to Algorithm M of Knuth \cite[pp. 268]{TAOCPV2}.
Algorithm s\_mp\_mul\_digs differs from these cited references since it can produce a variable output precision regardless of the precision of the
inputs.

The first thing this algorithm checks for is whether a Comba multiplier can be used instead. If the smaller of the two input digit counts
is less than $\delta$, then the Comba method may be used instead. After the Comba method is ruled out, the baseline algorithm begins. A
temporary mp\_int variable $t$ is used to hold the intermediate result of the product. This allows the algorithm to be used to
compute products when either $a = c$ or $b = c$ without overwriting the inputs.

All of step 5 is the infamous $O(n^2)$ multiplication loop slightly modified to only produce up to $digs$ digits of output. The $pb$ variable
is given the count of digits to read from $b$ inside the nested loop.
If $pb < 1$ then no more output digits can be produced and the algorithm
will exit the loop. The best way to think of the loops is as a series of $pb \times 1$ multiplications. That is, in each pass of the
outer loop $a_{ix}$ is multiplied against $b$ and the result is added (\textit{with an appropriate shift}) to $t$.

For example, consider multiplying $576$ by $241$. That is equivalent to computing $10^0(1)(576) + 10^1(4)(576) + 10^2(2)(576)$ which is best
visualized in the following table.

\begin{figure}[here]
\begin{center}
\begin{tabular}{|c|c|c|c|c|c|l|}
\hline && & 5 & 7 & 6 & \\
\hline $\times$&& & 2 & 4 & 1 & \\
\hline &&&&&&\\
 && & 5 & 7 & 6 & $10^0(1)(576)$ \\
 &2 & 3 & 6 & 1 & 6 & $10^1(4)(576) + 10^0(1)(576)$ \\
 1 & 3 & 8 & 8 & 1 & 6 & $10^2(2)(576) + 10^1(4)(576) + 10^0(1)(576)$ \\
\hline
\end{tabular}
\end{center}
\caption{Long-Hand Multiplication Diagram}
\end{figure}

Each row of the product is added to the result after being shifted to the left (\textit{multiplied by a power of the radix}) by the appropriate
count. That is, in pass $ix$ of the outer loop the product is added starting at the $ix$'th digit of the result.

Step 5.4.1 introduces the hat symbol (\textit{e.g. $\hat r$}) which represents a double precision variable.
The multiplication on that step
is assumed to be a double wide output single precision multiplication. That is, two single precision variables are multiplied to produce a
double precision result. The step is somewhat optimized from a long-hand multiplication algorithm because the carry from the addition in step
5.4.1 is propagated through the nested loop. If the carry were not propagated immediately it would overflow the single precision digit
$t_{ix+iy}$ and the result would be lost.

At step 5.5 the nested loop is finished and any carry that was left over should be forwarded. The carry does not have to be added to the $ix+pb$'th
digit since that digit is assumed to be zero at this point. However, if $ix + pb \ge digs$ the carry is not set as it would make the result
exceed the precision requested.

EXAM,bn_s_mp_mul_digs.c

First we determine (line @30,if@) whether the Comba method can be used, since it is faster. The conditions for
using the Comba routine are that min$(a.used, b.used) < \delta$ and the number of digits of output is less than
\textbf{MP\_WARRAY}. This new constant is used to control the stack usage in the Comba routines. By default it is
set to $\delta$ but can be reduced when memory is at a premium.

If we cannot use the Comba method we proceed to set up the baseline routine. We allocate the destination mp\_int
$t$ (line @36,init@) to the exact size of the output to avoid further re--allocations. At this point we now
begin the $O(n^2)$ loop.
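
As a rough illustration of steps 5 through 5.5 (this is not the library's source), the truncated loop can be sketched on plain little-endian digit arrays. The names \texttt{mul\_digs\_sketch}, \texttt{DIGIT\_BIT} and \texttt{DIGIT\_MASK} are hypothetical, and the sketch assumes 28-bit digits stored in 32-bit words with a 64-bit double precision type.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed parameters for illustration only: beta = 2^28. */
#define DIGIT_BIT  28
#define DIGIT_MASK ((((uint32_t)1) << DIGIT_BIT) - 1)

/* Computes t = a * b (mod beta^digs) on little-endian digit arrays.
 * t must have room for digs digits and must not alias a or b. */
static void mul_digs_sketch(const uint32_t *a, size_t a_used,
                            const uint32_t *b, size_t b_used,
                            uint32_t *t, size_t digs)
{
    size_t ix, iy, pb;

    assert(t != a && t != b);

    for (ix = 0; ix < digs; ix++) {
        t[ix] = 0;
    }

    for (ix = 0; ix < a_used && ix < digs; ix++) {
        uint32_t u = 0;                               /* step 5.1: carry */

        /* step 5.2: digits of b that still land below position digs */
        pb = (b_used < digs - ix) ? b_used : digs - ix;

        for (iy = 0; iy < pb; iy++) {
            /* step 5.4.1: double wide product plus old digit plus carry */
            uint64_t r = (uint64_t)t[ix + iy]
                       + (uint64_t)a[ix] * (uint64_t)b[iy]
                       + (uint64_t)u;
            t[ix + iy] = (uint32_t)(r & DIGIT_MASK);  /* step 5.4.2 */
            u = (uint32_t)(r >> DIGIT_BIT);           /* step 5.4.3 */
        }

        /* step 5.5: keep the last carry only if it fits in the output */
        if (ix + pb < digs) {
            t[ix + pb] = u;
        }
    }
}
```

Calling the sketch with $digs = a.used + b.used$ yields the full product, while a smaller $digs$ simply never computes the high digits.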

This implementation of multiplication has the caveat that it can be trimmed to only produce a variable number of
digits as output. In each iteration of the outer loop the $pb$ variable is set (line @48,MIN@) to the maximum
number of inner loop iterations.

Inside the inner loop we calculate $\hat r$ as the mp\_word product of the two mp\_digits and the addition of the
carry from the previous iteration. A particularly important observation is that most modern optimizing
C compilers (GCC for instance) can recognize that an $N \times N \rightarrow 2N$ multiplication is all that
is required for the product. In x86 terms for example, this means using the MUL instruction.

Each digit of the product is stored in turn (line @68,tmpt@) and the carry propagated (line @71,>>@) to the
next iteration.

\subsection{Faster Multiplication by the ``Comba'' Method}
MARK,COMBA

One of the huge drawbacks of the ``baseline'' algorithms is that at the $O(n^2)$ level the carry must be
computed and propagated upwards. This makes the nested loop very sequential and hard to unroll and implement
in parallel. The ``Comba'' \cite{COMBA} method is named after the little known (\textit{in cryptographic venues}) Paul G.
Comba who described a method of implementing fast multipliers that do not require nested carry fixup operations.
As an
interesting aside it seems that Paul Barrett describes a similar technique in his 1986 paper \cite{BARRETT} written
five years before.

At the heart of the Comba technique is once again the long-hand algorithm, except in this case a slight
twist is placed on how the columns of the result are produced. In the standard long-hand algorithm rows of products
are produced then added together to form the final result. In the baseline algorithm the columns are added together
after each iteration to get the result instantaneously.

In the Comba algorithm the columns of the result are produced entirely independently of each other. That is, at
the $O(n^2)$ level a simple multiplication and addition step is performed. The carries of the columns are propagated
after the nested loop to reduce the amount of work required. Succinctly, the first step of the algorithm is to compute
the product vector $\vec x$ as follows.

\begin{equation}
\vec x_n = \sum_{i+j = n} a_ib_j, \forall n \in \lbrace 0, 1, 2, \ldots, i + j \rbrace
\end{equation}

Where $\vec x_n$ is the $n'th$ column of the output vector. Consider the following example which computes the vector $\vec x$ for the multiplication
of $576$ and $241$.

\newpage\begin{figure}[here]
\begin{small}
\begin{center}
\begin{tabular}{|c|c|c|c|c|c|}
 \hline & & 5 & 7 & 6 & First Input\\
 \hline $\times$ & & 2 & 4 & 1 & Second Input\\
\hline & & $1 \cdot 5 = 5$ & $1 \cdot 7 = 7$ & $1 \cdot 6 = 6$ & First pass \\
 & $4 \cdot 5 = 20$ & $4 \cdot 7+5=33$ & $4 \cdot 6+7=31$ & 6 & Second pass \\
 $2 \cdot 5 = 10$ & $2 \cdot 7 + 20 = 34$ & $2 \cdot 6+33=45$ & 31 & 6 & Third pass \\
\hline 10 & 34 & 45 & 31 & 6 & Final Result \\
\hline
\end{tabular}
\end{center}
\end{small}
\caption{Comba Multiplication Diagram}
\end{figure}

At this point the vector $x = \left < 10, 34, 45, 31, 6 \right >$ is the result of the first step of the Comba multiplier.
Now the columns must be fixed by propagating the carry upwards. The resultant vector will have one extra dimension over the input vector which is
equivalent to adding a leading zero digit.

\begin{figure}[!here]
\begin{small}
\begin{center}
\begin{tabular}{l}
\hline Algorithm \textbf{Comba Fixup}. \\
\textbf{Input}. Vector $\vec x$ of dimension $k$ \\
\textbf{Output}. Vector $\vec x$ such that the carries have been propagated. \\
\hline \\
1.
for $n$ from $0$ to $k - 1$ do \\
\hspace{3mm}1.1 $\vec x_{n+1} \leftarrow \vec x_{n+1} + \lfloor \vec x_{n}/\beta \rfloor$ \\
\hspace{3mm}1.2 $\vec x_{n} \leftarrow \vec x_{n} \mbox{ (mod }\beta\mbox{)}$ \\
2. Return($\vec x$). \\
\hline
\end{tabular}
\end{center}
\end{small}
\caption{Algorithm Comba Fixup}
\end{figure}

With that algorithm and $k = 5$ and $\beta = 10$ the following vector is produced $\vec x= \left < 1, 3, 8, 8, 1, 6 \right >$. In this case
$241 \cdot 576$ is in fact $138816$ and the procedure succeeded. If the algorithm is correct and, as will be demonstrated shortly, more
efficient than the baseline algorithm, why not simply always use this algorithm?

\subsubsection{Column Weight.}
At the nested $O(n^2)$ level the Comba method adds the product of two single precision variables to each column of the output
independently. A serious obstacle arises if the carry is lost, due to a lack of precision, before the algorithm has a chance to fix
the carries. For example, in the multiplication of two three-digit numbers the third column of output will be the sum of
three single precision multiplications. If the precision of the accumulator for the output digits is less than $3 \cdot (\beta - 1)^2$ then
an overflow can occur and the carry information will be lost. For any $m$ and $n$ digit inputs the maximum weight of any column is
min$(m, n)$ which is fairly obvious.
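
The two phases described above, carry-free column sums followed by a single fixup pass, can be sketched as follows. This is only an illustration under stated assumptions: the function names \texttt{comba\_columns} and \texttt{comba\_fixup} are hypothetical, digits are stored least significant first, the radix $\beta$ is passed as a parameter so that the decimal example can be reproduced, and the 64-bit accumulators are assumed wide enough for the column weight involved.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Column sums: x[n] = sum of a[i] * b[j] over all i + j == n.
 * No carries are propagated here. x needs a_used + b_used entries;
 * the top entry stays zero and receives the final carry later. */
static void comba_columns(const uint32_t *a, size_t a_used,
                          const uint32_t *b, size_t b_used,
                          uint64_t *x)
{
    size_t i, j, n;

    for (n = 0; n < a_used + b_used; n++) {
        x[n] = 0;
    }
    for (i = 0; i < a_used; i++) {
        for (j = 0; j < b_used; j++) {
            x[i + j] += (uint64_t)a[i] * (uint64_t)b[j];
        }
    }
}

/* Algorithm Comba Fixup: x holds k column sums plus one spare entry. */
static void comba_fixup(uint64_t *x, size_t k, uint64_t beta)
{
    size_t n;

    assert(beta >= 2);
    for (n = 0; n < k; n++) {
        x[n + 1] += x[n] / beta;   /* step 1.1: forward the carry    */
        x[n] %= beta;              /* step 1.2: reduce to one digit  */
    }
}
```

With the digits of $576$ and $241$ and $\beta = 10$ the first function produces the column sums $6, 31, 45, 34, 10$ (least significant first) and the second turns them into the digits of $138816$.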

The maximum number of terms in any column of a product is known as the ``column weight'' and strictly governs when the algorithm can be used. Recall
from earlier that a double precision type has $\alpha$ bits of resolution and a single precision digit has $lg(\beta)$ bits of precision. Given these
two quantities we must not violate the following

\begin{equation}
k \cdot \left (\beta - 1 \right )^2 < 2^{\alpha}
\end{equation}

Which reduces to

\begin{equation}
k \cdot \left ( \beta^2 - 2\beta + 1 \right ) < 2^{\alpha}
\end{equation}

Let $\rho = lg(\beta)$ represent the number of bits in a single precision digit. By further re-arrangement of the equation the final solution is
found.

\begin{equation}
k < {{2^{\alpha}} \over {\left (2^{2\rho} - 2^{\rho + 1} + 1 \right )}}
\end{equation}

The defaults for LibTomMath are $\beta = 2^{28}$ and $\alpha = 64$ which means that $k$ is bounded by $k < 257$. In this configuration
the smaller input may not have more than $256$ digits if the Comba method is to be used. This is quite satisfactory for most applications since
$256$ digits would allow for numbers in the range of $0 \le x < 2^{7168}$ which is much larger than most public key cryptographic algorithms require.
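
As a worked check of that bound, assume $\alpha = 64$ bits and $\rho = 28$ bits, so that the denominator is $2^{56} - 2^{29} + 1$. Multiplying the denominator by $256$ gives

\begin{equation}
256 \cdot \left (2^{56} - 2^{29} + 1 \right ) = 2^{64} - 2^{37} + 256 < 2^{64}
\end{equation}

while one more term exceeds the available precision,

\begin{equation}
257 \cdot \left (2^{56} - 2^{29} + 1 \right ) = 2^{64} + 2^{56} - 257 \cdot 2^{29} + 257 > 2^{64}
\end{equation}

so the quotient lies strictly between $256$ and $257$ and the largest admissible column weight is $k = 256$.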

\newpage\begin{figure}[!here]
\begin{small}
\begin{center}
\begin{tabular}{l}
\hline Algorithm \textbf{fast\_s\_mp\_mul\_digs}. \\
\textbf{Input}. mp\_int $a$, mp\_int $b$ and an integer $digs$ \\
\textbf{Output}. $c \leftarrow \vert a \vert \cdot \vert b \vert \mbox{ (mod }\beta^{digs}\mbox{)}$. \\
\hline \\
Place an array of \textbf{MP\_WARRAY} single precision digits named $W$ on the stack. \\
1. If $c.alloc < digs$ then grow $c$ to $digs$ digits. (\textit{mp\_grow}) \\
2. If step 1 failed return(\textit{MP\_MEM}).\\
\\
3. $pa \leftarrow \mbox{MIN}(digs, a.used + b.used)$ \\
\\
4. $\_ \hat W \leftarrow 0$ \\
5. for $ix$ from 0 to $pa - 1$ do \\
\hspace{3mm}5.1 $ty \leftarrow \mbox{MIN}(b.used - 1, ix)$ \\
\hspace{3mm}5.2 $tx \leftarrow ix - ty$ \\
\hspace{3mm}5.3 $iy \leftarrow \mbox{MIN}(a.used - tx, ty + 1)$ \\
\hspace{3mm}5.4 for $iz$ from 0 to $iy - 1$ do \\
\hspace{6mm}5.4.1 $\_ \hat W \leftarrow \_ \hat W + a_{tx+iz}b_{ty-iz}$ \\
\hspace{3mm}5.5 $W_{ix} \leftarrow \_ \hat W (\mbox{mod }\beta)$\\
\hspace{3mm}5.6 $\_ \hat W \leftarrow \lfloor \_ \hat W / \beta \rfloor$ \\
\\
6. $oldused \leftarrow c.used$ \\
7. $c.used \leftarrow digs$ \\
8.
for $ix$ from $0$ to $pa$ do \\
\hspace{3mm}8.1 $c_{ix} \leftarrow W_{ix}$ \\
9. for $ix$ from $pa + 1$ to $oldused - 1$ do \\
\hspace{3mm}9.1 $c_{ix} \leftarrow 0$ \\
\\
10. Clamp $c$. \\
11. Return MP\_OKAY. \\
\hline
\end{tabular}
\end{center}
\end{small}
\caption{Algorithm fast\_s\_mp\_mul\_digs}
\label{fig:COMBAMULT}
\end{figure}

\textbf{Algorithm fast\_s\_mp\_mul\_digs.}
This algorithm performs the unsigned multiplication of $a$ and $b$ using the Comba method limited to $digs$ digits of precision.

The outer loop of this algorithm is more complicated than that of the baseline multiplier. This is because on the inside of the
loop we want to produce one column per pass. This allows the accumulator $\_ \hat W$ to be placed in CPU registers and
reduces the memory bandwidth to two \textbf{mp\_digit} reads per iteration.

The $ty$ variable is set to the minimum of $ix$ and $b.used - 1$, the index of the last digit in $b$. That way if $a$ has more digits than
$b$ this will be limited to $b.used - 1$. The $tx$ variable is set to the distance past $b.used - 1$ the variable
$ix$ is. This is used for the immediately subsequent statement where we find $iy$.

The variable $iy$ is the minimum number of digits we can read from either $a$ or $b$ before running out.
Computing one column at a time
means we have to scan one integer upwards and the other downwards. $a$ starts at $tx$ and $b$ starts at $ty$. In each
pass we are producing the $ix$'th output column and we note that $tx + ty = ix$. As we move $tx$ upwards we have to
move $ty$ downwards so the equality remains valid. The $iy$ variable is the number of iterations until
$tx \ge a.used$ or $ty < 0$ occurs.

After every inner pass we store the lower half of the accumulator into $W_{ix}$ and then propagate the carry of the accumulator
into the next round by dividing $\_ \hat W$ by $\beta$.

To measure the benefits of the Comba method over the baseline method consider the number of operations that are required. If the
cost in terms of time of a multiply and addition is $p$ and the cost of a carry propagation is $q$ then a baseline multiplication would require
$O \left ((p + q)n^2 \right )$ time to multiply two $n$-digit numbers. The Comba method requires only $O(pn^2 + qn)$ time; however, in practice
the speed increase is actually much more. With $O(n)$ space the algorithm can be reduced to $O(pn + qn)$ time by implementing the $n$ multiply
and addition operations in the nested loop in parallel.

EXAM,bn_fast_s_mp_mul_digs.c

As per the pseudo--code we first calculate $pa$ (line @47,MIN@) as the number of digits to output. Next we begin the outer loop
to produce the individual columns of the product.
We use the two aliases $tmpx$ and $tmpy$ (lines @61,tmpx@, @62,tmpy@) to point
inside the two multiplicands quickly.

The inner loop (lines @70,for@ to @72,}@) of this implementation is where the tradeoff comes into play. Originally this Comba
implementation was ``row--major'' which means it added to each of the columns in each pass. After the outer loop it would then fix
the carries. This was very fast except it had an annoying drawback. You had to read an mp\_word and two mp\_digits and write
one mp\_word per iteration. On processors such as the Athlon XP and P4 this did not matter much since the cache bandwidth
is very high and it can keep the ALU fed with data. It did, however, matter on older and embedded CPUs where cache is often
slower and also often does not exist. This new algorithm only performs two reads per iteration under the assumption that the
compiler has aliased $\_ \hat W$ to a CPU register.

After the inner loop we store the current accumulator in $W$ and shift $\_ \hat W$ (lines @75,W[ix]@, @78,>>@) to forward it as
a carry for the next pass. After the outer loop we use the final carry (line @82,W[ix]@) as the last digit of the product.

\subsection{Polynomial Basis Multiplication}
To break the $O(n^2)$ barrier in multiplication requires a completely different look at integer multiplication. In the following algorithms
the use of polynomial basis representation for two integers $a$ and $b$ as $f(x) = \sum_{i=0}^{n} a_i x^i$ and
$g(x) = \sum_{i=0}^{n} b_i x^i$ respectively, is required.
In this system both $f(x)$ and $g(x)$ have $n + 1$ terms and are of the $n$'th degree.

The product $a \cdot b \equiv f(x)g(x)$ is the polynomial $W(x) = \sum_{i=0}^{2n} w_i x^i$. The coefficients $w_i$ will
directly yield the desired product when $\beta$ is substituted for $x$. The direct solution to solve for the $2n + 1$ coefficients
requires $O(n^2)$ time and would in practice be slower than the Comba technique.

However, numerical analysis theory indicates that only $2n + 1$ distinct points in $W(x)$ are required to determine the values of the $2n + 1$ unknown
coefficients. This means by finding $\zeta_y = W(y)$ for $2n + 1$ small values of $y$ the coefficients of $W(x)$ can be found with
Gaussian elimination. This technique is also occasionally referred to as the \textit{interpolation technique} (\textit{references please...}) since in
effect an interpolation based on $2n + 1$ points will yield a polynomial equivalent to $W(x)$.

The coefficients of the polynomial $W(x)$ are unknown which makes finding $W(y)$ for any value of $y$ impossible. However, since
$W(x) = f(x)g(x)$ the equivalent $\zeta_y = f(y) g(y)$ can be used in its place. The benefit of this technique stems from the
fact that $f(y)$ and $g(y)$ are much smaller than either $a$ or $b$ respectively. As a result finding the $2n + 1$ relations required
by multiplying $f(y)g(y)$ involves multiplying integers that are much smaller than either of the inputs.

When picking points to gather relations there are always three obvious points to choose, $y = 0, 1$ and $ \infty$.
The $\zeta_0$ term
is simply the product $W(0) = w_0 = a_0 \cdot b_0$. The $\zeta_1$ term is the product
$W(1) = \left (\sum_{i = 0}^{n} a_i \right ) \left (\sum_{i = 0}^{n} b_i \right )$. The third point $\zeta_{\infty}$ is less obvious but rather
simple to explain. The $2n + 1$'th coefficient of $W(x)$ is numerically equivalent to the most significant column in an integer multiplication.
The point at $\infty$ is used symbolically to represent the most significant column, that is $W(\infty) = w_{2n} = a_nb_n$. Note that the
points at $y = 0$ and $\infty$ yield the coefficients $w_0$ and $w_{2n}$ directly.

If more points are required they should be of small values and powers of two such as $2^q$ and the related \textit{mirror points}
$\left (2^q \right )^{2n} \cdot \zeta_{2^{-q}}$ for small values of $q$. The term ``mirror point'' stems from the fact that
$\left (2^q \right )^{2n} \cdot \zeta_{2^{-q}}$ can be calculated in the exact opposite fashion as $\zeta_{2^q}$. For
example, when $n = 2$ and $q = 1$ the following two equations are equivalent to the point $\zeta_{2}$ and its mirror.

\begin{eqnarray}
\zeta_{2} = f(2)g(2) = (4a_2 + 2a_1 + a_0)(4b_2 + 2b_1 + b_0) \nonumber \\
16 \cdot \zeta_{1 \over 2} = 4f({1\over 2}) \cdot 4g({1 \over 2}) = (a_2 + 2a_1 + 4a_0)(b_2 + 2b_1 + 4b_0)
\end{eqnarray}

Using such points will allow the values of $f(y)$ and $g(y)$ to be independently calculated using only left shifts.
For example, when $n = 2$ the
polynomial $f(2^q)$ is equal to $2^q((2^qa_2) + a_1) + a_0$. This technique of polynomial evaluation is known as Horner's method.

As a general rule of the algorithm, when the inputs are split into $n$ parts each there are $2n - 1$ multiplications. Each multiplication is of
multiplicands that have $n$ times fewer digits than the inputs. The asymptotic running time of this algorithm is
$O \left ( k^{lg_n(2n - 1)} \right )$ for $k$ digit inputs (\textit{assuming they have the same number of digits}). Figure~\ref{fig:exponent}
summarizes the exponents for various values of $n$.

\begin{figure}
\begin{center}
\begin{tabular}{|c|c|c|}
\hline \textbf{Split into $n$ Parts} & \textbf{Exponent} & \textbf{Notes}\\
\hline $2$ & $1.584962501$ & This is Karatsuba Multiplication. \\
\hline $3$ & $1.464973520$ & This is Toom-Cook Multiplication.
\\
\hline $4$ & $1.403677461$ &\\
\hline $5$ & $1.365212389$ &\\
\hline $10$ & $1.278753601$ &\\
\hline $100$ & $1.149426538$ &\\
\hline $1000$ & $1.100270931$ &\\
\hline $10000$ & $1.075252070$ &\\
\hline
\end{tabular}
\end{center}
\caption{Asymptotic Running Time of Polynomial Basis Multiplication}
\label{fig:exponent}
\end{figure}

At first it may seem like a good idea to choose $n = 1000$ since the exponent is approximately $1.1$. However, the overhead
of solving for the $2n - 1 = 1999$ terms of $W(x)$ will certainly consume any savings the algorithm could offer for all but exceedingly large
numbers.

\subsubsection{Cutoff Point}
The polynomial basis multiplication algorithms all require fewer single precision multiplications than a straight Comba approach. However,
the algorithms incur an overhead (\textit{at the $O(n)$ work level}) since they require a system of equations to be solved. This makes the
polynomial basis approach more costly to use with small inputs.

Let $m$ represent the number of digits in the multiplicands (\textit{assume both multiplicands have the same number of digits}).
There exists a
point $y$ such that when $m < y$ the polynomial basis algorithms are more costly than Comba, when $m = y$ they are roughly the same cost and
when $m > y$ the Comba methods are slower than the polynomial basis algorithms.

The exact location of $y$ depends on several key architectural elements of the computer platform in question.

\begin{enumerate}
\item The ratio of clock cycles for single precision multiplication versus other simpler operations such as addition, shifting, etc. For example
on the AMD Athlon the ratio is roughly $17 : 1$ while on the Intel P4 it is $29 : 1$. The higher the ratio in favour of multiplication the lower
the cutoff point $y$ will be.

\item The complexity of the linear system of equations (\textit{for the coefficients of $W(x)$}). Generally speaking as the number of splits
grows the complexity grows substantially. Ideally solving the system will only involve addition, subtraction and shifting of integers. This
directly reflects on the ratio previously mentioned.

\item To a lesser extent memory bandwidth and function call overheads. Provided the values are in the processor cache this is less of an
influence over the cutoff point.

\end{enumerate}

A clean cutoff point separation occurs when a point $y$ is found such that all of the cutoff point conditions are met.
For example, if the point
is too low then there will be values of $m$ such that $m > y$ and the Comba method is still faster. Finding the cutoff points is fairly simple when
a high resolution timer is available.

\subsection{Karatsuba Multiplication}
Karatsuba \cite{KARA} multiplication, when originally proposed in 1962, was among the first set of algorithms to break the $O(n^2)$ barrier for
general purpose multiplication. Given two polynomial basis representations $f(x) = ax + b$ and $g(x) = cx + d$, Karatsuba proved with
light algebra \cite{KARAP} that the following polynomial is equivalent to multiplication of the two integers the polynomials represent.

\begin{equation}
f(x) \cdot g(x) = acx^2 + ((a + b)(c + d) - (ac + bd))x + bd
\end{equation}

Using the observation that $ac$ and $bd$ can be re-used, only three half sized multiplications are required to produce the product. Applying
this algorithm recursively, the work factor becomes $O(n^{lg(3)})$ which is substantially better than the work factor $O(n^2)$ of the Comba technique. It turns
out that what Karatsuba did not know, or at least did not publish, was that this is simply polynomial basis multiplication with the points
$\zeta_0$, $\zeta_{\infty}$ and $\zeta_{1}$. Consider the resultant system of equations.

\begin{center}
\begin{tabular}{rcrcrcrc}
$\zeta_{0}$ & $=$ & & & & & $w_0$ \\
$\zeta_{1}$ & $=$ & $w_2$ & $+$ & $w_1$ & $+$ & $w_0$ \\
$\zeta_{\infty}$ & $=$ & $w_2$ & & & & \\
\end{tabular}
\end{center}

By subtracting the first and last equations from the equation in the middle the term $w_1$ can be isolated and all three coefficients solved for. The simplicity
of this system of equations has made Karatsuba fairly popular. In fact the cutoff point is often fairly low\footnote{With LibTomMath 0.18 it is 70 and 109 digits for the Intel P4 and AMD Athlon respectively.}
making it an ideal algorithm to speed up certain public key cryptosystems such as RSA and Diffie-Hellman.

\newpage\begin{figure}[!here]
\begin{small}
\begin{center}
\begin{tabular}{l}
\hline Algorithm \textbf{mp\_karatsuba\_mul}. \\
\textbf{Input}. mp\_int $a$ and mp\_int $b$ \\
\textbf{Output}. $c \leftarrow \vert a \vert \cdot \vert b \vert$ \\
\hline \\
1. Init the following mp\_int variables: $x0$, $x1$, $y0$, $y1$, $t1$, $x0y0$, $x1y1$.\\
2. If step 1 failed then return(\textit{MP\_MEM}). \\
\\
Split the input. e.g. $a = x1 \cdot \beta^B + x0$ \\
3. $B \leftarrow \mbox{min}(a.used, b.used)/2$ \\
4.
$x0 \leftarrow a \mbox{ (mod }\beta^B\mbox{)}$ (\textit{mp\_mod\_2d}) \\
5. $y0 \leftarrow b \mbox{ (mod }\beta^B\mbox{)}$ \\
6. $x1 \leftarrow \lfloor a / \beta^B \rfloor$ (\textit{mp\_rshd}) \\
7. $y1 \leftarrow \lfloor b / \beta^B \rfloor$ \\
\\
Calculate the three products. \\
8. $x0y0 \leftarrow x0 \cdot y0$ (\textit{mp\_mul}) \\
9. $x1y1 \leftarrow x1 \cdot y1$ \\
10. $t1 \leftarrow x1 + x0$ (\textit{mp\_add}) \\
11. $x0 \leftarrow y1 + y0$ \\
12. $t1 \leftarrow t1 \cdot x0$ \\
\\
Calculate the middle term. \\
13. $x0 \leftarrow x0y0 + x1y1$ \\
14. $t1 \leftarrow t1 - x0$ (\textit{s\_mp\_sub}) \\
\\
Calculate the final product. \\
15. $t1 \leftarrow t1 \cdot \beta^B$ (\textit{mp\_lshd}) \\
16. $x1y1 \leftarrow x1y1 \cdot \beta^{2B}$ \\
17. $t1 \leftarrow x0y0 + t1$ \\
18. $c \leftarrow t1 + x1y1$ \\
19. Clear all of the temporary variables. \\
20. Return(\textit{MP\_OKAY}).\\
\hline
\end{tabular}
\end{center}
\end{small}
\caption{Algorithm mp\_karatsuba\_mul}
\end{figure}

\textbf{Algorithm mp\_karatsuba\_mul.}
This algorithm computes the unsigned product of two inputs using the Karatsuba multiplication algorithm.
It is loosely based on the description
from Knuth \cite[pp. 294-295]{TAOCPV2}.

\index{radix point}
In order to split the two inputs into their respective halves, a suitable \textit{radix point} must be chosen. The radix point chosen must
be used for both of the inputs meaning that it must be smaller than the smallest input. Step 3 chooses the radix point $B$ as half of the
smallest input \textbf{used} count. After the radix point is chosen the inputs are split into lower and upper halves. Steps 4 and 5
compute the lower halves. Steps 6 and 7 compute the upper halves.

After the halves have been computed the three intermediate half-size products must be computed. Steps 8 and 9 compute the trivial products
$x0 \cdot y0$ and $x1 \cdot y1$. The mp\_int $x0$ is used as a temporary variable after $x1 + x0$ has been computed. By using $x0$ instead
of an additional temporary variable, the algorithm can avoid an additional memory allocation operation.

The remaining steps 13 through 18 compute the Karatsuba polynomial through a variety of digit shifting and addition operations.

EXAM,bn_mp_karatsuba_mul.c

The new coding element in this routine, not seen in previous routines, is the usage of goto statements. The conventional
wisdom is that goto statements should be avoided. This is generally true, however when every single function call can fail, it makes sense
to handle error recovery with a single piece of code.
Lines @61,if@ to @75,if@ handle initializing all of the temporary variables
required. Note how each of the if statements goes to a different label in case of failure. This allows the routine to correctly free only
the temporaries that have been successfully allocated so far.

The temporary variables are all initialized using the mp\_init\_size routine since they are expected to be large. This saves the
additional reallocation that would have been necessary. Also $x0$, $x1$, $y0$ and $y1$ have to be able to hold at least their respective
number of digits for the next section of code.

The first algebraic portion of the algorithm is to split the two inputs into their halves. However, instead of using mp\_mod\_2d and mp\_rshd
to extract the halves, the respective code has been placed inline within the body of the function. To initialize the halves, the \textbf{used} and
\textbf{sign} members are copied first. The first for loop on line @98,for@ copies the lower halves. Since they are both the same magnitude it
is simpler to calculate both lower halves in a single loop. The for loops on lines @104,for@ and @109,for@ calculate the upper halves $x1$ and
$y1$ respectively.

By inlining the calculation of the halves, the Karatsuba multiplier has a slightly lower overhead and can be used for smaller magnitude inputs.

When line @152,err@ is reached, the algorithm has completed successfully.
The ``error status'' variable $err$ is set to \textbf{MP\_OKAY} so that
the same code that handles errors can be used to clear the temporary variables and return.

\subsection{Toom-Cook $3$-Way Multiplication}
Toom-Cook $3$-Way \cite{TOOM} multiplication is essentially the polynomial basis algorithm for $n = 2$ except that the points are
chosen such that $\zeta$ is easy to compute and the resulting system of equations easy to reduce. Here, the points $\zeta_{0}$,
$16 \cdot \zeta_{1 \over 2}$, $\zeta_1$, $\zeta_2$ and $\zeta_{\infty}$ make up the five required points to solve for the coefficients
of $W(x)$.

With the five relations that Toom-Cook specifies, the following system of equations is formed.

\begin{center}
\begin{tabular}{rcrcrcrcrcr}
$\zeta_0$ & $=$ & $0w_4$ & $+$ & $0w_3$ & $+$ & $0w_2$ & $+$ & $0w_1$ & $+$ & $1w_0$ \\
$16 \cdot \zeta_{1 \over 2}$ & $=$ & $1w_4$ & $+$ & $2w_3$ & $+$ & $4w_2$ & $+$ & $8w_1$ & $+$ & $16w_0$ \\
$\zeta_1$ & $=$ & $1w_4$ & $+$ & $1w_3$ & $+$ & $1w_2$ & $+$ & $1w_1$ & $+$ & $1w_0$ \\
$\zeta_2$ & $=$ & $16w_4$ & $+$ & $8w_3$ & $+$ & $4w_2$ & $+$ & $2w_1$ & $+$ & $1w_0$ \\
$\zeta_{\infty}$ & $=$ & $1w_4$ & $+$ & $0w_3$ & $+$ & $0w_2$ & $+$ & $0w_1$ & $+$ & $0w_0$ \\
\end{tabular}
\end{center}

A trivial solution to this matrix requires $12$ subtractions, two multiplications by a small power of two, two divisions by a small power
of two, two
divisions by three and one multiplication by three. All of these $19$ sub-operations require less than quadratic time, meaning that
the algorithm can be faster than a baseline multiplication. However, the greater complexity of this algorithm places the cutoff point
(\textbf{TOOM\_MUL\_CUTOFF}) where Toom-Cook becomes more efficient much higher than the Karatsuba cutoff point.

\begin{figure}[!here]
\begin{small}
\begin{center}
\begin{tabular}{l}
\hline Algorithm \textbf{mp\_toom\_mul}. \\
\textbf{Input}. mp\_int $a$ and mp\_int $b$ \\
\textbf{Output}. $c \leftarrow a \cdot b $ \\
\hline \\
Split $a$ and $b$ into three pieces. E.g. $a = a_2 \beta^{2k} + a_1 \beta^{k} + a_0$ \\
1. $k \leftarrow \lfloor \mbox{min}(a.used, b.used) / 3 \rfloor$ \\
2. $a_0 \leftarrow a \mbox{ (mod }\beta^{k}\mbox{)}$ \\
3. $a_1 \leftarrow \lfloor a / \beta^k \rfloor$, $a_1 \leftarrow a_1 \mbox{ (mod }\beta^{k}\mbox{)}$ \\
4. $a_2 \leftarrow \lfloor a / \beta^{2k} \rfloor$, $a_2 \leftarrow a_2 \mbox{ (mod }\beta^{k}\mbox{)}$ \\
5. $b_0 \leftarrow b \mbox{ (mod }\beta^{k}\mbox{)}$ \\
6. $b_1 \leftarrow \lfloor b / \beta^k \rfloor$, $b_1 \leftarrow b_1 \mbox{ (mod }\beta^{k}\mbox{)}$ \\
7. $b_2 \leftarrow \lfloor b / \beta^{2k} \rfloor$, $b_2 \leftarrow b_2 \mbox{ (mod }\beta^{k}\mbox{)}$ \\
\\
Find the five equations for $w_0, w_1, ..., w_4$. \\
8.
$w_0 \leftarrow a_0 \cdot b_0$ \\
9. $w_4 \leftarrow a_2 \cdot b_2$ \\
10. $tmp_1 \leftarrow 2 \cdot a_0$, $tmp_1 \leftarrow a_1 + tmp_1$, $tmp_1 \leftarrow 2 \cdot tmp_1$, $tmp_1 \leftarrow tmp_1 + a_2$ \\
11. $tmp_2 \leftarrow 2 \cdot b_0$, $tmp_2 \leftarrow b_1 + tmp_2$, $tmp_2 \leftarrow 2 \cdot tmp_2$, $tmp_2 \leftarrow tmp_2 + b_2$ \\
12. $w_1 \leftarrow tmp_1 \cdot tmp_2$ \\
13. $tmp_1 \leftarrow 2 \cdot a_2$, $tmp_1 \leftarrow a_1 + tmp_1$, $tmp_1 \leftarrow 2 \cdot tmp_1$, $tmp_1 \leftarrow tmp_1 + a_0$ \\
14. $tmp_2 \leftarrow 2 \cdot b_2$, $tmp_2 \leftarrow b_1 + tmp_2$, $tmp_2 \leftarrow 2 \cdot tmp_2$, $tmp_2 \leftarrow tmp_2 + b_0$ \\
15. $w_3 \leftarrow tmp_1 \cdot tmp_2$ \\
16. $tmp_1 \leftarrow a_0 + a_1$, $tmp_1 \leftarrow tmp_1 + a_2$, $tmp_2 \leftarrow b_0 + b_1$, $tmp_2 \leftarrow tmp_2 + b_2$ \\
17. $w_2 \leftarrow tmp_1 \cdot tmp_2$ \\
\\
Continued on the next page.\\
\hline
\end{tabular}
\end{center}
\end{small}
\caption{Algorithm mp\_toom\_mul}
\end{figure}

\newpage\begin{figure}[!here]
\begin{small}
\begin{center}
\begin{tabular}{l}
\hline Algorithm \textbf{mp\_toom\_mul} (continued). \\
\textbf{Input}. mp\_int $a$ and mp\_int $b$ \\
\textbf{Output}. $c \leftarrow a \cdot b $ \\
\hline \\
Now solve the system of equations.
\\
18. $w_1 \leftarrow w_1 - w_4$, $w_3 \leftarrow w_3 - w_0$ \\
19. $w_1 \leftarrow \lfloor w_1 / 2 \rfloor$, $w_3 \leftarrow \lfloor w_3 / 2 \rfloor$ \\
20. $w_2 \leftarrow w_2 - w_0$, $w_2 \leftarrow w_2 - w_4$ \\
21. $w_1 \leftarrow w_1 - w_2$, $w_3 \leftarrow w_3 - w_2$ \\
22. $tmp_1 \leftarrow 8 \cdot w_0$, $w_1 \leftarrow w_1 - tmp_1$, $tmp_1 \leftarrow 8 \cdot w_4$, $w_3 \leftarrow w_3 - tmp_1$ \\
23. $w_2 \leftarrow 3 \cdot w_2$, $w_2 \leftarrow w_2 - w_1$, $w_2 \leftarrow w_2 - w_3$ \\
24. $w_1 \leftarrow w_1 - w_2$, $w_3 \leftarrow w_3 - w_2$ \\
25. $w_1 \leftarrow \lfloor w_1 / 3 \rfloor, w_3 \leftarrow \lfloor w_3 / 3 \rfloor$ \\
\\
Now substitute $\beta^k$ for $x$ by shifting $w_0, w_1, ..., w_4$. \\
26. for $n$ from $1$ to $4$ do \\
\hspace{3mm}26.1 $w_n \leftarrow w_n \cdot \beta^{nk}$ \\
27. $c \leftarrow w_0 + w_1$, $c \leftarrow c + w_2$, $c \leftarrow c + w_3$, $c \leftarrow c + w_4$ \\
28. Return(\textit{MP\_OKAY}) \\
\hline
\end{tabular}
\end{center}
\end{small}
\caption{Algorithm mp\_toom\_mul (continued)}
\end{figure}

\textbf{Algorithm mp\_toom\_mul.}
This algorithm computes the product of two mp\_int variables $a$ and $b$ using the Toom-Cook approach.
Compared to the Karatsuba multiplication, this
algorithm has a lower asymptotic running time of approximately $O(n^{1.464})$ but at an obvious cost in overhead. In this
description, several statements have been compounded to save space. The intention is that the statements are executed from left to right across
any given step.

The two inputs $a$ and $b$ are first split into three $k$-digit integers $a_0, a_1, a_2$ and $b_0, b_1, b_2$ respectively. From these smaller
integers the coefficients of the polynomial basis representations $f(x)$ and $g(x)$ are known and can be used to find the relations required.

The first two relations $w_0$ and $w_4$ are the points $\zeta_{0}$ and $\zeta_{\infty}$ respectively. The relations $w_1, w_2$ and $w_3$ correspond
to the points $16 \cdot \zeta_{1 \over 2}, \zeta_{1}$ and $\zeta_{2}$ respectively. These are found using logical shifts to independently find
$f(y)$ and $g(y)$ which significantly speeds up the algorithm.

After the five relations $w_0, w_1, \ldots, w_4$ have been computed, the system they represent must be solved in order for the unknown coefficients
$w_1, w_2$ and $w_3$ to be isolated. The steps 18 through 25 perform the system reduction required as previously described. Each step of
the reduction represents the comparable matrix operation that would be performed had the reduction been worked by hand. For example, step 18 indicates
that row $4$ must be subtracted from row $1$ and simultaneously row $0$ subtracted from row $3$.
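
As a worked check of the reduction (a sketch added for illustration, not part of the original derivation), let $c_0, c_1, \ldots, c_4$ denote the true
coefficients of $W(x)$. This notation is introduced here only to distinguish the coefficients from the working variables $w_0, w_1, \ldots, w_4$.
Assuming step 18 removes $w_4$ from $w_1$ and $w_0$ from $w_3$, and starting from $w_0 = c_0$, $w_4 = c_4$,
$w_1 = c_4 + 2c_3 + 4c_2 + 8c_1 + 16c_0$, $w_2 = c_4 + c_3 + c_2 + c_1 + c_0$ and $w_3 = 16c_4 + 8c_3 + 4c_2 + 2c_1 + c_0$,
the working variables should hold the following values.

\begin{center}
\begin{tabular}{|l|l|l|}
\hline \textbf{After steps} & \textbf{$w_1$ holds} & \textbf{$w_3$ holds} \\
\hline 18 and 19 & $c_3 + 2c_2 + 4c_1 + 8c_0$ & $8c_4 + 4c_3 + 2c_2 + c_1$ \\
\hline 21 & $c_2 + 3c_1 + 8c_0$ & $8c_4 + 3c_3 + c_2$ \\
\hline 22 & $c_2 + 3c_1$ & $3c_3 + c_2$ \\
\hline 24 & $3c_1$ & $3c_3$ \\
\hline 25 & $c_1$ & $c_3$ \\
\hline
\end{tabular}
\end{center}

In the same trace $w_2$ holds $c_3 + c_2 + c_1$ after step 20, and step 23 turns it into $3(c_3 + c_2 + c_1) - (c_2 + 3c_1) - (3c_3 + c_2) = c_2$.
Under these assumptions every division by two or three in the reduction is exact.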

Once the coefficients have been isolated, the polynomial $W(x) = \sum_{i=0}^{2n} w_i x^i$ is known. By substituting $\beta^{k}$ for $x$, the integer
result $a \cdot b$ is produced.

EXAM,bn_mp_toom_mul.c

The first obvious thing to note is that this algorithm is complicated. The complexity is worth it if you are multiplying very
large numbers. For example, a 10,000 digit multiplication takes approximately 99,282,205 fewer single precision multiplications with
Toom--Cook than a Comba or baseline approach (this is a savings of more than 99$\%$). For most ``crypto'' sized numbers this
algorithm is not practical as Karatsuba has a much lower cutoff point.

First we split $a$ and $b$ into three roughly equal portions. This has been accomplished (lines @40,mod@ to @69,rshd@) with
combinations of mp\_rshd() and mp\_mod\_2d() function calls. At this point $a = a2 \cdot \beta^{2k} + a1 \cdot \beta^{k} + a0$ and similarly
for $b$.

Next we compute the five points $w0, w1, w2, w3$ and $w4$. Recall that $w0$ and $w4$ can be computed directly from the portions so
we get those out of the way first (lines @72,mul@ and @77,mul@). Next we compute $w1, w2$ and $w3$ using Horner's method.

After this point we solve for the actual values of $w1, w2$ and $w3$ by reducing the $5 \times 5$ system which is relatively
straightforward.
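
The evaluation and reduction order can be tried out on machine integers before reading the multiple precision source. The following is a minimal
sketch only, not the LibTomMath routine: the function name \texttt{toom3\_small} and the parameter \texttt{B} (standing in for $\beta^k$) are
hypothetical, the inputs are assumed to be non-negative and smaller than $B^3$, and step 18 is assumed to subtract $w_4$ from $w_1$.

\begin{verbatim}
```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: multiply a and b (both in [0, B^3)) with the
 * Toom-Cook 3-way evaluation points and reduction order, using plain
 * 64-bit arithmetic. B plays the role of beta^k. */
static int64_t toom3_small(int64_t a, int64_t b, int64_t B)
{
    assert(B > 1 && a >= 0 && b >= 0 && a < B * B * B && b < B * B * B);

    /* split each input into three pieces */
    int64_t a0 = a % B, a1 = (a / B) % B, a2 = a / (B * B);
    int64_t b0 = b % B, b1 = (b / B) % B, b2 = b / (B * B);

    /* the five relations */
    int64_t w0 = a0 * b0;                                     /* zeta_0        */
    int64_t w4 = a2 * b2;                                     /* zeta_inf      */
    int64_t w1 = (4 * a0 + 2 * a1 + a2) * (4 * b0 + 2 * b1 + b2); /* 16*zeta_1/2 */
    int64_t w3 = (4 * a2 + 2 * a1 + a0) * (4 * b2 + 2 * b1 + b0); /* zeta_2      */
    int64_t w2 = (a0 + a1 + a2) * (b0 + b1 + b2);             /* zeta_1        */

    /* system reduction */
    w1 -= w4;          w3 -= w0;
    w1 /= 2;           w3 /= 2;
    w2 -= w0;          w2 -= w4;
    w1 -= w2;          w3 -= w2;
    w1 -= 8 * w0;      w3 -= 8 * w4;
    w2 = 3 * w2 - w1 - w3;
    w1 -= w2;          w3 -= w2;
    w1 /= 3;           w3 /= 3;

    /* substitute B for x, here with Horner's method */
    return w0 + B * (w1 + B * (w2 + B * (w3 + B * w4)));
}
```
\end{verbatim}

With $B = 1000$ the inputs may be as large as $10^9 - 1$ and the product still fits in a signed 64-bit integer. A real implementation replaces
every operation above with the corresponding mp\_int routine and must check each call for failure.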

\subsection{Signed Multiplication}
Now that algorithms to handle multiplications of every useful dimension have been developed, a rather simple finishing touch is required. So far all
of the multiplication algorithms have been unsigned multiplications which leaves only a signed multiplication algorithm to be established.

\begin{figure}[!here]
\begin{small}
\begin{center}
\begin{tabular}{l}
\hline Algorithm \textbf{mp\_mul}. \\
\textbf{Input}. mp\_int $a$ and mp\_int $b$ \\
\textbf{Output}. $c \leftarrow a \cdot b$ \\
\hline \\
1. If $a.sign = b.sign$ then \\
\hspace{3mm}1.1 $sign = MP\_ZPOS$ \\
2. else \\
\hspace{3mm}2.1 $sign = MP\_ZNEG$ \\
3. If min$(a.used, b.used) \ge TOOM\_MUL\_CUTOFF$ then \\
\hspace{3mm}3.1 $c \leftarrow a \cdot b$ using algorithm mp\_toom\_mul \\
4. else if min$(a.used, b.used) \ge KARATSUBA\_MUL\_CUTOFF$ then \\
\hspace{3mm}4.1 $c \leftarrow a \cdot b$ using algorithm mp\_karatsuba\_mul \\
5. else \\
\hspace{3mm}5.1 $digs \leftarrow a.used + b.used + 1$ \\
\hspace{3mm}5.2 If $digs < MP\_ARRAY$ and min$(a.used, b.used) \le \delta$ then \\
\hspace{6mm}5.2.1 $c \leftarrow a \cdot b \mbox{ (mod }\beta^{digs}\mbox{)}$ using algorithm fast\_s\_mp\_mul\_digs.
\\
\hspace{3mm}5.3 else \\
\hspace{6mm}5.3.1 $c \leftarrow a \cdot b \mbox{ (mod }\beta^{digs}\mbox{)}$ using algorithm s\_mp\_mul\_digs. \\
6. $c.sign \leftarrow sign$ \\
7. Return the result of the unsigned multiplication performed. \\
\hline
\end{tabular}
\end{center}
\end{small}
\caption{Algorithm mp\_mul}
\end{figure}

\textbf{Algorithm mp\_mul.}
This algorithm performs the signed multiplication of two inputs. It will make use of any of the three unsigned multiplication algorithms
available when the input is of appropriate size. The \textbf{sign} of the result is not set until the end of the algorithm since algorithm
s\_mp\_mul\_digs will clear it.

EXAM,bn_mp_mul.c

The implementation is rather simplistic and is not particularly noteworthy. Line @22,?@ computes the sign of the result using the ``?''
operator from the C programming language. Line @37,<<@ computes $\delta$ using the fact that $1 << k$ is equal to $2^k$.

\section{Squaring}
\label{sec:basesquare}

Squaring is a special case of multiplication where both multiplicands are equal. At first it may seem like there is no significant optimization
available but in fact there is. Consider the multiplication of $576$ against $241$.
In total there will be nine single precision multiplications
performed which are $1\cdot 6$, $1 \cdot 7$, $1 \cdot 5$, $4 \cdot 6$, $4 \cdot 7$, $4 \cdot 5$, $2 \cdot 6$, $2 \cdot 7$ and $2 \cdot 5$. Now consider
the multiplication of $123$ against $123$. The nine products are $3 \cdot 3$, $3 \cdot 2$, $3 \cdot 1$, $2 \cdot 3$, $2 \cdot 2$, $2 \cdot 1$,
$1 \cdot 3$, $1 \cdot 2$ and $1 \cdot 1$. On closer inspection some of the products are equivalent. For example, $3 \cdot 2 = 2 \cdot 3$
and $3 \cdot 1 = 1 \cdot 3$.

For any $n$-digit input, there are ${{\left (n^2 + n \right)}\over 2}$ possible unique single precision multiplications required compared to the $n^2$
required for multiplication. The following diagram gives an example of the operations required.

\begin{figure}[here]
\begin{center}
\begin{tabular}{ccccc|c}
&&1&2&3&\\
$\times$ &&1&2&3&\\
\hline && $3 \cdot 1$ & $3 \cdot 2$ & $3 \cdot 3$ & Row 0\\
 & $2 \cdot 1$ & $2 \cdot 2$ & $2 \cdot 3$ && Row 1 \\
 $1 \cdot 1$ & $1 \cdot 2$ & $1 \cdot 3$ &&& Row 2 \\
\end{tabular}
\end{center}
\caption{Squaring Optimization Diagram}
\end{figure}

MARK,SQUARE
Starting from zero and numbering the columns from right to left a very simple pattern becomes obvious. For the purposes of this discussion let $x$
represent the number being squared.
The first observation is that in row $k$ the $2k$'th column of the product has a $\left (x_k \right)^2$ term in it.

The second observation is that every column $j$ in row $k$ where $j \ne 2k$ is part of a double product. Every non-square term of a column will
appear twice hence the name ``double product''. Every odd column is made up entirely of double products. In fact every column is made up of double
products and at most one square (\textit{see the exercise section}).

The third and final observation is that for row $k$ the first unique non-square term, that is, one that hasn't already appeared in an earlier row,
occurs at column $2k + 1$. For example, on row $1$ of the previous squaring, column one is part of the double product with column one from row zero.
Column two of row one is a square and column three is the first unique column.

\subsection{The Baseline Squaring Algorithm}
The baseline squaring algorithm is meant to be a catch-all squaring algorithm. It will handle any of the input sizes that the faster routines
will not handle.

\begin{figure}[!here]
\begin{small}
\begin{center}
\begin{tabular}{l}
\hline Algorithm \textbf{s\_mp\_sqr}. \\
\textbf{Input}. mp\_int $a$ \\
\textbf{Output}. $b \leftarrow a^2$ \\
\hline \\
1. Init a temporary mp\_int of at least $2 \cdot a.used +1$ digits. (\textit{mp\_init\_size}) \\
2.
If step 1 failed return(\textit{MP\_MEM}) \\
3. $t.used \leftarrow 2 \cdot a.used + 1$ \\
4. For $ix$ from 0 to $a.used - 1$ do \\
\hspace{3mm}Calculate the square. \\
\hspace{3mm}4.1 $\hat r \leftarrow t_{2ix} + \left (a_{ix} \right )^2$ \\
\hspace{3mm}4.2 $t_{2ix} \leftarrow \hat r \mbox{ (mod }\beta\mbox{)}$ \\
\hspace{3mm}Calculate the double products after the square. \\
\hspace{3mm}4.3 $u \leftarrow \lfloor \hat r / \beta \rfloor$ \\
\hspace{3mm}4.4 For $iy$ from $ix + 1$ to $a.used - 1$ do \\
\hspace{6mm}4.4.1 $\hat r \leftarrow 2 \cdot a_{ix}a_{iy} + t_{ix + iy} + u$ \\
\hspace{6mm}4.4.2 $t_{ix + iy} \leftarrow \hat r \mbox{ (mod }\beta\mbox{)}$ \\
\hspace{6mm}4.4.3 $u \leftarrow \lfloor \hat r / \beta \rfloor$ \\
\hspace{3mm}Set the last carry. \\
\hspace{3mm}4.5 While $u > 0$ do \\
\hspace{6mm}4.5.1 $iy \leftarrow iy + 1$ \\
\hspace{6mm}4.5.2 $\hat r \leftarrow t_{ix + iy} + u$ \\
\hspace{6mm}4.5.3 $t_{ix + iy} \leftarrow \hat r \mbox{ (mod }\beta\mbox{)}$ \\
\hspace{6mm}4.5.4 $u \leftarrow \lfloor \hat r / \beta \rfloor$ \\
5. Clamp excess digits of $t$. (\textit{mp\_clamp}) \\
6. Exchange $b$ and $t$. \\
7. Clear $t$ (\textit{mp\_clear}) \\
8.
Return(\textit{MP\_OKAY}) \\
\hline
\end{tabular}
\end{center}
\end{small}
\caption{Algorithm s\_mp\_sqr}
\end{figure}

\textbf{Algorithm s\_mp\_sqr.}
This algorithm computes the square of an input using the three observations on squaring. It is based fairly faithfully on algorithm 14.16 of HAC
\cite[pp.596-597]{HAC}. Similar to algorithm s\_mp\_mul\_digs, a temporary mp\_int is allocated to hold the result of the squaring. This allows the
destination mp\_int to be the same as the source mp\_int.

The outer loop of this algorithm begins on step 4. It is best to think of the outer loop as walking down the rows of the partial results, while
the inner loop computes the columns of the partial result. Steps 4.1 and 4.2 compute the square term for each row, and steps 4.3 and 4.4 propagate
the carry and compute the double products.

The requirement that an mp\_word be able to represent the range $0 \le x < 2 \beta^2$ arises from this
very algorithm. The product $a_{ix}a_{iy}$ will lie in the range $0 \le x \le \beta^2 - 2\beta + 1$ which is obviously less than $\beta^2$, meaning that
when it is multiplied by two, it can be properly represented by an mp\_word.

Similar to algorithm s\_mp\_mul\_digs, after every pass of the inner loop, the destination is correctly set to the sum of all of the partial
results calculated so far. 
This involves expensive carry propagation which will be eliminated in the next algorithm.

EXAM,bn_s_mp_sqr.c

Inside the outer loop (line @32,for@) the square term is calculated on line @35,r =@. The carry (line @42,>>@) has been
extracted from the mp\_word accumulator using a right shift. Aliases for $a_{ix}$ and $t_{ix+iy}$ are initialized
(lines @45,tmpx@ and @48,tmpt@) to simplify the inner loop. The doubling is performed using two
additions (line @57,r + r@) since it is usually at least as fast as shifting, if not faster.

The important observation is that the inner loop does not begin at $iy = 0$ like for multiplication. As such the inner loops
get progressively shorter as the algorithm proceeds. This is what leads to the savings compared to using a multiplication to
square a number.

\subsection{Faster Squaring by the ``Comba'' Method}
A major drawback to the baseline method is the requirement for single precision shifting inside the $O(n^2)$ nested loop. Squaring has an additional
drawback that it must double the product inside the inner loop as well. As for multiplication, the Comba technique can be used to eliminate these
performance hazards.

The first obvious solution is to make an array of mp\_words which will hold all of the columns. This will indeed eliminate all of the carry
propagation operations from the inner loop. However, the inner product must still be doubled $O(n^2)$ times. 
The solution stems from the simple fact
that $2a + 2b + 2c = 2(a + b + c)$. That is, the sum of all of the double products is equal to double the sum of all the products. For example,
$ab + ba + ac + ca = 2ab + 2ac = 2(ab + ac)$.

However, we cannot simply double all of the columns, since the squares appear only once per row. The most practical solution is to have two
mp\_word arrays. One array will hold the squares and the other array will hold the double products. With both arrays the doubling and
carry propagation can be moved to an $O(n)$ work level outside the $O(n^2)$ level. In this case, we have an even simpler solution in mind.

\newpage\begin{figure}[!here]
\begin{small}
\begin{center}
\begin{tabular}{l}
\hline Algorithm \textbf{fast\_s\_mp\_sqr}. \\
\textbf{Input}. mp\_int $a$ \\
\textbf{Output}. $b \leftarrow a^2$ \\
\hline \\
Place an array of \textbf{MP\_WARRAY} mp\_digits named $W$ on the stack. \\
1. If $b.alloc < 2a.used + 1$ then grow $b$ to $2a.used + 1$ digits. (\textit{mp\_grow}). \\
2. If step 1 failed return(\textit{MP\_MEM}). \\
\\
3. $pa \leftarrow 2 \cdot a.used$ \\
4. $\hat W1 \leftarrow 0$ \\
5. 
for $ix$ from $0$ to $pa - 1$ do \\
\hspace{3mm}5.1 $\_ \hat W \leftarrow 0$ \\
\hspace{3mm}5.2 $ty \leftarrow \mbox{MIN}(a.used - 1, ix)$ \\
\hspace{3mm}5.3 $tx \leftarrow ix - ty$ \\
\hspace{3mm}5.4 $iy \leftarrow \mbox{MIN}(a.used - tx, ty + 1)$ \\
\hspace{3mm}5.5 $iy \leftarrow \mbox{MIN}(iy, \lfloor \left (ty - tx + 1 \right )/2 \rfloor)$ \\
\hspace{3mm}5.6 for $iz$ from $0$ to $iy - 1$ do \\
\hspace{6mm}5.6.1 $\_ \hat W \leftarrow \_ \hat W + a_{tx + iz}a_{ty - iz}$ \\
\hspace{3mm}5.7 $\_ \hat W \leftarrow 2 \cdot \_ \hat W + \hat W1$ \\
\hspace{3mm}5.8 if $ix$ is even then \\
\hspace{6mm}5.8.1 $\_ \hat W \leftarrow \_ \hat W + \left ( a_{\lfloor ix/2 \rfloor}\right )^2$ \\
\hspace{3mm}5.9 $W_{ix} \leftarrow \_ \hat W (\mbox{mod }\beta)$ \\
\hspace{3mm}5.10 $\hat W1 \leftarrow \lfloor \_ \hat W / \beta \rfloor$ \\
\\
6. $oldused \leftarrow b.used$ \\
7. $b.used \leftarrow 2 \cdot a.used$ \\
8. for $ix$ from $0$ to $pa - 1$ do \\
\hspace{3mm}8.1 $b_{ix} \leftarrow W_{ix}$ \\
9. for $ix$ from $pa$ to $oldused - 1$ do \\
\hspace{3mm}9.1 $b_{ix} \leftarrow 0$ \\
10. Clamp excess digits from $b$. (\textit{mp\_clamp}) \\
11. Return(\textit{MP\_OKAY}). 
\\
\hline
\end{tabular}
\end{center}
\end{small}
\caption{Algorithm fast\_s\_mp\_sqr}
\end{figure}

\textbf{Algorithm fast\_s\_mp\_sqr.}
This algorithm computes the square of an input using the Comba technique. It is designed to be a replacement for algorithm
s\_mp\_sqr when the number of input digits is less than \textbf{MP\_WARRAY} and less than $\delta \over 2$.
This algorithm is very similar to the Comba multiplier, with a few key differences we shall make note of.

First, we have separate accumulator and carry variables, $\_ \hat W$ and $\hat W1$ respectively. This is because the inner loop
products are to be doubled. If we had added the previous carry in we would be doubling too much. Next we perform an
additional MIN condition on $iy$ (step 5.5) to prevent overlapping digits. For example, $a_3 \cdot a_5$ is equal to
$a_5 \cdot a_3$. In the multiplication case both products would be computed, since $5 < a.used$ and $3 \ge 0$ are maintained. Since we double the sum
of the products just outside the inner loop, we have to avoid computing both. This is also a good thing since we perform
fewer multiplications and the routine ends up being faster.

Finally the last difference is the addition of the ``square'' term outside the inner loop (step 5.8). We add in the square
only to even outputs and it is the square of the term at the $\lfloor ix / 2 \rfloor$ position.
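
As a small illustration of the column loop of fast\_s\_mp\_sqr (a sketch worked out by hand, assuming a three digit input
$a = (a_2, a_1, a_0)$ so that $pa = 6$, and leaving the carry $\hat W1$ out of the sums), the values of $ty$, $tx$ and the final $iy$ for each
column are as follows.

\begin{center}
\begin{tabular}{|c|c|c|c|l|}
\hline $ix$ & $ty$ & $tx$ & $iy$ & Column sum before the carry is added \\
\hline 0 & 0 & 0 & 0 & $\left (a_0 \right )^2$ \\
\hline 1 & 1 & 0 & 1 & $2 \cdot a_0a_1$ \\
\hline 2 & 2 & 0 & 1 & $2 \cdot a_0a_2 + \left (a_1 \right )^2$ \\
\hline 3 & 2 & 1 & 1 & $2 \cdot a_1a_2$ \\
\hline 4 & 2 & 2 & 0 & $\left (a_2 \right )^2$ \\
\hline 5 & 2 & 3 & 0 & $0$ \\
\hline
\end{tabular}
\end{center}

In this example each double product is multiplied once and doubled outside the inner loop, so three multiplications and three squarings
are performed in place of the nine single precision multiplications of a general three digit product. Column five only receives the carry
out of column four.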

EXAM,bn_fast_s_mp_sqr.c

This implementation is essentially a copy of Comba multiplication with the appropriate changes added to make it faster for
the special case of squaring.

\subsection{Polynomial Basis Squaring}
The same algorithm that performs optimal polynomial basis multiplication can be used to perform polynomial basis squaring. The minor exception
is that $\zeta_y = f(y)g(y)$ is actually equivalent to $\zeta_y = f(y)^2$ since $f(y) = g(y)$. Instead of performing $2n + 1$
multiplications to find the $\zeta$ relations, $2n + 1$ squaring operations are performed.

\subsection{Karatsuba Squaring}
Let $f(x) = ax + b$ represent the polynomial basis representation of a number to square.
Let $h(x) = \left ( f(x) \right )^2$ represent the square of the polynomial. The Karatsuba equation can be modified to square a
number with the following equation.

\begin{equation}
h(x) = a^2x^2 + \left ((a + b)^2 - (a^2 + b^2) \right )x + b^2
\end{equation}

Upon closer inspection this equation only requires the calculation of three half-sized squares: $a^2$, $b^2$ and $(a + b)^2$. As in
Karatsuba multiplication, this algorithm can be applied recursively on the input and will achieve an asymptotic running time of
$O \left ( n^{lg(3)} \right )$.
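
As a quick numerical check of the Karatsuba squaring equation (an illustrative example with assumed values, not taken from the library), let
$x = 100$, $a = 12$ and $b = 34$ so that $f(x) = 1234$. The three squares are $a^2 = 144$, $b^2 = 1156$ and $(a + b)^2 = 46^2 = 2116$.

\begin{center}
\begin{tabular}{rcl}
$(a + b)^2 - (a^2 + b^2)$ & $=$ & $2116 - 1300 = 816 = 2 \cdot 12 \cdot 34$ \\
$h(100)$ & $=$ & $144 \cdot 100^2 + 816 \cdot 100 + 1156$ \\
         & $=$ & $1440000 + 81600 + 1156 = 1522756 = 1234^2$ \\
\end{tabular}
\end{center}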

If the asymptotic times of Karatsuba squaring and multiplication are the same, why not simply use the multiplication algorithm
instead? The answer to this arises from the cutoff point for squaring. As in multiplication there exists a cutoff point, at which the
time required for a Comba based squaring and a Karatsuba based squaring meet. Due to the overhead inherent in the Karatsuba method, the cutoff
point is fairly high. For example, on an AMD Athlon XP processor with $\beta = 2^{28}$, the cutoff point is around 127 digits.

Consider squaring a 200 digit number with this technique. It will be split into two 100 digit halves which are subsequently squared.
The 100 digit halves will not be squared using Karatsuba, but instead using the faster Comba based squaring algorithm. If Karatsuba multiplication
were used instead, the 100 digit numbers would be squared with a slower Comba based multiplication.

\newpage\begin{figure}[!here]
\begin{small}
\begin{center}
\begin{tabular}{l}
\hline Algorithm \textbf{mp\_karatsuba\_sqr}. \\
\textbf{Input}. mp\_int $a$ \\
\textbf{Output}. $b \leftarrow a^2$ \\
\hline \\
1. Initialize the following temporary mp\_ints: $x0$, $x1$, $t1$, $t2$, $x0x0$ and $x1x1$. \\
2. If any of the initializations on step 1 failed return(\textit{MP\_MEM}). \\
\\
Split the input. e.g. $a = x1\beta^B + x0$ \\
3. 
$B \leftarrow \lfloor a.used / 2 \rfloor$ \\
4. $x0 \leftarrow a \mbox{ (mod }\beta^B\mbox{)}$ (\textit{mp\_mod\_2d}) \\
5. $x1 \leftarrow \lfloor a / \beta^B \rfloor$ (\textit{mp\_rshd}) \\
\\
Calculate the three squares. \\
6. $x0x0 \leftarrow x0^2$ (\textit{mp\_sqr}) \\
7. $x1x1 \leftarrow x1^2$ \\
8. $t1 \leftarrow x1 + x0$ (\textit{s\_mp\_add}) \\
9. $t1 \leftarrow t1^2$ \\
\\
Compute the middle term. \\
10. $t2 \leftarrow x0x0 + x1x1$ (\textit{s\_mp\_add}) \\
11. $t1 \leftarrow t1 - t2$ \\
\\
Compute final product. \\
12. $t1 \leftarrow t1\beta^B$ (\textit{mp\_lshd}) \\
13. $x1x1 \leftarrow x1x1\beta^{2B}$ \\
14. $t1 \leftarrow t1 + x0x0$ \\
15. $b \leftarrow t1 + x1x1$ \\
16. Return(\textit{MP\_OKAY}). \\
\hline
\end{tabular}
\end{center}
\end{small}
\caption{Algorithm mp\_karatsuba\_sqr}
\end{figure}

\textbf{Algorithm mp\_karatsuba\_sqr.}
This algorithm computes the square of an input $a$ using the Karatsuba technique. This algorithm is very similar to the Karatsuba based
multiplication algorithm with the exception that the three half-size multiplications have been replaced with three half-size squarings.

The radix point for squaring is simply placed exactly in the middle of the digits when the input has an even number of digits, otherwise it is
placed just below the middle. Steps 3, 4 and 5 compute the two halves required using $B$
as the radix point. The first two squares in steps 6 and 7 are rather straightforward while the last square is of a more compact form.

By expanding $\left (x1 + x0 \right )^2$, the $x1^2$ and $x0^2$ terms in the middle disappear, that is $(x1 + x0)^2 - (x1^2 + x0^2) = 2 \cdot x0 \cdot x1$.
Now if $5n$ single precision additions and a squaring of $n$-digits are faster than multiplying two $n$-digit numbers and doubling then
this method is faster. Assuming no further recursions occur, the difference can be estimated with the following inequality.

Let $p$ represent the cost of a single precision addition and $q$ the cost of a single precision multiplication both in terms of time\footnote{Or
machine clock cycles.}.

\begin{equation}
5pn +{{q(n^2 + n)} \over 2} \le pn + qn^2
\end{equation}

For example, on an AMD Athlon XP processor $p = {1 \over 3}$ and $q = 6$. This implies that the following inequality should hold.
\begin{center}
\begin{tabular}{rcl}
${5n \over 3} + 3n^2 + 3n$ & $<$ & ${n \over 3} + 6n^2$ \\
${5 \over 3} + 3n + 3$ & $<$ & ${1 \over 3} + 6n$ \\
${13 \over 9}$ & $<$ & $n$ \\
\end{tabular}
\end{center}

This results in a cutoff point around $n = 2$. As a consequence it is actually faster to compute the middle term the ``long way'' on processors
where multiplication is substantially slower\footnote{On the Athlon there is a 1:17 ratio between clock cycles for addition and multiplication. On
the Intel P4 processor this ratio is 1:29 making this method even more beneficial. The only common exception is the ARMv4 processor which has a
ratio of 1:7. } than simpler operations such as addition.

EXAM,bn_mp_karatsuba_sqr.c

This implementation is largely based on the implementation of algorithm mp\_karatsuba\_mul. It uses the same inline style to copy and
shift the input into the two halves. The loop from line @54,{@ to line @70,}@ has been modified since only one input exists. The \textbf{used}
count of both $x0$ and $x1$ is fixed up and $x0$ is clamped before the calculations begin. At this point $x1$ and $x0$ are valid equivalents
to the respective halves as if mp\_rshd and mp\_mod\_2d had been used.

By inlining the copy and shift operations the cutoff point for Karatsuba multiplication can be lowered. 
On the Athlon the cutoff point
is exactly at the point where Comba squaring can no longer be used (\textit{128 digits}). On slower processors such as the Intel P4
it is actually below the Comba limit (\textit{at 110 digits}).

This routine uses the same error trap coding style as mp\_karatsuba\_mul. As the temporary variables are initialized, errors are
redirected to the error trap higher up. If the algorithm completes without error the error code is set to \textbf{MP\_OKAY} and
mp\_clears are executed normally.

\subsection{Toom-Cook Squaring}
The Toom-Cook squaring algorithm mp\_toom\_sqr is heavily based on the algorithm mp\_toom\_mul with the exception that squarings are used
instead of multiplication to find the five relations. The reader is encouraged to read the description of the latter algorithm and try to
derive their own Toom-Cook squaring algorithm.

\subsection{High Level Squaring}
\newpage\begin{figure}[!here]
\begin{small}
\begin{center}
\begin{tabular}{l}
\hline Algorithm \textbf{mp\_sqr}. \\
\textbf{Input}. mp\_int $a$ \\
\textbf{Output}. $b \leftarrow a^2$ \\
\hline \\
1. If $a.used \ge TOOM\_SQR\_CUTOFF$ then \\
\hspace{3mm}1.1 $b \leftarrow a^2$ using algorithm mp\_toom\_sqr \\
2. 
else if $a.used \ge KARATSUBA\_SQR\_CUTOFF$ then \\
\hspace{3mm}2.1 $b \leftarrow a^2$ using algorithm mp\_karatsuba\_sqr \\
3. else \\
\hspace{3mm}3.1 $digs \leftarrow 2 \cdot a.used + 1$ \\
\hspace{3mm}3.2 If $digs < MP\_WARRAY$ and $a.used \le \delta$ then \\
\hspace{6mm}3.2.1 $b \leftarrow a^2$ using algorithm fast\_s\_mp\_sqr. \\
\hspace{3mm}3.3 else \\
\hspace{6mm}3.3.1 $b \leftarrow a^2$ using algorithm s\_mp\_sqr. \\
4. $b.sign \leftarrow MP\_ZPOS$ \\
5. Return the result of the unsigned squaring performed. \\
\hline
\end{tabular}
\end{center}
\end{small}
\caption{Algorithm mp\_sqr}
\end{figure}

\textbf{Algorithm mp\_sqr.}
This algorithm computes the square of the input using one of four different algorithms. If the input is very large and has at least
\textbf{TOOM\_SQR\_CUTOFF} or \textbf{KARATSUBA\_SQR\_CUTOFF} digits then either the Toom-Cook or the Karatsuba Squaring algorithm is used. If
neither of the polynomial basis algorithms should be used then either the Comba or baseline algorithm is used.

EXAM,bn_mp_sqr.c

\section*{Exercises}
\begin{tabular}{cl}
$\left [ 3 \right ] $ & Devise an efficient algorithm for selection of the radix point to handle inputs \\
                      & that have different number of digits in Karatsuba multiplication. 
\\
                      & \\
$\left [ 2 \right ] $ & In ~SQUARE~ the fact that every column of a squaring is made up \\
                      & of double products and at most one square is stated. Prove this statement. \\
                      & \\
$\left [ 3 \right ] $ & Prove the equation for Karatsuba squaring. \\
                      & \\
$\left [ 1 \right ] $ & Prove that Karatsuba squaring requires $O \left (n^{lg(3)} \right )$ time. \\
                      & \\
$\left [ 2 \right ] $ & Determine the minimal ratio between addition and multiplication clock cycles \\
                      & required for equation $6.7$ to be true. \\
                      & \\
$\left [ 3 \right ] $ & Implement a threaded version of Comba multiplication (and squaring) where you \\
                      & compute subsets of the columns in each thread. Determine a cutoff point where \\
                      & it is effective and add the logic to mp\_mul() and mp\_sqr(). \\
                      &\\
$\left [ 4 \right ] $ & Same as the previous but also modify the Karatsuba and Toom-Cook. You must \\
                      & increase the throughput of mp\_exptmod() for random odd moduli in the range \\
                      & $512 \ldots 4096$ bits significantly ($> 2x$) to complete this challenge. 
\\
                      & \\
\end{tabular}

\chapter{Modular Reduction}
MARK,REDUCTION
\section{Basics of Modular Reduction}
\index{modular residue}
Modular reduction is an operation that arises quite often within public key cryptography algorithms and various number theoretic algorithms,
such as factoring. Modular reduction algorithms are the third class of algorithms of the ``multipliers'' set. A number $a$ is said to be \textit{reduced}
modulo another number $b$ by finding the remainder of the division $a/b$. Full integer division with remainder is a topic to be covered
in~\ref{sec:division}.

Modular reduction is equivalent to solving for $r$ in the equation $a = bq + r$ where $q = \lfloor a/b \rfloor$. The result
$r$ is said to be ``congruent to $a$ modulo $b$'' which is also written as $r \equiv a \mbox{ (mod }b\mbox{)}$. In other vernacular $r$ is known as the
``modular residue'' which leads to ``quadratic residue''\footnote{That's fancy talk for $b \equiv a^2 \mbox{ (mod }p\mbox{)}$.} and
other forms of residues.

Modular reductions are normally used to create either finite groups, rings or fields. The most common usage for performance driven modular reductions
is in modular exponentiation algorithms. That is, to compute $d = a^b \mbox{ (mod }c\mbox{)}$ as fast as possible. This operation is used in the
RSA and Diffie-Hellman public key algorithms, for example. 
Modular multiplication and squaring also appear as fundamental operations in
elliptic curve cryptographic algorithms. As will be discussed in the subsequent chapter there exist fast algorithms for computing modular
exponentiations without having to perform (\textit{in this example}) $b - 1$ multiplications. These algorithms will produce partial results in the
range $0 \le x < c^2$ which can be taken advantage of to create several efficient algorithms. They have also been used to create redundancy check
algorithms known as CRCs, error correction codes such as Reed-Solomon and solve a variety of number theoretic problems.

\section{The Barrett Reduction}
The Barrett reduction algorithm \cite{BARRETT} was inspired by fast division algorithms which multiply by the reciprocal to emulate
division. Barrett's observation was that the residue $c$ of $a$ modulo $b$ is equal to

\begin{equation}
c = a - b \cdot \lfloor a/b \rfloor
\end{equation}

Since algorithms such as modular exponentiation would be using the same modulus extensively, typical DSP\footnote{It is worth noting that Barrett's paper
targeted the DSP56K processor.} intuition would indicate the next step would be to replace $a/b$ by a multiplication by the reciprocal. However,
DSP intuition on its own will not work as these numbers are considerably larger than the precision of common DSP floating point data types.
It would take another common optimization to optimize the algorithm.

\subsection{Fixed Point Arithmetic}
The trick used to optimize the above equation is based on a technique of emulating floating point data types with fixed precision integers. Fixed
point arithmetic became very popular as it greatly optimized the ``3d-shooter'' genre of games in the mid 1990s when floating point units were
fairly slow if not unavailable. The idea behind fixed point arithmetic is to take a normal $k$-bit integer data type and break it into a $p$-bit
integer part and a $q$-bit fraction part (\textit{where $p+q = k$}).

In this system a $k$-bit integer $n$ would actually represent $n/2^q$. For example, with $q = 4$ the integer $n = 37$ would actually represent the
value $2.3125$. To multiply two fixed point numbers the integers are multiplied using traditional arithmetic and subsequently normalized by
moving the implied decimal point back to where it should be. For example, with $q = 4$ to multiply the integers $9$ and $5$ they must be converted
to fixed point first by multiplying by $2^q$. Let $a = 9(2^q)$ represent the fixed point representation of $9$ and $b = 5(2^q)$ represent the
fixed point representation of $5$. The product $ab$ is equal to $45(2^{2q})$ which when normalized by dividing by $2^q$ produces $45(2^q)$.

This technique became popular since a normal integer multiplication and logical shift right are the only required operations to perform a multiplication
of two fixed point numbers. Using fixed point arithmetic, division can be easily approximated by multiplying by the reciprocal. 
If $2^q$ is
equivalent to one then $2^q/b$ is equivalent to the fixed point approximation of $1/b$ using real arithmetic. Using this fact, dividing an integer
$a$ by another integer $b$ can be achieved with the following expression.

\begin{equation}
\lfloor a / b \rfloor \mbox{ }\approx\mbox{ } \lfloor (a \cdot \lfloor 2^q / b \rfloor)/2^q \rfloor
\end{equation}

The precision of the division is proportional to the value of $q$. If the divisor $b$ is used frequently, as is the case with
modular exponentiation, pre-computing $2^q/b$ will allow a division to be performed with a multiplication and a right shift. Both operations
are considerably faster than division on most processors.

Consider dividing $19$ by $5$. The correct result is $\lfloor 19/5 \rfloor = 3$. With $q = 3$ the reciprocal is $\lfloor 2^q/5 \rfloor = 1$ which
leads to a product of $19$ which when divided by $2^q$ produces $2$. However, with $q = 4$ the reciprocal is $\lfloor 2^q/5 \rfloor = 3$ and
the result of the emulated division is $\lfloor 3 \cdot 19 / 2^q \rfloor = 3$ which is correct. The value of $2^q$ must be close to or ideally
larger than the dividend. In effect if $a$ is the dividend then $q$ should allow $0 \le \lfloor a/2^q \rfloor \le 1$ in order for this approach
to work correctly. Plugging this form of division into the original equation the following modular residue equation arises.

\begin{equation}
c = a - b \cdot \lfloor (a \cdot \lfloor 2^q / b \rfloor)/2^q \rfloor
\end{equation}

Using the notation from \cite{BARRETT} the value of $\lfloor 2^q / b \rfloor$ will be represented by the $\mu$ symbol. Using the $\mu$
variable also helps reinforce the idea that it is meant to be computed once and re-used.

\begin{equation}
c = a - b \cdot \lfloor (a \cdot \mu)/2^q \rfloor
\end{equation}

Provided that $2^q \ge a$ this algorithm will produce a quotient that is either exactly correct or off by a value of one. In the context of Barrett
reduction the value of $a$ is bounded by $0 \le a \le (b - 1)^2$ meaning that $2^q \ge b^2$ is sufficient to ensure the reciprocal will have enough
precision.

Let $n$ represent the number of digits in $b$. This algorithm requires approximately $2n^2$ single precision multiplications to produce the quotient and
another $n^2$ single precision multiplications to find the residue. In total $3n^2$ single precision multiplications are required to
reduce the number.

For example, if $b = 1179677$ and $q = 41$ ($2^q > b^2$), then the reciprocal $\mu$ is equal to $\lfloor 2^q / b \rfloor = 1864089$. Consider reducing
$a = 180388626447$ modulo $b$ using the above reduction equation. The quotient using the new formula is $\lfloor (a \cdot \mu) / 2^q \rfloor = 152913$.
By subtracting $152913b$ from $a$ the correct residue $a \equiv 677346 \mbox{ (mod }b\mbox{)}$ is found.

\subsection{Choosing a Radix Point}
Using the fixed point representation a modular reduction can be performed with $3n^2$ single precision multiplications. If that were the best
that could be achieved a full division\footnote{A division requires approximately $O(2cn^2)$ single precision multiplications for a small value of $c$.
See~\ref{sec:division} for further details.} might as well be used in its place. The key to optimizing the reduction is to reduce the precision of
the initial multiplication that finds the quotient.

Let $a$ represent the number of which the residue is sought. Let $b$ represent the modulus used to find the residue. Let $m$ represent
the number of digits in $b$. For the purposes of this discussion we will assume that the number of digits in $a$ is $2m$, which is generally true if
two $m$-digit numbers have been multiplied. Dividing $a$ by $b$ is the same as dividing a $2m$ digit integer by a $m$ digit integer. Digits below the
$m - 1$'th digit of $a$ will contribute at most a value of $1$ to the quotient because $\beta^k \le b$ for any $0 \le k \le m - 1$. Another way to
express this is by re-writing $a$ as two parts. If $a' \equiv a \mbox{ (mod }\beta^{m-1}\mbox{)}$ with $0 \le a' < \beta^{m-1}$ and $a'' = a - a'$ then
${a \over b} = {{a' + a''} \over b}$ which is equal to ${a' \over b} + {a'' \over b}$. Since $a'$ is bound to be less than $b$ the quotient
is bound by $0 \le {a' \over b} < 1$.

Since the digits of $a'$ do not contribute much to the quotient the observation is that they might as well be zero. However, if the digits
``might as well be zero'' they might as well not be there in the first place. Let $q_0 = \lfloor a/\beta^{m-1} \rfloor$ represent the input
with the irrelevant digits trimmed. Now the modular reduction is trimmed to the almost equivalent equation

\begin{equation}
c = a - b \cdot \lfloor (q_0 \cdot \mu) / \beta^{m+1} \rfloor
\end{equation}

Note that the original divisor $2^q$ has been replaced with $\beta^{m+1}$ where in this case $q$ is a multiple of $lg(\beta)$. Also note that the
exponent on the divisor ($m + 1$) added to the number of digits $q_0$ was shifted by ($m - 1$) equals $2m$. If the optimization had not been performed the divisor
would have the exponent $2m$ so in the end the exponents do ``add up''. Using the above equation the quotient
$\lfloor (q_0 \cdot \mu) / \beta^{m+1} \rfloor$ can be off from the true quotient by at most two. The original fixed point quotient can be off
by as much as one (\textit{provided the radix point is chosen suitably}) and now that the lower irrelevant digits have been trimmed the quotient
can be off by an additional value of one for a total of at most two. This implies that
$0 \le a - b \cdot \lfloor (q_0 \cdot \mu) / \beta^{m+1} \rfloor < 3b$. By first subtracting $b$ times the quotient and then conditionally subtracting
$b$ once or twice the residue is found.
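
The trimmed reduction equation can be sketched with ordinary Python integers standing in for multiple precision values. This is only an illustration under stated assumptions: the names \textit{digit\_count} and \textit{barrett\_trimmed} are hypothetical and do not appear in the library, the radix defaults to $\beta = 10$ for readability, and $\mu$ is recomputed on every call although in practice it would be computed once per modulus.

```python
def digit_count(v, beta):
    """Number of base-beta digits in the positive integer v (hypothetical helper)."""
    count = 0
    while v > 0:
        v //= beta
        count += 1
    return count


def barrett_trimmed(a, b, beta=10):
    """Reduce a modulo b, assuming 0 <= a < b*b and b > 1.

    Returns (residue, quotient estimate, number of correcting subtractions).
    """
    m = digit_count(b, beta)
    mu = beta ** (2 * m) // b            # reciprocal; normally precomputed once
    q0 = a // beta ** (m - 1)            # trim the low m-1 digits of a
    q = (q0 * mu) // beta ** (m + 1)     # quotient estimate, never too large
    c = a - b * q                        # 0 <= c < 3b
    fixups = 0
    while c >= b:                        # conditional subtractions of b
        c -= b
        fixups += 1
    return c, q, fixups
```

Calling \textit{barrett\_trimmed(99929878, 9999)} returns the residue $9871$ with the quotient estimate $9993$ and no correcting subtractions.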

The quotient is now found using $(m + 1)(m) = m^2 + m$ single precision multiplications and the residue with an additional $m^2$ single
precision multiplications, ignoring the subtractions required. In total $2m^2 + m$ single precision multiplications are required to find the residue.
This is considerably faster than the original attempt.

For example, let $\beta = 10$ represent the radix of the digits. Let $b = 9999$ represent the modulus which implies $m = 4$. Let $a = 99929878$
represent the value of which the residue is desired. In this case $q = 8$ since $10^7 < 9999^2$ meaning that $\mu = \lfloor \beta^{q}/b \rfloor = 10001$.
With the new observation the multiplicand for the quotient is equal to $q_0 = \lfloor a / \beta^{m - 1} \rfloor = 99929$. The quotient is then
$\lfloor (q_0 \cdot \mu) / \beta^{m+1} \rfloor = 9993$. Subtracting $9993b$ from $a$ yields the correct residue $a \equiv 9871 \mbox{ (mod }b\mbox{)}$.

\subsection{Trimming the Quotient}
So far the reduction algorithm has been optimized from $3m^2$ single precision multiplications down to $2m^2 + m$ single precision multiplications. As
it stands now the algorithm is already fairly fast compared to a full integer division algorithm. However, there is still room for
optimization.

After the first multiplication inside the quotient ($q_0 \cdot \mu$) the value is shifted right by $m + 1$ places effectively nullifying the lower
half of the product.
It would be nice to be able to remove those digits from the product to effectively cut down the number of single precision
multiplications. If the number of digits in the modulus $m$ is far less than $\beta$ a full product is not required for the algorithm to work properly.
In fact the lower $m - 2$ digits will not affect the upper half of the product at all and do not need to be computed.

The value of $\mu$ is a $m$-digit number and $q_0$ is a $m + 1$ digit number. Using a full multiplier $(m + 1)(m) = m^2 + m$ single precision
multiplications would be required. Using a multiplier that will only produce digits at and above the $m - 1$'th digit reduces the number
of single precision multiplications to ${m^2 + m} \over 2$.

\subsection{Trimming the Residue}
After the quotient has been calculated it is used to reduce the input. As previously noted the algorithm is not exact and it can be off by a small
multiple of the modulus, that is $0 \le a - b \cdot \lfloor (q_0 \cdot \mu) / \beta^{m+1} \rfloor < 3b$. If $b$ is $m$ digits then the
result of the reduction equation is a value of at most $m + 1$ digits (\textit{provided $3 < \beta$}) implying that the upper $m - 1$ digits are
implicitly zero.

The next optimization arises from this very fact. Instead of computing $b \cdot \lfloor (q_0 \cdot \mu) / \beta^{m+1} \rfloor$ using a full
$O(m^2)$ multiplication algorithm only the lower $m+1$ digits of the product have to be computed.
Similarly the value of $a$ can
be reduced modulo $\beta^{m+1}$ before the multiple of $b$ is subtracted which simplifies the subtraction as well. A multiplication that produces
only the lower $m+1$ digits requires ${m^2 + 3m - 2} \over 2$ single precision multiplications.

With both optimizations in place the result is the algorithm Barrett proposed. It requires $m^2 + 2m - 1$ single precision multiplications which
is considerably faster than the straightforward $3m^2$ method.

\subsection{The Barrett Algorithm}
\newpage\begin{figure}[!here]
\begin{small}
\begin{center}
\begin{tabular}{l}
\hline Algorithm \textbf{mp\_reduce}. \\
\textbf{Input}. mp\_int $a$, mp\_int $b$ and $\mu = \lfloor \beta^{2m}/b \rfloor, m = \lceil lg_{\beta}(b) \rceil, (0 \le a < b^2, b > 1)$ \\
\textbf{Output}. $a \mbox{ (mod }b\mbox{)}$ \\
\hline \\
Let $m$ represent the number of digits in $b$. \\
1. Make a copy of $a$ and store it in $q$. (\textit{mp\_init\_copy}) \\
2. $q \leftarrow \lfloor q / \beta^{m - 1} \rfloor$ (\textit{mp\_rshd}) \\
\\
Produce the quotient. \\
3. $q \leftarrow q \cdot \mu$ (\textit{note: only produce digits at or above $m-1$}) \\
4. $q \leftarrow \lfloor q / \beta^{m + 1} \rfloor$ \\
\\
Subtract the multiple of modulus from the input. \\
5.
$a \leftarrow a \mbox{ (mod }\beta^{m+1}\mbox{)}$ (\textit{mp\_mod\_2d}) \\
6. $q \leftarrow q \cdot b \mbox{ (mod }\beta^{m+1}\mbox{)}$ (\textit{s\_mp\_mul\_digs}) \\
7. $a \leftarrow a - q$ (\textit{mp\_sub}) \\
\\
Add $\beta^{m+1}$ if a carry occurred. \\
8. If $a < 0$ then (\textit{mp\_cmp\_d}) \\
\hspace{3mm}8.1 $q \leftarrow 1$ (\textit{mp\_set}) \\
\hspace{3mm}8.2 $q \leftarrow q \cdot \beta^{m+1}$ (\textit{mp\_lshd}) \\
\hspace{3mm}8.3 $a \leftarrow a + q$ \\
\\
Now subtract the modulus if the residue is too large (i.e. quotient too small). \\
9. While $a \ge b$ do (\textit{mp\_cmp}) \\
\hspace{3mm}9.1 $a \leftarrow a - b$ \\
10. Clear $q$. \\
11. Return(\textit{MP\_OKAY}) \\
\hline
\end{tabular}
\end{center}
\end{small}
\caption{Algorithm mp\_reduce}
\end{figure}

\textbf{Algorithm mp\_reduce.}
This algorithm will reduce the input $a$ modulo $b$ in place using the Barrett algorithm. It is loosely based on algorithm 14.42 of HAC
\cite[pp. 602]{HAC} which is based on the paper from Paul Barrett \cite{BARRETT}. The algorithm has several restrictions and assumptions which must
be adhered to for the algorithm to work.

First the modulus $b$ is assumed to be positive and greater than one.
If the modulus were less than or equal to one then subtracting
a multiple of it would either accomplish nothing or actually enlarge the input. The input $a$ must be in the range $0 \le a < b^2$ in order
for the quotient to have enough precision. If $a$ is the product of two numbers that were already reduced modulo $b$, this will not be a problem.
Technically the algorithm will still work if $a \ge b^2$ but it will take much longer to finish. The value of $\mu$ is passed as an argument to this
algorithm and is assumed to be calculated and stored before the algorithm is used.

Recall that the multiplication for the quotient on step 3 must only produce digits at or above the $m-1$'th position. An algorithm called
$s\_mp\_mul\_high\_digs$ which has not been presented is used to accomplish this task. The algorithm is based on $s\_mp\_mul\_digs$ except that
instead of stopping at a given level of precision it starts at a given level of precision. This optimal algorithm can only be used if the number
of digits in $b$ is very much smaller than $\beta$.

While it is known that
$a \ge b \cdot \lfloor (q_0 \cdot \mu) / \beta^{m+1} \rfloor$ only the lower $m+1$ digits are being used to compute the residue, so an implied
``borrow'' from the higher digits might leave a negative result. After the multiple of the modulus has been subtracted from $a$ the residue must be
fixed up in case it is negative. The invariant $\beta^{m+1}$ must be added to the residue to make it positive again.
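
The effect of working with only the lower $m+1$ digits, including the fix-up for the implied borrow, can be sketched as follows. This is not the library routine: Python integers replace mp\_int values, the truncated multipliers are modelled by a full product followed by a division or a modular reduction, and the name \textit{barrett\_low\_digits} is hypothetical. The comments map each line onto the steps of algorithm mp\_reduce.

```python
def barrett_low_digits(a, b, mu, m, beta):
    """Follow steps 1-9 of mp_reduce on plain integers.

    Assumes 0 <= a < b*b, b > 1, beta > 3, m = digits in b and
    mu = beta**(2*m) // b.
    """
    q = a // beta ** (m - 1)             # steps 1-2: copy a, shift right m-1 digits
    q = (q * mu) // beta ** (m + 1)      # steps 3-4: quotient estimate
    radix = beta ** (m + 1)
    r = a % radix                        # step 5: keep the low m+1 digits of a
    r -= (q * b) % radix                 # steps 6-7: subtract the low m+1 digits of q*b
    if r < 0:                            # step 8: undo the implied borrow
        r += radix
    while r >= b:                        # step 9: final correction
        r -= b
    return r
```

With $\beta = 10$, $b = 97$ and $\mu = 103$ the input $a = 1000$ exercises the borrow: the low digits give $0 - 970 = -970$, and adding $\beta^{m+1} = 1000$ restores the residue $30$.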

The while loop at step 9 will subtract $b$ until the residue is less than $b$. If the algorithm is performed correctly this step is
performed at most twice, and on average once. However, if $a \ge b^2$ then it will iterate substantially more times than it should.

EXAM,bn_mp_reduce.c

The first multiplication that determines the quotient can be performed by only producing the digits from $m - 1$ and up. This essentially halves
the number of single precision multiplications required. However, the optimization is only safe if $\beta$ is much larger than the number of digits
in the modulus. In the source code this is evaluated on lines @36,if@ to @44,}@ where algorithm s\_mp\_mul\_high\_digs is used when it is
safe to do so.

\subsection{The Barrett Setup Algorithm}
In order to use algorithm mp\_reduce the value of $\mu$ must be calculated in advance. Ideally this value should be computed once and stored for
future use so that the Barrett algorithm can be used without delay.

\newpage\begin{figure}[!here]
\begin{small}
\begin{center}
\begin{tabular}{l}
\hline Algorithm \textbf{mp\_reduce\_setup}. \\
\textbf{Input}. mp\_int $b$ ($b > 1$), where $m$ is the number of digits in $b$ \\
\textbf{Output}. $\mu \leftarrow \lfloor \beta^{2m}/b \rfloor$ \\
\hline \\
1. $\mu \leftarrow 2^{2 \cdot lg(\beta) \cdot m}$ (\textit{mp\_2expt}) \\
2.
$\mu \leftarrow \lfloor \mu / b \rfloor$ (\textit{mp\_div}) \\
3. Return(\textit{MP\_OKAY}) \\
\hline
\end{tabular}
\end{center}
\end{small}
\caption{Algorithm mp\_reduce\_setup}
\end{figure}

\textbf{Algorithm mp\_reduce\_setup.}
This algorithm computes the reciprocal $\mu$ required for Barrett reduction. First $\beta^{2m}$ is calculated as $2^{2 \cdot lg(\beta) \cdot m}$ which
is equivalent and much faster. The final value is computed by taking the integer quotient $\lfloor \mu / b \rfloor$.

EXAM,bn_mp_reduce_setup.c

This simple routine calculates the reciprocal $\mu$ required by Barrett reduction. Note the extended usage of algorithm mp\_div where the variable
which would receive the remainder is passed as NULL. As will be discussed in~\ref{sec:division} the division routine allows both the quotient and the
remainder to be passed as NULL meaning to ignore the value.

\section{The Montgomery Reduction}
Montgomery reduction\footnote{Thanks to Niels Ferguson for his insightful explanation of the algorithm.} \cite{MONT} is by far the most interesting
form of reduction in common use. It computes a modular residue which is not actually equal to the residue of the input but instead equal to a
residue times a constant. However, as perplexing as this may sound the algorithm is relatively simple and very efficient.

Throughout this entire section the variable $n$ will represent the modulus used to form the residue. As will be discussed shortly the value of
$n$ must be odd. The variable $x$ will represent the quantity of which the residue is sought. Similar to the Barrett algorithm the input
is restricted to $0 \le x < n^2$. To begin the description some simple number theory facts must be established.

\textbf{Fact 1.} Adding $n$ to $x$ does not change the residue since in effect it adds one to the quotient $\lfloor x / n \rfloor$. Another way
to explain this is that $n$ is (\textit{or multiples of $n$ are}) congruent to zero modulo $n$. Adding zero will not change the value of the residue.

\textbf{Fact 2.} If $x$ is even then performing a division by two in $\Z$ is congruent to $x \cdot 2^{-1} \mbox{ (mod }n\mbox{)}$. Actually
this is an application of the fact that if $x$ is evenly divisible by any $k \in \Z$ then division in $\Z$ will be congruent to
multiplication by $k^{-1}$ modulo $n$, provided $k$ is invertible modulo $n$.

From these two simple facts the following simple algorithm can be derived.

\newpage\begin{figure}[!here]
\begin{small}
\begin{center}
\begin{tabular}{l}
\hline Algorithm \textbf{Montgomery Reduction}. \\
\textbf{Input}. Integer $x$, $n$ and $k$ \\
\textbf{Output}. $2^{-k}x \mbox{ (mod }n\mbox{)}$ \\
\hline \\
1.
for $t$ from $1$ to $k$ do \\
\hspace{3mm}1.1 If $x$ is odd then \\
\hspace{6mm}1.1.1 $x \leftarrow x + n$ \\
\hspace{3mm}1.2 $x \leftarrow x/2$ \\
2. Return $x$. \\
\hline
\end{tabular}
\end{center}
\end{small}
\caption{Algorithm Montgomery Reduction}
\end{figure}

The algorithm reduces the input one bit at a time using the two congruences stated previously. Inside the loop $n$, which is odd, is
added to $x$ if $x$ is odd. This forces $x$ to be even which allows the division by two in $\Z$ to be congruent to a modular division by two. Since
$x$ is assumed to be initially much larger than $n$ the addition of $n$ will contribute an insignificant magnitude to $x$. Let $r$ represent the
final result of the Montgomery algorithm. If $k > lg(n)$ and $0 \le x < n^2$ then the final result is limited to
$0 \le r < \lfloor x/2^k \rfloor + n$. As a result at most a single subtraction is required to get the residue desired.
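
The bit-at-a-time algorithm translates almost line for line into a short sketch. Python integers stand in for the multiple precision values, the function name \textit{montgomery\_halving} is hypothetical, and the final conditional subtraction is left to the caller.

```python
def montgomery_halving(x, n, k):
    """Return a value congruent to x * 2**(-k) modulo the odd modulus n."""
    if n % 2 == 0:
        raise ValueError("n must be odd")
    for _ in range(k):
        if x % 2 == 1:       # step 1.1: make x even without changing it modulo n
            x += n
        x //= 2              # step 1.2: the division by two is now exact
    return x
```

For $x = 5555$, $n = 257$ and $k = 9$ the sketch returns $178$, and $178 \cdot 2^9 \equiv 158 \equiv 5555 \mbox{ (mod }257\mbox{)}$.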

\begin{figure}[here]
\begin{small}
\begin{center}
\begin{tabular}{|c|l|}
\hline \textbf{Step number ($t$)} & \textbf{Result ($x$)} \\
\hline $1$ & $x + n = 5812$, $x/2 = 2906$ \\
\hline $2$ & $x/2 = 1453$ \\
\hline $3$ & $x + n = 1710$, $x/2 = 855$ \\
\hline $4$ & $x + n = 1112$, $x/2 = 556$ \\
\hline $5$ & $x/2 = 278$ \\
\hline $6$ & $x/2 = 139$ \\
\hline $7$ & $x + n = 396$, $x/2 = 198$ \\
\hline $8$ & $x/2 = 99$ \\
\hline $9$ & $x + n = 356$, $x/2 = 178$ \\
\hline
\end{tabular}
\end{center}
\end{small}
\caption{Example of Montgomery Reduction (I)}
\label{fig:MONT1}
\end{figure}

Consider the example in figure~\ref{fig:MONT1} which reduces $x = 5555$ modulo $n = 257$ when $k = 9$ (note $2^k = 512$ which is larger than $n$). The result of
the algorithm $r = 178$ is congruent to the value of $2^{-9} \cdot 5555 \mbox{ (mod }257\mbox{)}$. When $r$ is multiplied by $2^9$ modulo $257$ the correct residue
$r \equiv 158$ is produced.

Let $k = \lfloor lg(n) \rfloor + 1$ represent the number of bits in $n$. The current algorithm requires $2k^2$ single precision shifts
and $k^2$ single precision additions.
At this rate the algorithm is most certainly slower than Barrett reduction and not terribly useful.
Fortunately there exists an alternative representation of the algorithm.

\begin{figure}[!here]
\begin{small}
\begin{center}
\begin{tabular}{l}
\hline Algorithm \textbf{Montgomery Reduction} (modified I). \\
\textbf{Input}. Integer $x$, $n$ and $k$ ($2^k > n$) \\
\textbf{Output}. $2^{-k}x \mbox{ (mod }n\mbox{)}$ \\
\hline \\
1. for $t$ from $1$ to $k$ do \\
\hspace{3mm}1.1 If the $t$'th bit of $x$ is one then \\
\hspace{6mm}1.1.1 $x \leftarrow x + 2^{t-1}n$ \\
2. Return $x/2^k$. \\
\hline
\end{tabular}
\end{center}
\end{small}
\caption{Algorithm Montgomery Reduction (modified I)}
\end{figure}

This algorithm is equivalent since $2^{t-1}n$ is a multiple of $n$ and the lower $k$ bits of $x$ are zero by step 2. The number of single
precision shifts has now been reduced from $2k^2$ to $k^2 + k$ which is only a small improvement.
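
A sketch of the modified algorithm, again on Python integers, makes the deferred shift explicit. The name \textit{montgomery\_deferred} is hypothetical, and bit positions are counted from zero in the code, so the loop variable runs from $0$ to $k - 1$ and the multiple added for position $t$ is $2^t n$.

```python
def montgomery_deferred(x, n, k):
    """Clear the low k bits of x by adding multiples of odd n, then shift once."""
    for t in range(k):           # t counts bit positions from zero
        if (x >> t) & 1:         # is bit t still set?
            x += n << t          # adding 2**t * n clears it because n is odd
    return x >> k                # the low k bits are zero, so the shift is exact
```

For $x = 5555$, $n = 257$ and $k = 9$ the value of $x$ just before the final shift is $91136$ and the result is $178$.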

\begin{figure}[here]
\begin{small}
\begin{center}
\begin{tabular}{|c|l|r|}
\hline \textbf{Step number ($t$)} & \textbf{Result ($x$)} & \textbf{Result ($x$) in Binary} \\
\hline -- & $5555$ & $1010110110011$ \\
\hline $1$ & $x + 2^{0}n = 5812$ & $1011010110100$ \\
\hline $2$ & $5812$ & $1011010110100$ \\
\hline $3$ & $x + 2^{2}n = 6840$ & $1101010111000$ \\
\hline $4$ & $x + 2^{3}n = 8896$ & $10001011000000$ \\
\hline $5$ & $8896$ & $10001011000000$ \\
\hline $6$ & $8896$ & $10001011000000$ \\
\hline $7$ & $x + 2^{6}n = 25344$ & $110001100000000$ \\
\hline $8$ & $25344$ & $110001100000000$ \\
\hline $9$ & $x + 2^{8}n = 91136$ & $10110010000000000$ \\
\hline -- & $x/2^k = 178$ & \\
\hline
\end{tabular}
\end{center}
\end{small}
\caption{Example of Montgomery Reduction (II)}
\label{fig:MONT2}
\end{figure}

Figure~\ref{fig:MONT2} demonstrates the modified algorithm reducing $x = 5555$ modulo $n = 257$ with $k = 9$.
With this algorithm a single shift right at the end is the only right shift required to reduce the input instead of $k$ right shifts inside the
loop. Note that in the iterations $t = 2, 5, 6$ and $8$ the result $x$ is not changed.
In those iterations the $t$'th bit of $x$ is
zero and the appropriate multiple of $n$ does not need to be added to force the $t$'th bit of the result to zero.

\subsection{Digit Based Montgomery Reduction}
Instead of computing the reduction on a bit-by-bit basis it is actually much faster to compute it on a digit-by-digit basis. Consider the
previous algorithm re-written to compute the Montgomery reduction in this new fashion.

\begin{figure}[!here]
\begin{small}
\begin{center}
\begin{tabular}{l}
\hline Algorithm \textbf{Montgomery Reduction} (modified II). \\
\textbf{Input}. Integer $x$, $n$ and $k$ ($\beta^k > n$) \\
\textbf{Output}. $\beta^{-k}x \mbox{ (mod }n\mbox{)}$ \\
\hline \\
1. for $t$ from $0$ to $k - 1$ do \\
\hspace{3mm}1.1 $x \leftarrow x + \mu n \beta^t$ \\
2. Return $x/\beta^k$. \\
\hline
\end{tabular}
\end{center}
\end{small}
\caption{Algorithm Montgomery Reduction (modified II)}
\end{figure}

The value $\mu n \beta^t$ is a multiple of the modulus $n$ meaning that it will not change the residue. If the $t$'th digit of
the value $\mu n \beta^t$ equals the negative (modulo $\beta$) of the $t$'th digit of $x$ then the addition will result in a zero digit. This
problem breaks down to solving the following congruence.

\begin{center}
\begin{tabular}{rcl}
$x_t + \mu n_0$ & $\equiv$ & $0 \mbox{ (mod }\beta\mbox{)}$ \\
$\mu n_0$ & $\equiv$ & $-x_t \mbox{ (mod }\beta\mbox{)}$ \\
$\mu$ & $\equiv$ & $-x_t/n_0 \mbox{ (mod }\beta\mbox{)}$ \\
\end{tabular}
\end{center}

In each iteration of the loop on step 1 a new value of $\mu$ must be calculated. The value of $-1/n_0 \mbox{ (mod }\beta\mbox{)}$ is used
extensively in this algorithm and should be precomputed. Let $\rho$ represent the negative of the modular inverse of $n_0$ modulo $\beta$.

For example, let $\beta = 10$ represent the radix. Let $n = 17$ represent the modulus which implies $k = 2$ and $\rho \equiv 7$. Let $x = 33$
represent the value to reduce.

\newpage\begin{figure}
\begin{center}
\begin{tabular}{|c|c|c|}
\hline \textbf{Step ($t$)} & \textbf{Value of $x$} & \textbf{Value of $\mu$} \\
\hline -- & $33$ & --\\
\hline $0$ & $33 + \mu n = 50$ & $1$ \\
\hline $1$ & $50 + \mu n \beta = 900$ & $5$ \\
\hline
\end{tabular}
\end{center}
\caption{Example of Montgomery Reduction}
\end{figure}

The final value $900$ is then divided by $\beta^k$ to produce the result $9$.
The first observation is that $9 \nequiv x \mbox{ (mod }n\mbox{)}$
which implies the result is not the modular residue of $x$ modulo $n$. However, recall that the residue is actually multiplied by $\beta^{-k}$ in
the algorithm. To get the true residue the value must be multiplied by $\beta^k$. In this case $\beta^k \equiv 15 \mbox{ (mod }n\mbox{)}$ and
the correct residue is $9 \cdot 15 \equiv 16 \mbox{ (mod }n\mbox{)}$.

\subsection{Baseline Montgomery Reduction}
The baseline Montgomery reduction algorithm will produce the residue for any size input. It is designed to be a catch-all algorithm for
Montgomery reductions.

\newpage\begin{figure}[!here]
\begin{small}
\begin{center}
\begin{tabular}{l}
\hline Algorithm \textbf{mp\_montgomery\_reduce}. \\
\textbf{Input}. mp\_int $x$, mp\_int $n$ and a digit $\rho \equiv -1/n_0 \mbox{ (mod }\beta\mbox{)}$. \\
\hspace{11.5mm}($0 \le x < n^2, n > 1, (n, \beta) = 1, \beta^k > n$) \\
\textbf{Output}. $\beta^{-k}x \mbox{ (mod }n\mbox{)}$ \\
\hline \\
1. $digs \leftarrow 2n.used + 1$ \\
2. If $digs < MP\_ARRAY$ and $n.used < \delta$ then \\
\hspace{3mm}2.1 Use algorithm fast\_mp\_montgomery\_reduce instead. \\
\\
Setup $x$ for the reduction. \\
3. If $x.alloc < digs$ then grow $x$ to $digs$ digits. \\
4.
$x.used \leftarrow digs$ \\
\\
Eliminate the lower $k$ digits. \\
5. For $ix$ from $0$ to $k - 1$ do \\
\hspace{3mm}5.1 $\mu \leftarrow x_{ix} \cdot \rho \mbox{ (mod }\beta\mbox{)}$ \\
\hspace{3mm}5.2 $u \leftarrow 0$ \\
\hspace{3mm}5.3 For $iy$ from $0$ to $k - 1$ do \\
\hspace{6mm}5.3.1 $\hat r \leftarrow \mu n_{iy} + x_{ix + iy} + u$ \\
\hspace{6mm}5.3.2 $x_{ix + iy} \leftarrow \hat r \mbox{ (mod }\beta\mbox{)}$ \\
\hspace{6mm}5.3.3 $u \leftarrow \lfloor \hat r / \beta \rfloor$ \\
\hspace{3mm}5.4 While $u > 0$ do \\
\hspace{6mm}5.4.1 $iy \leftarrow iy + 1$ \\
\hspace{6mm}5.4.2 $x_{ix + iy} \leftarrow x_{ix + iy} + u$ \\
\hspace{6mm}5.4.3 $u \leftarrow \lfloor x_{ix+iy} / \beta \rfloor$ \\
\hspace{6mm}5.4.4 $x_{ix + iy} \leftarrow x_{ix+iy} \mbox{ (mod }\beta\mbox{)}$ \\
\\
Divide by $\beta^k$ and fix up as required. \\
6. $x \leftarrow \lfloor x / \beta^k \rfloor$ \\
7. If $x \ge n$ then \\
\hspace{3mm}7.1 $x \leftarrow x - n$ \\
8. Return(\textit{MP\_OKAY}).
\\
\hline
\end{tabular}
\end{center}
\end{small}
\caption{Algorithm mp\_montgomery\_reduce}
\end{figure}

\textbf{Algorithm mp\_montgomery\_reduce.}
This algorithm reduces the input $x$ modulo $n$ in place using the Montgomery reduction algorithm. The algorithm is loosely based
on algorithm 14.32 of \cite[pp.601]{HAC} except it merges the multiplication of $\mu n \beta^t$ with the addition in the inner loop. The
restrictions on this algorithm are fairly easy to adapt to. First $0 \le x < n^2$ bounds the input to numbers in the same range as
for the Barrett algorithm. Additionally if $n > 1$ and $n$ is odd there will exist a modular inverse $\rho$, which must be calculated in
advance of this algorithm. Finally the variable $k$ is fixed and is a synonym for $n.used$.

Step 2 decides whether a faster Montgomery algorithm can be used. The faster algorithm is based on the Comba technique, meaning that there are limits on
the size of the input. This algorithm is discussed in ~COMBARED~.

Step 5 is the main reduction loop of the algorithm. The value of $\mu$ is calculated once per iteration in the outer loop. The inner loop
calculates $x + \mu n \beta^{ix}$ by multiplying $\mu$ by $n$ and adding the product, shifted by $ix$ digits, to $x$. Both the addition and
multiplication are performed in the same loop to save time and memory. Step 5.4 will handle any additional carries that escape the inner loop.
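
As a minimal worked sketch of step 5, assume purely for illustration a decimal radix $\beta = 10$ (the library itself uses a power of two) with
$n = 17$, $k = 2$ and $x = 123$. Since $n_0 = 7$ and $7 \cdot 7 = 49 \equiv -1 \mbox{ (mod }10\mbox{)}$ the required digit is $\rho = 7$. Under
these assumptions the loops proceed as follows.

\begin{center}
\begin{small}
\begin{tabular}{|l|}
\hline
$ix = 0$: $\mu \leftarrow 3 \cdot 7 \mbox{ (mod }10\mbox{)} = 1$ \\
$iy = 0$: $\hat r \leftarrow 1 \cdot 7 + 3 + 0 = 10$, $x_0 \leftarrow 0$, $u \leftarrow 1$ \\
$iy = 1$: $\hat r \leftarrow 1 \cdot 1 + 2 + 1 = 4$, $x_1 \leftarrow 4$, $u \leftarrow 0$ \\
$x = 140 = 123 + 1 \cdot 17 \cdot 10^0$ \\
\hline
$ix = 1$: $\mu \leftarrow 4 \cdot 7 \mbox{ (mod }10\mbox{)} = 8$ \\
$iy = 0$: $\hat r \leftarrow 8 \cdot 7 + 4 + 0 = 60$, $x_1 \leftarrow 0$, $u \leftarrow 6$ \\
$iy = 1$: $\hat r \leftarrow 8 \cdot 1 + 1 + 6 = 15$, $x_2 \leftarrow 5$, $u \leftarrow 1$ \\
Step 5.4: $x_3 \leftarrow 0 + 1 = 1$, $u \leftarrow 0$ \\
$x = 1500 = 140 + 8 \cdot 17 \cdot 10^1$ \\
\hline
Step 6: $x \leftarrow \lfloor 1500 / 10^2 \rfloor = 15$ \\
\hline
\end{tabular}
\end{small}
\end{center}

The result $15$ is less than $n$ so no final subtraction is needed, and $15 \cdot 10^2 = 1500 \equiv 123 \mbox{ (mod }17\mbox{)}$ confirms that
$15 \equiv \beta^{-k}x \mbox{ (mod }n\mbox{)}$. Note how the carry out of the second inner loop is only absorbed by step 5.4.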

Using a quick inspection this algorithm requires $k$ single precision multiplications for the outer loop and $k^2$ single precision multiplications
in the inner loop. In total it requires $k^2 + k$ single precision multiplications, which compares favourably to Barrett at $k^2 + 2k - 1$ single precision
multiplications.

EXAM,bn_mp_montgomery_reduce.c

This is the baseline implementation of the Montgomery reduction algorithm. Lines @30,digs@ to @35,}@ determine if the Comba based
routine can be used instead. Line @47,mu@ computes the value of $\mu$ for that particular iteration of the outer loop.

The multiplication $\mu n \beta^{ix}$ is performed in one step in the inner loop. The alias $tmpx$ refers to the $ix$'th digit of $x$ and
the alias $tmpn$ refers to the modulus $n$.

\subsection{Faster ``Comba'' Montgomery Reduction}
MARK,COMBARED

The Montgomery reduction requires fewer single precision multiplications than a Barrett reduction, however it is much slower due to the serial
nature of the inner loop. The Barrett reduction algorithm requires two slightly modified multipliers which can be implemented with the Comba
technique. The Montgomery reduction algorithm cannot directly use the Comba technique to any significant advantage since the inner loop calculates
a $k \times 1$ product $k$ times.
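
To put the multiplication counts in perspective, suppose (as an illustrative assumption only) a 1024-bit modulus stored in 28-bit digits, so that
the modulus occupies $k = \lceil 1024 / 28 \rceil = 37$ digits. With $k$ denoting the number of digits in the modulus, the counts quoted for the
baseline Montgomery algorithm and for Barrett evaluate to

\begin{equation}
k^2 + k = 1369 + 37 = 1406
\end{equation}

\begin{equation}
k^2 + 2k - 1 = 1369 + 74 - 1 = 1442
\end{equation}

The difference is only $k - 1 = 36$ multiplications, under three percent of the total, which suggests that the organisation of the inner loop
matters far more to the running time than the raw multiplication count.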

The biggest obstacle is that at the $ix$'th iteration of the outer loop the value of $x_{ix}$ is required to calculate $\mu$. This means the
carries from $0$ to $ix - 1$ must have been propagated upwards to form a valid $ix$'th digit. The solution, as it turns out, is very simple.
Perform a Comba-like multiplication and, inside the outer loop just after the inner loop, fix up the $ix + 1$'th digit by forwarding the carry.

With this change in place the Montgomery reduction algorithm can be performed with a Comba style multiplication loop which substantially increases
the speed of the algorithm.

\newpage\begin{figure}[!here]
\begin{small}
\begin{center}
\begin{tabular}{l}
\hline Algorithm \textbf{fast\_mp\_montgomery\_reduce}. \\
\textbf{Input}. mp\_int $x$, mp\_int $n$ and a digit $\rho \equiv -1/n_0 \mbox{ (mod }\beta\mbox{)}$. \\
\hspace{11.5mm}($0 \le x < n^2, n > 1, (n, \beta) = 1, \beta^k > n$) \\
\textbf{Output}. $\beta^{-k}x \mbox{ (mod }n\mbox{)}$ \\
\hline \\
Place an array of \textbf{MP\_WARRAY} mp\_word variables called $\hat W$ on the stack. \\
1. if $x.alloc < n.used + 1$ then grow $x$ to $n.used + 1$ digits. \\
Copy the digits of $x$ into the array $\hat W$ \\
2. For $ix$ from $0$ to $x.used - 1$ do \\
\hspace{3mm}2.1 $\hat W_{ix} \leftarrow x_{ix}$ \\
3.
For $ix$ from $x.used$ to $2n.used - 1$ do \\
\hspace{3mm}3.1 $\hat W_{ix} \leftarrow 0$ \\
Eliminate the lower $k$ digits. \\
4. for $ix$ from $0$ to $n.used - 1$ do \\
\hspace{3mm}4.1 $\mu \leftarrow \hat W_{ix} \cdot \rho \mbox{ (mod }\beta\mbox{)}$ \\
\hspace{3mm}4.2 For $iy$ from $0$ to $n.used - 1$ do \\
\hspace{6mm}4.2.1 $\hat W_{iy + ix} \leftarrow \hat W_{iy + ix} + \mu \cdot n_{iy}$ \\
\hspace{3mm}4.3 $\hat W_{ix + 1} \leftarrow \hat W_{ix + 1} + \lfloor \hat W_{ix} / \beta \rfloor$ \\
Propagate carries upwards. \\
5. for $ix$ from $n.used$ to $2n.used + 1$ do \\
\hspace{3mm}5.1 $\hat W_{ix + 1} \leftarrow \hat W_{ix + 1} + \lfloor \hat W_{ix} / \beta \rfloor$ \\
Shift right and reduce modulo $\beta$ simultaneously. \\
6. for $ix$ from $0$ to $n.used + 1$ do \\
\hspace{3mm}6.1 $x_{ix} \leftarrow \hat W_{ix + n.used} \mbox{ (mod }\beta\mbox{)}$ \\
Zero excess digits and fixup $x$. \\
7. if $x.used > n.used + 1$ then do \\
\hspace{3mm}7.1 for $ix$ from $n.used + 1$ to $x.used - 1$ do \\
\hspace{6mm}7.1.1 $x_{ix} \leftarrow 0$ \\
8. $x.used \leftarrow n.used + 1$ \\
9. Clamp excessive digits of $x$. \\
10. If $x \ge n$ then \\
\hspace{3mm}10.1 $x \leftarrow x - n$ \\
11. Return(\textit{MP\_OKAY}).
\\
\hline
\end{tabular}
\end{center}
\end{small}
\caption{Algorithm fast\_mp\_montgomery\_reduce}
\end{figure}

\textbf{Algorithm fast\_mp\_montgomery\_reduce.}
This algorithm will compute the Montgomery reduction of $x$ modulo $n$ using the Comba technique. It is on most computer platforms significantly
faster than algorithm mp\_montgomery\_reduce and algorithm mp\_reduce (\textit{Barrett reduction}). The algorithm has the same restrictions
on the input as the baseline reduction algorithm. An additional two restrictions are imposed on this algorithm. The number of digits $k$ in
the modulus $n$ must not violate $MP\_WARRAY > 2k + 1$ and $k < \delta$. When $\beta = 2^{28}$ this algorithm can be used to reduce modulo
a modulus of at most $3,556$ bits in length.

As in the other Comba reduction algorithms there is a $\hat W$ array which stores the columns of the product. It is initially filled with the
contents of $x$ with the excess digits zeroed. The reduction loop is very similar to the baseline loop at heart. The multiplication on step
4.1 can be single precision only since $ab \equiv (a \mbox{ mod }\beta)(b \mbox{ mod }\beta) \mbox{ (mod }\beta\mbox{)}$. Some multipliers such
as those on the ARM processors take a variable length of time to complete depending on the number of bytes of result they must produce. By performing
a single precision multiplication instead half the amount of time is spent.
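
The behaviour of the column array can be traced by hand on a toy input. The following sketch assumes, for illustration only, a decimal radix
$\beta = 10$ with $n = 17$ (so $n.used = 2$), $\rho = 7$ and $x = 123$. None of these values come from the library. The array is listed as
$(\hat W_0, \hat W_1, \hat W_2, \hat W_3)$ and all higher columns remain zero.

\begin{center}
\begin{small}
\begin{tabular}{|l|}
\hline
Steps 2 and 3: $\hat W = (3, 2, 1, 0)$ \\
\hline
$ix = 0$: $\mu \leftarrow 3 \cdot 7 \mbox{ (mod }10\mbox{)} = 1$ \\
Step 4.2: $\hat W_0 \leftarrow 3 + 1 \cdot 7 = 10$, $\hat W_1 \leftarrow 2 + 1 \cdot 1 = 3$ \\
Step 4.3: $\hat W_1 \leftarrow 3 + \lfloor 10 / 10 \rfloor = 4$ \\
\hline
$ix = 1$: $\mu \leftarrow 4 \cdot 7 \mbox{ (mod }10\mbox{)} = 8$ \\
Step 4.2: $\hat W_1 \leftarrow 4 + 8 \cdot 7 = 60$, $\hat W_2 \leftarrow 1 + 8 \cdot 1 = 9$ \\
Step 4.3: $\hat W_2 \leftarrow 9 + \lfloor 60 / 10 \rfloor = 15$ \\
\hline
Step 5: $\hat W_3 \leftarrow 0 + \lfloor 15 / 10 \rfloor = 1$ \\
Step 6: $x_0 \leftarrow 15 \mbox{ (mod }10\mbox{)} = 5$, $x_1 \leftarrow 1 \mbox{ (mod }10\mbox{)} = 1$ \\
\hline
\end{tabular}
\end{small}
\end{center}

The result is $x = 15$, which is already less than $n$. The columns $\hat W_0 = 10$ and $\hat W_1 = 60$ are never reduced modulo $\beta$ since they
are discarded by the shift, and $15 \cdot 10^2 = 1500 \equiv 123 \mbox{ (mod }17\mbox{)}$ confirms that the output is $\beta^{-k}x \mbox{ (mod }n\mbox{)}$.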

Also note that digit $\hat W_{ix}$ must have the carry from the $ix - 1$'th digit propagated upwards in order for this to work. That is what step
4.3 will do. In effect over the $n.used$ iterations of the outer loop the lower $n.used$ columns all have their carries propagated forwards. Note
how the upper bits of those same words are not reduced modulo $\beta$. This is because those values will be discarded shortly and there is no
point.

Step 5 will propagate the remainder of the carries upwards. On step 6 the columns are reduced modulo $\beta$ and shifted simultaneously as they are
stored in the destination $x$.

EXAM,bn_fast_mp_montgomery_reduce.c

The $\hat W$ array is first filled with digits of $x$ on line @49,for@ then the rest of the digits are zeroed on line @54,for@. Both loops share
the same alias variables to make the code easier to read.

The value of $\mu$ is calculated in an interesting fashion. First the value $\hat W_{ix}$ is reduced modulo $\beta$ and cast to a mp\_digit. This
forces the compiler to use a single precision multiplication and prevents any concerns about loss of precision. Line @101,>>@ fixes the carry
for the next iteration of the loop by propagating the carry from $\hat W_{ix}$ to $\hat W_{ix+1}$.

The for loop on line @113,for@ propagates the rest of the carries upwards through the columns.
The for loop on line @126,for@ reduces the columns
modulo $\beta$ and shifts them $k$ places at the same time. The alias $\_ \hat W$ actually refers to the array $\hat W$ starting at the $n.used$'th
digit, that is $\_ \hat W_{t} = \hat W_{n.used + t}$.

\subsection{Montgomery Setup}
To calculate the variable $\rho$ a relatively simple algorithm will be required.

\begin{figure}[!here]
\begin{small}
\begin{center}
\begin{tabular}{l}
\hline Algorithm \textbf{mp\_montgomery\_setup}. \\
\textbf{Input}. mp\_int $n$ ($n > 1$ and $(n, 2) = 1$) \\
\textbf{Output}. $\rho \equiv -1/n_0 \mbox{ (mod }\beta\mbox{)}$ \\
\hline \\
1. $b \leftarrow n_0$ \\
2. If $b$ is even return(\textit{MP\_VAL}) \\
3. $x \leftarrow (((b + 2) \mbox{ AND } 4) << 1) + b$ \\
4. for $k$ from 0 to $\lceil lg(lg(\beta)) \rceil - 2$ do \\
\hspace{3mm}4.1 $x \leftarrow x \cdot (2 - bx)$ \\
5. $\rho \leftarrow \beta - x \mbox{ (mod }\beta\mbox{)}$ \\
6. Return(\textit{MP\_OKAY}).
\\
\hline
\end{tabular}
\end{center}
\end{small}
\caption{Algorithm mp\_montgomery\_setup}
\end{figure}

\textbf{Algorithm mp\_montgomery\_setup.}
This algorithm will calculate the value of $\rho$ required within the Montgomery reduction algorithms. It uses a very interesting trick
to calculate $1/n_0$ when $\beta$ is a power of two.

EXAM,bn_mp_montgomery_setup.c

This source code computes the value of $\rho$ required to perform Montgomery reduction. It has been modified to avoid performing excess
multiplications when $\beta$ is not the default 28-bits.

\section{The Diminished Radix Algorithm}
The Diminished Radix method of modular reduction \cite{DRMET} is a fairly clever technique which can be more efficient than either the Barrett
or Montgomery methods for certain forms of moduli. The technique is based on the following simple congruence.

\begin{equation}
(x \mbox{ mod } n) + k \lfloor x / n \rfloor \equiv x \mbox{ (mod }(n - k)\mbox{)}
\end{equation}

This observation was used in the MMB \cite{MMB} block cipher to create a diffusion primitive. It used the fact that if $n = 2^{31}$ and $k=1$
then an x86 multiplier could produce the 62-bit product and use the ``shrd'' instruction to perform a double-precision right shift.
The proof
of the above equation is very simple. First write $x$ in the product form.

\begin{equation}
x = qn + r
\end{equation}

Now reduce both sides modulo $(n - k)$.

\begin{equation}
x \equiv qk + r \mbox{ (mod }(n-k)\mbox{)}
\end{equation}

The variable $n$ reduces modulo $n - k$ to $k$. By putting $q = \lfloor x/n \rfloor$ and $r = x \mbox{ mod } n$
into the equation the original congruence is reproduced, thus concluding the proof. The following algorithm is based on this observation.

\begin{figure}[!here]
\begin{small}
\begin{center}
\begin{tabular}{l}
\hline Algorithm \textbf{Diminished Radix Reduction}. \\
\textbf{Input}. Integer $x$, $n$, $k$ \\
\textbf{Output}. $x \mbox{ mod } (n - k)$ \\
\hline \\
1. $q \leftarrow \lfloor x / n \rfloor$ \\
2. $q \leftarrow k \cdot q$ \\
3. $x \leftarrow x \mbox{ (mod }n\mbox{)}$ \\
4. $x \leftarrow x + q$ \\
5. If $x \ge (n - k)$ then \\
\hspace{3mm}5.1 $x \leftarrow x - (n - k)$ \\
\hspace{3mm}5.2 Goto step 1. \\
6.
Return $x$ \\
\hline
\end{tabular}
\end{center}
\end{small}
\caption{Algorithm Diminished Radix Reduction}
\label{fig:DR}
\end{figure}

This algorithm will reduce $x$ modulo $n - k$ and return the residue. If $0 \le x < (n - k)^2$ then the algorithm will almost always loop
once or twice and occasionally three times. For simplicity's sake the value of $x$ is bounded by the following simple polynomial.

\begin{equation}
0 \le x < n^2 + k^2 - 2nk
\end{equation}

The true bound is $0 \le x < (n - k - 1)^2$ but this has quite a few more terms. The value of $q$ after step 1 is bounded by the following.

\begin{equation}
q < n - 2k - k^2/n
\end{equation}

Since $k^2$ is going to be considerably smaller than $n$ that term will always be zero. The value of $x$ after step 3 is bounded trivially as
$0 \le x < n$. By step four the sum $x + q$ is bounded by

\begin{equation}
0 \le q + x < (k + 1)n - 2k^2 - 1
\end{equation}

With a second pass $q$ will be loosely bounded by $0 \le q < k^2$ after step 2 while $x$ will still be loosely bounded by $0 \le x < n$ after step 3. After the second pass it is highly unlikely that the
sum in step 4 will exceed $n - k$.
In practice fewer than three passes of the algorithm are required to reduce virtually every input in the
range $0 \le x < (n - k - 1)^2$.

\begin{figure}
\begin{small}
\begin{center}
\begin{tabular}{|l|}
\hline
$x = 123456789, n = 256, k = 3$ \\
\hline $q \leftarrow \lfloor x/n \rfloor = 482253$ \\
$q \leftarrow q*k = 1446759$ \\
$x \leftarrow x \mbox{ mod } n = 21$ \\
$x \leftarrow x + q = 1446780$ \\
$x \leftarrow x - (n - k) = 1446527$ \\
\hline
$q \leftarrow \lfloor x/n \rfloor = 5650$ \\
$q \leftarrow q*k = 16950$ \\
$x \leftarrow x \mbox{ mod } n = 127$ \\
$x \leftarrow x + q = 17077$ \\
$x \leftarrow x - (n - k) = 16824$ \\
\hline
$q \leftarrow \lfloor x/n \rfloor = 65$ \\
$q \leftarrow q*k = 195$ \\
$x \leftarrow x \mbox{ mod } n = 184$ \\
$x \leftarrow x + q = 379$ \\
$x \leftarrow x - (n - k) = 126$ \\
\hline
\end{tabular}
\end{center}
\end{small}
\caption{Example Diminished Radix Reduction}
\label{fig:EXDR}
\end{figure}

Figure~\ref{fig:EXDR} demonstrates the reduction of $x = 123456789$ modulo $n - k = 253$ when $n = 256$ and $k = 3$.
Note that even though $x$
is considerably larger than $(n - k - 1)^2 = 63504$ the algorithm still converges on the modular residue exceedingly fast. In this case only
three passes were required to find the residue $x \equiv 126$.


\subsection{Choice of Moduli}
On the surface this algorithm looks like a very expensive algorithm. It requires a couple of subtractions followed by multiplication and other
modular reductions. The usefulness of this algorithm becomes exceedingly clear when an appropriate modulus is chosen.

Division in general is a very expensive operation to perform. The one exception is when the division is by a power of the radix of representation used.
Division by ten for example is simple for pencil and paper mathematics since it amounts to shifting the decimal place to the right. Similarly division
by two (\textit{or powers of two}) is very simple for binary computers to perform. It would therefore seem logical to choose $n$ of the form $2^p$
which would imply that $\lfloor x / n \rfloor$ is a simple shift of $x$ right $p$ bits.

However, there is one operation related to division by powers of two that is even faster than this. If $n = \beta^p$ then the division may be
performed by moving whole digits to the right $p$ places. In practice division by $\beta^p$ is much faster than division by $2^p$ for any $p$.
Also with the choice of $n = \beta^p$ reducing $x$ modulo $n$ merely requires zeroing the digits above the $p-1$'th digit of $x$.
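
As a small illustration of why $n = \beta^p$ is convenient, assume a decimal radix $\beta = 10$ with $p = 2$ and $k = 3$, so that $n = 100$ and the
modulus is $n - k = 97$. These values are chosen only for the sketch. For $x = 4567$ the quotient and remainder are read directly off the digits
without any real division.

\begin{center}
\begin{small}
\begin{tabular}{|l|}
\hline
$x = 4567, n = 100, k = 3$ \\
\hline
$q \leftarrow \lfloor x/n \rfloor = 45$ (drop the two low digits) \\
$q \leftarrow q*k = 135$ \\
$x \leftarrow x \mbox{ mod } n = 67$ (keep the two low digits) \\
$x \leftarrow x + q = 202$ \\
$x \leftarrow x - (n - k) = 105$ \\
\hline
$q \leftarrow \lfloor x/n \rfloor = 1$ \\
$q \leftarrow q*k = 3$ \\
$x \leftarrow x \mbox{ mod } n = 5$ \\
$x \leftarrow x + q = 8$ \\
\hline
\end{tabular}
\end{small}
\end{center}

Since $8 < 97$ the algorithm stops after the second pass, and indeed $4567 = 47 \cdot 97 + 8$.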

Throughout the next section the term ``restricted modulus'' will refer to a modulus of the form $\beta^p - k$ whereas the term ``unrestricted
modulus'' will refer to a modulus of the form $2^p - k$. The word ``restricted'' in this case refers to the fact that it is based on the
$2^p$ logic except $p$ must be a multiple of $lg(\beta)$.

\subsection{Choice of $k$}
Now that division and reduction (\textit{steps 1 and 3 of figure~\ref{fig:DR}}) have been optimized to simple digit operations the multiplication by $k$
in step 2 is the most expensive operation. Fortunately the choice of $k$ is not terribly limited. For all intents and purposes it might
as well be a single digit. The smaller the value of $k$ is the faster the algorithm will be.

\subsection{Restricted Diminished Radix Reduction}
The restricted Diminished Radix algorithm can quickly reduce an input modulo a modulus of the form $n = \beta^p - k$. This algorithm can reduce
an input $x$ within the range $0 \le x < n^2$ using only a couple of passes of the algorithm demonstrated in figure~\ref{fig:DR}. The implementation
of this algorithm has been optimized to avoid additional overhead associated with a division by $\beta^p$, the multiplication by $k$ or the addition
of $x$ and $q$. The resulting algorithm is very efficient and can lead to substantial improvements over Barrett and Montgomery reduction when modular
exponentiations are performed.
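
The way the division, multiplication and addition can be merged into a single pass over the digits may be sketched on a toy input. Assume, for
illustration only, $\beta = 10$, $p = 2$ and $k = 3$, giving the modulus $n = 97$, and let $x = 9408$. Each digit position $i$ combines the low
digit $x_i$ with $k$ times the high digit $x_{p+i}$ and a running carry $\mu$.

\begin{center}
\begin{small}
\begin{tabular}{|l|}
\hline
Pass 1: $x = 9408$, $\mu \leftarrow 0$ \\
$i = 0$: $\hat r \leftarrow 3 \cdot x_2 + x_0 + \mu = 12 + 8 + 0 = 20$, $x_0 \leftarrow 0$, $\mu \leftarrow 2$ \\
$i = 1$: $\hat r \leftarrow 3 \cdot x_3 + x_1 + \mu = 27 + 0 + 2 = 29$, $x_1 \leftarrow 9$, $\mu \leftarrow 2$ \\
$x_2 \leftarrow \mu = 2$, $x_3 \leftarrow 0$, so $x = 290$ \\
$x \ge n$ so $x \leftarrow 290 - 97 = 193$ \\
\hline
Pass 2: $x = 193$, $\mu \leftarrow 0$ \\
$i = 0$: $\hat r \leftarrow 3 \cdot x_2 + x_0 + \mu = 3 + 3 + 0 = 6$, $x_0 \leftarrow 6$, $\mu \leftarrow 0$ \\
$i = 1$: $\hat r \leftarrow 3 \cdot x_3 + x_1 + \mu = 0 + 9 + 0 = 9$, $x_1 \leftarrow 9$, $\mu \leftarrow 0$ \\
$x_2 \leftarrow \mu = 0$, so $x = 96$ \\
\hline
\end{tabular}
\end{small}
\end{center}

The first pass produces $290 = 8 + 3 \cdot 94$, exactly the value $(x \mbox{ mod } \beta^p) + k \lfloor x / \beta^p \rfloor$, without ever forming
the quotient separately. The final value $96$ is less than $n$ and agrees with $9408 = 96 \cdot 97 + 96$.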

\newpage\begin{figure}[!here]
\begin{small}
\begin{center}
\begin{tabular}{l}
\hline Algorithm \textbf{mp\_dr\_reduce}. \\
\textbf{Input}. mp\_int $x$, $n$ and a mp\_digit $k = \beta - n_0$ \\
\hspace{11.5mm}($0 \le x < n^2$, $n > 1$, $0 < k < \beta$) \\
\textbf{Output}. $x \mbox{ mod } n$ \\
\hline \\
1. $m \leftarrow n.used$ \\
2. If $x.alloc < 2m$ then grow $x$ to $2m$ digits. \\
3. $\mu \leftarrow 0$ \\
4. for $i$ from $0$ to $m - 1$ do \\
\hspace{3mm}4.1 $\hat r \leftarrow k \cdot x_{m+i} + x_{i} + \mu$ \\
\hspace{3mm}4.2 $x_{i} \leftarrow \hat r \mbox{ (mod }\beta\mbox{)}$ \\
\hspace{3mm}4.3 $\mu \leftarrow \lfloor \hat r / \beta \rfloor$ \\
5. $x_{m} \leftarrow \mu$ \\
6. for $i$ from $m + 1$ to $x.used - 1$ do \\
\hspace{3mm}6.1 $x_{i} \leftarrow 0$ \\
7. Clamp excess digits of $x$. \\
8. If $x \ge n$ then \\
\hspace{3mm}8.1 $x \leftarrow x - n$ \\
\hspace{3mm}8.2 Goto step 3. \\
9. Return(\textit{MP\_OKAY}).
\\
\hline
\end{tabular}
\end{center}
\end{small}
\caption{Algorithm mp\_dr\_reduce}
\end{figure}

\textbf{Algorithm mp\_dr\_reduce.}
This algorithm will perform the Diminished Radix reduction of $x$ modulo $n$. It has similar restrictions to that of the Barrett reduction
with the addition that $n$ must be of the form $n = \beta^m - k$ where $0 < k <\beta$.

This algorithm essentially implements the pseudo-code in figure~\ref{fig:DR} except with a slight optimization. The division by $\beta^m$, multiplication by $k$
and addition of $x \mbox{ mod }\beta^m$ are all performed simultaneously inside the loop on step 4. The division by $\beta^m$ is emulated by accessing
the term at the $m+i$'th position which is subsequently multiplied by $k$ and added to the term at the $i$'th position. After the loop the $m$'th
digit is set to the carry and the upper digits are zeroed. Steps 5 and 6 emulate the reduction modulo $\beta^m$ that should have happened to
$x$ before the addition of the multiple of the upper half.

At step 8 if $x$ is still larger than $n$ another pass of the algorithm is required. First $n$ is subtracted from $x$ and then the algorithm resumes
at step 3.

EXAM,bn_mp_dr_reduce.c

The first step is to grow $x$ as required to $2m$ digits since the reduction is performed in place on $x$.
The label on line @49,top:@ is where
the algorithm will resume if further reduction passes are required. In theory it could be placed at the top of the function. However, the size of
the modulus and the question of whether $x$ is large enough are invariant after the first pass, meaning that it would be a waste of time.

The aliases $tmpx1$ and $tmpx2$ refer to the digits of $x$ where the latter is offset by $m$ digits. By reading digits from $x$ offset by $m$ digits
a division by $\beta^m$ can be simulated virtually for free. The loop on line @61,for@ performs the bulk of the work (\textit{corresponds to step 4 of algorithm 7.11})
in this algorithm.

By line @68,mu@ the pointer $tmpx1$ points to the $m$'th digit of $x$ which is where the final carry will be placed. Similarly by line @71,for@ the
same pointer will point to the $m+1$'th digit where the zeroes will be placed.

Since the algorithm is only valid if both $x$ and $n$ are greater than zero an unsigned comparison suffices to determine if another pass is required.
With the same logic at line @82,sub@ the value of $x$ is known to be greater than or equal to $n$ meaning that an unsigned subtraction can be used
as well. Since the destination of the subtraction is the larger of the inputs the call to algorithm s\_mp\_sub cannot fail and the return code
does not need to be checked.

\subsubsection{Setup}
To setup the restricted Diminished Radix algorithm the value $k = \beta - n_0$ is required.
This algorithm is not really complicated but is provided for
completeness.

\begin{figure}[!here]
\begin{small}
\begin{center}
\begin{tabular}{l}
\hline Algorithm \textbf{mp\_dr\_setup}. \\
\textbf{Input}. mp\_int $n$ \\
\textbf{Output}. $k = \beta - n_0$ \\
\hline \\
1. $k \leftarrow \beta - n_0$ \\
\hline
\end{tabular}
\end{center}
\end{small}
\caption{Algorithm mp\_dr\_setup}
\end{figure}

EXAM,bn_mp_dr_setup.c

\subsubsection{Modulus Detection}
Another algorithm which will be useful is the ability to detect a restricted Diminished Radix modulus. An integer is said to be
of restricted Diminished Radix form if all of the digits are equal to $\beta - 1$ except the trailing digit which may be any value.

\begin{figure}[!here]
\begin{small}
\begin{center}
\begin{tabular}{l}
\hline Algorithm \textbf{mp\_dr\_is\_modulus}. \\
\textbf{Input}. mp\_int $n$ \\
\textbf{Output}. $1$ if $n$ is in D.R form, $0$ otherwise \\
\hline
1. If $n.used < 2$ then return($0$). \\
2.
for $ix$ from $1$ to $n.used - 1$ do \\
\hspace{3mm}2.1 If $n_{ix} \ne \beta - 1$ return($0$). \\
3. Return($1$). \\
\hline
\end{tabular}
\end{center}
\end{small}
\caption{Algorithm mp\_dr\_is\_modulus}
\end{figure}

\textbf{Algorithm mp\_dr\_is\_modulus.}
This algorithm determines if a value is in Diminished Radix form. Step 1 rejects obvious cases where fewer than two digits are
in the mp\_int. Step 2 tests all but the first digit to see if they are equal to $\beta - 1$. If the algorithm manages to get to
step 3 then $n$ must be of Diminished Radix form.

EXAM,bn_mp_dr_is_modulus.c

\subsection{Unrestricted Diminished Radix Reduction}
The unrestricted Diminished Radix algorithm allows modular reductions to be performed when the modulus is of the form $2^p - k$. This algorithm
is a straightforward adaptation of algorithm~\ref{fig:DR}.

In general the restricted Diminished Radix reduction algorithm is much faster since it has considerably lower overhead. However, this new
algorithm is much faster than either Montgomery or Barrett reduction when the moduli are of the appropriate form.

\begin{figure}[!here]
\begin{small}
\begin{center}
\begin{tabular}{l}
\hline Algorithm \textbf{mp\_reduce\_2k}.
\\
\textbf{Input}. mp\_int $a$ and $n$. mp\_digit $k$ \\
\hspace{11.5mm}($a \ge 0$, $n > 1$, $0 < k < \beta$, $n + k$ is a power of two) \\
\textbf{Output}. $a \mbox{ (mod }n\mbox{)}$ \\
\hline
1. $p \leftarrow \lceil lg(n) \rceil$ (\textit{mp\_count\_bits}) \\
2. While $a \ge n$ do \\
\hspace{3mm}2.1 $q \leftarrow \lfloor a / 2^p \rfloor$ (\textit{mp\_div\_2d}) \\
\hspace{3mm}2.2 $a \leftarrow a \mbox{ (mod }2^p\mbox{)}$ (\textit{mp\_mod\_2d}) \\
\hspace{3mm}2.3 $q \leftarrow q \cdot k$ (\textit{mp\_mul\_d}) \\
\hspace{3mm}2.4 $a \leftarrow a + q$ (\textit{s\_mp\_add}) \\
\hspace{3mm}2.5 If $a \ge n$ then do \\
\hspace{6mm}2.5.1 $a \leftarrow a - n$ \\
3. Return(\textit{MP\_OKAY}). \\
\hline
\end{tabular}
\end{center}
\end{small}
\caption{Algorithm mp\_reduce\_2k}
\end{figure}

\textbf{Algorithm mp\_reduce\_2k.}
This algorithm quickly reduces an input $a$ modulo an unrestricted Diminished Radix modulus $n$. Since $2^p \equiv k \mbox{ (mod }n\mbox{)}$ the
quotient $q$ is folded back into the remainder after being multiplied by $k$. Division by $2^p$ is emulated with a right
shift which makes the algorithm fairly inexpensive to use.

EXAM,bn_mp_reduce_2k.c

The algorithm mp\_count\_bits calculates the number of bits in an mp\_int which is used to find the initial value of $p$.
The call to mp\_div\_2d
on line @31,mp_div_2d@ calculates both the quotient $q$ and the remainder $a$ required. By doing both in a single function call the code size
is kept fairly small. The multiplication by $k$ is only performed if $k > 1$. This allows reductions modulo $2^p - 1$ to be performed without
any multiplications.

The unsigned s\_mp\_add, mp\_cmp\_mag and s\_mp\_sub are used in place of their full sign counterparts since the inputs are only valid if they are
positive. By using the unsigned versions the overhead is kept to a minimum.

\subsubsection{Unrestricted Setup}
To set up this reduction algorithm the value of $k = 2^p - n$ is required.

\begin{figure}[!here]
\begin{small}
\begin{center}
\begin{tabular}{l}
\hline Algorithm \textbf{mp\_reduce\_2k\_setup}. \\
\textbf{Input}. mp\_int $n$ \\
\textbf{Output}. $k = 2^p - n$ \\
\hline
1. $p \leftarrow \lceil lg(n) \rceil$ (\textit{mp\_count\_bits}) \\
2. $x \leftarrow 2^p$ (\textit{mp\_2expt}) \\
3. $x \leftarrow x - n$ (\textit{mp\_sub}) \\
4. $k \leftarrow x_0$ \\
5. Return(\textit{MP\_OKAY}).
\\
\hline
\end{tabular}
\end{center}
\end{small}
\caption{Algorithm mp\_reduce\_2k\_setup}
\end{figure}

\textbf{Algorithm mp\_reduce\_2k\_setup.}
This algorithm computes the value of $k$ required for the algorithm mp\_reduce\_2k. By making a temporary variable $x$ equal to $2^p$ a subtraction
is sufficient to solve for $k$. Alternatively if $n$ has more than one digit the value of $k$ is simply $\beta - n_0$.

EXAM,bn_mp_reduce_2k_setup.c

\subsubsection{Unrestricted Detection}
An integer $n$ is a valid unrestricted Diminished Radix modulus if either of the following is true.

\begin{enumerate}
\item The number has only one digit.
\item The number has more than one digit and every bit from the $lg(\beta)$'th to the most significant is one.
\end{enumerate}

If either condition is true then there is a power of two $2^p$ such that $0 < 2^p - n < \beta$. If the input is only
one digit then it will always be of the correct form. Otherwise all of the bits above the first digit must be one. This arises from the fact
that there will be a value of $k$ that when added to the modulus causes a carry in the first digit which propagates all the way to the most
significant bit. The resulting sum will be a power of two.
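
The detection rule and the setup step can be sketched on ordinary machine words. This is only an illustration under stated assumptions, not the library code: the digit width \texttt{DIGIT\_BITS} is fixed at eight bits so that $\beta = 2^8$, the modulus is held in a \texttt{uint32\_t} rather than an mp\_int, and the names \texttt{count\_bits}, \texttt{is\_2k} and \texttt{setup\_2k} are hypothetical.

```c
#include <stdint.h>

/* Toy digit width, so beta = 2^DIGIT_BITS. This is an assumption made for
   the sketch; a real digit size depends on the build configuration. */
#define DIGIT_BITS 8

/* Number of bits needed to represent n (0 when n == 0). */
int count_bits(uint32_t n)
{
    int bits = 0;
    while (n != 0) {
        ++bits;
        n >>= 1;
    }
    return bits;
}

/* Returns 1 if n = 2^p - k for some 0 < k < beta, and 0 otherwise. */
int is_2k(uint32_t n)
{
    int p, x;

    if (n == 0) {
        return 0;                 /* no digits in use */
    }
    if (n < (1u << DIGIT_BITS)) {
        return 1;                 /* a single digit always qualifies */
    }
    p = count_bits(n);            /* bits are numbered 0 .. p - 1 */
    for (x = DIGIT_BITS; x < p; x++) {
        if (((n >> x) & 1u) == 0) {
            return 0;             /* a zero bit above the first digit */
        }
    }
    return 1;
}

/* k = 2^p - n, where p is the bit length of n. */
uint32_t setup_2k(uint32_t n)
{
    int p = count_bits(n);
    return (uint32_t)((UINT64_C(1) << p) - n);
}
```

For instance, with this toy digit size the value \texttt{0xFFF1} is accepted and yields $k = 15$, whereas \texttt{0xFEF1} is rejected because bit eight is zero.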

\begin{figure}[!here]
\begin{small}
\begin{center}
\begin{tabular}{l}
\hline Algorithm \textbf{mp\_reduce\_is\_2k}. \\
\textbf{Input}. mp\_int $n$ \\
\textbf{Output}. $1$ if of proper form, $0$ otherwise \\
\hline
1. If $n.used = 0$ then return($0$). \\
2. If $n.used = 1$ then return($1$). \\
3. $p \leftarrow \lceil lg(n) \rceil$ (\textit{mp\_count\_bits}) \\
4. for $x$ from $lg(\beta)$ to $p - 1$ do \\
\hspace{3mm}4.1 If the ($x \mbox{ mod }lg(\beta)$)'th bit of the $\lfloor x / lg(\beta) \rfloor$'th digit of $n$ is zero then return($0$). \\
5. Return($1$). \\
\hline
\end{tabular}
\end{center}
\end{small}
\caption{Algorithm mp\_reduce\_is\_2k}
\end{figure}

\textbf{Algorithm mp\_reduce\_is\_2k.}
This algorithm quickly determines if a modulus is of the form required for algorithm mp\_reduce\_2k to function properly.

EXAM,bn_mp_reduce_is_2k.c



\section{Algorithm Comparison}
So far three very different algorithms for modular reduction have been discussed. Each of the algorithms has its own strengths and weaknesses,
which makes having such a selection very useful.
The following table summarizes the three algorithms along with comparisons of work factors. Since
all three algorithms have the restriction that $0 \le x < n^2$ and $n > 1$ those limitations are not included in the table.

\begin{center}
\begin{small}
\begin{tabular}{|c|c|c|c|c|c|}
\hline \textbf{Method} & \textbf{Work Required} & \textbf{Limitations} & \textbf{$m = 8$} & \textbf{$m = 32$} & \textbf{$m = 64$} \\
\hline Barrett    & $m^2 + 2m - 1$ & None              & $79$ & $1087$ & $4223$ \\
\hline Montgomery & $m^2 + m$      & $n$ must be odd   & $72$ & $1056$ & $4160$ \\
\hline D.R.       & $2m$           & $n = \beta^m - k$ & $16$ & $64$   & $128$  \\
\hline
\end{tabular}
\end{small}
\end{center}

In theory Montgomery and Barrett reductions would require roughly the same amount of time to complete. However, in practice since Montgomery
reduction can be written as a single function with the Comba technique it is much faster. Barrett reduction suffers from the overhead of
calling the half precision multipliers, addition and division by $\beta$ algorithms.

For almost every cryptographic algorithm Montgomery reduction is the algorithm of choice. The one set of algorithms where Diminished Radix reduction truly
shines is those based on the discrete logarithm problem such as Diffie-Hellman \cite{DH} and ElGamal \cite{ELGAMAL}. In these algorithms
primes of the form $\beta^m - k$ can be found and shared amongst users.
These primes will allow the Diminished Radix algorithm to be used in
modular exponentiation to greatly speed up the operation.



\section*{Exercises}
\begin{tabular}{cl}
$\left [ 3 \right ]$ & Prove that the ``trick'' in algorithm mp\_montgomery\_setup actually \\
                     & calculates the correct value of $\rho$. \\
                     & \\
$\left [ 2 \right ]$ & Devise an algorithm to reduce modulo $n + k$ for small $k$ quickly. \\
                     & \\
$\left [ 4 \right ]$ & Prove that the pseudo-code algorithm ``Diminished Radix Reduction'' \\
                     & (\textit{figure~\ref{fig:DR}}) terminates. Also prove the probability that it will \\
                     & terminate within $1 \le k \le 10$ iterations. \\
                     & \\
\end{tabular}


\chapter{Exponentiation}
Exponentiation is the operation of raising one variable to the power of another, for example, $a^b$. A variant of exponentiation, computed
in a finite field or ring, is called modular exponentiation. This latter style of operation is typically used in public key
cryptosystems such as RSA and Diffie-Hellman. The ability to quickly compute modular exponentiations is of great benefit to any
such cryptosystem and many methods have been sought to speed it up.

\section{Exponentiation Basics}
A trivial algorithm would simply multiply $a$ against itself $b - 1$ times to compute the exponentiation desired. However, as $b$ grows in size
the number of multiplications becomes prohibitive. Imagine what would happen if $b$ $\approx$ $2^{1024}$ as is the case when computing an RSA signature
with a $1024$-bit key. Such a calculation could never be completed as it would simply take far too long.

Fortunately there is a very simple algorithm based on the laws of exponents. Recall that $lg_a(a^b) = b$ and that $lg_a(a^ba^c) = b + c$ which
are two trivial relationships between the base and the exponent. Let $b_i$ represent the $i$'th bit of $b$ starting from the least
significant bit. If $b$ is a $k$-bit integer then the following equation is true.

\begin{equation}
a^b = \prod_{i=0}^{k-1} a^{2^i \cdot b_i}
\end{equation}

By taking the base $a$ logarithm of both sides of the equation the following equation is the result.

\begin{equation}
b = \sum_{i=0}^{k-1}2^i \cdot b_i
\end{equation}

The term $a^{2^i}$ can be found from the $i - 1$'th term by squaring the term since $\left ( a^{2^i} \right )^2$ is equal to
$a^{2^{i+1}}$. This observation forms the basis of essentially all fast exponentiation algorithms. It requires $k$ squarings and on average
$k \over 2$ multiplications to compute the result.
This is indeed quite an improvement over simply multiplying by $a$ a total of $b-1$ times.

While this current method is a considerable speed up there are further improvements to be made. For example, the $a^{2^i}$ term does not need to
be computed in an auxiliary variable. Consider the following equivalent algorithm.

\begin{figure}[!here]
\begin{small}
\begin{center}
\begin{tabular}{l}
\hline Algorithm \textbf{Left to Right Exponentiation}. \\
\textbf{Input}. Integer $a$, $b$ and $k$ \\
\textbf{Output}. $c = a^b$ \\
\hline \\
1. $c \leftarrow 1$ \\
2. for $i$ from $k - 1$ to $0$ do \\
\hspace{3mm}2.1 $c \leftarrow c^2$ \\
\hspace{3mm}2.2 $c \leftarrow c \cdot a^{b_i}$ \\
3. Return $c$. \\
\hline
\end{tabular}
\end{center}
\end{small}
\caption{Left to Right Exponentiation}
\label{fig:LTOR}
\end{figure}

This algorithm starts from the most significant bit and works towards the least significant bit. When the $i$'th bit of $b$ is set $a$ is
multiplied against the current product. In each iteration the product is squared which doubles the exponent of the individual terms of the
product.

For example, let $b = 101100_2 \equiv 44_{10}$.
The following chart demonstrates the actions of the algorithm.

\newpage\begin{figure}
\begin{center}
\begin{tabular}{|c|c|}
\hline \textbf{Value of $i$} & \textbf{Value of $c$} \\
\hline - & $1$ \\
\hline $5$ & $a$ \\
\hline $4$ & $a^2$ \\
\hline $3$ & $a^4 \cdot a$ \\
\hline $2$ & $a^8 \cdot a^2 \cdot a$ \\
\hline $1$ & $a^{16} \cdot a^4 \cdot a^2$ \\
\hline $0$ & $a^{32} \cdot a^8 \cdot a^4$ \\
\hline
\end{tabular}
\end{center}
\caption{Example of Left to Right Exponentiation}
\end{figure}

When the product $a^{32} \cdot a^8 \cdot a^4$ is simplified it is equal to $a^{44}$ which is the desired exponentiation. This particular algorithm is
called ``Left to Right'' because it reads the exponent in that order. All of the exponentiation algorithms that will be presented are of this nature.

\subsection{Single Digit Exponentiation}
The first algorithm in the series of exponentiation algorithms will be an unbounded algorithm where the exponent is a single digit. It is intended
to be used when a small power of an input is required (\textit{e.g. $a^5$}). It is faster than simply multiplying $b - 1$ times for all values of
$b$ that are greater than three.
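
As a minimal sketch of the left to right method on which the single digit routine relies, the loop of figure~\ref{fig:LTOR} can be written on 64-bit machine words. This is not the library routine: the name \texttt{ltor\_pow} is hypothetical, and because the arithmetic is carried out on \texttt{uint64\_t} the value returned is $a^b$ reduced modulo $2^{64}$.

```c
#include <stdint.h>

/* Left to right binary exponentiation. Unsigned arithmetic wraps, so the
   result is a^b modulo 2^64. */
uint64_t ltor_pow(uint64_t a, uint64_t b)
{
    uint64_t c = 1;
    int i;

    /* Scan the exponent from the most significant bit downwards. */
    for (i = 63; i >= 0; i--) {
        c = c * c;                /* always square */
        if ((b >> i) & 1u) {
            c = c * a;            /* multiply only when bit i is set */
        }
    }
    return c;
}
```

Leading zero bits of the exponent cost only squarings of $1$, which is why no explicit bit length needs to be passed in this sketch.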

\newpage\begin{figure}[!here]
\begin{small}
\begin{center}
\begin{tabular}{l}
\hline Algorithm \textbf{mp\_expt\_d}. \\
\textbf{Input}. mp\_int $a$ and mp\_digit $b$ \\
\textbf{Output}. $c = a^b$ \\
\hline \\
1. $g \leftarrow a$ (\textit{mp\_init\_copy}) \\
2. $c \leftarrow 1$ (\textit{mp\_set}) \\
3. for $x$ from 1 to $lg(\beta)$ do \\
\hspace{3mm}3.1 $c \leftarrow c^2$ (\textit{mp\_sqr}) \\
\hspace{3mm}3.2 If $b$ AND $2^{lg(\beta) - 1} \ne 0$ then \\
\hspace{6mm}3.2.1 $c \leftarrow c \cdot g$ (\textit{mp\_mul}) \\
\hspace{3mm}3.3 $b \leftarrow b << 1$ \\
4. Clear $g$. \\
5. Return(\textit{MP\_OKAY}). \\
\hline
\end{tabular}
\end{center}
\end{small}
\caption{Algorithm mp\_expt\_d}
\end{figure}

\textbf{Algorithm mp\_expt\_d.}
This algorithm computes the value of $a$ raised to the power of a single digit $b$. It uses the left to right exponentiation algorithm to
quickly compute the exponentiation. It is loosely based on algorithm 14.79 of HAC \cite[pp. 615]{HAC} with the difference that the
exponent has a fixed width.

A copy of $a$ is made first to allow the destination variable $c$ to be the same as the source variable $a$.
The result is set to the initial value of
$1$ in the subsequent step.

Inside the loop the exponent is read from the most significant bit first down to the least significant bit. First $c$ is invariably squared
on step 3.1. In the following step if the most significant bit of $b$ is one the copy of $a$ is multiplied against $c$. The value
of $b$ is shifted left one bit to make the next bit down from the most significant bit the new most significant bit. In effect each
iteration of the loop moves the bits of the exponent $b$ upwards to the most significant location.

EXAM,bn_mp_expt_d.c

Line @29,mp_set@ sets the initial value of the result to $1$. Next the loop on line @31,for@ steps through each bit of the exponent starting from
the most significant down towards the least significant. The invariant squaring operation placed on line @333,mp_sqr@ is performed first. After
the squaring the result $c$ is multiplied by the base $g$ if and only if the most significant bit of the exponent is set. The shift on line
@47,<<@ moves all of the bits of the exponent upwards towards the most significant location.

\section{$k$-ary Exponentiation}
When calculating an exponentiation the most time consuming bottleneck is the multiplications which are in general a small factor
slower than squaring. Recall from the previous algorithm that $b_{i}$ refers to the $i$'th bit of the exponent $b$. Suppose instead it referred to
the $i$'th $k$-bit digit of the exponent $b$.
For $k = 1$ the definitions are synonymous and for $k > 1$ algorithm~\ref{fig:KARY}
computes the same exponentiation. A group of $k$ bits from the exponent is called a \textit{window}. That is, it is a small window on only a
portion of the entire exponent. Consider the following modification to the basic left to right exponentiation algorithm.

\begin{figure}[!here]
\begin{small}
\begin{center}
\begin{tabular}{l}
\hline Algorithm \textbf{$k$-ary Exponentiation}. \\
\textbf{Input}. Integer $a$, $b$, $k$ and $t$ \\
\textbf{Output}. $c = a^b$ \\
\hline \\
1. $c \leftarrow 1$ \\
2. for $i$ from $t - 1$ to $0$ do \\
\hspace{3mm}2.1 $c \leftarrow c^{2^k} $ \\
\hspace{3mm}2.2 Extract the $i$'th $k$-bit word from $b$ and store it in $g$. \\
\hspace{3mm}2.3 $c \leftarrow c \cdot a^g$ \\
3. Return $c$. \\
\hline
\end{tabular}
\end{center}
\end{small}
\caption{$k$-ary Exponentiation}
\label{fig:KARY}
\end{figure}

The squaring on step 2.1 can be calculated by squaring the value $c$ successively $k$ times. If the values of $a^g$ for $0 < g < 2^k$ have been
precomputed this algorithm requires only $t$ multiplications and $tk$ squarings. The table can be generated with $2^{k - 1} - 1$ squarings and
$2^{k - 1} + 1$ multiplications.
This algorithm assumes that the number of bits in the exponent is evenly divisible by $k$.
However, when it is not the remaining $0 < x \le k - 1$ bits can be handled with algorithm~\ref{fig:LTOR}.

Suppose $k = 4$ and $t = 100$. This modified algorithm will require $109$ multiplications and $407$ squarings to compute the exponentiation. The
original algorithm would on average have required $200$ multiplications and $400$ squarings to compute the same value. The total number of squarings
has increased slightly but the number of multiplications has nearly halved.

\subsection{Optimal Values of $k$}
An optimal value of $k$ will minimize $2^{k} + \lceil n / k \rceil + n - 1$ for a fixed number of bits in the exponent $n$. The simplest
approach is to brute force search amongst the values $k = 2, 3, \ldots, 8$ for the lowest result. Table~\ref{fig:OPTK} lists optimal values of $k$
for various exponent sizes and compares the number of multiplications and squarings required against algorithm~\ref{fig:LTOR}.
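
The brute force search can be sketched in a few lines. The function names \texttt{kary\_work} and \texttt{optimal\_k} are hypothetical, and the work estimate is taken exactly as the expression stated above, with $n / k$ rounded up.

```c
/* Work estimate as stated in the text: 2^k + ceil(n / k) + n - 1. */
unsigned long kary_work(unsigned long n, unsigned k)
{
    return (1ul << k) + (n + k - 1) / k + n - 1;
}

/* Brute force search over k = 2, 3, ..., 8 for the smallest estimate.
   On a tie the smaller window size is kept. */
unsigned optimal_k(unsigned long n)
{
    unsigned k, best = 2;

    for (k = 3; k <= 8; k++) {
        if (kary_work(n, k) < kary_work(n, best)) {
            best = k;
        }
    }
    return best;
}
```

Under these assumptions a $128$-bit exponent gives $k = 4$ with an estimate of $175$, and a $1024$-bit exponent gives $k = 6$.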

\begin{figure}[here]
\begin{center}
\begin{small}
\begin{tabular}{|c|c|c|c|}
\hline \textbf{Exponent (bits)} & \textbf{Optimal $k$} & \textbf{Work at $k$} & \textbf{Work with ~\ref{fig:LTOR}} \\
\hline $16$ & $2$ & $27$ & $24$ \\
\hline $32$ & $3$ & $49$ & $48$ \\
\hline $64$ & $3$ & $92$ & $96$ \\
\hline $128$ & $4$ & $175$ & $192$ \\
\hline $256$ & $4$ & $335$ & $384$ \\
\hline $512$ & $5$ & $645$ & $768$ \\
\hline $1024$ & $6$ & $1257$ & $1536$ \\
\hline $2048$ & $6$ & $2452$ & $3072$ \\
\hline $4096$ & $7$ & $4808$ & $6144$ \\
\hline
\end{tabular}
\end{small}
\end{center}
\caption{Optimal Values of $k$ for $k$-ary Exponentiation}
\label{fig:OPTK}
\end{figure}

\subsection{Sliding-Window Exponentiation}
A simple modification to the previous algorithm is to only generate the upper half of the table in the range $2^{k-1} \le g < 2^k$. Essentially
this is a table for all values of $g$ where the most significant bit of $g$ is a one. However, in order for this to be allowed in the
algorithm values of $g$ in the range $0 \le g < 2^{k-1}$ must be avoided.

Table~\ref{fig:OPTK2} lists optimal values of $k$ for various exponent sizes and compares the work required against algorithm~\ref{fig:KARY}.
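
The idea of keeping only the upper half of the table can be sketched on 64-bit machine words as follows. This is an illustration under stated assumptions rather than the library implementation: the name \texttt{sliding\_pow} is hypothetical, the window size is fixed at $k = 3$ through the macro \texttt{WIN}, and the result is $a^b$ reduced modulo $2^{64}$ because the arithmetic is on \texttt{uint64\_t}. Windows always begin at a one bit, single zero bits cost one squaring, and the last few bits fall back to the bit by bit method when fewer than $k$ remain.

```c
#include <stdint.h>

#define WIN 3u   /* window size k; an arbitrary choice for this sketch */

/* Sliding window exponentiation; the result is a^b modulo 2^64. */
uint64_t sliding_pow(uint64_t a, uint64_t b)
{
    uint64_t table[1u << WIN];    /* only indices 2^(k-1) .. 2^k - 1 are used */
    const unsigned half = 1u << (WIN - 1u);
    uint64_t c = 1;
    unsigned g, j;
    int i;

    /* table[2^(k-1)] = a^(2^(k-1)) by k - 1 squarings. */
    table[half] = a;
    for (j = 0; j < WIN - 1u; j++) {
        table[half] *= table[half];
    }
    /* The remaining upper half entries follow by multiplying by a. */
    for (g = half + 1u; g < (1u << WIN); g++) {
        table[g] = table[g - 1u] * a;
    }

    i = 63;
    while (i >= 0) {
        if (((b >> i) & 1u) == 0) {
            c *= c;                               /* zero bit: square only */
            i--;
        } else if (i >= (int)WIN - 1) {
            for (j = 0; j < WIN; j++) {
                c *= c;                           /* c = c^(2^k) */
            }
            g = (unsigned)((b >> (i - ((int)WIN - 1))) & ((1u << WIN) - 1u));
            c *= table[g];                        /* g has its top bit set */
            i -= (int)WIN;
        } else {
            c *= c;                               /* fewer than k bits left */
            c *= a;
            i--;
        }
    }
    return c;
}
```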

\begin{figure}[here]
\begin{center}
\begin{small}
\begin{tabular}{|c|c|c|c|}
\hline \textbf{Exponent (bits)} & \textbf{Optimal $k$} & \textbf{Work at $k$} & \textbf{Work with ~\ref{fig:KARY}} \\
\hline $16$ & $3$ & $24$ & $27$ \\
\hline $32$ & $3$ & $45$ & $49$ \\
\hline $64$ & $4$ & $87$ & $92$ \\
\hline $128$ & $4$ & $167$ & $175$ \\
\hline $256$ & $5$ & $322$ & $335$ \\
\hline $512$ & $6$ & $628$ & $645$ \\
\hline $1024$ & $6$ & $1225$ & $1257$ \\
\hline $2048$ & $7$ & $2403$ & $2452$ \\
\hline $4096$ & $8$ & $4735$ & $4808$ \\
\hline
\end{tabular}
\end{small}
\end{center}
\caption{Optimal Values of $k$ for Sliding Window Exponentiation}
\label{fig:OPTK2}
\end{figure}

\newpage\begin{figure}[!here]
\begin{small}
\begin{center}
\begin{tabular}{l}
\hline Algorithm \textbf{Sliding Window $k$-ary Exponentiation}. \\
\textbf{Input}. Integer $a$, $b$, $k$ and $t$ \\
\textbf{Output}. $c = a^b$ \\
\hline \\
1. $c \leftarrow 1$ \\
2.
for $i$ from $t - 1$ to $0$ do \\
\hspace{3mm}2.1 If the $i$'th bit of $b$ is a zero then \\
\hspace{6mm}2.1.1 $c \leftarrow c^2$ \\
\hspace{3mm}2.2 else do \\
\hspace{6mm}2.2.1 $c \leftarrow c^{2^k}$ \\
\hspace{6mm}2.2.2 Extract the $k$ bits from $(b_{i}b_{i-1}\ldots b_{i-(k-1)})$ and store it in $g$. \\
\hspace{6mm}2.2.3 $c \leftarrow c \cdot a^g$ \\
\hspace{6mm}2.2.4 $i \leftarrow i - k$ \\
3. Return $c$. \\
\hline
\end{tabular}
\end{center}
\end{small}
\caption{Sliding Window $k$-ary Exponentiation}
\end{figure}

Similar to the previous algorithm this algorithm must have a special handler when fewer than $k$ bits are left in the exponent. While this
algorithm requires the same number of squarings it can potentially have fewer multiplications. The pre-computed table $a^g$ is also half
the size of the previous table.

Consider the exponent $b = 111101011001000_2 \equiv 31432_{10}$ with $k = 3$ using both algorithms. The first algorithm will divide the exponent up as
the following five $3$-bit words $b \equiv \left ( 111, 101, 011, 001, 000 \right )_{2}$. The second algorithm will break the
exponent as $b \equiv \left ( 111, 101, 0, 110, 0, 100, 0 \right )_{2}$. The single $0$ digits in the second representation are where
a single squaring took place instead of a squaring and multiplication.
In total the first method requires $10$ multiplications and $18$
squarings. The second method requires $8$ multiplications and $18$ squarings.

In general the sliding window method is never slower than the generic $k$-ary method and often it is slightly faster.

\section{Modular Exponentiation}

Modular exponentiation is essentially computing the power of a base within a finite field or ring. For example, computing
$d \equiv a^b \mbox{ (mod }c\mbox{)}$ is a modular exponentiation. Instead of first computing $a^b$ and then reducing it
modulo $c$ the intermediate result is reduced modulo $c$ after every squaring or multiplication operation.

This guarantees that any intermediate result is bounded by $0 \le d \le c^2 - 2c + 1$ and can be reduced modulo $c$ quickly using
one of the algorithms presented in ~REDUCTION~.

Before the actual modular exponentiation algorithm can be written a wrapper algorithm must be written first. This algorithm
will allow the exponent $b$ to be negative which is computed as $d \equiv \left (1 / a \right )^{\vert b \vert} \mbox{ (mod }c\mbox{)}$. The
value of $(1/a) \mbox{ mod }c$ is computed using the modular inverse (\textit{see \ref{sec;modinv}}). If no inverse exists the algorithm
terminates with an error.

\begin{figure}[!here]
\begin{small}
\begin{center}
\begin{tabular}{l}
\hline Algorithm \textbf{mp\_exptmod}.
\\
\textbf{Input}. mp\_int $g$, $x$ and $p$ \\
\textbf{Output}. $y \equiv g^x \mbox{ (mod }p\mbox{)}$ \\
\hline \\
1. If $p.sign = MP\_NEG$ return(\textit{MP\_VAL}). \\
2. If $x.sign = MP\_NEG$ then \\
\hspace{3mm}2.1 $g' \leftarrow g^{-1} \mbox{ (mod }p\mbox{)}$ \\
\hspace{3mm}2.2 $x' \leftarrow \vert x \vert$ \\
\hspace{3mm}2.3 Compute $y \equiv g'^{x'} \mbox{ (mod }p\mbox{)}$ via recursion. \\
3. if $p$ is odd \textbf{OR} $p$ is a D.R. modulus then \\
\hspace{3mm}3.1 Compute $y \equiv g^{x} \mbox{ (mod }p\mbox{)}$ via algorithm mp\_exptmod\_fast. \\
4. else \\
\hspace{3mm}4.1 Compute $y \equiv g^{x} \mbox{ (mod }p\mbox{)}$ via algorithm s\_mp\_exptmod. \\
\hline
\end{tabular}
\end{center}
\end{small}
\caption{Algorithm mp\_exptmod}
\end{figure}

\textbf{Algorithm mp\_exptmod.}
The first algorithm which actually performs modular exponentiation is algorithm s\_mp\_exptmod. It is a sliding window $k$-ary algorithm
which uses Barrett reduction to reduce the product modulo $p$. The second algorithm mp\_exptmod\_fast performs the same operation
except it uses either Montgomery or Diminished Radix reduction. The two latter reduction algorithms are grouped in the same exponentiation
algorithm since their arguments are essentially the same (\textit{two mp\_ints and one mp\_digit}).
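
Whichever reduction is selected, the core pattern is the same: reduce after every squaring and every multiplication so that intermediate values stay small. A minimal sketch on machine words follows. It is not the library routine: the name \texttt{exptmod\_u32} is hypothetical, the modulus is assumed to be non-zero and to fit in 32 bits so that every product fits in 64 bits, and the C remainder operator stands in for Barrett, Montgomery or Diminished Radix reduction.

```c
#include <stdint.h>

/* Computes g^x mod p for 0 < p < 2^32 with the bit by bit left to right
   method. The % operator is a stand-in for a fast reduction algorithm. */
uint32_t exptmod_u32(uint32_t g, uint32_t x, uint32_t p)
{
    uint64_t y = 1 % p;           /* 1 mod p, which is 0 when p == 1 */
    uint64_t base = g % p;
    int i;

    for (i = 31; i >= 0; i--) {
        y = (y * y) % p;          /* reduce after every squaring */
        if ((x >> i) & 1u) {
            y = (y * base) % p;   /* reduce after every multiplication */
        }
    }
    return (uint32_t)y;
}
```

Since $y < p < 2^{32}$ holds before each step, neither product can overflow the 64-bit intermediate.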

EXAM,bn_mp_exptmod.c

In order to keep the algorithms in a known state the first step on line @29,if@ is to reject any negative modulus as input.  If the exponent is
negative the algorithm tries to perform a modular exponentiation with the modular inverse of the base $G$.  The temporary variable $tmpG$ is assigned
the modular inverse of $G$ and $tmpX$ is assigned the absolute value of $X$.  The algorithm will recurse with these new values with a positive
exponent.

If the exponent is not negative the algorithm resumes the exponentiation.  Line @63,dr_@ determines if the modulus is of the restricted Diminished Radix
form.  If it is not line @65,reduce@ attempts to determine if it is of an unrestricted Diminished Radix form.  The integer $dr$ will take on one
of three values.

\begin{enumerate}
\item $dr = 0$ means that the modulus is not of either restricted or unrestricted Diminished Radix form.
\item $dr = 1$ means that the modulus is of restricted Diminished Radix form.
\item $dr = 2$ means that the modulus is of unrestricted Diminished Radix form.
\end{enumerate}

Line @69,if@ determines if the fast modular exponentiation algorithm can be used.  It is allowed if $dr \ne 0$ or if the modulus is odd.  Otherwise,
the slower s\_mp\_exptmod algorithm is used which uses Barrett reduction.

\subsection{Barrett Modular Exponentiation}

\newpage\begin{figure}[!here]
\begin{small}
\begin{center}
\begin{tabular}{l}
\hline Algorithm \textbf{s\_mp\_exptmod}. \\
\textbf{Input}.   mp\_int $g$, $x$ and $p$ \\
\textbf{Output}.  $y \equiv g^x \mbox{ (mod }p\mbox{)}$ \\
\hline \\
1.  $k \leftarrow lg(x)$ \\
2.  $winsize \leftarrow \left \lbrace \begin{array}{ll}
                              2 &  \mbox{if }k \le 7 \\
                              3 &  \mbox{if }7 < k \le 36 \\
                              4 &  \mbox{if }36 < k \le 140 \\
                              5 &  \mbox{if }140 < k \le 450 \\
                              6 &  \mbox{if }450 < k \le 1303 \\
                              7 &  \mbox{if }1303 < k \le 3529 \\
                              8 &  \mbox{if }3529 < k \\
                              \end{array} \right .$ \\
3.  Initialize $2^{winsize}$ mp\_ints in an array named $M$ and one mp\_int named $\mu$ \\
4.  Calculate the $\mu$ required for Barrett Reduction (\textit{mp\_reduce\_setup}). \\
5.  $M_1 \leftarrow g \mbox{ (mod }p\mbox{)}$ \\
\\
Setup the table of small powers of $g$.  First find $g^{2^{winsize - 1}}$ and then all of the following powers up to $g^{2^{winsize} - 1}$. \\
6.  $k \leftarrow 2^{winsize - 1}$ \\
7.  $M_{k} \leftarrow M_1$ \\
8.
for $ix$ from 0 to $winsize - 2$ do \\
\hspace{3mm}8.1  $M_k \leftarrow \left ( M_k \right )^2$ (\textit{mp\_sqr})  \\
\hspace{3mm}8.2  $M_k \leftarrow M_k \mbox{ (mod }p\mbox{)}$ (\textit{mp\_reduce}) \\
9.  for $ix$ from $2^{winsize - 1} + 1$ to $2^{winsize} - 1$ do \\
\hspace{3mm}9.1  $M_{ix} \leftarrow M_{ix - 1} \cdot M_{1}$ (\textit{mp\_mul}) \\
\hspace{3mm}9.2  $M_{ix} \leftarrow M_{ix} \mbox{ (mod }p\mbox{)}$ (\textit{mp\_reduce}) \\
10.  $res \leftarrow 1$ \\
\\
Start Sliding Window. \\
11.  $mode \leftarrow 0, bitcnt \leftarrow 1, buf \leftarrow 0, digidx \leftarrow x.used - 1, bitcpy \leftarrow 0, bitbuf \leftarrow 0$ \\
12.  Loop \\
\hspace{3mm}12.1  $bitcnt \leftarrow bitcnt - 1$ \\
\hspace{3mm}12.2  If $bitcnt = 0$ then do \\
\hspace{6mm}12.2.1  If $digidx = -1$ goto step 13. \\
\hspace{6mm}12.2.2  $buf \leftarrow x_{digidx}$ \\
\hspace{6mm}12.2.3  $digidx \leftarrow digidx - 1$ \\
\hspace{6mm}12.2.4  $bitcnt \leftarrow lg(\beta)$ \\
Continued on next page. \\
\hline
\end{tabular}
\end{center}
\end{small}
\caption{Algorithm s\_mp\_exptmod}
\end{figure}

\newpage\begin{figure}[!here]
\begin{small}
\begin{center}
\begin{tabular}{l}
\hline Algorithm \textbf{s\_mp\_exptmod} (\textit{continued}).
\\
\textbf{Input}.   mp\_int $g$, $x$ and $p$ \\
\textbf{Output}.  $y \equiv g^x \mbox{ (mod }p\mbox{)}$ \\
\hline \\
\hspace{3mm}12.3  $y \leftarrow (buf >> (lg(\beta) - 1))$ AND $1$ \\
\hspace{3mm}12.4  $buf \leftarrow buf << 1$ \\
\hspace{3mm}12.5  if $mode = 0$ and $y = 0$ then goto step 12. \\
\hspace{3mm}12.6  if $mode = 1$ and $y = 0$ then do \\
\hspace{6mm}12.6.1  $res \leftarrow res^2$ \\
\hspace{6mm}12.6.2  $res \leftarrow res \mbox{ (mod }p\mbox{)}$ \\
\hspace{6mm}12.6.3  Goto step 12. \\
\hspace{3mm}12.7  $bitcpy \leftarrow bitcpy + 1$ \\
\hspace{3mm}12.8  $bitbuf \leftarrow bitbuf + (y << (winsize - bitcpy))$ \\
\hspace{3mm}12.9  $mode \leftarrow 2$ \\
\hspace{3mm}12.10  If $bitcpy = winsize$ then do \\
\hspace{6mm}Window is full so perform the squarings and single multiplication. \\
\hspace{6mm}12.10.1  for $ix$ from $0$ to $winsize -1$ do \\
\hspace{9mm}12.10.1.1  $res \leftarrow res^2$ \\
\hspace{9mm}12.10.1.2  $res \leftarrow res \mbox{ (mod }p\mbox{)}$ \\
\hspace{6mm}12.10.2  $res \leftarrow res \cdot M_{bitbuf}$ \\
\hspace{6mm}12.10.3  $res \leftarrow res \mbox{ (mod }p\mbox{)}$ \\
\hspace{6mm}Reset the window. \\
\hspace{6mm}12.10.4  $bitcpy \leftarrow 0, bitbuf \leftarrow 0, mode \leftarrow 1$ \\
\\
No more windows left.  Check for residual bits of exponent. \\
13.
If $mode = 2$ and $bitcpy > 0$ then do \\
\hspace{3mm}13.1  for $ix$ from $0$ to $bitcpy - 1$ do \\
\hspace{6mm}13.1.1  $res \leftarrow res^2$ \\
\hspace{6mm}13.1.2  $res \leftarrow res \mbox{ (mod }p\mbox{)}$ \\
\hspace{6mm}13.1.3  $bitbuf \leftarrow bitbuf << 1$ \\
\hspace{6mm}13.1.4  If $bitbuf$ AND $2^{winsize} \ne 0$ then do \\
\hspace{9mm}13.1.4.1  $res \leftarrow res \cdot M_{1}$ \\
\hspace{9mm}13.1.4.2  $res \leftarrow res \mbox{ (mod }p\mbox{)}$ \\
14.  $y \leftarrow res$ \\
15.  Clear $res$, $\mu$ and the $M$ array. \\
16.  Return(\textit{MP\_OKAY}). \\
\hline
\end{tabular}
\end{center}
\end{small}
\caption{Algorithm s\_mp\_exptmod (continued)}
\end{figure}

\textbf{Algorithm s\_mp\_exptmod.}
This algorithm computes the $x$'th power of $g$ modulo $p$ and stores the result in $y$.  It takes advantage of the Barrett reduction
algorithm to keep the product small throughout the algorithm.

The first two steps determine the optimal window size based on the number of bits in the exponent.  The larger the exponent the
larger the window size becomes.  After a window size $winsize$ has been chosen an array of $2^{winsize}$ mp\_int variables is allocated.  This
table will hold the values of $g^i \mbox{ (mod }p\mbox{)}$ for $2^{winsize - 1} \le i < 2^{winsize}$.
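
The window size selection of step 2 and the table setup of steps 5 through 9 can be sketched on machine words.  This is an illustration
only: toy\_winsize and toy\_build\_table are hypothetical names, the modulus is assumed to be non-zero and below $2^{32}$, and the
built-in remainder operator replaces Barrett reduction.

```c
#include <assert.h>
#include <stdint.h>

/* Window size for an exponent of k bits, following the thresholds of step 2. */
int toy_winsize(int k)
{
    if (k <= 7)    return 2;
    if (k <= 36)   return 3;
    if (k <= 140)  return 4;
    if (k <= 450)  return 5;
    if (k <= 1303) return 6;
    if (k <= 3529) return 7;
    return 8;
}

/* Fill M[1] and M[2^(winsize-1)] .. M[2^winsize - 1] with powers of g
 * modulo p (steps 5 through 9).  M must have room for 2^winsize entries;
 * the entries M[0] and M[2] .. M[2^(winsize-1) - 1] are left untouched. */
void toy_build_table(uint64_t *M, uint64_t g, uint64_t p, int winsize)
{
    uint64_t k = (uint64_t)1 << (winsize - 1);
    uint64_t ix;
    int i;

    assert(p != 0 && p < ((uint64_t)1 << 32) && winsize >= 2);

    M[1] = g % p;                              /* step 5 */
    M[k] = M[1];                               /* steps 6 and 7 */
    for (i = 0; i <= winsize - 2; i++) {       /* step 8: winsize - 1 squarings */
        M[k] = (M[k] * M[k]) % p;
    }
    for (ix = k + 1; ix < ((uint64_t)1 << winsize); ix++) {   /* step 9 */
        M[ix] = (M[ix - 1] * M[1]) % p;
    }
}
```

With $g = 3$, $p = 1000$ and $winsize = 3$ the table receives $M_4 = 81$, $M_5 = 243$, $M_6 = 729$ and $M_7 = 187$.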

After the table is allocated the first power of $g$ is found.  Since $g \ge p$ is allowed it must be first reduced modulo $p$ to make
the rest of the algorithm more efficient.  The first element of the table at $2^{winsize - 1}$ is found by squaring $M_1$ successively $winsize - 1$
times.  The rest of the table elements are found by multiplying the previous element by $M_1$ modulo $p$.

Now that the table is available the sliding window may begin.  The following list describes the functions of all the variables in the window.
\begin{enumerate}
\item The variable $mode$ dictates how the bits of the exponent are interpreted.
\begin{enumerate}
   \item When $mode = 0$ the bits are ignored since no non-zero bit of the exponent has been seen yet.  For example, if the exponent were simply
         $1$ then there would be $lg(\beta) - 1$ zero bits before the first non-zero bit.  In this case bits are ignored until a non-zero bit is found.
   \item When $mode = 1$ a non-zero bit has been seen before and a new $winsize$-bit window has not been formed yet.  In this mode leading $0$ bits
         are read and a single squaring is performed.  If a non-zero bit is read a new window is created.
   \item When $mode = 2$ the algorithm is in the middle of forming a window and new bits are appended to the window from the most significant bit
         downwards.
\end{enumerate}
\item The variable $bitcnt$ indicates how many bits of the current exponent digit remain to be read.
When it reaches zero a new digit
      is fetched from the exponent.
\item The variable $buf$ holds the currently read digit of the exponent.
\item The variable $digidx$ is an index into the exponent's digits.  It starts at the leading digit $x.used - 1$ and moves towards the trailing digit.
\item The variable $bitcpy$ indicates how many bits are in the currently formed window.  When it reaches $winsize$ the window is flushed and
      the appropriate operations performed.
\item The variable $bitbuf$ holds the current bits of the window being formed.
\end{enumerate}

All of step 12 is the window processing loop.  It will iterate while there are digits available from the exponent to read.  The first step
inside this loop is to extract a new digit if no more bits are available in the current digit.  If there are no bits left a new digit is
read and if there are no digits left then the loop terminates.

After a digit is made available step 12.3 will extract the most significant bit of the current digit and move all other bits in the digit
upwards.  In effect the digit is read from most significant bit to least significant bit and since the digits are read from leading to
trailing edges the entire exponent is read from most significant bit to least significant bit.

At step 12.5 if the $mode$ and currently extracted bit $y$ are both zero the bit is ignored and the next bit is read.
This prevents the
algorithm from having to perform trivial squaring and reduction operations before the first non-zero bit is read.  Step 12.6 and steps 12.7 through 12.10 handle
the two cases of $mode = 1$ and $mode = 2$ respectively.

FIGU,expt_state,Sliding Window State Diagram

By step 13 there are no more digits left in the exponent.  However, there may be partial bits in the window left.  If $mode = 2$ then
a Left-to-Right algorithm is used to process the remaining few bits.

EXAM,bn_s_mp_exptmod.c

Lines @31,if@ through @45,}@ determine the optimal window size based on the length of the exponent in bits.  The window divisions are sorted
from smallest to greatest so that in each \textbf{if} statement only one condition must be tested.  For example, by the \textbf{if} statement
on line @37,if@ the value of $x$ is already known to be greater than $140$.

The conditional piece of code beginning on line @42,ifdef@ allows the window size to be restricted to five bits.  This logic is used to ensure
the table of precomputed powers of $G$ remains relatively small.

The for loop on line @60,for@ initializes the $M$ array while lines @71,mp_init@ and @75,mp_reduce@ through @85,}@ initialize the reduction
function that will be used for this modulus.

-- More later.
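
The three modes of the window processing loop can be followed in a word-sized sketch.  The code below is an illustration under stated
assumptions and is not the library source: toy\_sw\_exptmod and mulmod are hypothetical names, the exponent is a single 32-bit word read
from its most significant bit downwards, the window size is passed in by the caller, and the built-in remainder operator replaces Barrett
reduction.

```c
#include <assert.h>
#include <stdint.h>

/* (a * b) mod p; cannot overflow because a, b < p < 2^32. */
static uint64_t mulmod(uint64_t a, uint64_t b, uint64_t p)
{
    return (a * b) % p;
}

/* Sliding window g^x mod p with a window of winsize bits (2 to 8). */
uint32_t toy_sw_exptmod(uint32_t g, uint32_t x, uint32_t p, int winsize)
{
    uint64_t M[256];
    uint64_t res;
    unsigned top, i, bitbuf = 0;
    int mode = 0, bitcpy = 0, bit, ix;

    assert(p != 0 && winsize >= 2 && winsize <= 8);

    /* Table setup: M[1] = g, M[top] = g^(2^(winsize-1)), then the next powers. */
    top = 1u << (winsize - 1);
    M[1] = g % p;
    M[top] = M[1];
    for (ix = 0; ix < winsize - 1; ix++) {
        M[top] = mulmod(M[top], M[top], p);
    }
    for (i = top + 1; i < (1u << winsize); i++) {
        M[i] = mulmod(M[i - 1], M[1], p);
    }

    res = 1 % p;
    for (bit = 31; bit >= 0; bit--) {
        unsigned y = (x >> bit) & 1u;

        if (mode == 0 && y == 0) {          /* leading zero bits are ignored */
            continue;
        }
        if (mode == 1 && y == 0) {          /* zero bit between windows: square */
            res = mulmod(res, res, p);
            continue;
        }
        bitcpy++;                           /* append the bit to the window */
        bitbuf |= y << (winsize - bitcpy);
        mode = 2;
        if (bitcpy == winsize) {            /* window full: squarings, one multiply */
            for (ix = 0; ix < winsize; ix++) {
                res = mulmod(res, res, p);
            }
            res = mulmod(res, M[bitbuf], p);
            bitcpy = 0;
            bitbuf = 0;
            mode = 1;
        }
    }

    /* Residual bits of a partial window: left-to-right square and multiply. */
    if (mode == 2 && bitcpy > 0) {
        for (ix = 0; ix < bitcpy; ix++) {
            res = mulmod(res, res, p);
            bitbuf <<= 1;
            if ((bitbuf & (1u << winsize)) != 0) {
                res = mulmod(res, M[1], p);
            }
        }
    }
    return (uint32_t)res;
}
```

For the exponent $117 = 1110101_2$ and $winsize = 3$ the sketch consumes the window $111$, squares once for the following zero bit and then
consumes the window $101$, which gives $((g^7)^2)^8 \cdot g^5 = g^{117}$.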

\section{Quick Power of Two}
Calculating $a = 2^b$ can be performed much quicker than with any of the previous algorithms.  Recall that a logical shift left $m << k$ is
equivalent to $m \cdot 2^k$.  By this logic when $m = 1$ a quick power of two can be achieved.

\begin{figure}[!here]
\begin{small}
\begin{center}
\begin{tabular}{l}
\hline Algorithm \textbf{mp\_2expt}. \\
\textbf{Input}.   integer $b$ \\
\textbf{Output}.  $a \leftarrow 2^b$ \\
\hline \\
1.  $a \leftarrow 0$ \\
2.  If $a.alloc < \lfloor b / lg(\beta) \rfloor + 1$ then grow $a$ appropriately. \\
3.  $a.used \leftarrow \lfloor b / lg(\beta) \rfloor + 1$ \\
4.  $a_{\lfloor b / lg(\beta) \rfloor} \leftarrow 1 << (b \mbox{ mod } lg(\beta))$ \\
5.  Return(\textit{MP\_OKAY}). \\
\hline
\end{tabular}
\end{center}
\end{small}
\caption{Algorithm mp\_2expt}
\end{figure}

\textbf{Algorithm mp\_2expt.}

EXAM,bn_mp_2expt.c

\chapter{Higher Level Algorithms}

This chapter discusses the various higher level algorithms that are required to complete a well rounded multiple precision integer package.
These
routines are less performance oriented than the algorithms of chapters five, six and seven but are no less important.

The first section describes a method of integer division with remainder that is universally well known.  It provides the signed division logic
for the package.  The subsequent section discusses a set of algorithms which allow a single digit to be the 2nd operand for a variety of operations.
These algorithms serve mostly to simplify other algorithms where small constants are required.  The last two sections discuss how to manipulate
various representations of integers.  For example, converting from an mp\_int to a string of characters.

\section{Integer Division with Remainder}
\label{sec:division}

Integer division aside from modular exponentiation is the most intensive algorithm to compute.  Like addition, subtraction and multiplication
the basis of this algorithm is the long-hand division algorithm taught to school children.  Throughout this discussion several common variables
will be used.  Let $x$ represent the divisor and $y$ represent the dividend.  Let $q$ represent the integer quotient $\lfloor y / x \rfloor$ and
let $r$ represent the remainder $r = y - x \lfloor y / x \rfloor$.  The following simple algorithm will be used to start the discussion.

\newpage\begin{figure}[!here]
\begin{small}
\begin{center}
\begin{tabular}{l}
\hline Algorithm \textbf{Radix-$\beta$ Integer Division}.
\\
\textbf{Input}.   integer $x$ and $y$ \\
\textbf{Output}.  $q = \lfloor y/x\rfloor, r = y - xq$ \\
\hline \\
1.  $q \leftarrow 0$ \\
2.  $n \leftarrow \vert \vert y \vert \vert - \vert \vert x \vert \vert$ \\
3.  for $t$ from $n$ down to $0$ do \\
\hspace{3mm}3.1  Maximize $k$ such that $kx\beta^t$ is less than or equal to $y$ and $(k + 1)x\beta^t$ is greater. \\
\hspace{3mm}3.2  $q \leftarrow q + k\beta^t$ \\
\hspace{3mm}3.3  $y \leftarrow y - kx\beta^t$ \\
4.  $r \leftarrow y$ \\
5.  Return($q, r$) \\
\hline
\end{tabular}
\end{center}
\end{small}
\caption{Algorithm Radix-$\beta$ Integer Division}
\label{fig:raddiv}
\end{figure}

As children we are taught this very simple algorithm for the case of $\beta = 10$.  Almost instinctively several optimizations are taught for which
the reasons are never explained.  For this example let $y = 5471$ represent the dividend and $x = 23$ represent the divisor.

To find the first digit of the quotient the value of $k$ must be maximized such that $kx\beta^t$ is less than or equal to $y$ and
simultaneously $(k + 1)x\beta^t$ is greater than $y$.  Implicitly $k$ is the maximum value the $t$'th digit of the quotient may have.
The habitual method
used to find the maximum is to ``eyeball'' the two numbers, typically only the leading digits and quickly estimate a quotient.  By only using leading
digits a much simpler division may be used to form an educated guess at what the value must be.  In this case $k = \lfloor 54/23\rfloor = 2$ quickly
arises as a possible solution.  Indeed $2x\beta^2 = 4600$ is less than $y = 5471$ and simultaneously $(k + 1)x\beta^2 = 6900$ is larger than $y$.
As a result $k\beta^2$ is added to the quotient which now equals $q = 200$ and $4600$ is subtracted from $y$ to give a remainder of $y = 871$.

Again this process is repeated to produce the quotient digit $k = 3$ which makes the quotient $q = 200 + 3\beta = 230$ and the remainder
$y = 871 - 3x\beta = 181$.  Finally the last iteration of the loop produces $k = 7$ which leads to the quotient $q = 230 + 7 = 237$ and the
remainder $y = 181 - 7x = 20$.  The final quotient and remainder found are $q = 237$ and $r = y = 20$ which are indeed correct since
$237 \cdot 23 + 20 = 5471$ is true.

\subsection{Quotient Estimation}
\label{sec:divest}
As alluded to earlier the quotient digit $k$ can be estimated from only the leading digits of both the divisor and dividend.  When $p$ leading
digits are used from both the divisor and dividend to form an estimation the accuracy of the estimation rises as $p$ grows.  Technically
speaking the estimation is based on assuming the lower $\vert \vert y \vert \vert - p$ and $\vert \vert x \vert \vert - p$ digits of the
dividend and divisor are zero.
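
The effect of $p$ can be seen in a small base ten sketch.  The helper names keep\_leading and toy\_estimate are hypothetical, and the
divisor is assumed to be already aligned with the dividend (\textit{that is, multiplied by the appropriate power of $\beta$}).

```c
#include <assert.h>
#include <stdint.h>

/* Zero all but the p leading base-10 digits of v. */
static uint64_t keep_leading(uint64_t v, int p)
{
    uint64_t scale = 1, t;
    int digits = 0, i;

    for (t = v; t != 0; t /= 10) {
        digits++;
    }
    for (i = p; i < digits; i++) {
        scale *= 10;
    }
    return (v / scale) * scale;
}

/* Estimate floor(y / x) using only the p leading digits of each operand. */
uint64_t toy_estimate(uint64_t y, uint64_t x, int p)
{
    assert(x != 0 && p >= 1);
    return keep_leading(y, p) / keep_leading(x, p);
}
```

For a dividend of $841$ and an aligned divisor of $230$ one leading digit gives $\lfloor 800 / 200 \rfloor = 4$, which is too large, while
two leading digits give $\lfloor 840 / 230 \rfloor = 3$, which is the true quotient digit.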

The value of the estimation may be off by a few values in either direction and in general is fairly correct.  A simplification \cite[pp. 271]{TAOCPV2}
of the estimation technique is to use $t + 1$ digits of the dividend and $t$ digits of the divisor, in particular when $t = 1$.  The estimate
using this technique is never too small.  For the following proof let $t = \vert \vert y \vert \vert - 1$ and $s = \vert \vert x \vert \vert - 1$
represent the most significant digits of the dividend and divisor respectively.

\textbf{Proof.}\textit{  The quotient $\hat k = \lfloor (y_t\beta + y_{t-1}) / x_s \rfloor$ is greater than or equal to
$k = \lfloor y / (x \cdot \beta^{\vert \vert y \vert \vert - \vert \vert x \vert \vert - 1}) \rfloor$. }
The first obvious case is when $\hat k = \beta - 1$ in which case the proof is concluded since the real quotient cannot be larger.  For all other
cases $\hat k = \lfloor (y_t\beta + y_{t-1}) / x_s \rfloor$ and $\hat k x_s \ge y_t\beta + y_{t-1} - x_s + 1$.  The latter portion of the inequality
$-x_s + 1$ arises from the fact that a truncated integer division will give the same quotient for at most $x_s - 1$ values.  Next a series of
inequalities will prove the hypothesis.

\begin{equation}
y - \hat k x \le y - \hat k x_s\beta^s
\end{equation}

This is trivially true since $x \ge x_s\beta^s$.  Next we replace $\hat kx_s\beta^s$ by the previous inequality for $\hat kx_s$.

\begin{equation}
y - \hat k x \le y_t\beta^t + \ldots + y_0 - (y_t\beta^t + y_{t-1}\beta^{t-1} - x_s\beta^s + \beta^s)
\end{equation}

By simplifying the previous inequality the following inequality is formed.

\begin{equation}
y - \hat k x \le y_{t-2}\beta^{t-2} + \ldots + y_0 + x_s\beta^s - \beta^s
\end{equation}

Subsequently,

\begin{equation}
y_{t-2}\beta^{t-2} + \ldots + y_0 + x_s\beta^s - \beta^s < x_s\beta^s \le x
\end{equation}

Which proves that $y - \hat kx < x$ and by consequence $\hat k \ge k$ which concludes the proof.  \textbf{QED}


\subsection{Normalized Integers}
For the purposes of division a normalized input is when the divisor's leading digit $x_n$ is greater than or equal to $\beta / 2$.  By multiplying both
$x$ and $y$ by $j = \lfloor (\beta / 2) / x_n \rfloor$ the quotient remains unchanged and the remainder is simply $j$ times the original
remainder.  The purpose of normalization is to ensure the leading digit of the divisor is sufficiently large such that the estimated quotient will
lie in the domain of a single digit.  Consider the maximum dividend $(\beta - 1) \cdot \beta + (\beta - 1)$ and the minimum divisor $\beta / 2$.

\begin{equation}
{{\beta^2 - 1} \over { \beta / 2}} \le 2\beta - {2 \over \beta}
\end{equation}

At most the quotient approaches $2\beta$, however, in practice this will not occur since that would imply the previous quotient digit was too small.

\subsection{Radix-$\beta$ Division with Remainder}
\newpage\begin{figure}[!here]
\begin{small}
\begin{center}
\begin{tabular}{l}
\hline Algorithm \textbf{mp\_div}. \\
\textbf{Input}.   mp\_int $a, b$ \\
\textbf{Output}.  $c = \lfloor a/b \rfloor$, $d = a - bc$ \\
\hline \\
1.  If $b = 0$ return(\textit{MP\_VAL}). \\
2.  If $\vert a \vert < \vert b \vert$ then do \\
\hspace{3mm}2.1  $d \leftarrow a$ \\
\hspace{3mm}2.2  $c \leftarrow 0$ \\
\hspace{3mm}2.3  Return(\textit{MP\_OKAY}). \\
\\
Setup the quotient to receive the digits. \\
3.  Grow $q$ to $a.used + 2$ digits. \\
4.  $q \leftarrow 0$ \\
5.  $x \leftarrow \vert a \vert , y \leftarrow \vert b \vert$ \\
6.
$sign \leftarrow \left \lbrace \begin{array}{ll}
                              MP\_ZPOS &  \mbox{if }a.sign = b.sign \\
                              MP\_NEG  &  \mbox{otherwise} \\
                              \end{array} \right .$ \\
\\
Normalize the inputs such that the leading digit of $y$ is greater than or equal to $\beta / 2$. \\
7.  $norm \leftarrow (lg(\beta) - 1) - (\lceil lg(y) \rceil \mbox{ (mod }lg(\beta)\mbox{)})$ \\
8.  $x \leftarrow x \cdot 2^{norm}, y \leftarrow y \cdot 2^{norm}$ \\
\\
Find the leading digit of the quotient. \\
9.  $n \leftarrow x.used - 1, t \leftarrow y.used - 1$ \\
10.  $y \leftarrow y \cdot \beta^{n - t}$ \\
11.  While ($x \ge y$) do \\
\hspace{3mm}11.1  $q_{n - t} \leftarrow q_{n - t} + 1$ \\
\hspace{3mm}11.2  $x \leftarrow x - y$ \\
12.  $y \leftarrow \lfloor y / \beta^{n-t} \rfloor$ \\
\\
Continued on the next page. \\
\hline
\end{tabular}
\end{center}
\end{small}
\caption{Algorithm mp\_div}
\end{figure}

\newpage\begin{figure}[!here]
\begin{small}
\begin{center}
\begin{tabular}{l}
\hline Algorithm \textbf{mp\_div} (continued). \\
\textbf{Input}.   mp\_int $a, b$ \\
\textbf{Output}.
$c = \lfloor a/b \rfloor$, $d = a - bc$ \\
\hline \\
Now find the remainder of the digits. \\
13.  for $i$ from $n$ down to $(t + 1)$ do \\
\hspace{3mm}13.1  If $i > x.used$ then jump to the next iteration of this loop. \\
\hspace{3mm}13.2  If $x_{i} = y_{t}$ then \\
\hspace{6mm}13.2.1  $q_{i - t - 1} \leftarrow \beta - 1$ \\
\hspace{3mm}13.3  else \\
\hspace{6mm}13.3.1  $\hat r \leftarrow x_{i} \cdot \beta + x_{i - 1}$ \\
\hspace{6mm}13.3.2  $\hat r \leftarrow \lfloor \hat r / y_{t} \rfloor$ \\
\hspace{6mm}13.3.3  $q_{i - t - 1} \leftarrow \hat r$ \\
\hspace{3mm}13.4  $q_{i - t - 1} \leftarrow q_{i - t - 1} + 1$ \\
\\
Fixup quotient estimation. \\
\hspace{3mm}13.5  Loop \\
\hspace{6mm}13.5.1  $q_{i - t - 1} \leftarrow q_{i - t - 1} - 1$ \\
\hspace{6mm}13.5.2  t$1 \leftarrow 0$ \\
\hspace{6mm}13.5.3  t$1_0 \leftarrow y_{t - 1}, $ t$1_1 \leftarrow y_t,$ t$1.used \leftarrow 2$ \\
\hspace{6mm}13.5.4  $t1 \leftarrow t1 \cdot q_{i - t - 1}$ \\
\hspace{6mm}13.5.5  t$2_0 \leftarrow x_{i - 2}, $ t$2_1 \leftarrow x_{i - 1}, $ t$2_2 \leftarrow x_i, $ t$2.used \leftarrow 3$ \\
\hspace{6mm}13.5.6  If $\vert t1 \vert > \vert t2 \vert$ then goto step 13.5.
\\
\hspace{3mm}13.6  t$1 \leftarrow y \cdot q_{i - t - 1}$ \\
\hspace{3mm}13.7  t$1 \leftarrow $ t$1 \cdot \beta^{i - t - 1}$ \\
\hspace{3mm}13.8  $x \leftarrow x - $ t$1$ \\
\hspace{3mm}13.9  If $x.sign = MP\_NEG$ then \\
\hspace{6mm}13.10  t$1 \leftarrow y$  \\
\hspace{6mm}13.11  t$1 \leftarrow $ t$1 \cdot \beta^{i - t - 1}$ \\
\hspace{6mm}13.12  $x \leftarrow x + $ t$1$ \\
\hspace{6mm}13.13  $q_{i - t - 1} \leftarrow q_{i - t - 1} - 1$ \\
\\
Finalize the result. \\
14.  Clamp excess digits of $q$ \\
15.  $c \leftarrow q, c.sign \leftarrow sign$ \\
16.  $x.sign \leftarrow a.sign$ \\
17.  $d \leftarrow \lfloor x / 2^{norm} \rfloor$ \\
18.  Return(\textit{MP\_OKAY}). \\
\hline
\end{tabular}
\end{center}
\end{small}
\caption{Algorithm mp\_div (continued)}
\end{figure}
\textbf{Algorithm mp\_div.}
This algorithm will calculate quotient and remainder from an integer division given a dividend and divisor.  The algorithm is a signed
division and will produce a fully qualified quotient and remainder.

First the divisor $b$ must be non-zero which is enforced in step one.  If the divisor is larger than the dividend then the quotient is implicitly
zero and the remainder is the dividend.
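
The signed logic that surrounds the unsigned core (\textit{steps 1, 2, 5, 6, 15 and 16}) can be sketched on machine words.  The name toy\_div
is hypothetical, the built-in unsigned division stands in for steps 7 through 13, and the inputs are assumed to be larger than the most
negative 64-bit value so that their magnitudes are representable.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_OKAY 0
#define TOY_VAL  (-1)

/* Signed division built on an unsigned core: c = quotient, d = remainder. */
int toy_div(int64_t a, int64_t b, int64_t *c, int64_t *d)
{
    uint64_t x, y, q, r;

    if (b == 0) {                                   /* step 1 */
        return TOY_VAL;
    }
    assert(a != INT64_MIN && b != INT64_MIN);       /* assumption of this sketch */

    x = (a < 0) ? (uint64_t)(-a) : (uint64_t)a;     /* step 5: x = |a| */
    y = (b < 0) ? (uint64_t)(-b) : (uint64_t)b;     /*         y = |b| */

    if (x < y) {                                    /* step 2 */
        *c = 0;
        *d = a;
        return TOY_OKAY;
    }

    q = x / y;                                      /* stand-in for steps 7 to 13 */
    r = x % y;

    /* Steps 6 and 15: the quotient is positive when the signs agree. */
    *c = ((a < 0) == (b < 0)) ? (int64_t)q : -(int64_t)q;
    /* Step 16: the remainder takes the sign of the dividend. */
    *d = (a < 0) ? -(int64_t)r : (int64_t)r;
    return TOY_OKAY;
}
```

For example, dividing $-7$ by $2$ yields the quotient $-3$ and the remainder $-1$, so that $a = bc + d$ still holds.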

After the first two trivial cases of inputs are handled the variable $q$ is set up to receive the digits of the quotient. Two unsigned copies of the
divisor $y$ and dividend $x$ are made as well. The core of the division algorithm is an unsigned division and will only work if the values are
positive. Now the two values $x$ and $y$ must be normalized such that the leading digit of $y$ is greater than or equal to $\beta / 2$.
This is performed by shifting both to the left by enough bits to get the desired normalization.

At this point the division algorithm can begin producing digits of the quotient. Recall that the maximum value of the estimation used is
$2\beta - {2 \over \beta}$ which means that a digit of the quotient must first be produced by another means. In this case $y$ is shifted
to the left (\textit{step ten}) so that it has the same number of digits as $x$. The loop on step eleven will subtract multiples of the
shifted copy of $y$ until $x$ is smaller. Since the leading digit of $y$ is greater than or equal to $\beta/2$ this loop will iterate at most two
times to produce the desired leading digit of the quotient.

Now the remainder of the digits can be produced. The equation $\hat q = \lfloor {{x_i \beta + x_{i-1}}\over y_t} \rfloor$ is used to fairly
accurately approximate the true quotient digit. The estimate can in theory be as high as $2\beta - {2 \over \beta}$ but by
induction the upper quotient digit is correct (\textit{as established on step eleven}) and the estimate must be less than $\beta$.
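
As a small illustration of the estimate (a sketch only, assuming the hypothetical radix $\beta = 10$ rather than the radix of an actual mp\_int),
suppose the normalized values are $x = 489$ and $y = 59$ so that $n = 2$, $t = 1$ and $y_t = 5 \ge \beta / 2$. For $i = 2$ the digits $x_i = 4$ and
$y_t = 5$ differ and the estimate is

\begin{equation}
\hat q = \left \lfloor {{x_i \beta + x_{i-1}} \over y_t} \right \rfloor = \left \lfloor {{4 \cdot 10 + 8} \over 5} \right \rfloor = 9
\end{equation}

The true quotient digit is $\lfloor 489 / 59 \rfloor = 8$ since $8 \cdot 59 = 472 \le 489 < 531 = 9 \cdot 59$. In this instance the estimate is
less than $\beta$ as required, but it exceeds the true digit by one and has to be corrected before the digit is final.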

Recall from section~\ref{sec:divest} that the estimation is never too low but may be too high. The next step of the estimation process is
to refine the estimation. The loop on step 13.5 uses $x_i\beta^2 + x_{i-1}\beta + x_{i-2}$ and $q_{i - t - 1}(y_t\beta + y_{t-1})$ as a higher
order approximation to adjust the quotient digit.

After both phases of estimation the quotient digit may still be off by a value of one\footnote{This is similar to the error introduced
by optimizing Barrett reduction.}. Steps 13.6 through 13.8 subtract the multiple of the divisor from the dividend (\textit{similar to step 3.3 of
algorithm~\ref{fig:raddiv}}) and the steps that follow add a multiple of the divisor back if the quotient was too large.

Now that the quotient has been determined, finalizing the result is a matter of clamping the quotient, fixing the sizes and de-normalizing the
remainder. An important aspect of this algorithm seemingly overlooked in other descriptions such as that of Algorithm 14.20 HAC \cite[pp. 598]{HAC}
is that when the estimations are being made (\textit{inside the loop on step 13.5}) the digits $y_{t-1}$, $x_{i-2}$ and $x_{i-1}$ may lie
outside their respective boundaries. For example, if $t = 0$ or $i \le 1$ then the digits would be undefined. In those cases the digits should
respectively be replaced with a zero.

EXAM,bn_mp_div.c

The implementation of this algorithm differs slightly from the pseudo code presented previously.
In this algorithm either of the quotient $c$ or
remainder $d$ may be passed as a \textbf{NULL} pointer which indicates their value is not desired. For example, the C code to call the division
algorithm with only the quotient is

\begin{verbatim}
mp_div(&a, &b, &c, NULL); /* c = [a/b] */
\end{verbatim}

Lines @108,if@ and @113,if@ handle the two trivial cases of inputs which are division by zero and dividend smaller than the divisor
respectively. After the two trivial cases all of the temporary variables are initialized. Line @147,neg@ determines the sign of
the quotient and line @148,sign@ ensures that both $x$ and $y$ are positive.

The number of bits in the leading digit is calculated on line @151,norm@. Implicitly an mp\_int with $r$ digits will require $lg(\beta)(r-1) + k$ bits
of precision which when reduced modulo $lg(\beta)$ produces the value of $k$. In this case $k$ is the number of bits in the leading digit which is
exactly what is required. For the algorithm to operate $k$ must equal $lg(\beta) - 1$ and when it does not the inputs must be normalized by shifting
them to the left by $lg(\beta) - 1 - k$ bits.

Throughout the variables $n$ and $t$ will represent the highest digit of $x$ and $y$ respectively. These are first used to produce the
leading digit of the quotient. The loop beginning on line @184,for@ will produce the remainder of the quotient digits.

The conditional ``continue'' on line @186,continue@ is used to prevent the algorithm from reading past the leading edge of $x$ which can occur when the
algorithm eliminates multiple non-zero digits in a single iteration. This ensures that $x_i$ is always non-zero since by definition the digits
above the $i$'th position of $x$ must be zero in order for the quotient to be precise\footnote{Precise as far as integer division is concerned.}.

Lines @214,t1@, @216,t1@ and @222,t2@ through @225,t2@ manually construct the high accuracy estimations by setting the digits of the two mp\_int
variables directly.

\section{Single Digit Helpers}

This section briefly describes a series of single digit helper algorithms which come in handy when working with small constants. All of
the helper functions assume the single digit input is positive and will treat it as such.

\subsection{Single Digit Addition and Subtraction}

Both addition and subtraction are performed by ``cheating'' and using mp\_set followed by the higher level addition or subtraction
algorithms. As a result these algorithms are substantially simpler with a slight cost in performance.

\newpage\begin{figure}[!here]
\begin{small}
\begin{center}
\begin{tabular}{l}
\hline Algorithm \textbf{mp\_add\_d}. \\
\textbf{Input}.
mp\_int $a$ and a mp\_digit $b$ \\
\textbf{Output}. $c = a + b$ \\
\hline \\
1. $t \leftarrow b$ (\textit{mp\_set}) \\
2. $c \leftarrow a + t$ \\
3. Return(\textit{MP\_OKAY}) \\
\hline
\end{tabular}
\end{center}
\end{small}
\caption{Algorithm mp\_add\_d}
\end{figure}

\textbf{Algorithm mp\_add\_d.}
This algorithm initializes a temporary mp\_int with the value of the single digit and uses algorithm mp\_add to add the two values together.

EXAM,bn_mp_add_d.c

Clever use of the letter 't'.

\subsubsection{Subtraction}
The single digit subtraction algorithm mp\_sub\_d is essentially the same except it uses mp\_sub to subtract the digit from the mp\_int.

\subsection{Single Digit Multiplication}
Single digit multiplication arises enough in division and radix conversion that it ought to be implemented as a special case of the baseline
multiplication algorithm. Essentially this algorithm is a modified version of algorithm s\_mp\_mul\_digs where one of the multiplicands
only has one digit.

\begin{figure}[!here]
\begin{small}
\begin{center}
\begin{tabular}{l}
\hline Algorithm \textbf{mp\_mul\_d}.
\\
\textbf{Input}. mp\_int $a$ and a mp\_digit $b$ \\
\textbf{Output}. $c = ab$ \\
\hline \\
1. $pa \leftarrow a.used$ \\
2. Grow $c$ to at least $pa + 1$ digits. \\
3. $oldused \leftarrow c.used$ \\
4. $c.used \leftarrow pa + 1$ \\
5. $c.sign \leftarrow a.sign$ \\
6. $\mu \leftarrow 0$ \\
7. for $ix$ from $0$ to $pa - 1$ do \\
\hspace{3mm}7.1 $\hat r \leftarrow \mu + a_{ix}b$ \\
\hspace{3mm}7.2 $c_{ix} \leftarrow \hat r \mbox{ (mod }\beta\mbox{)}$ \\
\hspace{3mm}7.3 $\mu \leftarrow \lfloor \hat r / \beta \rfloor$ \\
8. $c_{pa} \leftarrow \mu$ \\
9. for $ix$ from $pa + 1$ to $oldused$ do \\
\hspace{3mm}9.1 $c_{ix} \leftarrow 0$ \\
10. Clamp excess digits of $c$. \\
11. Return(\textit{MP\_OKAY}). \\
\hline
\end{tabular}
\end{center}
\end{small}
\caption{Algorithm mp\_mul\_d}
\end{figure}
\textbf{Algorithm mp\_mul\_d.}
This algorithm quickly multiplies an mp\_int by a small single digit value. It is specially tailored to the job and has a minimum of overhead.
Unlike the full multiplication algorithms this algorithm does not require any significant temporary storage or memory allocations.
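
To illustrate the carry propagation of algorithm mp\_mul\_d, the following trace is a sketch that assumes the hypothetical radix $\beta = 10$ with
$a = 5289$ and $b = 7$ (values chosen only for illustration), so that $pa = 4$ and $\mu = 0$ initially.

\begin{center}
\begin{tabular}{|c|c|c|c|c|}
\hline $ix$ & $a_{ix}$ & $\hat r = \mu + a_{ix}b$ & $c_{ix} = \hat r \mbox{ (mod }\beta\mbox{)}$ & $\mu = \lfloor \hat r / \beta \rfloor$ \\
\hline $0$ & $9$ & $63$ & $3$ & $6$ \\
\hline $1$ & $8$ & $62$ & $2$ & $6$ \\
\hline $2$ & $2$ & $20$ & $0$ & $2$ \\
\hline $3$ & $5$ & $37$ & $7$ & $3$ \\
\hline
\end{tabular}
\end{center}

Step 8 stores the final carry as $c_4 = 3$ which gives $c = 37023 = 7 \cdot 5289$.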

EXAM,bn_mp_mul_d.c

In this implementation the destination $c$ may point to the same mp\_int as the source $a$ since the result is written after the digit is
read from the source. This function uses pointer aliases $tmpa$ and $tmpc$ for the digits of $a$ and $c$ respectively.

\subsection{Single Digit Division}
Like the single digit multiplication algorithm, single digit division is also a fairly common algorithm used in radix conversion. Since the
divisor is only a single digit a specialized variant of the division algorithm can be used to compute the quotient.

\newpage\begin{figure}[!here]
\begin{small}
\begin{center}
\begin{tabular}{l}
\hline Algorithm \textbf{mp\_div\_d}. \\
\textbf{Input}. mp\_int $a$ and a mp\_digit $b$ \\
\textbf{Output}. $c = \lfloor a / b \rfloor, d = a - cb$ \\
\hline \\
1. If $b = 0$ then return(\textit{MP\_VAL}).\\
2. If $b = 3$ then use algorithm mp\_div\_3 instead. \\
3. Init $q$ to $a.used$ digits. \\
4. $q.used \leftarrow a.used$ \\
5. $q.sign \leftarrow a.sign$ \\
6. $\hat w \leftarrow 0$ \\
7.
for $ix$ from $a.used - 1$ down to $0$ do \\
\hspace{3mm}7.1 $\hat w \leftarrow \hat w \beta + a_{ix}$ \\
\hspace{3mm}7.2 If $\hat w \ge b$ then \\
\hspace{6mm}7.2.1 $t \leftarrow \lfloor \hat w / b \rfloor$ \\
\hspace{6mm}7.2.2 $\hat w \leftarrow \hat w \mbox{ (mod }b\mbox{)}$ \\
\hspace{3mm}7.3 else\\
\hspace{6mm}7.3.1 $t \leftarrow 0$ \\
\hspace{3mm}7.4 $q_{ix} \leftarrow t$ \\
8. $d \leftarrow \hat w$ \\
9. Clamp excess digits of $q$. \\
10. $c \leftarrow q$ \\
11. Return(\textit{MP\_OKAY}). \\
\hline
\end{tabular}
\end{center}
\end{small}
\caption{Algorithm mp\_div\_d}
\end{figure}
\textbf{Algorithm mp\_div\_d.}
This algorithm divides the mp\_int $a$ by the single mp\_digit $b$ using an optimized approach. Essentially in every iteration of the
algorithm another digit of the dividend is reduced and another digit of quotient produced. Provided $b < \beta$ the value of $\hat w$
after step 7.1 will be limited such that $0 \le \lfloor \hat w / b \rfloor < \beta$.

If the divisor $b$ is equal to three a variant of this algorithm is used which is called mp\_div\_3. It replaces the division by three with
a multiplication by $\lfloor \beta / 3 \rfloor$ and the appropriate shift and residual fixup. In essence it is much like the Barrett reduction
from chapter seven.
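
As a small illustration of the main loop of algorithm mp\_div\_d (a sketch only, assuming the hypothetical radix $\beta = 10$ and the illustrative inputs
$a = 1234$ and $b = 7$), the digits of $a$ are consumed from the most significant end with $\hat w = 0$ initially.

\begin{center}
\begin{tabular}{|c|c|c|c|c|}
\hline $ix$ & $a_{ix}$ & $\hat w$ after step 7.1 & $q_{ix} = t$ & $\hat w$ after step 7.4 \\
\hline $3$ & $1$ & $1$ & $0$ & $1$ \\
\hline $2$ & $2$ & $12$ & $1$ & $5$ \\
\hline $1$ & $3$ & $53$ & $7$ & $4$ \\
\hline $0$ & $4$ & $44$ & $6$ & $2$ \\
\hline
\end{tabular}
\end{center}

The first iteration takes the branch on step 7.3 since $\hat w = 1 < b$. The quotient digits read $0176$ which clamps to $c = 176$ and the remainder
is $d = 2$. Indeed $7 \cdot 176 + 2 = 1234$.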

EXAM,bn_mp_div_d.c

Like the implementation of algorithm mp\_div this algorithm allows either of the quotient or remainder to be passed as a \textbf{NULL} pointer to
indicate the respective value is not required. This allows a trivial single digit modular reduction algorithm, mp\_mod\_d, to be created.

The division and remainder on lines @44,/@ and @45,%@ can often be replaced by a single division on most processors. For example, the 32-bit x86 based
processors can divide a 64-bit quantity by a 32-bit quantity and produce the quotient and remainder simultaneously. Unfortunately the GCC
compiler does not recognize that optimization and will actually produce two function calls to find the quotient and remainder respectively.

\subsection{Single Digit Root Extraction}

Finding the $n$'th root of an integer is fairly easy as far as numerical analysis is concerned. Algorithms such as the Newton-Raphson approximation
(\ref{eqn:newton}) series will converge very quickly to a root for any continuous function $f(x)$.

\begin{equation}
x_{i+1} = x_i - {f(x_i) \over f'(x_i)}
\label{eqn:newton}
\end{equation}

In this case the $n$'th root is desired and $f(x) = x^n - a$ where $a$ is the integer of which the root is desired. The derivative of $f(x)$ is
simply $f'(x) = nx^{n - 1}$.
Of particular importance is that this algorithm will be used over the integers and not over a continuous domain
such as the real numbers. As a result the root found can be above the true root by a few and must be manually adjusted. Ideally at the end of the
algorithm the $n$'th root $b$ of an integer $a$ is desired such that $b^n \le a$.

\newpage\begin{figure}[!here]
\begin{small}
\begin{center}
\begin{tabular}{l}
\hline Algorithm \textbf{mp\_n\_root}. \\
\textbf{Input}. mp\_int $a$ and a mp\_digit $b$ \\
\textbf{Output}. $c^b \le a$ \\
\hline \\
1. If $b$ is even and $a.sign = MP\_NEG$ return(\textit{MP\_VAL}). \\
2. $sign \leftarrow a.sign$ \\
3. $a.sign \leftarrow MP\_ZPOS$ \\
4. t$2 \leftarrow 2$ \\
5. Loop \\
\hspace{3mm}5.1 t$1 \leftarrow $ t$2$ \\
\hspace{3mm}5.2 t$3 \leftarrow $ t$1^{b - 1}$ \\
\hspace{3mm}5.3 t$2 \leftarrow $ t$3 $ $\cdot$ t$1$ \\
\hspace{3mm}5.4 t$2 \leftarrow $ t$2 - a$ \\
\hspace{3mm}5.5 t$3 \leftarrow $ t$3 \cdot b$ \\
\hspace{3mm}5.6 t$3 \leftarrow \lfloor $t$2 / $t$3 \rfloor$ \\
\hspace{3mm}5.7 t$2 \leftarrow $ t$1 - $ t$3$ \\
\hspace{3mm}5.8 If t$1 \ne $ t$2$ then goto step 5. \\
6.
Loop \\
\hspace{3mm}6.1 t$2 \leftarrow $ t$1^b$ \\
\hspace{3mm}6.2 If t$2 > a$ then \\
\hspace{6mm}6.2.1 t$1 \leftarrow $ t$1 - 1$ \\
\hspace{6mm}6.2.2 Goto step 6. \\
7. $a.sign \leftarrow sign$ \\
8. $c \leftarrow $ t$1$ \\
9. $c.sign \leftarrow sign$ \\
10. Return(\textit{MP\_OKAY}). \\
\hline
\end{tabular}
\end{center}
\end{small}
\caption{Algorithm mp\_n\_root}
\end{figure}
\textbf{Algorithm mp\_n\_root.}
This algorithm finds the integer $n$'th root of an input using the Newton-Raphson approach. It is partially optimized based on the observation
that the numerator of ${f(x) \over f'(x)}$ can be derived from a partial denominator. That is, at first the denominator is calculated by finding
$x^{b - 1}$. This value can then be multiplied by $x$ and have $a$ subtracted from it to find the numerator. This saves a total of $b - 1$
multiplications by t$1$ inside the loop.

The initial value of the approximation is t$2 = 2$ which allows the algorithm to start with very small values and quickly converge on the
root. Ideally this algorithm is meant to find the $n$'th root of an input where $n$ is bounded by $2 \le n \le 5$.
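
The following trace sketches the iteration for the illustrative inputs $a = 68$ and $b = 3$ (values chosen here only as an example, and such that
the one division with a negative numerator is exact). Each row is one pass through step 5 starting from t$2 = 2$.

\begin{center}
\begin{tabular}{|c|c|c|c|c|c|}
\hline t$1$ & t$1^{b - 1}$ & t$1^b - a$ & $b \cdot $ t$1^{b - 1}$ & Quotient (step 5.6) & New t$2$ \\
\hline $2$ & $4$ & $-60$ & $12$ & $-5$ & $7$ \\
\hline $7$ & $49$ & $275$ & $147$ & $1$ & $6$ \\
\hline $6$ & $36$ & $148$ & $108$ & $1$ & $5$ \\
\hline $5$ & $25$ & $57$ & $75$ & $0$ & $5$ \\
\hline
\end{tabular}
\end{center}

The loop stops with t$1 = 5$, yet $5^3 = 125 > 68$. Step 6 therefore decrements t$1$ once to $4$, and since $4^3 = 64 \le 68$ the result is $c = 4$.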

EXAM,bn_mp_n_root.c

\section{Random Number Generation}

Random numbers come up in a variety of activities from public key cryptography to simple simulations and various randomized algorithms. Pollard-Rho
factoring, for example, can make use of random values as starting points to find factors of a composite integer. In this case the algorithm presented
is solely for simulations and not intended for cryptographic use.

\newpage\begin{figure}[!here]
\begin{small}
\begin{center}
\begin{tabular}{l}
\hline Algorithm \textbf{mp\_rand}. \\
\textbf{Input}. An integer $b$ \\
\textbf{Output}. A pseudo-random number of $b$ digits \\
\hline \\
1. $a \leftarrow 0$ \\
2. If $b \le 0$ return(\textit{MP\_OKAY}) \\
3. Pick a non-zero random digit $d$. \\
4. $a \leftarrow a + d$ \\
5. for $ix$ from 1 to $b - 1$ do \\
\hspace{3mm}5.1 $a \leftarrow a \cdot \beta$ \\
\hspace{3mm}5.2 Pick a random digit $d$. \\
\hspace{3mm}5.3 $a \leftarrow a + d$ \\
6. Return(\textit{MP\_OKAY}).
\\
\hline
\end{tabular}
\end{center}
\end{small}
\caption{Algorithm mp\_rand}
\end{figure}
\textbf{Algorithm mp\_rand.}
This algorithm produces a pseudo-random integer of $b$ digits. By ensuring that the first digit is non-zero the algorithm also guarantees that the
final result has at least $b$ digits. It relies heavily on a third-party random number generator which should ideally generate uniformly all of
the integers from $0$ to $\beta - 1$.

EXAM,bn_mp_rand.c

\section{Formatted Representations}
The ability to emit a radix-$n$ textual representation of an integer is useful for interacting with human parties. For example, the ability to
be given a string of characters such as ``114585'' and turn it into the radix-$\beta$ equivalent would make it easier to enter numbers
into a program.

\subsection{Reading Radix-n Input}
For the purposes of this text we will assume that a simple lower ASCII map (\ref{fig:ASC}) is used to map the values from $0$ to $63$ to
printable characters. For example, when the character ``N'' is read it represents the integer $23$. The first $16$ characters of the
map are for the common representations up to hexadecimal. After that they match the ``base64'' encoding scheme, whose characters are suitably chosen
such that they are printable.
While outputting as base64 may not be too helpful for human operators it does allow communication via non-binary
mediums.

\newpage\begin{figure}[here]
\begin{center}
\begin{tabular}{cc|cc|cc|cc}
\hline \textbf{Value} & \textbf{Char} & \textbf{Value} & \textbf{Char} & \textbf{Value} & \textbf{Char} & \textbf{Value} & \textbf{Char} \\
\hline
0 & 0 & 1 & 1 & 2 & 2 & 3 & 3 \\
4 & 4 & 5 & 5 & 6 & 6 & 7 & 7 \\
8 & 8 & 9 & 9 & 10 & A & 11 & B \\
12 & C & 13 & D & 14 & E & 15 & F \\
16 & G & 17 & H & 18 & I & 19 & J \\
20 & K & 21 & L & 22 & M & 23 & N \\
24 & O & 25 & P & 26 & Q & 27 & R \\
28 & S & 29 & T & 30 & U & 31 & V \\
32 & W & 33 & X & 34 & Y & 35 & Z \\
36 & a & 37 & b & 38 & c & 39 & d \\
40 & e & 41 & f & 42 & g & 43 & h \\
44 & i & 45 & j & 46 & k & 47 & l \\
48 & m & 49 & n & 50 & o & 51 & p \\
52 & q & 53 & r & 54 & s & 55 & t \\
56 & u & 57 & v & 58 & w & 59 & x \\
60 & y & 61 & z & 62 & $+$ & 63 & $/$ \\
\hline
\end{tabular}
\end{center}
\caption{Lower ASCII Map}
\label{fig:ASC}
\end{figure}

\newpage\begin{figure}[!here]
\begin{small}
\begin{center}
\begin{tabular}{l}
\hline Algorithm \textbf{mp\_read\_radix}. \\
\textbf{Input}. A string $str$ of length $sn$ and radix $r$. \\
\textbf{Output}. The radix-$\beta$ equivalent mp\_int. \\
\hline \\
1. If $r < 2$ or $r > 64$ return(\textit{MP\_VAL}). \\
2. $ix \leftarrow 0$ \\
3. If $str_0 =$ ``-'' then do \\
\hspace{3mm}3.1 $ix \leftarrow ix + 1$ \\
\hspace{3mm}3.2 $sign \leftarrow MP\_NEG$ \\
4. else \\
\hspace{3mm}4.1 $sign \leftarrow MP\_ZPOS$ \\
5. $a \leftarrow 0$ \\
6. for $iy$ from $ix$ to $sn - 1$ do \\
\hspace{3mm}6.1 Let $y$ denote the position in the map of $str_{iy}$. \\
\hspace{3mm}6.2 If $str_{iy}$ is not in the map or $y \ge r$ then goto step 7. \\
\hspace{3mm}6.3 $a \leftarrow a \cdot r$ \\
\hspace{3mm}6.4 $a \leftarrow a + y$ \\
7. If $a \ne 0$ then $a.sign \leftarrow sign$ \\
8. Return(\textit{MP\_OKAY}). \\
\hline
\end{tabular}
\end{center}
\end{small}
\caption{Algorithm mp\_read\_radix}
\end{figure}
\textbf{Algorithm mp\_read\_radix.}
This algorithm will read an ASCII string and produce the radix-$\beta$ mp\_int representation of the same integer. A minus symbol ``-'' may precede the
string to indicate the value is negative, otherwise it is assumed to be positive.
The algorithm will read up to $sn$ characters from the input
and will stop as soon as it reads a character it cannot map. This allows numbers to be embedded
as part of larger input without any significant problem.

EXAM,bn_mp_read_radix.c

\subsection{Generating Radix-$n$ Output}
Generating radix-$n$ output is fairly trivial with a division and remainder algorithm.

\newpage\begin{figure}[!here]
\begin{small}
\begin{center}
\begin{tabular}{l}
\hline Algorithm \textbf{mp\_toradix}. \\
\textbf{Input}. A mp\_int $a$ and an integer $r$\\
\textbf{Output}. The radix-$r$ representation of $a$ \\
\hline \\
1. If $r < 2$ or $r > 64$ return(\textit{MP\_VAL}). \\
2. If $a = 0$ then $str = $ ``$0$'' and return(\textit{MP\_OKAY}). \\
3. $t \leftarrow a$ \\
4. $str \leftarrow$ ``'' \\
5. if $t.sign = MP\_NEG$ then \\
\hspace{3mm}5.1 $str \leftarrow str + $ ``-'' \\
\hspace{3mm}5.2 $t.sign \leftarrow MP\_ZPOS$ \\
6. While ($t \ne 0$) do \\
\hspace{3mm}6.1 $d \leftarrow t \mbox{ (mod }r\mbox{)}$ \\
\hspace{3mm}6.2 $t \leftarrow \lfloor t / r \rfloor$ \\
\hspace{3mm}6.3 Look up $d$ in the map and store the equivalent character in $y$.
\\
\hspace{3mm}6.4 $str \leftarrow str + y$ \\
7. If $str_0 = $``$-$'' then \\
\hspace{3mm}7.1 Reverse the digits $str_1, str_2, \ldots str_n$. \\
8. Otherwise \\
\hspace{3mm}8.1 Reverse the digits $str_0, str_1, \ldots str_n$. \\
9. Return(\textit{MP\_OKAY}).\\
\hline
\end{tabular}
\end{center}
\end{small}
\caption{Algorithm mp\_toradix}
\end{figure}
\textbf{Algorithm mp\_toradix.}
This algorithm computes the radix-$r$ representation of an mp\_int $a$. The ``digits'' of the representation are extracted by reducing
the successive quotients $\lfloor a / r^k \rfloor$ of the input modulo $r$ until $r^k > a$. Note that instead of actually dividing by $r^k$ in
each iteration the quotient $\lfloor a / r \rfloor$ is saved for the next iteration. As a result a series of trivial $n \times 1$ divisions
are required instead of a series of $n \times k$ divisions. One design flaw of this approach is that the digits are produced in the reverse order
(see~\ref{fig:mpradix}). To remedy this flaw the digits must be swapped or simply ``reversed''.

\begin{figure}
\begin{center}
\begin{tabular}{|c|c|c|}
\hline \textbf{Value of $a$} & \textbf{Value of $d$} & \textbf{Value of $str$} \\
\hline $1234$ & -- & -- \\
\hline $123$ & $4$ & ``4'' \\
\hline $12$ & $3$ & ``43'' \\
\hline $1$ & $2$ & ``432'' \\
\hline $0$ & $1$ & ``4321'' \\
\hline
\end{tabular}
\end{center}
\caption{Example of Algorithm mp\_toradix.}
\label{fig:mpradix}
\end{figure}

EXAM,bn_mp_toradix.c

\chapter{Number Theoretic Algorithms}
This chapter discusses several fundamental number theoretic algorithms such as the greatest common divisor, least common multiple and Jacobi
symbol computation. These algorithms arise as essential components in several key cryptographic algorithms such as the RSA public key algorithm and
various sieve-based factoring algorithms.

\section{Greatest Common Divisor}
The greatest common divisor of two integers $a$ and $b$, often denoted as $(a, b)$, is the largest integer $k$ that is a divisor of
both $a$ and $b$. That is, $k$ is the largest integer such that $0 \equiv a \mbox{ (mod }k\mbox{)}$ and $0 \equiv b \mbox{ (mod }k\mbox{)}$ occur
simultaneously.

The most common approach (cite) is to reduce one input modulo another. That is, if $a$ and $b$ are divisible by some integer $k$ and if $qb + r = a$ then
$r$ is also divisible by $k$. The reduction pattern follows $\left < a , b \right > \rightarrow \left < b, a \mbox{ mod } b \right >$.

\newpage\begin{figure}[!here]
\begin{small}
\begin{center}
\begin{tabular}{l}
\hline Algorithm \textbf{Greatest Common Divisor (I)}. \\
\textbf{Input}. Two positive integers $a$ and $b$ greater than zero. \\
\textbf{Output}. The greatest common divisor $(a, b)$. \\
\hline \\
1. While ($b > 0$) do \\
\hspace{3mm}1.1 $r \leftarrow a \mbox{ (mod }b\mbox{)}$ \\
\hspace{3mm}1.2 $a \leftarrow b$ \\
\hspace{3mm}1.3 $b \leftarrow r$ \\
2. Return($a$). \\
\hline
\end{tabular}
\end{center}
\end{small}
\caption{Algorithm Greatest Common Divisor (I)}
\label{fig:gcd1}
\end{figure}

This algorithm will quickly converge on the greatest common divisor since the residue $r$ tends to diminish rapidly. However, divisions are
relatively expensive operations to perform and should ideally be avoided. There is another approach based on a similar relationship of
greatest common divisors.
The faster approach is based on the observation that if $k$ divides both $a$ and $b$ it will also divide $a - b$.
In particular, we would like $a - b$ to decrease in magnitude which implies that $b \ge a$.

\begin{figure}[!here]
\begin{small}
\begin{center}
\begin{tabular}{l}
\hline Algorithm \textbf{Greatest Common Divisor (II)}. \\
\textbf{Input}. Two positive integers $a$ and $b$ greater than zero. \\
\textbf{Output}. The greatest common divisor $(a, b)$. \\
\hline \\
1. While ($b > 0$) do \\
\hspace{3mm}1.1 Swap $a$ and $b$ such that $a$ is the smallest of the two. \\
\hspace{3mm}1.2 $b \leftarrow b - a$ \\
2. Return($a$). \\
\hline
\end{tabular}
\end{center}
\end{small}
\caption{Algorithm Greatest Common Divisor (II)}
\label{fig:gcd2}
\end{figure}

\textbf{Proof} \textit{Algorithm~\ref{fig:gcd2} will return the greatest common divisor of $a$ and $b$.}
The algorithm in figure~\ref{fig:gcd2} will eventually terminate: since $b \ge a$, the subtraction in step 1.2 produces a value less than $b$. In other
words, in every iteration the tuple $\left < a, b \right >$ decreases in magnitude until eventually $a = b$.
Since both $a$ and $b$ are always
divisible by the greatest common divisor (\textit{until the last iteration}) and in the last iteration of the algorithm $b = 0$, it follows that in the
second to last iteration of the algorithm $b = a$, and clearly $(a, a) = a$, which concludes the proof. \textbf{QED}.

As a matter of practicality algorithm \ref{fig:gcd2} decreases far too slowly to be useful, especially if $b$ is much larger than $a$ such that
$b - a$ is still very much larger than $a$. A simple addition to the algorithm is to divide $b - a$ by a power of some integer $p$ which does
not divide the greatest common divisor but will divide $b - a$. In this case ${{b - a} \over p}$ is also an integer and still divisible by
the greatest common divisor.

However, instead of factoring $b - a$ to find a suitable value of $p$, the powers of $p$ that $a$ and $b$ have in common can be removed first.
Then inside the loop whenever $b - a$ is divisible by some power of $p$ it can be safely removed.

\begin{figure}[!here]
\begin{small}
\begin{center}
\begin{tabular}{l}
\hline Algorithm \textbf{Greatest Common Divisor (III)}. \\
\textbf{Input}. Two positive integers $a$ and $b$ greater than zero. \\
\textbf{Output}. The greatest common divisor $(a, b)$. \\
\hline \\
1. $k \leftarrow 0$ \\
2.
While $a$ and $b$ are both divisible by $p$ do \\
\hspace{3mm}2.1 $a \leftarrow \lfloor a / p \rfloor$ \\
\hspace{3mm}2.2 $b \leftarrow \lfloor b / p \rfloor$ \\
\hspace{3mm}2.3 $k \leftarrow k + 1$ \\
3. While $a$ is divisible by $p$ do \\
\hspace{3mm}3.1 $a \leftarrow \lfloor a / p \rfloor$ \\
4. While $b$ is divisible by $p$ do \\
\hspace{3mm}4.1 $b \leftarrow \lfloor b / p \rfloor$ \\
5. While ($b > 0$) do \\
\hspace{3mm}5.1 Swap $a$ and $b$ such that $a$ is the smallest of the two. \\
\hspace{3mm}5.2 $b \leftarrow b - a$ \\
\hspace{3mm}5.3 While $b$ is divisible by $p$ do \\
\hspace{6mm}5.3.1 $b \leftarrow \lfloor b / p \rfloor$ \\
6. Return($a \cdot p^k$). \\
\hline
\end{tabular}
\end{center}
\end{small}
\caption{Algorithm Greatest Common Divisor (III)}
\label{fig:gcd3}
\end{figure}

This algorithm is based on the second except it removes powers of $p$ first and inside the main loop to ensure the tuple $\left < a, b \right >$
decreases more rapidly. The first loop on step two removes powers of $p$ that are in common. A count, $k$, is kept which will represent a common
divisor of $p^k$. After step two the remaining common divisor of $a$ and $b$ cannot be divisible by $p$. This means that $p$ can be safely
divided out of the difference $b - a$ so long as the division leaves no remainder.
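
As an illustration, Algorithm~\ref{fig:gcd3} can be sketched on 64-bit words. The name \texttt{gcd3} is hypothetical, and the sketch assumes that both inputs are positive and that $p$ is prime; for a composite $p$, step three could strip a factor that belongs to the greatest common divisor. The inner loop also tests for $b > 0$, because zero is divisible by every $p$ and the loop would otherwise never exit.

```c
#include <stdint.h>

/* Sketch of Algorithm Greatest Common Divisor (III) on machine words.
 * Assumptions: a > 0, b > 0, p is prime, and a * p^k fits in 64 bits. */
static uint64_t gcd3(uint64_t a, uint64_t b, uint64_t p)
{
    uint64_t pk = 1;    /* accumulates p^k, the common power of p */
    uint64_t t;

    /* step 2: remove the powers of p that a and b have in common */
    while (a % p == 0 && b % p == 0) {
        a /= p;
        b /= p;
        pk *= p;
    }
    /* steps 3 and 4: any remaining factor of p is not shared */
    while (a % p == 0) {
        a /= p;
    }
    while (b % p == 0) {
        b /= p;
    }

    /* step 5: subtract, then divide p out of the difference */
    while (b > 0) {
        if (a > b) {
            t = a;
            a = b;
            b = t;
        }
        b -= a;
        while (b > 0 && b % p == 0) {
            b /= p;
        }
    }
    return a * pk;      /* step 6 */
}
```

For example, \texttt{gcd3(12, 126, 2)} and \texttt{gcd3(12, 126, 3)} both return $6$; only the number of iterations depends on the choice of $p$.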

In particular the value of $p$ should be chosen such that the division on step 5.3.1 occurs often. It also helps if division by $p$ is easy
to compute. The ideal choice of $p$ is two since division by two amounts to a right logical shift. Another important observation is that by
step five both $a$ and $b$ are odd. Therefore, the difference $b - a$ must be even which means that each iteration removes at least one bit from the
largest of the pair.

\subsection{Complete Greatest Common Divisor}
The algorithms presented so far cannot handle inputs which are zero or negative. The following algorithm can handle all input cases properly
and will produce the greatest common divisor.

\newpage\begin{figure}[!here]
\begin{small}
\begin{center}
\begin{tabular}{l}
\hline Algorithm \textbf{mp\_gcd}. \\
\textbf{Input}. mp\_int $a$ and $b$ \\
\textbf{Output}. The greatest common divisor $c = (a, b)$. \\
\hline \\
1. If $a = 0$ then \\
\hspace{3mm}1.1 $c \leftarrow \vert b \vert $ \\
\hspace{3mm}1.2 Return(\textit{MP\_OKAY}). \\
2. If $b = 0$ then \\
\hspace{3mm}2.1 $c \leftarrow \vert a \vert $ \\
\hspace{3mm}2.2 Return(\textit{MP\_OKAY}). \\
3. $u \leftarrow \vert a \vert, v \leftarrow \vert b \vert$ \\
4. $k \leftarrow 0$ \\
5.
While $u.used > 0$ and $v.used > 0$ and $u_0 \equiv v_0 \equiv 0 \mbox{ (mod }2\mbox{)}$ \\
\hspace{3mm}5.1 $k \leftarrow k + 1$ \\
\hspace{3mm}5.2 $u \leftarrow \lfloor u / 2 \rfloor$ \\
\hspace{3mm}5.3 $v \leftarrow \lfloor v / 2 \rfloor$ \\
6. While $u.used > 0$ and $u_0 \equiv 0 \mbox{ (mod }2\mbox{)}$ \\
\hspace{3mm}6.1 $u \leftarrow \lfloor u / 2 \rfloor$ \\
7. While $v.used > 0$ and $v_0 \equiv 0 \mbox{ (mod }2\mbox{)}$ \\
\hspace{3mm}7.1 $v \leftarrow \lfloor v / 2 \rfloor$ \\
8. While $v.used > 0$ \\
\hspace{3mm}8.1 If $\vert u \vert > \vert v \vert$ then \\
\hspace{6mm}8.1.1 Swap $u$ and $v$. \\
\hspace{3mm}8.2 $v \leftarrow \vert v \vert - \vert u \vert$ \\
\hspace{3mm}8.3 While $v.used > 0$ and $v_0 \equiv 0 \mbox{ (mod }2\mbox{)}$ \\
\hspace{6mm}8.3.1 $v \leftarrow \lfloor v / 2 \rfloor$ \\
9. $c \leftarrow u \cdot 2^k$ \\
10. Return(\textit{MP\_OKAY}). \\
\hline
\end{tabular}
\end{center}
\end{small}
\caption{Algorithm mp\_gcd}
\end{figure}
\textbf{Algorithm mp\_gcd.}
This algorithm will produce the greatest common divisor of two mp\_ints $a$ and $b$. The algorithm was originally based on Algorithm B of
Knuth \cite[pp. 338]{TAOCPV2} but has been modified to be simpler to explain. In theory it achieves the same asymptotic working time as
Algorithm B and in practice this appears to be true.

The first two steps handle the cases where one or both of the inputs are zero. If either input is zero the greatest common divisor is the
absolute value of the other input, which is zero if they are both zero. If the inputs are not trivial then $u$ and $v$ are assigned the absolute values of
$a$ and $b$ respectively and the algorithm will proceed to reduce the pair.

Step five will divide out any common factors of two and keep track of the count in the variable $k$. After this step, two is no longer a
factor of the remaining greatest common divisor between $u$ and $v$ and can be safely divided out of either whenever they are even. Steps
six and seven ensure that $u$ and $v$ respectively have no more factors of two. At most one of the while--loops will iterate since
they cannot both be even.

By step eight both $u$ and $v$ are odd, which is required for the inner logic. First the pair are swapped such that $v$ is equal to
or greater than $u$. This ensures that the subtraction on step 8.2 will always produce a non-negative and even result. Step 8.3 removes any
factors of two from the difference $v$ to ensure that in the next iteration of the loop both are once again odd.

After $v = 0$ occurs the variable $u$ holds the greatest common divisor of the pair $\left < u, v \right >$ just after step six. The result
must be adjusted by multiplying by the common factors of two ($2^k$) removed earlier.
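
The same steps can be traced on machine integers. The sketch below mirrors the structure of algorithm mp\_gcd with $p = 2$ on 64-bit inputs; the names \texttt{abs\_u64} and \texttt{gcd\_binary} are hypothetical and chosen only for this illustration, and shifts take the place of the multiple precision divisions by two.

```c
#include <stdint.h>

/* magnitude of a signed value, computed without signed overflow */
static uint64_t abs_u64(int64_t x)
{
    return (x < 0) ? (0 - (uint64_t)x) : (uint64_t)x;
}

/* Sketch of algorithm mp_gcd on 64-bit integers (hypothetical helper).
 * Zero and negative inputs are accepted; the result is non-negative.
 * Assumes the result fits in 64 bits, which holds for every input pair
 * because the greatest common divisor never exceeds either magnitude. */
static uint64_t gcd_binary(int64_t a, int64_t b)
{
    uint64_t u = abs_u64(a), v = abs_u64(b), t;
    unsigned k = 0;

    /* steps 1 and 2: trivial inputs */
    if (u == 0) {
        return v;
    }
    if (v == 0) {
        return u;
    }

    /* step 5: divide out and count the common factors of two */
    while (((u | v) & 1) == 0) {
        u >>= 1;
        v >>= 1;
        ++k;
    }
    /* steps 6 and 7: remove the remaining factors of two; at most one
     * of these loops iterates */
    while ((u & 1) == 0) {
        u >>= 1;
    }
    while ((v & 1) == 0) {
        v >>= 1;
    }

    /* step 8: both values are odd, so the difference is even */
    while (v != 0) {
        if (u > v) {
            t = u;
            u = v;
            v = t;
        }
        v -= u;
        while (v != 0 && (v & 1) == 0) {
            v >>= 1;
        }
    }
    return u << k;      /* step 9: restore the common factors of two */
}
```

With these definitions \texttt{gcd\_binary(-12, 126)} returns $6$ and \texttt{gcd\_binary(0, -7)} returns $7$, which illustrates the handling of the zero and negative inputs that the earlier algorithms exclude.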

EXAM,bn_mp_gcd.c

This function makes use of the macros mp\_iszero and mp\_iseven. The former evaluates to $1$ if the input mp\_int is equivalent to the
integer zero otherwise it evaluates to $0$. The latter evaluates to $1$ if the input mp\_int represents a non-zero even integer otherwise
it evaluates to $0$. Note that just because mp\_iseven may evaluate to $0$ does not mean the input is odd, it could also be zero. The three
trivial cases of inputs are handled on lines @23,zero@ through @29,}@. After those lines the inputs are assumed to be non-zero.

Lines @32,if@ and @36,if@ make local copies $u$ and $v$ of the inputs $a$ and $b$ respectively. At this point the common factors of two
must be divided out of the two inputs. The block starting at line @43,common@ removes common factors of two by first counting the number of trailing
zero bits in both. The local integer $k$ is used to keep track of how many factors of $2$ are pulled out of both values. It is assumed that
the number of factors will not exceed the maximum value of a C ``int'' data type\footnote{Strictly speaking no array in C may have more
entries than are accessible by an ``int'' so this is not a limitation.}.

At this point there are no more common factors of two in the two values. The divisions by a power of two on lines @60,div_2d@ and @67,div_2d@ remove
any independent factors of two such that both $u$ and $v$ are guaranteed to be odd integers before hitting the main body of the algorithm.
The while loop
on line @72, while@ performs the reduction of the pair until $v$ is equal to zero. The unsigned comparison and subtraction algorithms are used in
place of the full signed routines since both values are guaranteed to be positive and the result of the subtraction is guaranteed to be non-negative.

\section{Least Common Multiple}
The least common multiple of a pair of integers is their product divided by their greatest common divisor. For two integers $a$ and $b$ the
least common multiple is normally denoted as $[ a, b ]$ and numerically equivalent to ${{ab} \over {(a, b)}}$. For example, if $a = 2 \cdot 2 \cdot 3 = 12$
and $b = 2 \cdot 3 \cdot 3 \cdot 7 = 126$ the least common multiple is ${{12 \cdot 126} \over {(12, 126)}} = {1512 \over 6} = 252$.

The least common multiple arises often in coding theory as well as number theory. If two functions have periods of $a$ and $b$ respectively they will
collide, that is be in synchronous states, after only $[ a, b ]$ iterations. This is why, for example, random number generators based on
Linear Feedback Shift Registers (LFSR) tend to use registers with periods which are co-prime (\textit{i.e. the greatest common divisor is one.}).
Similarly in number theory if a composite $n$ has two prime factors $p$ and $q$ then the maximal order of any unit of $\Z/n\Z$ will be $[ p - 1, q - 1] $.

\begin{figure}[!here]
\begin{small}
\begin{center}
\begin{tabular}{l}
\hline Algorithm \textbf{mp\_lcm}. \\
\textbf{Input}.
mp\_int $a$ and $b$ \\
\textbf{Output}. The least common multiple $c = [a, b]$. \\
\hline \\
1. $c \leftarrow (a, b)$ \\
2. $t \leftarrow a \cdot b$ \\
3. $c \leftarrow \lfloor t / c \rfloor$ \\
4. Return(\textit{MP\_OKAY}). \\
\hline
\end{tabular}
\end{center}
\end{small}
\caption{Algorithm mp\_lcm}
\end{figure}
\textbf{Algorithm mp\_lcm.}
This algorithm computes the least common multiple of two mp\_int inputs $a$ and $b$. It computes the least common multiple directly by
dividing the product of the two inputs by their greatest common divisor.

EXAM,bn_mp_lcm.c

\section{Jacobi Symbol Computation}
To explain the Jacobi Symbol we shall first discuss the Legendre function\footnote{Arrg. What is the name of this?} on which the Jacobi symbol is
defined. The Legendre function computes whether or not an integer $a$ is a quadratic residue modulo an odd prime $p$. Numerically it is
equivalent to equation \ref{eqn:legendre}.

\textit{-- Tom, don't be an ass, cite your source here...!}

\begin{equation}
a^{(p-1)/2} \equiv \begin{array}{rl}
 -1 & \mbox{if }a\mbox{ is a quadratic non-residue.} \\
 0 & \mbox{if }p\mbox{ divides }a\mbox{.} \\
 1 & \mbox{if }a\mbox{ is a quadratic residue}.
 \end{array} \mbox{ (mod }p\mbox{)}
\label{eqn:legendre}
\end{equation}

\textbf{Proof.} \textit{Equation \ref{eqn:legendre} correctly identifies the residue status of an integer $a$ modulo a prime $p$.}
An integer $a$ is a quadratic residue if the following equation has a solution.

\begin{equation}
x^2 \equiv a \mbox{ (mod }p\mbox{)}
\label{eqn:root}
\end{equation}

Consider the following equation.

\begin{equation}
0 \equiv x^{p-1} - 1 \equiv \left \lbrace \left (x^2 \right )^{(p-1)/2} - a^{(p-1)/2} \right \rbrace + \left ( a^{(p-1)/2} - 1 \right ) \mbox{ (mod }p\mbox{)}
\label{eqn:rooti}
\end{equation}

Whether equation \ref{eqn:root} has a solution or not, equation \ref{eqn:rooti} is always true. If $a^{(p-1)/2} - 1 \equiv 0 \mbox{ (mod }p\mbox{)}$
then the quantity in the braces must be zero.
By reduction,

\begin{eqnarray}
\left (x^2 \right )^{(p-1)/2} - a^{(p-1)/2} \equiv 0 \nonumber \\
\left (x^2 \right )^{(p-1)/2} \equiv a^{(p-1)/2} \nonumber \\
x^2 \equiv a \mbox{ (mod }p\mbox{)}
\end{eqnarray}

As a result there must be a solution to the quadratic equation and in turn $a$ must be a quadratic residue. If $p$ does not divide $a$ and $a$
is not a quadratic residue then the only other value $a^{(p-1)/2}$ may be congruent to is $-1$ since
\begin{equation}
0 \equiv a^{p - 1} - 1 \equiv (a^{(p-1)/2} + 1)(a^{(p-1)/2} - 1) \mbox{ (mod }p\mbox{)}
\end{equation}
One of the terms on the right hand side must be zero. \textbf{QED}

\subsection{Jacobi Symbol}
The Jacobi symbol is a generalization of the Legendre function to any odd modulus $p$ greater than 2, prime or not. If $p = \prod_{i=0}^n p_i$,
where the $p_i$ are primes that need not be distinct, then
the Jacobi symbol $\left ( { a \over p } \right )$ is equal to the following equation.

\begin{equation}
\left ( { a \over p } \right ) = \left ( { a \over p_0} \right ) \left ( { a \over p_1} \right ) \ldots \left ( { a \over p_n} \right )
\end{equation}

By inspection if $p$ is prime the Jacobi symbol is equivalent to the Legendre function. The following facts\footnote{See HAC \cite[pp. 72-74]{HAC} for
further details.} will be used to derive an efficient Jacobi symbol algorithm.
Where $p$ is an odd integer greater than two and $a, b \in \Z$ the
following are true.

\begin{enumerate}
\item $\left ( { a \over p} \right )$ equals $-1$, $0$ or $1$.
\item $\left ( { ab \over p} \right ) = \left ( { a \over p} \right )\left ( { b \over p} \right )$.
\item If $a \equiv b \mbox{ (mod }p\mbox{)}$ then $\left ( { a \over p} \right ) = \left ( { b \over p} \right )$.
\item $\left ( { 2 \over p} \right )$ equals $1$ if $p \equiv 1$ or $7 \mbox{ (mod }8\mbox{)}$. Otherwise, it equals $-1$.
\item $\left ( { a \over p} \right ) = \left ( { p \over a} \right ) \cdot (-1)^{(p-1)(a-1)/4}$. More specifically
$\left ( { a \over p} \right ) = \left ( { p \over a} \right )$ if $p \equiv a \equiv 1 \mbox{ (mod }4\mbox{)}$.
\end{enumerate}

Using these facts if $a = 2^k \cdot a'$ then

\begin{eqnarray}
\left ( { a \over p } \right ) = \left ( {{2^k} \over p } \right ) \left ( {a' \over p} \right ) \nonumber \\
 = \left ( {2 \over p } \right )^k \left ( {a' \over p} \right )
\label{eqn:jacobi}
\end{eqnarray}

By fact five,

\begin{equation}
\left ( { a \over p } \right ) = \left ( { p \over a } \right ) \cdot (-1)^{(p-1)(a-1)/4}
\end{equation}

Subsequently by fact three since $p \equiv (p \mbox{ mod }a) \mbox{ (mod }a\mbox{)}$ then

\begin{equation}
\left ( { a \over p } \right ) = \left ( { {p \mbox{ mod } a} \over a } \right ) \cdot (-1)^{(p-1)(a-1)/4}
\end{equation}

By putting both observations into equation \ref{eqn:jacobi} the following simplified equation is formed.

\begin{equation}
\left ( { a \over p } \right ) = \left ( {2 \over p } \right )^k \left ( {{p\mbox{ mod }a'} \over a'} \right ) \cdot (-1)^{(p-1)(a'-1)/4}
\end{equation}

The value of $\left ( {{p \mbox{ mod }a'} \over a'} \right )$ can be found by using the same equation recursively. The value of
$\left ( {2 \over p } \right )^k$ equals $1$ if $k$ is even otherwise it equals $\left ( {2 \over p } \right )$. Using this approach the
factors of $p$ do not have to be known. Furthermore, if $(a, p) = 1$ then the algorithm will terminate when the recursion requests the
Jacobi symbol computation of $\left ( {1 \over a'} \right )$ which is simply $1$.

\newpage\begin{figure}[!here]
\begin{small}
\begin{center}
\begin{tabular}{l}
\hline Algorithm \textbf{mp\_jacobi}. \\
\textbf{Input}. mp\_int $a$ and $p$, $a \ge 0$, $p \ge 3$, $p \equiv 1 \mbox{ (mod }2\mbox{)}$ \\
\textbf{Output}. The Jacobi symbol $c = \left ( {a \over p } \right )$. \\
\hline \\
1. If $a = 0$ then \\
\hspace{3mm}1.1 $c \leftarrow 0$ \\
\hspace{3mm}1.2 Return(\textit{MP\_OKAY}).
\\
2. If $a = 1$ then \\
\hspace{3mm}2.1 $c \leftarrow 1$ \\
\hspace{3mm}2.2 Return(\textit{MP\_OKAY}). \\
3. $a' \leftarrow a$ \\
4. $k \leftarrow 0$ \\
5. While $a'.used > 0$ and $a'_0 \equiv 0 \mbox{ (mod }2\mbox{)}$ \\
\hspace{3mm}5.1 $k \leftarrow k + 1$ \\
\hspace{3mm}5.2 $a' \leftarrow \lfloor a' / 2 \rfloor$ \\
6. If $k \equiv 0 \mbox{ (mod }2\mbox{)}$ then \\
\hspace{3mm}6.1 $s \leftarrow 1$ \\
7. else \\
\hspace{3mm}7.1 $r \leftarrow p_0 \mbox{ (mod }8\mbox{)}$ \\
\hspace{3mm}7.2 If $r = 1$ or $r = 7$ then \\
\hspace{6mm}7.2.1 $s \leftarrow 1$ \\
\hspace{3mm}7.3 else \\
\hspace{6mm}7.3.1 $s \leftarrow -1$ \\
8. If $p_0 \equiv a'_0 \equiv 3 \mbox{ (mod }4\mbox{)}$ then \\
\hspace{3mm}8.1 $s \leftarrow -s$ \\
9. If $a' \ne 1$ then \\
\hspace{3mm}9.1 $p' \leftarrow p \mbox{ (mod }a'\mbox{)}$ \\
\hspace{3mm}9.2 $s \leftarrow s \cdot \mbox{mp\_jacobi}(p', a')$ \\
10. $c \leftarrow s$ \\
11. Return(\textit{MP\_OKAY}).
\\
\hline
\end{tabular}
\end{center}
\end{small}
\caption{Algorithm mp\_jacobi}
\end{figure}
\textbf{Algorithm mp\_jacobi.}
This algorithm computes the Jacobi symbol for an arbitrary non-negative integer $a$ with respect to an odd integer $p$ of at least three. The algorithm
is based on algorithm 2.149 of HAC \cite[pp. 73]{HAC}.

Step numbers one and two handle the trivial cases of $a = 0$ and $a = 1$ respectively. Step five determines the number of factors of two in the
input $a$. If $k$ is even then the term $\left ( { 2 \over p } \right )^k$ must always evaluate to one. If $k$ is odd then the term evaluates to one
if $p_0$ is congruent to one or seven modulo eight, otherwise it evaluates to $-1$. After the $\left ( { 2 \over p } \right )^k$ term is handled
the $(-1)^{(p-1)(a'-1)/4}$ term is computed and multiplied against the current product $s$. The latter term evaluates to negative one if both $p$ and $a'$
are congruent to three modulo four, otherwise it evaluates to one.

By step nine if $a'$ does not equal one a recursion is required. Step 9.1 computes $p' \equiv p \mbox{ (mod }a'\mbox{)}$ and will recurse to compute
$\left ( {p' \over a'} \right )$ which is multiplied against the current Jacobi product.
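
To make the recursion concrete, the steps of algorithm mp\_jacobi can be sketched on 64-bit words. The name \texttt{jacobi\_small} is hypothetical and used only here; the sketch assumes that $p$ is odd and at least three, as the algorithm requires, and it does not check that precondition.

```c
#include <stdint.h>

/* Sketch of algorithm mp_jacobi on machine words (hypothetical helper).
 * Assumes p is odd and p >= 3.  Returns -1, 0 or 1. */
static int jacobi_small(uint64_t a, uint64_t p)
{
    uint64_t a1 = a;    /* plays the role of a' in the pseudo-code */
    unsigned k = 0;
    unsigned r;
    int s;

    /* steps 1 and 2: trivial cases */
    if (a == 0) {
        return 0;
    }
    if (a == 1) {
        return 1;
    }

    /* step 5: divide out and count the factors of two */
    while ((a1 & 1) == 0) {
        a1 >>= 1;
        ++k;
    }

    /* steps 6 and 7: value of (2/p)^k */
    if ((k & 1) == 0) {
        s = 1;
    } else {
        r = (unsigned)(p & 7);
        s = (r == 1 || r == 7) ? 1 : -1;
    }

    /* step 8: the sign flips only when both are 3 modulo 4 */
    if ((p & 3) == 3 && (a1 & 3) == 3) {
        s = -s;
    }

    /* step 9: recurse on (p mod a' over a'); a1 is odd and at least 3 */
    if (a1 != 1) {
        s *= jacobi_small(p % a1, a1);
    }
    return s;
}
```

For instance \texttt{jacobi\_small(2, 15)} returns $1$ even though $2$ is not a square modulo $15$, which shows that for a composite modulus the Jacobi symbol does not by itself decide residuosity. When $a$ and $p$ share a factor the recursion eventually reaches a zero numerator and the result is $0$.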

EXAM,bn_mp_jacobi.c

As a matter of practicality the variable $a'$ as per the pseudo-code is represented by the variable $a1$ since the $'$ symbol is not a valid
character in a C variable name.

The two simple cases of $a = 0$ and $a = 1$ are handled at the very beginning to simplify the algorithm. If the input is non-trivial the algorithm
has to proceed to compute the Jacobi symbol. The variable $s$ is used to hold the current Jacobi product. Note that $s$ is merely a C ``int'' data type since
the values it may take are merely $-1$, $0$ and $1$.

After a local copy of $a$ is made all of the factors of two are divided out and the total stored in $k$. Technically only the least significant
bit of $k$ is required, however, it makes the algorithm simpler to follow to perform an addition. In practice an exclusive-or and addition have the same
processor requirements and neither is faster than the other.

Lines @59, if@ through @70, }@ determine the value of $\left ( { 2 \over p } \right )^k$. If the least significant bit of $k$ is zero then
$k$ is even and the value is one. Otherwise, the value of $s$ depends on which residue class $p$ belongs to modulo eight. The value of
$(-1)^{(p-1)(a'-1)/4}$ is computed and multiplied against $s$ on lines @73, if@ through @75, }@.

Finally, if $a1$ does not equal one the algorithm must recurse and compute $\left ( {p' \over a'} \right )$.

\textit{-- Comment about default $s$ and such...}

\section{Modular Inverse}
\label{sec:modinv}
The modular inverse of a number actually refers to the modular multiplicative inverse. Essentially for any integer $a$ such that $(a, p) = 1$ there
exists another integer $b$ such that $ab \equiv 1 \mbox{ (mod }p\mbox{)}$. The integer $b$ is called the multiplicative inverse of $a$ which is
denoted as $b = a^{-1}$. Technically speaking modular inversion is a well defined operation for any finite ring or field not just for rings and
fields of integers. However, only the integer case will be the matter of discussion.

The simplest approach is to compute the algebraic inverse of the input. That is to compute $b \equiv a^{\Phi(p) - 1}$. If $\Phi(p)$ is the
order of the multiplicative subgroup modulo $p$ then $b$ must be the multiplicative inverse of $a$. The proof of which is trivial.

\begin{equation}
ab \equiv a \left (a^{\Phi(p) - 1} \right ) \equiv a^{\Phi(p)} \equiv a^0 \equiv 1 \mbox{ (mod }p\mbox{)}
\end{equation}

However, as simple as this approach may be it has two serious flaws. It requires that the value of $\Phi(p)$ be known which if $p$ is composite
requires all of the prime factors. This approach also is very slow as the size of $p$ grows.
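
For a prime modulus the exponent is easy to state, since $\Phi(p) = p - 1$ and therefore $b \equiv a^{p - 2} \mbox{ (mod }p\mbox{)}$. The sketch below shows this special case on machine words. The names \texttt{powmod\_small} and \texttt{invmod\_fermat} are hypothetical, and the code assumes that $p$ is prime, that $p < 2^{32}$ so every product fits in 64 bits, and that $p$ does not divide $a$.

```c
#include <stdint.h>

/* Square-and-multiply exponentiation (hypothetical helper).
 * Assumes 0 < m < 2^32 so that no product overflows 64 bits. */
static uint64_t powmod_small(uint64_t base, uint64_t e, uint64_t m)
{
    uint64_t result = 1 % m;

    base %= m;
    while (e > 0) {
        if (e & 1) {
            result = (result * base) % m;
        }
        base = (base * base) % m;
        e >>= 1;
    }
    return result;
}

/* Inverse of a modulo a prime p, computed as a^(Phi(p) - 1) = a^(p - 2).
 * Assumes p is prime, p < 2^32 and p does not divide a. */
static uint64_t invmod_fermat(uint64_t a, uint64_t p)
{
    return powmod_small(a, p - 2, p);
}
```

As an example \texttt{invmod\_fermat(3, 7)} returns $5$ since $3 \cdot 5 = 15 \equiv 1 \mbox{ (mod }7\mbox{)}$. Even in this favourable case one inversion costs a full modular exponentiation, which is the second flaw mentioned above.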

A more practical approach is based on the observation that solving for the multiplicative inverse is equivalent to solving the linear
Diophantine\footnote{See LeVeque \cite[pp. 40-43]{LeVeque} for more information.} equation.

\begin{equation}
ab + pq = 1
\end{equation}

Where $a$, $b$, $p$ and $q$ are all integers. If such a pair of integers $ \left < b, q \right >$ exists then $b$ is the multiplicative inverse of
$a$ modulo $p$. The extended Euclidean algorithm (Knuth \cite[p. 342]{TAOCPV2}) can be used to solve such equations provided $(a, p) = 1$.
However, instead of using that algorithm directly a variant known as the binary Extended Euclidean algorithm will be used in its place. The
binary approach is very similar to the binary greatest common divisor algorithm except it will produce a full solution to the Diophantine
equation.

\subsection{General Case}
\newpage\begin{figure}[!here]
\begin{small}
\begin{center}
\begin{tabular}{l}
\hline Algorithm \textbf{mp\_invmod}. \\
\textbf{Input}. mp\_int $a$ and $b$, $(a, b) = 1$, $b \ge 2$, $0 < a < b$. \\
\textbf{Output}. The modular inverse $c \equiv a^{-1} \mbox{ (mod }b\mbox{)}$. \\
\hline \\
1. If $b \le 0$ then return(\textit{MP\_VAL}). \\
2. If $b_0 \equiv 1 \mbox{ (mod }2\mbox{)}$ then use algorithm fast\_mp\_invmod. \\
3.
$x \leftarrow \vert a \vert, y \leftarrow b$ \\
4. If $x_0 \equiv y_0 \equiv 0 \mbox{ (mod }2\mbox{)}$ then return(\textit{MP\_VAL}). \\
5. $u \leftarrow x, v \leftarrow y, A \leftarrow 1, B \leftarrow 0, C \leftarrow 0, D \leftarrow 1$ \\
6. While $u.used > 0$ and $u_0 \equiv 0 \mbox{ (mod }2\mbox{)}$ \\
\hspace{3mm}6.1 $u \leftarrow \lfloor u / 2 \rfloor$ \\
\hspace{3mm}6.2 If ($A.used > 0$ and $A_0 \equiv 1 \mbox{ (mod }2\mbox{)}$) or ($B.used > 0$ and $B_0 \equiv 1 \mbox{ (mod }2\mbox{)}$) then \\
\hspace{6mm}6.2.1 $A \leftarrow A + y$ \\
\hspace{6mm}6.2.2 $B \leftarrow B - x$ \\
\hspace{3mm}6.3 $A \leftarrow \lfloor A / 2 \rfloor$ \\
\hspace{3mm}6.4 $B \leftarrow \lfloor B / 2 \rfloor$ \\
7. While $v.used > 0$ and $v_0 \equiv 0 \mbox{ (mod }2\mbox{)}$ \\
\hspace{3mm}7.1 $v \leftarrow \lfloor v / 2 \rfloor$ \\
\hspace{3mm}7.2 If ($C.used > 0$ and $C_0 \equiv 1 \mbox{ (mod }2\mbox{)}$) or ($D.used > 0$ and $D_0 \equiv 1 \mbox{ (mod }2\mbox{)}$) then \\
\hspace{6mm}7.2.1 $C \leftarrow C + y$ \\
\hspace{6mm}7.2.2 $D \leftarrow D - x$ \\
\hspace{3mm}7.3 $C \leftarrow \lfloor C / 2 \rfloor$ \\
\hspace{3mm}7.4 $D \leftarrow \lfloor D / 2 \rfloor$ \\
8. If $u \ge v$ then \\
\hspace{3mm}8.1 $u \leftarrow u - v$ \\
\hspace{3mm}8.2 $A \leftarrow A - C$ \\
\hspace{3mm}8.3 $B \leftarrow B - D$ \\
9.
else \\
\hspace{3mm}9.1 $v \leftarrow v - u$ \\
\hspace{3mm}9.2 $C \leftarrow C - A$ \\
\hspace{3mm}9.3 $D \leftarrow D - B$ \\
10. If $u \ne 0$ goto step 6. \\
11. If $v \ne 1$ return(\textit{MP\_VAL}). \\
12. While $C \le 0$ do \\
\hspace{3mm}12.1 $C \leftarrow C + b$ \\
13. While $C \ge b$ do \\
\hspace{3mm}13.1 $C \leftarrow C - b$ \\
14. $c \leftarrow C$ \\
15. Return(\textit{MP\_OKAY}). \\
\hline
\end{tabular}
\end{center}
\end{small}
\end{figure}
\textbf{Algorithm mp\_invmod.}
This algorithm computes the modular multiplicative inverse of an integer $a$ modulo an integer $b$. This algorithm is a variation of the
extended binary Euclidean algorithm from HAC \cite[pp. 608]{HAC}. It has been modified to only compute the modular inverse and not a complete
Diophantine solution.

If $b \le 0$ then the modulus is invalid and MP\_VAL is returned. Similarly if both $a$ and $b$ are even then there cannot be a multiplicative
inverse for $a$ and the error is reported.

The astute reader will observe that steps six through nine are very similar to the binary greatest common divisor algorithm mp\_gcd. In this case
the other variables of the Diophantine equation are solved as well.
The algorithm terminates when $u = 0$ in which case the solution is

\begin{equation}
Ca + Db = v
\end{equation}

If $v$, the greatest common divisor of $a$ and $b$, is not equal to one then the algorithm will report an error as no inverse exists. Otherwise, $C$
is the modular inverse of $a$. The actual value of $C$ is congruent to, but not necessarily equal to, the ideal modular inverse which should lie
within $1 \le a^{-1} < b$. Steps twelve and thirteen adjust the inverse until it is in range. If the original input $a$ is within $0 < a < b$
then only a couple of additions or subtractions will be required to adjust the inverse.

EXAM,bn_mp_invmod.c

\subsubsection{Odd Moduli}

When the modulus $b$ is odd the variables $A$ and $C$ are fixed and are not required to compute the inverse. In particular by attempting to solve
the Diophantine $Cb + Da = 1$ only $B$ and $D$ are required to find the inverse of $a$.

The algorithm fast\_mp\_invmod is a direct adaptation of algorithm mp\_invmod with all steps involving either $A$ or $C$ removed. This
optimization will roughly halve the time required to compute the modular inverse.

\section{Primality Tests}

An integer $a > 1$ is said to be prime if it is not divisible by any positive integer other than one and itself.
For example, $a = 7$ is prime
since the integers $2 \ldots 6$ do not evenly divide $a$. By contrast, $a = 6$ is not prime since $a = 6 = 2 \cdot 3$.

Prime numbers arise frequently in cryptography as they allow finite fields to be formed. The ability to quickly determine whether an integer is prime or
not has been an active subject in cryptography and number theory for a considerable time. The algorithms that will be presented are all
probabilistic algorithms in that when they report an integer is composite it must be composite. However, when the algorithms report an integer is
prime the algorithm may be incorrect.

As will be discussed it is possible to limit the probability of error so well that for practical purposes the probability of error might as
well be zero. For the purposes of these discussions let $n$ represent the candidate integer of which the primality is in question.

\subsection{Trial Division}

Trial division means to attempt to evenly divide a candidate integer by small prime integers. If the candidate can be evenly divided it obviously
cannot be prime. By dividing by all primes $1 < p \le \sqrt{n}$ this test can actually prove whether an integer is prime. However, such a test
would require a prohibitive amount of time as $n$ grows.

Instead of dividing by every prime, a smaller, more manageable set of primes may be used. By performing trial division with only a subset
of the primes less than $\sqrt{n} + 1$ the algorithm cannot prove if a candidate is prime.
However, often it can prove a candidate is not prime.

The benefit of this test is that trial division by small values is fairly efficient, especially when compared to the other algorithms that will be
discussed shortly. The probability that this approach correctly identifies a composite candidate when tested with all primes up to $q$ is given by
$1 - {1.12 \over \ln(q)}$. The graph (\ref{pic:primality}, will be added later) demonstrates the probability of success for the range
$3 \le q \le 100$.

At approximately $q = 30$ the gain of performing further tests diminishes fairly quickly. At $q = 90$ further testing is generally not going to
be of any practical use. In the case of LibTomMath the default limit $q = 256$ was chosen since it is not too high and will eliminate
approximately $80\%$ of all candidate integers. The constant \textbf{PRIME\_SIZE} is equal to the number of primes in the test base. The
array \_\_prime\_tab is an array of the first \textbf{PRIME\_SIZE} prime numbers.

\begin{figure}[!here]
\begin{small}
\begin{center}
\begin{tabular}{l}
\hline Algorithm \textbf{mp\_prime\_is\_divisible}. \\
\textbf{Input}. mp\_int $n$ \\
\textbf{Output}. $c = 1$ if $n$ is divisible by a small prime, otherwise $c = 0$. \\
\hline \\
1.
for $ix$ from $0$ to $PRIME\_SIZE - 1$ do \\
\hspace{3mm}1.1 $d \leftarrow n \mbox{ (mod }\_\_prime\_tab_{ix}\mbox{)}$ \\
\hspace{3mm}1.2 If $d = 0$ then \\
\hspace{6mm}1.2.1 $c \leftarrow 1$ \\
\hspace{6mm}1.2.2 Return(\textit{MP\_OKAY}). \\
2. $c \leftarrow 0$ \\
3. Return(\textit{MP\_OKAY}). \\
\hline
\end{tabular}
\end{center}
\end{small}
\caption{Algorithm mp\_prime\_is\_divisible}
\end{figure}
\textbf{Algorithm mp\_prime\_is\_divisible.}
This algorithm attempts to determine if a candidate integer $n$ is composite by performing trial divisions.

EXAM,bn_mp_prime_is_divisible.c

The algorithm defaults to a return of $0$ in case an error occurs. The values in the prime table are all specified to be in the range of a
mp\_digit. The table \_\_prime\_tab is defined in the following file.

EXAM,bn_prime_tab.c

Note that there are two possible tables. When an mp\_digit is 7-bits long only the primes up to $127$ may be included, otherwise the primes
up to $1619$ are used. Note that the value of \textbf{PRIME\_SIZE} is a constant dependent on the size of a mp\_digit.

\subsection{The Fermat Test}
The Fermat test is probably one of the oldest tests to have a non-trivial probability of success.
It is based on the fact that if $n$ is
prime then $a^{n} \equiv a \mbox{ (mod }n\mbox{)}$ for all $0 < a < n$. The reason being that if $n$ is prime then the order of
the multiplicative subgroup is $n - 1$. Any base $a$ must have an order which divides $n - 1$ and as such $a^n$ is equivalent to
$a^1 = a$.

If $n$ is composite then any given base $a$ does not have to have a period which divides $n - 1$, in which case
it is possible that $a^n \nequiv a \mbox{ (mod }n\mbox{)}$. However, this test is not absolute as it is possible that the order
of a base will divide $n - 1$ which would then be reported as prime. Such a base yields what is known as a Fermat pseudo-prime. Certain
integers known as Carmichael numbers are pseudo-primes to all valid bases. Fortunately such numbers are extremely rare as $n$ grows
in size.

\begin{figure}[!here]
\begin{small}
\begin{center}
\begin{tabular}{l}
\hline Algorithm \textbf{mp\_prime\_fermat}. \\
\textbf{Input}. mp\_int $a$ and $b$, $a \ge 2$, $0 < b < a$. \\
\textbf{Output}. $c = 1$ if $b^a \equiv b \mbox{ (mod }a\mbox{)}$, otherwise $c = 0$. \\
\hline \\
1. $t \leftarrow b^a \mbox{ (mod }a\mbox{)}$ \\
2. If $t = b$ then \\
\hspace{3mm}2.1 $c \leftarrow 1$ \\
3. else \\
\hspace{3mm}3.1 $c \leftarrow 0$ \\
4. Return(\textit{MP\_OKAY}).
\\
\hline
\end{tabular}
\end{center}
\end{small}
\caption{Algorithm mp\_prime\_fermat}
\end{figure}
\textbf{Algorithm mp\_prime\_fermat.}
This algorithm determines whether an mp\_int $a$ passes the Fermat test to the base $b$ or not. It uses a single modular exponentiation to
determine the result.

EXAM,bn_mp_prime_fermat.c

\subsection{The Miller-Rabin Test}
The Miller-Rabin (citation) test is another primality test which has tighter error bounds than the Fermat test specifically with sequentially chosen
candidate integers. The algorithm is based on the observation that if $n$ is prime, $n - 1 = 2^kr$ with $r$ odd, and $b^r \nequiv \pm 1$ then after up to $k - 1$ squarings the
value must be equal to $-1$. The squarings are stopped as soon as $-1$ is observed. If the value of $1$ is observed first it means that
some value not congruent to $\pm 1$ when squared equals one which cannot occur if $n$ is prime.

\begin{figure}[!here]
\begin{small}
\begin{center}
\begin{tabular}{l}
\hline Algorithm \textbf{mp\_prime\_miller\_rabin}. \\
\textbf{Input}. mp\_int $a$ and $b$, $a \ge 2$, $0 < b < a$. \\
\textbf{Output}. $c = 1$ if $a$ is a Miller-Rabin prime to the base $b$, otherwise $c = 0$. \\
\hline
1. $a' \leftarrow a - 1$ \\
2.
$r \leftarrow a'$ \\
3. $c \leftarrow 0, s \leftarrow 0$ \\
4. While $r.used > 0$ and $r_0 \equiv 0 \mbox{ (mod }2\mbox{)}$ \\
\hspace{3mm}4.1 $s \leftarrow s + 1$ \\
\hspace{3mm}4.2 $r \leftarrow \lfloor r / 2 \rfloor$ \\
5. $y \leftarrow b^r \mbox{ (mod }a\mbox{)}$ \\
6. If $y \nequiv \pm 1$ then \\
\hspace{3mm}6.1 $j \leftarrow 1$ \\
\hspace{3mm}6.2 While $j \le (s - 1)$ and $y \nequiv a'$ \\
\hspace{6mm}6.2.1 $y \leftarrow y^2 \mbox{ (mod }a\mbox{)}$ \\
\hspace{6mm}6.2.2 If $y = 1$ then goto step 8. \\
\hspace{6mm}6.2.3 $j \leftarrow j + 1$ \\
\hspace{3mm}6.3 If $y \nequiv a'$ goto step 8. \\
7. $c \leftarrow 1$\\
8. Return(\textit{MP\_OKAY}). \\
\hline
\end{tabular}
\end{center}
\end{small}
\caption{Algorithm mp\_prime\_miller\_rabin}
\end{figure}
\textbf{Algorithm mp\_prime\_miller\_rabin.}
This algorithm performs one trial round of the Miller-Rabin algorithm to the base $b$. It will set $c = 1$ if the algorithm cannot determine
if $a$ is composite or $c = 0$ if $a$ is provably composite. The values of $s$ and $r$ are computed such that $a' = a - 1 = 2^sr$.

If the value $y \equiv b^r$ is congruent to $\pm 1$ then the algorithm cannot prove if $a$ is composite or not. Otherwise, the algorithm will
square $y$ up to $s - 1$ times stopping only when $y \equiv -1$.
If $y^2 \equiv 1$ and $y \nequiv \pm 1$ then the algorithm can report that $a$
is provably composite. If the algorithm performs $s - 1$ squarings and $y \nequiv -1$ then $a$ is provably composite. If $a$ is not provably
composite then it is \textit{probably} prime.

EXAM,bn_mp_prime_miller_rabin.c

\backmatter
\appendix
\begin{thebibliography}{ABCDEF}
\bibitem[1]{TAOCPV2}
Donald Knuth, \textit{The Art of Computer Programming}, Third Edition, Volume Two, Seminumerical Algorithms, Addison-Wesley, 1998

\bibitem[2]{HAC}
A. Menezes, P. van Oorschot, S. Vanstone, \textit{Handbook of Applied Cryptography}, CRC Press, 1996

\bibitem[3]{ROSE}
Michael Rosing, \textit{Implementing Elliptic Curve Cryptography}, Manning Publications, 1999

\bibitem[4]{COMBA}
Paul G. Comba, \textit{Exponentiation Cryptosystems on the IBM PC}. IBM Systems Journal 29(4): 526-538 (1990)

\bibitem[5]{KARA}
A. Karatsuba, Doklady Akad.
Nauk SSSR 145 (1962), pp.293-294

\bibitem[6]{KARAP}
Andre Weimerskirch and Christof Paar, \textit{Generalizations of the Karatsuba Algorithm for Polynomial Multiplication}, Submitted to Design, Codes and Cryptography, March 2002

\bibitem[7]{BARRETT}
Paul Barrett, \textit{Implementing the Rivest Shamir and Adleman Public Key Encryption Algorithm on a Standard Digital Signal Processor}, Advances in Cryptology, Crypto '86, Springer-Verlag.

\bibitem[8]{MONT}
P.L.Montgomery. \textit{Modular multiplication without trial division}. Mathematics of Computation, 44(170):519-521, April 1985.

\bibitem[9]{DRMET}
Chae Hoon Lim and Pil Joong Lee, \textit{Generating Efficient Primes for Discrete Log Cryptosystems}, POSTECH Information Research Laboratories

\bibitem[10]{MMB}
J. Daemen and R. Govaerts and J. Vandewalle, \textit{Block ciphers based on Modular Arithmetic}, State and {P}rogress in the {R}esearch of {C}ryptography, 1993, pp. 80-89

\bibitem[11]{RSAREF}
R.L. Rivest, A. Shamir, L. Adleman, \textit{A Method for Obtaining Digital Signatures and Public-Key Cryptosystems}

\bibitem[12]{DHREF}
Whitfield Diffie, Martin E.
Hellman, \textit{New Directions in Cryptography}, IEEE Transactions on Information Theory, 1976

\bibitem[13]{IEEE}
IEEE Standard for Binary Floating-Point Arithmetic (ANSI/IEEE Std 754-1985)

\bibitem[14]{GMP}
GNU Multiple Precision (GMP), \url{http://www.swox.com/gmp/}

\bibitem[15]{MPI}
Multiple Precision Integer Library (MPI), Michael Fromberger, \url{http://thayer.dartmouth.edu/~sting/mpi/}

\bibitem[16]{OPENSSL}
OpenSSL Cryptographic Toolkit, \url{http://openssl.org}

\bibitem[17]{LIP}
Large Integer Package, \url{http://home.hetnet.nl/~ecstr/LIP.zip}

\bibitem[18]{ISOC}
JTC1/SC22/WG14, ISO/IEC 9899:1999, ``A draft rationale for the C99 standard.''

\bibitem[19]{JAVA}
The Sun Java Website, \url{http://java.sun.com/}

\end{thebibliography}

\input{tommath.ind}

\end{document}