ALTIVECPIM/D 6/1999 Rev. 0
AltiVec™ Technology Programming Interface Manual
DigitalDNA and Mfax are trademarks of Motorola, Inc. The PowerPC name and the PowerPC logotype are trademarks of International Business Machines Corporation used by Motorola under license from International Business Machines Corporation.
This document contains information on a new product under development. Motorola reserves the right to change or discontinue this product without notice. Information in this document is provided solely to enable system and software implementers to use PowerPC microprocessors. There are no express or implied copyright licenses granted hereunder to design or fabricate PowerPC integrated circuits or integrated circuits based on the information in this document. Motorola reserves the right to make changes without further notice to any products herein. Motorola makes no warranty, representation or guarantee regarding the suitability of its products for any particular purpose, nor does Motorola assume any liability arising out of the application or use of any product or circuit, and specifically disclaims any and all liability, including without limitation consequential or incidental damages. "Typical" parameters can and do vary in different applications. All operating parameters, including "Typicals" must be validated for each customer application by customer's technical experts. Motorola does not convey any license under its patent rights nor the rights of others. Motorola products are not designed, intended, or authorized for use as components in systems intended for surgical implant into the body, or other applications intended to support or sustain life, or for any other application in which the failure of the Motorola product could create a situation where personal injury or death may occur. 
Should Buyer purchase or use Motorola products for any such unintended or unauthorized application, Buyer shall indemnify and hold Motorola and its officers, employees, subsidiaries, affiliates, and distributors harmless against all claims, costs, damages, and expenses, and reasonable attorney fees arising out of, directly or indirectly, any claim of personal injury or death associated with such unintended or unauthorized use, even if such claim alleges that Motorola was negligent regarding the design or manufacture of the part. Motorola and the Motorola logo are registered trademarks of Motorola, Inc. Motorola, Inc. is an Equal Opportunity/Affirmative Action Employer.

Motorola Literature Distribution Centers:
USA/EUROPE: Motorola Literature Distribution; P.O. Box 5405; Denver, Colorado 80217; Tel.: 1-800-441-2447 or 1-303-675-2140
JAPAN: Nippon Motorola Ltd SPD, Strategic Planning Office 4-32-1, Nishi-Gotanda Shinagawa-ku, Tokyo 141, Japan Tel.: 81-3-5487-8488
ASIA/PACIFIC: Motorola Semiconductors H.K. Ltd.; 8B Tai Ping Industrial Park, 51 Ting Kok Road, Tai Po, N.T., Hong Kong; Tel.: 852-26629298
Mfax: RMFAX0@email.sps.mot.com; TOUCHTONE 1-602-244-6609; US & Canada ONLY (800) 774-1848; World Wide Web Address: http://sps.motorola.com/mfax
INTERNET: http://motorola.com/sps
Technical Information: Motorola Inc. SPS Customer Support Center 1-800-521-6274; electronic mail address: crc@wmkmail.sps.mot.com.
Document Comments: FAX (512) 895-2638, Attn: RISC Applications Engineering.
World Wide Web Addresses: http://www.mot.com/PowerPC http://www.mot.com/netcomm http://www.mot.com/HPESD

(c) Motorola Inc. 1999. All rights reserved.
1  Overview
2  High-Level Language Interface
3  Application Binary Interface
4  AltiVec Operations and Predicates
A  AltiVec Instruction Set/Operations/Predicates Cross-Reference
GLO  Glossary of Terms and Abbreviations
IND  Index
CONTENTS
Paragraph Number  Title  Page Number

Audience  xvi
Organization  xvi
Suggested Reading  xvii
PowerPC Documentation  xvii
General Information  xviii

Chapter 1
Overview
1.1  High-Level Language Interface  1-1
1.2  Application Binary Interface (ABI)  1-2

Chapter 2
High-Level Language Interface
2.1  Data Types  2-1
2.2  New Keywords  2-2
2.2.1  The Keyword and Predefine Method  2-2
2.2.2  The Context Sensitive Keyword Method  2-3
2.3  Alignment  2-3
2.3.1  Alignment of Vector Types  2-3
2.3.2  Alignment of Non-Vector Types  2-3
2.3.3  Alignment of Aggregates and Unions Containing Vector Types  2-3
2.4  Extensions of C/C++ Operators for the New Types  2-4
2.4.1  sizeof()  2-4
2.4.2  Assignment  2-4
2.4.3  Address Operator  2-4
2.4.4  Pointer Arithmetic  2-4
2.4.5  Pointer Dereferencing  2-4
2.4.6  Type Casting  2-5
2.5  New Operators  2-5
2.5.1  Vector Literals  2-5
2.5.2  Vector Literals and Casts  2-6
2.5.3  Value for Adjusting Pointers  2-7
2.5.4  New Operators Representing AltiVec Operations  2-7
2.6  Programming Interface  2-8

Chapter 3
Application Binary Interface (ABI)
3.1  Data Representation  3-1
3.2  Register Usage Conventions  3-1
3.3  The Stack Frame  3-2
3.3.1  SVR4 ABI and EABI Stack Frame  3-3
3.3.2  Apple Macintosh ABI and AIX ABI Stack Frame  3-5
3.3.3  Vector Register Saving and Restoring Functions  3-7
3.4  Function Calls  3-9
3.4.1  SVR4 ABI and EABI Parameter Passing and Varargs  3-9
3.4.2  Apple Macintosh ABI and AIX ABI Parameter Passing without Varargs  3-9
3.4.3  Apple Macintosh ABI and AIX ABI Parameter Passing with Varargs  3-10
3.5  malloc(), vec_malloc(), and new  3-10
3.6  setjmp() and longjmp()  3-11
3.7  Debugging Information  3-11
3.8  printf() and scanf() Control Strings  3-12
3.8.1  Output Conversion Specifications  3-12
3.8.2  Input Conversion Specifications  3-14

Chapter 4
AltiVec Operations and Predicates
4.1  Vector Status and Control Register  4-1
4.2  Byte Ordering  4-3
4.3  Notation and Conventions  4-4
4.4  Generic and Specific AltiVec Operations  4-7
4.5  AltiVec Predicates  4-133

Appendix A
AltiVec Instruction Set/Operation/Predicate Cross-Reference

Glossary of Terms and Abbreviations
Index
ILLUSTRATIONS
Figure Number  Title  Page Number

3-1  SVR4 ABI and EABI Stack Frame  3-3
3-2  Apple Macintosh ABI and AIX ABI Stack Frame  3-5
4-1  Vector Status and Control Register (VSCR)  4-1
4-2  VSCR Moved to a Vector Register  4-1
4-3  Big-Endian Byte Ordering for a Vector Register  4-3
4-4  Operation Description Format  4-7
4-5  Absolute Value of Sixteen Integer Elements (8-bit)  4-8
4-6  Absolute Value of Eight Integer Elements (16-bit)  4-9
4-7  Absolute Value of Four Integer Elements (32-bit)  4-9
4-8  Absolute Value of Four Floating-Point Elements (32-bit)  4-9
4-9  Saturated Absolute Value of Sixteen Integer Elements (8-bit)  4-10
4-10  Saturated Absolute Value of Eight Integer Elements (16-bit)  4-11
4-11  Saturated Absolute Value of Four Integer Elements (32-bit)  4-11
4-12  Add Sixteen Integer Elements (8-bit)  4-12
4-13  Add Eight Integer Elements (16-bit)  4-13
4-14  Add Four Integer Elements (32-bit)  4-13
4-15  Add Four Floating-Point Elements (32-bit)  4-14
4-16  Carryout of Four Unsigned Integer Adds (32-bit)  4-15
4-17  Add Saturating Sixteen Integer Elements (8-bit)  4-16
4-18  Add Saturating Eight Integer Elements (16-bit)  4-17
4-19  Add Saturating Four Integer Elements (32-bit)  4-17
4-20  Logical Bit-Wise AND  4-18
4-21  Logical Bit-Wise AND with Complement  4-19
4-22  Average Sixteen Integer Elements (8-bit)  4-21
4-23  Average Eight Integer Elements (16-bit)  4-22
4-24  Average Four Integer Elements (32-bit)  4-22
4-25  Round to Plus Infinity of Four Floating-Point Integer Elements (32-Bit)  4-23
4-26  Compare Bounds of Four Floating-Point Elements (32-Bit)  4-24
4-27  Compare Equal of Sixteen Integer Elements (8-bits)  4-25
4-28  Compare Equal of Eight Integer Elements (16-Bit)  4-26
4-29  Compare Equal of Four Integer Elements (32-Bit)  4-26
4-30  Compare Equal of Four Floating-Point Elements (32-Bit)  4-26
4-31  Compare Greater-Than-or-Equal of Four Floating-Point Elements (32-Bit)  4-27
4-32  Compare Greater-Than of Sixteen Integer Elements (8-bits)  4-28
4-33  Compare Greater-Than of Eight Integer Elements (16-Bit)  4-29
4-34  Compare Greater-Than of Four Integer Elements (32-Bit)  4-29
4-35  Compare Greater-Than of Four Floating-Point Elements (32-Bit)  4-29
4-36  Compare Less-Than-or-Equal of Four Floating-Point Elements (32-Bit)  4-30
4-37  Compare Less-Than of Sixteen Integer Elements (8-bits)  4-31
4-38  Compare Less-Than of Eight Integer Elements (16-Bit)  4-32
4-39  Compare Less-Than of Four Integer Elements (32-Bit)  4-32
4-40  Compare Less-Than of Four Floating-Point Elements (32-Bit)  4-32
4-41  Convert Four Integer Elements to Four Floating-Point Elements (32-Bit)  4-33
4-42  Convert Four Floating-Point Elements to Four Saturated Signed Integer Elements (32-Bit)  4-34
4-43  Convert Four Floating-Point Elements to Four Saturated Unsigned Integer Elements (32-Bit)  4-35
4-44  Format of b Type (32-bit)  4-38
4-45  Format of b Type (64-bit)  4-38
4-46  Format of b Type (32-bit)  4-40
4-47  Format of b Type (64-bit)  4-40
4-48  Format of b Type (32-bit)  4-42
4-49  Format of b Type (64-bit)  4-42
4-50  Format of b Type (32-bit)  4-44
4-51  Format of b Type (64-bit)  4-44
4-52  2 Raised to the Exponent Estimate Floating-Point for Four Floating-Point Elements (32-Bit)  4-46
4-53  Round to Minus Infinity of Four Floating-Point Integer Elements (32-Bit)  4-47
4-54  Vector Load Indexed Operation  4-48
4-55  Vector Load Element Indexed Operation  4-50
4-56  Vector Load Indexed LRU Operation  4-51
4-57  Log2 Estimate Floating-Point for Four Floating-Point Elements (32-Bit)  4-53
4-58  Multiply-Add Four Floating-Point Elements (32-Bit)  4-56
4-59  Multiply-Add Four Floating-Point Elements (32-Bit)  4-57
4-60  Maximum of Sixteen Integer Elements (8-Bit)  4-58
4-61  Maximum of Eight Integer Elements (16-bit)  4-59
4-62  Maximum of Four Integer Elements (32-bit)  4-59
4-63  Maximum of Four Floating-Point Elements (32-bit)  4-60
4-64  Merge Eight High-Order Elements (8-Bit)  4-61
4-65  Merge Four High-Order Elements (16-bit)  4-62
4-66  Merge Two High-Order Elements (32-bit)  4-62
4-67  Merge Eight Low-Order Elements (8-Bit)  4-63
4-68  Merge Four Low-Order Elements (16-bit)  4-64
4-69  Merge Two Low-Order Elements (32-bit)  4-64
4-70  Vector Move from VSCR  4-65
4-71  Minimum of Sixteen Integer Elements (8-Bit)  4-66
4-72  Minimum of Eight Integer Elements (16-bit)  4-67
4-73  Minimum of Four Integer Elements (32-bit)  4-67
4-74  Minimum of Four Floating-Point Elements (32-bit)  4-68
4-75  Multiply-Add of Eight Integer Elements (16-Bit)  4-69
4-76  Multiply-Add of Eight Integer Elements (16-Bit)  4-70
4-77  Multiply Sum of Sixteen Integer Elements (8-Bit)  4-71
4-78  Multiply Sum of Eight Integer Elements (16-Bit)  4-72
4-79  Multiply-Sum of Integer Elements (16-Bit to 32-Bit)  4-73
4-80  Vector Move to VSCR  4-74
4-81  Even Multiply of Eight Integer Elements (8-Bit)  4-75
4-82  Even Multiply of Four Integer Elements (16-Bit)  4-75
4-83  Odd Multiply of Eight Integer Elements (8-Bit)  4-76
4-84  Odd Multiply of Four Integer Elements (16-Bit)  4-76
4-85  Negative Multiply-Subtract of Four Floating-Point Elements (32-Bit)  4-77
4-86  Logical Bit-Wise NOR  4-78
4-87  Logical Bit-Wise OR  4-79
4-88  Pack Sixteen Unsigned Integer Elements (16-Bit) to Sixteen Unsigned Integer Elements (8-Bit)  4-80
4-89  Pack Eight Unsigned Integer Elements (32-Bit) to Eight Unsigned Integer Elements (16-Bit)  4-80
4-90  Pack Eight Pixel Elements (32-Bit) to Eight Elements (16-Bit)  4-81
4-91  Pack Sixteen Integer Elements (16-Bit) to Sixteen Integer Elements (8-Bit)  4-82
4-92  Pack Eight Integer Elements (32-Bit) to Eight Integer Elements (16-Bit)  4-82
4-93  Pack Sixteen Integer Elements (16-Bit) to Sixteen Unsigned Integer Elements (8-Bit)  4-83
4-94  Pack Eight Integer Elements (32-Bit) to Eight Unsigned Integer Elements (16-Bit)  4-83
4-95  Permute Sixteen Integer Elements (8-Bit)  4-84
4-96  Reciprocal Estimate of Four Floating-Point Elements (32-Bit)  4-85
4-97  Left Rotate of Sixteen Integer Elements (8-Bit)  4-86
4-98  Left Rotate of Eight Integer Elements (16-bit)  4-86
4-99  Left Rotate of Four Integer Elements (32-bit)  4-87
4-100  Round to Nearest of Four Floating-Point Integer Elements (32-Bit)  4-88
4-101  Reciprocal Square Root Estimate of Four Floating-Point Elements (32-Bit)  4-89
4-102  Bit-Wise Conditional Select of Vector Contents (128-bit)  4-90
4-103  Shift Bits Left in Sixteen Integer Elements (8-Bit)  4-91
4-104  Shift Bits Left in Eight Integer Elements (16-bit)  4-92
4-105  Shift Bits Left in Four Integer Elements (32-Bit)  4-92
4-106  Bit-Wise Conditional Select of Vector Contents (128-bit)  4-93
4-107  Shift Bits Left in Vector (128-Bit)  4-95
4-108  Left Byte Shift of Vector (128-Bit)  4-96
4-109  Copy Contents to Sixteen Integer Elements (8-Bit)  4-97
4-110  Copy Contents to Eight Elements (16-bit)  4-97
4-111  Copy Contents to Four Integer Elements (32-Bit)  4-98
4-112  Copy Value into Sixteen Signed Integer Elements (8-Bit)  4-99
4-113  Copy Value into Eight Signed Integer Elements (16-Bit)  4-100
4-114  Copy Value into Four Signed Integer Elements (32-Bit)  4-101
4-115  Copy Value into Sixteen Signed Integer Elements (8-Bit)  4-102
4-116  Copy Value into Eight Signed Integer Elements (16-Bit)  4-103
4-117  Copy Value into Four Signed Integer Elements (32-Bit)  4-104
4-118  Shift Bits Right in Sixteen Integer Elements (8-Bit)  4-105
4-119  Shift Bits Right in Eight Integer Elements (16-bit)  4-106
4-120  Shift Bits Right in Four Integer Elements (32-Bit)  4-106
4-121  Shift Bits Right in Sixteen Integer Elements (8-Bit)  4-107
4-122  Shift Bits Right in Eight Integer Elements (16-bit)  4-108
4-123  Shift Bits Right in Four Integer Elements (32-Bit)  4-108
4-124  Shift Bits Right in Vector (128-Bit)  4-110
4-125  Right Byte Shift of Vector (128-Bit)  4-111
4-126  Vector Store Indexed  4-112
4-127  Vector Store Element  4-115
4-128  Vector Store Indexed LRU  4-116
4-129  Subtract Sixteen Integer Elements (8-bit)  4-118
4-130  Subtract Eight Integer Elements (16-bit)  4-119
4-131  Subtract Four Integer Elements (32-bit)  4-119
4-132  Subtract Four Floating-Point Elements (32-bit)  4-120
4-133  Carryout of Four Unsigned Integer Subtracts (32-bit)  4-121
4-134  Subtract Saturating Sixteen Integer Elements (8-bit)  4-122
4-135  Subtract Saturating Eight Integer Elements (16-bit)  4-123
4-136  Subtract Saturating Four Integer Elements (32-bit)  4-123
4-137  Four Sums in the Integer Elements (32-Bit)  4-124
4-138  Four Sums in the Integer Elements (32-Bit)  4-124
4-139  Two Saturated Sums in the Four Signed Integer Elements (32-Bit)  4-125
4-140  Saturated Sum of Five Signed Integer Elements (32-Bit)  4-126
4-141  Round-to-Zero of Four Floating-Point Integer Elements (32-Bit)  4-127
4-142  Unpack High-Order Elements (8-Bit) to Elements (16-Bit)  4-128
4-143  Unpack High-Order Pixel Elements (16-Bit) to Elements (32-Bit)  4-129
4-144  Unpack High-Order Signed Integer Elements (16-Bit) to Signed Integer Elements (32-Bit)  4-129
4-145  Unpack Low-Order Elements (8-Bit) to Elements (16-Bit)  4-130
4-146  Unpack Low-Order Pixel Elements (16-Bit) to Elements (32-Bit)  4-130
4-147  Unpack Low-Order Signed Integer Elements (16-Bit) to Signed Integer Elements (32-Bit)  4-131
4-148  Logical Bit-Wise XOR  4-132
4-149  All Equal of Sixteen Integer Elements (8-bits)  4-134
4-150  All Equal of Eight Integer Elements (16-Bit)  4-135
4-151  All Equal of Four Integer Elements (32-Bit)  4-135
4-152  All Equal of Four Floating-Point Elements (32-Bit)  4-136
4-153  All Greater Than or Equal of Sixteen Integer Elements (8-bits)  4-137
4-154  All Greater Than or Equal of Eight Integer Elements (16-Bit)  4-138
4-155  All Greater Than or Equal of Four Integer Elements (32-Bit)  4-138
4-156  All Greater Than or Equal of Four Floating-Point Elements (32-Bit)  4-139
4-157  All Greater Than of Sixteen Integer Elements (8-bits)  4-140
4-158  All Greater Than of Eight Integer Elements (16-Bit)  4-141
4-159  All Greater Than of Four Integer Elements (32-Bit)  4-141
4-160  All Greater Than of Four Floating-Point Elements (32-Bit)  4-142
4-161  All in Bounds of Four Floating-Point Elements (32-Bit)  4-143
4-162  All Less Than or Equal of Sixteen Integer Elements (8-bits)  4-144
4-163  All Less Than or Equal of Eight Integer Elements (16-Bit)  4-145
4-164  All Less Than or Equal of Four Integer Elements (32-Bit)  4-145
4-165  All Less Than or Equal of Four Floating-Point Elements (32-Bit)  4-146
4-166  All Less Than of Sixteen Integer Elements (8-bits)  4-147
4-167  All Less Than of Eight Integer Elements (16-Bit)  4-148
4-168  All Less Than of Four Integer Elements (32-Bit)  4-148
4-169  All Less Than of Four Floating-Point Elements (32-Bit)  4-149
4-170  All NaN of Four Floating-Point Elements (32-Bit)  4-150
4-171  All Not Equal of Sixteen Integer Elements (8-bits)  4-151
4-172  All Not Equal of Eight Integer Elements (16-Bit)  4-152
4-173  All Not Equal of Four Integer Elements (32-Bit)  4-152
4-174  All Not Equal of Four Floating-Point Elements (32-Bit)  4-153
4-175  All Not Greater Than or Equal of Four Floating-Point Elements (32-Bit)  4-154
4-176  All Not Greater Than of Four Floating-Point Elements (32-Bit)  4-155
4-177  All Not Less Than or Equal of Four Floating-Point Elements (32-Bit)  4-156
4-178  All Not Less Than of Four Floating-Point Elements (32-Bit)  4-157
4-179  All Numeric of Four Floating-Point Elements (32-Bit)  4-158
4-180  Any Equal of Sixteen Integer Elements (8-bits)  4-159
4-181  Any Equal of Eight Integer Elements (16-Bit)  4-160
4-182  Any Equal of Four Integer Elements (32-Bit)  4-160
4-183  Any Equal of Four Floating-Point Elements (32-Bit)  4-161
4-184  Any Greater Than or Equal of Sixteen Integer Elements (8-bits)  4-162
4-185  Any Greater Than or Equal of Eight Integer Elements (16-Bit)  4-163
4-186  Any Greater Than or Equal of Four Integer Elements (32-Bit)  4-163
4-187  Any Greater Than or Equal of Four Floating-Point Elements (32-Bit)  4-164
4-188  Any Greater Than of Sixteen Integer Elements (8-bits)  4-165
4-189  Any Greater Than of Eight Integer Elements (16-Bit)  4-166
4-190  Any Greater Than of Four Integer Elements (32-Bit)  4-166
4-191  Any Greater Than of Four Floating-Point Elements (32-Bit)  4-167
4-192  Any Less Than or Equal of Sixteen Integer Elements (8-bits)  4-168
4-193  Any Less Than or Equal of Eight Integer Elements (16-Bit)  4-169
4-194  Any Less Than or Equal of Four Integer Elements (32-Bit)  4-169
4-195  Any Less Than or Equal of Four Floating-Point Elements (32-Bit)  4-170
4-196  Any Less Than of Sixteen Integer Elements (8-bits)  4-171
4-197  Any Less Than of Eight Integer Elements (16-Bit)  4-172
4-198  Any Less Than of Four Integer Elements (32-Bit)  4-172
4-199  Any Less Than of Four Floating-Point Elements (32-Bit)  4-173
4-200  Any NaN of Four Floating-Point Elements (32-Bit)  4-174
4-201  Any Not Equal of Sixteen Integer Elements (8-bits)  4-175
4-202  Any Not Equal of Eight Integer Elements (16-Bit)  4-176
4-203  Any Not Equal of Four Integer Elements (32-Bit)  4-176
4-204  Any Not Equal of Four Floating-Point Elements (32-Bit)  4-177
Figure 4-205. Any Not Greater Than or Equal of Four Floating-Point Elements (32-Bit)
Figure 4-206. Any Not Greater Than of Four Floating-Point Elements (32-Bit)
Figure 4-207. Any Not Less Than or Equal of Four Floating-Point Elements (32-Bit)
Figure 4-208. Any Not Less Than of Four Floating-Point Elements (32-Bit)
Figure 4-209. Any Numeric of Four Floating-Point Elements (32-Bit)
Figure 4-210. Any Out of Bounds of Four Floating-Point Elements (32-Bit)
TABLES
Table 2-1. AltiVec Data Types
Table 2-2. Vector Literal Format and Description
Table 2-3. Increment Value for vec_step by Data Type
Table 3-1. AltiVec Registers
Table 3-2. Vector Registers Valid Tag Format
Table 3-3. ABI Specifications for setjmp() and longjmp()
Table 4-1. VSCR Field Descriptions
Table 4-2. Notation and Conventions
Table 4-3. Precedence Rules
Table 4-4. vec_dss: Vector Data Stream Stop Argument Types
Table 4-5. vec_dst: Vector Data Stream Touch Argument Types
Table 4-6. vec_dstst: Vector Data Stream for Touch Store Argument Types
Table 4-7. vec_dststt: Vector Data Stream Touch for Store Transient Argument Types
Table 4-8. vec_dstt: Vector Data Stream Touch Transient Argument Types
Table 4-9. vec_ld: Load Vector Indexed Argument Types
Table 4-10. vec_lde(a,b): Vector Load Element Indexed Argument Types
Table 4-11. vec_ldl: Vector Load Indexed LRU Argument Types
Table 4-12. vec_lvsl: Load Vector for Shift Left Argument Types
Table 4-13. vec_lvsr: Vector Load for Shift Right Argument Types
Table 4-14. Vector Move from Vector Status and Control Registers Argument Type and Mapping
Table 4-15. vec_mtvscr: Vector Move to Vector Status and Control Register Argument Types
Table 4-16. Special Value Results of Reciprocal Estimates
Table 4-17. Special Value Results of Reciprocal Square Root Estimates
Table 4-18. vec_st: Vector Store Indexed Argument Types
Table 4-19. vec_stl: Vector Store Index Argument Types
Table A-1. Instructions to Operations/Predicates Cross-Reference
Table A-2. Operations to Instructions Cross-Reference
Table A-3. Predicate to Instruction Cross-Reference
About This Book
The primary objective of this manual is to help programmers provide software that is compatible across the family of PowerPC processors using AltiVec technology. To locate any published errata or updates for this document, refer to the website at http://www.mot.com/SPS/PowerPC/.
This book is one of two that discuss the AltiVec architecture. The two books are:
- AltiVec: The Programming Interface Manual (AltiVec PIM) is used as a reference guide for high-level programmers. The AltiVec PIM provides a mechanism for programmers to access AltiVec functionality from programming languages such as C and C++. The AltiVec PIM defines a programming model for use with the AltiVec instruction set extension to the PowerPC architecture.
- AltiVec: The Programming Environments Manual (AltiVec PEM) is used as a reference guide for assembler programmers. The AltiVec PEM provides a description for each instruction that includes the instruction format, an individualized legend that provides such information as the level(s) of the PowerPC architecture in which the instruction may be found, the privilege level of the instruction, and figures to help in understanding how the instruction works.
It is beyond the scope of this manual to describe individual AltiVec technology implementations on PowerPC processors. Keep in mind that each PowerPC processor is unique in its implementation of the AltiVec technology. The information in this book is subject to change without notice, as described in the disclaimers on the title page of this book. As with any technical documentation, it is the reader's responsibility to be sure they are using the most recent version of the documentation. For more information, contact your sales representative or visit our website at: http://www.mot.com/SPS/PowerPC/.
Audience
This manual is intended for system software and application programmers who want to develop products using the AltiVec technology extension to the PowerPC processors in general. It is assumed that the reader understands operating systems, microprocessor system design, the basic principles of RISC processing, and the AltiVec Instruction Set.
Organization
Following is a summary and a brief description of the major sections of this manual:
- Chapter 1, "Overview," is useful for those who want a general understanding of what the programming model defines in the AltiVec technology.
- Chapter 2, "High-Level Language Interface," is useful for software engineers who need to understand how to access AltiVec functionality from high-level languages such as C and C++.
- Chapter 3, "Application Binary Interface (ABI)," describes AltiVec extensions for System V Application Binary Interface PowerPC Processor Supplement (SVR4 ABI), the PowerPC Embedded Application Binary Interface (EABI), Appendix A of The PowerPC Compiler Writer's Guide (AIX ABI), and the Apple Macintosh ABI.
- Chapter 4, "AltiVec Operations and Predicates," alphabetically defines the AltiVec operations and predicates. Each AltiVec operation and predicate description includes a pseudocode functional description and figures illustrating that function, a valid set of argument types for that AltiVec operation or predicate, the result type for that set of argument types, and the specific AltiVec instruction generated for that set of arguments.
- Appendix A, "AltiVec Instruction Set/Operation/Predicate Cross-Reference," cross-references the AltiVec instruction set, operations, and predicates by functionality.
This manual also includes a glossary and an index.

Suggested Reading
This section lists additional reading that provides background for the information in this manual as well as general information about the AltiVec technology and PowerPC architecture.
PowerPC Documentation
The PowerPC documentation is organized in the following types of documents:
- User's manuals: These books provide details about individual PowerPC implementations and are intended to be used in conjunction with PowerPC Microprocessor Family: The Programming Environments Manual.
- PowerPC Microprocessor Family: The Programming Environments, Rev. 1 provides information about resources defined by the PowerPC architecture that are common to PowerPC processors. This document describes both the 64- and 32-bit portions of the architecture. MPCFPE/AD (Motorola order #)
- Implementation Variances Relative to Rev. 1 of The Programming Environments Manual is available via the world-wide web at http://www.mot.com/SPS/PowerPC/.
- Addenda/errata to user's manuals: Because some processors have follow-on parts, an addendum is provided that describes the additional features and changes to functionality of the follow-on part. These addenda are intended for use with the corresponding user's manuals.
- Hardware specifications: Hardware specifications provide specific data regarding bus timing, signal behavior, and AC, DC, and thermal characteristics, as well as other design considerations for each PowerPC implementation.
- Technical summaries: Each PowerPC implementation has a technical summary that provides an overview of its features. This document is roughly the equivalent to the overview (Chapter 1) of an implementation's user's manual.
- PowerPC Microprocessor Family: The Programmer's Reference Guide: MPCPRG/D (Motorola order #) is a concise reference that includes the register summary, memory control model, exception vectors, and the PowerPC instruction set.
- PowerPC Microprocessor Family: The Programmer's Pocket Reference Guide: MPCPRGREF/D (Motorola order #): This foldout card provides an overview of the PowerPC registers, instructions, and exceptions for 32-bit implementations.
- Application notes: These short documents contain useful information about specific design issues useful to programmers and engineers working with PowerPC processors (available via the world-wide web at http://www.mot.com/SPS/PowerPC/).
- Documentation for support chips

Additional literature on AltiVec technology and PowerPC implementations is being released as new processors become available. For a current list of AltiVec technology and PowerPC documentation, refer to the website at http://www.mot.com/SPS/PowerPC/.
General Information
The following documentation provides useful information about the PowerPC architecture and computer architecture in general:
- The following books are available from the Morgan-Kaufmann Publishers, 340 Pine Street, Sixth Floor, San Francisco, CA 94104; Tel. (800) 745-7323 (U.S.A.), (415) 392-2665 (International); internet address: mkp@mkp.com.
  - The PowerPC Architecture: A Specification for a New Family of RISC Processors, Second Edition, by International Business Machines, Inc. Updates to the architecture specification are accessible via the world-wide web at http://www.austin.ibm.com/tech/ppc-chg.html.
  - PowerPC Microprocessor Common Hardware Reference Platform: A System Architecture, by Apple Computer, Inc., International Business Machines, Inc., and Motorola, Inc.
  - Macintosh Technology in the Common Hardware Reference Platform, by Apple Computer, Inc.
  - Computer Organization and Design, by David A. Patterson and John L. Hennessy.
  - Computer Architecture: A Quantitative Approach, Second Edition, by John L. Hennessy and David A. Patterson.
- PowerPC Programming for Intel Programmers, by Kip McClanahan; IDG Books Worldwide, Inc., 919 East Hillsdale Boulevard, Suite 400, Foster City, CA, 94404; Tel. (800) 434-3422 (U.S.A.), (415) 655-3022 (International).
Chapter 1 Overview
This document defines a programming model for use with the AltiVec instruction set extension to the PowerPC architecture. There are three types of programming interfaces described in this document:
- A high-level language interface, intended for use within programming languages such as C or C++
- An application binary interface (ABI) defining low-level coding conventions
- An assembly language interface
Although a higher-level application programming interface (API) such as mediaLib is intended for use with AltiVec, such a specification is not addressed by this document. For further details on mediaLib see the AltiVec website at: http://www.mot.com/SPS/PowerPC/AltiVec.
An AltiVec-enabled compiler implementing the model described in this document predefines the value __VEC__ as the decimal integer 10205.
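A minimal sketch of how source code might test for the __VEC__ predefine. The helper names altivec_model_version() and has_altivec_model() are hypothetical and are not part of the programming model; the sketch assumes only that a conforming compiler predefines __VEC__ as stated above and that other compilers leave it undefined.

```c
#include <assert.h>

/* Returns the programming-model version announced by the compiler,
   or 0 when __VEC__ is not predefined (hypothetical helper). */
long altivec_model_version(void)
{
#if defined(__VEC__)
    return (long)__VEC__;
#else
    return 0L;
#endif
}

/* Nonzero only when the compiler announces the value given in this
   document, 10205 (hypothetical helper). */
int has_altivec_model(void)
{
    long version = altivec_model_version();
    assert(version >= 0);
    return version == 10205L;
}
```

Code that depends on the syntax in this document can be placed under #if defined(__VEC__) so that it is skipped by compilers that do not implement the model.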
1.1 High-Level Language Interface
The high-level language interface for AltiVec lets programmers use the AltiVec technology from programming languages such as C and C++. It describes the fundamental data types for the AltiVec programming model. Details of this interface are described in Chapter 2, "High-Level Language Interface."
1.2 Application Binary Interface (ABI)
The AltiVec Programming Model extends the existing PowerPC ABIs and the extension is independent of the endian mode. The ABI reviews what the data types are and what the register usage conventions are for vector register files. The ABI also discusses how to set up the stack frame. The vector register save and restore functions are included in the ABI section to advocate uniformity among compilers on the method used in saving and restoring vector registers.
The Programming Interface Manual provides the valid set of argument types for specific AltiVec operations and predicates as well as the specific AltiVec instruction(s) generated for that set of arguments. The AltiVec operations and predicates are organized alphabetically in Chapter 4, "AltiVec Operations and Predicates."
Chapter 2 High-Level Language Interface
The AltiVec high-level language interface:
- Provides an efficient and expressive mechanism for programmers to access AltiVec functionality from programming languages such as C and C++. Note: Access to AltiVec functionality from Java applications is not currently addressed by this specification, but will likely be addressed through a higher-level API such as mediaLib.
- Defines a minimal set of language extensions that clearly describes the intent of the programmer while minimizing the impact on existing PowerPC compilers and development tools.
- Defines a minimal set of library extensions needed to support AltiVec functionality.
2.1 Data Types
The AltiVec programming model introduces a set of fundamental data types, as described in Table 2-1.
Table 2-1. AltiVec Data Types
New C/C++ Type                Interpretation of Contents   Components Represent Values
vector unsigned char          16 unsigned char             0...255
vector signed char            16 signed char               -128...127
vector bool char              16 unsigned char             0 (F), 255 (T)
vector unsigned short         8 unsigned short             0...65535
vector unsigned short int
vector signed short           8 signed short               -32768...32767
vector signed short int
vector bool short             8 unsigned short             0 (F), 65535 (T)
vector bool short int
vector unsigned int           4 unsigned int               0...2^32 - 1
vector unsigned long*
vector unsigned long int*
vector signed int             4 signed int                 -2^31...2^31 - 1
vector signed long*
vector signed long int*
vector bool int               4 unsigned int               0 (F), 2^32 - 1 (T)
vector bool long*
vector bool long int*
vector float                  4 float                      IEEE-754 values
vector pixel                  8 unsigned short             1/5/5/5 pixel
*The vector types with the long keyword are deprecated and will be eliminated in a future version of this document.
In illustrations where an algorithm could apply to multiple types, vec_data represents any one of these types. Introducing fundamental types permits the compiler to provide stronger type checking and supports overloaded operations on vector types.
2.2 New Keywords
The model introduces new uses for the following five identifiers:
vector __vector pixel __pixel bool
as simple type specifier keywords. Among the type specifiers used in a declaration, the vector type specifier must occur first. As in C and C++, the remaining type specifiers may be freely intermixed in any order, possibly with other declaration specifiers. The syntax does not allow the use of a typedef name as a type specifier. For example, the following is not allowed:
typedef signed short int16;
vector int16 data;
These new uses may conflict with their existing use in C and C++. There are two methods that may be used to deal with this conflict. An implementation of the AltiVec programming model may choose either method.
2.2.1 The Keyword and Predefine Method
In this method, __vector, __pixel, and bool are added as keywords while vector and pixel are predefined macros. bool is already a keyword in C++. To allow its use in C as a keyword, it is treated the same as it is in C++. This means that the C language is extended to allow bool alone as a set of type specifiers. Typically, this type will map to int. To accommodate a conflict with other uses of the identifiers vector and pixel, the user can either #undef or use a command line option to remove the predefines.
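A minimal sketch of the #undef route under the keyword and predefine method. It assumes that vector and pixel, when predefined, are ordinary macros; the function sum_elements() is a hypothetical example of existing code that uses vector as an identifier.

```c
#include <assert.h>
#include <stddef.h>

/* Remove the predefines, if present, so that the names can be used as
   ordinary identifiers. __vector and __pixel are keywords in this
   method and are not affected by the #undef. */
#ifdef vector
#undef vector
#endif
#ifdef pixel
#undef pixel
#endif

/* Hypothetical legacy code that uses "vector" as a parameter name. */
int sum_elements(const int *vector, size_t count)
{
    int total = 0;
    assert(vector != NULL || count == 0);
    for (size_t i = 0; i < count; ++i)
        total += vector[i];
    return total;
}
```

After the #undef, AltiVec declarations in the same translation unit would have to be written with __vector and __pixel.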
2.2.2 The Context Sensitive Keyword Method
In this method, __vector and __pixel are added as keywords without regard to context while the new uses of vector, pixel, and bool are keywords only in the context of a type. Since vector must be first among the type specifiers, it can be recognized as a type specifier when a type identifier is being scanned. The new uses of pixel and bool occur after vector has been recognized. In all other contexts, vector, pixel, and bool are not reserved. This avoids conflicts such as class vector, typedef int bool, and allows the use of vector, pixel, and bool as identifiers for other uses.
2.3 Alignment
The following paragraphs describe AltiVec alignment requirements. When working with vector data, the programmer must be aware of these alignment issues. Because the AltiVec technology does not generate alignment exceptions, the programmer must determine whether and when vector data becomes unaligned.
2.3.1 Alignment of Vector Types
A defined data item of any vector data type in memory is always aligned on a 16-byte boundary. A pointer to any vector data type always points to a 16-byte boundary. The compiler is responsible for aligning vector data types on 16-byte boundaries. Given that vector data is correctly aligned, a program is incorrect if it attempts to dereference a pointer to a vector type if the pointer does not contain a 16-byte aligned address. In the AltiVec architecture, an unaligned load/store does not cause an alignment exception that might lead to (slow) loading of the bytes at the given address. Instead, the low-order bits of the address are quietly ignored.
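The effect of ignoring the low-order address bits can be modeled in portable C. This is a sketch, not AltiVec code: the function names are hypothetical, and the only assumption is that a vector access covers the 16-byte-aligned block containing the given address.

```c
#include <assert.h>
#include <stdint.h>

/* Models the address actually used by a vector load or store: the four
   low-order bits are dropped, so the access always covers one
   16-byte-aligned block. */
uintptr_t vector_effective_address(uintptr_t address)
{
    uintptr_t aligned = address & ~(uintptr_t)15;
    assert(aligned % 16 == 0);
    return aligned;
}

/* Byte offset of the requested address inside that block. A nonzero
   result means a plain vector access would not start at the byte the
   programmer asked for. */
unsigned vector_misalignment(uintptr_t address)
{
    return (unsigned)(address & (uintptr_t)15);
}
```

For example, an access through address 0x1007 touches the block at 0x1000 through 0x100F, and the first requested byte sits at offset 7 within it.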
2.3.2 Alignment of Non-Vector Types
An array of components to be loaded into vector registers need not be aligned, but will have to be accessed with attention to its alignment. Typically, this is accomplished using either the Load Vector for Shift Right, vec_lvsr(), or Load Vector for Shift Left, vec_lvsl(), operation and the Vector Permute, vec_perm(), operation.
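The load-and-permute approach can be sketched as a scalar model in portable C. The model_* functions below are hypothetical stand-ins for vec_ld(), vec_lvsl(), and vec_perm(); they imitate the byte movement only, and base is assumed to stand for a 16-byte-aligned address with at least 16 readable bytes in every block the model touches.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Models vec_ld(offset, base): copies the aligned 16-byte block that
   contains base[offset]. */
static void model_ld(size_t offset, const unsigned char *base,
                     unsigned char out[16])
{
    memcpy(out, base + (offset & ~(size_t)15), 16);
}

/* Models vec_lvsl(offset, base): the permute control
   {sh, sh+1, ..., sh+15}, where sh is the misalignment. */
static void model_lvsl(size_t offset, unsigned char out[16])
{
    unsigned sh = (unsigned)(offset & 15);
    for (unsigned i = 0; i < 16; ++i)
        out[i] = (unsigned char)(sh + i);
}

/* Models vec_perm(a, b, control): each control byte selects one byte
   from the 32-byte concatenation of a and b (low five bits used). */
static void model_perm(const unsigned char a[16], const unsigned char b[16],
                       const unsigned char control[16], unsigned char out[16])
{
    for (unsigned i = 0; i < 16; ++i) {
        unsigned sel = control[i] & 31u;
        out[i] = (sel < 16) ? a[sel] : b[sel - 16];
    }
}

/* Reads the 16 bytes starting at base[offset], aligned or not. */
void load_unaligned_model(const unsigned char *base, size_t offset,
                          unsigned char out[16])
{
    unsigned char msq[16], lsq[16], mask[16];
    assert(base != NULL);
    model_ld(offset, base, msq);       /* block holding the first byte */
    model_ld(offset + 15, base, lsq);  /* block holding the last byte  */
    model_lvsl(offset, mask);
    model_perm(msq, lsq, mask, out);
}
```

Loading the second block at offset + 15 rather than offset + 16 means that an already aligned request reads the same block twice instead of touching the block that follows it.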
2.3.3 Alignment of Aggregates and Unions Containing Vector Types
Aggregates (structures and arrays) and unions containing vector types must be aligned on 16-byte boundaries and their internal organization padded, if necessary, so that each internal vector type is aligned on a 16-byte boundary. This is an extension to all ABIs (AIX, Apple, SVR4, and EABI).
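The padding rule can be illustrated with standard C11 alignment support. The type vec128_model is a hypothetical stand-in for a vector type (16 bytes, 16-byte aligned), and struct packet_model is an invented aggregate; a real AltiVec compiler would apply the same layout to its own vector types.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a vector type: 16 bytes, 16-byte aligned. */
typedef struct {
    _Alignas(16) unsigned char bytes[16];
} vec128_model;

static_assert(sizeof(vec128_model) == 16, "stand-in must be 16 bytes");

/* An aggregate containing a vector member takes on 16-byte alignment;
   padding is inserted after "tag" and after "tail". */
struct packet_model {
    char         tag;      /* offset 0, then 15 bytes of padding  */
    vec128_model payload;  /* offset 16                           */
    char         tail;     /* offset 32, then 15 bytes of padding */
};

size_t packet_payload_offset(void) { return offsetof(struct packet_model, payload); }
size_t packet_size(void)           { return sizeof(struct packet_model); }
size_t packet_alignment(void)      { return _Alignof(struct packet_model); }
```

Ordering members so that vector members come first keeps the internal padding to a minimum.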
2.4 Extensions of C/C++ Operators for the New Types
Most C/C++ operators do not permit any of their arguments to be one of the new types. Let a and b be vector types and p be a pointer to a vector type. The normal C/C++ operators are extended to include the following operations.
2.4.1 sizeof()
The operations sizeof(a) and sizeof(*p) return 16.
2.4.2 Assignment
If either the left hand side or right hand side of an expression has a vector type, then both sides of the expression must be of the same vector type. Thus, the expression a = b is valid and represents assignment if a and b are of the same vector type (or if neither is a vector type). Otherwise, the expression is invalid and must be signaled as an error by the compiler.
2.4.3 Address Operator
The operation &a is valid if a is a vector type. The result of the operation is a pointer to a.
2.4.4 Pointer Arithmetic
The usual pointer arithmetic can be performed on p. In particular, p+1 is a pointer to the next vector after p.
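A short portable sketch of the sizeof and pointer arithmetic rules, using a hypothetical 16-byte stand-in type (vec128_model) in place of a real vector type. It assumes only C11 alignment support.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a vector type: 16 bytes, 16-byte aligned. */
typedef struct {
    _Alignas(16) unsigned char bytes[16];
} vec128_model;

/* Number of bytes between p and p + n for a pointer to the stand-in. */
ptrdiff_t vector_pointer_stride(size_t n)
{
    static vec128_model storage[4];
    vec128_model *p = storage;
    assert(n <= 4);  /* stay within, or one past, the array */
    return (char *)(p + n) - (char *)p;
}
```

Each increment of p advances the address by sizeof(*p), which is 16 bytes, so p+1 always lands on the next aligned vector.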
2.4.5 Pointer Dereferencing
If p is a pointer to a vector type, *p implies either a 128-bit vector load from the address obtained by clearing the low-order bits of p, equivalent to the instruction vec_ld(0, p), or a 128-bit vector store to that address, equivalent to the instruction vec_st(0, p). If it is desired to mark the data accessed as least-recently-used (LRU), the explicit instruction vec_ldl(0, p) or vec_stl(0, p) must be used. Dereferencing a pointer to a non-vector type produces the standard behavior of either a load or a copy of the corresponding type. Accessing of unaligned memory must be carried out explicitly by a vec_ld(int, type *) operation, a vec_ldl(int, type *) operation, a vec_st(int, type *) operation, or a vec_stl(int, type *) operation.
2.4.6 Type Casting
Pointers to old and new types may be cast back and forth to each other. Casting a pointer to a new type represents an unchecked assertion that the address is 16-byte aligned. New operators provide the equivalent of casts and data initialization. Casts from one vector type to another are provided by normal C casts. These should not be needed frequently if the overloaded forms of operators are used. None of the casts performs a conversion; the bit pattern of the result is the same as the bit pattern of the argument that is cast.
(vector signed char) vec_data
(vector signed short) vec_data
(vector signed int) vec_data
(vector unsigned char) vec_data
(vector unsigned short) vec_data
(vector unsigned int) vec_data
(vector bool char) vec_data
(vector bool short) vec_data
(vector bool int) vec_data
(vector float) vec_data
(vector pixel) vec_data
Casts between vector types and scalar types are illegal. To copy data between these types, use the vec_lde() or vec_ste() operations. An alternative is to use a union consisting of a vector type and an equivalent array of the scalar type and copy the data using the union.
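The difference between a cast that keeps the bit pattern and a cast that converts can be sketched with a union in portable C. The union vec128_union_model is a hypothetical stand-in for a union of a vector type and equivalent scalar arrays; the sketch assumes that float is a 32-bit IEEE-754 value.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a union of a vector type and equivalent
   arrays of scalar types. */
typedef union {
    unsigned char bytes[16];
    uint32_t      words[4];
    float         floats[4];
} vec128_union_model;

/* Returns the unchanged 32-bit pattern of a float element, which is
   what a cast between vector types preserves. */
uint32_t float_element_bits(float value)
{
    vec128_union_model u;
    for (int i = 0; i < 4; ++i)
        u.floats[i] = value;
    return u.words[0];
}

/* For contrast: an ordinary scalar C cast converts the value. */
uint32_t float_element_converted(float value)
{
    assert(value >= 0.0f && value < 4294967296.0f);
    return (uint32_t)value;
}
```

Under the stated assumption, the element 1.0f has the bit pattern 0x3F800000 when viewed as an unsigned integer, whereas the scalar conversion of 1.0f yields 1.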
2.5 New Operators
New operators are introduced to construct vector literals, adjust pointers, and allow full access to the functionality provided by the AltiVec architecture.
2.5.1 Vector Literals
A vector literal is written as a parenthesized vector type followed by a parenthesized set of constant expressions. Vector literals may be used either in initialization statements or as constants in executable statements. Table 2-2 lists the formats and descriptions of the vector literals. For each, the compiler generates code that either computes or loads the values into the register.
Table 2-2. Vector Literal Format and Description
Notation
    Represents
(vector unsigned char) (unsigned int)
    A set of 16 unsigned 8-bit quantities which all have the value specified by the integer.
(vector unsigned char) (unsigned int, ..., unsigned int)
    A set of 16 unsigned 8-bit quantities specified by the 16 integers.
(vector signed char) (int)
    A set of 16 signed 8-bit quantities that all have the value specified by the integer.
(vector signed char) (int, ..., int)
    A set of 16 signed 8-bit quantities specified by the 16 integers.
(vector unsigned short) (unsigned int)
    A set of eight unsigned 16-bit quantities which all have the value specified by the unsigned integer.
(vector unsigned short) (unsigned int, ..., unsigned int)
    A set of eight unsigned 16-bit quantities specified by the eight unsigned integers.
(vector signed short) (int)
    A set of eight signed 16-bit quantities which all have the value specified by the integer.
(vector signed short) (int, ..., int)
    A set of eight signed 16-bit quantities specified by the eight integers.
(vector unsigned int) (unsigned int)
    A set of four unsigned 32-bit quantities which all have the value specified by the unsigned integer.
(vector unsigned int) (unsigned int, ..., unsigned int)
    A set of four unsigned 32-bit quantities specified by the four unsigned integers.
(vector signed int) (int)
    A set of four signed 32-bit quantities which all have the value specified by the integer.
(vector signed int) (int, ..., int)
    A set of four signed 32-bit quantities specified by the four integers.
(vector float) (float)
    A set of four floating-point quantities which all have the value specified by the floating-point value.
(vector float) (float, ..., float)
    A set of four floating-point quantities specified by the four floating-point values.
2.5.2 Vector Literals and Casts
The combination of vector casts and vector literals can complicate some parsers. An implementation is not required to support the cast to a vector type of a vector cast or vector literal when the operand of the cast is not a parenthesized expression. For example, the programmer may write the following:
(vector unsigned char)((vector unsigned int)(1, 2, 3, 4))
(vector signed char)((vector unsigned short) variable)
The similar expressions below, without the parenthesized expression, may not be used in a conforming application:
(vector unsigned char)(vector unsigned int)(1, 2, 3, 4)
(vector signed char)(vector unsigned short) variable
2.5.3 Value for Adjusting Pointers
At compile time, vec_step(vec_data) produces the integer value by which a pointer to a component of an AltiVec data type must be incremented to advance by 16 bytes, that is, by one full vector. For example, a vector unsigned short data type is considered to contain eight unsigned 2-byte values. A pointer to unsigned 2-byte values that streams through an array of such values a full vector at a time should be incremented by vec_step(vector unsigned short) = 8. Table 2-3 provides a summary of the values by data type.
Table 2-3. Increment Value for vec_step by Data Type
vec_step Expression                 Value
vec_step(vector unsigned char)      16
vec_step(vector signed char)        16
vec_step(vector bool char)          16
vec_step(vector unsigned short)     8
vec_step(vector signed short)       8
vec_step(vector bool short)         8
vec_step(vector unsigned int)       4
vec_step(vector signed int)         4
vec_step(vector bool int)           4
vec_step(vector pixel)              8
vec_step(vector float)              4
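The values in Table 2-3 follow from dividing the 16-byte vector size by the component size. The sketch below models this in portable C; MODEL_VEC_STEP and sum_vector_leads_u16() are hypothetical names, and the macro takes a component type rather than a vector type because no vector types exist outside an AltiVec compiler.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Models vec_step for a component type: components per 16-byte vector. */
#define MODEL_VEC_STEP(component_type) ((size_t)16 / sizeof(component_type))

/* Streams through an array of 16-bit components one full vector at a
   time and sums the first component of each complete vector. */
uint32_t sum_vector_leads_u16(const uint16_t *data, size_t count)
{
    const size_t step = MODEL_VEC_STEP(uint16_t);  /* 8 components */
    uint32_t total = 0;
    assert(data != NULL || count == 0);
    for (size_t i = 0; i + step <= count; i += step) {
        const uint16_t *p = data + i;  /* start of the current vector */
        total += p[0];
    }
    return total;
}
```

Writing the stride as a step expression instead of a literal 8 keeps the loop correct if the component type is later changed.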
2.5.4 New Operators Representing AltiVec Operations
New operators are introduced to allow full access to the functionality provided by the AltiVec architecture. The new operators are represented in the programming language by language structures that parse like function calls. The names associated with these operations are all prefixed with vec_. The appearance of one of these forms can indicate the following:
- A generic AltiVec operation, like vec_add()
- A specific AltiVec operation, like vec_addubm()
- A predicate computed from an AltiVec operation, like vec_all_eq()
- Loading of a vector of components, as discussed in Section 2.5.1, "Vector Literals"
Each AltiVec operator takes a list of arguments that represent the input operands. The order of the operands is prescribed in the architecture specification and includes a returned result (possibly void). The programming model restricts the operand types permitted for each AltiVec operation, whether specific or generic. The programmer may override this constraint by explicitly casting arguments to permissible types.
For a specific operation, the operand types determine whether the operation is acceptable within the programming model and the type of the result. For example, vec_vaddubm(vector signed char, vector signed char) is acceptable in the programming model because it represents a reasonable way to do modular addition with signed bytes, while vec_vaddubs(vector signed char, vector signed char) and vec_vadduhm(vector signed char, vector signed char) are not acceptable. If permitted, the former operation would produce a result in which saturation treats the operands as unsigned; the latter operation would produce a result in which adjacent pairs of signed bytes are treated as signed halfwords.
For a generic operation, the operand types are used to determine whether the operation is acceptable, to select a particular operation according to the types of the arguments, and to determine the type of the result. For example, vec_add(vector signed char, vector signed char) maps onto vec_vaddubm() and returns a result of type vector signed char, while vec_add(vector unsigned short, vector unsigned short) maps onto vec_vadduhm() and returns a result of type vector unsigned short.
The AltiVec operations that set condition register CR6 (i.e., the compare dot instructions) are treated somewhat differently in the programming model. The programmer cannot access specific register names. Instead of directly specifying a compare dot instruction, the programmer makes reference to a predicate that returns an integer value derived from the result of a compare dot instruction. As in C, this value may be used directly as a value (1 is true, 0 is false) or as a condition for branching. It is expected that the compiler will produce the minimum code needed to use the condition.
Predicates begin with vec_all_ or vec_any_. Either the true or false state of any bit that can be set by a compare dot instruction has a predicate. For example, vec_all_gt(x,y) tests the true value of bit 24 of the CR after executing some vcmpgt. instruction. To complete the coverage by predicates, additional predicates exercise compare dot instructions with reversed or duplicated arguments. As examples, vec_all_lt(x,y) performs a vcmpgtx.(y,x), and vec_all_nan(x) is mapped onto vcmpeqfp.(x,x).
If the programmer wishes to have both the result of the compare dot instruction as returned in the vector register and the value of CR6, the programmer specifies two operations. The compiler's job is to determine that these can be merged. The AltiVec operations and predicates are listed in Chapter 4, "AltiVec Operations and Predicates."
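The semantics described above can be sketched as a scalar model in portable C. This is not AltiVec code: the model_* names are hypothetical, element-wise loops stand in for the compare dot instructions, and the conversion of out-of-range values to int8_t is assumed to wrap as it does on two's-complement implementations.

```c
#include <assert.h>
#include <stdint.h>

/* Models vec_all_gt on sixteen signed bytes: true only if every
   element of x exceeds the corresponding element of y. */
int model_all_gt_s8(const int8_t x[16], const int8_t y[16])
{
    assert(x && y);
    for (int i = 0; i < 16; ++i)
        if (!(x[i] > y[i]))
            return 0;
    return 1;
}

/* Models vec_any_gt: true if at least one element of x exceeds y. */
int model_any_gt_s8(const int8_t x[16], const int8_t y[16])
{
    assert(x && y);
    for (int i = 0; i < 16; ++i)
        if (x[i] > y[i])
            return 1;
    return 0;
}

/* vec_all_lt(x, y) reuses the greater-than compare with the
   arguments reversed. */
int model_all_lt_s8(const int8_t x[16], const int8_t y[16])
{
    return model_all_gt_s8(y, x);
}

/* Models vec_all_nan on four floats with a duplicated-argument
   equality compare: an element is NaN exactly when it does not
   compare equal to itself. */
int model_all_nan_f32(const float x[4])
{
    for (int i = 0; i < 4; ++i)
        if (x[i] == x[i])
            return 0;
    return 1;
}

/* Modular byte addition: the resulting bit pattern is the same whether
   the bytes are read as signed or unsigned, which is why the modular
   form is reasonable for signed bytes. */
int8_t model_add_modulo_s8(int8_t a, int8_t b)
{
    return (int8_t)(uint8_t)((unsigned)(uint8_t)a + (unsigned)(uint8_t)b);
}

/* Unsigned saturating byte addition: clamps at 255, which gives wrong
   answers when the bit patterns are really signed values. */
uint8_t model_add_unsigned_saturate(uint8_t a, uint8_t b)
{
    unsigned sum = (unsigned)a + (unsigned)b;
    return (uint8_t)(sum > 255u ? 255u : sum);
}
```

In the model, adding -1 and -1 modulo 256 gives -2, while unsigned saturation of the same bit patterns (255 + 255) clamps to 255, which reads back as -1 when interpreted as a signed byte.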
2.6 Programming Interface
This document does not prohibit or require an implementation to provide any set of include files or #pragma preprocessor commands. If an implementation requires that an include file be used prior to the use of the syntax described in this document, it is suggested that the include file be named . If an implementation supports #pragma preprocessor commands, it is suggested that it provide __ALTIVEC__ as a predefined macro with a nonzero value. A suggested preprocessor command set includes the following:
#pragma altivec_codegen on | off
When this pragma is on, the compiler may use AltiVec instructions. When you set this pragma off, the altivec_model pragma is also set to off.
#pragma altivec_model on | off
When this pragma is on, the compiler accepts the syntax specied in this document, and the altivec_codegen pragma is also set to on.
#pragma altivec_vrsave on | off | allon
When this pragma is on, the compiler maintains the VRSAVE register. With allon selected, the compiler changes the VRSAVE register to have all bits set. It is combined with #pragma altivec_vrsave off by having a parent function do the work once of setting the value of the VRSAVE register with #pragma altivec_vrsave allon and the function it calls uses the setting #pragma altivec_vrsave off.
Chapter 3 Application Binary Interface (ABI)
Note: The ABI extensions described herein for embedded applications are still under review by the PowerPC EABI industry working group, and may be subject to change. Modifications, if any, will be highlighted in future revisions of this document.
The AltiVec programming model extends the existing PowerPC ABIs. This chapter specifies extensions to the System V Application Binary Interface PowerPC Processor Supplement (SVR4 ABI), the PowerPC Embedded Application Binary Interface (EABI), Appendix A of The PowerPC Compiler Writer's Guide (AIX ABI), and the Apple Macintosh ABI. The SVR4 ABI and EABI specifications define both a Big-Endian ABI and a Little-Endian ABI. This extension is independent of the endian mode.
3.1 Data Representation
The vector data types are 16 bytes long and 16-byte aligned. All ABIs are extended similarly. Aggregates (structures and arrays) and unions containing vector types must be aligned on 16-byte boundaries and their internal organization padded, if necessary, so that each internal vector type is aligned on a 16-byte boundary. The Apple ABI and AIX ABI specify a maximum alignment for aggregates and unions of 4 bytes; the EABI specifies a maximum alignment of 8 bytes. Increasing the alignment to 16 bytes creates the opportunity for padding or holes in the parameter lists involving these aggregates, as described in Section 3.4.2, "Apple Macintosh ABI and AIX ABI Parameter Passing without Varargs."
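The padding rule for aggregates can be seen with an ordinary C11 compiler. This is a minimal sketch, assuming a hypothetical stand-in type vec128 (16 bytes, 16-byte aligned) in place of a real vector type; struct sample and align16 are illustrative names only.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a vector type: 16 bytes long, 16-byte aligned. */
typedef struct { _Alignas(16) unsigned char bytes[16]; } vec128;

/* An aggregate containing a vector member. The compiler inserts padding
   after 'tag' so that 'v' starts on a 16-byte boundary, and the whole
   structure becomes 16-byte aligned. */
struct sample {
    int    tag;
    vec128 v;
};

/* Round an offset up to the next multiple of 16. */
static size_t align16(size_t offset)
{
    return (offset + 15u) & ~(size_t)15u;
}

/* Offset of the vector member, checked against the rounding rule. */
static size_t vector_member_offset(void)
{
    size_t off = offsetof(struct sample, v);
    assert(off == align16(sizeof(int)));
    return off;
}
```

With a 4-byte int, the member v lands at offset 16, leaving a 12-byte hole after tag. This is the kind of hole that can appear in a parameter list when such an aggregate is passed.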
3.2 Register Usage Conventions
The register usage conventions for the vector register file are defined as follows:
Table 3-1. AltiVec Registers
Register   Intended use                                  Behavior across call sites
v0-v1      General use                                   Volatile (Caller save)
v2-v13     Parameters, general                           Volatile (Caller save)
v14-v19    General                                       Volatile (Caller save)
v20-v31    General                                       Non-volatile (Callee save)
VRSAVE     Special, see Section 3.3, "The Stack Frame"   Non-volatile (Callee save)
The VRSAVE special purpose register (SPR256, named vrsave in assembly instructions) is used to inform the operating system which vector registers (VRs) need to be saved and reloaded across context switches. Bit n of this register is set to 1 if vector register vn needs to be saved and restored across a context switch. Otherwise, the operating system may return that register with any value that does not violate security after a context switch. The most significant bit in the 32-bit word is bit 0. The EABI does not use VRSAVE for any special purpose, but VRSAVE is a non-volatile register.
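Because bit 0 is the most significant bit, the mask for register vn is 0x80000000 shifted right by n. The following sketch computes such masks in plain C; vrsave_bit and vrsave_range are hypothetical helper names, not part of any AltiVec interface.

```c
#include <assert.h>
#include <stdint.h>

/* Mask for VRSAVE bit n, which corresponds to vector register vn.
   Bit 0 is the most significant bit of the 32-bit word. */
static uint32_t vrsave_bit(int n)
{
    assert(n >= 0 && n <= 31);
    return UINT32_C(0x80000000) >> n;
}

/* Mask covering the contiguous registers vfirst..vlast, inclusive. */
static uint32_t vrsave_range(int first, int last)
{
    assert(first <= last);
    uint32_t mask = 0;
    for (int n = first; n <= last; n++)
        mask |= vrsave_bit(n);
    return mask;
}
```

For instance, vrsave_range(20, 31) yields 0x00000FFF, the mask for all twelve non-volatile VRs, and vrsave_range(0, 10) yields 0xFFE00000.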
3.3 The Stack Frame
The stack pointer maintains 16-byte alignment in the SVR4 ABI and the AIX ABI and 8-byte alignment in the EABI and the Apple Macintosh ABI. It is not necessary to align the stack dynamically in either the SVR4 ABI or the AIX ABI; however, the alignment padding space is specified for both. The additions to the stack frame are the vector register save area, the VRSAVE word, and the alignment padding space to dynamically align the stack to a quadword boundary. The following additional requirements apply to the stack frame:
- Before a function changes the value of vrsave, it shall save the value of VRSAVE at the time of entry to the function in the VRSAVE word.
- The alignment padding space shall be either 0, 4, 8, or 12 bytes long so that the address of the vector register save area (and subsequent stack locations) is quadword aligned.
- If the code establishing the stack frame dynamically aligns the stack pointer, it shall update the stack pointer atomically with an stwux instruction. The code may assume the stack pointer on entry is aligned on an 8-byte boundary.
- Before a function changes the value in any non-volatile vector register, vn, it shall save the value in vn in the word in the vector register save area 16*(32-n) bytes before the low-addressed end of the alignment padding space.
- Local variables of a vector data type which need to be saved to memory will be placed on the stack frame on a 16-byte alignment boundary in the same stack frame region used for local variables of other types.
SP in the figures denotes the stack pointer (general purpose register r1) of the called function after it has executed code establishing its stack frame.
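The save-slot rule for non-volatile vector registers is a small piece of arithmetic: vn is stored 16*(32-n) bytes below the low-addressed end of the alignment padding space. The sketch below expresses it in C; the function names are hypothetical.

```c
#include <assert.h>

/* Distance in bytes from the low-addressed end of the alignment padding
   space (the high end of the vector register save area) down to the
   save slot of non-volatile register vn: 16*(32-n). */
static int vr_save_distance(int n)
{
    assert(n >= 20 && n <= 31);
    return 16 * (32 - n);
}

/* Size in bytes of the vector register save area when registers
   vfirst..v31 are saved. Because v31 is nearest the padding, this is
   the distance to the slot of the lowest-numbered register saved. */
static int vr_save_area_size(int first_saved)
{
    return vr_save_distance(first_saved);
}
```

With all twelve non-volatile registers saved, v20 sits 192 bytes below the padding and v31 sits 16 bytes below it, which agrees with the -192 through -16 displacements used by the save and restore routines in Section 3.3.3.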
3.3.1 SVR4 ABI and EABI Stack Frame
The size of the vector register save area and the presence of the VRSAVE word may vary within a function and are determined by a new registers valid tag. Note: In the SVR4 ABI, the registers valid tag is the most general way to describe a stack frame. It is associated with a frame or frame valid tag. Figure 3-1 shows an SVR4 and EABI stack frame.
High Address
        Back chain
        Floating-point register save area
        General register save area
        CR save word
        VRSAVE save word                     NEW
        Alignment padding                    NEW
        Vector register save area            NEW
        Local variable space
        Parameter list area
        LR save word
SP ->   Back chain
Low Address
Figure 3-1. SVR4 ABI and EABI Stack Frame
Table 3-2. Vector Registers Valid Tag Format
Word  Bits   Name             Description
1     0-17   RESERVED         0
1     18-29  START_OFFSET     The number of words between the BASE of the nearest preceding Frame or Frame Valid tag and the first instruction to which this tag applies.
1     30-31  TYPE             2
2     0-11   VECTOR_REGS      One bit for each non-volatile vector register, bit 0 for v31,..., bit 11 for v20, with a 1 signifying that the register is saved in the vector register save area.
2     12     VRSAVE_AREA [1]  1 if and only if the VRSAVE word is allocated in the register save area.
2     13-17  VR [1]           Size in quadwords of the vector register save area.
2     18-29  RANGE            The number of words between the first and the last instruction to which this tag applies.
2     30     VRSAVE_REG       1 if and only if VRSAVE is saved in the VRSAVE word.
2     31     SUBTYPE          1
[1] If more than one Vector Registers Valid Tag applies to the same Frame or Frame Valid tag, they shall all have the same values for VRSAVE_AREA and VR.
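The bit positions in Table 3-2 use the PowerPC convention in which bit 0 is the most significant bit. As an illustration only, the following C sketch decodes word 2 of the tag under that convention; the structure and function names are hypothetical and are not defined by the ABI.

```c
#include <assert.h>
#include <stdint.h>

/* Extract bits first..last of a 32-bit word, where bit 0 is the most
   significant bit. */
static uint32_t tag_field(uint32_t word, int first, int last)
{
    assert(0 <= first && first <= last && last <= 31);
    int width = last - first + 1;
    uint32_t mask = (width == 32) ? UINT32_MAX
                                  : ((UINT32_C(1) << width) - 1u);
    return (word >> (31 - last)) & mask;
}

/* Decoded view of word 2 of a Vector Registers Valid Tag. */
typedef struct {
    uint32_t vector_regs;  /* bits 0-11  */
    uint32_t vrsave_area;  /* bit 12     */
    uint32_t vr;           /* bits 13-17 */
    uint32_t range;        /* bits 18-29 */
    uint32_t vrsave_reg;   /* bit 30     */
    uint32_t subtype;      /* bit 31     */
} vr_valid_tag_word2;

static vr_valid_tag_word2 decode_word2(uint32_t w)
{
    vr_valid_tag_word2 t;
    t.vector_regs = tag_field(w, 0, 11);
    t.vrsave_area = tag_field(w, 12, 12);
    t.vr          = tag_field(w, 13, 17);
    t.range       = tag_field(w, 18, 29);
    t.vrsave_reg  = tag_field(w, 30, 30);
    t.subtype     = tag_field(w, 31, 31);
    return t;
}

/* 1 if the tag marks vn as saved. Bit 0 of VECTOR_REGS is v31 and
   bit 11 is v20, so vn is field bit (31 - n). */
static int tag_saves_vr(const vr_valid_tag_word2 *t, int n)
{
    assert(t != 0 && n >= 20 && n <= 31);
    int field_bit = 31 - n;
    return (int)((t->vector_regs >> (11 - field_bit)) & 1u);
}
```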
The code example below shows sample prologue and epilogue code with full saves of all the non-volatile floating-point registers (FPRs), general registers (GPRs), and VRs for a stack frame of less than 32 Kbytes. The example aligns the stack pointer dynamically, addresses incoming arguments via r30, uses volatile VRs v0-v10, maintains VRSAVE, does not alter the non-volatile fields of the CR, and does no dynamic stack allocation. Saving and restoring the VRs and updating vrsave can occur in either order. A function that does not need to address incoming arguments but does align the stack pointer dynamically can recover the address of the original stack pointer with an instruction such as lwz r11,0(sp).
The computation of len in the example and whether to use subfic or addi to align the stack dynamically are based on the size of the components of the frame. Starting with the components at higher addresses, the value of len is computed by adding the size of the FPR save area, the GPR save area, the CR save word, and the VRSAVE word. The size of the alignment padding space is then computed as the smallest number of bytes needed to make len a multiple of 16. In the example below, the alignment padding space is 4 bytes. Consequently, subfic is used to dynamically align the stack by increasing the size of the alignment padding space by either 0 or 8 bytes. Had the alignment padding space been 8 or 12 bytes, addi would be used to align the stack dynamically by decreasing the size of the alignment padding space by either 0 or 8 bytes. Continuing, the value of len is updated by adding the size of the vector register save area, the local variable space, the outgoing parameter list area, and the LR save word. The size of the local variable space is adjusted so that the overall value of len is a multiple of 16.
The following is SVR4 ABI and EABI prologue and epilogue sample code.
function:
    mflr    r0                  # Save return address ...
    stw     r0,4(sp)            # ... in caller's frame.
    ori     r11,sp,0            # Save end of fpr save area
    rlwinm  r12,sp,0,28,28      # 0 or 8 based on SP alignment
    subfic  r12,r12,-len        # Add in stack length
    stwux   sp,sp,r12           # Establish new aligned frame
    bl      _savefpr_14         # Save floating-point registers
    addi    r11,r11,-144        # Compute end of gpr save area
    bl      _savegpr_14_g       # Save gprs and fetch GOT ptr
    mflr    r31                 # Place GOT ptr in r31
                                # Save CR here if necessary
    addi    r30,r11,144         # Save pointer to incoming arguments
    mfspr   r0,vrsave           # Save VRSAVE ...
    stw     r0,-220(r30)        # ... in caller's frame.
    oris    r0,r0,0xffe0        # Use v0-v10 and ...
    ori     r0,r0,0x0fff        # ... v20-v31 (for example)
    mtspr   vrsave,r0           # Update VRSAVE
    addi    r0,sp,len-224       # Compute end of vr save area
    bl      _savevr20           # Save VRs
                                # Body of function
    addi    r0,sp,len-224       # Address of vr save area to r0
    bl      _restvr20           # Restore VRs
    lwz     r0,-220(r30)        # Fetch prior value of VRSAVE
    mtspr   vrsave,r0           # Restore VRSAVE
    addi    r11,r30,-144        # Address of gpr save area to r11
    bl      _restgpr_14         # Restore gprs
    addi    r11,r11,144         # Address of fpr save area to r11
    bl      _restfpr_14_x       # Restore fprs and return
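The two calculations behind the sample code, the alignment padding and the dynamic alignment done by the rlwinm, subfic, and stwux sequence, can be modelled in C. This is a sketch under stated assumptions: addresses are treated as 32-bit integers, the store of the back chain performed by stwux is not modelled, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Smallest number of bytes (0, 4, 8, or 12 for word-sized components)
   that makes len a multiple of 16. */
static uint32_t alignment_padding(uint32_t len)
{
    return (16u - (len % 16u)) % 16u;
}

/* Model of the dynamic alignment sequence. On entry sp is 8-byte
   aligned and len is a multiple of 16; the result is 16-byte aligned. */
static uint32_t establish_frame(uint32_t sp, uint32_t len)
{
    assert(sp % 8u == 0 && len % 16u == 0);
    uint32_t adjust = sp & 8u;           /* rlwinm r12,sp,0,28,28: 0 or 8 */
    uint32_t delta  = 0u - len - adjust; /* subfic r12,r12,-len           */
    return sp + delta;                   /* stwux  sp,sp,r12 (new SP)     */
}
```

A running length of 220 bytes needs 4 bytes of padding, and one of 224 bytes needs none. In establish_frame, an entry stack pointer that is only 8-byte aligned is lowered by an extra 8 bytes, which is how the padding grows by "either 0 or 8 bytes".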
3.3.2 Apple Macintosh ABI and AIX ABI Stack Frame
Figure 3-2 shows how the Apple Macintosh ABI and AIX ABI stack frame is set up.
High Address
        Back chain
        Floating-point register save area
        General register save area
        VRSAVE save word                     NEW
        Alignment padding                    NEW
        Vector register save area            NEW
        Local variable space
        Parameter list area
        Saved TOC
        Reserved for Binders
        Reserved for Compilers
        LR save word
        CR save word
SP ->   Back chain
Low Address
Figure 3-2. Apple Macintosh ABI and AIX ABI Stack Frame
The Apple Macintosh ABI and AIX ABI stack frame allow the use of a 220-byte area at a negative offset from the stack pointer. This area can be used to save non-volatile registers before the stack pointer has been updated. The size of this area is not changed. Depending on the number of non-volatile registers saved, it may be necessary to update the stack pointer before saving the VRs. However, it remains unnecessary to update the stack pointer before saving the GPRs or FPRs.
The size of the VR save area and the presence of the VRSAVE word are determined by a traceback table entry. The spare3 2-bit field in the fixed portion of the traceback table is changed to the following:
has_vec_info   This 1-bit field is set if the procedure saves non-volatile VRs in the vector register save area, saves vrsave in the VRSAVE word, specifies the number of vector parameters, or uses AltiVec instructions.
spare4         Reserved 1-bit field.
When the has_vec_info bit is set, all the following optional fields of the traceback table are present following the position of the alloca_reg field.
vr_saved       This 6-bit field represents the number of non-volatile VRs saved by this procedure. Because the last register saved is always v31, a value of 2 in vr_saved indicates that v30 and v31 are saved.
saves_vrsave   If this routine saves vrsave, this 1-bit field is set. If so, the VRSAVE word in the register save area must be used to restore the prior value before returning from this procedure.
has_varargs    If this function has a variable argument list, this 1-bit field is set. Otherwise, it is set to 0.
vectorparms    This 7-bit field records the number of vector parameters. The field may be set to a non-zero value for a procedure with vector parameters that does not have a variable argument list. Otherwise, parmsonstk must be set.
vec_present    This 1-bit field is set if AltiVec instructions are performed within the procedure.
The following code shows sample prologue and epilogue code with full saves of all the non-volatile floating-point, general, and vector registers for a stack frame of less than 32 Kbytes. The code example dynamically aligns the stack pointer, addresses incoming arguments via r31, uses volatile VRs v0-v10, maintains VRSAVE, does not alter the non-volatile fields of the CR, and does no dynamic stack allocation. Saving and restoring the VRs and updating the vrsave register can occur in either order. A function that does not need to address incoming arguments but does align the stack pointer dynamically can recover the address of the original stack pointer with an instruction such as lwz r11,0(sp).
The computation of len in the example and whether to use subfic or addi to align the stack dynamically are based on the size of the components of the frame. Starting with the components at higher addresses, the value of len is computed by adding the size of the floating-point register save area, the general register save area, and the VRSAVE word. The size of the alignment padding space is then computed as the smallest number of bytes needed to make len a multiple of 16. In the example below, the alignment padding space is 0 bytes. Consequently, subfic is used to align the stack dynamically by increasing the size of the alignment padding space by either 0 or 8 bytes. Had the alignment padding space been 8 or 12 bytes, addi would be used to align the stack dynamically by decreasing the size of the alignment padding space by either 0 or 8 bytes. Continuing, the value of len is updated by adding the size of the vector register save area, the local variable space, the outgoing parameter list area, and 24 for the size of the link area. The size of the local variable space is adjusted so that the overall value of len is a multiple of 16.
The following is Apple Macintosh ABI and AIX ABI prologue and epilogue sample code.
function:
    mflr    r0                  # Save return address ...
    stw     r0,8(sp)            # ... in the caller's frame.
    bl      _savef14            # Save floating-point registers.
    stmw    r13,-220(sp)        # Save gprs in gpr save area
                                # Save CR here if necessary
    ori     r31,sp,0            # Save pointer to incoming arguments
    rlwinm  r12,sp,0,28,28      # 0 or 8 based on SP alignment
    subfic  r12,r12,-len        # Add in stack length
    stwux   sp,sp,r12           # Establish new aligned frame
    mfspr   r0,vrsave           # Save VRSAVE ...
    stw     r0,-224(r31)        # ... in caller's frame.
    oris    r0,r0,0xffe0        # Use v0-v10 and ...
    ori     r0,r0,0x0fff        # ... v20-v31 (for example)
    mtspr   vrsave,r0           # Update VRSAVE
    addi    r0,sp,len-224       # Compute end of vr save area
    bl      _savev20            # Save VRs
                                # Body of function
    addi    r0,sp,len-224       # Address of vr save area to r0
    bl      _restv20            # Restore VRs
    lwz     r0,-224(r31)        # Fetch prior value of VRSAVE
    mtspr   vrsave,r0           # Restore VRSAVE
    ori     sp,r31,0            # Restore SP
    lmw     r13,-220(sp)        # Restore gprs
    lwz     r0,8(sp)            # Restore return address ...
    mtlr    r0                  # ... and return from _restf14
    b       _restf14            # Restore fprs and return
3.3.3 Vector Register Saving and Restoring Functions
The vector register saving and restoring functions described in this section are not part of the ABI. They are defined here only to encourage uniformity among compilers in the code used to save and restore VRs. On entry to the functions described in this section, r0 contains the address of the word just beyond the end of the vector register save area, and they leave r0 undisturbed. They modify the value of r12. The following code is an example of saving a vector register.
_savev20:   addi    r12,r0,-192
            stvx    v20,r12,r0      # save v20
_savev21:   addi    r12,r0,-176
            stvx    v21,r12,r0      # save v21
_savev22:   addi    r12,r0,-160
            stvx    v22,r12,r0      # save v22
_savev23:   addi    r12,r0,-144
            stvx    v23,r12,r0      # save v23
_savev24:   addi    r12,r0,-128
            stvx    v24,r12,r0      # save v24
_savev25:   addi    r12,r0,-112
            stvx    v25,r12,r0      # save v25
_savev26:   addi    r12,r0,-96
            stvx    v26,r12,r0      # save v26
_savev27:   addi    r12,r0,-80
            stvx    v27,r12,r0      # save v27
_savev28:   addi    r12,r0,-64
            stvx    v28,r12,r0      # save v28
_savev29:   addi    r12,r0,-48
            stvx    v29,r12,r0      # save v29
_savev30:   addi    r12,r0,-32
            stvx    v30,r12,r0      # save v30
_savev31:   addi    r12,r0,-16
            stvx    v31,r12,r0      # save v31
            blr                     # return to prologue
The following code shows how to restore a vector register.
_restv20:   addi    r12,r0,-192
            lvx     v20,r12,r0      # restore v20
_restv21:   addi    r12,r0,-176
            lvx     v21,r12,r0      # restore v21
_restv22:   addi    r12,r0,-160
            lvx     v22,r12,r0      # restore v22
_restv23:   addi    r12,r0,-144
            lvx     v23,r12,r0      # restore v23
_restv24:   addi    r12,r0,-128
            lvx     v24,r12,r0      # restore v24
_restv25:   addi    r12,r0,-112
            lvx     v25,r12,r0      # restore v25
_restv26:   addi    r12,r0,-96
            lvx     v26,r12,r0      # restore v26
_restv27:   addi    r12,r0,-80
            lvx     v27,r12,r0      # restore v27
_restv28:   addi    r12,r0,-64
            lvx     v28,r12,r0      # restore v28
_restv29:   addi    r12,r0,-48
            lvx     v29,r12,r0      # restore v29
_restv30:   addi    r12,r0,-32
            lvx     v30,r12,r0      # restore v30
_restv31:   addi    r12,r0,-16
            lvx     v31,r12,r0      # restore v31
            blr                     # return to epilogue
3.4 Function Calls
This section applies to all user functions. Note that the intrinsic AltiVec operations are not treated as function calls, so these comments don't apply to those operations. The first twelve vector parameters are placed in VRs v2-v13. If fewer (or no) vector type arguments are passed, the unneeded registers are not loaded and contain undefined values upon entry to the called function. Functions that declare a vector data type as a return value place that return value in v2. Any function that returns a vector type or has a vector parameter requires a prototype. This requirement enables the compiler to avoid shadowing VRs in GPRs.
3.4.1 SVR4 ABI and EABI Parameter Passing and Varargs
The SVR4 ABI algorithm for passing parameters considers the arguments as ordered from left (first argument) to right, although the order of evaluation of the arguments is unspecified. The vector arguments maintain their ordering. The algorithm is modified to add vr to contain the number of the next available vector register. In the INITIALIZE step, set vr=2. In the SCAN loop, add a case for the next argument VECTOR_ARG as follows:
- If the next argument is in the variable portion of a parameter list, set vr=14. This leaves the fixed portion of a variable argument list in VRs and places the variable portion in memory.
- If vr>13 (that is, there are no more available VRs), go to OTHER. Otherwise, load the argument value into vector register vr, set vr to vr+1, and go to SCAN.
The OTHER case is modified only to understand that vector arguments have 16-byte size and alignment. Aggregates are passed by reference (i.e., converted to a pointer to the object), so no change is needed to deal with 16-byte aligned aggregates. The va_list type is unchanged, but an additional va_arg_type value of 4 named arg_VECTOR is defined for the __va_arg() interface. Since vector parameters in the variable portion of a parameter list are passed in memory, the __va_arg() routine can access the vector value from the overflow_arg_area value in the va_list type.
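The VECTOR_ARG case can be sketched as a small simulation. The code below is an illustrative C model that tracks only the vector arguments of a call; assign_vector_args and the constant names are hypothetical and are not part of the ABI.

```c
#include <assert.h>
#include <stddef.h>

enum { FIRST_PARAM_VR = 2, LAST_PARAM_VR = 13, IN_MEMORY = -1 };

/* For each of 'total' vector arguments, record in where[i] the number of
   the VR that receives it, or IN_MEMORY. The first 'fixed_count'
   arguments belong to the fixed portion of the parameter list; the rest
   belong to the variable portion. Returns the number of arguments
   passed in registers. */
static size_t assign_vector_args(size_t total, size_t fixed_count, int *where)
{
    assert(where != NULL && fixed_count <= total);
    int vr = FIRST_PARAM_VR;                 /* INITIALIZE: vr = 2 */
    size_t in_regs = 0;
    for (size_t i = 0; i < total; i++) {     /* SCAN loop */
        if (i >= fixed_count)
            vr = LAST_PARAM_VR + 1;          /* variable portion: vr = 14 */
        if (vr > LAST_PARAM_VR) {
            where[i] = IN_MEMORY;            /* go to OTHER */
        } else {
            where[i] = vr++;                 /* load into VR, vr = vr + 1 */
            in_regs++;
        }
    }
    return in_regs;
}
```

With fourteen fixed vector arguments, the first twelve land in v2 through v13 and the last two go to memory. With one fixed and two variable vector arguments, only the fixed one is passed in a register.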
3.4.2 Apple Macintosh ABI and AIX ABI Parameter Passing without Varargs
If the function does not take a variable argument list, the non-vector parameters are passed in the same registers and stack locations as they would be if the vector parameters were not present. The only change is that aggregates and unions may be 16-byte aligned instead of 4-byte aligned. This can result in words in the parameter list being skipped for alignment (padding) and left with undefined value.
The first twelve vector parameters are placed in v2-v13. These parameters are not shadowed in GPRs. They are not allocated space in the memory argument list. Any additional vector parameters are passed through memory on the program stack. They appear together, 16-byte aligned, and after any non-vector parameters.
3.4.3 Apple Macintosh ABI and AIX ABI Parameter Passing with Varargs
The va_list type continues to be a pointer to the memory location of the next parameter. If va_arg() accesses a vector type, the va_list value must first be aligned to a 16-byte boundary. A function that takes a variable argument list has all parameters, including vector parameters, mapped in the argument area as ordered and aligned according to their type. The first 8 words of the argument area are shadowed in the GPRs only if they correspond to the variable portion of the parameter list. The first parameter word is named PW0 and is at stack offset 24. A vector parameter must be aligned on a 16-byte boundary. This means there are two cases where vector parameters are passed in GPRs. If a vector parameter is passed in PW2:PW5 (stack offset 32), its value is placed in GPR5-GPR8. If a vector parameter is passed in PW6:PW9 (stack offset 48), its value PW6:PW7 is placed in GPR9 and GPR10 and the value PW8:PW9 is placed on the stack. All parameters after the first 8 words of the argument area that correspond to the variable portion of the parameter list are passed in memory. In the fixed portion of the parameter list, vector parameters are placed in v2-v13, but are provided a stack location corresponding to their position in the parameter list.
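The two GPR cases follow from simple offset arithmetic. The sketch below assumes the 24-byte link area mentioned in Section 3.3.2, 4-byte parameter words, a 16-byte aligned stack pointer, and the usual shadowing of PW0 through PW7 by GPR3 through GPR10; the helper names are hypothetical.

```c
#include <assert.h>

enum { LINK_AREA_BYTES = 24, WORD_BYTES = 4, GPR_SHADOW_WORDS = 8 };

/* Stack offset of parameter word PWn in the argument area. */
static int pw_stack_offset(int pw)
{
    assert(pw >= 0);
    return LINK_AREA_BYTES + WORD_BYTES * pw;
}

/* Number of the GPR that shadows PWn (GPR3 for PW0 up to GPR10 for PW7),
   or -1 if the word exists only in memory. */
static int pw_shadow_gpr(int pw)
{
    assert(pw >= 0);
    return pw < GPR_SHADOW_WORDS ? 3 + pw : -1;
}

/* First parameter word at or after 'pw' whose stack offset is a
   multiple of 16, i.e. where a vector parameter may start. */
static int next_vector_pw(int pw)
{
    assert(pw >= 0);
    while (pw_stack_offset(pw) % 16 != 0)
        pw++;
    return pw;
}
```

Within the first eight words, only PW2 and PW6 start on a 16-byte boundary. A vector at PW2 is fully shadowed by GPR5 through GPR8, while a vector at PW6 has only its first two words shadowed (GPR9 and GPR10).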
3.5 malloc(), vec_malloc(), and new
In the interest of saving space, malloc(), calloc(), and realloc() are not required to return a 16-byte aligned address. Instead, a new set of memory management functions is introduced that return a 16-byte aligned address. The new functions are named vec_malloc(), vec_calloc(), vec_realloc(), and vec_free(). The two sets of memory management functions may not be interchanged: memory allocated with malloc(), calloc(), or realloc() may only be freed with free() and reallocated with realloc(); memory allocated with vec_malloc(), vec_calloc(), or vec_realloc() may only be freed with vec_free() and reallocated with vec_realloc(). The user must use the appropriate set of functions based on the alignment requirement of the type involved. In the case of the C++ operator new, the implementation of new is required to use the appropriate set of functions based on the alignment requirement of the type.
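One common way to obtain 16-byte aligned storage on top of malloc() is to over-allocate and remember the original pointer. The sketch below is an illustration of why the two function families cannot be mixed, not the implementation of vec_malloc(); sketch_vec_malloc and sketch_vec_free are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Return a 16-byte aligned block of at least 'size' bytes, or NULL.
   The raw malloc() pointer is stored just below the aligned address. */
static void *sketch_vec_malloc(size_t size)
{
    if (size > SIZE_MAX - (15u + sizeof(void *)))
        return NULL;
    void *raw = malloc(size + 15u + sizeof(void *));
    if (raw == NULL)
        return NULL;
    uintptr_t start   = (uintptr_t)raw + sizeof(void *);
    uintptr_t aligned = (start + 15u) & ~(uintptr_t)15u;
    ((void **)aligned)[-1] = raw;   /* remember what to pass to free() */
    return (void *)aligned;
}

/* Release a block obtained from sketch_vec_malloc(). */
static void sketch_vec_free(void *p)
{
    if (p == NULL)
        return;
    assert(((uintptr_t)p & 15u) == 0);
    free(((void **)p)[-1]);
}
```

In this scheme the address handed to the caller is not the address malloc() returned, so passing it to free() would corrupt the heap. That is one reason an implementation may require each block to be released by the matching function.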
3.6 setjmp() and longjmp()
The context required to be saved and restored by setjmp(), longjmp(), and related functions now includes the 12 non-volatile VRs and vrsave. The user types sigjmp_buf and jmp_buf are extended by 48 words. An unused word in the existing jmp_buf is used to save VRSAVE.
Table 3-3. ABI Specifications for setjmp() and longjmp()
ABI                   jmp_buf Size   VRSAVE Offset   v20-v31 Offset
AIX ABI               448            100             256
Apple Macintosh ABI   448            16              256
SVR4 ABI and EABI     448            248             256
There are complications in implementing setjmp() and longjmp():
- The user types must be enlarged. Existing applications that use these interfaces will have to be recompiled even though they make no use of the AltiVec instruction set.
- The implementation that saves and restores the VRs can only assume that the v20-v31 offset is aligned on a 4-byte boundary. A method where the VRs are saved at the first aligned location in the jmp_buf was rejected because the user types are only 4-byte aligned and may be copied by value to a location with different alignment.
- The implementation that saves and restores the VRs and vrsave uses instructions that do not exist on a non-AltiVec enabled PowerPC implementation. The method for testing whether the AltiVec instructions operate is privileged. One solution is to define an O/S interface that saves and restores the VRs and vrsave if and only if the AltiVec instructions exist and are enabled.
A simple solution to these complications is to define setjmp(), longjmp() and the user types sigjmp_buf and jmp_buf differently when compiled with an AltiVec-enabled compiler (i.e., when __VEC__ is defined). These bindings result in a larger jmp_buf with 16-byte alignment. The bindings for setjmp() and longjmp() unconditionally save and restore the vector state. Such an implementation does not save and restore the vector state when these interfaces are compiled without an AltiVec-enabled compiler. The application must ensure that these two sets of bindings are not mixed.
3.7 Debugging Information
Extensions to the debugging information format are required to describe vector types and vector register locations. While vector types can be described as fixed length arrays of existing C types, the implementation should describe these as new fundamental types. Doing so allows a debugger to provide mechanisms to display vector values, assign vector values, and create vector literals.
This section is subject to change. It is intended to describe the extensions to the standard debugging formats: xcoff stabstrings, DWARF version 1.1.0, and DWARF version 2.0.0.
Xcoff stabstrings used in the AIX ABI and adopted by the Apple Macintosh ABI support the location of objects in GPRs and FPRs. The stabstring code 'R' describes a parameter passed by value in the given GPR; 'r' describes a local variable residing in the given GPR. The stabstring code 'X' describes a parameter passed by value in the given vector register; 'x' describes a local variable residing in the given vector register.
DWARF 2.0 debugging DIEs support the location of objects in any machine register. The SVR4 ABI specifies the DWARF register number mapping. The VRs v0-v31 are assigned register numbers 1124-1155. The VRSAVE SPR is SPR256 and is assigned the register number 356.
3.8 printf() and scanf() Control Strings
The conversion specifications in control strings for input functions (fscanf, scanf, sscanf) and output functions (fprintf, printf, sprintf, vfprintf, vprintf, vsprintf) are extended to support vector types.
3.8.1 Output Conversion Specifications
The output conversion specifications have the following general form:
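%[<flags>][<width>][<precision>][<size>]<conversion>
where,
<flags>         ::= <flag-char> | <flags><flag-char>
<flag-char>     ::= <std-flag-char> | <c-sep>
<std-flag-char> ::= '-' | '+' | '0' | '#' | ' '
<c-sep>         ::= ',' | ';' | ':' | '_'
<width>         ::= <decimal-integer> | '*'
<precision>     ::= '.' <width>
<size>          ::= 'll' | 'L' | 'l' | 'h' | <vector-size>
<vector-size>   ::= 'vl' | 'vh' | 'lv' | 'hv' | 'v'
<conversion>    ::= <char-conv> | <str-conv> | <fp-conv> | <int-conv> | <misc-conv>
<char-conv>     ::= 'c'
<str-conv>      ::= 's' | 'P'
<fp-conv>       ::= 'e' | 'E' | 'f' | 'g' | 'G'
<int-conv>      ::= 'd' | 'i' | 'u' | 'o' | 'p' | 'x' | 'X'
<misc-conv>     ::= 'n' | '%'
The extensions to the output conversion specification for vector types are <c-sep> and <vector-size>. The <vector-size> indicates that a single vector value is to be converted. The vector value is displayed in the following general form:
value1 C value2 C ... C valuen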
where C is a separator character defined by <c-sep> and there are 4, 8, or 16 output values depending on the <size>, each formatted according to the <conversion>, as follows:
- A <size> of 'vl' or 'lv' consumes one argument and modifies the <int-conv> conversion; it should be of type vector signed int, vector unsigned int, or vector bool int; it is treated as a series of four 4-byte components.
- A <size> of 'vh' or 'hv' consumes one argument and modifies the <int-conv> conversion; it should be of type vector signed short, vector unsigned short, vector bool short, or vector pixel; it is treated as a series of eight 2-byte components.
- A <size> of 'v' with <char-conv> or <int-conv> consumes one argument; it should be of type vector signed char, vector unsigned char, or vector bool char; it is treated as a series of sixteen 1-byte components.
- A <size> of 'v' with <fp-conv> consumes one argument; it should be of type vector float; it is treated as a series of four 4-byte floating-point components.
- All other combinations of <size> and <conversion> are undefined.
The default value for the separator character is a space unless the 'c' conversion is being used. For the 'c' conversion the default separator character is null. Only one separator character may be specified in <flags>. Examples:
    vector signed char s8 = vector signed char('a','b',' ','d','e','f',
                                               'g','h','i','j','k','l',
                                               'm',',','o','p');
    vector unsigned short u16 = vector unsigned short(1,2,3,4,5,6,7,8);
    vector signed int s32 = vector signed int(1, 2, 3, 99);
    vector float f32 = vector float(1.1, 2.2, 3.3, 4.39501);
    printf("s8 = %vc\n", s8);
    printf("s8 = %,vc\n", s8);
    printf("u16 = %vhu\n", u16);
    printf("s32 = %,2lvd\n", s32);
    printf("f32 = %,5.2vf\n", f32);
This code produces the following output:
    s8 = ab defghijklm,op
    s8 = a,b, ,d,e,f,g,h,i,j,k,l,m,,,o,p
    u16 = 1 2 3 4 5 6 7 8
    s32 =  1, 2, 3,99
    f32 = 1.10 ,2.20 ,3.30 ,4.40
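On a C library without the vector extensions, the element-by-element behavior described above can be approximated with standard snprintf(). The sketch below mimics an integer conversion with a separator flag and a field width for four int elements; format_vec4_int is a hypothetical helper and takes a plain array instead of a vector argument.

```c
#include <assert.h>
#include <stdio.h>

/* Format four ints the way a vector integer conversion is described:
   each element is converted with the given field width, and elements
   are joined by the separator character. Returns the number of
   characters written, or -1 if the buffer is too small. */
static int format_vec4_int(char *out, size_t cap, const int v[4],
                           char sep, int width)
{
    assert(out != NULL && v != NULL && cap > 0);
    size_t used = 0;
    out[0] = '\0';
    for (int i = 0; i < 4; i++) {
        if (i > 0) {
            if (used + 1 >= cap)
                return -1;
            out[used++] = sep;      /* one separator between elements */
            out[used] = '\0';
        }
        int n = snprintf(out + used, cap - used, "%*d", width, v[i]);
        if (n < 0 || (size_t)n >= cap - used)
            return -1;
        used += (size_t)n;
    }
    return (int)used;
}
```

Called with the elements 1, 2, 3, 99, a ',' separator, and a width of 2, the helper writes the 11-character string in which each element occupies a two-character field. A width of 0 with a space separator gives the default, unpadded form.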
3.8.2 Input Conversion Specifications
The input conversion specifications have the following general form:
%[<flags>][<width>][<size>]<conversion>
where,
<flags>       ::= '*' | <c-sep> ['*'] | ['*'] <c-sep>
<c-sep>       ::= ',' | ';' | ':' | '_'
<width>       ::= <decimal-integer>
<size>        ::= 'll' | 'L' | 'l' | 'h' | <vector-size>
<vector-size> ::= 'vl' | 'vh' | 'lv' | 'hv' | 'v'
<conversion>  ::= <char-conv> | <str-conv> | <fp-conv> | <int-conv> | <misc-conv>
<char-conv>   ::= 'c'
<str-conv>    ::= 's' | 'P'
<fp-conv>     ::= 'e' | 'E' | 'f' | 'g' | 'G'
<int-conv>    ::= 'd' | 'i' | 'u' | 'o' | 'p' | 'x' | 'X'
<misc-conv>   ::= 'n' | '%' | '['
The extensions to the input conversion specification for vector types are <c-sep> and <vector-size>. The <vector-size> indicates that a single vector value is to be scanned and converted. The vector value to be scanned is in the following general form:
value1 C value2 C ... C valuen
where C is a separator sequence defined by <c-sep> (the separator character optionally preceded by whitespace characters) and 4, 8, or 16 values are scanned depending on the <size>, each value scanned according to the <conversion>, as follows:
- A <size> of 'vl' or 'lv' consumes one argument and modifies the <int-conv> conversion; it should be of type vector signed int * or vector unsigned int * depending on the <int-conv> specification; four values are scanned.
- A <size> of 'vh' or 'hv' consumes one argument and modifies the <int-conv> conversion; it should be of type vector signed short * or vector unsigned short * depending on the <int-conv> specification; 8 values are scanned.
- A <size> of 'v' with <char-conv> or <int-conv> consumes one argument; it should be of type vector signed char * or vector unsigned char * depending on the <char-conv> or <int-conv> specification; 16 values are scanned.
- A <size> of 'v' with <fp-conv> consumes one argument; it should be of type vector float *; four floating-point values are scanned.
- All other combinations of <size> and <conversion> are undefined.
For the 'c' conversion the default separator character is null, and the separator sequence does not include whitespace characters preceding the separator character. For other than the 'c' conversions, the default separator character is a space, and the separator sequence does include whitespace characters preceding the separator character.
If the input stream reaches end-of-file or there is a conflict between the control string and a character read from the input stream, the input functions return EOF and do not assign to their vector argument. When a conflict occurs, the character causing the conflict remains unread and is processed by the next input operation. Examples:
    sscanf("ab defghijklm,op", "%vc", &s8);
    sscanf("a,b, ,d,e,f,g,h,i,j,k,l,m,,,o,p", "%,vc", &s8);
    sscanf("1 2 3 4 5 6 7 8", "%vhu", &u16);
    sscanf("1, 2, 3,99", "%,2lvd", &s32);
    sscanf("1.10 ,2.20 ,3.30 ,4.40", "%,5vf", &f32);
This is equivalent to:
    vector signed char s8 = vector signed char('a','b',' ','d','e','f',
                                               'g','h','i','j','k','l',
                                               'm',',','o','p');
    vector unsigned short u16 = vector unsigned short(1,2,3,4,5,6,7,8);
    vector signed int s32 = vector signed int(1, 2, 3, 99);
    vector float f32 = vector float(1.1, 2.2, 3.3, 4.4);
Chapter 4 AltiVec Operations and Predicates
The following three subsections provide some background information that is helpful in understanding the descriptions provided for each operation and predicate. This is followed by a detailed listing of AltiVec operations followed by a separate section describing the AltiVec predicates. The final subsection contains compiler notes for handling predicates.
4.1 Vector Status and Control Register
The vector status and control register (VSCR) is a special 32-bit vector register shown in Figure 4-1.
Bits 0-14: Reserved (0)   Bit 15: NJ   Bits 16-30: Reserved (0)   Bit 31: SAT
Figure 4-1. Vector Status and Control Register (VSCR)
The VSCR has two defined bits, the AltiVec non-Java mode (NJ) bit (VSCR[15]) and the AltiVec saturation (SAT) bit (VSCR[31]); the remaining bits are reserved. The vec_mfvscr operation moves the VSCR to a vector register. When moved, the 32-bit VSCR is right-justified in the 128-bit vector register, and the upper 96 bits VRx[0-95] of the vector register are cleared, so the VSCR in a vector register looks as shown in Figure 4-2.
Bits 0-95: 0   Bits 96-110: Reserved (0)   Bit 111: NJ   Bits 112-126: Reserved (0)   Bit 127: SAT
Figure 4-2. VSCR Moved to a Vector Register
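The bit positions above use big-endian numbering, where bit 0 is the most significant bit. A minimal sketch of what that means for masks in C follows; the names vscr_nj, vscr_sat, and vscr_to_vector are hypothetical helpers written for illustration, and the four-word array is only a model of the 128-bit register image, not an AltiVec type.

```c
#include <stdint.h>

/* Big-endian bit n of a 32-bit value is (1 << (31 - n)) in C terms. */
#define VSCR_NJ_MASK  (UINT32_C(1) << (31 - 15))  /* VSCR[15] */
#define VSCR_SAT_MASK (UINT32_C(1) << (31 - 31))  /* VSCR[31] */

int vscr_nj(uint32_t vscr)  { return (vscr & VSCR_NJ_MASK)  != 0; }
int vscr_sat(uint32_t vscr) { return (vscr & VSCR_SAT_MASK) != 0; }

/* Model of the register image described for vec_mfvscr: the VSCR is
 * right-justified, so words 0..2 (bits 0-95) are cleared and word 3
 * (bits 96-127) holds the VSCR. */
void vscr_to_vector(uint32_t vscr, uint32_t words[4])
{
    words[0] = 0;
    words[1] = 0;
    words[2] = 0;
    words[3] = vscr;
}
```

With this layout, NJ lands at bit 111 and SAT at bit 127 of the 128-bit image, as in Figure 4-2.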
VSCR bit settings are shown in Table 4-1.
Table 4-1. VSCR Field Descriptions
Bits    Name   Description
0-14    --     Reserved. Software is permitted to write any value to such a bit. A subsequent reading of the bit returns 0 if the value last written to the bit was 0 and returns an undefined value (0 or 1) otherwise.
15      NJ     Non-Java. A mode control bit that determines whether AltiVec floating-point operations will be performed in a Java-IEEE-C9X-compliant mode or a possibly faster non-Java/non-IEEE mode.
               0  The Java-IEEE-C9X-compliant mode is selected. Denormalized values are handled as specified by the Java, IEEE, and C9X standards.
               1  The non-Java/non-IEEE-compliant mode is selected. If an element in a source vector register contains a denormalized value, the value 0 is used instead. If an instruction causes an underflow exception, the corresponding element in the target VR is cleared to 0. In both cases the 0 has the same sign as the denormalized or underflowing value. This mode is described in detail in the AltiVec Programming Environments Manual.
16-30   --     Reserved. Software is permitted to write any value to such a bit. A subsequent reading of the bit returns 0 if the value last written to the bit was 0 and returns an undefined value (0 or 1) otherwise.
31      SAT    Saturation. A sticky status bit indicating that some field in a saturating instruction saturated since the last time SAT was cleared. In other words, when SAT = 1 it remains set until it is cleared by an explicit instruction.
               0  Indicates no saturation occurred; an instruction can explicitly clear this bit.
               1  An AltiVec saturating instruction implicitly sets the SAT field when saturation occurs in the result of one of the AltiVec instructions or vector operations having saturate in its name.
After vec_mfvscr executes, the result in the target vector register is architecturally precise. That is, it reflects all updates to the SAT bit that could have been made by vector instructions logically preceding it in the program flow, and it does not reflect any SAT updates that may be made by vector instructions logically following it in the program flow. Reading the VSCR can be much slower than typical AltiVec instructions, so care must be taken in reading it to avoid performance problems.
The first six 16-bit elements of the result are 0. The seventh element of the result contains the high-order 16 bits of the VSCR (including NJ). The eighth element of the result contains the low-order 16 bits of the VSCR (including SAT).
The setting of the Non-Java mode (NJ) bit (VSCR[15]) affects some vector floating-point operations. The other special bit (VSCR[31]) is the AltiVec Saturation (SAT) bit, which is set when an operation generates a saturated result. Saturation is defined with respect to the type of the resulting element. The result d of saturating a value x with respect to a type t means:
d = max(minimum(t), min(maximum(t), x))
where minimum(t) is the algebraically smallest value representable by a number of type t and maximum(t) is the algebraically largest value representable by a number of type t.
For each operation, where applicable, the effects of the NJ bit setting and/or the effects on the SAT bit are described in the operation description.
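The saturation formula translates directly into a clamp. The following sketch assumes a hypothetical helper named saturate that takes the bounds of the result type explicitly; it works on 64-bit intermediates so that the unsaturated value x is always representable.

```c
#include <stdint.h>

/* d = max(minimum(t), min(maximum(t), x)), with the bounds of type t
 * passed in as min_t and max_t. */
int64_t saturate(int64_t x, int64_t min_t, int64_t max_t)
{
    int64_t upper = (x < max_t) ? x : max_t;   /* min(maximum(t), x) */
    return (upper > min_t) ? upper : min_t;    /* max(minimum(t), ...) */
}
```

For example, saturate(300, INT8_MIN, INT8_MAX) yields 127 for a signed char element, and saturate(-1, 0, UINT8_MAX) yields 0 for an unsigned char element.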
4.2 Byte Ordering
The default mapping for AltiVec ISA is PowerPC big-endian. The endian support of the PowerPC architecture does not address any data element larger than a double word; the basic memory unit for vectors is a quad word. Big-endian byte ordering is shown in Figure 4-3.
Quad word: bits 0-127, with the MSB (high-order) at bit 0 and the LSB (low-order) at bit 127.
Words: word 0 (high-order) through word 3 (low-order).
Half words: half word 0 (high-order) through half word 7 (low-order); half words 0 and 1 are the high-order and low-order half words of word 0.
Bytes: byte 0 (high-order) through byte 15 (low-order); byte n begins at bit 8n (0, 8, 16, ..., 120).
Figure 4-3. Big-Endian Byte Ordering for a Vector Register
As shown in Figure 4-3, the vector register elements are numbered using big-endian byte ordering. For example, the high-order (or most significant) byte element is numbered 0 and the low-order (or least significant) byte element is numbered 15. When defining high-order and low-order for elements in a vector register, be careful not to confuse the meaning with the bit numbering. For example, in Figure 4-3 the high-order half word for word 0 is half word 0 (bits 0-15), and the low-order half word for word 0 is half word 1 (bits 16-31).
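The element numbering can be illustrated with a byte array whose index 0 holds the high-order byte. This is a sketch only: the quadword struct and the get_halfword and get_word helpers are hypothetical names introduced here, not AltiVec types or operations.

```c
#include <stdint.h>

/* Model of a vector register: byte[0] is the high-order byte (bits 0-7),
 * byte[15] is the low-order byte (bits 120-127). */
typedef struct { uint8_t byte[16]; } quadword;

/* Half word i (0..7) is made of bytes 2i (high-order) and 2i+1. */
uint16_t get_halfword(const quadword *q, int i)
{
    return (uint16_t)((q->byte[2 * i] << 8) | q->byte[2 * i + 1]);
}

/* Word i (0..3) is made of bytes 4i (high-order) through 4i+3. */
uint32_t get_word(const quadword *q, int i)
{
    const uint8_t *p = &q->byte[4 * i];
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}
```

With bytes 0x00 through 0x0F stored in order, half word 0 is 0x0001, half word 1 is 0x0203, and word 0 is 0x00010203, so half words 0 and 1 are the high-order and low-order halves of word 0.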
4.3 Notation and Conventions
Operation and predicate functionality is described in this section by a semiformal pseudocode language. Table 4-2 lists the pseudocode notation and conventions used throughout the section.
Table 4-2. Notation and Conventions
Notation/Convention   Meaning
←                     Assignment
+, +fp                Add, single-precision floating-point add
-, -fp                Subtract, single-precision floating-point subtract
*, *fp                Multiply, single-precision floating-point multiply
/                     Integer division with non-negative remainder
<, <fp                Less than, single-precision floating-point less than
≤, ≤fp                Less than or equal, single-precision floating-point less than or equal
>, >fp                Greater than, single-precision floating-point greater than
≥, ≥fp                Greater than or equal, single-precision floating-point greater than or equal
!=, !=fp              Not equal, floating-point not equal
=, =fp                Equal, floating-point equal
+∞, -∞                Positive infinity, negative infinity
||                    Concatenation of two bit strings (e.g., 010 || 111 is the same as 010111)
&                     AND bit-wise operator
|                     OR bit-wise operator
⊕                     Exclusive-OR bit-wise operator
¬                     NOT logical operator (one's complement)
0bnnnn                A number expressed in binary format
0xnnnn                A number expressed in hexadecimal format
a,b,c,d               These symbols represent whole operands in an AltiVec operation or predicate. This is typically a vector, but in some operations it can represent a specific length literal value.
ai,bi,ci,di           These symbols represent the ith component elements of a vector a, b, c, or d, respectively.
ABS(x)                Absolute value of x
BorrowOut(x - y)      Borrow out of the difference of x and y
BoundAlign(x,y)       Align x to a y-byte boundary
CarryOut(x + y)       Carry out of the sum of x and y
Ceil(x)               The smallest single-precision floating-point integer that is greater than or equal to x
do i=x to y           Do loop.
                      * Do the following starting at x and iterating to y
                      * Indenting shows range.
                      * "To" and/or "by" clauses specify incrementing an iteration variable.
                      * "While" clauses give termination conditions.
end                   Indicates the end of a do loop
Notation/Convention   Meaning
Floor(x)              The largest single-precision floating-point integer that is less than or equal to x
FP2ˣEst(x)            3-bit-accurate floating-point estimate of 2**x
FPLog2Est(x)          3-bit-accurate floating-point estimate of log2(x)
FPRecipEst(x)         12-bit-accurate floating-point estimate of 1/x
if...then...else...   Conditional execution, indenting shows range, else is optional.
ISNaN(x)              Result is 1 if x is not a number (NaN) and 0 if x is a number
ISNUM(x)              Result is 1 if x is a number and 0 if x is not a number (NaN)
MAX(x,y)              Returns the larger of x or y. For floating-point values, the following applies:
                      * the maximum of +0.0 and -0.0 is +0.0
                      * the maximum of any value and a NaN is a QNaN
MEM(x,y)              Value at memory location x of size y bytes
MIN(x,y)              Returns the smaller of x or y. For floating-point values, the following applies:
                      * the minimum of +0.0 and -0.0 is -0.0
                      * the minimum of any value and a NaN is a QNaN
mod(x,y)              Remainder of x/y
NaN                   Not a Number, non-numeric
NEG(x)                Result is -x
NGE(x,y)              Result is 1 if x or y is a NaN or if x < y, and 0 otherwise
NGT(x,y)              Result is 1 if x or y is a NaN or x ≤ y, and 0 otherwise
NLE(x,y)              Result is 1 if x or y is a NaN or x > y, and 0 otherwise
NLT(x,y)              Result is 1 if x or y is a NaN or x ≥ y, and 0 otherwise
QNaN                  NaN that propagates through most arithmetic operations without signalling an exception
RecipSQRTEst(x)       Result is a 12-bit accurate single-precision floating-point estimate of the reciprocal of the square root of x
RndToFPINear(x)       The single-precision floating-point integer that is nearest in value to x (in case of a tie, the even single-precision floating-point value is used).
RndToFPITrunc(x)      The largest single-precision floating-point integer that is less than or equal to x if x ≥ 0, or the smallest single-precision floating-point integer that is greater than or equal to x if x < 0
RndToFPNearest(x)     IEEE rounding to nearest floating-point number
ROTL(x,y)             Result of rotating x left by y bits
S                     Represents a propagated sign bit in a figure
Saturate(x)           Saturate(x) means saturate x to the type of y
ShiftRight(x,y)       Shift the contents of x right or left y bits, clearing vacated bits (logical shift).
ShiftLeft(x,y)        This operation is used for shift instructions.
ShiftRightA(x,y)      Shift the contents of x right y bits, copying the sign bit to the vacated bits (algebraic shift)
SignExtend(x,y)       Sign-extend x on the left with sign bits (that is, with copies of bit 0 of x) to produce y-bit value; represented in figures by a single S
SIToFP(x,y)           Result of converting the signed integer x to a y-bit floating-point value using Round-to-Nearest mode
Notation/Convention   Meaning
UIToUImod(x,y)        Truncate an unsigned integer x to y-bit unsigned integer
Undefined             An undefined value. The value may vary from one implementation to another, and from one execution to another on the same implementation.
xi                    The ith element of vector x where the size and type of the element are determined by the type of x
x{i}                  The ith byte of vector x
x[i:j]                Bits i through j of vector x, where i can equal j if referring to a single bit
ˣ0                    A bit string of x zeros
ˣ1                    A bit string of x ones
ˣy                    A bit string of x copies of y, for example, ³1 = 111
xⁿ                    x raised to the nth power
Precedence rules for pseudocode operators are summarized in Table 4-3.
Table 4-3. Precedence Rules
Operators                                              Associativity
x{i}, x[y], x[y:z], function evaluation                Left to right
ˣy or replication, xʸ or exponentiation                Right to left
unary -, ¬                                             Right to left
*, *fp, /                                              Left to right
+, +fp, -, -fp                                         Left to right
||                                                     Left to right
=, =fp, !=, !=fp, <, <fp, ≤, ≤fp, >, >fp, ≥, ≥fp       Left to right
&, ⊕                                                   Left to right
|                                                      Left to right
←                                                      None
Operators higher in Table 4-3 are applied before those lower in the table. Operators at the same level in the table associate from left to right, from right to left, or not at all, as shown. For example, '-' (binary minus) associates from left to right, so a - b - c = (a - b) - c. Parentheses are used to override the evaluation order implied by Table 4-3, or to increase clarity; parenthesized expressions are evaluated before serving as operands.
4.4 Generic and Specific AltiVec Operations
The AltiVec operations are listed alphabetically by generic operation mnemonic, with a definition of the permitted generic and specific AltiVec operations. Figure 4-4 shows the format for each operation description page.
Sample description page (vec_cmpge, Vector Compare Greater Than or Equal), annotated with the parts of each description:
Operation mnemonic
Operation name
Pseudocode description of operation
Text description of operation
Figure showing operation usage and mapping
Figure 4-4. Operation Description Format
Where possible, each description is supported by reference figures indicating data modifications and including a table that lists: the valid set of argument types for that generic AltiVec operation, the result type for each set of argument types, and the specific AltiVec instruction(s) generated for that set of arguments.
Any operation not explicitly permitted in this section is prohibited.
vec_abs
Vector Absolute Value

d = vec_abs(a)

n ← number of elements
do i=0 to n-1
  di ← ABS(ai)
end
Each element of the result is the absolute value of the corresponding element of a. The arithmetic is modular for integer types. For vector float argument types, the operation is independent of VSCR[NJ].
Programming note: Unlike other operations, vec_abs maps to multiple instructions. The programmer should consider alternatives. For example, to compute the absolute difference of two vectors a and b, the expression vec_abs(vec_sub(a,b)) expands to four instructions. A simpler method uses the expression vec_sub(vec_max(a,b), vec_min(a,b)), which expands to three instructions.
The valid combinations of argument types and the corresponding result types for d = vec_abs(a) are shown in Figure 4-5, Figure 4-6, Figure 4-7, and Figure 4-8. It is necessary to use the generic name since there is no specific operation for vec_abs.
Element diagram: each of the sixteen elements (0-15) of a passes through ABS to give the corresponding element of d.
d                    a                    maps to
vector signed char   vector signed char   vspltisb z,0
                                          vsububm t,z,a
                                          vmaxsb d,a,t
Figure 4-5. Absolute Value of Sixteen Integer Elements (8-bit)
Element diagram: each of the eight elements (0-7) of a passes through ABS to give the corresponding element of d.
d                     a                     maps to
vector signed short   vector signed short   vspltisb z,0
                                            vsubuhm t,z,a
                                            vmaxsh d,a,t
Figure 4-6. Absolute Value of Eight Integer Elements (16-bit)
Element diagram: each of the four elements (0-3) of a passes through ABS to give the corresponding element of d.
d                   a                   maps to
vector signed int   vector signed int   vspltisb z,0
                                        vsubuwm t,z,a
                                        vmaxsw d,a,t
Figure 4-7. Absolute Value of Four Integer Elements (32-bit)
Element diagram: each of the four elements (0-3) of a passes through ABS to give the corresponding element of d.
d              a              maps to
vector float   vector float   vspltisw m,-1
                              vslw t,m,m
                              vandc d,a,t
Figure 4-8. Absolute Value of Four Floating-Point Elements (32-bit)
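The instruction sequences in Figure 4-5 and Figure 4-8 can be modeled one element at a time in portable C. This is a sketch for illustration only: abs_s8_modular and abs_f32_bits are hypothetical names, and the code mirrors the listed instructions rather than using AltiVec itself. It assumes the usual two's-complement narrowing conversion, which C leaves implementation-defined.

```c
#include <stdint.h>
#include <string.h>

/* 8-bit case (Figure 4-5): t = 0 - a with modular arithmetic, d = max(a, t). */
int8_t abs_s8_modular(int8_t a)
{
    int8_t t = (int8_t)(uint8_t)(0u - (uint8_t)a);  /* vsububm t,z,a */
    return (a > t) ? a : t;                         /* vmaxsb d,a,t  */
}

/* Float case (Figure 4-8): build a sign-bit mask, then clear that bit. */
float abs_f32_bits(float a)
{
    uint32_t m = UINT32_C(0xFFFFFFFF);   /* vspltisw m,-1 */
    uint32_t t = m << (m & 31u);         /* vslw t,m,m: shift count 31 gives 0x80000000 */
    uint32_t bits;

    memcpy(&bits, &a, sizeof bits);
    bits &= ~t;                          /* vandc d,a,t: a AND NOT t */
    memcpy(&a, &bits, sizeof a);
    return a;
}
```

Because the integer arithmetic is modular, abs_s8_modular(-128) returns -128: negating -128 wraps back to -128, and the maximum of the two is unchanged.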
vec_abss
Vector Absolute Value Saturated

d = vec_abss(a)

n ← number of elements
do i=0 to n-1
  di ← Saturate(ABS(ai))
end
Each element of the result is the absolute value of the corresponding element of a. The arithmetic is saturated for integer types. If saturation occurs, VSCR[SAT] is set (see Table 4-1).
Programming note: Unlike other operations, vec_abss maps to multiple instructions. The programmer should consider alternatives. For example, to compute the absolute difference of two vectors a and b, the expression vec_abss(vec_subs(a,b)) expands to four instructions. A simpler method uses the expression vec_subs(vec_max(a,b),vec_min(a,b)), which expands to three instructions.
The valid combinations of argument types and the corresponding result types for d = vec_abss(a) are shown in Figure 4-9, Figure 4-10, and Figure 4-11. It is necessary to use the generic name since there is no specific operation for vec_abss.
Element diagram: each of the sixteen elements (0-15) of a passes through ABS to give the corresponding element of d.
d                    a                    maps to
vector signed char   vector signed char   vspltisb z,0
                                          vsubsbs t,z,a
                                          vmaxsb d,a,t
Figure 4-9. Saturated Absolute Value of Sixteen Integer Elements (8-bit)
Element diagram: each of the eight elements (0-7) of a passes through ABS to give the corresponding element of d.
d                     a                     maps to
vector signed short   vector signed short   vspltisb z,0
                                            vsubshs t,z,a
                                            vmaxsh d,a,t
Figure 4-10. Saturated Absolute Value of Eight Integer Elements (16-bit)
Element diagram: each of the four elements (0-3) of a passes through ABS to give the corresponding element of d.
d                   a                   maps to
vector signed int   vector signed int   vspltisb z,0
                                        vsubsws t,z,a
                                        vmaxsw d,a,t
Figure 4-11. Saturated Absolute Value of Four Integer Elements (32-bit)
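The difference from the modular form shows up only at the most negative value. A one-element sketch of the 8-bit sequence in Figure 4-9 follows; abss_s8 is a hypothetical helper name, and the sat pointer stands in for the sticky VSCR[SAT] bit.

```c
#include <stdint.h>

/* 8-bit case: t = Saturate(0 - a), d = max(a, t).
 * *sat is only ever set here, never cleared, like a sticky status bit. */
int8_t abss_s8(int8_t a, int *sat)
{
    int t = 0 - (int)a;          /* vsubsbs t,z,a, before saturation */
    if (t > INT8_MAX) {          /* only reachable for a = -128 */
        t = INT8_MAX;
        *sat = 1;
    }
    return (int8_t)((a > t) ? a : t);   /* vmaxsb d,a,t */
}
```

Here abss_s8(-128, &sat) returns 127 and sets sat, whereas the modular form leaves -128 unchanged.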
vec_add
Vector Add

d = vec_add(a,b)

Integer add:
n ← number of elements
do i=0 to n-1
  di ← ai + bi
end

Floating-point add:
do i=0 to 3
  di ← ai +fp bi
end
Each element of a is added to the corresponding element of b. Each sum is placed in the corresponding element of d. For vector float argument types, if VSCR[NJ] = 1, every denormalized operand element is truncated to a 0 of the same sign before the operation is carried out, and each denormalized result element is truncated to a 0 of the same sign. The valid combinations of argument types and the corresponding result types for d = vec_add(a,b) are shown in Figure 4-12, Figure 4-13, Figure 4-14, and Figure 4-15.
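Element diagram: each of the sixteen elements (0-15) of a is added to the corresponding element of b to give the corresponding element of d.
d                      a                      b                      maps to
vector unsigned char   vector unsigned char   vector unsigned char   vaddubm d,a,b
vector unsigned char   vector unsigned char   vector bool char       vaddubm d,a,b
vector unsigned char   vector bool char       vector unsigned char   vaddubm d,a,b
vector signed char     vector signed char     vector signed char     vaddubm d,a,b
vector signed char     vector signed char     vector bool char       vaddubm d,a,b
vector signed char     vector bool char       vector signed char     vaddubm d,a,b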
Figure 4-12. Add Sixteen Integer Elements (8-bit)
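Element diagram: each of the eight elements (0-7) of a is added to the corresponding element of b to give the corresponding element of d.
d                       a                       b                       maps to
vector unsigned short   vector unsigned short   vector unsigned short   vadduhm d,a,b
vector unsigned short   vector unsigned short   vector bool short       vadduhm d,a,b
vector unsigned short   vector bool short       vector unsigned short   vadduhm d,a,b
vector signed short     vector signed short     vector signed short     vadduhm d,a,b
vector signed short     vector signed short     vector bool short       vadduhm d,a,b
vector signed short     vector bool short       vector signed short     vadduhm d,a,b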
Figure 4-13. Add Eight Integer Elements (16-bit)
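Element diagram: each of the four elements (0-3) of a is added to the corresponding element of b to give the corresponding element of d.
d                     a                     b                     maps to
vector unsigned int   vector unsigned int   vector unsigned int   vadduwm d,a,b
vector unsigned int   vector unsigned int   vector bool int       vadduwm d,a,b
vector unsigned int   vector bool int       vector unsigned int   vadduwm d,a,b
vector signed int     vector signed int     vector signed int     vadduwm d,a,b
vector signed int     vector signed int     vector bool int       vadduwm d,a,b
vector signed int     vector bool int       vector signed int     vadduwm d,a,b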
Figure 4-14. Add Four Integer Elements (32-bit)
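Element diagram: each of the four elements (0-3) of a is added to the corresponding element of b to give the corresponding element of d.
d              a              b              maps to
vector float   vector float   vector float   vaddfp d,a,b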
Figure 4-15. Add Four Floating-Point Elements (32-bit)
vec_addc
Vector Add Carryout Unsigned Word

d = vec_addc(a,b)

do i=0 to 3
  di ← CarryOut(ai + bi)
end
Each element of a is added to the corresponding element in b. The carry from each sum is zero-extended and placed into the corresponding element of d. CarryOut (a + b) is 1 if there is a carry, and otherwise 0. The valid argument types and the corresponding result type for d = vec_addc(a,b) are shown in Figure 4-16.
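Element diagram: each of the four elements (0-3) of a is added to the corresponding element of b in a 33-bit temporary; the carry out of each sum is placed in the corresponding element of d.
d                     a                     b                     maps to
vector unsigned int   vector unsigned int   vector unsigned int   vaddcuw d,a,b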
Figure 4-16. Carryout of Four Unsigned Integer Adds (32-bit)
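The 33-bit temporary in the figure can be modeled with a 64-bit sum. The sketch below uses a hypothetical function name, addc_u32x4, and plain arrays in place of vector registers.

```c
#include <stdint.h>

/* For each of the four word elements, d[i] is 1 if a[i] + b[i] carries
 * out of 32 bits and 0 otherwise (the carry, zero-extended to 32 bits). */
void addc_u32x4(const uint32_t a[4], const uint32_t b[4], uint32_t d[4])
{
    for (int i = 0; i < 4; i++) {
        uint64_t sum = (uint64_t)a[i] + b[i];   /* wide enough for 33 bits */
        d[i] = (uint32_t)(sum >> 32);           /* bit 32 is the carry out */
    }
}
```

Pairing this result with the modular sum from vec_add is one way to build wider additions out of 32-bit elements.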
vec_adds
Vector Add Saturated

d = vec_adds(a,b)

n ← number of elements
do i=0 to n-1
  di ← Saturate(ai + bi)
end
Each element of a is added to the corresponding element of b. Each sum is saturated to the element type and placed into the corresponding element of d. If saturation occurs, VSCR[SAT] is set (see Table 4-1). The valid combinations of argument types and the corresponding result types for d = vec_adds(a,b) are shown in Figure 4-17, Figure 4-18, and Figure 4-19.
Element diagram: each of the sixteen elements (0-15) of a is added to the corresponding element of b to give the corresponding element of d.
d                      a                      b                      maps to
vector unsigned char   vector unsigned char   vector unsigned char   vaddubs d,a,b
vector unsigned char   vector unsigned char   vector bool char       vaddubs d,a,b
vector unsigned char   vector bool char       vector unsigned char   vaddubs d,a,b
vector signed char     vector signed char     vector signed char     vaddsbs d,a,b
vector signed char     vector signed char     vector bool char       vaddsbs d,a,b
vector signed char     vector bool char       vector signed char     vaddsbs d,a,b
Figure 4-17. Add Saturating Sixteen Integer Elements (8-bit)
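Element diagram: each of the eight elements (0-7) of a is added to the corresponding element of b to give the corresponding element of d.
d                       a                       b                       maps to
vector unsigned short   vector unsigned short   vector unsigned short   vadduhs d,a,b
vector unsigned short   vector unsigned short   vector bool short       vadduhs d,a,b
vector unsigned short   vector bool short       vector unsigned short   vadduhs d,a,b
vector signed short     vector signed short     vector signed short     vaddshs d,a,b
vector signed short     vector signed short     vector bool short       vaddshs d,a,b
vector signed short     vector bool short       vector signed short     vaddshs d,a,b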
Figure 4-18. Add Saturating Eight Integer Elements (16-bit)
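Element diagram: each of the four elements (0-3) of a is added to the corresponding element of b to give the corresponding element of d.
d                     a                     b                     maps to
vector unsigned int   vector unsigned int   vector unsigned int   vadduws d,a,b
vector unsigned int   vector unsigned int   vector bool int       vadduws d,a,b
vector unsigned int   vector bool int       vector unsigned int   vadduws d,a,b
vector signed int     vector signed int     vector signed int     vaddsws d,a,b
vector signed int     vector signed int     vector bool int       vaddsws d,a,b
vector signed int     vector bool int       vector signed int     vaddsws d,a,b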
Figure 4-19. Add Saturating Four Integer Elements (32-bit)
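To contrast the saturating add with the modular add described for vec_add, here is a one-element sketch for the 8-bit types. The function names are hypothetical, and the sat pointer again stands in for the sticky VSCR[SAT] bit.

```c
#include <stdint.h>

/* Modular add, as described for vaddubm: the sum wraps modulo 256. */
uint8_t add_u8_modular(uint8_t a, uint8_t b)
{
    return (uint8_t)(a + b);
}

/* Unsigned saturating add, as described for vaddubs. */
uint8_t add_u8_saturated(uint8_t a, uint8_t b, int *sat)
{
    unsigned sum = (unsigned)a + b;
    if (sum > UINT8_MAX) { *sat = 1; return UINT8_MAX; }
    return (uint8_t)sum;
}

/* Signed saturating add, as described for vaddsbs. */
int8_t add_s8_saturated(int8_t a, int8_t b, int *sat)
{
    int sum = (int)a + b;
    if (sum > INT8_MAX) { *sat = 1; return INT8_MAX; }
    if (sum < INT8_MIN) { *sat = 1; return INT8_MIN; }
    return (int8_t)sum;
}
```

For inputs 200 and 100, the modular form gives 44 while the saturating form gives 255 and records the saturation.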
vec_and
Vector Logical AND

d = vec_and(a,b)

d ← a & b
Each bit of the result is the logical AND of the corresponding bits of a and b. The valid combinations of argument types and the corresponding result types for d = vec_and(a,b) are shown in Figure 4-20.
Diagram: a and b are ANDed bit-wise to give d. All combinations below map to vand d,a,b.
d                       a                       b
vector unsigned char    vector unsigned char    vector unsigned char
vector unsigned char    vector unsigned char    vector bool char
vector unsigned char    vector bool char        vector unsigned char
vector signed char      vector signed char      vector signed char
vector signed char      vector signed char      vector bool char
vector signed char      vector bool char        vector signed char
vector bool char        vector bool char        vector bool char
vector unsigned short   vector unsigned short   vector unsigned short
vector unsigned short   vector unsigned short   vector bool short
vector unsigned short   vector bool short       vector unsigned short
vector signed short     vector signed short     vector signed short
vector signed short     vector signed short     vector bool short
vector signed short     vector bool short       vector signed short
vector bool short       vector bool short       vector bool short
vector unsigned int     vector unsigned int     vector unsigned int
vector unsigned int     vector unsigned int     vector bool int
vector unsigned int     vector bool int         vector unsigned int
vector signed int       vector signed int       vector signed int
vector signed int       vector signed int       vector bool int
vector signed int       vector bool int         vector signed int
vector bool int         vector bool int         vector bool int
vector float            vector bool int         vector float
vector float            vector float            vector bool int
vector float            vector float            vector float
Figure 4-20. Logical Bit-Wise AND
vec_andc
Vector Logical AND with Complement

d = vec_andc(a,b)

d ← a & ¬b
Each bit of the result is the logical AND of the corresponding bit of a and the one's complement of the corresponding bit of b. The valid combinations of argument types and the corresponding result types for d = vec_andc(a,b) are shown in Figure 4-21.
Diagram: b is complemented into a temporary, which is then ANDed bit-wise with a to give d.
All combinations below map to vandc d,a,b.
d                       a                       b
vector unsigned char    vector unsigned char    vector unsigned char
vector unsigned char    vector unsigned char    vector bool char
vector unsigned char    vector bool char        vector unsigned char
vector signed char      vector signed char      vector signed char
vector signed char      vector signed char      vector bool char
vector signed char      vector bool char        vector signed char
vector bool char        vector bool char        vector bool char
vector unsigned short   vector unsigned short   vector unsigned short
vector unsigned short   vector unsigned short   vector bool short
vector unsigned short   vector bool short       vector unsigned short
vector signed short     vector signed short     vector signed short
vector signed short     vector signed short     vector bool short
vector signed short     vector bool short       vector signed short
vector bool short       vector bool short       vector bool short
vector unsigned int     vector unsigned int     vector unsigned int
vector unsigned int     vector unsigned int     vector bool int
vector unsigned int     vector bool int         vector unsigned int
vector signed int       vector signed int       vector signed int
vector signed int       vector signed int       vector bool int
vector signed int       vector bool int         vector signed int
vector bool int         vector bool int         vector bool int
vector float            vector bool int         vector float
vector float            vector float            vector bool int
vector float            vector float            vector float
Figure 4-21. Logical Bit-Wise AND with Complement
vec_avg
Vector Average

d = vec_avg(a,b)

n ← number of elements
do i=0 to n-1
  di ← (ai + bi + 1) / 2
end
Each element of the result is a rounded average of the corresponding elements of a and b. Intermediate calculations are not limited by the element size. The value 1 is added to the sum of elements in a and b to ensure the result is rounded up. The valid combinations of argument types and the corresponding result types for d = vec_avg(a,b) are shown in Figure 4-22, Figure 4-23, and Figure 4-24.
Element diagram: each of the sixteen 8-bit elements (0-15) of a is added to the corresponding element of b in a 9-bit temporary, 1 is added, and the averaged result is placed in the corresponding element of d.
d                      a                      b                      maps to
vector unsigned char   vector unsigned char   vector unsigned char   vavgub d,a,b
vector signed char     vector signed char     vector signed char     vavgsb d,a,b
Figure 4-22. Average Sixteen Integer Elements (8-bit)
Element diagram: each of the eight 16-bit elements (0-7) of a is added to the corresponding element of b in a 17-bit temporary, 1 is added, and the averaged result is placed in the corresponding element of d.
d                       a                       b                       maps to
vector unsigned short   vector unsigned short   vector unsigned short   vavguh d,a,b
vector signed short     vector signed short     vector signed short     vavgsh d,a,b
Figure 4-23. Average Eight Integer Elements (16-bit)
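Element diagram: each of the four 32-bit elements (0-3) of a is added to the corresponding element of b in a 33-bit temporary, 1 is added, and the averaged result is placed in the corresponding element of d.
d                     a                     b                     maps to
vector unsigned int   vector unsigned int   vector unsigned int   vavguw d,a,b
vector signed int     vector signed int     vector signed int     vavgsw d,a,b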
Figure 4-24. Average Four Integer Elements (32-bit)
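A one-element model of the rounded average for the 8-bit types follows, using a wider intermediate as the text describes. The names avg_u8 and avg_s8 are hypothetical. The signed version spells out division with a non-negative remainder (the "/" of Table 4-2) explicitly, because C's "/" truncates toward zero.

```c
#include <stdint.h>

/* Unsigned: (a + b + 1) / 2 computed in a wider type, so nothing overflows. */
uint8_t avg_u8(uint8_t a, uint8_t b)
{
    return (uint8_t)(((unsigned)a + b + 1u) >> 1);
}

/* Signed: same formula, with the quotient rounded toward minus infinity
 * so that the remainder is never negative. */
int8_t avg_s8(int8_t a, int8_t b)
{
    int sum = (int)a + b + 1;
    int q = sum / 2;           /* C truncates toward zero */
    if (sum % 2 < 0)
        q -= 1;                /* adjust to a non-negative remainder */
    return (int8_t)q;
}
```

For example, averaging -3 and -4 (exact mean -3.5) gives -3, and averaging 1 and 2 (exact mean 1.5) gives 2, so ties round up in both cases.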
vec_ceil
Vector Ceiling

d = vec_ceil(a)

do i=0 to 3
  di ← Ceil(ai)
end
Each single-precision floating-point element in a is rounded to a single-precision floating-point integer using the rounding mode Round toward +Infinity, and placed into the corresponding word element of d. If an element ai is infinite, the corresponding element di equals ai. If an element ai is finite, the corresponding element di is the smallest representable floating-point integer that is greater than or equal to ai. For example, if the floating-point element is 123.45, the result is 124. If VSCR[NJ] = 1, every denormalized operand element is truncated to 0 before the operation. The valid argument type and the corresponding result type for d = vec_ceil(a) are shown in Figure 4-25.
Element diagram: each of the four elements (0-3) of a passes through Ceil to give the corresponding element of d.
d              a              maps to
vector float   vector float   vrfip d,a
Figure 4-25. Round to Plus Infinity of Four Floating-Point Integer Elements (32-Bit)
vec_cmpb
Vector Compare Bounds Floating-Point

d = vec_cmpb(a,b)

do i=0 to 3
  di ← 0
  if ai ≤fp bi then di[0] ← 0
  else di[0] ← 1
  if ai ≥fp -bi then di[1] ← 0
  else di[1] ← 1
end
Each element in a is compared to the corresponding element in b. The 2-bit result indicates whether the element in a is within the bounds specified by the element in b. Bit 0 of each result is 0 if the element in a is less than or equal to the element in b (i.e., in bounds high), and is 1 otherwise (i.e., out of bounds high). Bit 1 of the 2-bit value is 0 if the element in a is greater than or equal to the negative of the element in b (i.e., in bounds low), and is 1 otherwise (i.e., out of bounds low). The 2-bit result is placed into the high-order two bits (bits 0 and 1) of the corresponding element in d (which correspond to bits 0-1, 32-33, 64-65, and 96-97 of d, respectively) and the remaining bits are cleared.
If any single-precision floating-point word element in b is negative, the corresponding element in a is out of bounds. If an element in a or b is a NaN, the two high-order bits of the corresponding result are both 1. If VSCR[NJ] = 1, every denormalized operand element is truncated to 0 before the comparison. The valid argument types and the corresponding result type for d = vec_cmpb(a,b) are shown in Figure 4-26.
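Element diagram: each of the four elements (0-3) of b is negated (NEG) into a temporary -b; each element of a is compared against b and -b, and the 2-bit result is placed in bits 0-1, 32-33, 64-65, and 96-97 of d.
d                   a              b              maps to
vector signed int   vector float   vector float   vcmpbfp d,a,b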
Figure 4-26. Compare Bounds of Four Floating-Point Elements (32-Bit)
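The bounds comparison for a single element can be sketched as follows. The function name cmpb_f32 is hypothetical, the result is returned as a 32-bit word with big-endian bit 0 as the most significant bit, and the VSCR[NJ] treatment of denormalized values is not modeled.

```c
#include <stdint.h>

/* Returns the 32-bit result element: bit 0 (0x80000000) is set when a is
 * out of bounds high, bit 1 (0x40000000) when a is out of bounds low.
 * Comparisons with a NaN are false, so a NaN in a or b sets both bits. */
uint32_t cmpb_f32(float a, float b)
{
    uint32_t d = 0;
    if (!(a <= b))
        d |= UINT32_C(0x80000000);   /* di[0] <- 1 */
    if (!(a >= -b))
        d |= UINT32_C(0x40000000);   /* di[1] <- 1 */
    return d;
}
```

A result of 0 therefore means -b ≤ a ≤ b. With a negative bound such as b = -1, no value of a can satisfy both tests, which matches the rule that a negative bound always reports out of bounds.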
vec_cmpeq
Vector Compare Equal

d = vec_cmpeq(a,b)

Integer compare equal:
n ← number of elements
m ← number of bits in an element (128/n)
do i=0 to n-1
  if ai = bi then di ← ᵐ1
  else di ← ᵐ0
end

Floating-point compare equal:
do i=0 to 3
  if ai =fp bi then di ← ³²1
  else di ← ³²0
end
Each element of the result is all ones if the corresponding element of a is equal to the corresponding element of b. Otherwise, it returns all zeros. For vector float argument types, if VSCR[NJ] = 1, every denormalized floating-point operand element is truncated to 0 before the comparison. The valid combinations of argument types and the corresponding result types for d = vec_cmpeq(a,b) are shown in Figure 4-27, Figure 4-28, Figure 4-29, and Figure 4-30.
Element diagram: each of the sixteen elements (0-15) of a is compared for equality with the corresponding element of b to give the corresponding element of d.
d                  a                      b                      maps to
vector bool char   vector unsigned char   vector unsigned char   vcmpequb d,a,b
vector bool char   vector signed char     vector signed char     vcmpequb d,a,b
Figure 4-27. Compare Equal of Sixteen Integer Elements (8-Bit)
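Element diagram: each of the eight elements (0-7) of a is compared for equality with the corresponding element of b to give the corresponding element of d.
d                   a                       b                       maps to
vector bool short   vector unsigned short   vector unsigned short   vcmpequh d,a,b
vector bool short   vector signed short     vector signed short     vcmpequh d,a,b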
Figure 4-28. Compare Equal of Eight Integer Elements (16-Bit)
Element(R)
0
1
2
3 a b
=
=
=
=
d
d vector bool int
a vector unsigned int vector signed int
b vector unsigned int vector signed int
maps to vcmpequw d,a,b
Figure 4-29. Compare Equal of Four Integer Elements (32-Bit)
Element(R)
0
1
2
3 a b
=
=
=
=
d
d vector bool int
a vector float
b vector float
maps to vcmpeqfp d,a,b
Figure 4-30. Compare Equal of Four Floating-Point Elements (32-Bit)
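The all-ones or all-zeros form of each result element makes the comparison result usable directly as a bit mask. The following portable C sketch models the sixteen-element byte case and a mask-driven select. The function names cmpeq_u8x16 and select_u8x16 are hypothetical and are not part of the AltiVec interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model of the 8-bit compare-equal form:
 * each result byte is 0xFF where a and b match, 0x00 elsewhere. */
static void cmpeq_u8x16(const uint8_t a[16], const uint8_t b[16],
                        uint8_t d[16])
{
    assert(a != NULL && b != NULL && d != NULL);
    for (size_t i = 0; i < 16; i++)
        d[i] = (a[i] == b[i]) ? 0xFFu : 0x00u;
}

/* Uses the mask bitwise: takes if_set where the mask is all ones and
 * if_clear where it is all zeros. */
static void select_u8x16(const uint8_t mask[16], const uint8_t if_set[16],
                         const uint8_t if_clear[16], uint8_t d[16])
{
    assert(mask != NULL && if_set != NULL && if_clear != NULL && d != NULL);
    for (size_t i = 0; i < 16; i++)
        d[i] = (uint8_t)((if_set[i] & mask[i]) | (if_clear[i] & ~mask[i]));
}
```

Because the mask elements are never partially set, the select needs no branch per element. That is the usual reason for returning a mask rather than a 0/1 value.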
vec_cmpge
Vector Compare Greater Than or Equal
d = vec_cmpge(a,b)

    do i = 0 to 3
        if a_i ≥fp b_i then d_i ← 1 repeated 32 times
        else d_i ← 0 repeated 32 times
    end

Each element of the result is all ones if the corresponding element of a is greater than or equal to the corresponding element of b. Otherwise, it is all zeros. If VSCR[NJ] = 1, every denormalized floating-point operand element is truncated to 0 before the comparison.

The valid argument types and the corresponding result type for d = vec_cmpge(a,b) are shown in Figure 4-31.

d                a             b             maps to
vector bool int  vector float  vector float  vcmpgefp d,a,b
Figure 4-31. Compare Greater-Than-or-Equal of Four Floating-Point Elements (32-Bit)
vec_cmpgt
Vector Compare Greater Than
d = vec_cmpgt(a,b)

Integer compare greater than:
    n ← number of elements
    m ← number of bits in an element (128/n)
    do i = 0 to n-1
        if a_i > b_i then d_i ← 1 repeated m times
        else d_i ← 0 repeated m times
    end

Floating-point compare greater than:
    do i = 0 to 3
        if a_i >fp b_i then d_i ← 1 repeated 32 times
        else d_i ← 0 repeated 32 times
    end

Each element of the result is all ones if the corresponding element of a is greater than the corresponding element of b. Otherwise, it is all zeros. For vector float types, if VSCR[NJ] = 1, every denormalized floating-point operand element is truncated to 0 before the comparison. The valid combinations of argument types and the corresponding result types for d = vec_cmpgt(a,b) are shown in Figure 4-32, Figure 4-33, Figure 4-34, and Figure 4-35.
d                 a                     b                     maps to
vector bool char  vector unsigned char  vector unsigned char  vcmpgtub d,a,b
vector bool char  vector signed char    vector signed char    vcmpgtsb d,a,b

Figure 4-32. Compare Greater-Than of Sixteen Integer Elements (8-Bit)
d                  a                      b                      maps to
vector bool short  vector unsigned short  vector unsigned short  vcmpgtuh d,a,b
vector bool short  vector signed short    vector signed short    vcmpgtsh d,a,b

Figure 4-33. Compare Greater-Than of Eight Integer Elements (16-Bit)

d                a                    b                    maps to
vector bool int  vector unsigned int  vector unsigned int  vcmpgtuw d,a,b
vector bool int  vector signed int    vector signed int    vcmpgtsw d,a,b

Figure 4-34. Compare Greater-Than of Four Integer Elements (32-Bit)

d                a             b             maps to
vector bool int  vector float  vector float  vcmpgtfp d,a,b
Figure 4-35. Compare Greater-Than of Four Floating-Point Elements (32-Bit)
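Unlike compare equal, the integer greater-than forms map to different instructions for signed and unsigned arguments, because the same bit pattern orders differently under the two interpretations. The portable C sketch below illustrates this for byte elements. The names cmpgt_u8x16 and cmpgt_s8x16 are hypothetical models, not AltiVec operations.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the unsigned byte form (vcmpgtub). */
static void cmpgt_u8x16(const uint8_t a[16], const uint8_t b[16],
                        uint8_t d[16])
{
    assert(a != NULL && b != NULL && d != NULL);
    for (size_t i = 0; i < 16; i++)
        d[i] = (a[i] > b[i]) ? 0xFFu : 0x00u;
}

/* Hypothetical model of the signed byte form (vcmpgtsb). */
static void cmpgt_s8x16(const int8_t a[16], const int8_t b[16],
                        uint8_t d[16])
{
    assert(a != NULL && b != NULL && d != NULL);
    for (size_t i = 0; i < 16; i++)
        d[i] = (a[i] > b[i]) ? 0xFFu : 0x00u;
}
```

For example, the byte 0x80 is 128 when unsigned, so it is greater than 1, but it is -128 when signed, so it is not.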
vec_cmple
Vector Compare Less Than or Equal
d = vec_cmple(a,b)

    do i = 0 to 3
        if a_i ≤fp b_i then d_i ← 1 repeated 32 times
        else d_i ← 0 repeated 32 times
    end

Each element of the result is all ones if the corresponding element of a is less than or equal to the corresponding element of b. Otherwise, it is all zeros. If VSCR[NJ] = 1, every denormalized floating-point operand element is truncated to 0 before the comparison.
The valid argument types and the corresponding result type for d = vec_cmple(a,b) are shown in Figure 4-36. It is necessary to use the generic name, since the specific operation vec_vcmpgefp does not reverse its operands.
d                a             b             maps to
vector bool int  vector float  vector float  vcmpgefp d,b,a
Figure 4-36. Compare Less-Than-or-Equal of Four Floating-Point Elements (32-Bit)
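The operand reversal in the mapping can be modeled in portable C as follows. This is a sketch only: cmpge_fp_element and cmple_fp_model are hypothetical names, and VSCR[NJ] handling is not modeled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one greater-than-or-equal element compare. */
static uint32_t cmpge_fp_element(float a, float b)
{
    return (a >= b) ? 0xFFFFFFFFu : 0u;
}

/* Less-than-or-equal expressed as greater-than-or-equal with the
 * operands swapped, mirroring the "vcmpgefp d,b,a" mapping. */
static void cmple_fp_model(const float a[4], const float b[4], uint32_t d[4])
{
    assert(a != NULL && b != NULL && d != NULL);
    for (size_t i = 0; i < 4; i++)
        d[i] = cmpge_fp_element(b[i], a[i]);
}
```

In this model a NaN operand produces all zeros, since an ordered comparison with a NaN is false in either operand order.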
vec_cmplt
Vector Compare Less Than
d = vec_cmplt(a,b)

Integer compare less than:
    n ← number of elements
    m ← number of bits in an element (128/n)
    do i = 0 to n-1
        if a_i < b_i then d_i ← 1 repeated m times
        else d_i ← 0 repeated m times
    end

Floating-point compare less than:
    do i = 0 to 3
        if a_i <fp b_i then d_i ← 1 repeated 32 times
        else d_i ← 0 repeated 32 times
    end

Each element of the result is all ones if the corresponding element of a is less than the corresponding element of b. Otherwise, it is all zeros. For vector float types, if VSCR[NJ] = 1, every denormalized floating-point operand element is truncated to 0 before the comparison. The valid combinations of argument types and the corresponding result types for d = vec_cmplt(a,b) are shown in Figure 4-37, Figure 4-38, Figure 4-39, and Figure 4-40. It is necessary to use the generic name, since the specific operations do not reverse their operands.
d                 a                     b                     maps to
vector bool char  vector unsigned char  vector unsigned char  vcmpgtub d,b,a
vector bool char  vector signed char    vector signed char    vcmpgtsb d,b,a

Figure 4-37. Compare Less-Than of Sixteen Integer Elements (8-Bit)
d                  a                      b                      maps to
vector bool short  vector unsigned short  vector unsigned short  vcmpgtuh d,b,a
vector bool short  vector signed short    vector signed short    vcmpgtsh d,b,a

Figure 4-38. Compare Less-Than of Eight Integer Elements (16-Bit)

d                a                    b                    maps to
vector bool int  vector unsigned int  vector unsigned int  vcmpgtuw d,b,a
vector bool int  vector signed int    vector signed int    vcmpgtsw d,b,a

Figure 4-39. Compare Less-Than of Four Integer Elements (32-Bit)

d                a             b             maps to
vector bool int  vector float  vector float  vcmpgtfp d,b,a
Figure 4-40. Compare Less-Than of Four Floating-Point Elements (32-Bit)
vec_ctf
Vector Convert from Fixed-Point Word
d = vec_ctf(a,b)

    do i = 0 to 3
        d_i ← SIToFP(a_i) * 2^(-b)
    end

Each element of the result is the closest floating-point representation of the number obtained by dividing the corresponding element of a by 2 to the power of b. The operation is independent of VSCR[NJ]. The valid argument types and the corresponding result type for d = vec_ctf(a,b) are shown in Figure 4-41.
d             a                    b                       maps to
vector float  vector unsigned int  5-bit unsigned literal  vcfux d,a,b
vector float  vector signed int    5-bit unsigned literal  vcfsx d,a,b
Figure 4-41. Convert Four Integer Elements to Four Floating-Point Elements (32-Bit)
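The scaling by 2 to the power of -b treats the integer input as a fixed-point number with b fraction bits. The portable C sketch below models one element. The function names are hypothetical, and the sketch assumes the default round-to-nearest mode for the integer-to-float conversion.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of one signed element: converts a to float, then
 * divides by 2^b. Scaling by a power of two is exact here, so the only
 * rounding is in the integer-to-float conversion. */
static float ctf_signed_model(int32_t a, unsigned b)
{
    assert(b < 32); /* b is a 5-bit unsigned literal */
    return (float)a * (1.0f / (float)(UINT32_C(1) << b));
}

/* Hypothetical model of one unsigned element. */
static float ctf_unsigned_model(uint32_t a, unsigned b)
{
    assert(b < 32);
    return (float)a * (1.0f / (float)(UINT32_C(1) << b));
}
```

With b = 8, for instance, the integer -256 represents -1.0 in a format with eight fraction bits.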
vec_cts
Vector Convert to Signed Fixed-Point Word Saturated
d = vec_cts(a,b)

    do i = 0 to 3
        d_i ← Saturate(a_i * 2^b)
    end

Each element of the result is the saturated signed value obtained after truncating the product of the corresponding element of a and 2 to the power of b. If VSCR[NJ] = 1, every denormalized floating-point operand element is truncated to 0 before the operation. If saturation occurs, VSCR[SAT] is set (see Table 4-1). The valid argument types and the corresponding result type for d = vec_cts(a,b) are shown in Figure 4-42.
d                  a             b                       maps to
vector signed int  vector float  5-bit unsigned literal  vctsxs d,a,b
Figure 4-42. Convert Four Floating-Point Elements to Four Saturated Signed Integer Elements (32-Bit)
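The scale, truncate, and saturate sequence can be sketched for one element in portable C. The function cts_model and its sat flag, which stands in for VSCR[SAT], are hypothetical. The treatment of a NaN input is not described in the text above, so returning 0 for it is an assumption of this sketch, and VSCR[NJ] is not modeled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one vec_cts element. *sat is set to 1 when the
 * result saturates and is left unchanged otherwise. */
static int32_t cts_model(float a, unsigned b, int *sat)
{
    assert(b < 32);      /* b is a 5-bit unsigned literal */
    assert(sat != NULL);
    if (a != a)
        return 0;        /* NaN input: assumption, see lead-in */
    /* A float times 2^b (b < 32) is exactly representable as a double. */
    double scaled = (double)a * (double)(UINT32_C(1) << b);
    if (scaled >= 2147483648.0) {
        *sat = 1;
        return INT32_MAX;
    }
    if (scaled <= -2147483649.0) {
        *sat = 1;
        return INT32_MIN;
    }
    return (int32_t)scaled; /* C conversion truncates toward zero */
}
```

The range checks come before the conversion because converting an out-of-range floating-point value to an integer is undefined in C.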
vec_ctu
Vector Convert to Unsigned Fixed-Point Word Saturated
d = vec_ctu(a,b)

    do i = 0 to 3
        d_i ← Saturate(a_i * 2^b)
    end

Each element of the result is the saturated unsigned value obtained after truncating the number obtained by multiplying the corresponding element of a by 2 to the power of b. If VSCR[NJ] = 1, every denormalized floating-point operand element is truncated to 0 before the operation. If saturation occurs, VSCR[SAT] is set (see Table 4-1). The valid argument types and the corresponding result type for d = vec_ctu(a,b) are shown in Figure 4-43.
d                    a             b                       maps to
vector unsigned int  vector float  5-bit unsigned literal  vctuxs d,a,b
Figure 4-43. Convert Four Floating-Point Elements to Four Saturated Unsigned Integer Elements (32-Bit)
vec_dss
Vector Data Stream Stop
vec_dss(a)

    DataStreamPrefetchControl ← "stop" || a

Each operation stops cache touches for the data stream associated with tag a. The valid argument type for vec_dss(a) is shown in Table 4-4. The result type is void.

Table 4-4. vec_dss - Vector Data Stream Stop Argument Types

a                       maps to
2-bit unsigned literal  dss a
vec_dssall
Vector Stream Stop All
vec_dssall()

    DataStreamPrefetchControl ← "stop"
The operation stops cache touches for all data streams. All argument and result types for vec_dssall() are void. vec_dssall maps to the dssall instruction.
vec_dst
Vector Data Stream Touch
vec_dst(a,b,c)

    addr[0:63] ← a
    DataStreamPrefetchControl ← "start" || c || 0 || b || addr

Each operation initiates cache touches for loads for the data stream associated with tag c at the address a using the data block in b. The result type is void. The a type may also be a pointer to a const-qualified type. Plain char * is excluded in the mapping for a.

The b type is encoded for 32-bit as follows:
    Block size:   b[3:7] if b[3:7] != 0; otherwise 32
    Block count:  b[8:15] if b[8:15] != 0; otherwise 256
    Block stride: b[16:31] if b[16:31] != 0; otherwise 32768

Bits 0-2: reserved (///) | Bits 3-7: Block Size | Bits 8-15: Block Count | Bits 16-31: Block Stride
Figure 4-44. Format of b Type (32-bit)

The b type is encoded for 64-bit as follows:
    Block size:   b[35:39] if b[35:39] != 0; otherwise 32
    Block count:  b[40:47] if b[40:47] != 0; otherwise 256
    Block stride: b[48:63] if b[48:63] != 0; otherwise 32768

Bits 32-34: reserved (///) | Bits 35-39: Block Size | Bits 40-47: Block Count | Bits 48-63: Block Stride
Figure 4-45. Format of b Type (64-bit)

The c type is a 2-bit unsigned literal tag used to identify a specific data stream. Up to four streams can be set up with this mechanism. The valid combinations of argument types for vec_dst(a,b,c) are shown in Table 4-5. The result type is void.
Table 4-5. vec_dst - Vector Data Stream Touch Argument Types

a         any of: vector unsigned char *, vector signed char *, vector bool char *,
          vector unsigned short *, vector signed short *, vector bool short *,
          vector pixel *, vector unsigned int *, vector signed int *,
          vector bool int *, vector float *, unsigned char *, signed char *,
          unsigned short *, short *, unsigned int *, int *, float *
b         any integral type
c         2-bit unsigned literal
maps to   dst a,b,c
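The 32-bit encoding of the b argument can be sketched with a few portable C helpers. The names dst_pack, dst_block_size, dst_block_count, and dst_block_stride are hypothetical and are not part of the AltiVec interface. The sketch restricts the stride to 1 through 32768, because the text above does not say how larger field values are interpreted. Bit 0 is taken as the most significant bit, as in the field layout above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: packs the 32-bit form of the b argument.
 * b[3:7] sits at shift 24, b[8:15] at shift 16, b[16:31] in the low
 * halfword. A value of 32, 256, or 32768 is encoded as a zero field. */
static uint32_t dst_pack(unsigned size, unsigned count, unsigned stride)
{
    assert(size >= 1 && size <= 32);
    assert(count >= 1 && count <= 256);
    assert(stride >= 1 && stride <= 32768);
    uint32_t size_field = size & 0x1Fu;     /* 32 wraps to 0 */
    uint32_t count_field = count & 0xFFu;   /* 256 wraps to 0 */
    uint32_t stride_field = (stride == 32768u) ? 0u : stride;
    return (size_field << 24) | (count_field << 16) | stride_field;
}

/* Decoders apply the "otherwise" defaults for zero fields. */
static unsigned dst_block_size(uint32_t b)
{
    unsigned f = (b >> 24) & 0x1Fu;
    return f ? f : 32u;
}

static unsigned dst_block_count(uint32_t b)
{
    unsigned f = (b >> 16) & 0xFFu;
    return f ? f : 256u;
}

static unsigned dst_block_stride(uint32_t b)
{
    unsigned f = b & 0xFFFFu;
    return f ? f : 32768u;
}
```

A control word built this way would be passed as the b argument, for example dst_pack(2, 16, 64) for a block size of 2, a block count of 16, and a stride of 64.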
vec_dstst
Vector Data Stream Touch for Store
vec_dstst(a,b,c)

    addr[0:63] ← a
    DataStreamPrefetchControl ← "start" || 0 || static || b || addr

Each operation initiates cache touches for stores for the data stream associated with tag c at the address a using the data block in b. The result type is void. The a type may also be a pointer to a const-qualified type. Plain char * is excluded in the mapping for a.

The b type is encoded for 32-bit as follows:
    Block size:   b[3:7] if b[3:7] != 0; otherwise 32
    Block count:  b[8:15] if b[8:15] != 0; otherwise 256
    Block stride: b[16:31] if b[16:31] != 0; otherwise 32768

Bits 0-2: reserved (///) | Bits 3-7: Block Size | Bits 8-15: Block Count | Bits 16-31: Block Stride
Figure 4-46. Format of b Type (32-bit)

The b type is encoded for 64-bit as follows:
    Block size:   b[35:39] if b[35:39] != 0; otherwise 32
    Block count:  b[40:47] if b[40:47] != 0; otherwise 256
    Block stride: b[48:63] if b[48:63] != 0; otherwise 32768

Bits 32-34: reserved (///) | Bits 35-39: Block Size | Bits 40-47: Block Count | Bits 48-63: Block Stride
Figure 4-47. Format of b Type (64-bit)

The c type is a 2-bit unsigned literal tag used to identify a specific data stream. Up to four streams can be set up with this mechanism. The valid combinations of argument types for vec_dstst(a,b,c) are shown in Table 4-6. The result type is void.
Table 4-6. vec_dstst - Vector Data Stream Touch for Store Argument Types

a         any of: vector unsigned char *, vector signed char *, vector bool char *,
          vector unsigned short *, vector signed short *, vector bool short *,
          vector pixel *, vector unsigned int *, vector signed int *,
          vector bool int *, vector float *, unsigned char *, signed char *,
          unsigned short *, short *, unsigned int *, int *, float *
b         any integral type
c         2-bit unsigned literal
maps to   dstst a,b,c
vec_dststt
Vector Data Stream Touch for Store Transient
vec_dststt(a,b,c)

    addr[0:63] ← a
    DataStreamPrefetchControl ← "start" || 1 || static || b || addr

Each operation initiates cache touches for transient stores for the data stream associated with tag c at the address a using the data block in b. The result type is void. The a type may also be a pointer to a const-qualified type. Plain char * is excluded in the mapping for a.

The b type is encoded for 32-bit as follows:
    Block size:   b[3:7] if b[3:7] != 0; otherwise 32
    Block count:  b[8:15] if b[8:15] != 0; otherwise 256
    Block stride: b[16:31] if b[16:31] != 0; otherwise 32768

Bits 0-2: reserved (///) | Bits 3-7: Block Size | Bits 8-15: Block Count | Bits 16-31: Block Stride
Figure 4-48. Format of b Type (32-bit)

The b type is encoded for 64-bit as follows:
    Block size:   b[35:39] if b[35:39] != 0; otherwise 32
    Block count:  b[40:47] if b[40:47] != 0; otherwise 256
    Block stride: b[48:63] if b[48:63] != 0; otherwise 32768

Bits 32-34: reserved (///) | Bits 35-39: Block Size | Bits 40-47: Block Count | Bits 48-63: Block Stride
Figure 4-49. Format of b Type (64-bit)

The c type is a 2-bit unsigned literal tag used to identify a specific data stream. Up to four streams can be set up with this mechanism. The valid combinations of argument types for vec_dststt(a,b,c) are shown in Table 4-7. The result type is void.
Table 4-7. vec_dststt - Vector Data Stream Touch for Store Transient Argument Types

a         any of: vector unsigned char *, vector signed char *, vector bool char *,
          vector unsigned short *, vector signed short *, vector bool short *,
          vector pixel *, vector unsigned int *, vector signed int *,
          vector bool int *, vector float *, unsigned char *, signed char *,
          unsigned short *, short *, unsigned int *, int *, float *
b         any integral type
c         2-bit unsigned literal
maps to   dststt a,b,c
vec_dstt
Vector Data Stream Touch Transient
vec_dstt(a,b,c)

    addr[0:63] ← a
    DataStreamPrefetchControl ← "start" || c || 1 || b || addr

Each operation initiates cache touches for transient loads for the data stream associated with tag c at the address a using the data block in b. The result type is void. The a type may also be a pointer to a const-qualified type. Plain char * is excluded in the mapping for a.

The b type is encoded for 32-bit as follows:
    Block size:   b[3:7] if b[3:7] != 0; otherwise 32
    Block count:  b[8:15] if b[8:15] != 0; otherwise 256
    Block stride: b[16:31] if b[16:31] != 0; otherwise 32768

Bits 0-2: reserved (///) | Bits 3-7: Block Size | Bits 8-15: Block Count | Bits 16-31: Block Stride
Figure 4-50. Format of b Type (32-bit)

The b type is encoded for 64-bit as follows:
    Block size:   b[35:39] if b[35:39] != 0; otherwise 32
    Block count:  b[40:47] if b[40:47] != 0; otherwise 256
    Block stride: b[48:63] if b[48:63] != 0; otherwise 32768

Bits 32-34: reserved (///) | Bits 35-39: Block Size | Bits 40-47: Block Count | Bits 48-63: Block Stride
Figure 4-51. Format of b Type (64-bit)

The c type is a 2-bit unsigned literal tag used to identify a specific data stream. Up to four streams can be set up with this mechanism. The valid combinations of argument types for vec_dstt(a,b,c) are shown in Table 4-8. The result type is void.
Table 4-8. vec_dstt - Vector Data Stream Touch Transient Argument Types

a         any of: vector unsigned char *, vector signed char *, vector bool char *,
          vector unsigned short *, vector signed short *, vector bool short *,
          vector pixel *, vector unsigned int *, vector signed int *,
          vector bool int *, vector float *, unsigned char *, signed char *,
          unsigned short *, short *, unsigned int *, int *, float *
b         any integral type
c         2-bit unsigned literal
maps to   dstt a,b,c
vec_expte
Vector Is 2 Raised to the Exponent Estimate Floating-Point
d = vec_expte(a)

    do i = 0 to 3
        d_i ← FP2^xEst(a_i)
    end
Each element of the result is an estimate of 2 raised to the corresponding element of a. If VSCR[NJ] = 1, every denormalized operand element is truncated to a 0 of the same sign before the operation is carried out, and each denormalized result element is truncated to a 0 of the same sign. The valid argument type and corresponding result type for d = vec_expte(a) are shown in Figure 4-52.
d             a             maps to
vector float  vector float  vexptefp d,a

Figure 4-52. 2 Raised to the Exponent Estimate Floating-Point for Four Floating-Point Elements (32-Bit)
vec_floor
Vector Floor
d = vec_floor(a)

    do i = 0 to 3
        d_i ← Floor(a_i)
    end

Each single-precision floating-point word element in a is rounded to a single-precision floating-point integer using the rounding mode Round toward -Infinity, and placed into the corresponding word element of d. Each element of the result is thus the largest representable floating-point integer not greater than the corresponding element of a. For example, if the floating-point element is 123.85, the resulting integer is 123. If VSCR[NJ] = 1, every denormalized operand element is truncated to 0 before rounding. The valid argument type and corresponding result type for d = vec_floor(a) are shown in Figure 4-53.
d             a             maps to
vector float  vector float  vrfim d,a
Figure 4-53. Round to Minus Infinity of Four Floating-Point Integer Elements (32-Bit)
vec_ld
Vector Load Indexed
d = vec_ld(a,b)

    EA ← BoundAlign(a+b,16)
    d ← MEM(EA,16)

Each operation performs a 16-byte load at a 16-byte aligned address. The a is taken to be an integer value, while b is a pointer. BoundAlign(a+b,16) is the largest value less than or equal to a + b that is a multiple of 16. This load is the one that is generated for a loading dereference of a pointer to a vector type. The b type may also be a pointer to a const-qualified type. Plain char * is excluded in the mapping for b. The valid combinations of argument types and the corresponding result types for d = vec_ld(a,b) are shown in Table 4-9.
[Figure: a and b are added, BoundAlign(a+b,16) gives the effective address (EA), and the 16 bytes at MEM(EA,16) are loaded through the memory interface into d.]
Figure 4-54. Vector Load Indexed Operation
Table 4-9. vec_ld - Load Vector Indexed Argument Types

d                      a                  b                        maps to
vector unsigned char   any integral type  vector unsigned char *   lvx d,a,b
vector unsigned char   any integral type  unsigned char *          lvx d,a,b
vector signed char     any integral type  vector signed char *     lvx d,a,b
vector signed char     any integral type  signed char *            lvx d,a,b
vector bool char       any integral type  vector bool char *       lvx d,a,b
vector unsigned short  any integral type  vector unsigned short *  lvx d,a,b
vector unsigned short  any integral type  unsigned short *         lvx d,a,b
vector signed short    any integral type  vector signed short *    lvx d,a,b
vector signed short    any integral type  short *                  lvx d,a,b
vector bool short      any integral type  vector bool short *      lvx d,a,b
vector pixel           any integral type  vector pixel *           lvx d,a,b
vector unsigned int    any integral type  vector unsigned int *    lvx d,a,b
vector unsigned int    any integral type  unsigned int *           lvx d,a,b
vector signed int      any integral type  vector signed int *      lvx d,a,b
vector signed int      any integral type  int *                    lvx d,a,b
vector bool int        any integral type  vector bool int *        lvx d,a,b
vector float           any integral type  vector float *           lvx d,a,b
vector float           any integral type  float *                  lvx d,a,b
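The effect of BoundAlign on the load address can be sketched in portable C by modeling memory as a byte array whose indices play the role of addresses. The names bound_align and ld_model are hypothetical. The sketch only illustrates that the low four bits of a + b are ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Largest multiple of n that is less than or equal to ea.
 * n must be a power of two. */
static size_t bound_align(size_t ea, size_t n)
{
    assert(n != 0 && (n & (n - 1)) == 0);
    return ea & ~(n - 1);
}

/* Hypothetical model of vec_ld over a byte array: index a + b is
 * rounded down to a multiple of 16 before 16 bytes are copied. */
static void ld_model(const uint8_t *mem, size_t mem_size,
                     size_t a, size_t b, uint8_t d[16])
{
    size_t ea = bound_align(a + b, 16);
    assert(mem != NULL && d != NULL);
    assert(ea + 16 <= mem_size);
    memcpy(d, mem + ea, 16);
}
```

In this model a request at index 21 returns the 16 bytes at indices 16 through 31, so a caller that needs unaligned data must combine two loads.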
vec_lde
Vector Load Element Indexed
d = vec_lde(a,b)

    s ← 16/(number of elements)
    EA ← BoundAlign(a+b,s)
    i ← mod(EA,16)/s
    d_i ← MEM(EA,s)

Each operation loads a single element into the position in the vector register corresponding to its address, leaving the remaining elements of the register undefined. The a is taken to be an integer value, while b is a pointer. BoundAlign(a+b,s) is the largest value less than or equal to a + b that is a multiple of s, where s is 1 for char pointers, 2 for short pointers, and 4 for int or float pointers. The b type may also be a pointer to a const-qualified type. Plain char * is excluded in the mapping for b. The valid combinations of argument types and the corresponding result types for d = vec_lde(a,b) are shown in Table 4-10.
[Figure: a and b are added, BoundAlign(a+b,s) gives the effective address (EA), and the element at MEM(EA,s) is loaded through the memory interface into d_i; the other elements of d are undefined. The example shows a byte element load.]

Figure 4-55. Vector Load Element Indexed Operation

Table 4-10. vec_lde(a,b) - Vector Load Element Indexed Argument Types

d                      a                  b                 maps to
vector unsigned char   any integral type  unsigned char *   lvebx d,a,b
vector signed char     any integral type  signed char *     lvebx d,a,b
vector unsigned short  any integral type  unsigned short *  lvehx d,a,b
vector signed short    any integral type  short *           lvehx d,a,b
vector unsigned int    any integral type  unsigned int *    lvewx d,a,b
vector signed int      any integral type  int *             lvewx d,a,b
vector float           any integral type  float *           lvewx d,a,b
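The element position selected by an address can be computed as in the following portable C sketch. The function lde_element_index is a hypothetical helper that follows the pseudocode for i and is not an AltiVec operation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: returns the index i of the vector element that
 * receives the loaded value, for address addr and element size s. */
static size_t lde_element_index(size_t addr, size_t s)
{
    assert(s == 1 || s == 2 || s == 4);
    size_t ea = addr & ~(s - 1); /* BoundAlign(addr, s) */
    return (ea % 16) / s;        /* i = mod(EA,16)/s */
}
```

For example, a halfword load from address 0x1007 is first aligned down to 0x1006 and lands in element 3, while a word load from 0x100D is aligned down to 0x100C and also lands in element 3.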
vec_ldl
Vector Load Indexed LRU
d = vec_ldl(a,b)

    EA ← BoundAlign(a+b,16)
    d ← MEM(EA,16)

Each operation performs a 16-byte load at a 16-byte aligned address. The a is taken to be an integer value, while b is a pointer. BoundAlign(a+b,16) is the largest value less than or equal to a + b that is a multiple of 16. These operations mark the cache line as least-recently-used. The b type may also be a pointer to a const-qualified type. Plain char * is excluded in the mapping for b. The valid combinations of argument types and the corresponding result types for d = vec_ldl(a,b) are shown in Table 4-11.
[Figure: a and b are added, BoundAlign(a+b,16) gives the effective address (EA), and the 16 bytes at MEM(EA,16) are loaded through the memory interface into d.]
Figure 4-56. Vector Load Indexed LRU Operation
Table 4-11. vec_ldl - Vector Load Indexed LRU Argument Types

d                      a                  b                        maps to
vector unsigned char   any integral type  vector unsigned char *   lvxl d,a,b
vector unsigned char   any integral type  unsigned char *          lvxl d,a,b
vector signed char     any integral type  vector signed char *     lvxl d,a,b
vector signed char     any integral type  signed char *            lvxl d,a,b
vector bool char       any integral type  vector bool char *       lvxl d,a,b
vector unsigned short  any integral type  vector unsigned short *  lvxl d,a,b
vector unsigned short  any integral type  unsigned short *         lvxl d,a,b
vector signed short    any integral type  vector signed short *    lvxl d,a,b
vector signed short    any integral type  short *                  lvxl d,a,b
vector bool short      any integral type  vector bool short *      lvxl d,a,b
vector pixel           any integral type  vector pixel *           lvxl d,a,b
vector unsigned int    any integral type  vector unsigned int *    lvxl d,a,b
vector unsigned int    any integral type  unsigned int *           lvxl d,a,b
vector signed int      any integral type  vector signed int *      lvxl d,a,b
vector signed int      any integral type  int *                    lvxl d,a,b
vector bool int        any integral type  vector bool int *        lvxl d,a,b
vector float           any integral type  vector float *           lvxl d,a,b
vector float           any integral type  float *                  lvxl d,a,b
vec_loge
Vector Log2 Estimate Floating-Point
d = vec_loge(a)

    do i = 0 to 3
        d_i ← FPLog2Est(a_i)
    end

Each element of the result is an estimate of the logarithm to base 2 of the corresponding element of a. If VSCR[NJ] = 1, every denormalized operand element is truncated to a 0 of the same sign before the operation is carried out. The valid argument type and corresponding result type for d = vec_loge(a) are shown in Figure 4-57.

d             a             maps to
vector float  vector float  vlogefp d,a
Figure 4-57. Log2 Estimate Floating-Point for Four Floating-Point Elements (32-Bit)
vec_lvsl
Vector Load for Shift Left
d = vec_lvsl(a,b)

    EA ← a + b
    sh ← EA[28:31]
    if sh = 0x0 then d ← 0x000102030405060708090A0B0C0D0E0F
    if sh = 0x1 then d ← 0x0102030405060708090A0B0C0D0E0F10
    if sh = 0x2 then d ← 0x02030405060708090A0B0C0D0E0F1011
    if sh = 0x3 then d ← 0x030405060708090A0B0C0D0E0F101112
    if sh = 0x4 then d ← 0x0405060708090A0B0C0D0E0F10111213
    if sh = 0x5 then d ← 0x05060708090A0B0C0D0E0F1011121314
    if sh = 0x6 then d ← 0x060708090A0B0C0D0E0F101112131415
    if sh = 0x7 then d ← 0x0708090A0B0C0D0E0F10111213141516
    if sh = 0x8 then d ← 0x08090A0B0C0D0E0F1011121314151617
    if sh = 0x9 then d ← 0x090A0B0C0D0E0F101112131415161718
    if sh = 0xA then d ← 0x0A0B0C0D0E0F10111213141516171819
    if sh = 0xB then d ← 0x0B0C0D0E0F101112131415161718191A
    if sh = 0xC then d ← 0x0C0D0E0F101112131415161718191A1B
    if sh = 0xD then d ← 0x0D0E0F101112131415161718191A1B1C
    if sh = 0xE then d ← 0x0E0F101112131415161718191A1B1C1D
    if sh = 0xF then d ← 0x0F101112131415161718191A1B1C1D1E

Each operation generates a permutation useful for aligning data from an unaligned address. The b type may also be a pointer to a const- or volatile-qualified type. Plain char * is excluded in the mapping for b. The valid combinations of argument types and the corresponding result type for d = vec_lvsl(a,b) are shown in Table 4-12.
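Table 4-12. vec_lvsl - Load Vector for Shift Left Argument Types

d                     a                  b                 maps to
vector unsigned char  any integral type  unsigned char *   lvsl d,a,b
vector unsigned char  any integral type  signed char *     lvsl d,a,b
vector unsigned char  any integral type  unsigned short *  lvsl d,a,b
vector unsigned char  any integral type  short *           lvsl d,a,b
vector unsigned char  any integral type  unsigned int *    lvsl d,a,b
vector unsigned char  any integral type  int *             lvsl d,a,b
vector unsigned char  any integral type  float *           lvsl d,a,b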
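Every row of the vec_lvsl pseudocode follows one rule: byte i of the result is sh + i. The portable C sketch below models that rule and shows how such a control vector can pick 16 unaligned bytes out of two adjacent aligned blocks. All names here are hypothetical. perm_model is only a simplified byte permute written for this illustration, and memory is modeled as a byte array whose indices stand in for addresses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Model of the vec_lvsl result: byte i is sh + i, where sh is the low
 * four bits of the effective address. */
static void lvsl_model(size_t ea, uint8_t d[16])
{
    unsigned sh = (unsigned)(ea & 0xFu);
    for (unsigned i = 0; i < 16; i++)
        d[i] = (uint8_t)(sh + i);
}

/* Hypothetical byte permute: each control byte selects one byte from
 * the 32-byte concatenation x || y. */
static void perm_model(const uint8_t x[16], const uint8_t y[16],
                       const uint8_t ctl[16], uint8_t d[16])
{
    uint8_t xy[32];
    memcpy(xy, x, 16);
    memcpy(xy + 16, y, 16);
    for (unsigned i = 0; i < 16; i++)
        d[i] = xy[ctl[i] & 0x1Fu];
}

/* Reads 16 bytes starting at mem[ea] using only 16-byte aligned blocks. */
static void unaligned_load_model(const uint8_t *mem, size_t mem_size,
                                 size_t ea, uint8_t d[16])
{
    size_t base = ea & ~(size_t)0xF;
    uint8_t ctl[16];
    assert(mem != NULL && d != NULL);
    assert(base + 32 <= mem_size);
    lvsl_model(ea, ctl);
    perm_model(mem + base, mem + base + 16, ctl, d);
}
```

The control vector depends only on the low four address bits, so it can be computed once and reused for every pair of blocks in a stream that shares the same misalignment.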
vec_lvsr
Vector Load for Shift Right

d = vec_lvsr(a,b)

    EA ← a + b
    sh ← EA[28:31]
    if sh = 0x0 then d ← 0x101112131415161718191A1B1C1D1E1F
    if sh = 0x1 then d ← 0x0F101112131415161718191A1B1C1D1E
    if sh = 0x2 then d ← 0x0E0F101112131415161718191A1B1C1D
    if sh = 0x3 then d ← 0x0D0E0F101112131415161718191A1B1C
    if sh = 0x4 then d ← 0x0C0D0E0F101112131415161718191A1B
    if sh = 0x5 then d ← 0x0B0C0D0E0F101112131415161718191A
    if sh = 0x6 then d ← 0x0A0B0C0D0E0F10111213141516171819
    if sh = 0x7 then d ← 0x090A0B0C0D0E0F101112131415161718
    if sh = 0x8 then d ← 0x08090A0B0C0D0E0F1011121314151617
    if sh = 0x9 then d ← 0x0708090A0B0C0D0E0F10111213141516
    if sh = 0xA then d ← 0x060708090A0B0C0D0E0F101112131415
    if sh = 0xB then d ← 0x05060708090A0B0C0D0E0F1011121314
    if sh = 0xC then d ← 0x0405060708090A0B0C0D0E0F10111213
    if sh = 0xD then d ← 0x030405060708090A0B0C0D0E0F101112
    if sh = 0xE then d ← 0x02030405060708090A0B0C0D0E0F1011
    if sh = 0xF then d ← 0x0102030405060708090A0B0C0D0E0F10

Each operation generates a permutation useful for aligning data from an unaligned address. The b type may also be a pointer to a const- or volatile-qualified type. Plain char * is excluded in the mapping for b. The valid combinations of argument types and the corresponding result type for d = vec_lvsr(a,b) are shown in Table 4-13.
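Table 4-13. vec_lvsr - Vector Load for Shift Right Argument Types

d                     a                  b                 maps to
vector unsigned char  any integral type  unsigned char *   lvsr d,a,b
vector unsigned char  any integral type  signed char *     lvsr d,a,b
vector unsigned char  any integral type  unsigned short *  lvsr d,a,b
vector unsigned char  any integral type  short *           lvsr d,a,b
vector unsigned char  any integral type  unsigned int *    lvsr d,a,b
vector unsigned char  any integral type  int *             lvsr d,a,b
vector unsigned char  any integral type  float *           lvsr d,a,b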
vec_madd
Vector Multiply Add
vec_madd
d = vec_madd(a,b,c)
do i=0 to 3 di RndToFPNearest(ai * bi + ci) end
Each element of the result is the sum of the corresponding element of c and the product of the corresponding elements of a and b. If VSCR[NJ] = 1, every denormalized operand element is truncated to a 0 of the same sign before the operation is carried out, and each denormalized result element truncates to a 0 of the same sign. The valid argument types and the corresponding result type for d = vec_madd(a,b,c) are shown in Figure 4-58
d             a             b             c             maps to
vector float  vector float  vector float  vector float  vmaddfp d,a,b,c

Figure 4-58. Multiply-Add Four Floating-Point Elements (32-Bit)
vec_madds
Vector Multiply Add Saturated
d = vec_madds(a,b,c)
do i=0 to 7
  d_i ← Saturate((a_i * b_i) / 2^15 + c_i)
end
Each element of the result is the 16-bit saturated sum of the corresponding element of c and the high-order 17 bits of the product of the corresponding elements of a and b. If saturation occurs, VSCR[SAT] is set (see Table 4-1). The valid argument types and the corresponding result type for d = vec_madds(a,b,c) are shown in Figure 4-59.
d                    a                    b                    c                    maps to
vector signed short  vector signed short  vector signed short  vector signed short  vmhaddshs d,a,b,c

Figure 4-59. Multiply-Add of Eight Integer Elements (16-Bit)
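As an illustration of the arithmetic for one element, the following is a minimal scalar sketch in portable C. It assumes that taking the high-order 17 bits of the 32-bit product amounts to a floor division by 2^15, and it does not model VSCR[SAT]. The names sat16, floor_div_2_15, and madds_elem are hypothetical.

```c
#include <stdint.h>

/* Clamp a 32-bit intermediate to the signed 16-bit range. */
int16_t sat16(int32_t x)
{
    if (x > INT16_MAX) return INT16_MAX;
    if (x < INT16_MIN) return INT16_MIN;
    return (int16_t)x;
}

/* Floor division by 2^15, written without right-shifting a negative value. */
int32_t floor_div_2_15(int32_t p)
{
    return (p >= 0) ? p / 32768 : -((-p + 32767) / 32768);
}

/* One element of d = vec_madds(a,b,c) under the assumptions stated above. */
int16_t madds_elem(int16_t a, int16_t b, int16_t c)
{
    int32_t prod = (int32_t)a * (int32_t)b;
    return sat16(floor_div_2_15(prod) + (int32_t)c);
}
```

With the inputs read as Q15 fractions, 0.5 * 0.5 (16384 * 16384) gives 8192, which is 0.25, while -1.0 * -1.0 overflows and saturates to 32767.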
vec_max
Vector Maximum
d = vec_max(a,b)
n ← number of elements
do i=0 to n-1
  d_i ← MAX(a_i, b_i)
end
Each element of the result is the larger of the corresponding elements of a and b. For vector float argument types, if VSCR[NJ] is set, every denormalized operand element is truncated to a 0 of the same sign before the operation is carried out, and each denormalized result element truncates to a 0 of the same sign. The maximum of +0.0 and -0.0 is +0.0. The maximum of any value and a NaN is a QNaN. The valid combinations of argument types and the corresponding result types for d = vec_max(a,b) are shown in Figure 4-60, Figure 4-61, Figure 4-62, and Figure 4-63.
d                     a                     b                     maps to
vector unsigned char  vector unsigned char  vector unsigned char  vmaxub d,a,b
vector unsigned char  vector unsigned char  vector bool char      vmaxub d,a,b
vector unsigned char  vector bool char      vector unsigned char  vmaxub d,a,b
vector signed char    vector signed char    vector signed char    vmaxsb d,a,b
vector signed char    vector signed char    vector bool char      vmaxsb d,a,b
vector signed char    vector bool char      vector signed char    vmaxsb d,a,b

Figure 4-60. Maximum of Sixteen Integer Elements (8-Bit)
d                      a                      b                      maps to
vector unsigned short  vector unsigned short  vector unsigned short  vmaxuh d,a,b
vector unsigned short  vector unsigned short  vector bool short      vmaxuh d,a,b
vector unsigned short  vector bool short      vector unsigned short  vmaxuh d,a,b
vector signed short    vector signed short    vector signed short    vmaxsh d,a,b
vector signed short    vector signed short    vector bool short      vmaxsh d,a,b
vector signed short    vector bool short      vector signed short    vmaxsh d,a,b

Figure 4-61. Maximum of Eight Integer Elements (16-bit)
d                    a                    b                    maps to
vector unsigned int  vector unsigned int  vector unsigned int  vmaxuw d,a,b
vector unsigned int  vector unsigned int  vector bool int      vmaxuw d,a,b
vector unsigned int  vector bool int      vector unsigned int  vmaxuw d,a,b
vector signed int    vector signed int    vector signed int    vmaxsw d,a,b
vector signed int    vector signed int    vector bool int      vmaxsw d,a,b
vector signed int    vector bool int      vector signed int    vmaxsw d,a,b

Figure 4-62. Maximum of Four Integer Elements (32-bit)
d             a             b             maps to
vector float  vector float  vector float  vmaxfp d,a,b

Figure 4-63. Maximum of Four Floating-Point Elements (32-bit)
vec_mergeh
Vector Merge High
d = vec_mergeh(a,b)
m ← (number of elements)/2
do i=0 to m-1
  d_{2i} ← a_i
  d_{2i+1} ← b_i
end
The even elements of the result are obtained left-to-right from the high elements of a. The odd elements of the result are obtained left-to-right from the high elements of b. The valid combinations of argument types and the corresponding result types for d = vec_mergeh(a,b) are shown in Figure 4-64, Figure 4-65, and Figure 4-66.
d                     a                     b                     maps to
vector unsigned char  vector unsigned char  vector unsigned char  vmrghb d,a,b
vector signed char    vector signed char    vector signed char    vmrghb d,a,b
vector bool char      vector bool char      vector bool char      vmrghb d,a,b

Figure 4-64. Merge Eight High-Order Elements (8-Bit)
d                      a                      b                      maps to
vector unsigned short  vector unsigned short  vector unsigned short  vmrghh d,a,b
vector signed short    vector signed short    vector signed short    vmrghh d,a,b
vector bool short      vector bool short      vector bool short      vmrghh d,a,b
vector pixel           vector pixel           vector pixel           vmrghh d,a,b

Figure 4-65. Merge Four High-Order Elements (16-bit)
d                    a                    b                    maps to
vector unsigned int  vector unsigned int  vector unsigned int  vmrghw d,a,b
vector signed int    vector signed int    vector signed int    vmrghw d,a,b
vector bool int      vector bool int      vector bool int      vmrghw d,a,b
vector float         vector float         vector float         vmrghw d,a,b

Figure 4-66. Merge Two High-Order Elements (32-bit)
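The interleaving can be sketched in portable C for the 16-bit case as follows. This is a scalar model that assumes array index 0 corresponds to element 0, the leftmost element. The function name mergeh_u16 is hypothetical.

```c
#include <stdint.h>

/* Scalar model of d = vec_mergeh(a,b) for eight 16-bit elements.
 * The first m elements of a and b are interleaved into d. */
void mergeh_u16(const uint16_t a[8], const uint16_t b[8], uint16_t d[8])
{
    const int m = 8 / 2;
    for (int i = 0; i < m; i++) {
        d[2 * i]     = a[i];   /* even result elements come from a */
        d[2 * i + 1] = b[i];   /* odd result elements come from b */
    }
}
```

The same loop with a[i + m] and b[i + m] as the sources would model the low-order merge.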
vec_mergel
Vector Merge Low
d = vec_mergel(a,b)
m ← (number of elements)/2
do i=0 to m-1
  d_{2i} ← a_{i+m}
  d_{2i+1} ← b_{i+m}
end
The even elements of the result are obtained left-to-right from the low elements of a. The odd elements of the result are obtained left-to-right from the low elements of b. The valid combinations of argument types and the corresponding result types for d = vec_mergel(a,b) are shown in Figure 4-67, Figure 4-68, and Figure 4-69.
d                     a                     b                     maps to
vector unsigned char  vector unsigned char  vector unsigned char  vmrglb d,a,b
vector signed char    vector signed char    vector signed char    vmrglb d,a,b
vector bool char      vector bool char      vector bool char      vmrglb d,a,b

Figure 4-67. Merge Eight Low-Order Elements (8-Bit)
d                      a                      b                      maps to
vector unsigned short  vector unsigned short  vector unsigned short  vmrglh d,a,b
vector signed short    vector signed short    vector signed short    vmrglh d,a,b
vector bool short      vector bool short      vector bool short      vmrglh d,a,b
vector pixel           vector pixel           vector pixel           vmrglh d,a,b

Figure 4-68. Merge Four Low-Order Elements (16-bit)
d                    a                    b                    maps to
vector unsigned int  vector unsigned int  vector unsigned int  vmrglw d,a,b
vector signed int    vector signed int    vector signed int    vmrglw d,a,b
vector bool int      vector bool int      vector bool int      vmrglw d,a,b
vector float         vector float         vector float         vmrglw d,a,b

Figure 4-69. Merge Two Low-Order Elements (32-bit)
vec_mfvscr
Vector Move from Vector Status and Control Register
d = vec_mfvscr
d ← (96 zero bits) || (VSCR)

Figure 4-70. Vector Move from VSCR

Table 4-14. Vector Move from Vector Status and Control Register Argument Type and Mapping

d                      Maps to
vector unsigned short  mfvscr
vec_min
Vector Minimum
d = vec_min(a,b)
n ← number of elements
do i=0 to n-1
  d_i ← MIN(a_i, b_i)
end
Each element of the result is the smaller of the corresponding elements of a and b. For vector float argument types, if VSCR[NJ] is set, every denormalized operand element is truncated to a 0 of the same sign before the operation is carried out, and each denormalized result element truncates to a 0 of the same sign. The minimum of +0.0 and -0.0 is -0.0. The minimum of any value and a NaN is a QNaN. The valid combinations of argument types and the corresponding result types for d = vec_min(a,b) are shown in Figure 4-71, Figure 4-72, Figure 4-73, and Figure 4-74.
d                     a                     b                     maps to
vector unsigned char  vector unsigned char  vector unsigned char  vminub d,a,b
vector unsigned char  vector unsigned char  vector bool char      vminub d,a,b
vector unsigned char  vector bool char      vector unsigned char  vminub d,a,b
vector signed char    vector signed char    vector signed char    vminsb d,a,b
vector signed char    vector signed char    vector bool char      vminsb d,a,b
vector signed char    vector bool char      vector signed char    vminsb d,a,b

Figure 4-71. Minimum of Sixteen Integer Elements (8-Bit)
d                      a                      b                      maps to
vector unsigned short  vector unsigned short  vector unsigned short  vminuh d,a,b
vector unsigned short  vector unsigned short  vector bool short      vminuh d,a,b
vector unsigned short  vector bool short      vector unsigned short  vminuh d,a,b
vector signed short    vector signed short    vector signed short    vminsh d,a,b
vector signed short    vector signed short    vector bool short      vminsh d,a,b
vector signed short    vector bool short      vector signed short    vminsh d,a,b

Figure 4-72. Minimum of Eight Integer Elements (16-bit)
d                    a                    b                    maps to
vector unsigned int  vector unsigned int  vector unsigned int  vminuw d,a,b
vector unsigned int  vector unsigned int  vector bool int      vminuw d,a,b
vector unsigned int  vector bool int      vector unsigned int  vminuw d,a,b
vector signed int    vector signed int    vector signed int    vminsw d,a,b
vector signed int    vector signed int    vector bool int      vminsw d,a,b
vector signed int    vector bool int      vector signed int    vminsw d,a,b

Figure 4-73. Minimum of Four Integer Elements (32-bit)
d             a             b             maps to
vector float  vector float  vector float  vminfp d,a,b

Figure 4-74. Minimum of Four Floating-Point Elements (32-bit)
vec_mladd
Vector Multiply Low and Add Unsigned Half Word
d = vec_mladd(a,b,c)
do i=0 to 7
  d_i ← (a_i * b_i) + c_i
end
Each element of the result is the low-order 16 bits of the sum of the corresponding element of c and the product of the corresponding elements of a and b. The valid combinations of argument types and the corresponding result types for d = vec_mladd(a,b,c) are shown in Figure 4-75.
d                      a                      b                      c                      maps to
vector unsigned short  vector unsigned short  vector unsigned short  vector unsigned short  vmladduhm d,a,b,c
vector signed short    vector unsigned short  vector signed short    vector signed short    vmladduhm d,a,b,c
vector signed short    vector signed short    vector unsigned short  vector unsigned short  vmladduhm d,a,b,c
vector signed short    vector signed short    vector signed short    vector signed short    vmladduhm d,a,b,c

Figure 4-75. Multiply-Add of Eight Integer Elements (16-Bit)
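Because only the low-order 16 bits are kept, the same bit pattern results whether the inputs are read as signed or unsigned, which is consistent with every type combination mapping to one instruction. A minimal scalar sketch in portable C for one element follows. The function name mladd_elem is hypothetical.

```c
#include <stdint.h>

/* One element of d = vec_mladd(a,b,c): the low-order 16 bits of a*b + c.
 * The widest intermediate, 65535*65535 + 65535, still fits in 32 bits. */
uint16_t mladd_elem(uint16_t a, uint16_t b, uint16_t c)
{
    uint32_t sum = (uint32_t)a * (uint32_t)b + (uint32_t)c;
    return (uint16_t)(sum & 0xFFFFu);
}
```

For example, 0xFFFF * 0xFFFF yields 0x0001 in the low half, which is also the 16-bit result of (-1) * (-1).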
vec_mradds
Vector Multiply Round and Add Saturated
d = vec_mradds(a,b,c)
do i=0 to 7
  d_i ← Saturate((a_i * b_i + 2^14) / 2^15 + c_i)
end
Each element of the result is the 16-bit saturated sum of the corresponding element of c and the high-order 17 bits of the rounded product of the corresponding elements of a and b. If saturation occurs, VSCR[SAT] is set (see Table 4-1). The valid argument types and the corresponding result type for d = vec_mradds(a,b,c) are shown in Figure 4-76.
d                    a                    b                    c                    maps to
vector signed short  vector signed short  vector signed short  vector signed short  vmhraddshs d,a,b,c

Figure 4-76. Multiply-Add of Eight Integer Elements (16-Bit)
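The effect of the rounding term 2^14 can be sketched for one element in portable C as follows. As before, this scalar model assumes the division by 2^15 is a floor division and does not model VSCR[SAT]. The names sat16, floor_div_2_15, and mradds_elem are hypothetical.

```c
#include <stdint.h>

/* Clamp a 32-bit intermediate to the signed 16-bit range. */
int16_t sat16(int32_t x)
{
    if (x > INT16_MAX) return INT16_MAX;
    if (x < INT16_MIN) return INT16_MIN;
    return (int16_t)x;
}

/* Floor division by 2^15, written without right-shifting a negative value. */
int32_t floor_div_2_15(int32_t p)
{
    return (p >= 0) ? p / 32768 : -((-p + 32767) / 32768);
}

/* One element of d = vec_mradds(a,b,c) under the assumptions stated above. */
int16_t mradds_elem(int16_t a, int16_t b, int16_t c)
{
    int32_t rounded = (int32_t)a * (int32_t)b + 16384;  /* add 2^14 */
    return sat16(floor_div_2_15(rounded) + (int32_t)c);
}
```

In this model 3 * 16384 (that is, 1.5 after scaling) rounds up to 2, and -1 * 1 rounds to 0, where a version without the rounding term would give 1 and -1.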
vec_msum
Vector Multiply Sum
d = vec_msum(a,b,c)
For Multiply Sum of Sixteen 8-bit elements:
do i=0 to 3
  d_i ← (a_{4i} * b_{4i}) + (a_{4i+1} * b_{4i+1}) + (a_{4i+2} * b_{4i+2}) + (a_{4i+3} * b_{4i+3}) + c_i
end
For Multiply Sum of Eight 16-bit elements:
do i=0 to 3
  d_i ← (a_{2i} * b_{2i}) + (a_{2i+1} * b_{2i+1}) + c_i
end
Each element of the result is the sum of the corresponding element of c and the products of the elements of a and b which overlap the positions of that element of c. For vec_msum, the sum is performed with 32-bit modular addition. The valid combinations of argument types and the corresponding result types for d = vec_msum(a,b,c) are shown in Figure 4-77 and Figure 4-78.
d                    a                     b                     c                    maps to
vector unsigned int  vector unsigned char  vector unsigned char  vector unsigned int  vmsumubm d,a,b,c
vector signed int    vector signed char    vector unsigned char  vector signed int    vmsummbm d,a,b,c

Figure 4-77. Multiply Sum of Sixteen Integer Elements (8-Bit)
d                    a                      b                      c                    maps to
vector unsigned int  vector unsigned short  vector unsigned short  vector unsigned int  vmsumuhm d,a,b,c
vector signed int    vector signed short    vector signed short    vector signed int    vmsumshm d,a,b,c

Figure 4-78. Multiply Sum of Eight Integer Elements (16-Bit)
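The 16-bit unsigned case with 32-bit modular addition can be sketched in portable C as follows. This is a scalar model only, relying on the wraparound of unsigned 32-bit arithmetic in C to stand in for the modular sum. The function name msum_u16 is hypothetical.

```c
#include <stdint.h>

/* Scalar model of d = vec_msum(a,b,c) for unsigned 16-bit inputs.
 * Each 32-bit result word sums the two products that overlap it, plus c. */
void msum_u16(const uint16_t a[8], const uint16_t b[8],
              const uint32_t c[4], uint32_t d[4])
{
    for (int i = 0; i < 4; i++) {
        uint32_t p0 = (uint32_t)a[2 * i] * (uint32_t)b[2 * i];
        uint32_t p1 = (uint32_t)a[2 * i + 1] * (uint32_t)b[2 * i + 1];
        d[i] = p0 + p1 + c[i];   /* unsigned addition wraps modulo 2^32 */
    }
}
```

When both products in a word are 0xFFFF * 0xFFFF, the sum exceeds 32 bits and the model keeps only the low 32 bits, 0xFFFC0002.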
vec_msums
Vector Multiply Sum Saturated
d = vec_msums(a,b,c)
do i=0 to 3
  d_i ← Saturate((a_{2i} * b_{2i}) + (a_{2i+1} * b_{2i+1}) + c_i)
end
Each element of the result is the sum of the corresponding element of c and the products of the elements of a and b which overlap the positions of that element of c. The sum is performed with 32-bit saturating addition. If saturation occurs, VSCR[SAT] is set (see Table 4-1). The valid combinations of argument types and the corresponding result types for d = vec_msums(a,b,c) are shown in Figure 4-79.
d                    a                      b                      c                    maps to
vector unsigned int  vector unsigned short  vector unsigned short  vector unsigned int  vmsumuhs d,a,b,c
vector signed int    vector signed short    vector signed short    vector signed int    vmsumshs d,a,b,c

Figure 4-79. Multiply-Sum of Integer Elements (16-Bit to 32-Bit)
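The saturating sum for one result word can be sketched in portable C by accumulating in 64 bits and clamping afterwards. This scalar model does not track VSCR[SAT]. The function names msums_u16_word and msums_s16_word are hypothetical.

```c
#include <stdint.h>

/* One word of d = vec_msums(a,b,c), unsigned case. */
uint32_t msums_u16_word(uint16_t a0, uint16_t b0,
                        uint16_t a1, uint16_t b1, uint32_t c)
{
    uint64_t sum = (uint64_t)a0 * b0 + (uint64_t)a1 * b1 + c;
    return sum > UINT32_MAX ? UINT32_MAX : (uint32_t)sum;
}

/* One word of d = vec_msums(a,b,c), signed case. */
int32_t msums_s16_word(int16_t a0, int16_t b0,
                       int16_t a1, int16_t b1, int32_t c)
{
    int64_t sum = (int64_t)a0 * b0 + (int64_t)a1 * b1 + c;
    if (sum > INT32_MAX) return INT32_MAX;
    if (sum < INT32_MIN) return INT32_MIN;
    return (int32_t)sum;
}
```

The inputs that wrapped to 0xFFFC0002 under modular addition clamp to 0xFFFFFFFF here.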
vec_mtvscr
Vector Move to Vector Status and Control Register
vec_mtvscr(a)
VSCR ← a[96:127]
The VSCR is set by the elements in a which occupy the last 32 bits. The result is void.
Figure 4-80. Vector Move to VSCR
Refer to the description of vec_mfvscr for a detailed description of the VSCR (see Figure 4-1). The valid argument types for vec_mtvscr(a) are shown in Table 4-15. The result type is void.
Table 4-15. vec_mtvscr - Vector Move to Vector Status and Control Register Argument Types

a                      Maps to
vector unsigned char   mtvscr a
vector signed char     mtvscr a
vector bool char       mtvscr a
vector unsigned short  mtvscr a
vector signed short    mtvscr a
vector bool short      mtvscr a
vector pixel           mtvscr a
vector unsigned int    mtvscr a
vector signed int      mtvscr a
vector bool int        mtvscr a
vec_mule
Vector Multiply Even
d = vec_mule(a,b)
n ← number of elements in d
do i=0 to n-1
  d_i ← a_{2i} * b_{2i}
end
Each element of the result is the product of the corresponding high half-width elements of a and b. The odd elements of a and b are ignored. The valid combinations of argument types and the corresponding result types for d = vec_mule(a,b) are shown in Figure 4-81 and Figure 4-82.
d                      a                     b                     maps to
vector unsigned short  vector unsigned char  vector unsigned char  vmuleub d,a,b
vector signed short    vector signed char    vector signed char    vmulesb d,a,b

Figure 4-81. Even Multiply of Eight Integer Elements (8-Bit)
d                    a                      b                      maps to
vector unsigned int  vector unsigned short  vector unsigned short  vmuleuh d,a,b
vector signed int    vector signed short    vector signed short    vmulesh d,a,b

Figure 4-82. Even Multiply of Four Integer Elements (16-Bit)
vec_mulo
Vector Multiply Odd
d = vec_mulo(a,b)
n ← number of elements in d
do i=0 to n-1
  d_i ← a_{2i+1} * b_{2i+1}
end
Each element of the result is the product of the corresponding low half-width elements of a and b. The even elements of a and b are ignored. The valid combinations of argument types and the corresponding result types for d = vec_mulo(a,b) are shown in Figure 4-83 and Figure 4-84.
d                      a                     b                     maps to
vector unsigned short  vector unsigned char  vector unsigned char  vmuloub d,a,b
vector signed short    vector signed char    vector signed char    vmulosb d,a,b

Figure 4-83. Odd Multiply of Eight Integer Elements (8-Bit)
d                    a                      b                      maps to
vector unsigned int  vector unsigned short  vector unsigned short  vmulouh d,a,b
vector signed int    vector signed short    vector signed short    vmulosh d,a,b

Figure 4-84. Odd Multiply of Four Integer Elements (16-Bit)
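Taken together, vec_mule and vec_mulo produce every full-width product of a and b, split across two result vectors. The following portable C sketch models the unsigned 8-bit case of both. The function names mule_u8 and mulo_u8 are hypothetical, and array index 0 is assumed to be element 0.

```c
#include <stdint.h>

/* Scalar model of d = vec_mule(a,b) for unsigned 8-bit inputs:
 * products of the even-numbered elements, widened to 16 bits. */
void mule_u8(const uint8_t a[16], const uint8_t b[16], uint16_t d[8])
{
    for (int i = 0; i < 8; i++) {
        d[i] = (uint16_t)(a[2 * i] * b[2 * i]);
    }
}

/* Scalar model of d = vec_mulo(a,b) for unsigned 8-bit inputs:
 * products of the odd-numbered elements, widened to 16 bits. */
void mulo_u8(const uint8_t a[16], const uint8_t b[16], uint16_t d[8])
{
    for (int i = 0; i < 8; i++) {
        d[i] = (uint16_t)(a[2 * i + 1] * b[2 * i + 1]);
    }
}
```

No product can overflow, since 255 * 255 = 65025 fits in 16 bits.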
vec_nmsub
Vector Negative Multiply Subtract
d = vec_nmsub(a,b,c)
do i=0 to 3
  d_i ← -RndToFPNearest(a_i * b_i - c_i)
end
Each element of the result is the negative of the difference of the corresponding element of c and the product of the corresponding elements of a and b. For vector float argument types, if VSCR[NJ] is set, every denormalized operand element is truncated to a 0 of the same sign before the operation is carried out, and each denormalized result element truncates to a 0 of the same sign. The valid argument types and the corresponding result type for d = vec_nmsub(a,b,c) are shown in Figure 4-85.
d             a             b             c             maps to
vector float  vector float  vector float  vector float  vnmsubfp d,a,b,c

Figure 4-85. Negative Multiply-Subtract of Four Floating-Point Elements (32-Bit)
vec_nor
Vector Logical NOR
d = vec_nor(a,b)
d ← ¬(a | b)
Each bit of the result is the logical NOR of the corresponding bits of a and b. The valid combinations of argument types and the corresponding result types for d = vec_nor(a,b) are shown in Figure 4-86.
d                      a                      b                      maps to
vector unsigned char   vector unsigned char   vector unsigned char   vnor d,a,b
vector signed char     vector signed char     vector signed char     vnor d,a,b
vector bool char       vector bool char       vector bool char       vnor d,a,b
vector unsigned short  vector unsigned short  vector unsigned short  vnor d,a,b
vector signed short    vector signed short    vector signed short    vnor d,a,b
vector bool short      vector bool short      vector bool short      vnor d,a,b
vector unsigned int    vector unsigned int    vector unsigned int    vnor d,a,b
vector signed int      vector signed int      vector signed int      vnor d,a,b
vector bool int        vector bool int        vector bool int        vnor d,a,b
vector float           vector float           vector float           vnor d,a,b

Figure 4-86. Logical Bit-Wise NOR
vec_or
Vector Logical OR
d = vec_or(a,b)
d ← a | b
Each bit of the result is the logical OR of the corresponding bits of a and b. The valid combinations of argument types and the corresponding result types for d = vec_or(a,b) are shown in Figure 4-87.
d                      a                      b                      maps to
vector unsigned char   vector unsigned char   vector unsigned char   vor d,a,b
vector unsigned char   vector unsigned char   vector bool char       vor d,a,b
vector unsigned char   vector bool char       vector unsigned char   vor d,a,b
vector signed char     vector signed char     vector signed char     vor d,a,b
vector signed char     vector signed char     vector bool char       vor d,a,b
vector signed char     vector bool char       vector signed char     vor d,a,b
vector bool char       vector bool char       vector bool char       vor d,a,b
vector unsigned short  vector unsigned short  vector unsigned short  vor d,a,b
vector unsigned short  vector unsigned short  vector bool short      vor d,a,b
vector unsigned short  vector bool short      vector unsigned short  vor d,a,b
vector signed short    vector signed short    vector signed short    vor d,a,b
vector signed short    vector signed short    vector bool short      vor d,a,b
vector signed short    vector bool short      vector signed short    vor d,a,b
vector bool short      vector bool short      vector bool short      vor d,a,b
vector unsigned int    vector unsigned int    vector unsigned int    vor d,a,b
vector unsigned int    vector unsigned int    vector bool int        vor d,a,b
vector unsigned int    vector bool int        vector unsigned int    vor d,a,b
vector signed int      vector signed int      vector signed int      vor d,a,b
vector signed int      vector signed int      vector bool int        vor d,a,b
vector signed int      vector bool int        vector signed int      vor d,a,b
vector bool int        vector bool int        vector bool int        vor d,a,b
vector float           vector bool int        vector float           vor d,a,b
vector float           vector float           vector bool int        vor d,a,b
vector float           vector float           vector float           vor d,a,b

Figure 4-87. Logical Bit-Wise OR
vec_pack
Vector Pack
d = vec_pack(a,b)
n ← number of elements in a
s ← element size in d (64/n)
do i=0 to n-1
  d_i ← UIToUImod(a_i, s)
  d_{i+n} ← UIToUImod(b_i, s)
end
Each high element of the result is the truncation of the corresponding wider element of a. Each low element of the result is the truncation of the corresponding wider element of b. The valid combinations of argument types and the corresponding result types for d = vec_pack(a,b) are shown in Figure 4-88 and Figure 4-89.
d                     a                      b                      maps to
vector unsigned char  vector unsigned short  vector unsigned short  vpkuhum d,a,b
vector signed char    vector signed short    vector signed short    vpkuhum d,a,b
vector bool char      vector bool short      vector bool short      vpkuhum d,a,b

Figure 4-88. Pack Sixteen Unsigned Integer Elements (16-Bit) to Sixteen Unsigned Integer Elements (8-Bit)
d                      a                    b                    maps to
vector unsigned short  vector unsigned int  vector unsigned int  vpkuwum d,a,b
vector signed short    vector signed int    vector signed int    vpkuwum d,a,b
vector bool short      vector bool int      vector bool int      vpkuwum d,a,b

Figure 4-89. Pack Eight Unsigned Integer Elements (32-Bit) to Eight Unsigned Integer Elements (16-Bit)
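The truncating pack can be sketched in portable C for the 16-bit to 8-bit case as follows. This scalar model treats UIToUImod as keeping the low-order bits of each source element. The function name pack_u16 is hypothetical.

```c
#include <stdint.h>

/* Scalar model of d = vec_pack(a,b) for eight 16-bit elements per input.
 * The high half of d comes from a, the low half from b, and each element
 * keeps only its low-order 8 bits. */
void pack_u16(const uint16_t a[8], const uint16_t b[8], uint8_t d[16])
{
    for (int i = 0; i < 8; i++) {
        d[i]     = (uint8_t)(a[i] & 0xFFu);
        d[i + 8] = (uint8_t)(b[i] & 0xFFu);
    }
}
```

Values above 255 are not clamped: in this model 0x0100 packs to 0x00.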
vec_packpx
Vector Pack Pixel
d = vec_packpx(a,b)
do i=0 to 3
  d_i ← a_i[7] || a_i[8:12] || a_i[16:20] || a_i[24:28]
  d_{i+4} ← b_i[7] || b_i[8:12] || b_i[16:20] || b_i[24:28]
end
Each high element of the result is the packed pixel from the corresponding wider element of a. Each low element of the result is the packed pixel from the corresponding wider element of b.
Programming note: Each source word can be considered to be a 32-bit pixel consisting of four 8-bit channels. Each target half-word can be considered to be a 16-bit pixel consisting of one 1-bit channel and three 5-bit channels. A channel can be used to specify the intensity of a particular color, such as red, green, or blue, or to provide other information needed by the application. The usual transformation from a 32-bit pixel to a 16-bit pixel uses the most significant bit of the 8-bit intensity channel. This operation uses the least significant bit. To use the most significant bit, first perform the following operation on each input a and b:
(vector unsigned int) vec_rl((vector unsigned char) a, (vector unsigned char) (1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0))
The valid argument types and the corresponding result type for d = vec_packpx(a,b) are shown in Figure 4-90.
d             a                    b                    maps to
vector pixel  vector unsigned int  vector unsigned int  vpkpx d,a,b

Figure 4-90. Pack Eight Pixel Elements (32-Bit) to Eight Elements (16-Bit)
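The bit selection for one pixel, and the pre-rotation described in the programming note, can be sketched in portable C as follows. The manual numbers bits from the most significant end, so bit 7 is the least significant bit of the first byte. The function names packpx_elem and msb_to_lsb are hypothetical.

```c
#include <stdint.h>

/* One element of vec_packpx: a 32-bit pixel packed to 1+5+5+5 bits. */
uint16_t packpx_elem(uint32_t w)
{
    uint16_t bit = (uint16_t)((w >> 24) & 0x1u);    /* w[7]     */
    uint16_t ch1 = (uint16_t)((w >> 19) & 0x1Fu);   /* w[8:12]  */
    uint16_t ch2 = (uint16_t)((w >> 11) & 0x1Fu);   /* w[16:20] */
    uint16_t ch3 = (uint16_t)((w >> 3) & 0x1Fu);    /* w[24:28] */
    return (uint16_t)((bit << 15) | (ch1 << 10) | (ch2 << 5) | ch3);
}

/* Rotate the first byte of the word left by one bit, as the vec_rl
 * expression in the programming note does for each word. */
uint32_t msb_to_lsb(uint32_t w)
{
    uint32_t first = (w >> 24) & 0xFFu;
    uint32_t rot = ((first << 1) | (first >> 7)) & 0xFFu;
    return (rot << 24) | (w & 0x00FFFFFFu);
}
```

A pixel whose first channel is 0x80 packs with a 1-bit channel of 0, and with a 1-bit channel of 1 after the pre-rotation.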
vec_packs
Vector Pack Saturated
d = vec_packs(a,b)
n ← number of elements in a
do i=0 to n-1
  d_i ← Saturate(a_i)
  d_{i+n} ← Saturate(b_i)
end
Each high element of the result is the saturated value of the corresponding wider element of a. Each low element of the result is the saturated value of the corresponding wider element of b. If saturation occurs, VSCR[SAT] is set (see Table 4-1). The valid combinations of argument types and the corresponding result types for d = vec_packs(a,b) are shown in Figure 4-91 and Figure 4-92.
d                     a                      b                      maps to
vector unsigned char  vector unsigned short  vector unsigned short  vpkuhus d,a,b
vector signed char    vector signed short    vector signed short    vpkshss d,a,b

Figure 4-91. Pack Sixteen Integer Elements (16-Bit) to Sixteen Integer Elements (8-Bit)
d                      a                    b                    maps to
vector unsigned short  vector unsigned int  vector unsigned int  vpkuwus d,a,b
vector signed short    vector signed int    vector signed int    vpkswss d,a,b

Figure 4-92. Pack Eight Integer Elements (32-Bit) to Eight Integer Elements (16-Bit)
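The Saturate step differs by signedness of the result. A minimal portable C sketch of the per-element clamp for the 16-bit to 8-bit case follows. It does not model VSCR[SAT], and the function names packs_u16 and packs_s16 are hypothetical.

```c
#include <stdint.h>

/* Unsigned 16-bit to unsigned 8-bit: clamp to 0..255. */
uint8_t packs_u16(uint16_t x)
{
    return x > 255u ? (uint8_t)255u : (uint8_t)x;
}

/* Signed 16-bit to signed 8-bit: clamp to -128..127. */
int8_t packs_s16(int16_t x)
{
    if (x > 127) return 127;
    if (x < -128) return -128;
    return (int8_t)x;
}
```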
vec_packsu
Vector Pack Saturated Unsigned
d = vec_packsu(a,b)
n ← number of elements in a
do i=0 to n-1
  d_i ← Saturate(a_i)
  d_{i+n} ← Saturate(b_i)
end
Each high element of the result is the saturated value of the corresponding wider element of a. Each low element of the result is the saturated value of the corresponding wider element of b. If saturation occurs, VSCR[SAT] is set (see Table 4-1). The result elements are all unsigned. The valid combinations of argument types and the corresponding result types for d = vec_packsu(a,b) are shown in Figure 4-93 and Figure 4-94.
d                     a                      b                      maps to
vector unsigned char  vector unsigned short  vector unsigned short  vpkuhus d,a,b
vector unsigned char  vector signed short    vector signed short    vpkshus d,a,b

Figure 4-93. Pack Sixteen Integer Elements (16-Bit) to Sixteen Unsigned Integer Elements (8-Bit)
d                      a                    b                    maps to
vector unsigned short  vector unsigned int  vector unsigned int  vpkuwus d,a,b
vector unsigned short  vector signed int    vector signed int    vpkswus d,a,b

Figure 4-94. Pack Eight Integer Elements (32-Bit) to Eight Unsigned Integer Elements (16-Bit)
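When the inputs are signed and the result is unsigned, negative values clamp to 0. A minimal portable C sketch of the signed 16-bit to unsigned 8-bit case follows. It does not model VSCR[SAT], and the function names packsu_s16 and packsu_s16_vec are hypothetical.

```c
#include <stdint.h>

/* Signed 16-bit to unsigned 8-bit: clamp to 0..255. */
uint8_t packsu_s16(int16_t x)
{
    if (x < 0) return 0;
    if (x > 255) return 255;
    return (uint8_t)x;
}

/* Scalar model of d = vec_packsu(a,b) for signed 16-bit inputs. */
void packsu_s16_vec(const int16_t a[8], const int16_t b[8], uint8_t d[16])
{
    for (int i = 0; i < 8; i++) {
        d[i]     = packsu_s16(a[i]);
        d[i + 8] = packsu_s16(b[i]);
    }
}
```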
vec_perm
Vector Permute
d = vec_perm(a,b,c)
do i=0 to 15
  j ← c{i}[4:7]
  if c{i}[3] = 0 then d{i} ← a{j}
  else d{i} ← b{j}
end
Each element of the result is selected independently by indexing the byte elements of a and b by the value of the corresponding element of c. For example, 0x1C in c selects byte 12 in b. The value 0x0C selects byte 12 in a. The valid combinations of argument types and the corresponding result types for d = vec_perm(a,b,c) are shown in Figure 4-95.
Element   0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
c        01 14 18 10 16 15 19 1A 1C 1C 1C 13 08 1D 1B 0E
a        00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
b        10 11 12 13 14 15 16 17 18 19 1A 1B 1C 1D 1E 1F

d                      a                      b                      c                     maps to
vector unsigned char   vector unsigned char   vector unsigned char   vector unsigned char  vperm d,a,b,c
vector signed char     vector signed char     vector signed char     vector unsigned char  vperm d,a,b,c
vector bool char       vector bool char       vector bool char       vector unsigned char  vperm d,a,b,c
vector unsigned short  vector unsigned short  vector unsigned short  vector unsigned char  vperm d,a,b,c
vector signed short    vector signed short    vector signed short    vector unsigned char  vperm d,a,b,c
vector bool short      vector bool short      vector bool short      vector unsigned char  vperm d,a,b,c
vector pixel           vector pixel           vector pixel           vector unsigned char  vperm d,a,b,c
vector unsigned int    vector unsigned int    vector unsigned int    vector unsigned char  vperm d,a,b,c
vector signed int      vector signed int      vector signed int      vector unsigned char  vperm d,a,b,c
vector bool int        vector bool int        vector bool int        vector unsigned char  vperm d,a,b,c
vector float           vector float           vector float           vector unsigned char  vperm d,a,b,c

Figure 4-95. Permute Sixteen Integer Elements (8-Bit)
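The byte selection can be sketched in portable C as follows. In the manual's bit numbering, c{i}[3] is the bit with value 0x10 and c{i}[4:7] is the low nibble, so the three high-order bits of each control byte play no part. The function name perm_model is hypothetical.

```c
#include <stdint.h>

/* Scalar model of d = vec_perm(a,b,c) on sixteen byte elements. */
void perm_model(const uint8_t a[16], const uint8_t b[16],
                const uint8_t c[16], uint8_t d[16])
{
    for (int i = 0; i < 16; i++) {
        uint8_t j = c[i] & 0x0Fu;             /* c{i}[4:7]: byte index */
        d[i] = (c[i] & 0x10u) ? b[j] : a[j];  /* c{i}[3]: choose a or b */
    }
}
```

With a holding bytes 0x00 to 0x0F and b holding 0x10 to 0x1F, as in the figure data, each result byte in this model equals its control byte.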
vec_re
Vector Reciprocal Estimate
d = vec_re(a)
do i=0 to 3
  d_i ← FPRecipEst(a_i)
end
Each element of the result d is an estimate of the reciprocal of the corresponding element of a. For results that are not +0, -0, +∞, -∞, or QNaN, the estimate has a relative error in precision no greater than one part in 4096, that is:

$$\left| \frac{\mathrm{estimate} - 1/x}{1/x} \right| \le \frac{1}{4096}$$
where x is the value of the element in a. Note that the value placed into the element of d may vary between implementations, and between different executions on the same implementation. Operation with various special values of the element in a is summarized below.
Table 4-16. Special Value Results of Reciprocal Estimates

a     d
-∞    -0
-0    -∞
+0    +∞
+∞    +0
NaN   QNaN
If VSCR[NJ] = 1, every denormalized operand element is truncated to a 0 of the same sign before the operation is carried out, and each denormalized result element truncates to a 0 of the same sign. The valid argument type and corresponding result type for d = vec_re(a) are shown in Figure 4-96.
d             a             maps to
vector float  vector float  vrefp d,a

Figure 4-96. Reciprocal Estimate of Four Floating-Point Elements (32-Bit)
vec_rl
Vector Rotate Left
d = vec_rl(a,b)
n ← number of elements
do i=0 to n-1
  d_i ← ROTL(a_i, b_i)
end
Each element of the result is the result of rotating left the corresponding element of a by the number of bits indicated by the corresponding element of b. The valid combinations of argument types and the corresponding result types for d = vec_rl(a,b) are shown in Figure 4-97, Figure 4-98, and Figure 4-99.
d                     a                     b                     maps to
vector unsigned char  vector unsigned char  vector unsigned char  vrlb d,a,b
vector signed char    vector signed char    vector unsigned char  vrlb d,a,b

Figure 4-97. Left Rotate of Sixteen Integer Elements (8-Bit)
d                      a                      b                      maps to
vector unsigned short  vector unsigned short  vector unsigned short  vrlh d,a,b
vector signed short    vector signed short    vector unsigned short  vrlh d,a,b

Figure 4-98. Left Rotate of Eight Integer Elements (16-bit)
d                    a                    b                    maps to
vector unsigned int  vector unsigned int  vector unsigned int  vrlw d,a,b
vector signed int    vector signed int    vector unsigned int  vrlw d,a,b

Figure 4-99. Left Rotate of Four Integer Elements (32-bit)
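A per-element rotate can be sketched in portable C as follows. The sketch assumes that only the low-order 3 bits (8-bit elements) or 5 bits (32-bit elements) of the count are used; the description above does not state this, so treat it as an assumption about the underlying instructions. The function names rotl8 and rotl32 are hypothetical.

```c
#include <stdint.h>

/* Rotate an 8-bit element left by n bits. Only the low three bits of n
 * are used (an assumption, see the lead-in). */
uint8_t rotl8(uint8_t x, uint8_t n)
{
    n &= 7u;
    return (uint8_t)((x << n) | (x >> ((8u - n) & 7u)));
}

/* Rotate a 32-bit element left by n bits. Only the low five bits of n
 * are used (same assumption). The masking keeps every shift count below
 * the operand width, including n = 0. */
uint32_t rotl32(uint32_t x, uint32_t n)
{
    n &= 31u;
    return (x << n) | (x >> ((32u - n) & 31u));
}
```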
vec_round
Vector Round
d = vec_round(a)
do i=0 to 3
  d_i ← RndToFPINear(a_i)
end
Each element of the result is the nearest representable single-precision floating-point integer to the corresponding element of a, using IEEE Round-to-Nearest mode. If the integers are equally near, rounding is to the even integer. The operation is independent of VSCR[NJ]. The valid argument type and corresponding result type for d = vec_round(a) are shown in Figure 4-100.
d             a             maps to
vector float  vector float  vrfin d,a

Figure 4-100. Round to Nearest of Four Floating-Point Integer Elements (32-Bit)
vec_rsqrte
Vector Reciprocal Square Root Estimate
d = vec_rsqrte(a)
do i=0 to 3
  d_i ← RecipSQRTEst(a_i)
end
Each element of the result is an estimate of the reciprocal square root of the corresponding element of a. The single-precision estimate of the reciprocal of the square root of each single-precision element in a is placed into the corresponding word element of d. The estimate has a relative error in precision no greater than one part in 4096, that is:

$$\left| \frac{\mathrm{estimate} - 1/\sqrt{x}}{1/\sqrt{x}} \right| \le \frac{1}{4096}$$
where x is the value of the element in a. The value placed into the element of d may vary between implementations and between different executions on the same implementation. If VSCR[NJ] = 1, every denormalized operand element is truncated to a 0 of the same sign before the operation is carried out, and each denormalized result element truncates to a 0 of the same sign. Operation with various special values of the element in a is summarized below.
Table 4-17. Special Value Results of Reciprocal Square Root Estimates
a              d
-∞             QNaN
less than 0    QNaN
-0             -∞
+0             +∞
+∞             +0
NaN            QNaN
The valid argument type and corresponding result type for d = vec_rsqrte(a) are shown in Figure 4-101.
Elements 0 through 3: each element of a passes through RecipSQRTEst to produce the corresponding element of d.
d vector float
a vector float
maps to vrsqrtefp d,a
Figure 4-101. Reciprocal Square Root Estimate of Four Floating-Point Elements (32-Bit)
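Because the estimate is only accurate to about one part in 4096, callers commonly refine it. The sketch below shows one Newton-Raphson step in scalar C. The refinement step is a standard numerical technique and is an assumption here, not something this manual prescribes; the function name is hypothetical.

```c
/* One Newton-Raphson step toward 1/sqrt(x), starting from an estimate y0.
 * Hypothetical scalar helper; a vector version would apply the same formula per element. */
static float model_rsqrt_refine(float x, float y0)
{
    return y0 * (1.5f - 0.5f * x * y0 * y0);
}
```

For x = 4 and an estimate that is high by exactly one part in 4096, a single step brings the result to within roughly one part in ten million of 0.5, since each step approximately squares the relative error.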
vec_sel
Vector Select
vec_sel
d = vec_sel(a,b,c)
do i=0 to 127
    if c[i]=0 then d[i] ← a[i]
    else d[i] ← b[i]
end
Each bit of the result is the corresponding bit of a if the corresponding bit of c is 0. Otherwise, it is the corresponding bit of b. The valid combinations of argument types and the corresponding result types for d = vec_sel(a,b,c) are shown in Figure 4-102.
Example: with the bit pattern 01001100 in c, d takes its bit from a wherever the bit of c is 0 and from b wherever the bit of c is 1.
d                       a                       b                       c
vector unsigned char    vector unsigned char    vector unsigned char    vector unsigned char
vector unsigned char    vector unsigned char    vector unsigned char    vector bool char
vector signed char      vector signed char      vector signed char      vector unsigned char
vector signed char      vector signed char      vector signed char      vector bool char
vector bool char        vector bool char        vector bool char        vector unsigned char
vector bool char        vector bool char        vector bool char        vector bool char
vector unsigned short   vector unsigned short   vector unsigned short   vector unsigned short
vector unsigned short   vector unsigned short   vector unsigned short   vector bool short
vector signed short     vector signed short     vector signed short     vector unsigned short
vector signed short     vector signed short     vector signed short     vector bool short
vector bool short       vector bool short       vector bool short       vector unsigned short
vector bool short       vector bool short       vector bool short       vector bool short
vector unsigned int     vector unsigned int     vector unsigned int     vector unsigned int
vector unsigned int     vector unsigned int     vector unsigned int     vector bool int
vector signed int       vector signed int       vector signed int       vector unsigned int
vector signed int       vector signed int       vector signed int       vector bool int
vector bool int         vector bool int         vector bool int         vector unsigned int
vector bool int         vector bool int         vector bool int         vector bool int
vector float            vector float            vector float            vector unsigned int
vector float            vector float            vector float            vector bool int
All forms map to vsel d,a,b,c.
Figure 4-102. Bit-Wise Conditional Select of Vector Contents (128-bit)
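The bit-wise select can be sketched on a single 32-bit word in portable C. The function name is hypothetical and the model covers one word of the 128-bit vector, not the intrinsic itself.

```c
#include <stdint.h>

/* Bit-wise select on one 32-bit word: take a where c is 0, b where c is 1. */
static uint32_t model_sel_u32(uint32_t a, uint32_t b, uint32_t c)
{
    return (a & ~c) | (b & c);
}
```

With c produced by a vector compare (all ones or all zeros per element), the same expression acts as an element-wise conditional move.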
vec_sl
Vector Shift Left
vec_sl
d = vec_sl(a,b)
n ← number of elements
s ← 128/n
do i=0 to n-1
    d_i ← ShiftLeft(a_i, mod(b_i,s))
end
Each element in d is the result of shifting the corresponding element of a left by the number of bits specified by the corresponding element of b, taken modulo the element width in bits (s). The valid combinations of argument types and the corresponding result types for d = vec_sl(a,b) are shown in Figure 4-103, Figure 4-104, and Figure 4-105.
Example: b = 4, 2, 7, 0, 4, 2, 2, 3, 6, 6, 5, 6, 3, 4, 4, 6 for elements 0 through 15. Each element of a is shifted left by the corresponding count, and zeros fill the vacated low-order bits of d.
d vector unsigned char vector signed char
a vector unsigned char vector signed char
b vector unsigned char vector unsigned char
maps to vslb d,a,b
Figure 4-103. Shift Bits Left in Sixteen Integer Elements (8-Bit)
Example: b = 15, 6, 14, 8, 10, 4, 2, 12 for elements 0 through 7. Each element of a is shifted left by the corresponding count, and zeros fill the vacated low-order bits of d.
d vector unsigned short vector signed short
a vector unsigned short vector signed short
b vector unsigned short vector unsigned short
maps to vslh d,a,b
Figure 4-104. Shift Bits Left in Eight Integer Elements (16-bit)
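Example: b = 16, 2, 6, 24 for elements 0 through 3. Each element of a is shifted left by the corresponding count, and zeros fill the vacated low-order bits of d.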
d vector unsigned int vector signed int
a vector unsigned int vector signed int
b vector unsigned int vector unsigned int
maps to vslw d,a,b
Figure 4-105. Shift Bits Left in Four Integer Elements (32-Bit)
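The mod(b_i,s) term in the pseudocode means that only the low-order bits of each shift count matter. A scalar sketch for the 8-bit and 16-bit cases follows; the function names are hypothetical and model one element each.

```c
#include <stdint.h>

/* One 8-bit element of vec_sl: the count is taken modulo the element width (8). */
static uint8_t model_sl_u8(uint8_t a, uint8_t b)
{
    return (uint8_t)((unsigned)a << (b % 8u));
}

/* One 16-bit element of vec_sl: the count is taken modulo 16. */
static uint16_t model_sl_u16(uint16_t a, uint16_t b)
{
    return (uint16_t)((unsigned)a << (b % 16u));
}
```

A count of 9 on a byte element therefore behaves like a count of 1.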
vec_sld
Vector Shift Left Double
vec_sld
d = vec_sld(a,b,c)
do i=0 to 15
    if (i+c) < 16 then d{i} ← a{i+c}
    else d{i} ← b{i+c-16}
end
The result is the leftmost 16 bytes obtained by shifting left (unsigned), by c bytes, the 32-byte quantity formed by concatenating a with b. The valid combinations of argument types and the corresponding result types for d = vec_sld(a,b,c) are shown in Figure 4-106.
Example (c = 4): Temp is the 32-byte concatenation a || b, with bytes numbered 0 through 15 in each operand. d receives bytes 4 through 19 of Temp, that is, bytes 4 through 15 of a followed by bytes 0 through 3 of b.
d vector unsigned char vector signed char vector unsigned short vector signed short vector pixel vector unsigned int vector signed int vector float
a vector unsigned char vector signed char vector unsigned short vector signed short vector pixel vector unsigned int vector signed int vector float
b vector unsigned char vector signed char vector unsigned short vector signed short vector pixel vector unsigned int vector signed int vector float
c 4-bit unsigned literal 4-bit unsigned literal 4-bit unsigned literal 4-bit unsigned literal 4-bit unsigned literal 4-bit unsigned literal 4-bit unsigned literal 4-bit unsigned literal
maps to
vsldoi d,a,b,c
Figure 4-106. Bit-Wise Conditional Select of Vector Contents (128-bit)
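The byte selection in the pseudocode can be sketched in portable C over plain byte arrays, with index 0 as the leftmost byte. The function name is hypothetical; the real operation requires c to be a literal, which the model only approximates with a range check.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of vec_sld: d is bytes c..c+15 of the 32-byte value a || b. */
static void model_sld(const uint8_t a[16], const uint8_t b[16],
                      unsigned c, uint8_t d[16])
{
    assert(c < 16);                       /* c is a 4-bit unsigned literal */
    for (unsigned i = 0; i < 16; i++)
        d[i] = (i + c < 16) ? a[i + c] : b[i + c - 16];
}
```

The output buffer d must not alias a or b in this sketch.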
vec_sll
Vector Shift Left Long
vec_sll
d = vec_sll(a,b)
m ← b[125:127]
if each b_i[5:7] = m, where i ranges from 0 to 14
    then d ← ShiftLeft(a,m)
    else d ← Undefined
The result is obtained by shifting a left by the number of bits specified by the last 3 bits of the last element of b. The valid combinations of argument types and the corresponding result types for d = vec_sll(a,b) are shown in Figure 4-107. Note that the three low-order bits of all byte elements in b must be the same; otherwise the value placed into d is undefined.
Example: b[125:127] = 6, so shift = 6. The 128 bits of a are shifted left by 6 bits, and zeros fill the vacated low-order bits of d.
d                       a
vector unsigned char    vector unsigned char
vector signed char      vector signed char
vector bool char        vector bool char
vector unsigned short   vector unsigned short
vector signed short     vector signed short
vector bool short       vector bool short
vector pixel            vector pixel
vector unsigned int     vector unsigned int
vector signed int       vector signed int
vector bool int         vector bool int
In each row, b may be vector unsigned char, vector unsigned short, or vector unsigned int. All forms map to vsl d,a,b.
Figure 4-107. Shift Bits Left in Vector (128-Bit)
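A whole-vector bit shift carries bits across byte boundaries. The sketch below models this over a 16-byte array with byte 0 leftmost. The function name is hypothetical, and the model takes the shift count m directly rather than extracting and validating it from b.

```c
#include <stdint.h>

/* Scalar model of a 128-bit left shift by m bits (0..7), byte 0 leftmost. */
static void model_sll(const uint8_t a[16], unsigned m, uint8_t d[16])
{
    m &= 7u;                                           /* only 3 bits are significant */
    for (int i = 0; i < 16; i++) {
        unsigned hi = (unsigned)a[i] << m;             /* bits that stay in this byte */
        unsigned lo = (i < 15) ? ((unsigned)a[i + 1] >> (8u - m)) : 0u;  /* carried in */
        d[i] = (uint8_t)(hi | lo);
    }
}
```

Shifts of 8 bits or more are usually built by combining a byte shift (vec_slo) with this bit shift.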
vec_slo
Vector Shift Left by Octet
vec_slo
d = vec_slo(a,b)
m ← b_15[1:4]
do i=0 to 15
    j ← i+m
    if j < 16 then d{i} ← a{j}
    else d{i} ← 0
end
The contents of a are shifted left by the number of bytes specified by bits b_15[1:4]; only these 4 bits in b are significant for the shift value. Bytes shifted out of byte 0 are lost. Zeros are supplied to the vacated bytes on the right. The result is placed into d. The valid combinations of argument types and the corresponding result types for d = vec_slo(a,b) are shown in Figure 4-108.
Example: b_15[1:4] = 4, so shift = 4. Bytes 4 through 15 of a move to bytes 0 through 11 of d, and bytes 12 through 15 of d are 0.
d                       a
vector unsigned char    vector unsigned char
vector signed char      vector signed char
vector unsigned short   vector unsigned short
vector signed short     vector signed short
vector pixel            vector pixel
vector unsigned int     vector unsigned int
vector signed int       vector signed int
vector float            vector float
In each row, b may be vector unsigned char or vector signed char. All forms map to vslo d,a,b.
Figure 4-108. Left Byte Shift of Vector (128-Bit)
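The byte count comes from bits 1 through 4 of the last byte of b, using the manual's convention that bit 0 is the most significant bit. A scalar sketch follows; the function name is hypothetical and only the last byte of b is passed in.

```c
#include <stdint.h>

/* Scalar model of vec_slo: shift a left by m bytes, where m = bits 1:4 of byte 15 of b. */
static void model_slo(const uint8_t a[16], uint8_t b15, uint8_t d[16])
{
    unsigned m = (b15 >> 3) & 0xFu;      /* bit 0 is the MSB, so bits 1:4 sit at >>3 */
    for (unsigned i = 0; i < 16; i++)
        d[i] = (i + m < 16) ? a[i + m] : 0;
}
```

Because the low-order 3 bits of the byte are ignored here and are the only bits used by vec_sll, one shift-count vector can drive both operations.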
vec_splat
Vector Splat
vec_splat
d = vec_splat(a,b)
n ← number of elements
do i=0 to n-1
    j ← mod(b,n)
    d_i ← a_j
end
Each element of the result is component b of a. The valid combinations of argument types and the corresponding result types for d = vec_splat(a,b) are shown in Figure 4-109, Figure 4-110, and Figure 4-111.
Example: b = 7. Element 7 of a is copied into all sixteen elements of d.
d vector unsigned char vector signed char vector bool char
a vector unsigned char vector signed char vector bool char
b 5-bit unsigned literal 5-bit unsigned literal 5-bit unsigned literal
maps to vspltb d,a,b
Figure 4-109. Copy Contents to Sixteen Integer Elements (8-Bit)
Example: b = 1. Element 1 of a is copied into all eight elements of d.
d vector unsigned short vector signed short vector bool short vector pixel
a vector unsigned short vector signed short vector bool short vector pixel
b 5-bit unsigned literal 5-bit unsigned literal 5-bit unsigned literal 5-bit unsigned literal
maps to
vsplth d,a,b
Figure 4-110. Copy Contents to Eight Elements (16-bit)
Example: b = 2. Element 2 of a is copied into all four elements of d.
d vector unsigned int vector signed int vector bool int vector float
a vector unsigned int vector signed int vector bool int vector float
b 5-bit unsigned literal 5-bit unsigned literal 5-bit unsigned literal 5-bit unsigned literal
maps to
vspltw d,a,b
Figure 4-111. Copy Contents to Four Integer Elements (32-Bit)
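The splat pseudocode can be sketched for the eight-element case in portable C. The function name is hypothetical; the index is reduced modulo the element count as in the pseudocode.

```c
#include <stdint.h>

/* Scalar model of vec_splat on eight 16-bit elements: copy element mod(b,8) everywhere. */
static void model_splat_u16(const uint16_t a[8], unsigned b, uint16_t d[8])
{
    uint16_t v = a[b % 8u];
    for (int i = 0; i < 8; i++)
        d[i] = v;
}
```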
vec_splat_s8
Vector Splat Signed Byte
vec_splat_s8
d = vec_splat_s8(a)
do i=0 to 15
    d_i ← SignExtend(a)
end
Each element of the result is the value obtained by sign-extending a. This permits values ranging from -16 to 15 only. The valid argument type and corresponding result type for d = vec_splat_s8(a) are shown in Figure 4-112.
d vector signed char
a 5-bit signed literal
maps to vspltisb d,a
Figure 4-112. Copy Value into Sixteen Signed Integer Elements (8-Bit)
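The range of -16 to 15 follows from sign-extending a 5-bit field. The sketch below shows the extension on a raw 5-bit encoding; the function name is hypothetical, and the raw-field view is an illustration rather than how the literal is written in source code.

```c
#include <stdint.h>

/* Sign-extend a 5-bit field (0..31) to a signed byte in the range -16..15. */
static int8_t model_sign_extend5(unsigned field)
{
    int v = (int)(field & 0x1Fu);
    return (int8_t)((v & 0x10) ? v - 32 : v);   /* bit 4 set means the value is negative */
}
```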
vec_splat_s16
Vector Splat Signed Half-Word
vec_splat_s16
d = vec_splat_s16(a)
do i=0 to 7
    d_i ← SignExtend(a)
end
Each element of the result is the value obtained by sign-extending a. This permits values ranging from -16 to 15 only. The valid argument type and corresponding result type for d = vec_splat_s16(a) are shown in Figure 4-113.
d vector signed short
a 5-bit signed literal
maps to vspltish d,a
Figure 4-113. Copy Value into Eight Signed Integer Elements (16-Bit)
vec_splat_s32
Vector Splat Signed Word
vec_splat_s32
d = vec_splat_s32(a)
do i=0 to 3
    d_i ← SignExtend(a)
end
Each element of the result is the value obtained by sign-extending a. This permits values ranging from -16 to 15 only. The valid argument type and corresponding result type for d = vec_splat_s32(a) are shown in Figure 4-114.
d vector signed int
a 5-bit signed literal
maps to vspltisw d,a
Figure 4-114. Copy Value into Four Signed Integer Elements (32-Bit)
vec_splat_u8
Vector Splat Unsigned Byte
vec_splat_u8
d = vec_splat_u8(a)
do i=0 to 15
    d_i ← SignExtend(a)
end
Each element of the result is the value obtained by sign-extending a and casting it to an unsigned char value. Each element of d is set to 256*sign(a) + a, where sign(a) is 0 for non-negative a and 1 for negative a. The valid argument type and corresponding result type for d = vec_splat_u8(a) are shown in Figure 4-115. It is necessary to use the generic name, since the specific operation vec_vspltisb returns a vector signed char value.
d vector unsigned char
a 5-bit signed literal
maps to vspltisb d,a
Figure 4-115. Copy Value into Sixteen Signed Integer Elements (8-Bit)
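The formula 256*sign(a) + a is the ordinary C conversion of a small signed value to unsigned char. A scalar sketch, with a hypothetical function name, makes the mapping concrete.

```c
#include <stdint.h>

/* Element value produced by vec_splat_u8 for a literal a in -16..15. */
static uint8_t model_splat_u8_value(int a)
{
    int sign = (a < 0) ? 1 : 0;
    return (uint8_t)(256 * sign + a);    /* same result as the cast (uint8_t)a */
}
```

Negative literals therefore give values 240 through 255, which is a compact way to build masks such as all ones.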
vec_splat_u16
Vector Splat Unsigned Half-Word
vec_splat_u16
d = vec_splat_u16(a)
do i=0 to 7
    d_i ← SignExtend(a)
end
Each element of the result is the value obtained by sign-extending a and casting it to an unsigned short value. Each element of d is set to 65536*sign(a) + a, where sign(a) is 0 for non-negative a and 1 for negative a. The valid argument type and corresponding result type for d = vec_splat_u16(a) are shown in Figure 4-116. It is necessary to use the generic name, since the specific operation vec_vspltish returns a vector signed short value.
d vector unsigned short
a 5-bit signed literal
maps to vspltish d,a
Figure 4-116. Copy Value into Eight Signed Integer Elements (16-Bit)
vec_splat_u32
Vector Splat Unsigned Word
vec_splat_u32
d = vec_splat_u32(a)
do i=0 to 3
    d_i ← SignExtend(a)
end
Each element of the result is the value obtained by sign-extending a and casting it to an unsigned int value. Each element of d is set to 4294967296*sign(a) + a, where sign(a) is 0 for non-negative a and 1 for negative a. The valid argument type and corresponding result type for d = vec_splat_u32(a) are shown in Figure 4-117. It is necessary to use the generic name, since the specific operation vec_vspltisw returns a vector signed int value.
d vector unsigned int
a 5-bit signed literal
maps to vspltisw d,a
Figure 4-117. Copy Value into Four Signed Integer Elements (32-Bit)
vec_sr
Vector Shift Right
vec_sr
d = vec_sr(a,b)
n ← number of elements
s ← 128/n
do i=0 to n-1
    d_i ← ShiftRight(a_i, mod(b_i,s))
end
Each element of the result is the result of shifting the corresponding element of a right by the number of bits of the corresponding element of b. Zero bits are shifted in from the left for both signed and unsigned argument types. The valid combinations of argument types and the corresponding result types for d = vec_sr(a,b) are shown in Figure 4-118, Figure 4-119, and Figure 4-120.
Example: b = 4, 2, 7, 0, 4, 2, 2, 3, 6, 6, 5, 6, 3, 4, 4, 6 for elements 0 through 15. Each element of a is shifted right by the corresponding count, and zeros fill the vacated high-order bits of d.
d vector unsigned char vector signed char
a vector unsigned char vector signed char
b vector unsigned char vector unsigned char
maps to vsrb d,a,b
Figure 4-118. Shift Bits Right in Sixteen Integer Elements (8-Bit)
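Example: b = 15, 6, 14, 8, 10, 4, 2, 12 for elements 0 through 7. Each element of a is shifted right by the corresponding count, and zeros fill the vacated high-order bits of d.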
d vector unsigned short vector signed short
a vector unsigned short vector signed short
b vector unsigned short vector unsigned short
maps to vsrh d,a,b
Figure 4-119. Shift Bits Right in Eight Integer Elements (16-bit)
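Example: b = 16, 2, 6, 24 for elements 0 through 3. Each element of a is shifted right by the corresponding count, and zeros fill the vacated high-order bits of d.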
d vector unsigned int vector signed int
a vector unsigned int vector signed int
b vector unsigned int vector unsigned int
maps to vsrw d,a,b
Figure 4-120. Shift Bits Right in Four Integer Elements (32-Bit)
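A scalar sketch of one 8-bit element shows the logical (zero-filling) behavior, which applies even when the element type is signed. The function name is hypothetical.

```c
#include <stdint.h>

/* One 8-bit element of vec_sr: logical right shift, count taken modulo 8. */
static uint8_t model_sr_u8(uint8_t a, uint8_t b)
{
    return (uint8_t)(a >> (b % 8u));     /* zeros enter from the left */
}
```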
vec_sra
Vector Shift Right Algebraic
vec_sra
d = vec_sra(a,b)
n ← number of elements
s ← 128/n
do i=0 to n-1
    d_i ← ShiftRightA(a_i, mod(b_i,s))
end
Each element of the result is the result of shifting the corresponding element of a right by the number of bits of the corresponding element of b. Copies of the sign bit are shifted in from the left for both signed and unsigned argument types. The valid combinations of argument types and the corresponding result types for d = vec_sra(a,b) are shown in Figure 4-121, Figure 4-122, and Figure 4-123.
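Example: b = 4, 2, 7, 0, 4, 2, 2, 3, 6, 6, 5, 6, 3, 4, 4, 6 for elements 0 through 15. Each element of a is shifted right by the corresponding count, and the vacated high-order bits of d are filled with copies of bit x, where bit x is bit 0 (the sign bit) of each element.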
d vector unsigned char vector signed char
a vector unsigned char vector signed char
b vector unsigned char vector unsigned char
maps to vsrab d,a,b
Figure 4-121. Shift Bits Right in Sixteen Integer Elements (8-Bit)
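Example: b = 15, 6, 14, 8, 10, 4, 2, 12 for elements 0 through 7. Each element of a is shifted right by the corresponding count, and the vacated high-order bits of d are filled with copies of bit x, where bit x is bit 0 (the sign bit) of each element.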
d vector unsigned short vector signed short
a vector unsigned short vector signed short
b vector unsigned short vector unsigned short
maps to vsrah d,a,b
Figure 4-122. Shift Bits Right in Eight Integer Elements (16-bit)
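Example: b = 16, 2, 6, 24 for elements 0 through 3. Each element of a is shifted right by the corresponding count, and the vacated high-order bits of d are filled with copies of bit x, where bit x is bit 0 (the sign bit) of each element.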
d vector unsigned int vector signed int
a vector unsigned int vector signed int
b vector unsigned int vector unsigned int
maps to vsraw d,a,b
Figure 4-123. Shift Bits Right in Four Integer Elements (32-Bit)
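The algebraic shift differs from vec_sr only in what enters from the left. The sketch below models one 8-bit element without relying on the implementation-defined behavior of >> on negative values in C. The function name is hypothetical.

```c
#include <stdint.h>

/* One 8-bit element of vec_sra: right shift that replicates the sign bit. */
static uint8_t model_sra_u8(uint8_t a, uint8_t b)
{
    unsigned sh = b % 8u;
    unsigned r = (unsigned)a >> sh;
    if (a & 0x80u)
        r |= 0xFFu << (8u - sh);         /* fill the vacated high-order bits with ones */
    return (uint8_t)r;
}
```

The bit pattern 0x80 shifted by 1 gives 0xC0 here, where the logical shift would give 0x40.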
vec_srl
Vector Shift Right Long
vec_srl
d = vec_srl(a,b)
m ← b[125:127]
if each b_i[5:7] = m, where i ranges from 0 to 14
    then d ← ShiftRight(a,m)
    else d ← Undefined
The result is obtained by shifting a right by the number of bits specified by the last 3 bits of the last element of b. The valid combinations of argument types and the corresponding result types for d = vec_srl(a,b) are shown in Figure 4-124. Note that the low-order 3 bits of all byte elements in b must be the same; otherwise the value placed into d is undefined.
Example: b[125:127] = 6, so shift = 6. The 128 bits of a are shifted right by 6 bits, and zeros fill the vacated high-order bits of d.
d                       a
vector unsigned char    vector unsigned char
vector signed char      vector signed char
vector bool char        vector bool char
vector unsigned short   vector unsigned short
vector signed short     vector signed short
vector bool short       vector bool short
vector pixel            vector pixel
vector unsigned int     vector unsigned int
vector signed int       vector signed int
vector bool int         vector bool int
In each row, b may be vector unsigned char, vector unsigned short, or vector unsigned int. All forms map to vsr d,a,b.
Figure 4-124. Shift Bits Right in Vector (128-Bit)
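The requirement that every byte of b carry the same low-order 3 bits is easy to violate by accident, so a debug-time check can be useful. The sketch below is a hypothetical helper over a plain byte array, not part of the programming interface.

```c
#include <stdint.h>

/* Returns the shift count (0..7) if all 16 bytes of b agree in their low-order
 * 3 bits, or -1 if they differ (the case in which the result is undefined). */
static int model_long_shift_count(const uint8_t b[16])
{
    int m = b[15] & 7;                   /* b[125:127]: low 3 bits of the last byte */
    for (int i = 0; i < 15; i++)
        if ((b[i] & 7) != m)
            return -1;
    return m;
}
```

Splatting the count into every byte of b is one way to satisfy the condition.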
vec_sro
Vector Shift Right by Octet
vec_sro
d = vec_sro(a,b)
m ← b[121:124]
do i=0 to 15
    j ← i-m
    if j ≥ 0 then d{i} ← a{j}
    else d{i} ← 0
end
The result is obtained by shifting (unsigned) a right by the number of bytes specified by bits b[121:124], that is, bits 1 through 4 of the last byte element of b; only these 4 bits are significant for the shift value. Zeros are supplied to the vacated bytes on the left. The valid combinations of argument types and the corresponding result types for d = vec_sro(a,b) are shown in Figure 4-125.
Example: b[121:124] = 5, so shift = 5. Bytes 0 through 10 of a move to bytes 5 through 15 of d, and bytes 0 through 4 of d are 0.
d                       a
vector unsigned char    vector unsigned char
vector signed char      vector signed char
vector unsigned short   vector unsigned short
vector signed short     vector signed short
vector pixel            vector pixel
vector unsigned int     vector unsigned int
vector signed int       vector signed int
vector float            vector float
In each row, b may be vector unsigned char or vector signed char. All forms map to vsro d,a,b.
Figure 4-125. Right Byte Shift of Vector (128-Bit)
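The right byte shift mirrors vec_slo. A scalar sketch over byte arrays follows, with byte 0 leftmost. The function name is hypothetical and only the last byte of b is passed in.

```c
#include <stdint.h>

/* Scalar model of vec_sro: shift a right by m bytes, where m = bits 1:4 of byte 15 of b. */
static void model_sro(const uint8_t a[16], uint8_t b15, uint8_t d[16])
{
    int m = (b15 >> 3) & 0xF;
    for (int i = 0; i < 16; i++) {
        int j = i - m;
        d[i] = (j >= 0) ? a[j] : 0;      /* vacated bytes on the left become 0 */
    }
}
```

The output buffer d must not alias a in this sketch, because later iterations read bytes that earlier ones would overwrite.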
vec_st
Vector Store Indexed
vec_st
vec_st(a,b,c)
EA ← BoundAlign((b + c), 16)
MEM(EA,16) ← a
Each operation performs a 16-byte store of the value of a at a 16-byte aligned address. The b is taken to be an integer value, while c is a pointer. BoundAlign(b+c,16) is the largest value less than or equal to b+c that is a multiple of 16. This is not, by itself, an acceptable way to store aligned data to unaligned addresses. This store is the one that is generated for a storing dereference of a pointer to a vector type. Plain char * is excluded in the mapping for c. The valid combinations of argument types for vec_st(a,b,c) are shown in Table 4-18. The result type is void.
In the figure, b and c are added, BoundAlign(b+c,16) forms the effective address (EA), and a is stored to MEM(EA,16) through the memory interface.
Figure 4-126. Vector Store Indexed
Table 4-18. vec_st - Vector Store Indexed Argument Types
a                       c
vector unsigned char    vector unsigned char *
vector unsigned char    unsigned char *
vector signed char      vector signed char *
vector signed char      signed char *
vector bool char        vector bool char *
vector bool char        unsigned char *
vector bool char        signed char *
vector unsigned short   vector unsigned short *
vector unsigned short   unsigned short *
vector signed short     vector signed short *
vector signed short     short *
vector bool short       vector bool short *
vector bool short       unsigned short *
vector bool short       short *
vector pixel            vector pixel *
vector pixel            unsigned short *
vector pixel            short *
vector unsigned int     vector unsigned int *
vector unsigned int     unsigned int *
vector signed int       vector signed int *
vector signed int       int *
vector bool int         vector bool int *
vector bool int         unsigned int *
vector bool int         int *
vector float            vector float *
vector float            float *
In every row, b is any integral type. All forms map to stvx a,b,c.
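BoundAlign can be sketched as a mask operation on the address. The function name is hypothetical, and the sketch assumes the alignment n is a power of two, as it is for the 16-byte case used by vec_st.

```c
#include <stdint.h>

/* Largest multiple of n that is less than or equal to addr (n must be a power of two). */
static uintptr_t model_bound_align(uintptr_t addr, uintptr_t n)
{
    return addr & ~(n - 1u);
}
```

An address of 0x1237 is therefore stored at 0x1230. The low-order 4 bits are silently dropped, which is why the operation alone cannot place data at an unaligned address.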
vec_ste
Vector Store Element Indexed
vec_ste
vec_ste(a,b,c)
s ← 16/(number of elements)
EA ← BoundAlign(b + c, s)
i ← mod(EA,16)/s
MEM(EA,s) ← a_i
A single element of a is stored at the effective address. BoundAlign(b+c,s) is the largest value less than or equal to b+c that is a multiple of s, where s is 1 for char pointers, 2 for short pointers, and 4 for int or float pointers. The element stored is the one whose position in the register matches the position of the adjusted address relative to 16-byte alignment (A16). If you do not know the alignment of the sum of b and c, you will not know which element is stored. Plain char * is excluded in the mapping for c. The valid combinations of argument types for vec_ste(a,b,c) are shown in Figure 4-127. The result type is void.
In the figure, b and c are added, BoundAlign(b+c,1) forms the effective address (EA), and element a_i is stored to MEM(EA,s) through the memory interface. The example shows a byte-sized element.
a                       c                   maps to
vector unsigned char    unsigned char *     stvebx a,b,c
vector signed char      signed char *       stvebx a,b,c
vector bool char        unsigned char *     stvebx a,b,c
vector bool char        signed char *       stvebx a,b,c
vector unsigned short   unsigned short *    stvehx a,b,c
vector signed short     short *             stvehx a,b,c
vector bool short       unsigned short *    stvehx a,b,c
vector bool short       short *             stvehx a,b,c
vector pixel            unsigned short *    stvehx a,b,c
vector pixel            short *             stvehx a,b,c
vector unsigned int     unsigned int *      stvewx a,b,c
vector signed int       int *               stvewx a,b,c
vector bool int         unsigned int *      stvewx a,b,c
vector bool int         int *               stvewx a,b,c
vector float            float *             stvewx a,b,c
In every row, b is any integral type.
Figure 4-127. Vector Store Element
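Which element gets stored depends only on the address. The sketch below computes the element index from the pseudocode; the function name is hypothetical, and s is the element size in bytes (1, 2, or 4).

```c
#include <stdint.h>

/* Index of the element that vec_ste stores for address addr and element size s. */
static unsigned model_ste_element(uintptr_t addr, unsigned s)
{
    uintptr_t ea = addr & ~(uintptr_t)(s - 1u);   /* BoundAlign(b + c, s) */
    return (unsigned)((ea % 16u) / s);            /* i = mod(EA,16)/s */
}
```

For an address ending in 0xB, a word store writes element 2, a halfword store writes element 5, and a byte store writes element 11. A common pattern is to splat the wanted value to every element first so that the choice of element no longer matters.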
vec_stl
Vector Store Indexed LRU
vec_stl
vec_stl(a,b,c)
EA ← BoundAlign(b + c, 16)
MEM(EA,16) ← a
Each operation performs a 16-byte store of the value of a at a 16-byte aligned address. The b is taken to be an integer value, while c is a pointer. BoundAlign(b+c,16) is the largest value less than or equal to b+c that is a multiple of 16. This is not, by itself, an acceptable way to store aligned data to unaligned addresses. The cache line stored into is marked Least Recently Used (LRU). Plain char * is excluded in the mapping for c. The valid combinations of argument types for vec_stl(a,b,c) are shown in Table 4-19. The result type is void.
In the figure, b and c are added, BoundAlign(b+c,16) forms the effective address (EA), and a is stored to MEM(EA,16) through the memory interface.
Figure 4-128. Vector Store Indexed LRU
Table 4-19. vec_stl - Vector Store Index Argument Types
a                       c
vector unsigned char    vector unsigned char *
vector unsigned char    unsigned char *
vector signed char      vector signed char *
vector signed char      signed char *
vector bool char        vector bool char *
vector bool char        unsigned char *
vector bool char        signed char *
vector unsigned short   vector unsigned short *
vector unsigned short   unsigned short *
vector signed short     vector signed short *
vector signed short     short *
vector bool short       vector bool short *
vector bool short       unsigned short *
vector bool short       short *
vector pixel            vector pixel *
vector pixel            unsigned short *
vector pixel            short *
vector unsigned int     vector unsigned int *
vector unsigned int     unsigned int *
vector signed int       vector signed int *
vector signed int       int *
vector bool int         vector bool int *
vector bool int         unsigned int *
vector bool int         int *
vector float            vector float *
vector float            float *
In every row, b is any integral type. All forms map to stvxl a,b,c.
vec_sub
Vector Subtract
vec_sub
d = vec_sub(a,b)
Integer Subtract:
n ← number of elements
do i=0 to n-1
    d_i ← a_i - b_i
end
Floating-Point Subtract:
do i=0 to 3
    d_i ← a_i -fp b_i
end
Each element of the result is the difference between the corresponding elements of a and b. The arithmetic is modular for integer types. For vector float argument types, if VSCR[NJ] = 1, every denormalized vector float operand element is truncated to a 0 of the same sign before the operation is carried out, and each denormalized vector float result element truncates to a 0 of the same sign. The valid combinations of argument types and the corresponding result types for d = vec_sub(a,b) are shown in Figure 4-129, Figure 4-130, Figure 4-131, and Figure 4-132.
d                       a                       b
vector unsigned char    vector unsigned char    vector unsigned char
vector unsigned char    vector unsigned char    vector bool char
vector unsigned char    vector bool char        vector unsigned char
vector signed char      vector signed char      vector signed char
vector signed char      vector signed char      vector bool char
vector signed char      vector bool char        vector signed char
All forms map to vsububm d,a,b.
Figure 4-129. Subtract Sixteen Integer Elements (8-bit)
d                       a                       b
vector unsigned short   vector unsigned short   vector unsigned short
vector unsigned short   vector unsigned short   vector bool short
vector unsigned short   vector bool short       vector unsigned short
vector signed short     vector signed short     vector signed short
vector signed short     vector signed short     vector bool short
vector signed short     vector bool short       vector signed short
All forms map to vsubuhm d,a,b.
Figure 4-130. Subtract Eight Integer Elements (16-bit)
d                       a                       b
vector unsigned int     vector unsigned int     vector unsigned int
vector unsigned int     vector unsigned int     vector bool int
vector unsigned int     vector bool int         vector unsigned int
vector signed int       vector signed int       vector signed int
vector signed int       vector signed int       vector bool int
vector signed int       vector bool int         vector signed int
All forms map to vsubuwm d,a,b.
Figure 4-131. Subtract Four Integer Elements (32-bit)
d vector float
a vector float
b vector float
maps to vsubfp d,a,b
Figure 4-132. Subtract Four Floating-Point Elements (32-bit)
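Modular arithmetic means that an integer difference wraps around instead of clamping. A scalar sketch of one 8-bit element follows; the function name is hypothetical.

```c
#include <stdint.h>

/* One 8-bit element of vec_sub: the difference is reduced modulo 256. */
static uint8_t model_sub_u8(uint8_t a, uint8_t b)
{
    return (uint8_t)(a - b);
}
```

Subtracting 5 from 3 gives 254 rather than 0 or a negative value. The signed forms use the same instruction because the wrapped bit pattern is identical in two's complement.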
vec_subc
Vector Subtract Carryout
vec_subc
d = vec_subc(a,b)
do i=0 to 3
    d_i ← BorrowOut(a_i - b_i)
end
Each element of b is subtracted from the corresponding element in a. The borrow from each difference is complemented and zero-extended and placed into the corresponding element of d. BorrowOut(a - b) is 0 if a borrow occurred and 1 if no borrow occurred. The valid combination of argument types and the corresponding result type for d = vec_subc(a,b) are shown in Figure 4-133.
In the figure, each difference is formed in a 33-bit temporary per element, and the complemented borrow is written to the corresponding element of d.
d vector unsigned int
a vector unsigned int
b vector unsigned int
maps to vsubcuw d,a,b
Figure 4-133. Carryout of Four Unsigned Integer Subtracts (32-bit)
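The borrow rule above can be sketched as a scalar model in portable C. This is an illustration only: the function name model_vec_subc and the use of plain arrays in place of vector registers are assumptions made for the example, not part of the AltiVec interface.

```c
#include <stdint.h>

/* Scalar model of d = vec_subc(a,b) for four unsigned 32-bit elements.
   Hypothetical helper; arrays stand in for vector registers. */
void model_vec_subc(const uint32_t a[4], const uint32_t b[4], uint32_t d[4])
{
    for (int i = 0; i < 4; i++) {
        /* A borrow occurs exactly when a[i] < b[i]; d holds the complement. */
        d[i] = (a[i] >= b[i]) ? 1u : 0u;
    }
}
```

An element pair such as a = 0, b = 1 therefore yields 0 (borrow), while equal elements yield 1 (no borrow).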
vec_subs
Vector Subtract Saturated
d = vec_subs(a,b)
n ← number of elements
do i=0 to n-1
    d_i ← Saturate(a_i - b_i)
end
Each element of the result is the saturated difference between the corresponding elements of a and b. If saturation occurs, VSCR[SAT] is set (see Table 4-1). The valid combinations of argument types and the corresponding result types for d = vec_subs(a,b) are shown in Figure 4-134, Figure 4-135, and Figure 4-136.
d                     a                     b                     maps to
vector unsigned char  vector unsigned char  vector unsigned char  vsububs d,a,b
vector unsigned char  vector unsigned char  vector bool char      vsububs d,a,b
vector unsigned char  vector bool char      vector unsigned char  vsububs d,a,b
vector signed char    vector signed char    vector signed char    vsubsbs d,a,b
vector signed char    vector signed char    vector bool char      vsubsbs d,a,b
vector signed char    vector bool char      vector signed char    vsubsbs d,a,b
Figure 4-134. Subtract Saturating Sixteen Integer Elements (8-bit)
d                      a                      b                      maps to
vector unsigned short  vector unsigned short  vector unsigned short  vsubuhs d,a,b
vector unsigned short  vector unsigned short  vector bool short      vsubuhs d,a,b
vector unsigned short  vector bool short      vector unsigned short  vsubuhs d,a,b
vector signed short    vector signed short    vector signed short    vsubshs d,a,b
vector signed short    vector signed short    vector bool short      vsubshs d,a,b
vector signed short    vector bool short      vector signed short    vsubshs d,a,b
Figure 4-135. Subtract Saturating Eight Integer Elements (16-bit)
d                    a                    b                    maps to
vector unsigned int  vector unsigned int  vector unsigned int  vsubuws d,a,b
vector unsigned int  vector unsigned int  vector bool int      vsubuws d,a,b
vector unsigned int  vector bool int      vector unsigned int  vsubuws d,a,b
vector signed int    vector signed int    vector signed int    vsubsws d,a,b
vector signed int    vector signed int    vector bool int      vsubsws d,a,b
vector signed int    vector bool int      vector signed int    vsubsws d,a,b
Figure 4-136. Subtract Saturating Four Integer Elements (32-bit)
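A scalar sketch of the Saturate step for the 8-bit case may help when checking results by hand. The function names and the int flag standing in for VSCR[SAT] are assumptions made for this example; the flag is only ever set, never cleared, which mirrors a sticky status bit.

```c
#include <stdint.h>

/* One element of vec_subs for unsigned char: clamp at 0 instead of wrapping. */
uint8_t model_subs_u8(uint8_t a, uint8_t b, int *sat)
{
    if (a < b) {
        *sat = 1;              /* stands in for VSCR[SAT] */
        return 0;
    }
    return (uint8_t)(a - b);
}

/* One element of vec_subs for signed char: clamp to [-128, 127]. */
int8_t model_subs_s8(int8_t a, int8_t b, int *sat)
{
    int diff = (int)a - (int)b;   /* computed in a wider type, so it cannot overflow */
    if (diff > INT8_MAX) { *sat = 1; return INT8_MAX; }
    if (diff < INT8_MIN) { *sat = 1; return INT8_MIN; }
    return (int8_t)diff;
}
```

The 16-bit and 32-bit forms follow the same pattern with wider limits.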
vec_sum4s
Vector Sum Across Partial (1/4) Saturated
d = vec_sum4s(a,b)
For a with 8-bit elements:
do i=0 to 3
    d_i ← Saturate(a_(4i) + a_(4i+1) + a_(4i+2) + a_(4i+3) + b_i)
end
For a with 16-bit elements:
do i=0 to 3
    d_i ← Saturate(a_(2i) + a_(2i+1) + b_i)
end
Each element of the result is the 32-bit saturated sum of the corresponding element in b and all elements in a with positions overlapping those of that element. If saturation occurs, VSCR[SAT] is set (see Table 4-1). The valid combinations of argument types and the corresponding result types for d = vec_sum4s(a,b) are shown in Figure 4-137 and Figure 4-138.
d                    a                     b                    maps to
vector unsigned int  vector unsigned char  vector unsigned int  vsum4ubs d,a,b
vector signed int    vector signed char    vector signed int    vsum4sbs d,a,b
Figure 4-137. Four Sums in the Integer Elements (32-Bit)
d                  a                    b                  maps to
vector signed int  vector signed short  vector signed int  vsum4shs d,a,b
Figure 4-138. Four Sums in the Integer Elements (32-Bit)
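The 8-bit signed case can be modeled in scalar C as follows. This is a sketch under stated assumptions: the names clamp_s32 and model_sum4s_s8 are invented for the example, arrays stand in for vector registers, and an int flag stands in for VSCR[SAT].

```c
#include <stdint.h>

/* Clamp a wide sum to the signed 32-bit range; *sat models the sticky SAT bit. */
static int32_t clamp_s32(int64_t x, int *sat)
{
    if (x > INT32_MAX) { *sat = 1; return INT32_MAX; }
    if (x < INT32_MIN) { *sat = 1; return INT32_MIN; }
    return (int32_t)x;
}

/* Scalar model of d = vec_sum4s(a,b) with a of type vector signed char. */
void model_sum4s_s8(const int8_t a[16], const int32_t b[4], int32_t d[4], int *sat)
{
    for (int i = 0; i < 4; i++) {
        int64_t sum = b[i];
        for (int j = 0; j < 4; j++)
            sum += a[4 * i + j];      /* the four bytes that overlap word i */
        d[i] = clamp_s32(sum, sat);
    }
}
```

Accumulating in a 64-bit temporary keeps the intermediate sum exact, so saturation is decided once on the final value.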
vec_sum2s
Vector Sum Across Partial (1/2) Saturated
d = vec_sum2s(a,b)
do i=0 to 1
    d_(2i) ← 0
    d_(2i+1) ← Saturate(a_(2i) + a_(2i+1) + b_(2i+1))
end
The first and third elements of the result are 0. The second element of the result is the 32-bit saturated sum of the first two elements of a and the second element of b. The fourth element of the result is the 32-bit saturated sum of the last two elements of a and the fourth element of b. If saturation occurs, VSCR[SAT] is set (see Table 4-1). The valid combination of argument types and the corresponding result type for d = vec_sum2s(a,b) are shown in Figure 4-139.
d                  a                  b                  maps to
vector signed int  vector signed int  vector signed int  vsum2sws d,a,b
Figure 4-139. Two Saturated Sums in the Four Signed Integer Elements (32-Bit)
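The element placement is easy to misread, so a scalar sketch is given here. The name model_sum2s and the int flag used for VSCR[SAT] are assumptions made for the example.

```c
#include <stdint.h>

/* Scalar model of d = vec_sum2s(a,b): sums land in elements 1 and 3. */
void model_sum2s(const int32_t a[4], const int32_t b[4], int32_t d[4], int *sat)
{
    for (int i = 0; i < 2; i++) {
        int64_t sum = (int64_t)a[2 * i] + a[2 * i + 1] + b[2 * i + 1];
        if (sum > INT32_MAX) { sum = INT32_MAX; *sat = 1; }
        if (sum < INT32_MIN) { sum = INT32_MIN; *sat = 1; }
        d[2 * i] = 0;                  /* even elements are cleared */
        d[2 * i + 1] = (int32_t)sum;   /* odd elements receive the sums */
    }
}
```

Note that elements 0 and 2 of b do not contribute to the result.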
vec_sums
Vector Sum Saturated
d = vec_sums(a,b)
do i=0 to 2
    d_i ← 0
end
d_3 ← Saturate(a_0 + a_1 + a_2 + a_3 + b_3)
The first three elements of the result are 0. The fourth element of the result is the 32-bit saturated sum of all elements of a and the fourth element of b. If saturation occurs, VSCR[SAT] is set (see Table 4-1). The valid combination of argument types and the corresponding result type for d = vec_sums(a,b) are shown in Figure 4-140.
d                  a                  b                  maps to
vector signed int  vector signed int  vector signed int  vsumsws d,a,b
Figure 4-140. Saturated Sum of Five Signed Integer Elements (32-Bit)
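A scalar sketch of the five-term sum follows. The name model_sums and the int flag standing in for VSCR[SAT] are assumptions made for the example.

```c
#include <stdint.h>

/* Scalar model of d = vec_sums(a,b): one saturated sum in element 3. */
void model_sums(const int32_t a[4], const int32_t b[4], int32_t d[4], int *sat)
{
    int64_t sum = b[3];               /* only the fourth element of b is used */
    for (int i = 0; i < 4; i++)
        sum += a[i];
    if (sum > INT32_MAX) { sum = INT32_MAX; *sat = 1; }
    if (sum < INT32_MIN) { sum = INT32_MIN; *sat = 1; }
    d[0] = 0;
    d[1] = 0;
    d[2] = 0;
    d[3] = (int32_t)sum;
}
```

Passing a vector of zeros as b reduces the operation to a plain saturated horizontal sum of a.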
vec_trunc
Vector Truncate
d = vec_trunc(a)
do i=0 to 3
    d_i ← RndToFPITrunc(a_i)
end
Each single-precision floating-point word element in a is rounded to a single-precision floating-point integer, using the Round-toward-Zero mode, and placed into the corresponding word element of d. Each element of the result is thus the value of the corresponding element of a truncated to an integral value. The operation is independent of VSCR[NJ]. The valid argument type and corresponding result type for d = vec_trunc(a) are shown in Figure 4-141.
d             a             maps to
vector float  vector float  vrfiz d,a
Figure 4-141. Round-to-Zero of Four Floating-Point Integer Elements (32-Bit)
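The per-element rounding can be sketched in portable C without the math library. The function name model_trunc_f32 is hypothetical, and the handling of NaN and of the sign of a zero result follows common IEEE 754 practice, which is an assumption here rather than something stated in the text above.

```c
#include <stdint.h>

/* One element of vec_trunc: round a float toward zero to an integral value. */
float model_trunc_f32(float x)
{
    if (x != x)
        return x;                     /* NaN stays NaN */
    if (x >= 8388608.0f || x <= -8388608.0f)
        return x;                     /* |x| >= 2^23 (and infinities) are already integral */
    float t = (float)(int32_t)x;      /* C float-to-integer conversion drops the fraction */
    if (t == 0.0f)
        return x * 0.0f;              /* keep the sign of x on a zero result (assumed) */
    return t;
}
```

The range check comes first because converting a value outside the int32_t range is undefined in C.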
vec_unpackh
Vector Unpack High Element
d = vec_unpackh(a)
Integer value:
n ← number of elements in d
do i=0 to n-1
    d_i ← SignExtend(a_i)
end
Pixel value:
do i=0 to 3
    d_i ← SignExtend(a_i[0]) || 000 || a_i[1:5] || 000 || a_i[6:10] || 000 || a_i[11:15]
end
Each element of the result is the result of extending the corresponding half-width high element of a. The valid argument types and corresponding result types for d = vec_unpackh(a) are shown in Figure 4-142, Figure 4-143, and Figure 4-144.
d                    a                   maps to
vector signed short  vector signed char  vupkhsb d,a
vector bool short    vector bool char    vupkhsb d,a
Figure 4-142. Unpack High-Order Elements (8-Bit) to Elements (16-Bit)
d                    a             maps to
vector unsigned int  vector pixel  vupkhpx d,a
Figure 4-143. Unpack High-Order Pixel Elements (16-Bit) to Elements (32-Bit)
Programming note: The unpacking done by the vector unpack element operations for vector pixel values does not reverse the packing done by the vector pack pixel operation. Specifically, if a 16-bit pixel is unpacked to a 32-bit pixel which is then packed to a 16-bit pixel, the resulting 16-bit pixel will not, in general, be equal to the original 16-bit pixel (because, for each channel except the first, vector unpack element inserts high-order bits while vector pack pixel discards low-order bits). This was designed to optimize image processing where the unpacked values would be multiplied by small coefficients and accumulated in a digital filter. The usual transformation from the 16-bit pixel to a 32-bit pixel involves multiplication of the RGB channels by 255/31. This can be accomplished by replicating the 3 most significant bits in the least significant bits using the operations:
d = vec_unpackh(a);
d = (vector unsigned int) vec_or(vec_sl((vector unsigned char)d, (vector unsigned char)(3)), vec_sr((vector unsigned char)d, (vector unsigned char)(2)));
d                  a                    maps to
vector signed int  vector signed short  vupkhsh d,a
vector bool int    vector bool short    vupkhsh d,a
Figure 4-144. Unpack High-Order Signed Integer Elements (16-Bit) to Signed Integer Elements (32-Bit)
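The pixel unpacking and the 255/31 scaling described in the programming note for vec_unpackh can be checked with a scalar model. This is a sketch only: the function names are invented for the example, and bit 0 is taken to be the most significant bit of the 16-bit pixel, following the big-endian bit numbering used in the pseudocode.

```c
#include <stdint.h>

/* One element of the pixel form of vec_unpackh: 1/5/5/5 pixel to four bytes. */
uint32_t model_unpack_pixel(uint16_t p)
{
    uint32_t flag = (p >> 15) & 1u;        /* bit 0, sign-extended to a full byte */
    uint32_t c1 = (p >> 10) & 0x1Fu;       /* bits 1:5 */
    uint32_t c2 = (p >> 5) & 0x1Fu;        /* bits 6:10 */
    uint32_t c3 = p & 0x1Fu;               /* bits 11:15 */
    return (flag ? 0xFF000000u : 0u) | (c1 << 16) | (c2 << 8) | c3;
}

/* Per-byte (x << 3) | (x >> 2), the shift-and-or step from the programming note. */
uint32_t model_expand_channels(uint32_t d)
{
    uint32_t out = 0;
    for (int shift = 0; shift < 32; shift += 8) {
        uint32_t byte = (d >> shift) & 0xFFu;
        uint32_t wide = ((byte << 3) | (byte >> 2)) & 0xFFu;  /* 8-bit shifts discard overflow */
        out |= wide << shift;
    }
    return out;
}
```

For a 5-bit channel value of 31 the expansion gives 255, and for 16 it gives 132, which is 16 x 255/31 rounded to the nearest integer.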
vec_unpackl
Vector Unpack Low Element
d = vec_unpackl(a)
Integer value:
n ← number of elements in d
do i=0 to n-1
    d_i ← SignExtend(a_(i+n))
end
Pixel value:
do i=0 to 3
    d_i ← SignExtend(a_(i+n)[0]) || 000 || a_(i+n)[1:5] || 000 || a_(i+n)[6:10] || 000 || a_(i+n)[11:15]
end
Each element of the result is the result of extending the corresponding half-width low element of a. The valid argument types and corresponding result types for d = vec_unpackl(a) are shown in Figure 4-145, Figure 4-146, and Figure 4-147.
d                    a                   maps to
vector signed short  vector signed char  vupklsb d,a
vector bool short    vector bool char    vupklsb d,a
Figure 4-145. Unpack Low-Order Elements (8-Bit) to Elements (16-Bit)
d                    a             maps to
vector unsigned int  vector pixel  vupklpx d,a
Figure 4-146. Unpack Low-Order Pixel Elements (16-Bit) to Elements (32-Bit)
d                  a                    maps to
vector signed int  vector signed short  vupklsh d,a
vector bool int    vector bool short    vupklsh d,a
Figure 4-147. Unpack Low-Order Signed Integer Elements (16-Bit) to Signed Integer Elements (32-Bit)
Programming note: The unpacking done by the vector unpack element operations for vector pixel values does not reverse the packing done by the vector pack pixel operation. Specifically, if a 16-bit pixel is unpacked to a 32-bit pixel which is then packed to a 16-bit pixel, the resulting 16-bit pixel will not, in general, be equal to the original 16-bit pixel (because, for each channel except the first, vector unpack element inserts high-order bits while vector pack pixel discards low-order bits). This was designed to optimize image processing where the unpacked values would be multiplied by small coefficients and accumulated in a digital filter. The usual transformation from the 16-bit pixel to a 32-bit pixel involves multiplication of the RGB channels by 255/31. This can be accomplished by replicating the 3 most significant bits in the least significant bits using the operations:
d = vec_unpackh(a);
d = (vector unsigned int) vec_or(vec_sl((vector unsigned char)d, (vector unsigned char)(3)), vec_sr((vector unsigned char)d, (vector unsigned char)(2)));
vec_xor
Vector Logical XOR
d = vec_xor(a,b)
d ← a ⊕ b
Each bit of the result is the logical XOR of the corresponding bits of a and b. The valid combinations of argument types and the corresponding result types for d = vec_xor(a,b) are shown in Figure 4-148.
d                      a                      b                      maps to
vector unsigned char   vector unsigned char   vector unsigned char   vxor d,a,b
vector unsigned char   vector unsigned char   vector bool char       vxor d,a,b
vector unsigned char   vector bool char       vector unsigned char   vxor d,a,b
vector signed char     vector signed char     vector signed char     vxor d,a,b
vector signed char     vector signed char     vector bool char       vxor d,a,b
vector signed char     vector bool char       vector signed char     vxor d,a,b
vector bool char       vector bool char       vector bool char       vxor d,a,b
vector unsigned short  vector unsigned short  vector unsigned short  vxor d,a,b
vector unsigned short  vector unsigned short  vector bool short      vxor d,a,b
vector unsigned short  vector bool short      vector unsigned short  vxor d,a,b
vector signed short    vector signed short    vector signed short    vxor d,a,b
vector signed short    vector signed short    vector bool short      vxor d,a,b
vector signed short    vector bool short      vector signed short    vxor d,a,b
vector bool short      vector bool short      vector bool short      vxor d,a,b
vector unsigned int    vector unsigned int    vector unsigned int    vxor d,a,b
vector unsigned int    vector unsigned int    vector bool int        vxor d,a,b
vector unsigned int    vector bool int        vector unsigned int    vxor d,a,b
vector signed int      vector signed int      vector signed int      vxor d,a,b
vector signed int      vector signed int      vector bool int        vxor d,a,b
vector signed int      vector bool int        vector signed int      vxor d,a,b
vector bool int        vector bool int        vector bool int        vxor d,a,b
vector float           vector bool int        vector float           vxor d,a,b
vector float           vector float           vector bool int        vxor d,a,b
vector float           vector float           vector float           vxor d,a,b
Figure 4-148. Logical Bit-Wise XOR
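Because vec_xor works on raw bits, the vector float and vector bool int combination can be used to manipulate floating-point encodings directly. The scalar sketch below flips the sign of one element; the name model_xor_f32 is hypothetical, and the example assumes the IEEE 754 single-precision layout in which the sign is the most significant bit.

```c
#include <stdint.h>
#include <string.h>

/* One element of vec_xor(vector float, vector bool int): XOR on the bit pattern. */
float model_xor_f32(float x, uint32_t mask)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);   /* reinterpret the float as 32 raw bits */
    bits ^= mask;
    memcpy(&x, &bits, sizeof x);
    return x;
}
```

With a mask of 0x80000000 the value is negated; with a mask of 0 it is returned unchanged.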
4.5 AltiVec Predicates
The AltiVec predicates all begin with vec_all_ or vec_any_. They are organized alphabetically by predicate name, with a definition of the permitted generic AltiVec predicates. Specific operations do not exist for the predicates. Where possible, the description is supported by reference figures indicating data modifications and including a table that lists the valid set of argument types for that predicate and the specific AltiVec instruction generated for that set of arguments. The AltiVec instruction is in the form v-----. x,a,b, where v-----. represents the instruction and x,a,b represent the operands. The x represents an unused vector result of the vector compare instruction used to implement the predicate. The order of the operands listed after the instruction indicates the order in which they are applied for that predicate.
For example,
vec_any_lt(vector unsigned char, vector unsigned char)
maps to the instruction
vcmpgtub. x,b,a
indicating that the operands are applied in reverse order for this predicate.
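The operand reversal can be illustrated with a scalar model: a "less than" test is evaluated as a "greater than" test with the arguments swapped. The function names below are hypothetical, and the loop stands in for the compare instruction followed by an "any element true" test.

```c
#include <stdint.h>

/* Returns 1 if any x[i] > y[i]; stands in for vcmpgtub. plus an "any" test. */
static int any_gt_u8(const uint8_t x[16], const uint8_t y[16])
{
    for (int i = 0; i < 16; i++)
        if (x[i] > y[i])
            return 1;
    return 0;
}

/* Scalar model of vec_any_lt(a,b) for vector unsigned char. */
int model_any_lt_u8(const uint8_t a[16], const uint8_t b[16])
{
    return any_gt_u8(b, a);   /* a < b is evaluated as b > a, hence operands b,a */
}
```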
vec_all_eq
All Elements Equal
d = vec_all_eq(a,b)
n ← number of elements
if each a_i = b_i, where i ranges from 0 to n-1
    then d ← 1
    else d ← 0
The predicate vec_all_eq returns 1 if every element of a is equal to the corresponding element of b. Otherwise, it returns 0. For vector float argument types, if VSCR[NJ] = 1, every denormalized floating-point operand element is truncated to 0 before the comparison. The valid combinations of argument types and the corresponding result type for d = vec_all_eq(a,b) are shown in Figure 4-149, Figure 4-150, Figure 4-151, and Figure 4-152.
d    a                     b                     Maps to
int  vector unsigned char  vector unsigned char  vcmpequb. x,a,b
int  vector unsigned char  vector bool char      vcmpequb. x,a,b
int  vector signed char    vector signed char    vcmpequb. x,a,b
int  vector signed char    vector bool char      vcmpequb. x,a,b
int  vector bool char      vector unsigned char  vcmpequb. x,a,b
int  vector bool char      vector signed char    vcmpequb. x,a,b
int  vector bool char      vector bool char      vcmpequb. x,a,b
Figure 4-149. All Equal of Sixteen Integer Elements (8-bits)
d    a                      b                      Maps to
int  vector unsigned short  vector unsigned short  vcmpequh. x,a,b
int  vector unsigned short  vector bool short      vcmpequh. x,a,b
int  vector signed short    vector signed short    vcmpequh. x,a,b
int  vector signed short    vector bool short      vcmpequh. x,a,b
int  vector bool short      vector unsigned short  vcmpequh. x,a,b
int  vector bool short      vector signed short    vcmpequh. x,a,b
int  vector bool short      vector bool short      vcmpequh. x,a,b
int  vector pixel           vector pixel           vcmpequh. x,a,b
Figure 4-150. All Equal of Eight Integer Elements (16-Bit)
d    a                    b                    Maps to
int  vector unsigned int  vector unsigned int  vcmpequw. x,a,b
int  vector unsigned int  vector bool int      vcmpequw. x,a,b
int  vector signed int    vector signed int    vcmpequw. x,a,b
int  vector signed int    vector bool int      vcmpequw. x,a,b
int  vector bool int      vector unsigned int  vcmpequw. x,a,b
int  vector bool int      vector signed int    vcmpequw. x,a,b
int  vector bool int      vector bool int      vcmpequw. x,a,b
Figure 4-151. All Equal of Four Integer Elements (32-Bit)
d    a             b             Maps to
int  vector float  vector float  vcmpeqfp. x,a,b
Figure 4-152. All Equal of Four Floating-Point Elements (32-Bit)
vec_all_ge
All Elements Greater Than or Equal
d = vec_all_ge(a,b)
n ← number of elements
if each a_i ≥ b_i, where i ranges from 0 to n-1
    then d ← 1
    else d ← 0
The predicate vec_all_ge returns 1 if every element of a is greater than or equal to the corresponding element of b. Otherwise, it returns 0. For vector float argument types, if VSCR[NJ] = 1, every denormalized floating-point operand element is truncated to 0 before the comparison. The valid combinations of argument types and the corresponding result type for d = vec_all_ge(a,b) are shown in Figure 4-153, Figure 4-154, Figure 4-155, and Figure 4-156.
d    a                     b                     Maps to
int  vector unsigned char  vector unsigned char  vcmpgtub. x,b,a
int  vector unsigned char  vector bool char      vcmpgtub. x,b,a
int  vector bool char      vector unsigned char  vcmpgtub. x,b,a
int  vector signed char    vector signed char    vcmpgtsb. x,b,a
int  vector signed char    vector bool char      vcmpgtsb. x,b,a
int  vector bool char      vector signed char    vcmpgtsb. x,b,a
Figure 4-153. All Greater Than or Equal of Sixteen Integer Elements (8-bits)
d    a                      b                      Maps to
int  vector unsigned short  vector unsigned short  vcmpgtuh. x,b,a
int  vector unsigned short  vector bool short      vcmpgtuh. x,b,a
int  vector bool short      vector unsigned short  vcmpgtuh. x,b,a
int  vector signed short    vector signed short    vcmpgtsh. x,b,a
int  vector signed short    vector bool short      vcmpgtsh. x,b,a
int  vector bool short      vector signed short    vcmpgtsh. x,b,a
Figure 4-154. All Greater Than or Equal of Eight Integer Elements (16-Bit)
d    a                    b                    Maps to
int  vector unsigned int  vector unsigned int  vcmpgtuw. x,b,a
int  vector unsigned int  vector bool int      vcmpgtuw. x,b,a
int  vector bool int      vector unsigned int  vcmpgtuw. x,b,a
int  vector signed int    vector signed int    vcmpgtsw. x,b,a
int  vector signed int    vector bool int      vcmpgtsw. x,b,a
int  vector bool int      vector signed int    vcmpgtsw. x,b,a
Figure 4-155. All Greater Than or Equal of Four Integer Elements (32-Bit)
d    a             b             Maps to
int  vector float  vector float  vcmpgefp. x,a,b
Figure 4-156. All Greater Than or Equal of Four Floating-Point Elements (32-Bit)
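The tables above map the integer forms to a reversed greater-than compare but the floating-point form to a direct greater-than-or-equal compare. The scalar sketch below shows both formulations. The function names are hypothetical, and the observation about NaN is a general property of IEEE 754 comparisons offered as a plausible reading of the mapping, not a statement taken from this manual.

```c
#include <stdint.h>

/* Integer form: "all a >= b" holds exactly when no element has b > a. */
int model_all_ge_s32(const int32_t a[4], const int32_t b[4])
{
    for (int i = 0; i < 4; i++)
        if (b[i] > a[i])
            return 0;
    return 1;
}

/* Floating-point form: test a >= b directly, so a NaN element fails the test. */
int model_all_ge_f32(const float a[4], const float b[4])
{
    for (int i = 0; i < 4; i++)
        if (!(a[i] >= b[i]))
            return 0;
    return 1;
}
```

For floating-point data the two formulations differ when a NaN is present, because every ordered comparison involving a NaN is false.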
vec_all_gt
All Elements Greater Than
d = vec_all_gt(a,b)
n ← number of elements
if each a_i > b_i, where i ranges from 0 to n-1
    then d ← 1
    else d ← 0
The predicate vec_all_gt returns 1 if every element of a is greater than the corresponding element of b. Otherwise, it returns 0. For vector float argument types, if VSCR[NJ] = 1, every denormalized floating-point operand element is truncated to 0 before the comparison. The valid combinations of argument types and the corresponding result type for d = vec_all_gt(a,b) are shown in Figure 4-157, Figure 4-158, Figure 4-159, and Figure 4-160.
d    a                     b                     Maps to
int  vector unsigned char  vector unsigned char  vcmpgtub. x,a,b
int  vector unsigned char  vector bool char      vcmpgtub. x,a,b
int  vector bool char      vector unsigned char  vcmpgtub. x,a,b
int  vector signed char    vector signed char    vcmpgtsb. x,a,b
int  vector signed char    vector bool char      vcmpgtsb. x,a,b
int  vector bool char      vector signed char    vcmpgtsb. x,a,b
Figure 4-157. All Greater Than of Sixteen Integer Elements (8-bits)
d    a                      b                      Maps to
int  vector unsigned short  vector unsigned short  vcmpgtuh. x,a,b
int  vector unsigned short  vector bool short      vcmpgtuh. x,a,b
int  vector bool short      vector unsigned short  vcmpgtuh. x,a,b
int  vector signed short    vector signed short    vcmpgtsh. x,a,b
int  vector signed short    vector bool short      vcmpgtsh. x,a,b
int  vector bool short      vector signed short    vcmpgtsh. x,a,b
Figure 4-158. All Greater Than of Eight Integer Elements (16-Bit)
d    a                    b                    Maps to
int  vector unsigned int  vector unsigned int  vcmpgtuw. x,a,b
int  vector unsigned int  vector bool int      vcmpgtuw. x,a,b
int  vector bool int      vector unsigned int  vcmpgtuw. x,a,b
int  vector signed int    vector signed int    vcmpgtsw. x,a,b
int  vector signed int    vector bool int      vcmpgtsw. x,a,b
int  vector bool int      vector signed int    vcmpgtsw. x,a,b
Figure 4-159. All Greater Than of Four Integer Elements (32-Bit)
d    a             b             Maps to
int  vector float  vector float  vcmpgtfp. x,a,b
Figure 4-160. All Greater Than of Four Floating-Point Elements (32-Bit)
vec_all_in
All Elements in Bounds
d = vec_all_in(a,b)
if each a_i ≤ b_i and a_i ≥ -b_i, where i ranges from 0 to 3
    then d ← 1
    else d ← 0
The predicate vec_all_in returns 1 if every element of a is less than or equal to the corresponding element of b (high bound) and greater than or equal to the negative (NEG) of the corresponding element of b (low bound). Otherwise, it returns 0. If VSCR[NJ] = 1, every denormalized floating-point operand element is truncated to 0 before the comparison. The valid argument types and the corresponding result type for d = vec_all_in(a,b) are shown in Figure 4-161.
d    a             b             Maps to
int  vector float  vector float  vcmpbfp. x,a,b
Figure 4-161. All in Bounds of Four Floating-Point Elements (32-Bit)
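The bounds test can be sketched in scalar C as follows. The name model_all_in is hypothetical. The condition is written positively so that a NaN in either operand makes the element fail; treating NaN as out of bounds is an assumption of this sketch.

```c
/* Scalar model of d = vec_all_in(a,b): every a[i] lies within [-b[i], b[i]]. */
int model_all_in(const float a[4], const float b[4])
{
    for (int i = 0; i < 4; i++) {
        /* A NaN makes both comparisons false, so the element counts as out of bounds. */
        if (!(a[i] <= b[i] && a[i] >= -b[i]))
            return 0;
    }
    return 1;
}
```

A negative bound can never be satisfied, since no value is both at most b and at least -b when b is below zero.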
vec_all_le
All Elements Less Than or Equal
d = vec_all_le(a,b)
n ← number of elements
if each a_i ≤ b_i, where i ranges from 0 to n-1
    then d ← 1
    else d ← 0
The predicate vec_all_le returns 1 if every element of a is less than or equal to the corresponding element of b. Otherwise, it returns 0. For vector float argument types, if VSCR[NJ] = 1, every denormalized floating-point operand element is truncated to 0 before the comparison. The valid combinations of argument types and the corresponding result type for d = vec_all_le(a,b) are shown in Figure 4-162, Figure 4-163, Figure 4-164, and Figure 4-165.
d    a                     b                     Maps to
int  vector unsigned char  vector unsigned char  vcmpgtub. x,a,b
int  vector unsigned char  vector bool char      vcmpgtub. x,a,b
int  vector bool char      vector unsigned char  vcmpgtub. x,a,b
int  vector signed char    vector signed char    vcmpgtsb. x,a,b
int  vector signed char    vector bool char      vcmpgtsb. x,a,b
int  vector bool char      vector signed char    vcmpgtsb. x,a,b
Figure 4-162. All Less Than or Equal of Sixteen Integer Elements (8-bits)
d    a                      b                      Maps to
int  vector unsigned short  vector unsigned short  vcmpgtuh. x,a,b
int  vector unsigned short  vector bool short      vcmpgtuh. x,a,b
int  vector bool short      vector unsigned short  vcmpgtuh. x,a,b
int  vector signed short    vector signed short    vcmpgtsh. x,a,b
int  vector signed short    vector bool short      vcmpgtsh. x,a,b
int  vector bool short      vector signed short    vcmpgtsh. x,a,b
Figure 4-163. All Less Than or Equal of Eight Integer Elements (16-Bit)
d    a                    b                    Maps to
int  vector unsigned int  vector unsigned int  vcmpgtuw. x,a,b
int  vector unsigned int  vector bool int      vcmpgtuw. x,a,b
int  vector bool int      vector unsigned int  vcmpgtuw. x,a,b
int  vector signed int    vector signed int    vcmpgtsw. x,a,b
int  vector signed int    vector bool int      vcmpgtsw. x,a,b
int  vector bool int      vector signed int    vcmpgtsw. x,a,b
Figure 4-164. All Less Than or Equal of Four Integer Elements (32-Bit)
d    a             b             Maps to
int  vector float  vector float  vcmpgefp. x,b,a
Figure 4-165. All Less Than or Equal of Four Floating-Point Elements (32-Bit)
vec_all_lt
All Elements Less Than
d = vec_all_lt(a,b)
n ← number of elements
if each a_i < b_i, where i ranges from 0 to n-1
    then d ← 1
    else d ← 0
The predicate vec_all_lt returns 1 if every element of a is less than the corresponding element of b. Otherwise, it returns 0. For vector float argument types, if VSCR[NJ] = 1, every denormalized floating-point operand element is truncated to 0 before the comparison. The valid combinations of argument types and the corresponding result type for d = vec_all_lt(a,b) are shown in Figure 4-166, Figure 4-167, Figure 4-168, and Figure 4-169.
d    a                     b                     Maps to
int  vector unsigned char  vector unsigned char  vcmpgtub. x,b,a
int  vector unsigned char  vector bool char      vcmpgtub. x,b,a
int  vector bool char      vector unsigned char  vcmpgtub. x,b,a
int  vector signed char    vector signed char    vcmpgtsb. x,b,a
int  vector signed char    vector bool char      vcmpgtsb. x,b,a
int  vector bool char      vector signed char    vcmpgtsb. x,b,a
Figure 4-166. All Less Than of Sixteen Integer Elements (8-bits)
d    a                      b                      Maps to
int  vector unsigned short  vector unsigned short  vcmpgtuh. x,b,a
int  vector unsigned short  vector bool short      vcmpgtuh. x,b,a
int  vector bool short      vector unsigned short  vcmpgtuh. x,b,a
int  vector signed short    vector signed short    vcmpgtsh. x,b,a
int  vector signed short    vector bool short      vcmpgtsh. x,b,a
int  vector bool short      vector signed short    vcmpgtsh. x,b,a
Figure 4-167. All Less Than of Eight Integer Elements (16-Bit)
d    a                    b                    Maps to
int  vector unsigned int  vector unsigned int  vcmpgtuw. x,b,a
int  vector unsigned int  vector bool int      vcmpgtuw. x,b,a
int  vector bool int      vector unsigned int  vcmpgtuw. x,b,a
int  vector signed int    vector signed int    vcmpgtsw. x,b,a
int  vector signed int    vector bool int      vcmpgtsw. x,b,a
int  vector bool int      vector signed int    vcmpgtsw. x,b,a
Figure 4-168. All Less Than of Four Integer Elements (32-Bit)
d    a             b             Maps to
int  vector float  vector float  vcmpgtfp. x,b,a
Figure 4-169. All Less Than of Four Floating-Point Elements (32-Bit)
vec_all_nan
All Elements Not a Number
d = vec_all_nan(a)
if each ISNaN(a_i) = 1, where i ranges from 0 to 3
    then d ← 1
    else d ← 0
The predicate vec_all_nan returns 1 if every element of a is Not a Number (NaN). Otherwise, it returns 0. The operation is independent of VSCR[NJ]. The valid argument type and corresponding result type for d = vec_all_nan(a) are shown in Figure 4-170.
d    a             Maps to
int  vector float  vcmpeqfp. x,a,a
Figure 4-170. All NaN of Four Floating-Point Elements (32-Bit)
vec_all_ne
All Elements Not Equal
d = vec_all_ne(a,b)
n ← number of elements
if each a_i != b_i, where i ranges from 0 to n-1
    then d ← 1
    else d ← 0
The predicate vec_all_ne returns 1 if every element of a is not equal to (!=) the corresponding element of b. Otherwise, it returns 0. For vector float argument types, if VSCR[NJ] = 1, every denormalized floating-point operand element is truncated to 0 before the comparison. The valid combinations of argument types and the corresponding result type for d = vec_all_ne(a,b) are shown in Figure 4-171, Figure 4-172, Figure 4-173, and Figure 4-174.
d    a                     b                     Maps to
int  vector unsigned char  vector unsigned char  vcmpequb. x,a,b
int  vector unsigned char  vector bool char      vcmpequb. x,a,b
int  vector signed char    vector signed char    vcmpequb. x,a,b
int  vector signed char    vector bool char      vcmpequb. x,a,b
int  vector bool char      vector unsigned char  vcmpequb. x,a,b
int  vector bool char      vector signed char    vcmpequb. x,a,b
int  vector bool char      vector bool char      vcmpequb. x,a,b
Figure 4-171. All Not Equal of Sixteen Integer Elements (8-bits)
d    a                      b                      Maps to
int  vector unsigned short  vector unsigned short  vcmpequh. x,a,b
int  vector unsigned short  vector bool short      vcmpequh. x,a,b
int  vector signed short    vector signed short    vcmpequh. x,a,b
int  vector signed short    vector bool short      vcmpequh. x,a,b
int  vector bool short      vector unsigned short  vcmpequh. x,a,b
int  vector bool short      vector signed short    vcmpequh. x,a,b
int  vector bool short      vector bool short      vcmpequh. x,a,b
int  vector pixel           vector pixel           vcmpequh. x,a,b
Figure 4-172. All Not Equal of Eight Integer Elements (16-Bit)
d    a                    b                    Maps to
int  vector unsigned int  vector unsigned int  vcmpequw. x,a,b
int  vector unsigned int  vector bool int      vcmpequw. x,a,b
int  vector signed int    vector signed int    vcmpequw. x,a,b
int  vector signed int    vector bool int      vcmpequw. x,a,b
int  vector bool int      vector unsigned int  vcmpequw. x,a,b
int  vector bool int      vector signed int    vcmpequw. x,a,b
int  vector bool int      vector bool int      vcmpequw. x,a,b
Figure 4-173. All Not Equal of Four Integer Elements (32-Bit)
d    a             b             Maps to
int  vector float  vector float  vcmpeqfp. x,a,b
Figure 4-174. All Not Equal of Four Floating-Point Elements (32-Bit)
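vec_all_ne maps to the same compare instruction as vec_all_eq, so the predicates must differ in how the compare outcome is summarized. The scalar sketch below models that summary as two flags, "all elements true" and "all elements false". In the AltiVec instruction set this summary is recorded in condition register field CR6; that detail comes from the instruction set architecture rather than from this section, and the struct and function names are hypothetical.

```c
#include <stdint.h>

/* Summary of an element-wise compare: hypothetical stand-in for the CR6 bits. */
struct cmp_summary {
    int all_true;
    int all_false;
};

static struct cmp_summary cmpequ_u8(const uint8_t a[16], const uint8_t b[16])
{
    struct cmp_summary s = {1, 1};
    for (int i = 0; i < 16; i++) {
        if (a[i] == b[i])
            s.all_false = 0;   /* at least one element compared equal */
        else
            s.all_true = 0;    /* at least one element compared unequal */
    }
    return s;
}

int model_all_eq_u8(const uint8_t a[16], const uint8_t b[16])
{
    return cmpequ_u8(a, b).all_true;
}

int model_all_ne_u8(const uint8_t a[16], const uint8_t b[16])
{
    return cmpequ_u8(a, b).all_false;
}

int model_any_eq_u8(const uint8_t a[16], const uint8_t b[16])
{
    return !cmpequ_u8(a, b).all_false;
}
```

One compare thus serves several predicates, each selecting a different flag or its negation.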
vec_all_nge
All Elements Not Greater Than or Equal
d = vec_all_nge(a,b)
if each NGE(a_i, b_i) = 1, where i ranges from 0 to 3
    then d ← 1
    else d ← 0
The predicate vec_all_nge returns 1 if every element of a is not greater than or equal to (NGE) the corresponding element of b. Otherwise, it returns 0. Not greater than or equal can mean either less than or that one of the elements is NaN. If VSCR[NJ] = 1, every denormalized floating-point operand element is truncated to 0 before the comparison. The valid argument types and the corresponding result type for d = vec_all_nge(a,b) are shown in Figure 4-175.
d    a             b             Maps to
int  vector float  vector float  vcmpgefp. x,a,b
Figure 4-175. All Not Greater Than or Equal of Four Floating-Point Elements (32-Bit)
vec_all_ngt
All Elements Not Greater Than
d = vec_all_ngt(a,b)
if each NGT(a_i, b_i) = 1, where i ranges from 0 to 3
    then d ← 1
    else d ← 0
The predicate vec_all_ngt returns 1 if every element of a is not greater than (NGT) the corresponding element of b. Otherwise, it returns 0. Not greater than can mean either less than or equal to or that one of the elements is NaN. If VSCR[NJ] = 1, every denormalized floating-point operand element is truncated to 0 before the comparison. The valid argument types and the corresponding result type for d = vec_all_ngt(a,b) are shown in Figure 4-176.
d    a             b             Maps to
int  vector float  vector float  vcmpgtfp. x,a,b
Figure 4-176. All Not Greater Than of Four Floating-Point Elements (32-Bit)
vec_all_nle
All Elements Not Less Than or Equal
d = vec_all_nle(a,b)
if each NLE(a_i, b_i) = 1, where i ranges from 0 to 3
    then d ← 1
    else d ← 0
The predicate vec_all_nle returns 1 if every element of a is not less than or equal to (NLE) the corresponding element of b. Otherwise, it returns 0. Not less than or equal to can mean either greater than or that one of the elements is NaN. If VSCR[NJ] = 1, every denormalized floating-point operand element is truncated to 0 before the comparison. The valid argument types and the corresponding result type for d = vec_all_nle(a,b) are shown in Figure 4-177.
d    a             b             Maps to
int  vector float  vector float  vcmpgefp. x,b,a
Figure 4-177. All Not Less Than or Equal of Four Floating-Point Elements (32-Bit)
vec_all_nlt
All Elements Not Less Than
d = vec_all_nlt(a,b)
if each NLT(a_i, b_i) = 1, where i ranges from 0 to 3
    then d ← 1
    else d ← 0
The predicate vec_all_nlt returns 1 if every element of a is not less than (NLT) the corresponding element of b. Otherwise, it returns 0. Not less than can mean either greater than or equal to or that one of the elements is NaN. If VSCR[NJ] = 1, every denormalized floating-point operand element is truncated to 0 before the comparison. The valid argument types and the corresponding result type for d = vec_all_nlt(a,b) are shown in Figure 4-178.
d    a             b             Maps to
int  vector float  vector float  vcmpgtfp. x,b,a
Figure 4-178. All Not Less Than of Four Floating-Point Elements (32-Bit)
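The negated predicates (vec_all_nge, vec_all_ngt, vec_all_nle, vec_all_nlt) differ from their positive counterparts only when a NaN is present. The scalar sketch below contrasts vec_all_nge with vec_all_lt; the function names are hypothetical, and the other three negated predicates follow the same pattern with a different comparison.

```c
/* Scalar model of vec_all_nge(a,b): no element satisfies a >= b. */
int model_all_nge(const float a[4], const float b[4])
{
    for (int i = 0; i < 4; i++)
        if (a[i] >= b[i])
            return 0;
    return 1;   /* also reached when elements are NaN, since NaN >= x is false */
}

/* Scalar model of vec_all_lt(a,b): every element satisfies a < b. */
int model_all_lt(const float a[4], const float b[4])
{
    for (int i = 0; i < 4; i++)
        if (!(a[i] < b[i]))
            return 0;   /* a NaN element fails here */
    return 1;
}
```

With a = {NaN, 0, 0, 0} and b = {1, 1, 1, 1}, the first function returns 1 and the second returns 0.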
MOTOROLA
Chapter 4. AltiVec Operations and Predicates
4-157
AltiVec Predicates
vec_all_numeric
All Elements Numeric
vec_all_numeric
d = vec_all_numeric(a)
if each ISNUM(ai) = 1, where i ranges from 0 to 3 then d 1 else d 0
The predicate vec_all_numeric returns 1 if every element of a is numeric. Otherwise, it returns 0. The operation is independent of VSCR[NJ]. The valid argument types and the corresponding result type for d = vec_all_numeric(a) are shown in Figure 4-179.
[Diagram: ISNUM is applied to each of the four elements of a; the four results are ANDed (&) to form d.]
d      a               Maps to
int    vector float    vcmpeqfp. x,a,a
Figure 4-179. All Numeric of Four Floating-Point Elements (32-Bit)
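The mapping to a compare of a with itself can be read as a self-equality test, since a NaN is the only floating-point value that does not compare equal to itself. The following scalar sketch in portable C illustrates that reading; the function name is hypothetical, and the array stands in for a vector float.

```c
/* Scalar stand-in for vec_all_numeric (hypothetical name).
 * An element counts as numeric when it compares equal to itself;
 * infinities pass this test, NaNs do not. */
int all_numeric_model(const float a[4])
{
    for (int i = 0; i < 4; i++) {
        if (!(a[i] == a[i]))
            return 0;
    }
    return 1;
}
```

The sketch relies on IEEE comparison semantics, so it should not be built with compiler options that assume NaNs never occur.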
vec_any_eq
Any Element Equal
d = vec_any_eq(a,b)
n ← number of elements
if any a_i =int b_i, where i ranges from 0 to n-1
then d ← 1
else d ← 0
The predicate vec_any_eq returns 1 if any element of a is equal to the corresponding element of b. Otherwise, it returns 0. For vector float argument types, if VSCR[NJ] = 1, every denormalized floating-point operand element is truncated to 0 before the comparison. The valid combinations of argument types and the corresponding result type for d = vec_any_eq(a,b) are shown in Figure 4-180, Figure 4-181, Figure 4-182, and Figure 4-183.
[Diagram: each of the sixteen element pairs of a and b is compared for equality (=); the sixteen results are ORed (|) to form d.]
d      a                       b                       Maps to
int    vector unsigned char    vector unsigned char    vcmpequb. x,a,b
int    vector unsigned char    vector bool char        vcmpequb. x,a,b
int    vector signed char      vector signed char      vcmpequb. x,a,b
int    vector signed char      vector bool char        vcmpequb. x,a,b
int    vector bool char        vector unsigned char    vcmpequb. x,a,b
int    vector bool char        vector signed char      vcmpequb. x,a,b
int    vector bool char        vector bool char        vcmpequb. x,a,b
Figure 4-180. Any Equal of Sixteen Integer Elements (8-Bit)
[Diagram: each of the eight element pairs of a and b is compared for equality (=); the eight results are ORed (|) to form d.]
d      a                        b                        Maps to
int    vector unsigned short    vector unsigned short    vcmpequh. x,a,b
int    vector unsigned short    vector bool short        vcmpequh. x,a,b
int    vector signed short      vector signed short      vcmpequh. x,a,b
int    vector signed short      vector bool short        vcmpequh. x,a,b
int    vector bool short        vector unsigned short    vcmpequh. x,a,b
int    vector bool short        vector signed short      vcmpequh. x,a,b
int    vector bool short        vector bool short        vcmpequh. x,a,b
int    vector pixel             vector pixel             vcmpequh. x,a,b
Figure 4-181. Any Equal of Eight Integer Elements (16-Bit)
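[Diagram: each of the four element pairs of a and b is compared for equality (=); the four results are ORed (|) to form d.]
d      a                      b                      Maps to
int    vector unsigned int    vector unsigned int    vcmpequw. x,a,b
int    vector unsigned int    vector bool int        vcmpequw. x,a,b
int    vector signed int      vector signed int      vcmpequw. x,a,b
int    vector signed int      vector bool int        vcmpequw. x,a,b
int    vector bool int        vector unsigned int    vcmpequw. x,a,b
int    vector bool int        vector signed int      vcmpequw. x,a,b
int    vector bool int        vector bool int        vcmpequw. x,a,b
Figure 4-182. Any Equal of Four Integer Elements (32-Bit)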
[Diagram: each of the four element pairs of a and b is compared for floating-point equality (=fp); the four results are ORed (|) to form d.]
d      a               b               Maps to
int    vector float    vector float    vcmpeqfp. x,a,b
Figure 4-183. Any Equal of Four Floating-Point Elements (32-Bit)
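As an illustration of the sixteen-element integer form, the following scalar sketch in portable C models vec_any_eq on 8-bit elements and uses it to ask whether a 16-byte block contains a given byte. It is not AltiVec code; both function names are hypothetical, and the replicated-byte array is only an assumed stand-in for a vector whose elements all hold the same value.

```c
#include <stdint.h>

/* Scalar stand-in for vec_any_eq on sixteen 8-bit elements (hypothetical name). */
int any_eq_u8_model(const uint8_t a[16], const uint8_t b[16])
{
    for (int i = 0; i < 16; i++) {
        if (a[i] == b[i])
            return 1;
    }
    return 0;
}

/* Hypothetical helper: returns 1 if the 16-byte block contains byte c.
 * The byte is replicated into all sixteen positions and the two
 * "vectors" are then compared element by element. */
int block_contains(const uint8_t block[16], uint8_t c)
{
    uint8_t replicated[16];
    for (int i = 0; i < 16; i++)
        replicated[i] = c;
    return any_eq_u8_model(block, replicated);
}
```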
vec_any_ge
Any Element Greater Than or Equal
d = vec_any_ge(a,b)
n ← number of elements
if any a_i ≥ b_i, where i ranges from 0 to n-1
then d ← 1
else d ← 0
The predicate vec_any_ge returns 1 if any element of a is greater than or equal to the corresponding element of b. Otherwise, it returns 0. For vector float argument types, if VSCR[NJ] = 1, every denormalized floating-point operand element is truncated to 0 before the comparison. The valid combinations of argument types and the corresponding result type for d = vec_any_ge(a,b) are shown in Figure 4-184, Figure 4-185, Figure 4-186, and Figure 4-187.
[Diagram: each of the sixteen element pairs of a and b is compared (a ≥ b); the sixteen results are ORed (|) to form d.]
d      a                       b                       Maps to
int    vector unsigned char    vector unsigned char    vcmpgtub. x,b,a
int    vector unsigned char    vector bool char        vcmpgtub. x,b,a
int    vector bool char        vector unsigned char    vcmpgtub. x,b,a
int    vector signed char      vector signed char      vcmpgtsb. x,b,a
int    vector signed char      vector bool char        vcmpgtsb. x,b,a
int    vector bool char        vector signed char      vcmpgtsb. x,b,a
Figure 4-184. Any Greater Than or Equal of Sixteen Integer Elements (8-Bit)
[Diagram: each of the eight element pairs of a and b is compared (a ≥ b); the eight results are ORed (|) to form d.]
d      a                        b                        Maps to
int    vector unsigned short    vector unsigned short    vcmpgtuh. x,b,a
int    vector unsigned short    vector bool short        vcmpgtuh. x,b,a
int    vector bool short        vector unsigned short    vcmpgtuh. x,b,a
int    vector signed short      vector signed short      vcmpgtsh. x,b,a
int    vector signed short      vector bool short        vcmpgtsh. x,b,a
int    vector bool short        vector signed short      vcmpgtsh. x,b,a
Figure 4-185. Any Greater Than or Equal of Eight Integer Elements (16-Bit)
[Diagram: each of the four element pairs of a and b is compared (a ≥ b); the four results are ORed (|) to form d.]
d      a                      b                      Maps to
int    vector unsigned int    vector unsigned int    vcmpgtuw. x,b,a
int    vector unsigned int    vector bool int        vcmpgtuw. x,b,a
int    vector bool int        vector unsigned int    vcmpgtuw. x,b,a
int    vector signed int      vector signed int      vcmpgtsw. x,b,a
int    vector signed int      vector bool int        vcmpgtsw. x,b,a
int    vector bool int        vector signed int      vcmpgtsw. x,b,a
Figure 4-186. Any Greater Than or Equal of Four Integer Elements (32-Bit)
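[Diagram: each of the four element pairs of a and b is compared for floating-point greater than or equal (≥fp); the four results are ORed (|) to form d.]
d      a               b               Maps to
int    vector float    vector float    vcmpgefp. x,a,b
Figure 4-187. Any Greater Than or Equal of Four Floating-Point Elements (32-Bit)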
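The integer forms of vec_any_ge map to a greater-than compare with the operands swapped (x,b,a). One way to read that mapping is sketched below in portable scalar C. The sketch assumes, beyond what this section states, that the predicate is obtained by negating the compare's "true in every element" summary; the function names are hypothetical, and only the unsigned 32-bit case is modeled.

```c
#include <stdint.h>

/* Stand-in for the summary of a swapped compare such as "vcmpgtuw. x,b,a":
 * returns 1 when x[i] > y[i] holds in every one of the four elements. */
static int all_gt_u32(const uint32_t x[4], const uint32_t y[4])
{
    for (int i = 0; i < 4; i++) {
        if (!(x[i] > y[i]))
            return 0;
    }
    return 1;
}

/* For integers, a[i] >= b[i] fails exactly when b[i] > a[i], so
 * "any a[i] >= b[i]" is the negation of "every b[i] > a[i]". */
int any_ge_u32_model(const uint32_t a[4], const uint32_t b[4])
{
    return !all_gt_u32(b, a);
}
```

The same negation does not carry over to floating-point elements, where a NaN makes both a ≥ b and b > a false; that is consistent with the floating-point form using its own greater-than-or-equal compare.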
vec_any_gt
Any Element Greater Than
d = vec_any_gt(a,b)
n ← number of elements
if any a_i > b_i, where i ranges from 0 to n-1
then d ← 1
else d ← 0
The predicate vec_any_gt returns 1 if any element of a is greater than the corresponding element of b. Otherwise, it returns 0. For vector float argument types, if VSCR[NJ] = 1, every denormalized floating-point operand element is truncated to 0 before the comparison. The valid combinations of argument types and the corresponding result type for d = vec_any_gt(a,b) are shown in Figure 4-188, Figure 4-189, Figure 4-190, and Figure 4-191.
[Diagram: each of the sixteen element pairs of a and b is compared (a > b); the sixteen results are ORed (|) to form d.]
d      a                       b                       Maps to
int    vector unsigned char    vector unsigned char    vcmpgtub. x,a,b
int    vector unsigned char    vector bool char        vcmpgtub. x,a,b
int    vector bool char        vector unsigned char    vcmpgtub. x,a,b
int    vector signed char      vector signed char      vcmpgtsb. x,a,b
int    vector signed char      vector bool char        vcmpgtsb. x,a,b
int    vector bool char        vector signed char      vcmpgtsb. x,a,b
Figure 4-188. Any Greater Than of Sixteen Integer Elements (8-Bit)
[Diagram: each of the eight element pairs of a and b is compared (a > b); the eight results are ORed (|) to form d.]
d      a                        b                        Maps to
int    vector unsigned short    vector unsigned short    vcmpgtuh. x,a,b
int    vector unsigned short    vector bool short        vcmpgtuh. x,a,b
int    vector bool short        vector unsigned short    vcmpgtuh. x,a,b
int    vector signed short      vector signed short      vcmpgtsh. x,a,b
int    vector signed short      vector bool short        vcmpgtsh. x,a,b
int    vector bool short        vector signed short      vcmpgtsh. x,a,b
Figure 4-189. Any Greater Than of Eight Integer Elements (16-Bit)
[Diagram: each of the four element pairs of a and b is compared (a > b); the four results are ORed (|) to form d.]
d      a                      b                      Maps to
int    vector unsigned int    vector unsigned int    vcmpgtuw. x,a,b
int    vector unsigned int    vector bool int        vcmpgtuw. x,a,b
int    vector bool int        vector unsigned int    vcmpgtuw. x,a,b
int    vector signed int      vector signed int      vcmpgtsw. x,a,b
int    vector signed int      vector bool int        vcmpgtsw. x,a,b
int    vector bool int        vector signed int      vcmpgtsw. x,a,b
Figure 4-190. Any Greater Than of Four Integer Elements (32-Bit)
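[Diagram: each of the four element pairs of a and b is compared for floating-point greater than (>fp); the four results are ORed (|) to form d.]
d      a               b               Maps to
int    vector float    vector float    vcmpgtfp. x,a,b
Figure 4-191. Any Greater Than of Four Floating-Point Elements (32-Bit)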
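The effect of VSCR[NJ] on the vector float form can be sketched in portable scalar C. The sketch is an assumed model only: the nj parameter stands in for the VSCR[NJ] bit, the function names are hypothetical, and a denormalized input is replaced by positive zero, which is indistinguishable from negative zero as far as the comparison is concerned.

```c
#include <float.h>

/* Assumed model of the NJ = 1 input handling: a value whose magnitude is
 * below the smallest normalized float (FLT_MIN) is zero or denormalized
 * and is replaced by 0. A NaN fails both tests and is passed through. */
static float flush_denormal(float x)
{
    return (x > -FLT_MIN && x < FLT_MIN) ? 0.0f : x;
}

/* Scalar stand-in for vec_any_gt on four 32-bit floats (hypothetical name).
 * nj models VSCR[NJ]: when nonzero, denormalized inputs are treated as 0. */
int any_gt_model(const float a[4], const float b[4], int nj)
{
    for (int i = 0; i < 4; i++) {
        float x = nj ? flush_denormal(a[i]) : a[i];
        float y = nj ? flush_denormal(b[i]) : b[i];
        if (x > y)
            return 1;
    }
    return 0;
}
```

With a = {1e-40f, 0, 0, 0} and b all zeros, the model returns 1 when nj is 0 and 0 when nj is 1, because 1e-40f is a denormalized single-precision value.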
vec_any_le
Any Element Less Than or Equal
d = vec_any_le(a,b)
n ← number of elements
if any a_i ≤ b_i, where i ranges from 0 to n-1
then d ← 1
else d ← 0
The predicate vec_any_le returns 1 if any element of a is less than or equal to the corresponding element of b. Otherwise, it returns 0. For vector float argument types, if VSCR[NJ] = 1, every denormalized floating-point operand element is truncated to 0 before the comparison. The valid combinations of argument types and the corresponding result type for d = vec_any_le(a,b) are shown in Figure 4-192, Figure 4-193, Figure 4-194, and Figure 4-195.
[Diagram: each of the sixteen element pairs of a and b is compared (a ≤ b); the sixteen results are ORed (|) to form d.]
d      a                       b                       Maps to
int    vector unsigned char    vector unsigned char    vcmpgtub. x,a,b
int    vector unsigned char    vector bool char        vcmpgtub. x,a,b
int    vector bool char        vector unsigned char    vcmpgtub. x,a,b
int    vector signed char      vector signed char      vcmpgtsb. x,a,b
int    vector signed char      vector bool char        vcmpgtsb. x,a,b
int    vector bool char        vector signed char      vcmpgtsb. x,a,b
Figure 4-192. Any Less Than or Equal of Sixteen Integer Elements (8-Bit)
[Diagram: each of the eight element pairs of a and b is compared (a ≤ b); the eight results are ORed (|) to form d.]
d      a                        b                        Maps to
int    vector unsigned short    vector unsigned short    vcmpgtuh. x,a,b
int    vector unsigned short    vector bool short        vcmpgtuh. x,a,b
int    vector bool short        vector unsigned short    vcmpgtuh. x,a,b
int    vector signed short      vector signed short      vcmpgtsh. x,a,b
int    vector signed short      vector bool short        vcmpgtsh. x,a,b
int    vector bool short        vector signed short      vcmpgtsh. x,a,b
Figure 4-193. Any Less Than or Equal of Eight Integer Elements (16-Bit)
[Diagram: each of the four element pairs of a and b is compared (a ≤ b); the four results are ORed (|) to form d.]
d      a                      b                      Maps to
int    vector unsigned int    vector unsigned int    vcmpgtuw. x,a,b
int    vector unsigned int    vector bool int        vcmpgtuw. x,a,b
int    vector bool int        vector unsigned int    vcmpgtuw. x,a,b
int    vector signed int      vector signed int      vcmpgtsw. x,a,b
int    vector signed int      vector bool int        vcmpgtsw. x,a,b
int    vector bool int        vector signed int      vcmpgtsw. x,a,b
Figure 4-194. Any Less Than or Equal of Four Integer Elements (32-Bit)
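[Diagram: each of the four element pairs of a and b is compared for floating-point less than or equal (≤fp); the four results are ORed (|) to form d.]
d      a               b               Maps to
int    vector float    vector float    vcmpgefp. x,b,a
Figure 4-195. Any Less Than or Equal of Four Floating-Point Elements (32-Bit)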
vec_any_lt
Any Element Less Than
d = vec_any_lt(a,b)
n ← number of elements
if any a_i < b_i, where i ranges from 0 to n-1
then d ← 1
else d ← 0
The predicate vec_any_lt returns 1 if any element of a is less than the corresponding element of b. Otherwise, it returns 0. For vector float argument types, if VSCR[NJ] = 1, every denormalized floating-point operand element is truncated to 0 before the comparison. The valid combinations of argument types and the corresponding result type for d = vec_any_lt(a,b) are shown in Figure 4-196, Figure 4-197, Figure 4-198, and Figure 4-199.
[Diagram: each of the sixteen element pairs of a and b is compared (a < b); the sixteen results are ORed (|) to form d.]
d      a                       b                       Maps to
int    vector unsigned char    vector unsigned char    vcmpgtub. x,b,a
int    vector unsigned char    vector bool char        vcmpgtub. x,b,a
int    vector bool char        vector unsigned char    vcmpgtub. x,b,a
int    vector signed char      vector signed char      vcmpgtsb. x,b,a
int    vector signed char      vector bool char        vcmpgtsb. x,b,a
int    vector bool char        vector signed char      vcmpgtsb. x,b,a
Figure 4-196. Any Less Than of Sixteen Integer Elements (8-Bit)
[Diagram: each of the eight element pairs of a and b is compared (a < b); the eight results are ORed (|) to form d.]
d      a                        b                        Maps to
int    vector unsigned short    vector unsigned short    vcmpgtuh. x,b,a
int    vector unsigned short    vector bool short        vcmpgtuh. x,b,a
int    vector bool short        vector unsigned short    vcmpgtuh. x,b,a
int    vector signed short      vector signed short      vcmpgtsh. x,b,a
int    vector signed short      vector bool short        vcmpgtsh. x,b,a
int    vector bool short        vector signed short      vcmpgtsh. x,b,a
Figure 4-197. Any Less Than of Eight Integer Elements (16-Bit)
[Diagram: each of the four element pairs of a and b is compared (a < b); the four results are ORed (|) to form d.]
d      a                      b                      Maps to
int    vector unsigned int    vector unsigned int    vcmpgtuw. x,b,a
int    vector unsigned int    vector bool int        vcmpgtuw. x,b,a
int    vector bool int        vector unsigned int    vcmpgtuw. x,b,a
int    vector signed int      vector signed int      vcmpgtsw. x,b,a
int    vector signed int      vector bool int        vcmpgtsw. x,b,a
int    vector bool int        vector signed int      vcmpgtsw. x,b,a
Figure 4-198. Any Less Than of Four Integer Elements (32-Bit)
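[Diagram: each of the four element pairs of a and b is compared for floating-point less than; the four results are ORed to form d.]
d      a               b               Maps to
int    vector float    vector float    vcmpgtfp. x,b,a
Figure 4-199. Any Less Than of Four Floating-Point Elements (32-Bit)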
vec_any_nan
Any Element Not a Number
d = vec_any_nan(a)
if any ISNaN(a_i) = 1, where i ranges from 0 to 3
then d ← 1
else d ← 0
The predicate vec_any_nan returns 1 if any element of a is Not a Number (NaN). Otherwise, it returns 0. The operation is independent of VSCR[NJ]. The valid argument type and corresponding result type for d = vec_any_nan(a) are shown in Figure 4-200.
[Diagram: ISNaN is applied to each of the four elements of a; the four results are ORed (|) to form d.]
d      a               Maps to
int    vector float    vcmpeqfp. x,a,a
Figure 4-200. Any NaN of Four Floating-Point Elements (32-Bit)
vec_any_ne
Any Element Not Equal
d = vec_any_ne(a,b)
n ← number of elements
if any a_i ≠ b_i, where i ranges from 0 to n-1
then d ← 1
else d ← 0
The predicate vec_any_ne returns 1 if any element of a is not equal to (≠) the corresponding element of b. Otherwise, it returns 0. For vector float argument types, if VSCR[NJ] = 1, every denormalized floating-point operand element is truncated to 0 before the comparison. The valid combinations of argument types and the corresponding result types for d = vec_any_ne(a,b) are shown in Figure 4-201, Figure 4-202, Figure 4-203, and Figure 4-204.
[Diagram: each of the sixteen element pairs of a and b is compared for inequality (≠); the sixteen results are ORed (|) to form d.]
d      a                       b                       Maps to
int    vector unsigned char    vector unsigned char    vcmpequb. x,a,b
int    vector unsigned char    vector bool char        vcmpequb. x,a,b
int    vector signed char      vector signed char      vcmpequb. x,a,b
int    vector signed char      vector bool char        vcmpequb. x,a,b
int    vector bool char        vector unsigned char    vcmpequb. x,a,b
int    vector bool char        vector signed char      vcmpequb. x,a,b
int    vector bool char        vector bool char        vcmpequb. x,a,b
Figure 4-201. Any Not Equal of Sixteen Integer Elements (8-Bit)
[Diagram: each of the eight element pairs of a and b is compared for inequality (≠); the eight results are ORed (|) to form d.]
d      a                        b                        Maps to
int    vector unsigned short    vector unsigned short    vcmpequh. x,a,b
int    vector unsigned short    vector bool short        vcmpequh. x,a,b
int    vector signed short      vector signed short      vcmpequh. x,a,b
int    vector signed short      vector bool short        vcmpequh. x,a,b
int    vector bool short        vector unsigned short    vcmpequh. x,a,b
int    vector bool short        vector signed short      vcmpequh. x,a,b
int    vector bool short        vector bool short        vcmpequh. x,a,b
int    vector pixel             vector pixel             vcmpequh. x,a,b
Figure 4-202. Any Not Equal of Eight Integer Elements (16-Bit)
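[Diagram: each of the four element pairs of a and b is compared for inequality (≠); the four results are ORed (|) to form d.]
d      a                      b                      Maps to
int    vector unsigned int    vector unsigned int    vcmpequw. x,a,b
int    vector unsigned int    vector bool int        vcmpequw. x,a,b
int    vector signed int      vector signed int      vcmpequw. x,a,b
int    vector signed int      vector bool int        vcmpequw. x,a,b
int    vector bool int        vector unsigned int    vcmpequw. x,a,b
int    vector bool int        vector signed int      vcmpequw. x,a,b
int    vector bool int        vector bool int        vcmpequw. x,a,b
Figure 4-203. Any Not Equal of Four Integer Elements (32-Bit)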
[Diagram: each of the four element pairs of a and b is compared for inequality (≠); the four results are ORed (|) to form d.]
d      a               b               Maps to
int    vector float    vector float    vcmpeqfp. x,a,b
Figure 4-204. Any Not Equal of Four Floating-Point Elements (32-Bit)
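For the vector float form, one consequence of IEEE comparison rules is worth illustrating: a NaN is not equal to anything, including itself, so a vector that contains a NaN is reported as "not equal" even when compared with an identical copy. The scalar sketch below, in portable C with a hypothetical function name, models only the four-element float case and ignores VSCR[NJ].

```c
/* Scalar stand-in for vec_any_ne on four 32-bit floats (hypothetical name).
 * Returns 1 as soon as one element pair compares not equal. */
int any_ne_model(const float a[4], const float b[4])
{
    for (int i = 0; i < 4; i++) {
        if (a[i] != b[i])
            return 1;
    }
    return 0;
}
```

In this model positive and negative zero compare equal, while a NaN in the same position of both arguments still yields 1.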
vec_any_nge
Any Element Not Greater Than or Equal
d = vec_any_nge(a,b)
if any NGE(a_i, b_i) = 1, where i ranges from 0 to 3
then d ← 1
else d ← 0
The predicate vec_any_nge returns 1 if any element of a is not greater than or equal to (NGE) the corresponding element of b. Otherwise, it returns 0. NGE is true either when the element of a is less than the element of b or when at least one of the two elements is a NaN. If VSCR[NJ] = 1, every denormalized floating-point operand element is truncated to 0 before the comparison. The valid combination of argument types and the corresponding result type for d = vec_any_nge(a,b) are shown in Figure 4-205.
[Diagram: NGE is applied to each of the four element pairs of a and b; the four results are ORed (|) to form d.]
d      a               b               Maps to
int    vector float    vector float    vcmpgefp. x,a,b
Figure 4-205. Any Not Greater Than or Equal of Four Floating-Point Elements (32-Bit)
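The difference between "not greater than or equal" and "less than" shows up only when a NaN is present. The following scalar sketch in portable C places the two side by side; the function names are hypothetical, the arrays stand in for vector float arguments, and VSCR[NJ] is ignored.

```c
/* Scalar stand-in for vec_any_lt on four 32-bit floats (hypothetical name). */
int any_lt_model(const float a[4], const float b[4])
{
    for (int i = 0; i < 4; i++) {
        if (a[i] < b[i])
            return 1;
    }
    return 0;
}

/* Scalar stand-in for vec_any_nge (hypothetical name): NGE is the logical
 * negation of >=, so it is also true when either operand is a NaN. */
int any_nge_model(const float a[4], const float b[4])
{
    for (int i = 0; i < 4; i++) {
        if (!(a[i] >= b[i]))
            return 1;
    }
    return 0;
}
```

For inputs without NaNs the two models agree; with a NaN element and no element of a below b, any_nge_model returns 1 while any_lt_model returns 0.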
vec_any_ngt
Any Element Not Greater Than
d = vec_any_ngt(a,b)
if any NGT(a_i, b_i) = 1, where i ranges from 0 to 3
then d ← 1
else d ← 0
The predicate vec_any_ngt returns 1 if any element of a is not greater than (NGT) the corresponding element of b. Otherwise, it returns 0. NGT is true either when the element of a is less than or equal to the element of b or when at least one of the two elements is a NaN. If VSCR[NJ] = 1, every denormalized floating-point operand element is truncated to 0 before the comparison. The valid combination of argument types and the corresponding result type for d = vec_any_ngt(a,b) are shown in Figure 4-206.
[Diagram: NGT is applied to each of the four element pairs of a and b; the four results are ORed (|) to form d.]
d      a               b               Maps to
int    vector float    vector float    vcmpgtfp. x,a,b
Figure 4-206. Any Not Greater Than of Four Floating-Point Elements (32-Bit)
vec_any_nle
Any Element Not Less Than or Equal
d = vec_any_nle(a,b)
if any NLE(a_i, b_i) = 1, where i ranges from 0 to 3
then d ← 1
else d ← 0
The predicate vec_any_nle returns 1 if any element of a is not less than or equal to (NLE) the corresponding element of b. Otherwise, it returns 0. NLE is true either when the element of a is greater than the element of b or when at least one of the two elements is a NaN. If VSCR[NJ] = 1, every denormalized floating-point operand element is truncated to 0 before the comparison. The valid combination of argument types and the corresponding result type for d = vec_any_nle(a,b) are shown in Figure 4-207.
[Diagram: NLE is applied to each of the four element pairs of a and b; the four results are ORed (|) to form d.]
d      a               b               Maps to
int    vector float    vector float    vcmpgefp. x,b,a
Figure 4-207. Any Not Less Than or Equal of Four Floating-Point Elements (32-Bit)
vec_any_nlt
Any Element Not Less Than
d = vec_any_nlt(a,b)
if any NLT(a_i, b_i) = 1, where i ranges from 0 to 3
then d ← 1
else d ← 0
The predicate vec_any_nlt returns 1 if any element of a is not less than (NLT) the corresponding element of b. Otherwise, it returns 0. NLT is true either when the element of a is greater than or equal to the element of b or when at least one of the two elements is a NaN. If VSCR[NJ] = 1, every denormalized floating-point operand element is truncated to 0 before the comparison. The valid combination of argument types and the corresponding result type for d = vec_any_nlt(a,b) are shown in Figure 4-208.
[Diagram: NLT is applied to each of the four element pairs of a and b; the four results are ORed (|) to form d.]
d      a               b               Maps to
int    vector float    vector float    vcmpgtfp. x,b,a
Figure 4-208. Any Not Less Than of Four Floating-Point Elements (32-Bit)
vec_any_numeric
Any Element Numeric
d = vec_any_numeric(a)
if any ISNUM(a_i) = 1, where i ranges from 0 to 3
then d ← 1
else d ← 0
The predicate vec_any_numeric returns 1 if any element of a is numeric. Otherwise, it returns 0. The operation is independent of VSCR[NJ]. The valid argument type and the corresponding result type for d = vec_any_numeric(a) are shown in Figure 4-209.
[Diagram: ISNUM is applied to each of the four elements of a; the four results are ORed (|) to form d.]
d      a               Maps to
int    vector float    vcmpeqfp. x,a,a
Figure 4-209. Any Numeric of Four Floating-Point Elements (32-Bit)
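The predicates vec_all_nan, vec_all_numeric, vec_any_nan, and vec_any_numeric are all listed against the same self-compare (vcmpeqfp. x,a,a). The scalar sketch below, in portable C, shows one way a single compare can serve all four. It assumes, beyond what this section states, that the compare yields two summary flags ("true in every element" and "false in every element"); the struct and function names are hypothetical.

```c
/* Assumed two-flag summary of an element-wise compare. */
struct cmp_summary {
    int all_true;   /* compare was true in every element  */
    int all_false;  /* compare was false in every element */
};

/* Compare each element with itself; only a NaN compares unequal. */
static struct cmp_summary self_compare(const float a[4])
{
    struct cmp_summary s = {1, 1};
    for (int i = 0; i < 4; i++) {
        if (a[i] == a[i])
            s.all_false = 0;
        else
            s.all_true = 0;
    }
    return s;
}

/* Each predicate reads one flag, directly or negated. */
int model_all_numeric(const float a[4]) { return self_compare(a).all_true; }
int model_any_nan(const float a[4])     { return !self_compare(a).all_true; }
int model_all_nan(const float a[4])     { return self_compare(a).all_false; }
int model_any_numeric(const float a[4]) { return !self_compare(a).all_false; }
```

In this model the "any" predicates are the negations of the opposite "all" predicates, so any_numeric is 0 only when every element is a NaN.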
vec_any_out
Any Element Out of Bounds
d = vec_any_out(a,b)
if any NLE(a_i, b_i) = 1 or any NGE(a_i, -b_i) = 1, where i ranges from 0 to 3
then d ← 1
else d ← 0
The predicate vec_any_out returns 1 if any element of a is greater than the corresponding element of b (high bound) or is less than the negative (NEG) of the corresponding element of b (low bound). Otherwise, it returns 0. If VSCR[NJ] = 1, every denormalized floating-point operand element is truncated to 0 before the comparison. The valid combination of argument types and the corresponding result type for d = vec_any_out(a,b) are shown in Figure 4-210.
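[Diagram: each element of a is tested with NLE against the corresponding element of b and with NGE against the corresponding element of temp, the negation (NEG) of b; all results are ORed (|) to form d.]
d      a               b               Maps to
int    vector float    vector float    vcmpbfp. x,a,b
Figure 4-210. Any Out of Bounds of Four Floating-Point Elements (32-Bit)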
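The bounds test above can be sketched in portable scalar C by transcribing the NLE and NGE conditions directly. The function name is hypothetical, the arrays stand in for vector float arguments, and VSCR[NJ] is ignored.

```c
/* Scalar stand-in for vec_any_out on four 32-bit floats (hypothetical name).
 * An element is out of bounds when NLE(a, b) or NGE(a, -b) holds, that is,
 * when it is not known to lie within the closed interval [-b, b]. */
int any_out_model(const float a[4], const float b[4])
{
    for (int i = 0; i < 4; i++) {
        int above_high = !(a[i] <= b[i]);   /* NLE(a_i, b_i)  */
        int below_low  = !(a[i] >= -b[i]);  /* NGE(a_i, -b_i) */
        if (above_high || below_low)
            return 1;
    }
    return 0;
}
```

Because the conditions are written as negated comparisons, an element equal to either bound counts as in bounds, while a NaN element or a negative bound makes the model return 1.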
Appendix A
AltiVec Instruction Set/Operation/Predicate Cross-Reference
This appendix cross-references the instruction set for the AltiVec technology, the AltiVec vector operations, and the AltiVec predicates. Table A-1 lists the instructions and the alternate vector operation form cross-referenced to the vector operations and predicates.
Table A-1. Instructions to Operations/Predicates Cross-Reference
AltiVec Instruction    Specific Operation    Generic Operation/Predicate
dss                    vec_dss               vec_dss
dssall                 vec_dssall            vec_dssall
dst                    vec_dst               vec_dst
dstst                  vec_dstst             vec_dstst
dststt                 vec_dststt            vec_dststt
dstt                   vec_dstt              vec_dstt
lvebx                  vec_lvebx             vec_lde
lvehx                  vec_lvehx             vec_lde
lvewx                  vec_lvewx             vec_lde
lvsl                   vec_lvsl              vec_lvsl
lvsr                   vec_lvsr              vec_lvsr
lvx                    vec_lvx               vec_ld
lvxl                   vec_lvxl              vec_ldl
mfvscr                 vec_mfvscr            vec_mfvscr
mtvscr                 vec_mtvscr            vec_mtvscr
stvebx                 vec_stvebx            vec_ste
stvehx                 vec_stvehx            vec_ste
stvewx                 vec_stvewx            vec_ste
AltiVec Instruction    Specific Operation    Generic Operation/Predicate
stvx                   vec_stvx              vec_st
stvxl                  vec_stvxl             vec_stl
vaddcuw                vec_vaddcuw           vec_addc
vaddfp                 vec_vaddfp            vec_add
vaddsbs                vec_vaddsbs           vec_adds
vaddshs                vec_vaddshs           vec_adds
vaddsws                vec_vaddsws           vec_adds
vaddubm                vec_vaddubm           vec_add
vaddubs                vec_vaddubs           vec_adds
vadduhm                vec_vadduhm           vec_add
vadduhs                vec_vadduhs           vec_adds
vadduwm                vec_vadduwm           vec_add
vadduws                vec_vadduws           vec_adds
vand                   vec_vand              vec_and
vandc                  vec_vandc             vec_andc
vavgsb                 vec_vavgsb            vec_avg
vavgsh                 vec_vavgsh            vec_avg
vavgsw                 vec_vavgsw            vec_avg
vavgub                 vec_vavgub            vec_avg
vavguh                 vec_vavguh            vec_avg
vavguw                 vec_vavguw            vec_avg
vcfsx                  vec_vcfsx             vec_ctf
vcfux                  vec_vcfux             vec_ctf
vcmpbfpx               vec_vcmpbfpx          vec_cmpb
vcmpbfp.               --                    vec_all_in, vec_any_out
vcmpeqfpx              vec_vcmpeqfpx         vec_cmpeq
vcmpeqfp.              --                    vec_all_eq, vec_all_nan, vec_all_ne, vec_all_numeric, vec_any_eq, vec_any_nan, vec_any_ne, vec_any_numeric
vcmpequbx              vec_vcmpequbx         vec_cmpeq
vcmpequb.              --                    vec_all_eq, vec_all_ne, vec_any_eq, vec_any_ne
vcmpequhx              vec_vcmpequhx         vec_cmpeq
AltiVec Instruction    Specific Operation    Generic Operation/Predicate
vcmpequh.              --                    vec_all_eq, vec_all_ne, vec_any_eq, vec_any_ne
vcmpequwx              vec_vcmpequwx         vec_cmpeq
vcmpequw.              --                    vec_all_eq, vec_all_ne, vec_any_eq, vec_any_ne
vcmpgefpx              vec_vcmpgefpx         vec_cmpge, vec_cmple
vcmpgefp.              --                    vec_all_ge, vec_all_le, vec_all_nge, vec_all_nle, vec_any_ge, vec_any_le, vec_any_nge, vec_any_nle
vcmpgtfpx              vec_vcmpgtfpx         vec_cmpgt, vec_cmplt
vcmpgtfp.              --                    vec_all_gt, vec_all_lt, vec_all_ngt, vec_all_nlt, vec_any_gt, vec_any_lt, vec_any_ngt, vec_any_nlt
vcmpgtsbx              vec_vcmpgtsbx         vec_cmpgt, vec_cmplt
vcmpgtsb.              --                    vec_all_ge, vec_all_gt, vec_all_le, vec_all_lt, vec_any_ge, vec_any_gt, vec_any_le, vec_any_lt
vcmpgtshx              vec_vcmpgtshx         vec_cmpgt, vec_cmplt
vcmpgtsh.              --                    vec_all_ge, vec_all_gt, vec_all_le, vec_all_lt, vec_any_ge, vec_any_gt, vec_any_le, vec_any_lt
vcmpgtswx              vec_vcmpgtswx         vec_cmpgt, vec_cmplt
vcmpgtsw.              --                    vec_all_ge, vec_all_gt, vec_all_le, vec_all_lt, vec_any_ge, vec_any_gt, vec_any_le, vec_any_lt
vcmpgtubx              vec_vcmpgtubx         vec_cmpgt, vec_cmplt
vcmpgtub.              --                    vec_all_ge, vec_all_gt, vec_all_le, vec_all_lt, vec_any_ge, vec_any_gt, vec_any_le, vec_any_lt
vcmpgtuhx              vec_vcmpgtuhx         vec_cmpgt, vec_cmplt
vcmpgtuh.              --                    vec_all_ge, vec_all_gt, vec_all_le, vec_all_lt, vec_any_ge, vec_any_gt, vec_any_le, vec_any_lt
vcmpgtuwx              vec_vcmpgtuwx         vec_cmpgt, vec_cmplt
vcmpgtuw.              --                    vec_all_ge, vec_all_gt, vec_all_le, vec_all_lt, vec_any_ge, vec_any_gt, vec_any_le, vec_any_lt
vctsxs                 vec_vctsxs            vec_cts
vctuxs                 vec_vctuxs            vec_ctu
vexptefp               vec_vexptefp          vec_expte
AltiVec Instruction    Specific Operation    Generic Operation/Predicate
vlogefp                vec_vlogefp           vec_loge
vmaddfp                vec_vmaddfp           vec_madd
vmaxfp                 vec_vmaxfp            vec_max
vmaxsb                 vec_vmaxsb            vec_max
vmaxsh                 vec_vmaxsh            vec_max
vmaxsw                 vec_vmaxsw            vec_max
vmaxub                 vec_vmaxub            vec_max
vmaxuh                 vec_vmaxuh            vec_max
vmaxuw                 vec_vmaxuw            vec_max
vmhaddshs              vec_vmhaddshs         vec_madds
vmhraddshs             vec_vmhraddshs        vec_mradds
vminfp                 vec_vminfp            vec_min
vminsb                 vec_vminsb            vec_min
vminsh                 vec_vminsh            vec_min
vminsw                 vec_vminsw            vec_min
vminub                 vec_vminub            vec_min
vminuh                 vec_vminuh            vec_min
vminuw                 vec_vminuw            vec_min
vmladduhm              vec_vmladduhm         vec_mladd
vmrghb                 vec_vmrghb            vec_mergeh
vmrghh                 vec_vmrghh            vec_mergeh
vmrghw                 vec_vmrghw            vec_mergeh
vmrglb                 vec_vmrglb            vec_mergel
vmrglh                 vec_vmrglh            vec_mergel
vmrglw                 vec_vmrglw            vec_mergel
vmsummbm               vec_vmsummbm          vec_msum
vmsumshm               vec_vmsumshm          vec_msum
vmsumshs               vec_vmsumshs          vec_msums
vmsumubm               vec_vmsumubm          vec_msum
vmsumuhm               vec_vmsumuhm          vec_msum
vmsumuhs               vec_vmsumuhs          vec_msums
vmulesb                vec_vmulesb           vec_mule
AltiVec Instruction    Specific Operation    Generic Operation/Predicate
vmulesh                vec_vmulesh           vec_mule
vmuleub                vec_vmuleub           vec_mule
vmuleuh                vec_vmuleuh           vec_mule
vmulosb                vec_vmulosb           vec_mulo
vmulosh                vec_vmulosh           vec_mulo
vmuloub                vec_vmuloub           vec_mulo
vmulouh                vec_vmulouh           vec_mulo
vnmsubfp               vec_vnmsubfp          vec_nmsub
vnor                   vec_vnor              vec_nor
vor                    vec_vor               vec_or
vperm                  vec_vperm             vec_perm
vpkpx                  vec_vpkpx             vec_packpx
vpkshss                vec_vpkshss           vec_packs
vpkshus                vec_vpkshus           vec_packsu
vpkswss                vec_vpkswss           vec_packs
vpkswus                vec_vpkswus           vec_packsu
vpkuhum                vec_vpkuhum           vec_pack
vpkuhus                vec_vpkuhus           vec_packs, vec_packsu
vpkuwum                vec_vpkuwum           vec_pack
vpkuwus                vec_vpkuwus           vec_packs, vec_packsu
vrefp                  vec_vrefp             vec_re
vrfim                  vec_vrfim             vec_floor
vrfin                  vec_vrfin             vec_round
vrfip                  vec_vrfip             vec_ceil
vrfiz                  vec_vrfiz             vec_trunc
vrlb                   vec_vrlb              vec_rl
vrlh                   vec_vrlh              vec_rl
vrlw                   vec_vrlw              vec_rl
vrsqrtefp              vec_vrsqrtefp         vec_rsqrte
vsel                   vec_vsel              vec_sel
vsl                    vec_vsl               vec_sll
vslb                   vec_vslb              vec_sl
AltiVec Instruction    Specific Operation    Generic Operation/Predicate
vsldoi                 vec_vsldoi            vec_sld
vslh                   vec_vslh              vec_sl
vslo                   vec_vslo              vec_slo
vslw                   vec_vslw              vec_sl
vspltb                 vec_vspltb            vec_splat
vsplth                 vec_vsplth            vec_splat
vspltisb               vec_vspltisb          vec_splat_s8, vec_splat_u8
vspltish               vec_vspltish          vec_splat_s16, vec_splat_u16
vspltisw               vec_vspltisw          vec_splat_s32, vec_splat_u32
vspltw                 vec_vspltw            vec_splat
vsr                    vec_vsr               vec_srl
vsrab                  vec_vsrab             vec_sra
vsrah                  vec_vsrah             vec_sra
vsraw                  vec_vsraw             vec_sra
vsrb                   vec_vsrb              vec_sr
vsrh                   vec_vsrh              vec_sr
vsro                   vec_vsro              vec_sro
vsrw                   vec_vsrw              vec_sr
vsubcuw                vec_vsubcuw           vec_subc
vsubfp                 vec_vsubfp            vec_sub
vsubsbs                vec_vsubsbs           vec_subs
vsubshs                vec_vsubshs           vec_subs
vsubsws                vec_vsubsws           vec_subs
vsububm                vec_vsububm           vec_sub
vsububs                vec_vsububs           vec_subs
vsubuhm                vec_vsubuhm           vec_sub
vsubuhs                vec_vsubuhs           vec_subs
vsubuwm                vec_vsubuwm           vec_sub
vsubuws                vec_vsubuws           vec_subs
vsumsws                vec_vsumsws           vec_sums
vsum2sws               vec_vsum2sws          vec_sum2s
vsum4sbs               vec_vsum4sbs          vec_sum4s
AltiVec Instruction    Specific Operation    Generic Operation/Predicate
vsum4shs               vec_vsum4shs          vec_sum4s
vsum4ubs               vec_vsum4ubs          vec_sum4s
vupkhpx                vec_vupkhpx           vec_unpackh
vupkhsb                vec_vupkhsb           vec_unpackh
vupkhsh                vec_vupkhsh           vec_unpackh
vupklpx                vec_vupklpx           vec_unpackl
vupklsb                vec_vupklsb           vec_unpackl
vupklsh                vec_vupklsh           vec_unpackl
vxor                   vec_vxor              vec_xor
Table A-2 lists the vector operations cross-referenced to the AltiVec instructions.
Table A-2. Operations to Instructions Cross-Reference
Specific Operation    AltiVec Instruction(s)
vec_abs               vspltisb, vsububm, vmaxsb
                      vspltisb, vsubuhm, vmaxsh
                      vspltisb, vsubuwm, vmaxsw
                      vspltisw, vslw, vandc
vec_abss              vspltisb, vsubsbs, vmaxsb
                      vspltisb, vsubshs, vmaxsh
                      vspltisb, vsubsws, vmaxsw
vec_add               vaddfp
                      vaddubm
                      vadduhm
                      vadduwm
vec_addc              vaddcuw
vec_adds              vaddsbs
                      vaddshs
                      vaddsws
                      vaddubs
                      vadduhs
                      vadduws
vec_and               vand
Specific Operation    AltiVec Instruction(s)
vec_andc              vandc
vec_avg               vavgsb
                      vavgsh
                      vavgsw
                      vavgub
                      vavguh
                      vavguw
vec_ceil              vrfip
vec_cmpb              vcmpbfpx
vec_cmpeq             vcmpeqfpx
                      vcmpequbx
                      vcmpequhx
                      vcmpequwx
vec_cmpge             vcmpgefpx
vec_cmpgt             vcmpgtfpx
                      vcmpgtsbx
                      vcmpgtshx
                      vcmpgtswx
                      vcmpgtubx
                      vcmpgtuhx
                      vcmpgtuwx
vec_cmple             vcmpgefpx
vec_cmplt             vcmpgtfpx
                      vcmpgtsbx
                      vcmpgtshx
                      vcmpgtswx
                      vcmpgtubx
                      vcmpgtuhx
                      vcmpgtuwx
vec_ctf               vcfsx
                      vcfux
vec_cts               vctsxs
Specific Operation    AltiVec Instruction(s)
vec_ctu               vctuxs
vec_dss               dss
vec_dssall            dssall
vec_dst               dst
vec_dstst             dstst
vec_dststt            dststt
vec_dstt              dstt
vec_expte             vexptefp
vec_floor             vrfim
vec_ld                lvx
vec_lde               lvebx
                      lvehx
                      lvewx
vec_ldl               lvxl
vec_loge              vlogefp
vec_lvsl              lvsl
vec_lvsr              lvsr
vec_madd              vmaddfp
vec_madds             vmhaddshs
vec_max               vmaxfp
                      vmaxsb
                      vmaxsh
                      vmaxsw
                      vmaxub
                      vmaxuh
                      vmaxuw
vec_mergeh            vmrghw
                      vmrghb
                      vmrghh
vec_mergel            vmrglw
                      vmrglb
                      vmrglh
Specific Operation    AltiVec Instruction(s)
vec_mfvscr            mfvscr
vec_min               vminfp
                      vminsb
                      vminsh
                      vminsw
                      vminub
                      vminuh
                      vminuw
vec_mladd             vmladduhm
vec_mradds            vmhraddshs
vec_msum              vmsummbm
                      vmsumshm
                      vmsumubm
                      vmsumuhm
vec_msums             vmsumshs
                      vmsumuhs
vec_mtvscr            mtvscr
vec_mule              vmulesb
                      vmulesh
                      vmuleub
                      vmuleuh
vec_mulo              vmulosb
                      vmulosh
                      vmuloub
                      vmulouh
vec_nmsub             vnmsubfp
vec_nor               vnor
vec_or                vor
vec_pack              vpkuhum
                      vpkuwum
vec_packpx            vpkpx
Specific Operation    AltiVec Instruction(s)
vec_packs             vpkshss
                      vpkswss
                      vpkuhus
                      vpkuwus
vec_packsu            vpkuhus
                      vpkuwus
                      vpkshus
                      vpkswus
vec_perm              vperm
vec_re                vrefp
vec_rl                vrlb
                      vrlh
                      vrlw
vec_round             vrfin
vec_rsqrte            vrsqrtefp
vec_sel               vsel
vec_sl                vslb
                      vslh
                      vslw
vec_sld               vsldoi
vec_sll               vsl
vec_slo               vslo
vec_splat             vspltb
                      vsplth
                      vspltw
vec_splat_s16         vspltish
vec_splat_s32         vspltisw
vec_splat_s8          vspltisb
vec_splat_u16         vspltish
vec_splat_u32         vspltisw
vec_splat_u8          vspltisb
Specific Operation    AltiVec Instruction(s)
vec_sr                vsrb
                      vsrh
                      vsrw
vec_sra               vsrab
                      vsrah
                      vsraw
vec_srl               vsr
vec_sro               vsro
vec_st                stvx
vec_ste               stvebx
                      stvehx
                      stvewx
vec_stl               stvxl
vec_sub               vsubfp
                      vsububm
                      vsubuhm
                      vsubuwm
vec_subc              vsubcuw
vec_subs              vsubsbs
                      vsubshs
                      vsubsws
                      vsububs
                      vsubuhs
                      vsubuws
vec_sum2s             vsum2sws
vec_sum4s             vsum4sbs
                      vsum4shs
                      vsum4ubs
vec_sums              vsumsws
vec_trunc             vrfiz
Specific Operation    AltiVec Instruction(s)
vec_unpackh           vupkhpx
                      vupkhsb
                      vupkhsh
vec_unpackl           vupklpx
                      vupklsb
                      vupklsh
vec_xor               vxor
Table A-3 lists the predicates cross-referenced to the AltiVec instructions.
Table A-3. Predicate to Instruction Cross-Reference
Predicate             AltiVec Instruction
vec_all_eq            vcmpeqfp.
                      vcmpequb.
                      vcmpequh.
                      vcmpequw.
vec_all_ge            vcmpgtsb.
                      vcmpgtsh.
                      vcmpgtsw.
                      vcmpgtub.
                      vcmpgtuh.
                      vcmpgtuw.
                      vcmpgefp.
vec_all_gt            vcmpgtsb.
                      vcmpgtsh.
                      vcmpgtsw.
                      vcmpgtub.
                      vcmpgtuh.
                      vcmpgtuw.
                      vcmpgtfp.
vec_all_in            vcmpbfp.
vec_all_le            vcmpgtsb.
                      vcmpgtsh.
                      vcmpgtsw.
                      vcmpgtub.
                      vcmpgtuh.
                      vcmpgtuw.
                      vcmpgefp.
Predicate             AltiVec Instruction
vec_all_lt            vcmpgtsb.
                      vcmpgtsh.
                      vcmpgtsw.
                      vcmpgtub.
                      vcmpgtuh.
                      vcmpgtuw.
                      vcmpgtfp.
vec_all_nan           vcmpeqfp.
vec_all_ne            vcmpeqfp.
                      vcmpequb.
                      vcmpequh.
                      vcmpequw.
vec_all_nge           vcmpgefp.
vec_all_ngt           vcmpgtfp.
vec_all_nle           vcmpgefp.
vec_all_nlt           vcmpgtfp.
vec_all_numeric       vcmpeqfp.
vec_any_eq            vcmpeqfp.
                      vcmpequb.
                      vcmpequh.
                      vcmpequw.
vec_any_ge            vcmpgtsb.
                      vcmpgtsh.
                      vcmpgtsw.
                      vcmpgtub.
                      vcmpgtuh.
                      vcmpgtuw.
                      vcmpgefp.
Predicate             AltiVec Instruction
vec_any_gt            vcmpgtsb.
                      vcmpgtsh.
                      vcmpgtsw.
                      vcmpgtub.
                      vcmpgtuh.
                      vcmpgtuw.
                      vcmpgtfp.
vec_any_le            vcmpgtsb.
                      vcmpgtsh.
                      vcmpgtsw.
                      vcmpgtub.
                      vcmpgtuh.
                      vcmpgtuw.
                      vcmpgefp.
vec_any_lt            vcmpgtsb.
                      vcmpgtsh.
                      vcmpgtsw.
                      vcmpgtub.
                      vcmpgtuh.
                      vcmpgtuw.
                      vcmpgtfp.
vec_any_nan           vcmpeqfp.
vec_any_ne            vcmpeqfp.
                      vcmpequb.
                      vcmpequh.
                      vcmpequw.
vec_any_nge           vcmpgefp.
vec_any_ngt           vcmpgtfp.
vec_any_nle           vcmpgefp.
vec_any_nlt           vcmpgtfp.
vec_any_numeric       vcmpeqfp.
vec_any_out           vcmpbfp.
Glossary of Terms and Abbreviations
The glossary contains an alphabetical list of terms, phrases, and abbreviations used in this book. Some of the terms and definitions included in the glossary are reprinted from IEEE Std. 754-1985, IEEE Standard for Binary Floating-Point Arithmetic, copyright (c)1985 by the Institute of Electrical and Electronics Engineers, Inc. with the permission of the IEEE. Note that some terms are defined in the context of how they are used in this book.
A
Architecture. A detailed specification of requirements for a processor or computer system. It does not specify details of how the processor or computer system must be implemented; instead it provides a template for a family of compatible implementations.
B
Biased exponent. An exponent whose range of values is shifted by a constant (bias). Typically a bias is provided to allow a range of positive values to express a range that includes both positive and negative values.
Big-endian. A byte-ordering method in memory where the address n of a word corresponds to the most-significant byte. In an addressed memory word, the bytes are ordered (left to right) 0, 1, 2, 3, with 0 being the most-significant byte. See Little-endian.
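The byte-ordering definitions in this glossary can be illustrated with a short C sketch. It is not taken from the manual: the function names store_be32, store_le32, and load_be32 are hypothetical, and the code only demonstrates which byte lands at the lowest address under each ordering.

```c
#include <stdint.h>

/* Big-endian: byte address 0 of the word holds the most-significant byte. */
void store_be32(uint8_t out[4], uint32_t word)
{
    out[0] = (uint8_t)(word >> 24);
    out[1] = (uint8_t)(word >> 16);
    out[2] = (uint8_t)(word >> 8);
    out[3] = (uint8_t)word;
}

/* Little-endian: byte address 0 of the word holds the least-significant byte. */
void store_le32(uint8_t out[4], uint32_t word)
{
    out[0] = (uint8_t)word;
    out[1] = (uint8_t)(word >> 8);
    out[2] = (uint8_t)(word >> 16);
    out[3] = (uint8_t)(word >> 24);
}

/* Reassemble a word from four bytes stored in big-endian order. */
uint32_t load_be32(const uint8_t in[4])
{
    return ((uint32_t)in[0] << 24) | ((uint32_t)in[1] << 16) |
           ((uint32_t)in[2] << 8)  |  (uint32_t)in[3];
}
```

Because the helpers use shifts rather than pointer casts, they give the same result regardless of the byte order of the machine that runs them.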
C
Cache. High-speed memory component containing recently-accessed data and/or instructions (subset of main memory).
Cast. A cast expression consists of a left parenthesis, a type name, a right parenthesis, and an operand expression. The cast causes the operand value to be converted to the type name within the parentheses.
D
Denormalized number. A nonzero floating-point number whose exponent has a reserved value, usually the format's minimum, and whose explicit or implicit leading significand bit is zero.
E
Effective address (EA). The 32- or 64-bit address specified for a load, store, or an instruction fetch. This address is then submitted to the MMU for translation to either a physical memory address or an I/O address.
Exponent. In the binary representation of a floating-point number, the exponent is the component that normally signifies the integer power to which the value two is raised in determining the value of the represented number. See also Biased exponent.
F
Floating-point register (FPR). Any of the 32 registers in the floating-point register file. These registers provide the source operands and destination results for floating-point instructions. Load instructions move data from memory to FPRs and store instructions move data from FPRs to memory. The FPRs are 64 bits wide and store floating-point values in double-precision format.
Fraction. In the binary representation of a floating-point number, the field of the significand that lies to the right of its implied binary point.
G
General-purpose register (GPR). Any of the 32 registers in the general-purpose register file. These registers provide the source operands and destination results for all integer data manipulation instructions. Integer load instructions move data from memory to GPRs and store instructions move data from GPRs to memory.
I
IEEE 754. A standard written by the Institute of Electrical and Electronics Engineers that defines operations and representations of binary floating-point arithmetic.
Inexact. Loss of accuracy in an arithmetic operation when the rounded result differs from the infinitely precise value with unbounded range.
L
Least-significant bit (lsb). The bit of least value in an address, register, data element, or instruction encoding.
Little-endian. A byte-ordering method in memory where the address n of a word corresponds to the least-significant byte. In an addressed memory word, the bytes are ordered (left to right) 3, 2, 1, 0, with 3 being the most-significant byte. See Big-endian.
M
Mnemonic. The abbreviated name of an instruction used for coding.
Modulo. A value v which lies outside the range of numbers representable by an n-bit wide destination type is replaced by the low-order n bits of the two's complement representation of v.
Most-significant bit (msb). The highest-order bit in an address, register, data element, or instruction encoding.
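As a rough illustration of the Modulo definition, the following C sketch reduces a wider value to an 8-bit destination by keeping only its low-order 8 bits. The function name modulo_u8 and the choice of n = 8 are assumptions made for this example, not names from the manual.

```c
#include <stdint.h>

/* Modulo reduction to an 8-bit destination: keep only the low-order 8 bits
   of the two's complement representation of v. Converting a negative
   int32_t to uint32_t is defined by C to yield that representation. */
uint8_t modulo_u8(int32_t v)
{
    return (uint8_t)((uint32_t)v & 0xFFu);
}
```

For example, 250 + 10 = 260 does not fit in 8 bits, so modulo_u8(260) wraps to 4, and modulo_u8(-1) yields 255.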
N
NaN. An abbreviation for "Not a Number"; a symbolic entity encoded in floating-point format. There are two types of NaNs: signaling NaNs (SNaNs) and quiet NaNs (QNaNs).
Normalization. A process by which a floating-point value is manipulated such that it can be represented in the format for the appropriate precision (single- or double-precision). For a floating-point value to be representable in the single- or double-precision format, the leading implied bit must be a 1.
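A property that follows from IEEE 754 is that a NaN never compares equal to anything, including itself. The scalar C sketch below uses that property to detect NaNs in a four-element array. It is only an illustration: the function names are hypothetical, and it is not the AltiVec implementation, although Table A-3 does list the equality compare vcmpeqfp. for vec_all_nan and vec_all_numeric.

```c
#include <math.h>
#include <stdbool.h>

/* Return a quiet NaN; NAN is the standard C99 macro from <math.h>. */
float quiet_nan(void)
{
    return NAN;
}

/* A NaN is the only floating-point value that is not equal to itself. */
bool is_nan_by_compare(float x)
{
    return x != x;
}

/* True if every one of the four elements is a number (none is a NaN). */
bool all_numeric4(const float v[4])
{
    for (int i = 0; i < 4; i++) {
        if (is_nan_by_compare(v[i])) {
            return false;
        }
    }
    return true;
}

/* True if at least one of the four elements is a NaN. */
bool any_nan4(const float v[4])
{
    return !all_numeric4(v);
}
```

Note that compiler options that relax IEEE semantics (for example, fast-math modes) may optimize the self-comparison away, so this sketch assumes strict floating-point behavior.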
O
Overflow. An error condition that occurs during arithmetic operations when the result cannot be stored accurately in the destination register(s). For example, if two 32-bit numbers are multiplied, the result may not be representable in 32 bits.
Q
Quad word. A group of 16 contiguous locations starting at an address divisible by 16.
Quiet NaN. A type of NaN that can propagate through most arithmetic operations without signaling exceptions. A quiet NaN is used to represent the results of certain invalid operations, such as invalid arithmetic operations on infinities or on NaNs, when invalid. See Signaling NaN.
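The quad-word definition (16 contiguous locations starting at an address divisible by 16) can be checked with simple address arithmetic. The C sketch below is an illustration only; the function names are hypothetical and addresses are modeled as uintptr_t values.

```c
#include <stdbool.h>
#include <stdint.h>

/* An address starts a quad word when it is divisible by 16,
   that is, when its four low-order bits are all zero. */
bool is_quad_word_aligned(uintptr_t addr)
{
    return (addr & (uintptr_t)0xF) == 0;
}

/* Address of the first byte of the quad word that contains addr. */
uintptr_t quad_word_base(uintptr_t addr)
{
    return addr & ~(uintptr_t)0xF;
}
```

For instance, the addresses 0x1000 through 0x100F all belong to the quad word whose base is 0x1000.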
R
Record bit. Bit 31 (or the Rc bit) in the instruction encoding. When it is set, the instruction updates the condition register (CR) to reflect the result of the operation. Its presence is denoted by a "." following the mnemonic.
Reserved field. In a register, a reserved field is one that is not assigned a function. A reserved field may be a single bit. The handling of reserved bits is implementation-dependent. Software is permitted to write any value to such a bit. A subsequent reading of the bit returns 0 if the value last written to the bit was 0 and returns an undefined value (0 or 1) otherwise.
RISC (reduced instruction set computing). An architecture characterized by fixed-length instructions with nonoverlapping functionality and by a separate set of load and store instructions that perform memory accesses.
S
Saturate. A value v which lies outside the range of numbers representable by a destination type is replaced by the representable number closest to v.
Signaling NaN. A type of NaN that generates an invalid operation program exception when it is specified as an arithmetic operand. See Quiet NaN.
Significand. The component of a binary floating-point number that consists of an explicit or implicit leading bit to the left of its implied binary point and a fraction field to the right.
Splat. A splat instruction will take one element and replicate (splat) that value into a vector register.
Sticky bit. A bit that when set must be cleared explicitly.
Supervisor mode. The privileged operation state of a processor. In supervisor mode, software, typically the operating system, can access all control registers and can access the supervisor memory space, among other privileged operations.
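The Saturate definition can be sketched in scalar C as a clamp to the range of the destination type. The function names saturate_s8 and saturate_u8 are hypothetical, and the 8-bit destination types are chosen only for illustration.

```c
#include <stdint.h>

/* Saturate to a signed 8-bit destination: out-of-range values are
   replaced by the closest representable value, -128 or 127. */
int8_t saturate_s8(int32_t v)
{
    if (v > INT8_MAX) {
        return INT8_MAX;
    }
    if (v < INT8_MIN) {
        return INT8_MIN;
    }
    return (int8_t)v;
}

/* Saturate to an unsigned 8-bit destination: clamp to the range 0..255. */
uint8_t saturate_u8(int32_t v)
{
    if (v > UINT8_MAX) {
        return UINT8_MAX;
    }
    if (v < 0) {
        return 0;
    }
    return (uint8_t)v;
}
```

Compare this with the Modulo definition: the value 260 saturates to 255 for an unsigned 8-bit destination, whereas modulo arithmetic keeps only the low-order 8 bits and yields 4.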
T
Tiny. A floating-point value that is too small to be represented for a particular precision format, including denormalized numbers; they do not include 0.
U
Underflow. An error condition that occurs during arithmetic operations when the result cannot be represented accurately in the destination register. For example, underflow can happen if two floating-point fractions are multiplied and the result requires a smaller exponent and/or mantissa than the single-precision format can provide. In other words, the result is too small to be represented accurately.
User mode. The unprivileged operating state of a processor used typically by application software. In user mode, software can only access certain control registers and can access only user memory space. No privileged operations can be performed. Also referred to as problem state.
V
Vector Literal. A vector literal is a constant expression with a value that is taken as a vector type. See Section 2.5.1, "Vector Literals," for details.
Vector Register (VR). Any of the 32 registers in the vector register file. Each vector register is 128 bits wide. These registers can provide the source operands and destination results for AltiVec instructions.
W
Word. A 32-bit data element.
INDEX
Symbols
#pragma altivec_codegen 2-10
#pragma altivec_model 2-10
#pragma altivec_vrsave 2-10
__pixel 2-2, 2-3
__va_arg 3-9
__vector 2-2, 2-3
A
ABI 1-1, 1-2, 3-1
ABS 4-4, 4-8, 4-10
AIX ABI 3-1, 3-2, 3-10
    stack frame 3-5
aligning data from an unaligned address 4-54, 4-55
alignment
    aggregates and unions containing vector types 2-3
    non-vector types 2-3
    vector types 2-3
AltiVec registers 3-1
Apple Macintosh ABI 3-1, 3-2, 3-10
    stack frame 3-5
B
bool 2-2, 2-3
BorrowOut 4-4, 4-121
BoundAlign 4-4, 4-48, 4-50, 4-51, 4-112, 4-114, 4-116
byte ordering 4-3
C
cache touches
    all 4-37
    loads 4-38
    stores 4-40
    tag a 4-36
    transient loads 4-44
    transient stores 4-42
calloc 3-10
CarryOut 4-4, 4-15
casts 2-5
Ceil 4-4, 4-23
condition register CR6 2-9
cross-reference
    AltiVec Instructions to Operations/Predicates A-1
    AltiVec Operations to Instructions A-7
    AltiVec Predicates to Instructions A-14
D
data stream 4-36, 4-37, 4-38, 4-40, 4-42, 4-44
DataStreamPrefetchControl 4-36, 4-37, 4-38, 4-40, 4-42, 4-44
debugging information 3-11
DWARF 3-12
E
EABI 3-1, 3-2, 3-3, 3-9
Effective Address 4-48, 4-50, 4-51, 4-54, 4-55, 4-112, 4-114, 4-116
F
Floor 4-5, 4-47
FP2xEst 4-5, 4-46
FPLog2Est 4-5, 4-53
FPRecipEst 4-5, 4-85
fprintf 3-12
fscanf 3-12
G
generic AltiVec operation 2-8
H
high-level language interface 1-1, 2-1
high-order byte numbering 4-3
I
ISNaN 4-5, 4-150, 4-174
ISNUM 4-5, 4-158, 4-182
L
longjmp 3-11
M
malloc 3-10
MAX 4-5, 4-58
MEM 4-5, 4-48, 4-50, 4-51, 4-112, 4-114, 4-116
MIN 4-5, 4-66
mod 4-50
N
NaN 4-5, 4-24, 4-58, 4-66, 4-85, 4-150, 4-154, 4-155, 4-156, 4-157, 4-174, 4-178, 4-179, 4-180, 4-181
NEG 4-5, 4-183
NGE 4-5, 4-154, 4-178, 4-183
NGT 4-5, 4-155, 4-179
NJ bit 4-2, 4-8, 4-12, 4-23, 4-24, 4-25, 4-27, 4-28, 4-30, 4-31, 4-33, 4-34, 4-35, 4-46, 4-47, 4-53, 4-56, 4-58, 4-66, 4-77, 4-85, 4-88, 4-89, 4-118, 4-127, 4-134, 4-137, 4-140, 4-143, 4-144, 4-147, 4-150, 4-151, 4-154, 4-155, 4-156, 4-157, 4-158, 4-159, 4-162, 4-165, 4-168, 4-171, 4-174, 4-175, 4-178, 4-179, 4-180, 4-181, 4-182, 4-183
NLE 4-5, 4-156, 4-180, 4-183
NLT 4-5, 4-157, 4-181
non-Java mode. See NJ bit
notation and conventions 4-4
O
operation description format 4-7
operator new 3-10
P
parameter passing 3-9, 3-10
pixel 2-2, 2-3, 4-81, 4-128, 4-130
pointer arithmetic 2-4
pointer dereferencing 2-4
precedence rules 4-6
predicate 2-8, 4-133
printf 3-12
pseudocode 4-4
Q
QNaN 4-5, 4-58, 4-66, 4-85
R
realloc 3-10
RecipSQRTEst 4-5, 4-89
register usage conventions 3-1
RndToFPINear 4-5, 4-88
RndToFPITrunc 4-5, 4-127
RndToFPNearest 4-5, 4-56, 4-77
ROTL 4-5, 4-86
Round to Nearest 4-88
Round toward +Infinity 4-23
Round toward Zero 4-127
Round towards -Infinity 4-47
S
SAT bit 4-1, 4-2, 4-10, 4-16, 4-34, 4-35, 4-57, 4-70, 4-73, 4-82, 4-83, 4-122, 4-124, 4-125, 4-126
Saturate 4-5, 4-10, 4-16, 4-34, 4-35, 4-57, 4-70, 4-73, 4-82, 4-83, 4-122, 4-124, 4-125, 4-126
saturation. See SAT bit
save and restore functions 3-7
scanf 3-12
setjmp 3-11
ShiftLeft 4-5, 4-91, 4-94
ShiftRight 4-5, 4-105, 4-109
ShiftRightA 4-5, 4-107
SignExtend 4-5, 4-99, 4-100, 4-101, 4-102, 4-103, 4-104, 4-128, 4-130
SIToFP 4-5, 4-33
sizeof 2-4
specific AltiVec operation 2-8
sprintf 3-12
sscanf 3-12
stack frame 1-2, 3-2, 3-5
SVR4 ABI 3-1, 3-2, 3-3, 3-9
T
type casting 2-5
types 2-5
U
UIToUImod 4-6, 4-80
Undefined 4-6, 4-50, 4-94, 4-109
user-level cache operations
    vec_dss 4-36
    vec_dssall 4-37
    vec_dst 4-38
    vec_dstst 4-40
    vec_dststt 4-42
    vec_dstt 4-44
V
va_arg 3-10
Varargs 3-9
vec_abs 4-8
vec_abss 4-10
vec_add 2-8, 2-9, 4-12
vec_addc 4-15
vec_adds 4-16
vec_addubm 2-8
vec_all_eq 2-8, 4-134
vec_all_ge 4-137
vec_all_gt 2-9, 4-140
vec_all_in 4-143
vec_all_le 4-144
vec_all_lt 2-9, 4-147
vec_all_nan 2-9, 4-150
vec_all_ne 4-151
vec_all_nge 4-154
vec_all_ngt 4-155
vec_all_nle 4-156
vec_all_nlt 4-157
vec_all_numeric 4-158
vec_alloc 3-10
vec_and 4-18
vec_andc 4-19
vec_any_eq 4-159
vec_any_ge 4-162
vec_any_gt 4-165
vec_any_le 4-168
vec_any_lt 4-171
vec_any_nan 4-174
vec_any_ne 4-175
vec_any_nge 4-178
vec_any_ngt 4-179
vec_any_nle 4-180
vec_any_nlt 4-181
vec_any_numeric 4-182
vec_any_out 4-183
vec_avg 4-21
vec_calloc 3-10
vec_ceil 4-23
vec_cmpb 4-24
vec_cmpeq 4-25
vec_cmpge 4-27
vec_cmpgt 4-28
vec_cmple 4-30
vec_cmplt 4-31
vec_ctf 4-33
vec_cts 4-34
vec_ctu 4-35
vec_data 2-2
vec_dss 4-36
vec_dssall 4-37
vec_dst 4-38
vec_dstst 4-40
vec_dststt 4-42
vec_dstt 4-44
vec_expte 4-46
vec_floor 4-47
vec_free 3-10
vec_ld 2-4, 4-48
vec_lde 4-50
vec_ldl 2-4, 4-51
vec_loge 4-53
vec_lvsl 2-3, 4-54
vec_lvsr 2-3, 4-55
vec_madd 4-56
vec_madds 4-57
vec_malloc 3-10
vec_max 4-8, 4-10, 4-58
vec_mergeh 4-61
vec_mergel 4-63
vec_mfvscr 4-2, 4-65
vec_min 4-8, 4-10, 4-66
vec_mladd 4-69
vec_mradds 4-70
vec_msum 4-71
vec_msums 4-73
vec_mtvscr 4-74
vec_mule 4-75
vec_mulo 4-76
vec_nmsub 4-77
vec_nor 4-78
vec_or 4-79, 4-129, 4-131
vec_pack 4-80
vec_packpx 4-81
vec_packs 4-82
vec_packsu 4-83
vec_perm 2-3, 4-84
vec_re 4-85
vec_realloc 3-10
vec_rl 4-81, 4-86
vec_round 4-88
vec_rsqrte 4-89
vec_sel 4-90
vec_sl 4-91, 4-129, 4-131
vec_sld 4-93
vec_sll 4-94
vec_slo 4-96
vec_splat 4-97
vec_splat_s16 4-100
vec_splat_s32 4-101
vec_splat_s8 4-99
vec_splat_u16 4-103
vec_splat_u32 4-104
vec_splat_u8 4-102
vec_sr 4-105, 4-129, 4-131
vec_sra 4-107
vec_srl 4-109
vec_sro 4-111
vec_st 2-4, 4-112
vec_ste 4-114
vec_step 2-8
vec_stl 2-4, 4-116
vec_sub 4-8, 4-118
vec_subc 4-121
vec_subs 4-10, 4-122
vec_sum2s 4-125
vec_sum4s 4-124
vec_sums 4-126
vec_trunc 4-127
vec_unpackh 4-128, 4-129, 4-131
vec_unpackl 4-130
vec_vaddubh 2-9
vec_vaddubm 2-9
vec_vaddubs 2-9
vec_vadduhm 2-9
vec_xor 4-132
vector 2-2, 2-3
vector bool char 2-1, 2-5
vector bool int 2-2, 2-5
vector bool long 2-2
vector bool long int 2-2
vector bool short 2-1, 2-5
vector bool short int 2-1
vector cast 2-7
vector data types 3-1
vector float 2-2, 2-5, 2-7
vector literal 2-7
vector operations, arithmetic
    vec_abs 4-8
    vec_abss 4-10
    vec_add 4-12
    vec_addc 4-15
    vec_adds 4-16
    vec_avg 4-21
    vec_max 4-58
    vec_min 4-66
    vec_mule 4-75
    vec_mulo 4-76
    vec_sub 4-118
    vec_subc 4-121
    vec_subs 4-122
vector operations, compare
    vec_cmpb 4-24
    vec_cmpeq 4-25
    vec_cmpge 4-27
    vec_cmpgt 4-28
    vec_cmple 4-30
    vec_cmplt 4-31
vector operations, function estimate
    vec_expte 4-46
    vec_loge 4-53
    vec_re 4-85
    vec_rsqrte 4-89
vector operations, load/store
    vec_ld 4-48
    vec_lde 4-50
    vec_ldl 4-51
    vec_st 4-112
    vec_ste 4-114
    vec_stl 4-116
vector operations, logical
    vec_and 4-18
    vec_andc 4-19
    vec_nor 4-78
    vec_or 4-79
    vec_sel 4-90
    vec_xor 4-132
vector operations, merge
    vec_mergeh 4-61
    vec_mergel 4-63
vector operations, miscellaneous
    vec_alloc 3-10
    vec_calloc 3-10
    vec_free 3-10
    vec_malloc 3-10
    vec_mfvscr 4-65
    vec_mtvscr 4-74
    vec_realloc 3-10
    vec_step 2-8
    vector cast 2-7
    vector literals 2-7
vector operations, mixed arithmetic
    vec_madd 4-56
    vec_madds 4-57
    vec_mladd 4-69
    vec_mradds 4-70
    vec_msum 4-71
    vec_msums 4-73
    vec_nmsub 4-77
    vec_sum2s 4-125
    vec_sum4s 4-124
    vec_sums 4-126
vector operations, pack and unpack
    vec_pack 4-80
    vec_packpx 4-81
    vec_packs 4-82
    vec_packsu 4-83
    vec_unpackh 4-128
    vec_unpackl 4-130
vector operations, permute
    vec_perm 4-84
vector operations, rounding and conversion
    vec_ceil 4-23
    vec_ctf 4-33
    vec_cts 4-34
    vec_ctu 4-35
    vec_floor 4-47
    vec_round 4-88
    vec_trunc 4-127
vector operations, shift
    vec_sld 4-93
    vec_sll 4-94
    vec_slo 4-96
    vec_srl 4-109
    vec_sro 4-111
vector operations, shift and rotate
    vec_rl 4-86
    vec_sl 4-91
    vec_sr 4-105
    vec_sra 4-107
vector operations, splat
    vec_splat 4-97
    vec_splat_s32 4-101
    vec_splat_s16 4-100
    vec_splat_s8 4-99
    vec_splat_u16 4-103
    vec_splat_u32 4-104
    vec_splat_u8 4-102
vector operations, supporting alignment
    vec_lvsl 4-54
    vec_lvsr 4-55
vector pixel 2-2, 2-5
vector predicates
    vec_all_eq 4-134
    vec_all_ge 4-137
    vec_all_gt 4-140
    vec_all_in 4-143
    vec_all_le 4-144
    vec_all_lt 4-147
    vec_all_nan 4-150
    vec_all_ne 4-151
    vec_all_nge 4-154
    vec_all_ngt 4-155
    vec_all_nle 4-156
    vec_all_nlt 4-157
    vec_all_numeric 4-158
    vec_any_eq 4-159
    vec_any_ge 4-162
    vec_any_gt 4-165
    vec_any_le 4-168
    vec_any_lt 4-171
    vec_any_nan 4-174
    vec_any_ne 4-175
    vec_any_nge 4-178
    vec_any_ngt 4-179
    vec_any_nle 4-180
    vec_any_nlt 4-181
    vec_any_numeric 4-182
    vec_any_out 4-183
vector register 1-2
vector register saving and restoring functions 3-7
vector signed char 2-1, 2-5, 2-7
vector signed int 2-2, 2-5, 2-7
vector signed long 2-2
vector signed long int 2-2
vector signed short 2-1, 2-5, 2-7
vector signed short int 2-1
vector unsigned char 2-1, 2-5, 2-7
vector unsigned int 2-1, 2-5, 2-7
vector unsigned long 2-1
vector unsigned long int 2-1
vector unsigned short 2-1, 2-5, 2-7
vector unsigned short int 2-1
vfprintf 3-12
vprintf 3-12
VRSAVE 3-2, 3-4, 3-6, 3-11
VSCR 4-1, 4-65, 4-74
vsprintf 3-12
W
website xv, xviii, 1-1
X
xcoff stabstrings 3-12
1 Overview
2 High-Level Language Interface
3 Application Binary Interface
4 AltiVec Operations and Predicates
A AltiVec Instruction Set/Operations/Predicates Cross-Reference
GLO Glossary of Terms and Abbreviations
IND Index
Attention!
This book is a companion to the PowerPC Microprocessor Family: The Programming Environments, referred to as The Programming Environments Manual. Note that the companion Programming Environments Manual exists in two versions. See the Preface for a description of the following two versions:
PowerPC Microprocessor Family: The Programming Environments, Rev 1 (Order #: MPCFPE/AD)
PowerPC Microprocessor Family: The Programming Environments for 32-Bit Microprocessors, Rev 1 (Order #: MPCFPE32B/AD)
Call the Motorola LDC at 1-800-441-2447 or contact your local sales office to obtain copies.
