ÚSTAV AUTOMATIZACE A MĚŘICÍ TECHNIKY Fakulta elektrotechniky a komunikačních technologií

Vysoké učení technické v Brně

Programming of Digital Signal Processors and Data Transmission via the PCI Bus

(Master Thesis)

Martin BARVA August 2002

CONTENTS

CONTENTS

FIGURES

PREFACE

ABOUT CD
    HARDWARE AND SOFTWARE
        DSP Part
        PCI Part

1. INTRODUCTION TO DSP PART

2. DIGITAL SIGNAL PROCESSING PROCESSORS

3. TMS320C6000 DSP PLATFORM
    3.1 TMS320C6000 DSP PROCESSOR ARCHITECTURE
        3.1.1 Key Features of TMS320C62x/TMS320C67x Device
        3.1.2 Central Processing Unit Core
        3.1.3 Memory
        3.1.4 Peripherals
    3.2 TMS320C6701 EVALUATION MODULE
        3.2.1 Key Features of TMS320C6701 Evaluation Module
        3.2.2 TMS320C6701 Evaluation Module Hardware Functional Overview
    3.3 IMPLEMENTATION OF DSP ALGORITHMS
        3.3.1 Low-Level Implementation of DSP Algorithms
        3.3.2 High Level Implementation of DSP Algorithms
        3.3.3 Comparison of Low- and High-Level Implementation Approach

4. MATHEMATICAL BACKGROUND OF IMPLEMENTED DSP ALGORITHMS
    4.1 FINITE IMPULSE RESPONSE (FIR) DIGITAL FILTER
        4.1.1 Properties of FIR filter
        4.1.2 Coefficients Calculation by means of Window Method
    4.2 INFINITE IMPULSE RESPONSE (IIR) DIGITAL FILTER
        4.2.1 IIR Filter Implementation
        4.2.2 Coefficients Calculation using Bilinear Transform Method

    4.3 ADAPTIVE FILTERS
        4.3.1 Structure of Adaptive Filter
        4.3.2 Least Mean Square (LMS) Adaptive Filter
    4.4 FAST FOURIER TRANSFORM
        4.4.1 Calculation Cost of DFT
        4.4.2 Mathematical Background of FFT - DIT Algorithm
        4.4.3 Computational Cost of FFT with Decimation in Time

5. IMPLEMENTATION
    5.1 CODEC
        5.1.1 Loopback Example
        5.1.2 InAndOut example
        5.1.3 Generator Example
    5.2 DSP ALGORITHMS
        5.2.1 Examples of Low-Level Implementation of DSP Algorithms
        5.2.2 Examples of High-Level Implementation of DSP Algorithms

6. SUMMARY OF DSP PART

7. INTRODUCTION TO PCI PART

8. PERIPHERAL COMPONENT INTERCONNECT (PCI) BUS
    8.1 INTRODUCTION TO COMPUTER BUSES
        8.1.1 Division of Computer Buses
        8.1.2 Computer Buses before PCI
    8.2 INTRODUCTION TO PCI BUS
    8.3 KEY FEATURES OF PCI BUS
    8.4 PCI SIGNALS
        8.4.1 System Signals
        8.4.2 Address and Data Signals
        8.4.3 Interface Control Signals
        8.4.4 Arbitration
        8.4.5 Error Reporting
        8.4.6 Interrupt Signals
        8.4.7 64-bit extension
        8.4.8 JTAG Signals
    8.5 ARBITRATION
        8.5.1 BUS Parking

    8.6 BUS PROTOCOL
        8.6.1 PCI Bus Command
        8.6.2 Byte Enable
        8.6.3 Basic PCI Transactions
        8.6.4 Latency
        8.6.5 Error Detection and Reporting
        8.6.6 Target-Initiated Termination of Transaction
    8.7 ADVANCED FEATURES OF PCI BUS
        8.7.1 Interrupt Handling
        8.7.2 Special Cycle
        8.7.3 64-bit extension
    8.8 PLUG AND PLAY CONFIGURATION
        8.8.1 PCI Configuration Space
        8.8.2 Structure of Configuration Space
        8.8.3 PCI BIOS

9. PLX HARDWARE AND SOFTWARE DEVELOPMENT TOOLS
    9.1 PCI 9050 BUS TARGET INTERFACE CHIP
        9.1.1 PCI 9050 Main Features
        9.1.2 PCI Bus Interface of PCI 9050 Bus Interface Chip
        9.1.3 Local Bus Interface of PCI 9050 Bus Interface Chip
        9.1.4 Single Cycle Write and Read
        9.1.5 PCI Configuration Registers and Local Configuration Registers
        9.1.6 Serial EEPROM
        9.1.7 Local Chip Select
    9.2 PCI 9050 REFERENCE DESIGN KIT (RDK)
        9.2.1 Main features of PCI 9050 Reference Design Kit
        9.2.2 PCI 9050RDK Subsystems
    9.3 PCI 9050 SOFTWARE DESIGN KIT (SDK) AND PLXMON

10. DESIGNED PCI DEVICE
    10.1 APPLICATION OVERVIEW
    10.2 HARDWARE PART OF DEVICE
        10.2.1 Latch Circuitry on PCI 9050RDK
        10.2.2 Timing Diagrams
        10.2.3 Parallel Port Configuration
        10.2.4 Application Registers
    10.3 SOFTWARE PART OF DEVICE

        10.3.1 Device Driver
        10.3.2 Example Software Application

11. SUMMARY OF PCI PART

CONCLUSION

BIBLIOGRAPHY

APPENDIX
    A1 EXAMPLE OF EXECUTABLE GENERATION
    A2 SCHEME OF LATCH CIRCUITRY

FIGURES

Figure 1.1: Digital signal processing chain.
Figure 1.2: Definition of real-time DSP processing.
Figure 3.1: C62x/C67x block diagram.
Figure 3.2: Central processing unit core.
Figure 3.3: Data paths of 'C67x device.
Figure 3.4: Functional diagram of the 'C6701 EVM.
Figure 3.5: Software development flow.
Figure 3.6: Graphical interface of the Code Composer Studio.
Figure 3.7: C6701 EVM simulink library blocks.
Figure 3.8: Example of simulink model designed for executable generation.
Figure 3.9: Build process of the executable.
Figure 3.10: High-level object oriented view of the executable.
Figure 4.1: Periodical transfer function of FIR filter.
Figure 4.2: Coefficients of ideal FIR filter.
Figure 4.3: Block diagram of adaptive filter.
Figure 4.4: Two basic DSP operation.
Figure 4.5: DSP flowchart display of equation 4.22.
Figure 4.6: 8-point DFT expressed with four 2-point DFT.
Figure 4.7: FFT butterfly topology.
Figure 4.8: Complete 8-point FFT.
Figure 5.1: Loopback example.
Figure 5.2: InAndOut example.
Figure 5.3: Generator example.
Figure 5.4: FFT example.
Figure 5.5: FIR graphical user interface.
Figure 5.6: IIR graphical user interface.
Figure 8.1: Computer bus.
Figure 8.2: PCI bus diagram.
Figure 8.3: PCI read transaction.

Figure 8.4: PCI write transaction.
Figure 8.5: Bus latency.
Figure 8.6: Target disconnect.
Figure 8.7: Target abort.
Figure 8.8: Type 0 configuration header.
Figure 8.9: Structure of capabilities list.
Figure 9.1: PCI 9050 bus interface chip.
Figure 9.2: Single local bus write.
Figure 9.3: Single local bus read.
Figure 9.4: PLX PCI 9050RDK block diagram.
Figure 10.1: Block diagram of the application.
Figure 10.2: Scheme of the latch circuitry.
Figure 10.3: Data transfer between PCI bus and PCI9050 write FIFO.
Figure 10.4: Data transfer between PCI 9050 FIFO and latch circuitry.
Figure 10.5: Data transfer between latch circuitry and parallel port.
Figure 10.6: WritePCI program writes data into the PCITOLPT device.
Figure 10.7: ReadLPT program reads data stored in the PCITOLPT device.
Figure A1.1: Simulink model to be converted into executable.
Figure A1.2: Setting of the solver.
Figure A1.3: Setting the Real-Time Workshop parameters.
Figure A1.4: Press the Build & Run button to execute the build process.
Figure A2.1: Octal latch circuitry.

PREFACE

The aim of this project, which consists of two parts, was to develop applications that students can use to explore digital signal processing and the peripheral component interconnect bus.

The digital signal processing part (chapters 1 to 6) explains the different approaches that can be taken to implement DSP algorithms on signal processors.

The second part (chapters 7 to 11) focuses on the peripheral component interconnect bus and its possible use for data transfer between two computer systems.

The project was carried out in a laboratory of the Institut National des Sciences Appliquées de Lyon, France. I would especially like to thank Dr. Philippe Delachartre for his valuable advice and technical support.

M. Barva

ABOUT CD

The included CD contains the files for the DSP and PCI parts of the project. The files are organized in the following directory structure:

• \DOC: Electronic version of the final report in Microsoft Word 97 format.

• \DSP: Its subdirectories contain the DSP applications described in chapter 5.

• \PCI\BIN: Two Win9x programs, WritePCI and ReadLPT, described in section 10.3.2.

• \PCI\DRIVER: Win9x device drivers for the PCITOLPT device and the parallel port.

• \PCI\SRC: Source files of the WritePCI and ReadLPT applications for Microsoft Visual C++ ver.6.

HARDWARE AND SOFTWARE

This section describes the hardware and software configuration used to develop and test the designed applications.

DSP Part

• TMS320C6701 EVM: Texas Instruments evaluation module with a TMS320C6701 signal processor.

• Windows 98: Operating system.

• Code Composer Studio ver.1.0: Integrated development environment for Texas Instruments DSP processors.

• Matlab ver.6.1, Real Time Workshop, Developer's Kit for TI DSP and Simulink: Used for high-level implementation of DSP algorithms.

PCI Part

• PLX PCI 9050RDK with latch circuitry: This development kit was used to build the PCI-compliant circuitry.

• Windows 98: Operating system.

• Microsoft Visual C++ ver.6: 32-bit programming environment.

• WinDriver: Tool for device driver development.

1. INTRODUCTION TO DSP PART

Digital Signal Processing (DSP) differs from other areas of computer science in the type of data it uses: signals. These signals can originate, for example, from sensors that provide information about the real world, such as seismic vibrations, sound waves or images.

The term "digital signal processing" covers the mathematical background, the algorithms, and the techniques used to manipulate and process input data.

Signals are processed to achieve a wide variety of goals, e.g. compression of data for storage and transmission, recognition and generation of speech, image enhancement, or extraction of information encoded in the signal.

Traditionally, signal processing was carried out with analogue components such as resistors, capacitors and inductors. However, the tolerances of these components and changes in temperature can degrade the performance of analogue circuitry.

The main objective of the DSP part of this project is to familiarize the reader with the fundamentals of implementing DSP algorithms on DSP processors. Because the developed examples were implemented on the TMS320C6701 Evaluation Module (EVM), this report also contains a more detailed description of the evaluation module and the 'C67x digital signal processor.

The DSP part of this report consists of five chapters:

Chapter 2 introduces the notion of the “digital signal processor”. It describes how its architecture differs from that of general-purpose processors and gives typical examples of applications in which DSP processors are used.

Chapter 3 focuses on the TMS320C62x/67x DSP processors by Texas Instruments. It also describes the TMS320C6701 EVM. Finally, it presents different approaches to implementing DSP algorithms.

Chapter 4 provides the mathematical background of the DSP algorithms that were implemented within the framework of this project.

Chapter 5 describes the applications that were created.

Chapter 6 contains a summary of the DSP part.

2. DIGITAL SIGNAL PROCESSING PROCESSORS

In most cases, digital signal processing applications are implemented as algorithms that run on special processors called digital signal processing (DSP) processors.

The block diagram in figure 1.1 shows a typical digital signal processing chain in which both the input and the output signals are analog.

Figure 1.1: Digital signal processing chain.

The input signal passes through a low-pass filter before it enters an analog-to-digital converter, where it is sampled at a constant sampling frequency. Every sample is then processed by the DSP algorithm implemented in the DSP processor. The result then passes through a digital-to-analog converter and a low-pass filter.
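
The digital part of this chain can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names (`sample_and_quantize`, `process`, `to_analog`), the 16-bit resolution, the 8 kHz sampling rate and the 1 kHz test tone are hypothetical and are not taken from the thesis, and the two analog low-pass filters are not modelled.

```python
import math


def sample_and_quantize(signal, fs_hz, n_samples, bits=16):
    """Model of an ADC: sample signal(t) at a constant rate and quantize it."""
    full_scale = 2 ** (bits - 1) - 1           # largest positive code
    samples = []
    for n in range(n_samples):
        t = n / fs_hz                           # constant sampling interval
        x = max(-1.0, min(1.0, signal(t)))      # clip to the input range
        samples.append(round(x * full_scale))   # integer sample value
    return samples


def process(samples, algorithm):
    """Apply the DSP algorithm to every sample, one at a time."""
    return [algorithm(x) for x in samples]


def to_analog(samples, bits=16):
    """Model of a DAC: map integer codes back to the range [-1, 1]."""
    full_scale = 2 ** (bits - 1) - 1
    return [x / full_scale for x in samples]


# Hypothetical input: a 1 kHz tone with amplitude 0.5, sampled at 8 kHz.
def tone(t):
    return 0.5 * math.sin(2 * math.pi * 1000 * t)


adc_codes = sample_and_quantize(tone, fs_hz=8000, n_samples=8)
# Placeholder "algorithm": halve every sample.
output = to_analog(process(adc_codes, lambda x: x // 2))
```

The per-sample `algorithm` is deliberately trivial here; in the applications described later its place would be taken by a filter or a transform.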

The DSP processor is often required to deliver real-time performance, that is, it must finish processing one sample before the next one arrives. An example of real-time data processing is shown in figure 1.2.

Figure 1.2: Definition of real-time DSP processing.

A signal is sampled at a sampling frequency of 40 kHz, so the time between two samples is 1/40000 s = 25 µs. An algorithm that needs 100 instructions to process one sample is applied to the signal. If a DSP processor with a 30 ns cycle time is considered, the waiting time is obtained by subtracting the processing time (100 x 30 ns = 3 µs) from the time between samples (25 µs), which gives 22 µs. If the waiting time is greater than or equal to zero, the application meets the real-time requirements.
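
The real-time check can be written as a short calculation. The sketch below is illustrative only: the function names are hypothetical, and it assumes, as the example does implicitly, that one instruction completes per processor cycle.

```python
def waiting_time_s(fs_hz, instructions_per_sample, cycle_time_s):
    """Idle time left per sample; assumes one instruction per cycle."""
    sample_period_s = 1.0 / fs_hz
    processing_time_s = instructions_per_sample * cycle_time_s
    return sample_period_s - processing_time_s


def meets_real_time(fs_hz, instructions_per_sample, cycle_time_s):
    """True if a sample is fully processed before the next one arrives."""
    return waiting_time_s(fs_hz, instructions_per_sample, cycle_time_s) >= 0.0


# Values from the example: 40 kHz sampling, 100 instructions, 30 ns cycle.
wait = waiting_time_s(40_000, 100, 30e-9)
print(f"waiting time: {wait * 1e6:.1f} us")  # prints "waiting time: 22.0 us"
print(meets_real_time(40_000, 100, 30e-9))   # prints "True"
```

With the same processor, an algorithm needing 1000 instructions per sample would take 30 µs and would no longer fit into the 25 µs sampling period.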

Nowadays, a wide variety of digital signal processing algorithms are implemented. Among the most common DSP techniques are the Finite Impulse Response (FIR) filter, the Infinite Impulse Response (IIR) filter, convolution and the Fast Fourier Transform. The mathematical theory behind these algorithms shows that they rely on two basic operations, multiplication and addition, combined in the form of a sum of products ($S = \sum_i a_i b_i$). For this reason DSP processors, compared to general-purpose processors, usually have many specialized arithmetic units that can operate simultaneously. The key features of DSP processors are:

• Arithmetic unit: To calculate the sum of products, all DSP processors have a hardware multiplier and accumulator, so that two operations, a multiplication and an addition, can be completed in one cycle. Some DSP processors can even carry out a complete FFT butterfly at once.

• Bus architecture: DSP processors have a Harvard architecture with two separate buses, one for program and the other for data. This bus architecture enables the DSP processor to read an instruction and data from memory simultaneously.

• Addressing: Hardware-supported address generation speeds up address calculation and thus reduces the computation time. Some DSP processors support bit-reversed addressing, which is convenient for the FFT algorithm.
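
Two of the features listed above can be illustrated in software. The following sketch is not DSP code and its function names are hypothetical; it only shows what the hardware computes: a sum of products built from multiply-accumulate steps, and the bit-reversed index order used to reorder FFT data.

```python
def sum_of_products(a, b):
    """S = sum of a[i] * b[i]; each loop pass is one multiply-accumulate."""
    acc = 0
    for a_i, b_i in zip(a, b):
        acc += a_i * b_i  # a DSP MAC unit does this multiply and add together
    return acc


def bit_reverse(index, bits):
    """Reverse the lowest `bits` bits of index (e.g. 0b001 -> 0b100)."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (index & 1)
        index >>= 1
    return result


def bit_reversed_order(n):
    """Index order for an n-point FFT; n must be a power of two."""
    bits = n.bit_length() - 1
    if n < 1 or (1 << bits) != n:
        raise ValueError("n must be a power of two")
    return [bit_reverse(i, bits) for i in range(n)]


print(sum_of_products([1, 2, 3], [4, 5, 6]))  # prints 32
print(bit_reversed_order(8))                  # prints [0, 4, 2, 6, 1, 5, 3, 7]
```

On a processor with hardware support, the index reordering costs no extra instructions because the address generator produces the reversed sequence directly.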

Whereas the first 16-bit DSP processors operated at 5 MIPS (Million Instructions Per Second), the combination of high clock speed and multiple functional units has since raised performance to as much as 2400 MIPS.

The most important producers of DSP processors are Texas Instruments, Motorola, Analog Devices, AT&T and NEC.

DSP processors are used in a wide variety of domains such as telecommunications, control engineering, space, medicine, etc.

3. TMS320C6000 DSP PLATFORM

This chapter is divided into several sections that describe, in turn, the TMS320C62x/TMS320C67x processors, the TMS320C6701 EVM board and, finally, software development.

3.1 TMS320C6000 DSP PROCESSOR ARCHITECTURE

TMS320C6000 devices are the first DSP processors to use an enhanced Very Long Instruction Word (VLIW) architecture, which achieves high performance through instruction-level parallelism.

TMS320C6000 devices fall into two main categories: the fixed-point DSP processors TMS320C62x ('C62x) and the floating-point DSP processors TMS320C67x ('C67x). Not only do these two types have a very similar architecture, they are also pin compatible, which means that hardware developers do not have to make two different board designs to support both the 'C62x and the 'C67x processor.

3.1.1 Key Features of TMS320C62x/TMS320C67x Device

The most important features of 'C62x/C67x DSP processors can be

summarized as follows:

• TMS320C62x/TMS320C67x devices operate at 150, 167, 200 and 250 MHz (6.67
ns, 6 ns, 5 ns and 4 ns cycle time).

• Peak 1336 MIPS (Million Instructions Per Second) at 167 MHz. 'C67x has peak

performance of 688 MFLOPS (Million Floating Point Operations Per Second) at

167 MHz.

• Advanced VLIW CPU architecture with eight functional units, including two

multipliers and six arithmetic units. Up to eight 32-bit instructions can be

executed every cycle.

• 8/16/32-bit data support.

• Large on-chip RAM of 2x64 kB for program and data.

• 32-bit external memory interface supports external memories.

• Host port access to 'C62x/C67x memory and peripherals.


• Direct memory access.

• Multichannel buffered serial port.

• 32-bit timers.
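The clock rates, cycle times and the peak MIPS figure in the list above are related by simple arithmetic. The short Python sketch below reproduces them under the assumption that each of the eight functional units issues one instruction per cycle; the helper names are illustrative only.

```python
FUNCTIONAL_UNITS = 8  # one 32-bit instruction per unit and cycle (assumed peak case)


def cycle_time_ns(clock_mhz):
    """Cycle time in nanoseconds for a clock rate given in MHz."""
    return 1000.0 / clock_mhz


def peak_mips(clock_mhz, instructions_per_cycle=FUNCTIONAL_UNITS):
    """Peak instruction rate in MIPS: clock rate times instructions per cycle."""
    return clock_mhz * instructions_per_cycle


for mhz in (150, 167, 200, 250):
    print(mhz, "MHz:", round(cycle_time_ns(mhz), 2), "ns,", peak_mips(mhz), "MIPS peak")
```

At 167 MHz this gives 8 x 167 = 1336 MIPS, which matches the peak value quoted above.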

As can be seen in figure 3.1, the 'C62x/C67x DSP processor consists of three

main parts:

• CPU core: Executes the instructions.

• Memory: 2x64 kB RAM for program and data.

• Peripherals: External memory and host port interface, direct memory access, serial

ports and timers.

Figure 3.1: C62x/C67x block diagram.

3.1.2 Central Processing Unit Core

The block diagram of the Central Processing Unit (CPU) core in
figure 3.2 shows that the CPU core is composed of several components, whose
functionality is described below.


Figure 3.2: Central processing unit core.

3.1.2.1 Program Control Unit

The task of the program control unit is to retrieve a fetch packet of eight
instructions, dispatch them to the appropriate units and finally decode these instructions.
One operation cycle of the program control unit can be described as follows:

• PG phase: The CPU generates the address of the first instruction in the first fetch packet.

• PS phase: The generated address is sent to the program memory, which can be either
external or internal.

• PW phase: The CPU retrieves the fetch packet.

• DP phase: The dispatch unit sends each instruction to its unit. Functional units are
designed only for certain instructions.

• DC phase: Instructions are decoded and executed in the functional units.

3.1.2.2 Data Paths

Figure 3.3 shows the data paths of the 'C62x DSP processor in detail. As can be
observed, there are two data paths, denoted A and B. Each data path contains a
register file of sixteen 32-bit registers and four functional units. Moreover, the data paths
include one control register file, which can be accessed only from functional
unit .S2.

3.1.2.2.1 General Purpose Register File


There are two general-purpose register files in the 'C62x/C67x device, each
containing sixteen 32-bit registers, which are read from or written to by the
functional units. The registers support 32-bit and 40-bit fixed-point data. In the case
of the 'C67x, they can likewise store 64-bit double-precision floating-point values.

The main function of the two register files is to store operands for the functional
units. As can be seen from figure 3.3, register file A can be read from or written to
by functional units .L1, .S1, .M1 and .D1. Similarly, register file B can be accessed by
functional units .L2, .S2, .M2 and .D2. The data cross paths, denoted in figure 3.3 as 1x
and 2x, enable functional units to access an operand from the other side of the CPU.

Paths .ST1, .ST2, .LD1 and .LD2 serve for data transfer between the register files
and memory.

Figure 3.3: Data paths of 'C67x device.

3.1.2.2.2 Functional Units

In functional units instructions are finally executed and the results are written

to register files, from where they can be moved into memory. There are in total eight

functional units in both data paths (four units for each data path), each of them

having its own port for read and write, which gives the CPU core the ability of

executing up to eight instructions in one cycle.


As was already mentioned, each functional unit executes a specific set of
operations, so, for example, only two multiplications are possible per cycle. The functional
units and their supported operations are summarized in table 3.1.

FUNCTIONAL UNIT | FIXED-POINT OPERATIONS | FLOATING-POINT OPERATIONS
.L unit (.L1, .L2) | 32/40-bit arithmetic and compare operations | arithmetic operations
.S unit (.S1, .S2) | 32/40-bit shifts and 32-bit bit-field operations | absolute value operations
.M unit (.M1, .M2) | 16x16 multiply operations | 32x32-bit multiply operations
.D unit (.D1, .D2) | linear and circular address calculation | load double word with a 5-bit offset

Table 3.1: Functional units and supported operations.

3.1.2.2.3 Control Register File

The 'C62x devices have ten registers for control purposes, while the 'C67x

have thirteen registers. The three extra registers in 'C67x DSP processor are there to

support floating-point operations. Control registers can be accessed only by the

functional unit .S2.

3.1.2.3 Interrupts

'C62x/C67x devices allow normal program flow to be interrupted by an event
that comes either from an external peripheral, an internal peripheral, or a special
instruction in the program.

There are two types of interrupts: non-maskable interrupts (reset and NMI)
and maskable interrupts. The interrupt mechanism is controlled by registers in the control
register file.

3.1.3 Memory

Owing to the 32-bit wide address bus, the 'C62x/C67x DSP processors have a 4 GB
addressable memory space, which is divided into four regions: internal program
memory, internal data memory, internal peripherals and external memory space. The exact
location of each region depends on the memory map used.


The internal 64 kB program memory can either store the program or
serve as a cache if the program is in external memory.

The internal data memory also has a capacity of 64 kB and is used to store
data during program execution.

The external memory, connected to the CPU through the External Memory
Interface (EMIF), can be both synchronous and asynchronous. External memory
extends the available storage capacity for program and data.

3.1.4 Peripherals

Peripherals located on the 'C62x/C67x devices include DMA controllers,
multichannel buffered serial ports, timers and interfaces for connecting
external memory and external devices such as microprocessors or PCI bridge chips.

3.1.4.1 DMA Controller

The DMA controller controls data flow between the internal memory and external
memory, the host port interface or an external peripheral. As the DMA controller performs
data transfers with zero overhead, it can operate in parallel with, and independently of, the CPU.

3.1.4.2 Multichannel Buffered Serial Ports

Two multichannel buffered serial ports support full-duplex communication at
a maximum speed of 40 Mb/s per channel. This makes it easy to connect
external peripherals such as a codec for real-time analog-to-digital and
digital-to-analog conversion.

3.1.4.3 Timers

Two 32-bit programmable internal timers are available, each of them being

able to trigger an interrupt. The countdown registers can be clocked internally or

externally.

3.1.4.4 Host Port Interface

The Host Port Interface (HPI) is a parallel interface through which a host

processor can directly access the CPU's entire memory space (internal, external and


memory mapped peripherals), thus allowing data exchange between the host and

DSP processor.

3.2 TMS320C6701 EVALUATION MODULE

Within the framework of this project, the DSP applications described
in chapter 5 were implemented and tested on the TMS320C6701 Evaluation Module
('C6701 EVM). The 'C6701 EVM is a demonstration board with a TMS320C67x DSP
processor that is designed for development and real-time testing of digital signal
processing algorithms. External peripherals such as external memory, a codec and a PCI
controller are also located on the 'C6701 EVM to allow easy testing of DSP
algorithms in real-time conditions.

3.2.1 Key Features of TMS320C6701 Evaluation Module

The C6701 EVM has the following features:

• TMS320C67x floating-point digital signal processor.

• Quad clock support of 25 MHz, 33.25 MHz, 100 MHz and 133 MHz.

• Peripheral Component Interconnect (PCI) interface with master/slave support.

• 256 kB of 133 MHz synchronous burst static random-access (SBSRAM) memory.

• 8 MB of 100 MHz synchronous dynamic random-access (SDRAM) memory.

• Access to all DSP memory from the PCI bus via the host port interface.

• 16-bit stereo codec.

• Three light emitting diode (LED) indicators.

• Plug and play PCI device.

3.2.2 TMS320C6701 Evaluation Module Hardware Functional Overview


Figure 3.4 shows a basic functional diagram of the 'C6701 EVM.

Figure 3.4: Functional diagram of the 'C6701 EVM.

From figure 3.4 it is evident that the 'C6701 EVM can be divided into the following
functional blocks:

• DSP: The TMS320C6701 evaluation module is built around the 'C67x floating-point
digital signal processor. Refer to section 3.1 for more information about
the 'C67x DSP device.

• DSP clock: The C6701 EVM supports operation with two different on-board
clock sources and two different clock modes (multiply-by-1 and multiply-by-4).
As a result, the DSP can operate at four different clock rates: 25 MHz, 33.25
MHz, 100 MHz and 133 MHz.

• External memory: The C6701 EVM provides one bank of 256 kB of 133 MHz

SBSRAM and 8 MB of 100 MHz SDRAM memory. Additional memory can be

added using the expansion memory interface.

• Audio interface: The C6701 EVM includes a 16-bit stereo codec that supports
sample rates from 5.5 kHz to 48 kHz. The audio codec has two stereo inputs
(microphone and line-level) and a stereo line-level output, which are located on the
board's mounting bracket.


• PCI Interface: A PCI local bus revision 2.1 compliant interface allows the host
processor to access the whole DSP memory and the control/status registers on the board.

• JTAG emulation: Allows source debugging from the host processor via the PCI

bus.

• Programmable logic: The C6701 EVM uses programmable logic to control
board functions such as reset control, the dual CPU clock oscillator, the PCI controller, DSP
interface control, etc.

• User options: With twelve DIP switches, the user can choose the boot mode, clock

frequency, JTAG mode and memory map.

• LED indicators: The C6701 EVM provides three LED indicators. One LED is

illuminated whenever 5 V is applied to the board. The other two LEDs are user

defined.

More details about the TMS320C6x device and TMS320C6701 Evaluation Module

can be found in bibliography reference [1].

3.3 IMPLEMENTATION OF DSP ALGORITHMS

Basically, there are two ways of implementing DSP algorithms on the TMS320C6701
EVM. The first approach, described in section 3.3.1, is to write the source
code of the DSP algorithm directly in a programming language such as assembler or C/C++
and then build an executable for the 'C67x DSP processor from the source code. This
approach may be denoted as a low-level implementation of DSP algorithms.

Digital signal processing algorithms can likewise be implemented using
Matlab v.6, which together with Simulink and Real-Time Workshop supports
executable generation for the TMS320C6701 EVM. This approach can be called a
high-level implementation of DSP algorithms.

3.3.1 Low-Level Implementation of DSP Algorithms

This section describes the process of implementing DSP algorithms into the

'C67x device using the Code Composer Studio (CCS) environment.

3.3.1.1 Software Development Flow


A typical software development flow consists of the following steps,
see figure 3.5.

Figure 3.5: Software development flow.

1. The C compiler accepts C source code of the DSP algorithm and produces

assembly language source code.

2. The assembly optimizer makes it possible to write linear assembly code (assembly code
without register assignment) without being concerned with the registers. The
assembly optimizer assigns registers and turns the linear assembly into highly
parallel assembly code.

3. The assembler translates assembly language source files into machine language

object files based on common object file format (COFF).

4. The linker accepts COFF object files and object libraries as input to create the

executable module that can be run on 'C67x DSP processor.

3.3.1.2 Code Composer Studio (CCS)

The CCS environment supports the whole software development flow shown

in figure 3.5 and furthermore introduces optional features such as debugging,


DSP/BIOS, JTAG interface or real-time data exchange between host processor and

the 'C67x DSP device.

The graphical environment of the CCS is shown in figure 3.6.

Figure 3.6: Graphical interface of the Code Composer Studio.

3.3.1.2.1 Application Debugging Features

Code Composer Studio provides support for the following debugging
activities:

• Setting breakpoints

• Graphical display of variables in the DSP processor

• Watching variables

• Viewing and editing memory and control registers

• Using probe point tools to stream data to and from the DSP

• Profiling execution statistics

3.3.1.2.2 DSP/BIOS

During an analysis phase of the software development cycle, traditional

debugging features are ineffective for problems that arise during real-time execution.


The CCS DSP/BIOS plug-ins provide a means for real-time analysis with minimal
influence on the performance.

Unlike traditional debugging, which is external to the executing program, the

DSP/BIOS features require the program to be linked with certain DSP/BIOS API

modules, whose functions are declared as external and are called from source

program. Since the functions are performed by the host, they have minimal impact on

the real-time performance of the DSP application.

3.3.1.2.3 JTAG Emulation

The on-chip emulation enables the CCS to control the program execution and
monitor real-time activity. The communication with this on-chip emulation occurs
via the JTAG link. The on-chip emulation takes care of the communication between the
host and the target concerning:

• Starting, stopping and resetting the DSP processor

• Examining the registers and memory of the DSP

• Performance profiling

• Real-time data exchange between the host and the DSP device

3.3.1.2.4 Real-Time Data Exchange (RTDX)

The real-time data exchange feature allows data to be transferred between the host and
the DSP processor without stopping the target application. Acquired data can be
analyzed and visualized on the host using any Object Linking and Embedding (OLE) client
such as Matlab or Microsoft Excel.

3.3.2 High Level Implementation of DSP Algorithms

Matlab ver. 6.1, together with Simulink, Real-Time Workshop and the
Developer's Kit for TI DSP toolbox, makes it possible to create an executable from a Simulink
model, which can be run on the C6701 EVM. Furthermore, Matlab provides
means for RTDX and for directly building and loading executables into 'C67x DSP
processors.

With this set of tools one can develop and test very complex DSP algorithms in real-time

conditions without having to be well acquainted with the architecture of the 'C6701


EVM. As the executable is based on a Simulink model, there is no need to write the
source code of the DSP algorithm.

3.3.2.1 The C6701 EVM Library Blocks

The Developer's Kit for TI DSP toolbox provides a Simulink library of four
blocks that can be used together with standard blocks in a Simulink model, see figure
3.7.

Figure 3.7: C6701 EVM simulink library blocks.

• C6701 ADC Block: Adding this block to a Simulink model enables the DSP
application to access input signals from an external source. These real signals
can be used to drive and test the DSP algorithms implemented in the Simulink
model.

• C6701 DAC Block: This block sends digital data from a simulink model to the

D/A converter and then to the external output connectors of the 'C6701 EVM.

• C6701 LED Block: There are two LEDs on the EVM, one internal (placed

directly on the board) and the other external (on the back of the board). This

functionality can be used to indicate that the algorithm has completed a specific

calculation or reached a certain point in the processing.

• C6701 Reset Block: This block is used to reset the 'C6701 and reload the DSP processor
with the executable directly from the Simulink model window.

3.3.2.2 Generation of Executable File

The Real-Time Workshop (RTW), which takes care of the build process,
accepts a Simulink model as the input and converts it into an executable file that can
be run on the 'C6701 EVM.


The Simulink model is required to contain at least two library blocks
representing the input and the output of the C6701 EVM. These blocks are part of the
Developer's Kit for TI DSP toolbox together with two other Simulink blocks, see
section 3.3.2.1.

Figure 3.8: Example of simulink model designed for executable generation.

Figure 3.8 shows an example of a Simulink model that passes the signal from the
LineIn connector to the Out connector of the 'C6701 EVM.

The build process of executable is controlled by three files (system target file,

template makefile and make command), which are likewise included in the

Developer's Kit for TI DSP toolbox. As shown in figure 3.9, the build process

consists of several steps.

1. Analysis of the model: The build process begins with this step. During this phase,
the Real-Time Workshop reads the Simulink model file (model.mdl) that has
been created in Simulink and creates an intermediate representation of the
model. This description is then stored in a target-independent format. The output
of this step is a file called model.rtw.

2. Generation of code by the Target Language Compiler: The Target Language
Compiler converts the model.rtw file into target-specific code. The output of the
Target Language Compiler is a target-specific source code version of the Simulink
model.


3. Creation of the executable: In this final step, the build process invokes the make
command, which in turn runs the compiler to compile the source files. After
successful compilation, the compiled files, libraries and real-time interface are
linked into one executable file.

Figure 3.9: Build process of the executable.

Appendix A1 contains an example that shows step by step the process of

executable generation from a simulink model.

3.3.2.3 Execution of Executable Generated by Real-Time Workshop

The Real-Time Workshop generates model code based on the corresponding
Simulink model. It also generates a run-time interface that executes the model code.
The run-time interface and the model code are compiled and linked to create an
executable. Figure 3.10 shows a high-level object-oriented view of the executable.


Figure 3.10: High-level object oriented view of the executable.

3.3.3 Comparison of Low- and High-Level Implementation Approach

With both approaches, low-level and high-level, DSP algorithms can be
implemented on a DSP processor. Yet, before choosing one of the two
programming techniques, the complexity and speed requirements of the DSP
algorithms should be considered.

If speed is the main goal, then the low-level approach should
be used, since it allows the programmer to highly optimize the code, or even to write
critical parts of the code in assembly to further increase the speed.

Although the optimization process cannot be as advanced in Matlab v.6,
the high-level implementation approach makes it possible to implement very complex DSP
algorithms without profound knowledge of the architecture of the DSP
processor. On the other hand, the present Matlab v.6 supports executable
generation only for the TMS320C6701 EVM, which means that the high-level
implementation of DSP algorithms cannot be used for other DSP modules.


4. MATHEMATICAL BACKGROUND OF IMPLEMENTED DSP ALGORITHMS

This chapter contains mathematical theory concerning four DSP algorithms

that were implemented within the framework of this project.

4.1 FINITE IMPULSE RESPONSE (FIR) DIGITAL FILTER

Although the FIR filter requires a higher order to achieve the same performance
as an infinite impulse response filter, it is widely used owing to its ability to provide a
linear phase characteristic, which neither the analog nor the infinite impulse response
filter can achieve.

4.1.1 Properties of FIR filter

An FIR filter of order N can be defined by equation 4.1:

$$y(n) = \sum_{k=0}^{N-1} b_k\, x(n-k) \qquad (4.1)$$

where x(n) - input

bk - coefficients of the filter

y(n) - output

In equation 4.2 the input signal is replaced by the Dirac impulse δ(n), which
is defined by equation 4.3:

$$y(n) = \sum_{k=0}^{N-1} b_k\, \delta(n-k) = h(n) \qquad (4.2)$$

$$\delta(n-k) = \begin{cases} 1; & n = k \\ 0; & n \neq k \end{cases} \qquad (4.3)$$

As b0 = h(0), b1 = h(1), ..., bN-1 = h(N-1), it is obvious that the coefficients
are equal to the impulse response of the FIR filter.

The frequency response of the FIR filter can be determined by taking the
z-transform of h(n), see equation 4.4.
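The statement that the coefficients equal the impulse response can be checked numerically. The following Python sketch is a straightforward, unoptimized evaluation of equation 4.1; the function name `fir_filter` and the three example coefficients are assumptions made only for this illustration, and samples before n = 0 are taken to be zero.

```python
def fir_filter(b, x):
    """Evaluate y(n) = sum_{k=0}^{N-1} b[k] * x(n-k), with x(n) = 0 for n < 0."""
    y = []
    for n in range(len(x)):
        acc = 0.0
        for k in range(len(b)):
            if n - k >= 0:  # skip samples before the start of the signal
                acc += b[k] * x[n - k]
        y.append(acc)
    return y


b = [0.25, 0.5, 0.25]                  # example coefficients (assumed)
impulse = [1.0, 0.0, 0.0, 0.0, 0.0]    # unit impulse delta(n)
h = fir_filter(b, impulse)
print(h)
```

Feeding the unit impulse into the filter returns the coefficients followed by zeros, which is exactly the relation b_k = h(k) stated above.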


$$H(z) = \sum_{n=0}^{\infty} h(n)\, z^{-n} \qquad (4.4)$$

To find the frequency response, the parameter z in equation 4.4 is replaced by the
expression e^{jωT}, where T is the sampling period. For T = 1 s we therefore obtain:

$$H(e^{j\omega}) = \sum_{n=0}^{\infty} h(n)\, e^{-j\omega n} \qquad (4.5)$$

Since e^{-j2πn} = 1, then:

$$H(e^{j(\omega+2\pi)}) = \sum_{n=0}^{\infty} h(n)\, e^{-j(\omega+2\pi)n} = \sum_{n=0}^{\infty} h(n)\, e^{-j\omega n}\, e^{-j2\pi n} = H(e^{j\omega}) \qquad (4.6)$$

Equation 4.6 proves that the frequency response of an FIR filter is periodic
with period 2π (T = 1 s), see figure 4.1.

Figure 4.1: Periodic transfer function of FIR filter.

4.1.2 Coefficients Calculation by means of Window Method

The ability of an FIR filter to achieve a frequency response "identical" to the
specified one depends mainly on the method that was used to calculate its
coefficients. The most common methods are the window, frequency sampling
and optimal equiripple methods. In this project, the FIR filter was designed with the
window method.

As was proved, the frequency response of an FIR filter is a periodic function.
Hence, we can apply the Fourier series to obtain the coefficients of the FIR filter, see
equation 4.7.


$$h(n) = \frac{1}{2\pi} \int_{-\pi}^{\pi} H(e^{j\omega})\, e^{j\omega n}\, d\omega; \quad T = 1$$

$$h(nT) = \frac{1}{\omega_s} \int_{-\omega_s/2}^{\omega_s/2} H(e^{j\omega T})\, e^{j\omega nT}\, d\omega; \quad T \neq 1 \qquad (4.7)$$

where ωs is the sampling frequency. The frequency response of the ideal low-pass
filter is:

$$H(e^{j\omega T}) = \begin{cases} 1; & |\omega| \le \omega_c \\ 0; & \omega_c < |\omega| \le \omega_s/2 \end{cases} \qquad (4.8)$$

where ωc is the cut-off frequency.

From equations 4.7 and 4.8 it follows that the coefficients of the ideal low-pass FIR
filter can be calculated as follows:

$$h(nT) = \frac{1}{\omega_s} \int_{-\omega_c}^{\omega_c} 1 \cdot e^{j\omega nT}\, d\omega = \frac{2 f_c}{f_s}\, \operatorname{sinc}(\omega_c nT), \quad n = 0, \pm 1, \pm 2, \ldots \qquad (4.9)$$

The coefficients obtained from equation 4.9 are shown in figure 4.2.

Figure 4.2: Coefficients of ideal FIR filter.

From equation 4.9 and figure 4.2 it follows that the impulse response of the ideal FIR filter is
infinitely long and non-causal. To avoid this problem, the impulse response must be shifted
and truncated with a window function, which consequently introduces overshoots
and ripples. This is known as the Gibbs phenomenon. In order to reduce the overshoots
and ripples, many windows have been investigated. The most common
windows are the Rectangular, Hanning, Hamming and Kaiser windows.

If the window's sequence is denoted as w(nT), then the final form for
coefficients calculation is given by equation 4.10.
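The window method described above (ideal coefficients from equation 4.9, shifted by (N-1)/2 samples and multiplied by a window as in equation 4.10) can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: sinc(x) is taken as sin(x)/x, the Hamming window is written in its commonly used form 0.54 - 0.46 cos(2πn/(N-1)), which is not given in the text, and the filter length and frequencies are arbitrary example values.

```python
import math


def sinc(x):
    """Unnormalized sinc, sin(x)/x, with the limit value 1 at x = 0."""
    return 1.0 if x == 0.0 else math.sin(x) / x


def ideal_lowpass(m, fc, fs):
    """Equation 4.9: ideal low-pass coefficient at (possibly fractional) index m."""
    wc_t = 2.0 * math.pi * fc / fs  # omega_c * T
    return 2.0 * fc / fs * sinc(wc_t * m)


def hamming(n, length):
    """Hamming window sample w(n) for n = 0..length-1 (assumed standard form)."""
    return 0.54 - 0.46 * math.cos(2.0 * math.pi * n / (length - 1))


def design_lowpass(length, fc, fs):
    """Equation 4.10: shift the ideal response by (N-1)/2 and apply the window."""
    shift = (length - 1) / 2.0
    return [ideal_lowpass(n - shift, fc, fs) * hamming(n, length)
            for n in range(length)]


h = design_lowpass(length=21, fc=1000.0, fs=8000.0)  # example values (assumed)
print([round(c, 4) for c in h])
```

The resulting coefficients are symmetric about the middle tap, which is what gives the filter its linear phase, and their sum (the gain at zero frequency) stays close to 1.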

4.2 INFINITE IMPULSE RESPONSE (IIR) DIGITAL FILTER

Infinite impulse response filters are computationally more efficient than FIR
filters, since they require fewer coefficients owing to their use of feedback
(poles). However, this feedback can make the filter unstable if the coefficients
deviate from their designed values. Furthermore, the phase characteristic of an IIR filter is not
linear.

4.2.1 IIR Filter Implementation

The general form of the IIR filter is given by equation 4.11,
where ak and bk are the coefficients of the IIR filter that fully describe its
properties.
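In the time domain, the transfer function of equation 4.11 corresponds to the difference equation y(n) = Σ b_k x(n-k) - Σ a_k y(n-k). The Python sketch below evaluates it directly; the function name `iir_filter`, the convention of passing only a_1..a_M (the leading denominator coefficient 1 is implied) and the one-pole example are assumptions made for illustration.

```python
def iir_filter(b, a, x):
    """Direct evaluation of the IIR difference equation.

    b = [b0, ..., bN] are the feed-forward coefficients, a = [a1, ..., aM]
    the feedback coefficients; signals are zero before n = 0.
    """
    y = []
    for n in range(len(x)):
        acc = 0.0
        for k, bk in enumerate(b):
            if n - k >= 0:
                acc += bk * x[n - k]
        for k, ak in enumerate(a, start=1):
            if n - k >= 0:
                acc -= ak * y[n - k]  # feedback of previous outputs
        y.append(acc)
    return y


# One-pole example: H(z) = 0.5 / (1 - 0.5 z^-1)
h = iir_filter([0.5], [-0.5], [1.0, 0.0, 0.0, 0.0])
print(h)
```

With a single feedback coefficient the impulse response 0.5, 0.25, 0.125, ... never reaches zero exactly, which illustrates how one pole replaces a long run of FIR coefficients.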

4.2.2 Coefficients Calculation using Bilinear Transform Method

The bilinear transform method, which was used in this project to design a
low-pass IIR filter, is based on analog filter design. Other known methods are the
pole-zero placement approach and the impulse invariant method.

From the known equation z = e^{sT} (z = e^{jωT} for s = jω), the relationship between
the s and z transforms given by equation 4.12 can be established.

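$$h'(nT) = h\!\left(\left(n - \frac{N-1}{2}\right) T\right) w(nT); \quad n \in \langle 0, N-1 \rangle$$

$$h'(nT) = 0; \quad n \notin \langle 0, N-1 \rangle \qquad (4.10)$$

$$H(z) = \frac{b_0 + b_1 z^{-1} + b_2 z^{-2} + \ldots + b_N z^{-N}}{1 + a_1 z^{-1} + a_2 z^{-2} + \ldots + a_M z^{-M}} = \frac{\sum_{k=0}^{N} b_k z^{-k}}{1 + \sum_{k=1}^{M} a_k z^{-k}} \qquad (4.11)$$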


The mapping from the s-plane to the z-plane introduces a non-linearity between the
analog and digital frequencies. Therefore, it is necessary to adjust the cut-off
frequency ωc of the analog filter prototype according to the desired cut-off frequency ωp
using equation 4.13, where T is the sampling period.

When deriving a digital filter with the bilinear transform, the following
procedure can be used:

1. Specify the normalized analog filter.

2. Determine the cut-off frequency ωp of the digital filter and using equation 4.13

find its equivalent analog cut-off frequency ωc. This step is known as pre-

warping.

3. De-normalize the analog filter by ωc. This can be done by replacing s by s/ωc.

4. Finally, using equation 4.12 apply the bilinear transform to the filter obtained in

step 3.

Theory and design procedure concerning FIR and IIR digital filters are described in

details in bibliography reference [2].
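The four steps above can be traced on the simplest possible case. The sketch below assumes a first-order normalized prototype H(s) = 1/(s + 1), which is an illustrative choice and not the prototype used in the project; the function names are likewise hypothetical. Substituting s/ωc for s and then s = (2/T)(z-1)/(z+1) gives H(z) = c(1 + z^-1) / ((1 + c) + (c - 1) z^-1) with c = ωc T/2 = tan(ωp T/2).

```python
import cmath
import math


def bilinear_first_order_lowpass(f_p, f_s):
    """Return ([b0, b1], a1) of H(z) = (b0 + b1 z^-1) / (1 + a1 z^-1).

    Assumed prototype (step 1): H(s) = 1 / (s + 1).
    """
    T = 1.0 / f_s
    w_p = 2.0 * math.pi * f_p
    # Step 2: pre-warping, equation 4.13
    w_c = (2.0 / T) * math.tan(w_p * T / 2.0)
    # Steps 3 and 4: s -> s / w_c, then s = (2/T)(z - 1)/(z + 1)
    c = w_c * T / 2.0
    b0 = b1 = c / (1.0 + c)
    a1 = (c - 1.0) / (1.0 + c)
    return [b0, b1], a1


def magnitude(b, a1, f, f_s):
    """|H(z)| evaluated on the unit circle at frequency f."""
    z_inv = cmath.exp(-2j * math.pi * f / f_s)
    return abs((b[0] + b[1] * z_inv) / (1.0 + a1 * z_inv))


b, a1 = bilinear_first_order_lowpass(1000.0, 8000.0)
gain_at_dc = magnitude(b, a1, 0.0, 8000.0)
gain_at_cutoff = magnitude(b, a1, 1000.0, 8000.0)
print(gain_at_dc, gain_at_cutoff)
```

Thanks to pre-warping, the gain of the digital filter at the desired cut-off frequency equals the gain of the analog prototype at its cut-off, i.e. 1/√2 for this prototype.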

4.3 ADAPTIVE FILTERS

Adaptive filters differ from other filters such as the FIR or IIR filter in that their coefficients are not fixed but are calculated in real time by an adaptive algorithm.

4.3.1 Structure of Adaptive Filter

Figure 4.3 shows a basic block diagram of adaptive filter.

$$
s = \frac{1}{T}\ln z \cong \frac{2}{T}\,\frac{z-1}{z+1}
\tag{4.12}
$$

$$
\omega_c = \frac{2}{T}\tan\!\left(\frac{\omega_p T}{2}\right)
\tag{4.13}
$$

Figure 4.3: Block diagram of adaptive filter.

It is obvious from figure 4.3 that the adaptive filter consists of an FIR or IIR filter whose coefficients are calculated in real time by an adaptive algorithm to provide the desired performance.

4.3.2 Least Mean Square (LMS) Adaptive Filter

From figure 4.3, the following equations can be written, where x(n) is the input signal, d(n) the desired signal, y(n) the filter output and e(n) the error signal:

$$
\begin{aligned}
y(n) &= \sum_{k=0}^{N-1} h(k)\,x(n-k) \\
e(n) &= d(n) - y(n) = d(n) - \sum_{k=0}^{N-1} h(k)\,x(n-k)
\end{aligned}
\tag{4.14}
$$

The basic premise of the LMS algorithm is the use of the steepest descent algorithm. The coefficients of the FIR filter can be determined as follows:

$$
h_n(k) = h_{n-1}(k) + \beta\,\Delta_{n,k}
\tag{4.15}
$$

where β is a positive value known as the step size parameter and ∆n,k is a gradient term that makes the FIR filter coefficients approach their optimal values. It can be proved that:

$$
\Delta_{n,k} = e(n)\,x(n-k)
\tag{4.16}
$$

Finally,

$$
h_n(k) = h_{n-1}(k) + \beta\,e(n)\,x(n-k)
\tag{4.17}
$$
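Equations 4.14 and 4.17 translate almost line by line into code. The sketch below is only an illustration in plain Python, not the 'C6701 implementation; the function name `lms_filter`, the test signal and the step size β = 0.05 are assumptions. As a simple check, the desired signal is taken equal to the input, so the taps should converge to h(0) = 1 and h(k) = 0 otherwise.

```python
import random


def lms_filter(x, d, num_taps, beta):
    """Adapt FIR taps h so that y(n) tracks d(n); returns (y, e, h)."""
    h = [0.0] * num_taps
    y_out, e_out = [], []
    for n in range(len(x)):
        # x(n-k) for k = 0..N-1, with zeros before the start of the signal
        window = [x[n - k] if n - k >= 0 else 0.0 for k in range(num_taps)]
        # Equation 4.14: filter output and error
        y = sum(hk * xk for hk, xk in zip(h, window))
        e = d[n] - y
        # Equation 4.17: coefficient update
        h = [hk + beta * e * xk for hk, xk in zip(h, window)]
        y_out.append(y)
        e_out.append(e)
    return y_out, e_out, h


rng = random.Random(0)                      # fixed seed for repeatability
x = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
d = list(x)                                 # desired signal = input signal
y, e, h = lms_filter(x, d, num_taps=8, beta=0.05)
print([round(hk, 3) for hk in h])
```

The step size β trades convergence speed against stability: a larger value adapts faster, but too large a value makes the update diverge.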

4.4 FAST FOURIER TRANSFORM

The Discrete Fourier Transform (DFT) is used for frequency analysis of discrete non-periodic signals. The Fast Fourier Transform (FFT) computes the same result with considerably fewer operations.

4.4.1 Calculation Cost of DFT

The DFT is defined by equation 4.18, where W_N is called the twiddle factor. It follows from this equation that an N-point DFT requires N^2 complex multiplications and N(N-1) complex additions. A simple eight-sample signal would therefore require 64 complex multiplications and 56 complex additions, whereas a signal of 1024 samples would require about 2,100,000 complex operations (1,048,576 multiplications and 1,047,552 additions). The FFT is therefore used to decrease the computational cost. There are several algorithms, but the best known are the two radix-2 methods:

• Decimation In Time (DIT)

• Decimation in Frequency (DIF)
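The operation counts quoted above can be checked against a direct implementation of equation 4.18. The following sketch is illustrative only; the helper names `dft` and `dft_cost` are not part of the project.

```python
import cmath


def dft(x):
    """Direct N-point DFT, equation 4.18: X(k) = sum x(n) * W_N^(n k)."""
    N = len(x)
    W = cmath.exp(-2j * cmath.pi / N)       # twiddle factor W_N
    return [sum(x[n] * W ** (n * k) for n in range(N)) for k in range(N)]


def dft_cost(N):
    """(complex multiplications, complex additions) of the direct DFT."""
    return N * N, N * (N - 1)


print(dft([1.0, 0.0, 0.0, 0.0]))   # a unit impulse has a flat spectrum
print(dft_cost(8))
print(dft_cost(1024))
```

The quadratic growth of both counts is what makes the direct form impractical for long signals.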

4.4.2 Mathematical Background of FFT - DIT Algorithm

If the number of samples of the signal is an integer power of 2 (N = 2^v), the original sum (equation 4.18) can be separated into two sums, one over the even samples and the other over the odd samples, see equation 4.19.

Equation 4.20 is obtained by substituting n = 2r for n even and n = 2r+1 for n odd.

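$$
X(k) = \sum_{n=0}^{N-1} x(n)\,W_N^{nk}; \qquad W_N = e^{-j2\pi/N}
\tag{4.18}
$$

$$
X(k) = \sum_{\substack{n=0 \\ n\ \mathrm{even}}}^{N-1} x(n)\,W_N^{nk} + \sum_{\substack{n=0 \\ n\ \mathrm{odd}}}^{N-1} x(n)\,W_N^{nk}
\tag{4.19}
$$

$$
X(k) = \sum_{r=0}^{(N/2)-1} x(2r)\,W_N^{2rk} + \sum_{r=0}^{(N/2)-1} x(2r+1)\,W_N^{(2r+1)k}
\tag{4.20}
$$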

Because the twiddle factor satisfies equation 4.21, equation 4.20 can be rewritten as equation 4.22.

The original equation for the DFT has thus been split into two halves, where the first sum represents an N/2-point DFT of the even samples and the second sum an N/2-point DFT of the odd samples.

Now, let us compare the computational cost of each form for 8 samples (N = 8). The original form requires N^2 = 8^2 = 64 multiplications, whereas equation 4.22 requires only 2(N/2)^2 + N = 2(8/2)^2 + 8 = 40 multiplications to calculate the same result.
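A short numerical check makes the single split tangible. The sketch below is an illustration with hypothetical helper names: it evaluates equation 4.22 with two N/2-point direct DFTs, compares the result with the direct N-point DFT and reproduces the multiplication count from the text. It relies on the fact that the N/2-point DFTs are periodic in k with period N/2.

```python
import cmath


def dft(x):
    """Direct DFT, equation 4.18."""
    N = len(x)
    W = cmath.exp(-2j * cmath.pi / N)
    return [sum(x[n] * W ** (n * k) for n in range(N)) for k in range(N)]


def dft_split_once(x):
    """Equation 4.22: combine N/2-point DFTs of even and odd samples."""
    N = len(x)
    half = N // 2
    G = dft(x[0::2])                         # even samples
    H = dft(x[1::2])                         # odd samples
    W_N = cmath.exp(-2j * cmath.pi / N)
    # G and H repeat with period N/2, hence the index k % half
    return [G[k % half] + W_N ** k * H[k % half] for k in range(N)]


def split_cost(N):
    """Multiplications of equation 4.22: two (N/2)-point DFTs plus N."""
    return 2 * (N // 2) ** 2 + N


x = [1.0, 2.0, 3.0, 4.0, 0.0, -1.0, -2.0, -3.0]
print(max(abs(a - b) for a, b in zip(dft(x), dft_split_once(x))))
print(split_cost(8))
```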

In order to develop this concept further, it is convenient to adopt a graphical approach based on the signal flow chart. The two basic DSP operations are addition and multiplication, see figure 4.4.

Figure 4.4: Two basic DSP operations.

Using the signal flow chart, equation 4.22 can be displayed as shown in figure 4.5.

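$$
W_N^{2} = e^{-j2\pi \cdot 2/N} = e^{-j2\pi/(N/2)} = W_{N/2}; \qquad
W_N^{(2r+1)k} = W_N^{k}\,W_N^{2rk} = W_N^{k}\,W_{N/2}^{rk}
\tag{4.21}
$$

$$
X(k) = \sum_{r=0}^{(N/2)-1} x(2r)\,W_{N/2}^{rk} + W_N^{k} \sum_{r=0}^{(N/2)-1} x(2r+1)\,W_{N/2}^{rk}
\tag{4.22}
$$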

Figure 4.5: DSP flowchart display of equation 4.22.

In equation 4.23,

$$
X(k) = G(k) + W_N^{k}\,H(k)
\tag{4.23}
$$

G(k) is an N/2-point DFT of the even samples and H(k) is an N/2-point DFT of the odd samples. As both G(k) and H(k) can be further broken up into N/4-point DFTs, the original 8-point DFT can be viewed as a combination of the results of four 2-point DFTs, see figure 4.6.

Figure 4.6: 8-point DFT expressed with four 2-point DFTs.

The expression for the 2-point DFT is

$$
X(k) = \sum_{n=0}^{1} x(n)\,W_2^{nk} = \sum_{n=0}^{1} x(n)\,e^{-j2\pi nk/2}
\tag{4.24}
$$

For k = 0, 1 we obtain

$$
\begin{aligned}
X(0) &= x(0) + x(1) \\
X(1) &= x(0) + x(1)\,e^{-j2\pi/2} = x(0) - x(1)
\end{aligned}
\tag{4.25}
$$

Figure 4.7 shows the 2-point DFT in the signal flow chart.

Figure 4.7: FFT butterfly topology.

The topology in figure 4.7 is referred to as the FFT butterfly. If the 2-point DFTs in figure 4.6 are replaced with FFT butterflies, we obtain a complete 8-point FFT with decimation in time, see figure 4.8.

Figure 4.8: Complete 8-point FFT.

4.4.3 Computational Cost of FFT with Decimation in Time

If N denotes the number of samples to process, then A = log2(N) is the number of columns in the signal flow chart. For example, there are A = log2(8) = 3 columns for the 8-point FFT, see figure 4.8. In each column there are B = N/2 butterflies. As one butterfly requires C = 2 multiplications, the total number of multiplications for an N-point FFT is:

A⋅B⋅C = log2(N)⋅(N/2)⋅2 = N⋅log2(N)
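Applying the even/odd split of equation 4.22 recursively until only 1-point DFTs remain gives a compact radix-2 decimation-in-time FFT. The sketch below is a plain-Python illustration, not the optimized 'C6701 routine; the function name and the explicit multiplication counter are assumptions added to reproduce the N⋅log2(N) count derived above.

```python
import cmath


def fft_dit(x, counter):
    """Recursive radix-2 DIT FFT; len(x) must be a power of 2.

    counter is a one-element list that accumulates the number of
    complex multiplications (two per butterfly, as counted in the text).
    """
    N = len(x)
    if N == 1:
        return [complex(x[0])]
    G = fft_dit(x[0::2], counter)            # N/2-point FFT of even samples
    H = fft_dit(x[1::2], counter)            # N/2-point FFT of odd samples
    half = N // 2
    X = [0j] * N
    for k in range(half):
        w_top = cmath.exp(-2j * cmath.pi * k / N)
        w_bottom = cmath.exp(-2j * cmath.pi * (k + half) / N)
        X[k] = G[k] + w_top * H[k]
        X[k + half] = G[k] + w_bottom * H[k]
        counter[0] += 2
    return X


x = [1.0, 2.0, 3.0, 4.0, 0.0, -1.0, -2.0, -3.0]
count = [0]
X = fft_dit(x, count)
print(count[0])                              # N * log2(N) for N = 8
```

Since w_bottom equals -w_top, an implementation can reuse the product w_top * H[k] and halve the number of multiplications; the sketch keeps both products only to match the count used in this section.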

5. IMPLEMENTATION

The main objective of the DSP part was to create several examples that would allow students to become familiar with the 'C6701 EVM. The examples should also outline different ways of implementing DSP algorithms. For this reason the examples are grouped into two categories.

The first part describes the process of analog-to-digital and digital-to-analog conversion specific to the 'C6701 EVM, whereas the second part shows the implementation of four of the most common DSP algorithms: FIR filter, IIR filter, LMS adaptive filter and Fast Fourier Transform.

5.1 CODEC

This part, which focuses on practical aspects of software development for the 'C6701 EVM, contains three examples of controlling and setting the properties of the 16-bit stereo codec.

These examples show how the input analog signal is converted into digital samples that are processed by the 'C67x DSP, and how the digital samples from the DSP are transformed into the output analog signal. Since most DSP algorithms operate on samples acquired from an analog signal, it is important to understand the basics of analog-to-digital and digital-to-analog conversion well.

5.1.1 Loopback Example

This example describes how a signal is modified by changing the codec's parameters (gain, attenuation, sampling frequency). See figure 5.1.

Figure 5.1: Loopback example.

Two stereo signals from the Mic and LineIn connectors serve as input signals. Before entering the ADCs, the Mic signal passes through a gain stage controlled by the variable MicGain. After analog-to-digital conversion, the samples go through the Loopback block into the DACs, where the analog signal is reconstructed.

The other signal, from the LineIn connector, goes through a mixer. The mixer's output is then summed with the output of the DACs and the resulting signal is led to the Out connector of the 'C6701 EVM board. The sampling frequencies of the ADCs and DACs are the same.

• Mic slider (variable MicGain): Controls the gain before ADCs.

• Sample slider (variable SelSmpFreq): Changes the sampling frequency of ADCs

and DACs.

• Loop slider (variable LoopBackAtten): Attenuation of the loopback block.

• LineIn slider (variable LineInGain): Gain of the analog mixer.

5.1.2 InAndOut Example

Figure 5.2: InAndOut example.

Unlike the Loopback example, where the sampled signal does not enter the DSP, the InAndOut example demonstrates data transfer between the DSP and an external source or sink (such as a signal generator or an oscilloscope). An input signal is taken from the LineIn connector and is then led through the ADCs and the codec's audio data serial port into the DSP's Multichannel Buffered Serial Port (McBSP). Once a complete sample has been received, the McBSP triggers an interrupt, which tells the DSP that data are ready to be read from the MCBSP0_DRR register. In response to the interrupt, the service routine called MyISR() reads the sample and writes the same value into the McBSP's transmit register MCBSP0_DXR. As soon as the value has been written, the McBSP starts transmitting this sample to the codec's audio data serial port, which passes the received bits to the DACs, which convert them into an analog signal.

The functionality of this example can be tested in two ways:

1. With a breakpoint in the MyISR() function, you can view the input signal that is received by the DSP. In this case the application does not run in real time, since execution is interrupted each time the breakpoint is reached.

2. Without the breakpoint, the output signal should correspond to the input signal, provided the sampling theorem is fulfilled (the sampling frequency is set to 5510 Hz, so the input signal must not contain frequencies above 2755 Hz).

5.1.3 Generator Example

Figure 5.3: Generator example.

The EVM board can also be programmed to generate basic signals, e.g. a sinusoidal, triangular or square signal. This example shows a practical implementation of generating such signals and at the same time allows the amplitude and frequency of the signal to be changed. Whereas the algorithms for generating the triangular and square signals were derived from their geometry, the sinusoidal signal is generated with a filtering approach based on the transfer function given in equation 5.1.

In equation 5.1, R = 1 for a pure sinusoidal signal, fo is the desired frequency and fs is the sampling frequency.

A property of this transfer function is a sinusoidal impulse response. In other words, the impulse response of such a digital filter is a sinusoidal signal of frequency fo and amplitude equal to 1. This can be verified, e.g. in Matlab.

• Amplitude slider (variable A): Amplitude of the signal, in the units of a 16-bit

signed integer (max. 32767).

• Frequency slider (variable fo): Frequency fo of the signal.

• Signal slider: Sinusoidal, triangular or square signal.
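The claim that the filter of equation 5.1 has a sinusoidal impulse response can also be checked with a few lines of code. The sketch below is an illustration in plain Python and not the EVM implementation; the function name and parameter values are assumptions. It implements the difference equation y(n) = 2R cos(ωo) y(n-1) - R^2 y(n-2) + R sin(ωo) x(n-1) that follows from equation 5.1 and drives it with a unit impulse.

```python
import math


def sine_generator(f_o, f_s, num_samples, R=1.0):
    """Impulse response of equation 5.1; equals R^n * sin(n * w_o)."""
    w_o = 2.0 * math.pi * f_o / f_s
    b1 = R * math.sin(w_o)           # numerator coefficient of z^-1
    a1 = -2.0 * R * math.cos(w_o)    # denominator coefficient of z^-1
    a2 = R * R                       # denominator coefficient of z^-2
    y1 = y2 = 0.0                    # y(n-1), y(n-2)
    x1 = 0.0                         # x(n-1)
    out = []
    for n in range(num_samples):
        x0 = 1.0 if n == 0 else 0.0  # unit impulse input
        y0 = b1 * x1 - a1 * y1 - a2 * y2
        out.append(y0)
        y2, y1, x1 = y1, y0, x0
    return out


samples = sine_generator(1000.0, 8000.0, 64)
print(samples[:4])
```

Multiplying each sample by the amplitude A would scale the output to the 16-bit range used by the codec. With R < 1 the same recursion produces a decaying sinusoid.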

5.2 DSP ALGORITHMS

Four DSP algorithms have been used to introduce different approaches of

implementing DSP algorithms into signal processors. Refer to chapter 4 for

mathematical theory concerning these algorithms and section 3.3 for description of

low-level and high-level implementation of DSP algorithms.

5.2.1 Examples of Low-Level Implementation of DSP Algorithms

Two DSP algorithms, LMS adaptive filter and Fast Fourier Transform were

used to practically show the concept of low-level implementation.

5.2.1.1 Least Mean Square Adaptive Filter Example

This application implements the least mean square adaptive filter algorithm, adjusting the coefficients of a finite impulse response filter in such a way that the output of the algorithm tracks the desired signal. The mathematical theory regarding the LMS adaptive filter can be found in section 4.3.2.

$$
H(z) = \frac{R \sin\omega_o\, z^{-1}}{1 - 2R\cos\omega_o\, z^{-1} + R^{2} z^{-2}}; \qquad \omega_o = \frac{2\pi f_o}{f_s}
\tag{5.1}
$$

The sampling frequency is set to 16 kHz and the order of the FIR filter is 8. The desired signal is identical to the input signal, which is taken from the LineIn connector of the 'C6701 EVM. The Out connector provides the output signal.

After a successful start of the application, the oscilloscope shows how the output signal tracks the input signal as the LMS algorithm adjusts the taps of the FIR filter.

5.2.1.2 Fast Fourier Transform Example

Figure 5.4 shows the result of the Fast Fourier Transform algorithm, whose

mathematical background is described in section 4.4.

Figure 5.4: FFT example.

The sampling frequency is initially set to 44.1 kHz, but can be adjusted with the Sample slider to a value between 5.5 kHz and 48 kHz. The application executes a 256-point FFT with decimation in time on a signal acquired from the LineIn connector of the TMS320C6701 EVM. The magnitude of the frequency spectrum of the input signal is displayed in a graph.

To further increase the speed of the algorithm, the twiddle factors are computed off-line and are read from a look-up table during execution.

• Sample slider (variable ActSmpFreq): Changes the sampling frequency of the

ADCs.

5.2.2 Examples of High-Level Implementation of DSP Algorithms

As mentioned earlier, Matlab v.6 makes it possible to create an executable for the 'C6701 EVM from a Simulink model. This feature of Matlab provides a completely new approach to implementing DSP algorithms and testing them in real time. Two digital filters, FIR and IIR, were implemented on the 'C6701 EVM in this way. For more details concerning high-level implementation, refer to section 3.3.2.

5.2.2.1 Finite Impulse Response Digital Filter Example

In order to simplify the generation of an executable from a Simulink model, a Matlab Graphical User Interface (GUI) application was created that allows an FIR filter to be designed and implemented on the TMS320C6701 EVM.

Figure 5.5: FIR graphical user interface.

As can be seen from figure 5.5, the order of the filter is set to 17 and the cut-off frequency can vary between 500 Hz and 3500 Hz. Three windows are available for designing the FIR filter with the window method: the Rectangular, Hamming and Kaiser (β = 14) windows. Further options are as follows:

• Transfer functions for defined window: This option draws magnitude of transfer

function for all defined windows.

• Transfer function for selected window: This option draws magnitude of transfer

function for chosen window.

• Simulink model of FIR filter for select. window: This option creates a Simulink model of the FIR filter from the given parameters. At this point, the process of executable generation can be started.

• Info: Provides a short description of the GUI application.

• Exit: Exits the application.

Once the generated executable has been successfully loaded into the DSP processor, the FIR filter can be tested by connecting an input signal to the LineIn connector. The Out connector provides the output signal.

5.2.2.2 Infinite Impulse Response Digital Filter Example

Like the FIR filter example, this application is controlled by means of a GUI application. Figure 5.6 shows the GUI application through which an IIR filter can be implemented on the TMS320C6701 EVM.

The filter's order is set to 7 and the sampling frequency is 8 kHz. The cut-off frequency can be set to a value between 500 Hz and 3500 Hz. As the application uses the bilinear transform method to design the filter, the user can choose between the Butterworth and Elliptic analog filter prototypes. Other options of the application are:

• Transfer functions for def. filter prototypes: Displays transfer functions of IIR

filters that are based on the Butterworth and Elliptic analog filter prototype.

• Transfer function for sel. filter prototype: Displays the transfer function of IIR

filter based on the chosen analog prototype.

• Simulink model of IIR filter for sel. prototype: Creates a Simulink model according to the defined IIR filter, which is then used for executable generation.

• Info: Shows a short description of the GUI application.

• Exit: Exits the application.

Figure 5.6: IIR graphical user interface.

Once the generated executable has been successfully loaded into the DSP processor, the IIR filter can be tested by applying a signal to the LineIn connector. The Out connector provides the output signal.

6. SUMMARY OF DSP PART

The DSP part of the project describes different approaches to implementing DSP algorithms on the TMS320C6x signal processors, which are likewise described in this project.

The introduction to the digital signal processing domain is followed by a chapter in which the TMS320C6x DSP processor and the TMS320C6701 EVM are described. The TMS320C6701 EVM was used to introduce the concepts of low-level and high-level implementation of DSP algorithms in practice. Within the framework of the DSP part, four common DSP algorithms were implemented to show the difference between the classical programming approach and the Matlab-supported high-level design of DSP algorithms.

7. INTRODUCTION TO PCI PART

As computer systems demand a high-bandwidth architecture that can accommodate modern devices such as high-resolution graphics adapters and network controllers, there is a growing need for effective interconnects that also enable devices to be changed or upgraded with a minimum of effort.

The Peripheral Component Interconnect (PCI) bus, which is the subject of this report, meets most of the requirements imposed by high-performance computer systems. Due to the advanced features mentioned in the following sections, the PCI bus is nowadays the most frequently used bus in computer systems.

The aim of the PCI part is to design a PCI based device that would introduce

elementary features of the PCI specification.

The PCI part is divided into four chapters:

Chapter 8: Explores the PCI specification and its main characteristics.

Chapter 9: Introduces the PCI 9050 bus interface chip, which was used in conjunction with the PLX PCI 9050 Reference Design Kit to design a PCI device.

Chapter 10: Describes the designed PCI device. Timing diagrams of the

implemented communication protocol are included together with the description of

the software application.

Chapter 11: Contains a summary of the PCI part.

8. PERIPHERAL COMPONENT INTERCONNECT (PCI) BUS

This chapter describes the basics of the PCI bus and summarizes its main features. Furthermore, examples of PCI read and write transactions are included to show the bus protocol in practice. Finally, there is an introduction to the plug and play configuration mechanism.

8.1 INTRODUCTION TO COMPUTER BUSES

As shown in figure 8.1, a computer bus is a set of parallel lanes to which several peripheral boards can be attached, with the processor at one end.

Figure 8.1: Computer bus.

According to their purpose, buses can be divided into three main categories:

• Address bus: Values on the address bus specify which PCI bus segment, peripheral and register is being accessed.

• Data bus: Carries the information that is being conveyed.

• Control bus: Controls the data transfer operation according to a set of rules called the bus protocol.

In addition to these signals, others may be present in order to implement advanced features such as interrupts, DMA or power distribution.

8.1.1 Division of Computer Buses

Computer buses can be divided into following categories.

8.1.1.1 Timing

• Synchronous buses: All operations occur on a specified edge of the master clock

signal.

• Asynchronous buses: Operations are driven by signals on control buses without

regard to the master clock signal.

8.1.1.2 Architecture

• Non-multiplexed buses: Have separate lanes for address and data.

• Multiplexed buses: Address and data share the same lanes. An address phase is followed by one or more data phases, and control signals distinguish the phases from one another.

Further, computer buses can be classified according to the number of address lanes (8, 16, 32, 64), data lanes (1, 8, 16, 32, 64), transfer rate, maximum length, or the number of devices that can be connected to the bus at the same time.

8.1.2 Computer Buses before PCI

The first widely used computer bus was the Industry Standard Architecture (ISA) bus, which, with its maximum transfer rate of 8 MB/s and 16 MB address space, is inadequate for today's computer systems.

The subsequent VESA Local bus increased the data transfer rate to 132 MB/s by using a 32-bit wide bus operating at 33 MHz. As the bus was attached to the processor's local bus directly or through a bus buffer, it was processor-specific (486 CPU), and with the arrival of the Pentium it was no longer relevant.

8.2 INTRODUCTION TO PCI BUS

The Peripheral Component Interconnect bus is described by a set of specifications that are maintained by the PCI Special Interest Group (PCI SIG, www.pcisig.com). This organization provides all necessary information regarding the PCI bus and its implementation.

Four revisions of the PCI specification are available from PCI SIG: revision 1, revision 2.0, revision 2.1 and revision 2.2, released in 1999.

8.3 KEY FEATURES OF PCI BUS

The main features of PCI bus can be summarized as follows.

• Multiplexed, synchronous 32-bit (64-bit) wide bus operating at 33 MHz (PCI revision 2.2).

• The maximum theoretical transfer rate of a 32-bit bus at 33 MHz is 132 MB/s. The currently defined revision 2.2 can move data at up to 528 MB/s (64-bit bus at 66 MHz) and the most recent PCI-X at up to 1 GB/s.

• Any device on the PCI bus with master capabilities can initiate data transfer with other devices.

• Blocks of data can be moved.

• PCI implements plug and play configuration. Every device in a system is automatically configured each time the system is turned on.

• PCI is a 'green architecture' supporting both 3.3 V and 5 V signaling environments.
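The transfer rates in the list above follow from the bus width and the clock frequency, assuming an ideal burst in which one data phase completes on every clock cycle. The small calculation below is only an illustration; the function name is hypothetical and the clock values are nominal.

```python
def peak_transfer_rate_mb_s(bus_width_bits, clock_mhz):
    """Theoretical peak rate in MB/s: bytes per data phase * clock in MHz."""
    return bus_width_bits // 8 * clock_mhz


print(peak_transfer_rate_mb_s(32, 33))    # 32-bit PCI at 33 MHz
print(peak_transfer_rate_mb_s(64, 66))    # 64-bit PCI at 66 MHz
print(peak_transfer_rate_mb_s(64, 133))   # 64-bit PCI-X at 133 MHz
```

Real transfers are slower than these figures, because every transaction starts with an address phase and may contain wait states.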

8.4 PCI SIGNALS

A number of terms are frequently used in connection with computer buses. The most important are:

• Agent: A device that operates on a computer bus.

• Master: An agent that is capable of initiating a data transfer.

• Transaction: A data transfer consisting of one address phase followed by one or more data phases. A transaction with several data phases is known as a burst transfer.

• Initiator: A master that has requested access to the bus and has been granted it by the central arbiter.

• Target: An agent that recognized its address during the address phase. The target

responds to the transaction initiated by the initiator.

The PCI bus has 98 signals in total, see figure 8.2.

Figure 8.2: PCI bus diagram.

The PCI specification requires a minimum of 47 signals to implement a target device and 49 signals to implement a master device. The remaining signals are optional and serve features such as 64-bit transfers, the JTAG interface, etc.

PCI signals in figure 8.2 can be divided according to their purpose into

several categories: address and data, control, error reporting, arbitration, system, 64-

bit extension, interrupt and JTAG interface signals.

Note: In the following sections, a # sign at the end of a signal name means that the signal is asserted (active) at a low voltage level.

8.4.1 System Signals

• CLK: Provides timing for all PCI transactions.

• RST#: Resets the device by setting its registers to initial states.

8.4.2 Address and Data Signals

• AD[31::0]: Multiplexed address and data. During address phase, they convey

address, whereas during data phases they convey data.

• C/BE#[3::0]: Multiplexed bus command and byte enables. During address phase

they convey the bus command, whereas during data phases they convey byte

enable information.

• PAR: Even parity across AD[31::0] and C/BE#[3::0] signals.
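Even parity means that the total number of ones on AD[31::0], C/BE#[3::0] and PAR together is even, so PAR is simply the XOR of the 36 covered bits. The sketch below illustrates this rule; the function name is an assumption, and the arguments are taken as the logic values seen on the lines.

```python
def pci_par(ad, cbe):
    """Return the PAR bit for a 32-bit AD value and a 4-bit C/BE# value."""
    bits = (ad & 0xFFFFFFFF) | ((cbe & 0xF) << 32)
    # PAR = 1 exactly when the 36 covered bits contain an odd number of ones
    return bin(bits).count("1") % 2


print(pci_par(0x00000001, 0x0))   # one bit set
print(pci_par(0xFFFFFFFF, 0xF))   # 36 bits set
```

A receiving agent recomputes the same value and compares it with the PAR line; a mismatch is reported through the error reporting signals described in section 8.4.5.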

8.4.3 Interface Control Signals

• FRAME#: Master indicates the beginning and duration of a PCI transaction.

• IRDY#: Initiator Ready indicates that the initiator is ready to read or write data.

• TRDY#: Target Ready indicates that the target is ready to write or read data.

• STOP#: Selected target requests the master to terminate the current transaction.

• LOCK#: Indicates an atomic operation that may require multiple transactions to complete.

• IDSEL: Initialization Device Select is a chip select used during configuration

transaction.

• DEVSEL#: Device Select indicates that the target recognizes itself as the target

of the current transaction.

8.4.4 Arbitration

• REQ#: Master indicates with this signal to the central arbiter that it desires to use

the bus.

• GNT#: Central arbiter grants the bus to the master.

8.4.5 Error Reporting

• PERR#: Data parity error during all PCI transactions.

• SERR#: Address parity error, or any other serious system error.

8.4.6 Interrupt Signals

INTA# through INTD# are used by the device to signal an event to its driver.

8.4.7 64-bit Extension

A device uses this group of signals for 64-bit transactions, which can be executed only if both the initiator and the target support 64-bit transactions.

8.4.8 JTAG Signals

These signals provide means for testing PCI devices.

Other signals not mentioned in sections 8.4.1 – 8.4.8 may be present on the PCI interface in order to support advanced features such as power management or the 3.3 V signaling environment.

8.5 ARBITRATION

PCI is a multi-master bus, which means that any agent that has master

capabilities can act as a bus master and therefore execute data transfer across the PCI

bus.

To become the bus master, an agent must first be granted the bus by the central arbiter. Two signals, REQ# and GNT#, serve this purpose. With REQ#, an agent indicates to the central arbiter that it wants to use the bus. The central arbiter then gives the master permission to use the bus by asserting that master's GNT# signal. Provided the bus is idle (both FRAME# and IRDY# are de-asserted), the master can start a PCI transaction.
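The start condition described above can be written as a small check. The following Python sketch is only an illustrative model: the function names and the convention of passing each active-low signal as 0 (asserted) or 1 (de-asserted) are assumptions made here, not definitions from the PCI specification.

```python
# Active-low PCI signals: logic 0 means "asserted", logic 1 means "de-asserted".
ASSERTED = 0
DEASSERTED = 1


def bus_is_idle(frame_n: int, irdy_n: int) -> bool:
    """The bus is idle when both FRAME# and IRDY# are de-asserted."""
    return frame_n == DEASSERTED and irdy_n == DEASSERTED


def may_start_transaction(gnt_n: int, frame_n: int, irdy_n: int) -> bool:
    """A master may start a transaction when it holds GNT# and the bus is idle."""
    return gnt_n == ASSERTED and bus_is_idle(frame_n, irdy_n)
```

For example, `may_start_transaction(0, 1, 1)` is true, while a master whose GNT# is asserted must still wait as long as FRAME# or IRDY# is low.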

8.5.1 Bus Parking

The PCI specification introduces the notion of "bus parking". It allows one master to start a transaction without first requesting the bus with the REQ# signal, because the idle PCI bus has already been "parked" on this agent. Although any master can become the default master, it is recommended that the arbiter keep GNT# asserted for the last master that acquired the bus.

8.6 BUS PROTOCOL

A bus protocol is a set of rules that define how data are moved between the initiator and the target by specifying the timing of address, data and control signals. This section explains the bus protocol of PCI transactions.

8.6.1 PCI Bus Command

PCI is a multiplexed bus. Two different phases therefore exist within one

transaction: one address phase followed by one or more data phases. During the

address phase, C/BE# lanes convey information about the type of current transaction,

called the bus command. All PCI bus commands are listed in table 8.1.

Although a PCI device is not obliged to support all types of PCI transactions, it is required to respond to configuration read and configuration write, so that the PCI BIOS configuration software can access it during system boot-up.


C/BE#3  C/BE#2  C/BE#1  C/BE#0  COMMAND TYPE
0       0       0       0       Interrupt Acknowledge
0       0       0       1       Special Cycle
0       0       1       0       I/O Read
0       0       1       1       I/O Write
0       1       0       0       Reserved
0       1       0       1       Reserved
0       1       1       0       Memory Read
0       1       1       1       Memory Write
1       0       0       0       Reserved
1       0       0       1       Reserved
1       0       1       0       Configuration Read
1       0       1       1       Configuration Write
1       1       0       0       Memory Read Multiple
1       1       0       1       Dual-Address Cycle
1       1       1       0       Memory Read Line
1       1       1       1       Memory Write and Invalidate

Table 8.1: PCI Bus commands.

Read and write operations can be executed on three address spaces: memory, I/O and configuration space. Configuration space is used mainly at boot-up time to configure all PCI devices. Memory space differs from I/O space in that it can be prefetchable, which means that multiple reads from the same memory location give the same result.

Further, there are additional bus commands.

• Memory read line, memory read multiple, memory write and invalidate: PCI

transactions that are optimized for cache reads and writes.

• Interrupt acknowledge: System interrupt controller reads corresponding vector

from the target.

• Special cycle: Message broadcast. All the PCI devices that allow special cycle

transaction receive this message.

• Dual address cycle: With this transaction, 32-bit agents can access a 64-bit address space.
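The encoding in table 8.1 can be captured in a lookup table. The sketch below is an illustration in Python; the names `PCI_BUS_COMMANDS` and `decode_bus_command` are chosen here for the example, and the 4-bit argument is the value sampled on C/BE#[3::0] during the address phase.

```python
# Bus command encoding from table 8.1, indexed by the value of C/BE#[3::0].
PCI_BUS_COMMANDS = {
    0b0000: "Interrupt Acknowledge",
    0b0001: "Special Cycle",
    0b0010: "I/O Read",
    0b0011: "I/O Write",
    0b0100: "Reserved",
    0b0101: "Reserved",
    0b0110: "Memory Read",
    0b0111: "Memory Write",
    0b1000: "Reserved",
    0b1001: "Reserved",
    0b1010: "Configuration Read",
    0b1011: "Configuration Write",
    0b1100: "Memory Read Multiple",
    0b1101: "Dual-Address Cycle",
    0b1110: "Memory Read Line",
    0b1111: "Memory Write and Invalidate",
}


def decode_bus_command(cbe: int) -> str:
    """Return the command name for a 4-bit C/BE# value from the address phase."""
    if not 0 <= cbe <= 0xF:
        raise ValueError("C/BE# carries exactly four bits")
    return PCI_BUS_COMMANDS[cbe]
```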


8.6.2 Byte Enable

As 8-, 16- and 32-bit data can be moved between the initiator and the target, the initiator has to indicate which byte lanes convey data. The C/BE# signals serve this purpose: during each data phase they specify which bytes are valid.

Even if only some bytes, or none, are enabled, the agent driving the AD bus is required to drive all 32 AD lines to stable values.
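As an illustration of byte enables, the following Python sketch computes the C/BE# value for a transfer of 1, 2 or 4 bytes. It assumes that the byte enables are active low (a 0 bit marks a valid byte lane) and that lane 0 corresponds to AD[7::0]; the function name and its parameters are hypothetical.

```python
def byte_enables(offset: int, size: int) -> int:
    """Return the 4-bit C/BE# value for `size` bytes starting at byte lane `offset`.

    Byte enables are active low, so a 0 bit marks a lane that carries valid data.
    """
    if size not in (1, 2, 4):
        raise ValueError("size must be 1, 2 or 4 bytes")
    if offset < 0 or offset + size > 4:
        raise ValueError("transfer does not fit into the four byte lanes")
    enabled = ((1 << size) - 1) << offset  # 1 bits mark lanes that carry data
    return ~enabled & 0xF                  # invert: asserted means driven low
```

With these assumptions, a 32-bit transfer gives `0b0000`, and a 16-bit transfer on the upper two lanes (`offset=2`, `size=2`) gives `0b0011`.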

8.6.3 Basic PCI Transactions

This section explains basic read/write data transfer between the initiator and

target.

8.6.3.1 Read Transaction

Figure 8.3 shows data transfer from the target towards the initiator.

Figure 8.3: PCI read transaction.

This transaction consists of the following steps:

1. The bus is idle and most signals are tri-stated. The master for the following transaction has received its GNT# and detected that the bus is idle.

2. Address phase: The master drives FRAME# low and places the address of the target on the AD bus and the bus command on the C/BE# lanes.

3. The master asserts the appropriate C/BE# lanes and also asserts IRDY# to indicate that it is ready to accept data from the target. The target that recognizes its address on the AD bus asserts DEVSEL#. This is also a turnaround cycle (one wait state between the address phase and the first data phase), because in a read transaction the master drives the AD lines during the address phase and the target drives them during the data phases.

4. The target places data on the AD bus and asserts TRDY#. The master latches the data on the rising edge of clock 4. Data transfer takes place on any clock cycle during which both IRDY# and TRDY# are asserted.

5. The target de-asserts TRDY#, indicating that the next data element is not ready to transfer. Nevertheless, the target is required to continue driving the AD bus. This is a wait cycle.

6. The target has placed the next data item on the AD bus and asserted TRDY#. Both IRDY# and TRDY# are asserted, so the master latches the data from the AD bus.

7. The master has de-asserted IRDY#, indicating that it is not ready for the next data element. This is another wait cycle.

8. The master has re-asserted IRDY# and de-asserted FRAME# to indicate that this is the last data transfer. In response, the target stops driving the AD bus and de-asserts TRDY# and DEVSEL#. The master stops driving C/BE# and de-asserts IRDY#. This is a master-initiated termination.
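The rule that data moves only on clocks where both IRDY# and TRDY# are asserted can be checked against the eight steps above. The Python sketch below models each step as one clock with a pair of active-low values (IRDY#, TRDY#). The trace is one reading of the steps, not a transcription of figure 8.3; in particular, it assumes that the target keeps TRDY# asserted during the master's wait cycle in step 7.

```python
ASSERTED = 0  # active-low signals are asserted when driven low


def data_transfer_clocks(trace):
    """Return the 1-based clock numbers on which a data item is transferred."""
    return [
        clock
        for clock, (irdy_n, trdy_n) in enumerate(trace, start=1)
        if irdy_n == ASSERTED and trdy_n == ASSERTED
    ]


# (IRDY#, TRDY#) for the eight steps of the read transaction described above.
READ_TRACE = [
    (1, 1),  # 1: bus idle
    (1, 1),  # 2: address phase
    (0, 1),  # 3: turnaround cycle, master ready, target not yet
    (0, 0),  # 4: first data transfer
    (0, 1),  # 5: wait cycle inserted by the target
    (0, 0),  # 6: second data transfer
    (1, 0),  # 7: wait cycle inserted by the master (TRDY# state assumed)
    (0, 0),  # 8: last data transfer
]
```

Under these assumptions, `data_transfer_clocks(READ_TRACE)` yields clocks 4, 6 and 8, so three data items are moved in this transaction.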

8.6.3.2 Write Transaction

Figure 8.4 shows details of a typical write transaction, where data move from

the initiator to the target.

Figure 8.4: PCI write transaction.


As can be seen, the main difference between the read and write transactions is the absence of a turnaround cycle between the address phase and the first data phase, since in this case it is the initiator that drives the AD bus lanes during both phases.

8.6.4 Latency

The PCI specification defines several types of latency: arbitration, acquisition and initial target latency (see figure 8.5).

Figure 8.5: Bus latency.

The length of the latency is influenced by the parameters of the PCI device that are described in the following sections.

8.6.4.1 Latency Timer

PCI devices have an internal countdown latency timer. The timer is loaded with a defined value each time the master asserts the FRAME# signal. This value is decremented on each following clock, and once the counter reaches zero, the master is obliged to terminate its transaction if its GNT# is no longer asserted.

8.6.4.2 DEVSEL# Latency

The selected target is required to assert its DEVSEL# signal within three clock cycles of the assertion of FRAME#; otherwise the initiator terminates the transaction after the fourth clock from its beginning.

8.6.4.3 IRDY# / TRDY# Latency

With the IRDY# and TRDY# signals, the initiator and the target indicate that they are ready for a data read or write. There are some restrictions, however. The initiator must assert its IRDY# signal within 8 clocks of the assertion of FRAME# and within 8 clocks in each following data phase. Similarly, the target is required to assert its TRDY# signal within 16 clocks of the assertion of FRAME# and within 8 clocks in each following data phase.

8.6.5 Error Detection and Reporting

All bus agents are required to generate even parity over the AD and C/BE# lanes. The parity bit is placed on the PAR lane so that the total number of ones on AD, C/BE# and PAR is even, which allows the receiving agent to check it. A parity error can be detected during the address phase or a data phase.

If a parity error is detected during a data phase, the respective receiving agent may assert its PERR# signal. If a parity error is detected during the address phase, any receiving agent can assert the SERR# signal. The assertion of SERR# should be considered a fatal condition and handled appropriately with a non-maskable interrupt.
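Even parity means that the number of ones on the 32 AD lines, the four C/BE# lines and PAR taken together is even. A minimal Python sketch of the generation and the check follows; the function names are chosen here for illustration, and the timing of PAR relative to the address and data phases is not modeled.

```python
def pci_par(ad: int, cbe: int) -> int:
    """Return the PAR bit that makes the count of ones on AD, C/BE# and PAR even."""
    if not 0 <= ad <= 0xFFFFFFFF:
        raise ValueError("AD is a 32-bit value")
    if not 0 <= cbe <= 0xF:
        raise ValueError("C/BE# is a 4-bit value")
    ones = bin(ad).count("1") + bin(cbe).count("1")
    return ones & 1  # 1 only when the count of ones so far is odd


def parity_ok(ad: int, cbe: int, par: int) -> bool:
    """Check a received AD, C/BE# and PAR combination for a parity error."""
    return pci_par(ad, cbe) == par
```

A receiving agent would report the error with PERR# or SERR#, as described above, whenever `parity_ok` returns false.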

8.6.6 Target-Initiated Termination of Transaction

A transaction can be terminated either by the master or by the target. In the first case, shown in section 8.6.3.1, the master uses the FRAME# and IRDY# signals to terminate the transaction. If it is the target that wants to terminate, it asserts the STOP# signal. There are two types of target-initiated termination:

• Target Disconnect (figure 8.6): DEVSEL# and STOP# are asserted at the same time. The target is not ready to execute another data phase.

Figure 8.6: Target disconnect.


• Target Abort (figure 8.7): DEVSEL# is de-asserted, STOP# is asserted. The

target has experienced some fatal error condition.

Figure 8.7: Target abort.
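The two cases listed above differ only in the state of DEVSEL# while STOP# is asserted. The Python sketch below classifies a sampled pair of signals accordingly. It follows the simplified two-way distinction made in this section; the function name and the returned strings are illustrative.

```python
ASSERTED = 0  # active-low signals are asserted when driven low


def classify_target_termination(devsel_n: int, stop_n: int) -> str:
    """Classify a target-initiated termination from DEVSEL# and STOP#."""
    if stop_n != ASSERTED:
        return "no target termination"
    if devsel_n == ASSERTED:
        return "target disconnect"  # STOP# and DEVSEL# asserted together
    return "target abort"           # STOP# asserted, DEVSEL# de-asserted
```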

8.7 ADVANCED FEATURES OF PCI BUS

Together with the basic bus protocol, the PCI specification defines additional features that extend its capabilities. These optional features are not required, however, to implement an elementary PCI master/target device successfully.

8.7.1 Interrupt Handling

The PCI bus provides four interrupt signals for each device.

Interrupts are defined as active low, level sensitive and asynchronous to the PCI bus master clock. With an interrupt, a PCI device requests attention from its device driver; the interrupt line stays asserted until the condition that caused the interrupt is cleared.

The interrupt acknowledge bus command is used to read the interrupt vector (8-bit for x86 processors) from the target.

8.7.2 Special Cycle

The special cycle provides a mechanism to send information to multiple

targets that are enabled to respond to special cycle bus command. An example of

broadcast information may be the processor status such as halt or shutdown.


8.7.3 64-bit Extension

64-bit data transfers are executed if both the target and the master implement the 64-bit extension signals. In this case, the maximum data transfer rate increases to 264 MB/s (at a 33 MHz bus clock).

Another optional 64-bit mechanism allows 32-bit agents to access a 64-bit address space, that is, locations above the 4 GB limit of 32-bit addressing. This is accomplished with two address phases and the dual address cycle bus command.
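The quoted peak rates follow directly from the bus width and the clock frequency, assuming one data phase per clock in an ideal burst. The short Python sketch below reproduces the arithmetic; the function name is illustrative and the clock is taken as a nominal 33 MHz.

```python
def peak_transfer_rate_mb_s(bus_width_bits: int, clock_mhz: int) -> int:
    """Peak burst rate in MB/s: bytes per data phase times data phases per microsecond."""
    if bus_width_bits % 8 != 0:
        raise ValueError("bus width must be a whole number of bytes")
    return (bus_width_bits // 8) * clock_mhz
```

A 32-bit bus at 33 MHz gives 4 x 33 = 132 MB/s, and the 64-bit extension doubles this to 8 x 33 = 264 MB/s. Real transfers are slower because of address phases, turnaround cycles and wait states.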

8.8 PLUG AND PLAY CONFIGURATION

Before the PCI bus, devices had to be configured manually with respect to their resource requirements such as memory space, I/O space, interrupts, etc. Incorrect device configuration often led to hardware conflicts, which were difficult to detect.

To simplify system modification, PCI supports the plug and play feature, which allows a system to be configured automatically at boot time. Each PCI device provides information about its resource requirements, which the PCI BIOS configuration software uses to determine the system topology. Once the configuration software has enough information about the system, it assigns non-conflicting resources to each PCI card.

8.8.1 PCI Configuration Space

The PCI specification defines a third address space called the configuration space, in which every PCI device gets 256 bytes. By reading from and writing to this space, we can determine the actual status of the device and change its operational mode.

Reads from and writes to the configuration space are executed via two registers, CONFIG_ADDRESS and CONFIG_DATA, which result in the configuration read and configuration write bus commands (see section 8.6.1).

CONFIG_ADDRESS identifies the bus segment, device, logical function and configuration register. The configuration data are transferred through the CONFIG_DATA register.
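The composition of a CONFIG_ADDRESS value can be sketched as follows. The layout used here is the one of PCI configuration mechanism #1 on PC-compatible systems (enable bit 31, bus number in bits 23 to 16, device in bits 15 to 11, function in bits 10 to 8, register offset in bits 7 to 2), where the two registers are usually reached as 32-bit I/O ports 0CF8h and 0CFCh. These details are not stated in the text above and are given as background; the function name is hypothetical, and the port I/O itself is omitted because it needs privileged access.

```python
CONFIG_ADDRESS_PORT = 0xCF8  # I/O port of CONFIG_ADDRESS on PC-compatible systems
CONFIG_DATA_PORT = 0xCFC     # I/O port of CONFIG_DATA on PC-compatible systems


def make_config_address(bus: int, device: int, function: int, register: int) -> int:
    """Build the 32-bit value written to CONFIG_ADDRESS (mechanism #1 layout)."""
    if not 0 <= bus <= 0xFF:
        raise ValueError("bus number is 8 bits wide")
    if not 0 <= device <= 0x1F:
        raise ValueError("device number is 5 bits wide")
    if not 0 <= function <= 0x7:
        raise ValueError("function number is 3 bits wide")
    if not 0 <= register <= 0xFF:
        raise ValueError("register offset must lie within the 256-byte space")
    return (
        (1 << 31)            # enable bit: the access is forwarded as a configuration cycle
        | (bus << 16)
        | (device << 11)
        | (function << 8)
        | (register & 0xFC)  # DWORD aligned; bits 1..0 are always zero
    )
```

The value is written to CONFIG_ADDRESS, after which a read or write of CONFIG_DATA transfers the selected configuration DWORD.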


8.8.2 Structure of Configuration Space

The first 64 bytes of the configuration space are reserved for a configuration header that contains identification and information affecting the operational mode of the device. The remaining 192 bytes are available for device-specific configuration functions.

8.8.2.1 Configuration Header

Although three different types of configuration header exist, the type 0 configuration header described in this section is used by most PCI devices. The structure of the type 0 configuration header is shown in figure 8.8.

Figure 8.8: Type 0 configuration header.

8.8.2.1.1 Identification Registers

• Vendor ID: This value, assigned by the PCI SIG organization, identifies the vendor of the device.

• Device ID: This value, assigned by the vendor, identifies the device.

• Revision ID: Version of the device.

• Subsystem Vendor ID, Subsystem Device ID: Identify the add-in board or subsystem on which the PCI device resides.


• Class code: Defines the basic functional category (storage controller, network

controller, sound card) and specifies its implementation.

8.8.2.1.2 Command Register

The command register controls device behavior. It determines to which PCI cycles the device will respond and which PCI cycles it will be able to generate. Among other things, it enables the device to:

• Respond to PCI I/O space access.

• Respond to PCI memory space access.

• Act as a bus master.
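The three capabilities listed above correspond to the three lowest bits of the command register (bit 0: I/O space, bit 1: memory space, bit 2: bus master). The Python sketch below decodes them from a 16-bit register value; the higher command bits are left out, and the names used are illustrative.

```python
# Lowest three bits of the PCI command register.
COMMAND_BITS = {
    0: "I/O space access",
    1: "memory space access",
    2: "bus master",
}


def enabled_features(command: int) -> list:
    """List the features enabled by the three lowest command register bits."""
    return [name for bit, name in COMMAND_BITS.items() if (command >> bit) & 1]
```

For a target-only device that decodes both address spaces, the command register would typically read 0003h, which this sketch reports as I/O and memory space access without bus mastering.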

8.8.2.1.3 Status Register

The status register reports the actual status of the device concerning events such as target abort, system error or parity error. Further, it provides additional information about the device's capabilities, e.g. 66 MHz operation support, DEVSEL# timing, etc.
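A status register value can be decoded field by field. The Python sketch below uses the bit positions of the standard PCI status register (bit 4 capabilities list, bit 5 66 MHz capable, bits 10 to 9 DEVSEL# timing, bits 11 to 15 abort and error flags). The dictionary keys are names chosen for this example, and the remaining status bits are not decoded.

```python
DEVSEL_TIMING = ("fast", "medium", "slow", "reserved")


def decode_status(status: int) -> dict:
    """Decode selected fields of the 16-bit PCI status register."""
    def bit(n: int) -> bool:
        return bool((status >> n) & 1)

    return {
        "capabilities_list": bit(4),
        "mhz66_capable": bit(5),
        "devsel_timing": DEVSEL_TIMING[(status >> 9) & 0x3],
        "signaled_target_abort": bit(11),
        "received_target_abort": bit(12),
        "received_master_abort": bit(13),
        "signaled_system_error": bit(14),
        "detected_parity_error": bit(15),
    }
```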

8.8.2.1.4 Built-in Self Test Register

This register provides a mechanism for self-testing the device. It makes it possible to determine whether the device supports a self-test, to invoke the built-in test and to check the result.

8.8.2.1.5 Optimization Registers

Latency timer, Cache line size, Max_Lat and Min_Gnt belong to a group of optimization registers that allow designers to tune system performance by modifying values that affect the timing of PCI transactions.

8.8.2.1.6 Base Address Registers (BAR)

The base address registers provide the mechanism that allows the PCI BIOS configuration software to determine the device's requirements for memory and/or I/O address space. Once the system topology is determined, the PCI BIOS configuration software writes the corresponding non-conflicting base addresses into the base address registers. The type 0 configuration header supports up to six base address registers, each containing the start address of an independent address space.

An address space can represent memory space or I/O space. In the case of memory space, it can be placed anywhere in 32-bit or 64-bit space. On top of that, memory space can be denoted as prefetchable or non-prefetchable; prefetchable means that multiple reads of the same memory location provide the same result.

If the address space represents I/O space, then it can be located only in 32-bit space and it cannot be denoted as prefetchable.
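The properties described above are encoded in the low bits of each base address register: bit 0 distinguishes I/O from memory space and, for memory, bits 2 to 1 give the address width and bit 3 the prefetchable flag. The size of the requested range is commonly found by writing all ones to the register and reading it back. The Python sketch below decodes a register value and computes the size from such a read-back value. It works on plain integers, the function names are illustrative, and for a 64-bit memory register only the lower of the two DWORDs is considered.

```python
def decode_bar(bar: int) -> dict:
    """Decode the type bits of a 32-bit base address register value."""
    if bar & 0x1:
        return {"space": "io", "base": bar & 0xFFFFFFFC}
    width = {0b00: 32, 0b10: 64}.get((bar >> 1) & 0x3)  # None for other encodings
    return {
        "space": "memory",
        "width_bits": width,
        "prefetchable": bool(bar & 0x8),
        "base": bar & 0xFFFFFFF0,
    }


def bar_size(readback: int) -> int:
    """Size in bytes requested by a BAR, from the value read back after writing all ones."""
    mask = 0xFFFFFFFC if readback & 0x1 else 0xFFFFFFF0
    address_bits = readback & mask
    if address_bits == 0:
        return 0  # register not implemented
    return (~address_bits & 0xFFFFFFFF) + 1
```

For instance, a read-back value of FFFF0000h describes a 64 KB memory range, because the lowest writable address bit is bit 16.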

8.8.2.1.7 Expansion Bus ROM Base Register

Similarly to the base address register, the expansion bus ROM base register

contains a value representing the base address of ROM memory.

8.8.2.2 Capabilities List

PCI specification revision 2.2 specifies a new mechanism for providing additional information about a PCI device. The capabilities list resides in the device-specific portion of the function's configuration space, that is, in the 192 bytes that follow the 64 bytes of the configuration header.

The presence of the capabilities list is indicated by the respective bit in the status register. The CapPntr field in the configuration header (see figure 8.8) contains the offset of the first element of the list.

The capabilities list is implemented as a singly linked list, where each item consists of an 8-bit ID and an 8-bit offset to the next element in the list, followed by additional bytes (see figure 8.9).

Figure 8.9: Structure of capabilities list.

The capabilities list is used to identify new and optional PCI features.
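Walking the capabilities list amounts to following the chain of next-element offsets until an offset of zero is found. The Python sketch below does this on a 256-byte copy of a function's configuration space. The offsets used (status register at 06h with the capabilities bit at position 4, capabilities pointer at 34h) are those of the standard type 0 header; the function name and the loop guard against malformed lists are additions for this example.

```python
STATUS_OFFSET = 0x06        # 16-bit status register, little endian
CAP_POINTER_OFFSET = 0x34   # capabilities pointer in the type 0 header
STATUS_CAP_LIST = 1 << 4    # status bit that signals a capabilities list


def walk_capabilities(config: bytes) -> list:
    """Return (capability ID, offset) pairs found in a 256-byte configuration space."""
    if len(config) != 256:
        raise ValueError("configuration space of one function is 256 bytes")
    status = int.from_bytes(config[STATUS_OFFSET:STATUS_OFFSET + 2], "little")
    if not status & STATUS_CAP_LIST:
        return []
    found = []
    visited = set()
    offset = config[CAP_POINTER_OFFSET] & 0xFC  # the two low bits are reserved
    # Items live in the device-specific area (40h and above); stop on 0 or on a loop.
    while offset >= 0x40 and offset not in visited:
        visited.add(offset)
        found.append((config[offset], offset))
        offset = config[offset + 1] & 0xFC      # 8-bit offset of the next item
    return found
```

Given a dump in which the pointer at 34h is 40h, the item at 40h has ID 01h (power management) and points to 48h, and the item at 48h has ID 03h (vital product data) and a next offset of zero, the function returns both items in order.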


8.8.2.3 Vital Product Data

Vital product data is additional information such as the part number or serial number. It can also be used to store data about performance and failures.

Vital product data resides in a storage device such as an EEPROM on the PCI device.

8.8.3 PCI BIOS

The PCI BIOS provides a system-independent means of access to the configuration space of a PCI device. The BIOS is accessible from all operating modes of x86 processors.

The PCI BIOS functions make it possible to identify PCI resources (find PCI device, find PCI class code), access PCI configuration registers and use PCI functions (e.g., generate special cycles).

References [4], [5] and [6] provide a detailed description of the PCI bus and its use for data transmission.


9. PLX HARDWARE AND SOFTWARE DEVELOPMENT TOOLS

This chapter describes the hardware and software tools that were used to develop an application capable of data transfer across the PCI bus. For this purpose, the PCI 9050 Reference Design Kit (RDK) and the PCI 9050 Software Design Kit (SDK) were chosen. Both the PCI 9050RDK and the PCI 9050SDK are products of PLX Technology (www.plxtech.com). Together they provide comprehensive means for the development of PCI based applications.

9.1 PCI 9050 BUS TARGET INTERFACE CHIP

The PLX PCI 9050RDK is a complete hardware development tool suitable for the development of PCI based applications. The core of the PCI 9050RDK is a PCI 9050 bus interface chip, which together with an I/O daughter card connector, test headers and a breadboard area allows the user to implement and test new circuitry easily.

Figure 9.1: PCI 9050 bus interface chip.

The PCI 9050 bus interface chip provides PCI bus slave interface for adapter

boards. It is designed to connect a wide variety of local bus designs to the PCI bus

and allows local bus circuitry to achieve up to 132 MB/s burst transfers.


The PCI 9050 chip contains read and write FIFOs between the PCI bus and the local bus. The PCI 9050 also provides five local address spaces and four chip select signals.

9.1.1 PCI 9050 Main Features

The main features of the PCI 9050 chip, shown in figure 9.1, can be summarized as follows:

• Compliant with PCI specification revision 2.1, supporting low-cost slave

adapters.

• Support of burst transfers to memory space.

• Interrupt generation.

• Internal local bus clock can run independently of the PCI bus master clock.

• Programmable local bus configuration supports 8-, 16- and 32-bit local bus in

multiplexed or non-multiplexed mode.

• Serial EEPROM interface for a memory that can be used for loading

configuration information.

• Five local independent address spaces.

• Four local chip select signals.

• Possibility of modifying timing diagrams of local bus data transfers.

9.1.2 PCI Bus Interface of PCI 9050 Bus Interface Chip

The PCI 9050 is compliant with PCI specification revision 2.1 and supports

all PCI bus functions as a direct slave interface chip.

As a target, the PCI 9050 chip allows access to its internal registers and local

address spaces. Data transfer can be either 8-, 16- or 32-bit and all bus commands

listed in table 8.1 are supported.

9.1.3 Local Bus Interface of PCI 9050 Bus Interface Chip

The local bus provides a data path between the PCI bus and a non-PCI device. The PCI 9050 chip, which is the local bus master, is responsible for data transfer between the PCI bus and the local bus.

The local bus can be viewed as a set of lanes carrying address, data and control signals, to which user-specific circuitry can be attached. As the local bus master, the PCI 9050 mediates all communication between the PCI bus and the local bus.


8-, 16- and 32-bit wide data transfers are supported by the local bus and

depending on the settings, it can operate in multiplexed or non-multiplexed mode.

There are five independent local address spaces, each with a set of configuration registers that determine the local bus characteristics when that address space is accessed.

9.1.3.1 Local Bus Signals

Similarly to the PCI signals described in section 8.4, local bus signals can be divided into several groups according to their purpose: address/data, control/status and arbitration signals.

9.1.3.1.1 Address and Data Signals

• LA[27::2]: Convey address.

• LAD[31::0]: During data phases, they contain data.

9.1.3.1.2 Control and Status Signals

• ADS#, ALE: A local bus access starts when ADS# and ALE are asserted, indicating a valid address on the address lanes.

• LBE#[3::0]: Indicate which byte lanes convey valid data.

• LRDY#: If this bus signal is enabled it indicates that the device is ready to be

read from, or written to.

• LW/R#: Indicates data transfer direction.

• WAITO#: Provides status of the internal wait state generator, which can be used

to modify timing diagrams of local bus transactions.

• RD#, WR#: General purpose signals used to indicate a read or write operation to the local device.

9.1.3.1.3 Local Bus Arbitration

• LHOLD: Asserted by a device to indicate that it desires to access the local bus.


• LHOLDA: With this signal, the PCI 9050 grants control of the local bus to a device that has previously requested access with the LHOLD signal.

The signals listed above are the main ones present on the local bus. For a more detailed description, refer to the PCI 9050 documentation.

9.1.3.2 Modification of Local Bus Timing Diagrams

The write/read cycle time can be extended with internally generated wait states and/or by delaying the LRDY# signal.

9.1.4 Single Cycle Write and Read

Figures 9.2 and 9.3 show details of a single write and a single read on the local bus.

Figure 9.2: Single local bus write.

Figure 9.3: Single local bus read.

9.1.5 PCI Configuration Registers and Local Configuration Registers

PCI configuration registers are grouped in a structure called the configuration

header as described in section 8.8.2.1.


The local configuration registers are a set of twenty-one 32-bit registers whose values determine the local bus characteristics of the five local address spaces, such as local base address, address range, timing characteristics, chip select, etc.

9.1.6 Serial EEPROM

During power-on, the PCI RST# signal resets the internal registers of the PCI 9050 chip to their default values. In response to the RST# signal, the PCI 9050 outputs the local LRESET signal and checks for a serial EEPROM. If the serial EEPROM exists, the internal registers are set according to the values stored in the EEPROM. Otherwise, the default values are used.

9.1.7 Local Chip Select

The PCI 9050 provides four chip select signals to selectively enable devices

that are attached to the local bus. Each signal is programmable via four chip select

base address registers. Without this feature, external address decoding logic would be

required to implement chip select signals.

9.2 PCI 9050 REFERENCE DESIGN KIT (RDK)

The PCI 9050RDK was used in this project to investigate the possibilities of

PCI bus and its use for PCI based data transfer.

PLX Technology, the manufacturer of the PCI 9050RDK, also provides software support, which together with the PCI 9050RDK forms a complete development environment for PCI oriented applications.

The core of the PCI 9050RDK is the PCI 9050 bus interface chip described in section 9.1. The main purpose of this development board is to enable developers to convert existing ISA cards (sound cards, network cards, etc.) quickly and easily into PCI compliant boards, which gain the advantages of the PCI specification such as a high transfer rate, plug and play, etc. For this reason, the PCI 9050RDK carries one piggyback ISA slot to which a functional 8- or 16-bit ISA card can be connected.

Furthermore, the PCI 9050RDK carries test headers and an I/O daughter card connector that provide most of the local bus signals of the PCI 9050 chip. These connectors, in conjunction with a breadboard area, can be used to connect user-built circuitry to the PCI bus.

9.2.1 Main Features of PCI 9050 Reference Design Kit

The main features of the PCI 9050RDK board can be summarized as follows:

• PCI 9050 bus interface chip: PCI specification revision 2.1 compliant board

based on PLX PCI 9050 bus interface chip.

• Generic and ISA bus: The PCI 9050RDK, with on-board PCI-to-ISA conversion logic and a piggyback ISA slot, supports ISA bus adapters.

• User circuitry support: Large prototype area and test header with a daughter card

connector simplify circuitry development.

• PLXMon: PLXMon provides a comprehensive tool for PCI bus monitoring and

debugging.

Figure 9.4: PLX PCI 9050RDK block diagram.

9.2.2 PCI 9050RDK Subsystems

This section describes subsystems located on the RDK. As can be seen in

figure 9.4, the hardware of the PCI 9050RDK consists of the following subsystems:

• PCI slot interface

• PCI 9050 bus interface chip

• SRAM memory subsystem


• ISA slot and PCI-to-ISA conversion logic to facilitate the conversion of existing

ISA boards into a PCI platform.

• Daughter card connector to which new circuitry can be attached.

• Prototyping area and headers for testing and experimenting

9.2.2.1 PCI Interface

The PCI 9050RDK, which is fully compliant with PCI specification revision 2.1, can be plugged directly into a PCI slot. As it is a target-only device, the PCI mastering signals are not used.

9.2.2.2 PCI 9050 Bus Interface Chip

The PCI 9050 bus interface chip, which represents the RDK core, provides the interface between the PCI bus and the RDK subsystems that are connected to the local bus. See section 9.1 for more details concerning the PCI 9050 bus interface chip.

With jumpers we can set operational characteristics of the local bus such as:

• Multiplexed or non-multiplexed local bus.

• Local bus clock frequency: Local bus clock that runs asynchronously with

respect to the PCI bus master clock can be set to any frequency up to 40 MHz.

With jumpers the supported frequencies are 8 MHz, 16 MHz and 33 MHz.

• Local bus interrupts: Two local interrupts can be either user defined or routed to

ISA IRQ and ISA bus error signal.

• Local bus chip selects: in total four chip selects can be user mapped or can be

connected to the ISA or SRAM subsystems.

9.2.2.3 SRAM Subsystem

There is a 32-bit wide static random access memory (SRAM) supplied on the

RDK to demonstrate memory accesses from the PCI bus. The 32 k DWORD deep

memory operates at 33 MHz and is capable of zero wait state read/write.


9.2.2.4 ISA Subsystem

The main purpose of the PCI 9050RDK is to upgrade existing ISA bus designs into their PCI counterparts. For this reason, the PCI 9050RDK carries one piggyback ISA slot into which an ISA card can be plugged. One MACH210 Programmable Logic Device (PLD) takes care of signal conversion between the PCI and ISA bus.

9.2.2.5 Daughter Card Connector and Prototyping Area

Along with the daughter card connector, test headers that provide most of the local bus signals allow easy signal monitoring. In conjunction with the prototyping area, this subsystem is ideal for simple device development and testing.

9.3 PCI 9050 SOFTWARE DESIGN KIT (SDK) AND PLXMON

PLXMon by PLX Technology is an interactive program designed for working with PCI cards of the PLX family. It can also be used for low-level control of PCI cards that are not based on PLX chips.

Both versions, for DOS and for Windows 95, allow the user to do the following with generic PCI cards:

• Select a PCI device.

• Examine and modify the device's configuration registers.

• Examine and modify memory on the device (32-bit addressing).

• Examine and modify I/O space of the device (16-bit addressing).

For devices of the PLX family, PLXMon additionally allows examination and modification of:

• Local configuration registers

• Serial EEPROM

10. DESIGNED PCI DEVICE

To show a practical use of the PCI bus, a simple device capable of data transfer via the PCI bus was developed. The device, which is described in the following sections, allows data exchange between two computers. Data are written via the PCI bus into the device, from where they are read by another computer via its parallel port (LPT). Two programs demonstrate the functionality of the device.

As the purpose of this elementary example is to introduce the basic features of the PCI bus in practice, more advanced techniques are not employed.

10.1 APPLICATION OVERVIEW

The block diagram of the application is shown in figure 10.1.

Figure 10.1: Block diagram of the application.

As was mentioned, the example can be used for data transfer between two

computers. To outline the function of the device, data transfer can be divided into

several steps:

• In the software layer of PC1, the program sends data on the PCI bus.

• In the hardware layer of PC1, data are read from PCI bus and stored by the

PCITOLPT device, which is the PCI 9050RDK with added circuitry to latch data

from PCI bus.

• Parallel port of PC2 acquires data from the PCITOLPT device.

• Program in the software layer of PC2 displays and/or processes data gained from

PC1.

The communication is one-way, from PC1 (PCI) to PC2 (LPT), and the transferred data are 8 bits wide.

10.2 HARDWARE PART OF DEVICE

As data written to the PCI bus are valid only for a certain period of time determined by the PCI control signals, the PCI device must be able to recognize when data are valid and then store them so that they are ready for further processing. The PCI 9050RDK together with simple additional circuitry was used to accomplish this task.

10.2.1 Latch Circuitry on PCI 9050RDK

The designed circuitry described in this section is used to store 8-bit data

acquired from PCI bus.

The simplified scheme of the circuitry that is located between the local bus of

the PCI 9050 chip and parallel port is shown in figure 10.2. See appendix A2 for

complete scheme of the latch circuitry.

Figure 10.2: Scheme of the latch circuitry.

The PCITOLPT device must meet the following requirements:

• When its address appears on the PCI bus, it must recognize the address and latch 8-bit data during the subsequent data phase.

• It must set a busy flag indicating that data are ready and notify the parallel port.

• It must allow the parallel port to de-assert the busy flag once data have been read from the PCITOLPT device.

• It must be able to set its data output pins to the high impedance state.
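The requirements listed above can be summarized in a small behavioral model. The following is only a sketch in C for reasoning about the behavior; it is not the circuit itself, and the type and function names (latch_model, latch_write, latch_read, latch_ack) are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical software model of the PCITOLPT latch; names are illustrative. */
typedef struct {
    uint8_t data;    /* byte held in the octal latch                 */
    bool    busy;    /* busy flag: set while the byte is unread      */
    bool    out_en;  /* false = data outputs in high impedance state */
} latch_model;

/* A write decoded for the device: latch the low 8 bits and set busy. */
void latch_write(latch_model *m, uint32_t bus_value)
{
    m->data = (uint8_t)(bus_value & 0xFFu);
    m->busy = true;
}

/* Parallel port read: fails while the outputs are in high impedance. */
bool latch_read(const latch_model *m, uint8_t *out)
{
    if (!m->out_en)
        return false;
    *out = m->data;
    return true;
}

/* Acknowledge from the parallel port: de-assert the busy flag. */
void latch_ack(latch_model *m)
{
    m->busy = false;
}
```

In this model, a call to latch_write followed by latch_read and latch_ack corresponds to one complete byte transfer, and clearing out_en corresponds to switching the outputs to high impedance.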

10.2.2 Timing Diagrams

The communication protocol for data transfer between the PCI bus and the parallel port can be divided into several phases. Data are first accepted by the PCI 9050 chip into its write FIFO, from where they are moved to the local bus and captured by the octal latch. The latched data are then read by the parallel port of the other computer.

10.2.2.1 PCI Bus to PCI 9050 Write FIFO Phase

During this phase, data are moved from PCI bus into write FIFO of the PCI

9050 chip, see figure 10.3.

1. The PCI bus is idle. The bus master can start the transaction.

2. The bus master drives the address of the PCI 9050 device and the bus command onto the bus and asserts the FRAME# signal.

3. The PCI 9050 chip recognizes its address and asserts its DEVSEL# signal.

4. When the PCI 9050 chip is ready, it asserts the TRDY# signal to indicate that it is ready to accept data into its internal FIFO. With this clock, the PCI bus to PCI 9050 FIFO transaction is complete and the data are now in the FIFO.
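The four steps can be pictured as a small state machine on the target side. The sketch below is a strong simplification under stated assumptions: one clock per step, a single-entry stand-in for the write FIFO, and no modelling of IRDY#, the bus command or the DEVSEL# timing options. All names (pci_target, pci_clock, the address window) are hypothetical and the flags mean "asserted", not pin levels.

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum { PCI_IDLE, PCI_ADDRESS, PCI_SELECTED, PCI_DONE } pci_state;

typedef struct {
    pci_state state;
    uint32_t  base;    /* start of the address window claimed (assumed) */
    uint32_t  size;    /* size of that window in bytes (assumed)        */
    bool      devsel;  /* true = DEVSEL# asserted                       */
    bool      trdy;    /* true = TRDY# asserted                         */
    uint32_t  fifo;    /* single-entry stand-in for the write FIFO      */
} pci_target;

/* Advance the target by one PCI clock.
   frame: FRAME# asserted by the master; ad: value on the AD lines. */
void pci_clock(pci_target *t, bool frame, uint32_t ad)
{
    switch (t->state) {
    case PCI_IDLE:       /* steps 1-2: wait for an address in our window */
        if (frame && ad >= t->base && ad - t->base < t->size)
            t->state = PCI_ADDRESS;
        break;
    case PCI_ADDRESS:    /* step 3: claim the transaction */
        t->devsel = true;
        t->state = PCI_SELECTED;
        break;
    case PCI_SELECTED:   /* step 4: accept the data into the FIFO */
        t->trdy = true;
        t->fifo = ad;
        t->state = PCI_DONE;
        break;
    case PCI_DONE:       /* release the signals, bus returns to idle */
        t->devsel = false;
        t->trdy = false;
        t->state = PCI_IDLE;
        break;
    }
}
```

Calling pci_clock once per clock with the values driven by the master walks the model through the address phase, the DEVSEL# response and the data phase; an address outside the window leaves the model in the idle state.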

Figure 10.3: Data transfer between PCI bus and PCI9050 write FIFO.

10.2.2.2 PCI 9050 Write FIFO to Octal Latch Phase

During this phase, data are moved from write FIFO to local bus and latched in

the latch circuitry, see figure 10.4. This phase is synchronous with the local bus clock

(8 MHz), which runs asynchronously with respect to the PCI bus master clock.

1. A valid address is put on the local address bus.

2. The ADS# signal indicates a valid address on the local bus. At the same time, the PCI 9050 chip uses its internal address decoder to decode the address. As a result, it asserts the chip select (CS#) signal.

3. Valid data are presented and the WR# signal is asserted. At the rising edge of the local bus clock, data are stored in the octal latch. The IRQ signal is asserted to indicate to the parallel port that data are ready to be read from the latch.

4. Data from the write FIFO are now stored in the latch circuitry and the interrupt from the parallel port is set.
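The address decoding in step 2 and the write condition in step 3 can be expressed as two small predicates. This is a sketch under assumptions: the decoder is modelled as a plain base/size window, and the strobe condition (CS# and WR# both asserted) is only one plausible gating. The actual gating used in the circuit is defined by figure A2.1 and is not reproduced here; the function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* True when addr lies inside the window [base, base + size), i.e. when a
   decoder configured with this window would assert CS#. */
bool cs_asserted(uint32_t addr, uint32_t base, uint32_t size)
{
    return addr >= base && (addr - base) < size;
}

/* Assumed write condition for the latch: data are captured only while
   both CS# and WR# are asserted. Arguments are "asserted" flags. */
bool latch_strobe(bool cs, bool wr)
{
    return cs && wr;
}
```

Writing the comparison as (addr - base) < size after checking addr >= base avoids an overflow when base + size would exceed the 32-bit range.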

Figure 10.4: Data transfer between PCI 9050 FIFO and latch circuitry.

10.2.2.3 Octal Latch to Parallel Port Phase

During this asynchronous phase, data are moved from the octal latch to the parallel port, see figure 10.5.

Figure 10.5: Data transfer between latch circuitry and parallel port.

1. Once IRQ signal is asserted, an interrupt from parallel port indicates to its driver

that data are ready to be read from parallel port. The interrupt service routine

belonging to the interrupt reads data from parallel port.

2. The interrupt service routine indicates with the AckIrq signal that data have been successfully read from the PCITOLPT device. The AckIrq signal de-asserts the IRQ signal.
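The two steps of this phase can be sketched from the point of view of the receiving computer. The code below is an illustrative model only: the structure lpt_side stands in for the signals seen at the parallel port, and lpt_isr is a hypothetical interrupt service routine, not the routine used in the project.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative model of the signals seen by the receiving computer. */
typedef struct {
    uint8_t q;    /* Q[7::0]: byte on the parallel port data pins */
    bool    irq;  /* IRQ: set when a new byte has been latched    */
} lpt_side;

/* Interrupt service routine sketch. Step 1 reads the byte, step 2
   acknowledges it, which de-asserts IRQ. len is the number of bytes
   already in buf, cap its capacity. Returns the new number of bytes. */
size_t lpt_isr(lpt_side *s, uint8_t *buf, size_t len, size_t cap)
{
    if (!s->irq || len >= cap)
        return len;        /* nothing pending, or buffer full */
    buf[len++] = s->q;     /* step 1: read data               */
    s->irq = false;        /* step 2: AckIrq clears IRQ       */
    return len;
}
```

Because the routine clears IRQ only after the byte has been stored, a sender that watches the same flag will not overwrite the latch before the byte has been read.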

10.2.3 Parallel Port Configuration

In the present configuration, the parallel port works in bi-directional mode and two pins are used for data flow control. The input signal IRQ is used to trigger an interrupt, whereas the output signal AckIrq clears the interrupt after data have been read.

10.2.4 Application Registers

The application uses a set of registers of the PCI9050 and parallel port to

read/write data and to get status information.

10.2.4.1 PCITOLPT Registers

• Base Address Register 2: This register contains the starting address of the first local address space of the PCI 9050 chip. A write into this address space asserts the chip select (CS#) signal, which controls the octal latch.

• Base Address Register 0 + offset (0x50): Local control register, which controls the pins User 0 and User 1. With this register, the application can read the status of the input pin User 0 and control the output pin User 1.
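Access to the two user pins through the local control register comes down to simple bit operations. The sketch below assumes particular bit positions for the User 0 and User 1 data bits; these positions are assumptions made for illustration and should be checked against the PCI 9050 data book. The helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit positions in the local control register (BAR0 + 0x50).
   Verify against the PCI 9050 data book before use. */
#define CNTRL_OFFSET    0x50u
#define USER0_DATA_BIT  (1u << 2)  /* assumed: state of pin User 0 */
#define USER1_DATA_BIT  (1u << 5)  /* assumed: state of pin User 1 */

/* User 0 traces IRQ: true while the latched byte is still unread. */
bool data_pending(uint32_t cntrl)
{
    return (cntrl & USER0_DATA_BIT) != 0;
}

/* Return a copy of cntrl with the User 1 data bit set or cleared.
   Which level switches the latch outputs to high impedance depends
   on the circuit and is not modelled here. */
uint32_t set_user1(uint32_t cntrl, bool high)
{
    return high ? (cntrl | USER1_DATA_BIT) : (cntrl & ~USER1_DATA_BIT);
}
```

The value passed to these helpers would be the one read from the register, and the result of set_user1 would be written back, so that the other bits of the register stay unchanged.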

10.2.4.2 LPT Registers

• BASE (0x378): Base address of the parallel port. In bi-directional mode, the register at this address contains the values on data pins 0 - 7.

• BASE + offset (0x2): Control register of the parallel port. With this register, it is possible to enable the interrupt and set the parallel port into bi-directional mode.
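The value written to the control register can be composed as follows. The sketch relies on the conventional layout of the PC parallel port control register, in which bit 4 enables the interrupt and bit 5 switches the data pins to input on bi-directional ports; the macro and function names are illustrative.

```c
#include <stdbool.h>
#include <stdint.h>

#define LPT_BASE        0x378u           /* data register (from the text)     */
#define LPT_CONTROL     (LPT_BASE + 2u)  /* control register                  */
#define LPT_IRQ_ENABLE  (1u << 4)        /* bit 4: enable the port interrupt  */
#define LPT_DIR_INPUT   (1u << 5)        /* bit 5: data pins used as inputs   */

/* Compute a new control register value from the current one, changing
   only the interrupt enable and direction bits. */
uint8_t lpt_control_value(uint8_t current, bool irq, bool input)
{
    uint8_t v = (uint8_t)(current & ~(LPT_IRQ_ENABLE | LPT_DIR_INPUT));
    if (irq)
        v = (uint8_t)(v | LPT_IRQ_ENABLE);
    if (input)
        v = (uint8_t)(v | LPT_DIR_INPUT);
    return v;
}
```

For the configuration described here, both bits would be set, so that the port reads the latched byte on its data pins and raises an interrupt when a byte is ready.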

10.3 SOFTWARE PART OF DEVICE

This section describes the software application that was used to demonstrate

communication between two computers.

10.3.1 Device Driver

As the software application is required to run on the Windows operating

system, which does not allow direct access to hardware registers, device drivers for

the PCITOLPT device and parallel port had to be developed. The WinDriver

program by KRFTech (www.krftech.com) was chosen to accomplish the task.

The process of device driver creation using the WinDriver tool consists of several steps:

1. After WinDriver is started, a device can be chosen from a list. This step also allows an *.ini file to be created, which Windows uses to register the device.

2. WinDriver detects all accessible registers of the device.

3. The user then chooses which registers should be readable/writable from a Windows application. It is also possible to assign names to the registers; these names are used in the function calls to the device driver. Interrupts are also supported by WinDriver.

4. WinDriver then creates two files, *_lib.h and *_lib.c, that contain functions for accessing the registers defined in step 3. After these files are included in a software application project, the registers can be controlled from a Windows application.

With WinDriver, the following functions were created for the PCITOLPT device (PCI2LPT_lib.*):

• PCI2LPT_WritePciByteLatch: Writes data to the address stored in Base Address Register 2.

• PCI2LPT_WriteControl, PCI2LPT_ReadControl: Write/read data to/from the local control register. These functions control both pins, User 0 and User 1.

With WinDriver, the following functions were created for the parallel port (LPT_lib.*):

• LPT_ReadData: Reads data from the base register of the parallel port.

• LPT_WriteControl: Writes into the control register and in this way sets the operating mode of the parallel port (interrupt, bi-directional mode).
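A sending loop built on such functions might look like the sketch below. Since the signatures of the generated PCI2LPT_* functions are not given in the text, hardware access is passed in as callbacks; in a real program these callbacks would wrap the generated functions. The bit position of User 0, the polling strategy and the small fake device used for trying out the loop are assumptions made for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define USER0_DATA_BIT (1u << 2)  /* assumed position of the User 0 bit */

/* Hardware access as callbacks (hypothetical interface). */
typedef struct {
    void     (*write_latch)(void *ctx, uint8_t byte);  /* write via BAR2  */
    uint32_t (*read_control)(void *ctx);               /* read BAR0+0x50  */
    void      *ctx;
} pci2lpt_io;

/* Send len bytes. Before each write, wait until the previous byte has
   been read (User 0 low), giving up after max_polls polls.
   Returns the number of bytes actually written. */
size_t send_bytes(const pci2lpt_io *io, const uint8_t *data, size_t len,
                  unsigned max_polls)
{
    size_t sent = 0;
    while (sent < len) {
        unsigned polls = 0;
        while (io->read_control(io->ctx) & USER0_DATA_BIT) {
            if (++polls >= max_polls)
                return sent;            /* receiver not responding */
        }
        io->write_latch(io->ctx, data[sent++]);
    }
    return sent;
}

/* Fake device for trying the loop without hardware: after a write it
   reports "pending" for the next busy_polls status reads. */
typedef struct { uint8_t last; unsigned busy_polls; unsigned count; } fake_dev;

void fake_write(void *ctx, uint8_t byte)
{
    fake_dev *d = ctx;
    d->last = byte;
    d->busy_polls = 2;
    d->count++;
}

uint32_t fake_read(void *ctx)
{
    fake_dev *d = ctx;
    if (d->busy_polls > 0) {
        d->busy_polls--;
        return USER0_DATA_BIT;
    }
    return 0;
}
```

Passing the access functions as callbacks keeps the loop independent of the driver library, so the flow control can be exercised with the fake device before the card is installed.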

10.3.2 Example Software Application

A software application that demonstrates data transfer between two computers was developed under Windows using Microsoft Visual C++ version 6. To make the source code easy for others to follow, the application was built as a Win32 console program.

The software application consists of two stand-alone programs, WritePCI and ReadLPT. The WritePCI program writes byte data into the PCITOLPT device. The ReadLPT program reads data from the PCITOLPT device via the parallel port. The programs communicate through the PCITOLPT device and the parallel port in such a way that text typed in WritePCI appears in the window of the ReadLPT program, see figures 10.6 and 10.7.

Figure 10.6: WritePCI program writes data into the PCITOLPT device.

Figure 10.7: ReadLPT program reads data stored in the PCITOLPT device.

WritePCI program runs on the computer equipped with the PCITOLPT

device, whereas ReadLPT program runs on the other computer with a bi-directional

parallel port. The two computers are connected as shown in figure 10.1.

11. SUMMARY OF PCI PART

The PCI part of this project introduces the Peripheral Component

Interconnect bus and explores its possibilities for data transfer.

To show the use of the PCI bus in practice, circuitry was added to the PLX PCI 9050RDK board that, together with a parallel port, allows simplex data transfer between two computer systems. The functionality of the configuration was successfully verified with a Windows-based application that sends typed text from one computer system to the other.

CONCLUSION

The purpose of the DSP part of the project was to develop several examples

that would allow students to familiarize themselves with the domain of digital signal

processing.

One group of examples focuses on practical aspects of the TMS320C6701 Evaluation Module, which was used in this project to design and test DSP algorithms. Applications in this group make it possible to change the operational characteristics of the 16-bit codec and thus to investigate thoroughly the process of analog-to-digital and digital-to-analog conversion. This is very important because in many cases both the input and output signals are analog and must therefore be converted from analog to digital form and vice versa.

The examples in the second group introduce different approaches to implementing DSP algorithms on digital signal processors. There are basically two ways: low-level and high-level implementation. To introduce the low-level approach, the Least Mean Square (LMS) adaptive filter and the Fast Fourier Transform (FFT) algorithms were implemented in the C language. For the high-level approach, Matlab and its ability to generate an executable for the TMS320C6701 EVM from a Simulink model were used to implement finite and infinite impulse response digital filters. A simple graphical user interface was designed to simplify the task of converting a Simulink model into an executable that can be run on the TMS320C6701 EVM.

The mathematical theory behind the implemented DSP algorithms, together with a description of the TMS320C6701 EVM and the TMS320C6701 digital signal processor, is also part of the project.

In the future, more complex DSP algorithms could be tested with Matlab in order to explore fully the possibilities of high-level implementation. Furthermore, optional features of the 'C6701 digital signal processor could be exploited. For instance, DSP/BIOS and its real-time data exchange might be used for data transfer between the DSP core and an OLE client such as Matlab or Microsoft Excel.

The main goal of the PCI part of the project was to design a PCI-based device that would make it possible to investigate data transfer via the PCI bus. A further requirement was that the application handling the PCI device run under Windows. For this reason, the report also explains the process of device driver development using the WinDriver tool.

The PCI part starts with a short introduction to the PCI bus, which presents PCI bus features such as bus arbitration, the bus protocol and the plug and play mechanism. This chapter is followed by a description of the PCI 9050 bus interface chip, which, together with the PLX PCI 9050RDK development board, was employed in this project to develop a PCI compliant device.

The designed PCI device exploits the PCI bus for simplex byte data transfer

between two computers. Data are latched through the PCI 9050 chip in the built

circuitry, from where they can be read via the parallel port of another computer. An

implemented communication protocol between the PCI device and parallel port

assures proper data flow between the two computer systems.

To demonstrate the possibilities of the designed PCI device, two Windows-based programs were created. The first program transfers typed characters into the PCI device, whereas the other program reads them out via the parallel port.

The developed PCI device is a rather elementary example of PCI use that exploits merely the basic functionality of the PCI bus. It is therefore quite possible to enhance this application considerably, or even to start a new project that would put into practice advanced features of the PCI bus, e.g. burst data transfer, the 64-bit extension, special cycles, bus mastering, etc. These features would, however, require a different development kit, since the PLX PCI 9050 Reference Design Kit does not support all the advanced PCI bus techniques. On the other hand, the PLX PCI 9050RDK is equipped with an ISA subsystem that could be used without additional circuitry to convert an ISA card into its PCI counterpart.

BIBLIOGRAPHY

[1] Dahnoum Naim: “Digital Signal Processing Implementation Using the

TMS3206000 DSP Platform”, Prentice Hall, 2000, ISBN 0-201-61916-4.

[2] Vich Robert, Smekal Zdenek: “Cislicove filtry”, Academia, 2000, ISBN

80-200-0761-X.

[3] Stephen J. Chapman: "Matlab Programming for Engineers", Brooks/Cole,

2002, ISBN 0-534-95151-1.

[4] Solari E., Willse G.: " PCI Hardware and Software", Annabooks, 1996.

[5] Shanley T., Anderson D.: " PCI System Architecture", Addison-Wesley, 1998.

[6] Abbott Doug: “PCI Bus Demystified”, LLH Technology, 2000, ISBN

1-878707-60-4.

APPENDIX

A1 EXAMPLE OF EXECUTABLE GENERATION

This section shows step by step the process of converting a simple Simulink model into an executable file that can be loaded and run on the 'C6701 EVM.

1. Create a Simulink model: Figure A1.1 shows a Simulink model that can be converted into a DSP executable by the Real-Time Workshop.

Figure A1.1: Simulink model to be converted into executable.

2. In the Solver tab of the Simulation Parameters dialog of the Simulation menu, set the Stop time to inf and the Type to Fixed-step as shown in figure A1.2.

Figure A1.2: Setting of the solver.

3. Go to the Real-Time Workshop tab of the Simulation Parameters of the

Simulation menu.

4. In the Category Target Configuration push the Browse button and choose the

system file called ti_c6701evm.tlc. This step sets the proper compile and build

parameters for the 'C6701EVM target, see figure A1.3.

Figure A1.3: Setting the Real-Time Workshop parameters.

5. In the Category list choose TI C6701EVM runtime and in the Build action menu set Build_and_Execute. With this option the Real-Time Workshop compiles the Simulink model, creates the source files and invokes the Code Composer Studio, where the source files are compiled and linked into one executable file. After this, the Real-Time Workshop loads and runs the executable on the 'C6701 EVM.

Figure A1.4: Press the Build & Run button to execute the build process.

6. The build process is started by clicking on the Build & Run button as shown in

figure A1.4.

The correct functionality of this simple application can be verified by applying a signal to the LineIn input of the EVM board. The same signal should appear on the Out connector, provided the sampling theorem is fulfilled.

A2 SCHEME OF LATCH CIRCUITRY

Figure A2.1 shows a complete scheme of the octal latch circuitry that is

connected to the local bus of the PCI 9050 bus interface chip. See section 10.2 for

more details about the purpose of this circuitry.

The latch circuitry is controlled by the local bus signals of PCI 9050 chip and

one signal from parallel port. In the following list, in/out denotes an input/output

signal with respect to the circuitry.

• D[7::0]: Connected to the LAD[7::0] data lanes of the PCI9050 local bus. (in)

• Q[7::0]: Latched data connected to the data pins of parallel port. (out/tristate)

• LCLK: Local bus clock of the PCI9050 chip. (in)

• CS#: Local bus chip select 0. Asserted when local address space 0 of the

PCI9050 chip is accessed. (in)

• WR#: Indicates that write operation is in progress. (in)

• IRQ: When data are latched, signal IRQ triggers an interrupt from parallel port.

(out)

• AckIrq: Signal from parallel port that clears the interrupt from parallel port. (in)

• User 0: Traces signal IRQ. It can be used to determine whether data have already been read from the PCITOLPT device. (out)

• User 1: Sets the outputs of the octal latch to high impedance state. (in)

Table A2.1 contains a list of the components that were used to build the latch circuit.

QUANTITY  NAME      TYPE         VALUE          DESCRIPTION
8         R1 - R8   R-EU 0204/5  39 kΩ          Resistor.
8         R9 - R16  R-EU 0204/5  100 Ω          Resistor.
8         D1 - D8   LED 5 mm     1.5 V / 20 mA  Light Emitting Diode.
8         T1 - T8   BC237        -              Bipolar NPN transistor.
1         IC1       M74LS573B    -              Three state octal transparent latch.
1         IC2       MC14043B     -              Three state quadruple R/S latch.
1         V1        SN74LS02     -              Quad 2-input NOR gate.
1         V2        M74LS08      -              Quad 2-input AND gate.

Table A2.1: List of components for the latch circuitry.

Figure A2.1: Octal latch circuitry.

