Novell AppNote-"Computer Telephone Integration: Call Control vs. Voice Processing"© Feb 95

FEBRUARY 1995

Computer Telephone Integration: Call Control vs. Voice Processing

RICH LEE
Senior Research Consultant
Systems Research Department

As technology drives the integration of computers and telephones, there are two main components in the development of this technology that need to be examined and understood. The first is call control, the process of setting up and breaking down calls. The second is voice processing, the use of voice technology for messaging, interactive voice response, text-to-speech conversion, and speech recognition algorithms that control and access information. This Application Note introduces telephony nomenclature and describes the differences between call control and voice processing.

Copyright © 1995 by Novell, Inc. All rights reserved. No part of this document may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying and recording, for any purpose without the express written permission of Novell.

All product names mentioned are trademarks of their respective companies or distributors.

Introduction

There are two main components of the technology for integrating computers and telephones. The first is call control, the process of setting up and breaking down calls. Call control functions range from tasks as simple as making a call to more complex functions such as placing multi-party conference calls, transferring calls, and merging calls. The second component is voice processing, the use of voice technology in a messaging environment. Voice processing covers interactive voice response, text-to-speech conversion, and the use of speech recognition algorithms to control and access information.

This article examines the differences and similarities between call control and voice processing. It looks at the roles of these segments in a computer telephone integration model, and the technological advantages derived by combining the two technologies in a client/server environment.

Call Control: A Technology Overview

From a definition standpoint, call control is the process of setting up and breaking down telephone calls. While calls can be as simple as an out-dial or make-call, the term also covers more complex tasks such as setting up conference calls between multiple parties, transferring calls, or controlling the receipt of incoming calls.

Historically, call control comes from the telecommunications industry, and until very recently call control functions were performed by telecommunications system hardware such as a Private Branch Exchange (PBX), a central office switch, or a key system, along with the actual telephone device sitting on the user's desk.

With the introduction of computer telephone integration (CTI), computers, especially microcomputers and servers, are now poised to play a pivotal role in performing many of the functions of call control. While most of today's computing and telecommunications infrastructures are distinct equipment groups, CTI creates a link at the desktop between the user's computer and telephone, as shown in Figure 1.

Figure 1: This shows the CTI link between computer and telephone.

From a practicality standpoint, this means the function of setting up and breaking down a call is moved from the telephone (a non-intuitive device limited by twelve buttons, and requiring significant knowledge to operate) to the desktop computer (an intuitive device with immense possibilities for specific telephony applications).

Once the link between the two infrastructures is established, software applications are developed to run either on the server or the workstation providing the actual user interface for performing call control functions. This is the epitome of Client-Server implementation, concentrating the more expensive components of the hardware base at the server for use by many instead of at the client with expense to many.

Call control is achieved in a CTI environment when the user clicks an icon in a Windows application (such as a transfer icon), and the computer passes a command to the telephony server. Using the Telephony Services library (TSLIB), an asynchronous communications stream (ACS) is opened between the client workstation and the telephony server (see Figure 2). The ACS stream is used for all client-server communications between a telephony-enabled application and the telephony server.

Figure 2: This shows the Telephony Services library (TSLIB) initiating an asynchronous communications stream.

Once the stream is opened, information such as the destination extension and the source extension is passed to the telephony server using standard Computer Supported Telecommunications Applications (CSTA) commands (in this example the API call is cstaTransferCall). The TSAPI.NLM in NetWare is based on the CSTA industry standard. After initialization, the telephony server takes the CSTA command and, through the PBX driver interface, translates it into the native PBX protocol (such as AT&T's ASAI). The PBX then performs the requested function, while the telephony server acts throughout as a multiplexer, routing and controlling the flow of CSTA client or server commands to the PBX.
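
The flow just described (open a stream, pass a command, translate it for the PBX, route the confirmation back) can be sketched as a toy model. This is not the TSAPI C interface: the classes ToyPbxDriver and ToyTelephonyServer, their methods, the extension numbers, and the "native" command format are all invented for illustration. Only the command name cstaTransferCall comes from the text.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class ToyPbxDriver:
    """Hypothetical PBX driver: turns a CSTA-style command into a native string."""
    sent: list = field(default_factory=list)

    def execute(self, command: str, **params) -> str:
        # Invented "native" format; a real driver speaks the PBX's own protocol.
        native = f"NATIVE:{command.upper()}:" + ",".join(
            f"{key}={value}" for key, value in sorted(params.items()))
        self.sent.append(native)
        return native


class ToyTelephonyServer:
    """Hypothetical server that multiplexes many client streams onto one PBX link."""

    def __init__(self, driver: ToyPbxDriver):
        self.driver = driver
        self._ids = count(1)
        self.streams = {}  # stream id -> confirmations queued for that client

    def open_stream(self) -> int:
        stream_id = next(self._ids)
        self.streams[stream_id] = []
        return stream_id

    def request(self, stream_id: int, command: str, **params) -> str:
        if stream_id not in self.streams:
            raise KeyError("stream is not open")
        native = self.driver.execute(command, **params)
        # Route the confirmation back to the stream that issued the request.
        self.streams[stream_id].append((command, "confirmed"))
        return native


server = ToyTelephonyServer(ToyPbxDriver())
alice = server.open_stream()
bob = server.open_stream()
print(server.request(alice, "cstaTransferCall", source="4101", destination="4250"))
```

Both clients share one driver and one PBX link, but only the stream that issued the transfer receives its confirmation, which is the multiplexing role the text assigns to the telephony server.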

Call Control: An Architecture Overview

There are three architectures that can be used to achieve call control capability in a CTI environment. Figure 3 shows a representation of each of these architectures from a high level view. These implementation architectures range from simple to complex with increasing capabilities corresponding to the complexity of the architecture.

Figure 3: These are the three call control architectures.

Examining the NetWare Telephony Services model shows a new degree of integration between the computer and the telephone, and reveals three main components in the architecture of a call control system:

The physical link between the computer and the telecommunications system.
The software used to control the basic communications between the computer and the telecommunications system.
The application running at the user's computer workstation used to perform the call control functions.

The link between the PBX and the computing device can be any number of types of physical connections.

The most widely used link types include:

A Basic Rate Interface (BRI) link between the PBX and the server or host computer.
An X.25 link between the PBX and the server or host computer.
A serial link between the PBX and the server or host computer.
A standard Ethernet link between the PBX and the server or host computer.

The link type used is dependent on the implementation developed by the PBX manufacturer. The server or host computer will typically support any type of physical link from the PBX. The actual hardware is either standard, off-the-shelf hardware (as in the case of an Ethernet link), or is provided by the PBX manufacturer (as in the case of a BRI link).

Call Control: TSAPI/NetWare Telephony Services

The Telephony Services Application Programming Interface (TSAPI) is a programming interface developed by Novell and AT&T. TSAPI is based on the Computer Supported Telecommunications Applications (CSTA) industry standard. TSAPI plays a significant role in the deployment of call control applications for two important reasons:

TSAPI defines the programming interface for application developers and PBX manufacturers. This means that applications and PBX drivers that comply with the TSAPI interface can interoperate without knowing the specifics of each other. Thus, an application can be written a single time and can operate with any TSAPI-compliant PBX. Prior to the creation of this standard, applications had to be modified (and often rewritten) to run on different PBX types. The standard gives implementers a much broader range of possibilities when integrating NetWare TSAPI with existing PBX services.
The creation of NetWare Telephony Services and TSAPI allows developers to take advantage of the control links exposed on a PBX. This provides enormous functionality for the application developer that could not be gained without a link to the control port. For example, utilizing the control link provides third party call control access. This means that an application can be written that allows one user to control a call, manage a call, or attach data information associated with a call, without being a party to the call. So, when an application makes a call for the user, a command is passed to the server, communicating the call request to the PBX which, in turn, dials the specified phone.
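
The two points above can be illustrated with a small sketch. It assumes a hypothetical driver contract (PbxDriver) standing in for the TSAPI driver interface, plus two invented vendors with invented wire formats. None of these names or formats come from TSAPI itself.

```python
from abc import ABC, abstractmethod


class PbxDriver(ABC):
    """Hypothetical driver contract standing in for the TSAPI driver interface."""

    @abstractmethod
    def make_call(self, calling: str, called: str) -> str:
        """Ask the PBX to connect `calling` to `called`."""


class VendorADriver(PbxDriver):  # invented vendor and wire format
    def make_call(self, calling: str, called: str) -> str:
        return f"A|DIAL|{calling}|{called}"


class VendorBDriver(PbxDriver):  # invented vendor and wire format
    def make_call(self, calling: str, called: str) -> str:
        return f"<call from='{calling}' to='{called}'/>"


def place_call_for(driver: PbxDriver, user_ext: str, number: str) -> str:
    """Application code, written once, unaware of which PBX sits behind the driver.

    The application is not a party to the call. It only asks the PBX to
    connect the user's extension to the dialed number (third-party control).
    """
    return driver.make_call(calling=user_ext, called=number)


for driver in (VendorADriver(), VendorBDriver()):
    print(place_call_for(driver, "4101", "5551234"))
```

The function place_call_for never changes. Only the driver behind it does, which is the portability argument made in the first point.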

Call Control: Application Examples

Below are two examples of call control applications:

The first example provides basic call control at the desktop computer. The functions include making an outgoing call, transferring a call, putting a call on hold, and setting up a conference call. In our example the user selects the name of the person to be called from a database listing. Using the mouse, the user then clicks a dial icon. The user's speakerphone opens and the called party is connected. Putting the caller on hold or transferring the caller to another extension is also achieved with point-and-click commands. Once the initial stream has been opened (a function which is performed by the telephony server upon execution of the first command), those point-and-click commands simply make TSAPI function calls to the telephony server (for example, cstaMakeCall).
The second example shows how call control is achieved on incoming calls. As caller ID information is passed to the receiving PBX, that information is delivered to the telephony server. A database is queried and, at the same time as the call is passed from the PBX to the user's telephone, a data record of the caller is passed to the client workstation. This type of application increases user efficiency and can reduce incoming telephone costs. Most applications that provide this type of "screen pop" functionality then allow the user to control the call with additional point-and-click commands.
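
The "screen pop" in the second example comes down to a lookup keyed on the caller ID, performed while the call is being routed. A minimal sketch follows. The table contents, the field names, and the function handle_incoming_call are hypothetical.

```python
# Hypothetical customer table keyed by caller ID.
CUSTOMERS = {
    "8015550100": {"name": "Example Caller", "account": "A-1001"},
}


def handle_incoming_call(caller_id: str, extension: str) -> dict:
    """Return the two things that happen together: call routing and the screen pop."""
    record = CUSTOMERS.get(caller_id)  # database query on the caller ID
    return {
        "route_call_to": extension,  # the PBX rings the user's telephone
        "screen_pop": record,        # record for the workstation, None if unknown
    }


print(handle_incoming_call("8015550100", "4101"))
```

An unknown caller still rings the extension. The workstation simply receives no record to display.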

Voice Processing: Technology Introduction

Voice processing is a broad term that encompasses several similar technologies. Included under the heading of voice processing are:

Voice messaging: The function of creating electronic mailboxes that allow users to receive, send, and act on voice messages
Automated attendant: A voice system that receives and directs calls automatically, without human intervention based on the selection made by the caller using a telephone keypad
Audiotext: The delivery of prerecorded messages to callers who make requests using a telephone keypad
Interactive voice response (IVR): The function of interacting with a caller by providing specific information (usually extracted from a computer database) in response to the caller's request
Text to speech: The function of converting text data to voice and playing the voice for the caller
Speech recognition: The use of voice commands to perform functions ordinarily done using a telephone keypad
Speech to text: The function of converting (perhaps in real time) the caller's voice from analog speech to ASCII text

Voice processing provides many user benefits, including:

Access to messages via telephone keypad from almost anywhere.
The ability to communicate between parties without time or location restrictions.
The ability to play prerecorded messages for callers.
The ability to have the voice processing system forward calls based on the caller's selection.
The ability to send the same message to multiple parties at the same time.

Figure 4: How voice messaging works in a NetWare environment.

There are also more advanced features of sophisticated voice processing systems. It is important to understand that while many of these features have been available for years, access to them was very limited until PC graphical applications began to provide an interface to manage the resources and allow users to take advantage of the functions.

These advanced features include the following:

Message Notification (the user is notified that a new message has arrived by pager, by a call from the voice mail system to another number, or with a screen-pop to the user workstation).
Message Escalation (if a message has not been retrieved from the user's mailbox after a programmed period of time, a copy of the message is sent to an escalation or supervisor's voice mailbox. The amount of elapsed time, the mailbox to escalate to, and similar settings are all set by the voice mail system administrator).
Multiple languages (a user can hear prompts in a specified language. This is often settable for each individual mailbox and requires that the prompts have been recorded in multiple languages).
Future Delivery (the user can enter a specified date and time that the message is to be delivered).
Return Receipt (the sender of the message is notified that the message has been played by the recipient).
Distribution Lists (the ability to send the same message to multiple parties).
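
As one illustration of these features, the Message Escalation rule can be expressed as a single check. The VoiceMessage record, the function name, and the mailbox numbers below are hypothetical. In a real system the time limit and escalation mailbox would come from the administrator's configuration.

```python
from dataclasses import dataclass, replace
from datetime import datetime, timedelta
from typing import Optional


@dataclass(frozen=True)
class VoiceMessage:
    mailbox: str
    received: datetime
    retrieved: bool = False


def escalate_if_overdue(msg: VoiceMessage, now: datetime, limit: timedelta,
                        escalation_mailbox: str) -> Optional[VoiceMessage]:
    """Return a copy addressed to the escalation mailbox, or None if not due."""
    if msg.retrieved or now - msg.received < limit:
        return None
    # The original stays in the user's mailbox; only a copy is escalated.
    return replace(msg, mailbox=escalation_mailbox)


msg = VoiceMessage(mailbox="4101", received=datetime(1995, 2, 1, 9, 0))
print(escalate_if_overdue(msg, datetime(1995, 2, 1, 11, 0), timedelta(hours=1), "4000"))
```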

Voice Processing: Architecture

As an example, we have selected the CallWare voice processing system developed by International Voice Exchange. CallWare is written as a NetWare Loadable Module (NLM) and runs natively on NetWare.

A voice processing system consists of four main components:

Processing Unit. While some voice processing systems still run only on proprietary hardware, CallWare, like many other newer systems, runs on industry standard 386/486/Pentium hardware and uses industry standard PC-based voice boards. Voice files are stored on standard hard drives.
Server software. Each voice processing system contains server software that is the engine or intelligence to the entire system. This server software contains all information about a given mailbox, defines the communication to the PBX and controls the access to the voice files. In the case of CallWare this server software is written as a NetWare Loadable Module (NLM) and hence runs natively on NetWare.
Client software. Some voice processing systems (still a minority) offer software that resides on a client desktop PC. In most cases this is MS Windows-based software. This client software serves as an alternate interface for the user to manage his or her voice messages using simple point and click icons for play, rewind, reply, forward, etc.
Voice boards. Each voice processing system must include a minimum of one voice processing board. The board resides in the server and is connected to the PBX or key system. It is the voice board that performs the task of converting the spoken word into a digital format for storage and then back to an analog voice to be played to a caller. These boards are produced mainly in four port increments. Main components include a Digital Signal Processor (DSP), firmware, voice ports, and signal relays.

Figure 5: This provides details of the board components.

Some of the key functions of the voice processing board are:

Digit Processing - The interpretation of Dual Tone Multi Frequency (DTMF) signals
Call Progress Control - The determination of the state of the phone device (busy, no answer, and so forth)
Speech Compression and Encoding - The analog signal is encoded as digital data using different encoding methods and compression rates
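
To make digit processing concrete, the sketch below decodes a DTMF key from audio samples. Each key is the sum of one low-group and one high-group tone, and the frequency table is the standard DTMF assignment. The detection method shown is the Goertzel recurrence, a common technique chosen here for illustration. The text does not say which method any particular voice board uses, and the 8000 Hz sampling rate and 400-sample window are assumptions.

```python
import math

ROWS = (697, 770, 852, 941)        # standard DTMF low-group frequencies, Hz
COLS = (1209, 1336, 1477, 1633)    # standard DTMF high-group frequencies, Hz
KEYS = ("123A", "456B", "789C", "*0#D")
SAMPLE_RATE = 8000                 # assumed sampling rate, Hz


def goertzel_power(samples, freq):
    """Signal power near `freq`, computed with the Goertzel recurrence."""
    coeff = 2.0 * math.cos(2.0 * math.pi * freq / SAMPLE_RATE)
    s1 = s2 = 0.0
    for x in samples:
        s0 = x + coeff * s1 - s2
        s2, s1 = s1, s0
    return s1 * s1 + s2 * s2 - coeff * s1 * s2


def decode_digit(samples):
    """Pick the strongest row tone and the strongest column tone."""
    row = max(range(4), key=lambda i: goertzel_power(samples, ROWS[i]))
    col = max(range(4), key=lambda i: goertzel_power(samples, COLS[i]))
    return KEYS[row][col]


def make_tone(key, n=400):
    """Synthesize `n` samples of the two-tone signal for `key` (test helper)."""
    for r, line in enumerate(KEYS):
        c = line.find(key)
        if c != -1:
            lo, hi = ROWS[r], COLS[c]
            return [math.sin(2 * math.pi * lo * t / SAMPLE_RATE)
                    + math.sin(2 * math.pi * hi * t / SAMPLE_RATE)
                    for t in range(n)]
    raise ValueError(f"not a DTMF key: {key!r}")


print(decode_digit(make_tone("5")))
```

A production detector would also apply energy thresholds and reject speech that happens to contain these frequencies. This sketch only shows the core mapping from a tone pair to a key.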

Figure 6: How the voice messaging system handles an incoming call sent to a busy extension.

The functionality that a voice processing system can deliver depends heavily on the integration between the system and the PBX. This integration, or physical connection, serves as the pipe through which the signaling between the two systems travels. In most scenarios, the physical line between the telephone switch and the voice processing server that carries the voice signal is a typical analog phone line. But while the analog voice signal is transmitted through the analog line, the data packet signals may or may not be transmitted down the same line.

There are two primary methods of addressing this portion of the integration:

In-band signaling - When the data packets are sent right along with the voice signals on the same line (see Figure 7).

Figure 7: This is an example of in-band signaling.

Out-of-band signaling - When the control signals are not sent right along with the voice signals on the same line, but rather are sent along another physical line. The two signals, the data packets and the voice signal, are still logically related to one another, but they do not travel down the same line (see Figure 8).

Figure 8: This is an example of out-of-band signaling.
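
The difference between the two methods can be modeled very loosely as follows. The tuples, the payload strings, and the port number used to relate the two lines are invented for illustration and do not represent any real PBX signaling format.

```python
def split_in_band(line):
    """In-band: one line carries both kinds of item, and the receiver separates them."""
    voice = [payload for kind, payload in line if kind == "voice"]
    data = [payload for kind, payload in line if kind == "data"]
    return voice, data


def pair_out_of_band(voice_line, data_line):
    """Out-of-band: two physical lines, related here by a shared port number."""
    data_by_port = dict(data_line)
    return [(port, audio, data_by_port.get(port)) for port, audio in voice_line]


# In-band: call data and audio arrive on the same line.
in_band = [("data", "called=4101"), ("voice", "audio-1"), ("voice", "audio-2")]
print(split_in_band(in_band))

# Out-of-band: audio on one line, call data on another, matched by port.
print(pair_out_of_band([(3, "audio-1")], [(3, "called=4101")]))
```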

Integrating with NetWare Telephony Services/TSAPI

As a programming interface, TSAPI defines call control functions (making calls, transferring calls, placing calls on hold, and so on). It does not deliver a standard programming interface for voice processing. However, the NetWare Telephony Services NLM does provide several valuable things for voice processing systems:

Provides a control link between the NetWare server and the PBX. This control link can be used as a primary integration method between the voice processing system and the PBX.
Allows a voice processing application to monitor devices (such as a telephone) connected to the PBX. This is important because it improves the intelligence by which the voice processing system manages the calls. For example, if a TSAPI connection is present, the voice mail system can use the link to monitor the state of a telephone. If the phone is busy, the voice mail system can play a prompt and record the caller's message without having to expend the time to actually deliver the call to the busy extension and have the call returned to the voice processing system for action.
Allows the voice processing system to more easily perform the functions it needs for call control functions, such as transferring calls from mailbox to mailbox.
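
The busy-extension case described above can be sketched as a simple decision. The DeviceState values, the action strings, and the function handle_call are hypothetical. In a real system the monitored states would arrive as events over the TSAPI link rather than from a dictionary.

```python
from enum import Enum


class DeviceState(Enum):
    IDLE = "idle"
    BUSY = "busy"


def handle_call(extension: str, monitored: dict) -> list:
    """Decide what to do with a call, using device states learned from monitoring."""
    if monitored.get(extension) is DeviceState.BUSY:
        # Known busy: take the message at once instead of delivering the call
        # to the extension and waiting for the PBX to return it.
        return ["play_prompt", "record_message"]
    return [f"deliver_to:{extension}"]


states = {"4101": DeviceState.BUSY, "4102": DeviceState.IDLE}
print(handle_call("4101", states))
print(handle_call("4102", states))
```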

Voice Processing Application Example

Interactive voice response (IVR) is a common application for giving users access to information via a telephone. The best-known application example for IVR is in the banking industry. A user calls an "automated banking system" and is greeted with a pre-recorded prompt such as "Thank you for calling Sierra West Bank". The user is then prompted to enter an account code and the desired transactions using the keypad on the telephone. The IVR system then processes the request and performs database lookups to obtain the requested information. The information is then played over the telephone to the user using pre-recorded digitized voice strings. Example: "Thank you Susan Kingston. Your checking balance is $1,345.78." Both the account owner's name and the bank balance were obtained from the IVR system database. The three big advantages of an IVR application are:

The system is interactive. It provides the information the user requests, at the time the user requests it.
The system works in real time or near real time. That is, once a database record (such as the checking account balance) is updated, that information is available to the caller immediately.
The information is available almost universally from any telephone that can transmit DTMF.
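
The banking example reduces to a lookup followed by prompt assembly. In the sketch below the account code "1234", the in-memory ACCOUNTS table, and the function balance_prompt are hypothetical. The owner name, balance, and wording of the prompt are taken from the example in the text.

```python
from decimal import Decimal

# Hypothetical account table; the account code is invented.
ACCOUNTS = {
    "1234": {"owner": "Susan Kingston", "checking": Decimal("1345.78")},
}


def balance_prompt(account_code: str) -> str:
    """Build the text of the spoken response for the digits the caller keyed in."""
    record = ACCOUNTS.get(account_code)
    if record is None:
        return "We could not find that account."
    return (f"Thank you {record['owner']}. "
            f"Your checking balance is ${record['checking']:,.2f}.")


print(balance_prompt("1234"))

# Real-time behavior: an updated record is reflected on the very next call.
ACCOUNTS["1234"]["checking"] = Decimal("1300.00")
print(balance_prompt("1234"))
```

A real IVR system would play pre-recorded voice segments for each part of this sentence rather than return a string.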

A second example of computer telephone integration in the voice processing arena can be seen in voice messaging. Voice messaging, or voice mail as it is commonly called, is the function of sending, receiving, and processing of stored voice messages. The interface to any voice messaging system has traditionally been limited to the telephone. This has presented many user interface problems. Many of these problems are solved by integrating the computer system and telecommunications system. Traditional problems included the following:

Messages are listened to serially. That is to say if a user has 22 new voice mail messages, they must be listened to in order (either first in first out, or last in first out).
The ability to manage the message (rewind, fast-forward, reply, forward, etc.) is difficult. These commands are usually controlled by non-intuitive methods using the telephone keypad. Users typically only memorize the most basic functions.
Although advanced functionality is present with most systems, that functionality is accessible only through complex, telephone keypad commands. Because of this, most of these functions go unused.

With a CTI-based voice messaging application, a user can view the header information of all messages using a PC-based graphical interface. The header information typically shows who the message is from, the date and time the message was left and the length of the message. The user can then select which message to listen to first, second, and so forth. Messages can be sorted simply by double-clicking on the column heading. Message management is simplified by providing a graphical point-and-click interface for play, stop, pause, rewind, fast-forward, reply, forward, record, delete, etc.
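
The header view described above is essentially a sortable table. A minimal sketch, assuming a hypothetical Header record with the three fields the text mentions (sender, date and time, and length):

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Header:
    sender: str
    left_at: datetime
    seconds: int  # length of the message


def sort_headers(headers, column, descending=False):
    """Reorder the list by one column, as a click on a column heading would."""
    return sorted(headers, key=lambda h: getattr(h, column), reverse=descending)


inbox = [
    Header("Lee", datetime(1995, 2, 1, 9, 15), 42),
    Header("Adams", datetime(1995, 2, 1, 8, 30), 95),
    Header("Ng", datetime(1995, 2, 1, 10, 5), 12),
]

for header in sort_headers(inbox, "seconds"):
    print(header.sender, header.seconds)
```

Because the user picks from the sorted list, messages no longer have to be heard in arrival order, which removes the first of the traditional problems listed earlier.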

Even advanced features are greatly simplified. For example, let's say we want to send a message with a specified future delivery date and time (a very useful function so difficult to use in a stand-alone system, that it is most often ignored). In a CTI-based voice messaging system, the user simply clicks the "future delivery" icon, enters in the date and time of delivery, and clicks the send button.

This example makes clear the ease-of-use enhancements that a CTI environment enables.

Summary

As detailed in this article, there are strong advantages to the integration of computers and telephones. Within the realm of CTI there are two fairly distinct technologies: call control and voice processing. Each of these two segments serves a purpose in the overall scheme of CTI. It seems evident that the real advantages, which for the most part are only now becoming a reality, are realized when the full power of call control is combined with the sophisticated messaging and interactive capabilities of a voice processing system.

Additionally, the advantages of implementing voice processing together with call control are immediately apparent. Speech-to-text capabilities on the telephony server would provide immediate integration with existing email systems, prioritization of incoming calls, and sender verification (who really called), not to mention voice authentication. While most users may not need these advanced abilities, they will certainly find their way into corporate culture.