Today's Technical Summary

Tcrazyalways


Tcrazyalways · posted 2005-01-13 01:32:00

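Poor me, though I am also lucky: my advisor asked me to take over and, together with graduate student Hu Yongbin (a very nice person), co-develop a national-level research project on multimedia-assisted teaching for deaf students. It is quite challenging. The main work is integrating back-end software with a Flash MX 2004 front end. The back-end technology is mainly ASR (Automatic Speech Recognition). The problems that urgently need solving are: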

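      I. The development platform has not been decided
         There are currently several ASR options
         1. Java Speech API (Sun itself does not currently provide an implementation of the API)
           Several vendors now offer implementations based on JSAPI; the list is excerpted below: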

FreeTTS

Description: Open source speech synthesizer written entirely in the Java programming language.
Requirements: JDK 1.4. Read about more requirements on the FreeTTS web site.

IBM's "Speech for Java"

Description: Implementation based on IBM's ViaVoice product, which supports continuous dictation, command and control and speech synthesis. It supports all the European language versions of ViaVoice -- US & UK English, French, German, Italian and Spanish -- plus Japanese.
Requirements: JDK 1.1.7 or later or JDK 1.2 on Windows 95 with 32MB, or Windows NT with 48MB. Both platforms also require an installation of ViaVoice 98.

IBM's "Speech for Java" on Linux

Description: Beta version of "Speech for Java" on Linux. Currently only supports speech recognition.
Requirements: RedHat Linux 6.0 with 32MB, and Blackdown JDK 1.1.7 with native thread support.

The Cloud Garden

Description: Implementation for use with any recognition/TTS speech engine compliant with Microsoft's SAPI5 (with SAPI4 support for TTS engines only). An additional package allows redirection of audio data to/from Files, Lines and remote clients (using the javax.sound.sampled package). Some examples demonstrate its use in applets in Netscape and IE browsers.
Requirements: JDK 1.1 or better, Windows 98, Me, 2000 or NT, and any SAPI 5.1, 5.0 or 4.0 compliant speech engine (some of which can be downloaded from Microsoft's web site).

Lernout & Hauspie's TTS for Java Speech API

Description: Implementations based upon ASR1600 and TTS3000 engines, which support command and control and speech synthesis. Supports 10 different voices and associated whispering voices for the English language. Provides control for pitch, pitch range, speaking rate, and volume.
Requirements: Sun Solaris OS version 2.4 or later, JDK 1.1.5. Sun Swing package (free download) for graphical Type-n-Talk demo.
More information: Contact Edmund Kwan, Director of Sales, Western Region Speech and Language Technologies and Solutions (ekwan@lhs.com)

Conversa Web 3.0

Description: Conversa Web is a voice-enabled Web browser that provides a range of facilities for voice-navigation of the web by speech recognition and text-to-speech. The developers of Conversa Web chose to write a JSAPI implementation for the speech support.
Requirements: Windows 95/98 or NT 4.0 running on Intel Pentium 166 MHz processor or faster (or equivalent). Minimum of 32 MB RAM (64 MB recommended). Multimedia system: sound card and speakers. Microsoft Internet Explorer 4.0 or higher.

Festival

Description: Festival is a general multi-lingual speech synthesis system developed by the Centre for Speech Technology Research at the University of Edinburgh. It offers a full text to speech system with various APIs, as well as an environment for development and research of speech synthesis techniques. It is written in C++ with a Scheme-based command interpreter for general control and provides a binding to the Java Speech API. Supports the English (British and American), Spanish and Welsh languages.
Requirements: Festival runs on Suns (SunOS and Solaris), FreeBSD, Linux, SGIs, HPs and DEC Alphas and is portable to other Unix machines. Preliminary support is available for Windows 95 and NT. For details and requirements see the Festival download page.

Elan Speech Cube

Description: Elan Speech Cube is a Multilingual, multichannel, cross-operating system text-to-speech software component for client-server architecture. Speech Cube is available with 2 TTS technologies (Elan Tempo : diphone concatenation and Elan Sayso : unit selection), covering 11 languages. Speech Cube native Java client supports JSAPI/JSML.
Requirements: JDK 1.3 or later on Windows NT/2000/XP, Linux or Solaris 2.7/2.8, Speech Cube V4.2 and higher.
About Elan Speech: Elan Speech is an established worldwide provider of text-to-speech technology (TTS). Elan TTS transforms any IT generated text into speech and reads it out loud.

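However, many of the above are paid products; IBM's can be used for development free of charge. We could of course build our own, but that is correspondingly difficult and requires understanding the principles of speech recognition. If the school wants to own the intellectual property, we would need to learn from The Cloud Garden and from Festival, developed by the University of Edinburgh.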
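Because Sun ships no engine of its own, whether JSAPI is usable on a given machine depends on which third-party implementation is installed. A minimal sketch of a check for that, using only the standard library: it looks for `javax.speech.Central`, the entry-point class defined by the JSAPI 1.0 specification. The helper name `isClassAvailable` and the class `JsapiProbe` are made up for this illustration, and finding the class only shows that the API classes are on the classpath, not that a working recognizer is configured.

```java
// Sketch: detect whether JSAPI classes are present on the classpath.
final class JsapiProbe {
    // Entry-point class named by the Java Speech API 1.0 specification.
    static final String JSAPI_CENTRAL = "javax.speech.Central";

    /** Returns true if the named class can be located (it is not initialized). */
    static boolean isClassAvailable(String className) {
        try {
            Class.forName(className, false, JsapiProbe.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        if (isClassAvailable(JSAPI_CENTRAL)) {
            System.out.println("JSAPI classes found; an engine may be usable.");
        } else {
            System.out.println("No JSAPI implementation on the classpath.");
        }
    }
}
```

Running this once with each candidate vendor's jars on the classpath would be a quick first filter before evaluating recognition quality.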

         2. Microsoft Speech SDK/TTS (Text to Speech)
             This API is free to use and is a viable choice; Kingsoft PowerWord (金山词霸) uses it as well.

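         3. IBM speech recognition technology
             1) Speech for Java
               This can be used free of charge.
             2) ViaVoice (IBM's speech recognition application)
               The source code can be made available, but only when it is used together with IBM's paid software such as WebSphere.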

    A later post will continue with IBM's software.

         4. Development languages
             1) Java
             2) C++
             3) .NET (C#)
             4) C/C++/Delphi/VB, etc.
             Each would be paired with the corresponding speech recognition technology.

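      II. On the school developing its own speech recognition software
         This is feasible, but it requires learning the principles of speech recognition. It is not something I can do alone, and it will take time.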

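      III. Connecting Flash AS 2.0 to the back end
         AS 2.0 is a new language, and I need to understand how it connects to the back end. This is still unclear and should be settled as soon as possible.
I have already downloaded the relevant documentation for all of the above and will decide after studying it.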
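One candidate for the Flash-to-back-end link, offered only as a possibility to evaluate, is ActionScript's `XMLSocket` class, which opens a TCP connection (to ports 1024 and above) and treats each message as a string terminated by a zero byte. A minimal Java sketch of the back-end side under that assumption follows. The port `9000`, the `<recognized>` element, and the names `XmlSocketFraming`, `frame`, `asXml`, and `serveOnce` are all hypothetical; the recognized text would come from whichever ASR engine is eventually chosen.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Sketch: send one recognition result to a Flash XMLSocket client.
final class XmlSocketFraming {
    /** Encodes a message as UTF-8 followed by the terminating zero byte. */
    static byte[] frame(String message) {
        byte[] body = message.getBytes(StandardCharsets.UTF_8);
        byte[] framed = new byte[body.length + 1]; // last byte stays 0
        System.arraycopy(body, 0, framed, 0, body.length);
        return framed;
    }

    /** Wraps recognized text in a hypothetical <recognized> element, escaping markup. */
    static String asXml(String recognizedText) {
        String escaped = recognizedText
                .replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;");
        return "<recognized>" + escaped + "</recognized>";
    }

    /** Accepts a single client on the given port and sends it one framed message. */
    static void serveOnce(int port, String recognizedText) throws IOException {
        try (ServerSocket server = new ServerSocket(port);
             Socket client = server.accept();
             OutputStream out = client.getOutputStream()) {
            out.write(frame(asXml(recognizedText)));
            out.flush();
        }
    }

    public static void main(String[] args) throws IOException {
        // Port 9000 is an arbitrary choice for this sketch.
        serveOnce(9000, "hello");
    }
}
```

A real back end would keep the connection open and push a message per utterance; the single-shot version here is only meant to show the framing.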

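On top of this I still have to take the national software qualification exam (软考), so I need to work out how to both build my own skills and get the development done.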