Speech Visualization by Integrating Features for the Hearing Impaired

渡邉, 亮; Watanabe, Akira; Tomishige, Shingo; Nakatake, Masahiro

doi:10.1109/89.848226

Publisher

Institute of Electrical and Electronics Engineers

Rights

© 2000 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.

IEEE, IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, 8, 4, 2000, 454-466

Format

application/pdf

Version

VoR (http://purl.org/coar/version/c_970fb48d4fbd8a85)

NDC classification

500

OAI identifier

oai:kumadai.repo.nii.ac.jp:00017513

File URL

https://kumadai.repo.nii.ac.jp/record/17513/files/scansnap002.pdf

http://hdl.handle.net/2298/3522

File

scansnap002.pdf (10.4 MB)

Item type

Academic Journal Article / Journal Article(1)

Publication date

2007-08-15

Title

Speech Visualization by Integrating Features for the Hearing Impaired

Language

eng

Keywords

Subject

Feature extraction, reading test, speech visualization

Resource type

journal article

Authors

Watanabe, Akira

Tomishige, Shingo

Nakatake, Masahiro

Author (in another language)

渡邉, 亮

Description

Describes development of a new speech visualization system that creates readable patterns by integrating different speech features into a single picture. The system extracts the phonemic and prosodic features from speech signals and converts them into a visual image using neither speech segmentation nor speech recognition. We used four time-delay neural networks (TDNNs) to generate phonemic features in the new system. Training of the TDNNs using three selected frames of eight kinds of acoustic parameters showed significant improvement in the performance. The TDNN outputs control the brightness of patterns used for consonants, that is, each of the consonant-patterns is represented by a different white texture whose brightness is weighted by the output of a corresponding TDNN. All the weighted consonant-patterns are simply added and then overlaid synchronously on colors due to the formant frequencies. When this is done, phonemic sequences and boundaries manifest themselves in the resulting visual patterns. In addition, the color of a single vowel sandwiched between consonants looks uniform. These visual phenomena are very useful for decoding the complex speech code, which is generated by the continuous movements of speech organs. We evaluated the visualized speech in a preliminary test. When three students read the patterns of 75 words uttered by four males (300 items), the learning curves showed a steep rise and the correct answer rate reached 96-99%. The learning effect was durable: after five months of absence from the system, a subject read 96.3% of the 300 tokens in a response time which averaged only 1.3 s/word.
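The compositing step described above (consonant textures weighted by network outputs, summed, then overlaid on a formant-derived color) can be sketched roughly as follows. This is only an illustration under stated assumptions: the abstract does not give the formant-to-color mapping, the texture designs, or the exact overlay rule, so the function name `composite_frame`, the additive overlay with clipping, and all numeric values here are hypothetical.

```python
from typing import List, Sequence, Tuple

RGB = Tuple[float, float, float]


def composite_frame(
    base_color: RGB,
    textures: Sequence[Sequence[float]],
    weights: Sequence[float],
) -> List[RGB]:
    """Build one time-frame column of the picture.

    base_color: color for this frame, assumed to be derived elsewhere from
        the formant frequencies (the mapping is not given in the abstract).
    textures: one column of white-texture intensities in [0, 1] per
        consonant pattern (hypothetical designs).
    weights: one network output in [0, 1] per texture, used as brightness.
    """
    if len(textures) != len(weights):
        raise ValueError("need exactly one weight per texture")
    height = len(textures[0])
    column: List[RGB] = []
    for y in range(height):
        # Sum the brightness-weighted consonant textures at this pixel.
        white = sum(w * tex[y] for tex, w in zip(textures, weights))
        # Assumed overlay rule: add the white level to every channel, clip at 1.
        r, g, b = (min(1.0, c + white) for c in base_color)
        column.append((r, g, b))
    return column


# Two toy textures; only the first pattern's network is active.
frame = composite_frame(
    (0.25, 0.5, 0.75),
    [[1.0, 0.0], [0.0, 1.0]],
    [0.5, 0.0],
)
print(frame)
```

With all weights at zero the column is just the base color, which is consistent with the abstract's remark that a vowel between consonants appears as a uniform color.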

Bibliographic information

IEEE Transactions on Speech and Audio Processing

Vol. 8, No. 4, pp. 454-466, issued 2000-07

Bibliographic record ID

Source identifier (NCID)

AA10888994

DOI

10.1109/89.848226

Versions

Ver.1

2023-06-19 19:15:59.540951
