Rasa NLP components

Rasa officially provides several components for loading pre-trained language models, including:

  • MitieNLP
  • SpacyNLP
  • HFTransformersNLP

For component details, see: https://rasa.com/docs/rasa/components

spaCy installation

spaCy's own installation instructions are as follows:

pip install -U pip setuptools wheel
pip install -U spacy
python -m spacy download en_core_web_sm

For details, see the official site: https://spacy.io/usage#quickstart
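After installation, it can help to confirm that the model package is actually importable before pointing Rasa at it. The following is a minimal sketch, not part of spaCy's instructions: the helper name `model_installed` is made up for illustration, and it relies only on the fact that pip-installed spaCy models are regular Python packages.

```python
import importlib.util


def model_installed(package_name: str) -> bool:
    """Return True if a top-level package with this name can be found."""
    return importlib.util.find_spec(package_name) is not None


if __name__ == "__main__":
    # Check the English model installed above and the Chinese model used later.
    for name in ("en_core_web_sm", "zh_core_web_sm"):
        status = "installed" if model_installed(name) else "missing"
        print(f"{name}: {status}")
```

If a model is reported as installed, loading it with `spacy.load("en_core_web_sm")` should normally work as well.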

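If the model download fails because the download host cannot be resolved, one workaround is to look up the IP address of that domain and add it to your hosts file.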

Domain-to-IP lookup: https://www.ipaddress.com/
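The hosts workaround can be sketched in Python as follows. The post does not say which domain fails, so `raw.githubusercontent.com` below is an assumption (use whichever host appears in your error message), and `hosts_line` is a hypothetical helper. If DNS resolution itself is the failing step, `socket.gethostbyname` will fail too; in that case take the IP from the lookup site and pass it in by hand.

```python
import socket


def hosts_line(ip: str, hostname: str) -> str:
    """Format one entry for the hosts file: IP address, a tab, then the host name."""
    return f"{ip}\t{hostname}"


if __name__ == "__main__":
    host = "raw.githubusercontent.com"  # assumption: replace with the failing host
    try:
        ip = socket.gethostbyname(host)
        print(hosts_line(ip, host))
    except OSError as err:
        # Resolution failed; look the address up manually instead.
        print(f"could not resolve {host}: {err}")
```

The printed line would then be appended to `/etc/hosts` on Linux or macOS, or to the Windows hosts file, which requires administrator rights.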

The models are actually downloaded from GitHub: https://github.com/explosion/spacy-models/tags

So the model can also be installed in either of the following ways:

# With external URL
$ pip install https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.0.0/en_core_web_sm-3.0.0-py3-none-any.whl
$ pip install https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.0.0/en_core_web_sm-3.0.0.tar.gz

# With local file
$ pip install /Users/you/en_core_web_sm-3.0.0-py3-none-any.whl
$ pip install /Users/you/en_core_web_sm-3.0.0.tar.gz
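
Both release URLs above follow the same pattern, so they can be built from a model name and a version. This is a small sketch based only on the two URLs shown; `model_release_url` is a hypothetical helper, and other models or versions are assumed (not guaranteed) to follow the same layout.

```python
BASE = "https://github.com/explosion/spacy-models/releases/download"


def model_release_url(name: str, version: str, wheel: bool = True) -> str:
    """Build the download URL of a model release, as a wheel or a source archive."""
    suffix = "-py3-none-any.whl" if wheel else ".tar.gz"
    return f"{BASE}/{name}-{version}/{name}-{version}{suffix}"


if __name__ == "__main__":
    print(model_release_url("en_core_web_sm", "3.0.0"))
    print(model_release_url("en_core_web_sm", "3.0.0", wheel=False))
```

The printed URL can be passed directly to `pip install`.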

Rasa configuration

Official Rasa configuration guide: https://rasa.com/docs/rasa/tuning-your-model#sensible-starting-pipelines

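My example configuration (it loads the Chinese model zh_core_web_sm, which needs to be installed the same way as en_core_web_sm above):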

language: "zh"

pipeline:
  - name: SpacyNLP
    model: "zh_core_web_sm"
    case_sensitive: True
  - name: SpacyTokenizer
    # intent splitting is a tokenizer option, so it is set here rather than on SpacyNLP
    intent_tokenization_flag: True
    intent_split_symbol: "_"
  - name: SpacyFeaturizer
  - name: RegexFeaturizer
  - name: LexicalSyntacticFeaturizer
  - name: CountVectorsFeaturizer
  - name: CountVectorsFeaturizer
    analyzer: "char_wb"
    min_ngram: 1
    max_ngram: 4
  - name: DIETClassifier
    epochs: 100
  - name: EntitySynonymMapper
  - name: ResponseSelector
    epochs: 100

policies:
  - name: MemoizationPolicy
  - name: TEDPolicy
    max_history: 5
    epochs: 100
  - name: RulePolicy
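
To see what `intent_tokenization_flag` and `intent_split_symbol` in the config are meant to control, here is a plain-Python illustration. It is not Rasa's implementation, only a sketch of the documented behavior: when the flag is on, an intent label is split on the symbol into sub-tokens. The function name and the example intent label are made up.

```python
from typing import List


def split_intent(intent: str, tokenization_flag: bool, split_symbol: str = "_") -> List[str]:
    """Split an intent label into sub-tokens when the flag is enabled."""
    if not tokenization_flag:
        # Flag off: the label is treated as a single token.
        return [intent]
    return intent.split(split_symbol)


if __name__ == "__main__":
    print(split_intent("ask_weather", True))
    print(split_intent("ask_weather", False))
```

With "_" as the split symbol, every intent name that contains an underscore gets split, so intent names need to be chosen with that in mind.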

spaCy entity extraction

Rasa documentation entry: SpacyEntityExtractor

For the entity labels (dimensions) that can be extracted, see: https://explosion.ai/demos/displacy-ent

My entity extraction example:

language: "zh"

pipeline:
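  - name: SpacyNLP
    model: "zh_core_web_sm"
    case_sensitive: True
  - name: "SpacyEntityExtractor"
    # keep only these spaCy entity labels
    dimensions: ["PERSON", "LOC", "ORG", "PRODUCT", "DATE", "TIME", "GPE"]
  - name: SpacyTokenizer
    # intent splitting is a tokenizer option, so it is set here rather than on SpacyNLP
    intent_tokenization_flag: True
    intent_split_symbol: "_"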
  - name: SpacyFeaturizer
  - name: RegexFeaturizer
  - name: LexicalSyntacticFeaturizer
  - name: CountVectorsFeaturizer
  - name: CountVectorsFeaturizer
    analyzer: "char_wb"
    min_ngram: 1
    max_ngram: 4
  - name: DIETClassifier
    epochs: 100
  - name: EntitySynonymMapper
  - name: ResponseSelector
    epochs: 100

policies:
  - name: MemoizationPolicy
  - name: TEDPolicy
    max_history: 5
    epochs: 100
  - name: RulePolicy
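
The `dimensions` option of SpacyEntityExtractor restricts which spaCy entity labels are kept. The sketch below mimics that filtering on plain `(text, label)` pairs so that it runs without spaCy. The helper name and the sample entities are invented for illustration and do not come from a real model run.

```python
from typing import Iterable, List, Optional, Tuple

Entity = Tuple[str, str]  # (entity text, spaCy label)


def filter_entities(
    entities: Iterable[Entity], dimensions: Optional[List[str]] = None
) -> List[Entity]:
    """Keep only entities whose label is in `dimensions`; keep all if it is None."""
    if dimensions is None:
        return list(entities)
    allowed = set(dimensions)
    return [(text, label) for text, label in entities if label in allowed]


if __name__ == "__main__":
    # Hypothetical extractor output, used only to show the filtering.
    sample = [("Beijing", "GPE"), ("tomorrow", "DATE"), ("three", "CARDINAL")]
    dims = ["PERSON", "LOC", "ORG", "PRODUCT", "DATE", "TIME", "GPE"]
    print(filter_entities(sample, dims))
```

With spaCy and a model installed, the input pairs could come from `[(ent.text, ent.label_) for ent in nlp(text).ents]`, where `nlp` is the loaded model.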
