Metadata-Version: 2.4
Name: tinysegmenter
Version: 0.3
Summary: Very compact Japanese tokenizer
Home-page: http://tinysegmenter.tuxfamily.org/
Author: Taku Kudo
Author-email: taku@chasen.org
Maintainer: Jehan
Maintainer-email: tinysegmenter@zemarmot.net
License: New BSD
Classifier: License :: OSI Approved :: BSD License
Classifier: Programming Language :: Python
Classifier: Operating System :: POSIX :: Linux
Classifier: Development Status :: 4 - Beta
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Scientific/Engineering :: Information Analysis
Classifier: Topic :: Text Processing :: Linguistic
License-File: COPYING
License-File: AUTHORS
Dynamic: author
Dynamic: author-email
Dynamic: classifier
Dynamic: description
Dynamic: home-page
Dynamic: license
Dynamic: license-file
Dynamic: maintainer
Dynamic: maintainer-email
Dynamic: summary

TinySegmenter
=============
“TinySegmenter in Python” is a Python port_ by Masato Hagiwara of TinySegmenter_, an extremely compact Japanese tokenizer originally written in JavaScript by Mr. Taku Kudo.

The library was finally packaged by Jehan. This resulted in a fork because Masato Hagiwara did not answer emails, so packaging patches
could not be committed upstream. This is a friendly fork, however, and Masato Hagiwara is welcome to take back maintenance of his
project.

For the time being, I (Jehan) have taken over maintenance, so please treat this new website_ as official and
send any new patch_ there. I will follow up on patches and bug reports, but probably won't pursue active development. Anyone wishing to
improve the library is welcome to participate and will gladly be given committer rights.

It works on Python 2.6 or above, including Python 3.

.. _port: http://lilyx.net/tinysegmenter-in-python/
.. _TinySegmenter: http://chasen.org/~taku/software/TinySegmenter/
.. _website: http://tinysegmenter.tuxfamily.org/

Authors
-------
See the ``AUTHORS`` file for all authors and contributors.

Download and Installation
-------------------------
This library can be installed in the usual ways: by running ``setup.py`` or as a pip package.
See the ``INSTALL`` file in the package for more details.
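After installing, it can be useful to confirm that the distribution is visible to the interpreter. The following is a minimal sketch, assuming Python 3.8 or later (for ``importlib.metadata``) and assuming the distribution is registered under the name ``tinysegmenter``; the helper name ``installed_version`` is hypothetical and not part of the library.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# Assumption: the distribution name matches the PyPI project name.
version = installed_version("tinysegmenter")
if version is None:
    print("tinysegmenter does not appear to be installed")
else:
    print("tinysegmenter", version, "is installed")
```

The check only reads installed package metadata, so it does not import the library itself.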
If you simply want to download the source package, refer to the PyPI repository: http://pypi.python.org/pypi/tinysegmenter

The development version can be cloned anonymously from the Git repository::

    $ git clone git://git.tuxfamily.org/gitroot/tinysegmente/tinysegmenter.git

or browsed online at: http://git.tuxfamily.org/tinysegmente/tinysegmenter/

Usage
-----
Example code for direct usage::

    >>> import tinysegmenter
    >>> segmenter = tinysegmenter.TinySegmenter()
    >>> print(' | '.join(segmenter.tokenize(u"私の名前は中野です")))
    私 | の | 名前 | は | 中野 | です
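Since ``tokenize`` yields a sequence of token strings (as the ``join`` call above suggests), its output can be passed to ordinary Python tooling. The sketch below counts token frequencies across several texts. It uses a hypothetical stand-in tokenizer, ``WhitespaceTokenizer``, so that it runs without the library installed; the helper ``token_counts`` is likewise illustrative and not part of TinySegmenter.

```python
from collections import Counter


class WhitespaceTokenizer:
    """Hypothetical stand-in exposing a tokenize() method like TinySegmenter's."""

    def tokenize(self, text):
        # Split on whitespace; a real segmenter would find word boundaries itself.
        return text.split()


def token_counts(tokenizer, texts):
    """Count tokens across texts using any object that has a tokenize() method."""
    counts = Counter()
    for text in texts:
        counts.update(tokenizer.tokenize(text))
    return counts


counts = token_counts(WhitespaceTokenizer(), ["a b a", "b c"])
print(counts["a"], counts["b"], counts["c"])  # prints: 2 2 1
```

With the library installed, passing ``tinysegmenter.TinySegmenter()`` in place of the stand-in should work the same way, assuming its ``tokenize`` method returns the tokens as strings.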

TinySegmenter's interface is compatible with ``NLTK``'s ``TokenizerI`` class, although the distribution does not directly depend on NLTK.
Here is one way to use it as a tokenizer in NLTK (the order of the base classes matters)::

    import nltk.tokenize.api
    import tinysegmenter

    class myTinySegmenter(tinysegmenter.TinySegmenter, nltk.tokenize.api.TokenizerI):
        pass

    segmenter = myTinySegmenter()
    # This segmenter can be used anywhere an NLTK TokenizerI subclass is expected.

For more about NLTK (the *Natural Language Toolkit* module), see: http://nltk.org/api/nltk.tokenize.html#nltk.tokenize.api.TokenizerI
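To illustrate why the order of the base classes matters, here is a small sketch that uses only the standard library. Both classes are hypothetical stand-ins, not NLTK or TinySegmenter code: ``TokenizerInterface`` assumes the interface declares ``tokenize`` as an abstract method, and ``ConcreteSegmenter`` plays the role of the class that really implements it.

```python
from abc import ABC, abstractmethod


class TokenizerInterface(ABC):
    """Hypothetical stand-in for an abstract tokenizer interface."""

    @abstractmethod
    def tokenize(self, text):
        """Return the tokens of text."""


class ConcreteSegmenter:
    """Hypothetical stand-in for a class that really implements tokenize()."""

    def tokenize(self, text):
        return list(text)  # one token per character, for illustration only


class ConcreteFirst(ConcreteSegmenter, TokenizerInterface):
    """Concrete class listed first: tokenize() resolves to the implementation."""


class InterfaceFirst(TokenizerInterface, ConcreteSegmenter):
    """Interface listed first: tokenize() resolves to the abstract method."""


print(ConcreteFirst().tokenize("abc"))  # prints: ['a', 'b', 'c']

try:
    InterfaceFirst()
    interface_first_ok = True
except TypeError:
    # The abstract tokenize() comes first in the method resolution order,
    # so the class still counts as abstract and cannot be instantiated.
    interface_first_ok = False

print(interface_first_ok)  # prints: False
```

Python looks up methods from left to right through the base classes, so listing the concrete class first lets its ``tokenize`` take precedence over the interface's placeholder.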

.. _patch:

Contact, Bugs and Contributing
------------------------------

Bug reports, patches, questions, etc. can be sent to `tinysegmenter` at `zemarmot` dot `net`.

License
-------
This package is distributed under the New BSD License (see the ``COPYING`` file).