Metadata-Version: 2.4
Name: tinysegmenter
Version: 0.3
Summary: Very compact Japanese tokenizer
Home-page: http://tinysegmenter.tuxfamily.org/
Author: Taku Kudo
Author-email: taku@chasen.org
Maintainer: Jehan
Maintainer-email: tinysegmenter@zemarmot.net
License: New BSD
Classifier: License :: OSI Approved :: BSD License
Classifier: Programming Language :: Python
Classifier: Operating System :: POSIX :: Linux
Classifier: Development Status :: 4 - Beta
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Scientific/Engineering :: Information Analysis
Classifier: Topic :: Text Processing :: Linguistic
License-File: COPYING
License-File: AUTHORS
Dynamic: author
Dynamic: author-email
Dynamic: classifier
Dynamic: description
Dynamic: home-page
Dynamic: license
Dynamic: license-file
Dynamic: maintainer
Dynamic: maintainer-email
Dynamic: summary

TinySegmenter
=============

“TinySegmenter in Python” is a Python port_ by Masato Hagiwara of TinySegmenter_, an extremely compact Japanese tokenizer originally written in JavaScript by Taku Kudo.

The library was later packaged by Jehan. This resulted in a fork because Masato Hagiwara did not answer emails, so the packaging patches
could not be committed upstream. It is a friendly fork, and Masato Hagiwara is welcome to take back maintenance of his
project.
For the time being, I (Jehan) have taken up the maintenance, so please treat this new website_ as the official one and
send any new patch_ there. I will follow up on patches and bug reports, but probably will not do active development. Anyone wishing to
improve the library is welcome to participate and will gladly be given committer rights.

It works on Python 2.6 or above, including Python 3.

.. _port: http://lilyx.net/tinysegmenter-in-python/
.. _TinySegmenter: http://chasen.org/~taku/software/TinySegmenter/
.. _website: http://tinysegmenter.tuxfamily.org/

Authors
-------

See the ``AUTHORS`` file for all authors and contributors.

Download and Installation
-------------------------

This library can be installed in the usual ways, for example with ``setup.py`` or as a pip package.
See the ``INSTALL`` file in the package for more details.

If you simply want to download the source package, refer to the PyPI repository: http://pypi.python.org/pypi/tinysegmenter

The development version can be downloaded anonymously from the Git repository::

    $ git clone git://git.tuxfamily.org/gitroot/tinysegmente/tinysegmenter.git

or browsed online at: http://git.tuxfamily.org/tinysegmente/tinysegmenter/

Usage
-----

Example code for direct usage::

    >>> import tinysegmenter
    >>> segmenter = tinysegmenter.TinySegmenter()
    >>> print(' | '.join(segmenter.tokenize(u"私の名前は中野です")))
    私 | の | 名前 | は | 中野 | です

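Building on the direct usage example, the following minimal sketch counts token frequencies over several lines of text. The helper name ``token_frequencies`` is hypothetical and not part of tinysegmenter; it accepts any callable that maps a string to a list of tokens, so ``tinysegmenter.TinySegmenter().tokenize`` can be passed in. The demo uses ``str.split`` as a stand-in so that it runs without the package installed:

```python
from collections import Counter


def token_frequencies(lines, tokenize):
    """Count how often each token occurs across an iterable of text lines.

    ``tokenize`` is any callable taking a string and returning a list of
    tokens, e.g. ``tinysegmenter.TinySegmenter().tokenize``.
    """
    counts = Counter()
    for line in lines:
        # Skip empty or whitespace-only tokens in case the tokenizer emits any.
        counts.update(tok for tok in tokenize(line) if tok.strip())
    return counts


# Stand-in tokenizer (assumption for this demo): plain whitespace splitting.
# With the package installed, pass tinysegmenter.TinySegmenter().tokenize instead.
freqs = token_frequencies(["to be or not", "to be"], str.split)
print(freqs.most_common(2))
```

Passing the tokenizer as a parameter keeps the helper independent of any particular segmenter, which also makes it easy to test.
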
TinySegmenter's interface is compatible with ``NLTK``'s ``TokenizerI`` class, although the distribution does not directly depend on NLTK.
Here is one way to use it as a tokenizer in NLTK (the order of the base classes matters)::

    import nltk.tokenize.api
    import tinysegmenter

    class myTinySegmenter(tinysegmenter.TinySegmenter, nltk.tokenize.api.TokenizerI):
        pass

    segmenter = myTinySegmenter()
    # This segmenter can be used anywhere an NLTK TokenizerI subclass is expected.

For more about NLTK (the *Natural Language Toolkit* module), see: http://nltk.org/api/nltk.tokenize.html#nltk.tokenize.api.TokenizerI

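The remark about base-class order follows from Python's method resolution order: when both bases define ``tokenize``, the class listed first wins. The sketch below illustrates this with two hypothetical stand-in classes (``InterfaceStub`` and ``WhitespaceSegmenter``); they are simplified assumptions for illustration, not the actual NLTK or TinySegmenter classes:

```python
class InterfaceStub(object):
    """Hypothetical stand-in for an interface class such as TokenizerI."""

    def tokenize(self, text):
        # Placeholder that concrete tokenizers are expected to override.
        raise NotImplementedError("tokenize() is not implemented")

    def tokenize_sents(self, texts):
        # Convenience method built on top of tokenize().
        return [self.tokenize(text) for text in texts]


class WhitespaceSegmenter(object):
    """Hypothetical stand-in for a concrete tokenizer."""

    def tokenize(self, text):
        return text.split()


class ConcreteFirst(WhitespaceSegmenter, InterfaceStub):
    """Concrete class first: its tokenize() is found first in the MRO."""


class InterfaceFirst(InterfaceStub, WhitespaceSegmenter):
    """Interface first: the placeholder tokenize() shadows the real one."""


good = ConcreteFirst()
print(good.tokenize("a b c"))
print(good.tokenize_sents(["a b", "c"]))

try:
    InterfaceFirst().tokenize("a b c")
    interface_first_failed = False
except NotImplementedError:
    interface_first_failed = True
print("interface-first order fails:", interface_first_failed)
```

With the concrete tokenizer listed first, helper methods inherited from the interface still reach the working ``tokenize``; with the interface listed first, its placeholder takes precedence over the real implementation.
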
.. _patch:

Contact, Bugs and Contributing
------------------------------

Bug reports, patches, questions, etc. can be sent to `tinysegmenter` at `zemarmot` dot `net`.

License
-------

This package is distributed under a New BSD License (see ``COPYING`` file).