yosina-lib/yosina-swift
A Swift implementation of the Yosina transliteration library for Japanese text processing.
Features
- Character normalization: Spaces, hyphens, mathematical symbols, etc.
- Japanese-specific transliterations:
- Prolonged sound marks conversion - Hiragana/Katakana composition with combining marks - JIS X 0201 (half-width/full-width) conversion - Old to new kanji form conversion
- Unicode variation sequences: IVS/SVS support
- Chained transliterations: Combine multiple transliterators
Installation
Add this package to your Swift project:
dependencies: [
.package(url: "https://github.com/yosina-lib/yosina-swift", from: "3.0.0")
]Usage
Basic Usage
import Yosina
// Create a simple transliterator
let transliterator = SpacesTransliterator().stringTransliterator
let result = transliterator("Hello World") // Full-width space
// Result: "Hello World" (normalized to regular space)Recipe-based Usage (Recommended)
import Yosina
// Create a recipe with desired transformations
var recipe = TransliterationRecipe()
recipe.kanjiOldNew = true
recipe.replaceSpaces = true
recipe.replaceCircledOrSquaredCharacters = .enabled
recipe.replaceCombinedCharacters = true
recipe.toFullwidth = .enabled
// Create the transliterator
let transliterator = try recipe.makeTransliterator()
// Use it with various special characters
let input = "①②③ ⒶⒷⒸ ㍿㍑㌠㋿"
let result = transliterator(input)
// Result: "(1)(2)(3) (A)(B)(C) 株式会社リットルサンチーム令和"Configuration-based Usage
import Yosina
// Create transliterator with direct configurations
let configs: [TransliteratorConfig] = [
.spaces,
.prolongedSoundMarks,
.jisx0201AndAlike()
]
let transliterator = configs.makeTransliterator()
let result = transliterator("データ- ベース")
// Result: "データー ベース"Individual Transliterators
// Half-width to full-width conversion
let jisTransliterator = Jisx0201AndAlikeTransliterator().stringTransliterator
let fullwidth = jisTransliterator("ABC123")
// Result: "ABC123"
// Hiragana/Katakana composition
let composer = HiraKataCompositionTransliterator().stringTransliterator
let composed = composer("か\u{3099}") // か + combining dakuten
// Result: "が"Available Transliterators
1. Circled or Squared (circled-or-squared)
Converts circled or squared characters to their plain equivalents.
- Options:
templates(custom rendering),includeEmojis(include emoji characters) - Example:
①②③→(1)(2)(3),㊙㊗→(秘)(祝)
2. Combined (combined)
Expands combined characters into their individual character sequences.
- Example:
㍻(Heisei era) →平成,㈱→(株)
3. Hiragana-Katakana Composition (hira-kata-composition)
Combines decomposed hiraganas and katakanas into composed equivalents.
- Options:
composeNonCombiningMarks(compose non-combining marks) - Example:
か + ゙→が,ヘ + ゜→ペ
4. Hiragana-Katakana (hira-kata)
Converts between hiragana and katakana scripts bidirectionally.
- Options:
mode("hira-to-kata" or "kata-to-hira") - Example:
ひらがな→ヒラガナ(hira-to-kata)
5. Hyphens (hyphens)
Replaces various dash/hyphen symbols with common ones used in Japanese.
- Options:
precedence(mapping priority order) - Available mappings: "ascii", "jisx0201", "jisx0208_90", "jisx0208_90_windows", "jisx0208_verbatim"
- Example:
2019—2020(em dash) →2019-2020
6. Ideographic Annotations (ideographic-annotations)
Replaces ideographic annotations used in traditional Chinese-to-Japanese translation.
- Example:
㆖㆘→上下
7. IVS-SVS Base (ivs-svs-base)
Handles Ideographic and Standardized Variation Selectors.
- Options:
charset,mode("ivs-or-svs" or "base"),preferSVS,dropSelectorsAltogether - Example:
葛󠄀(葛 + IVS) →葛
8. Japanese Iteration Marks (japanese-iteration-marks)
Expands iteration marks by repeating the preceding character.
- Example:
時々→時時,いすゞ→いすず
9. JIS X 0201 and Alike (jisx0201-and-alike)
Handles half-width/full-width character conversion.
- Options:
fullwidthToHalfwidth,convertGL(alphanumerics/symbols),convertGR(katakana),u005cAsYenSign - Example:
ABC123→ABC123,カタカナ→カタカナ
10. Kanji Old-New (kanji-old-new)
Converts old-style kanji (旧字体) to modern forms (新字体).
- Example:
舊字體の變換→旧字体の変換
11. Mathematical Alphanumerics (mathematical-alphanumerics)
Normalizes mathematical alphanumeric symbols to plain ASCII.
- Example:
𝐀𝐁𝐂(mathematical bold) →ABC
12. Prolonged Sound Marks (prolonged-sound-marks)
Handles contextual conversion between hyphens and prolonged sound marks.
- Options:
skipAlreadyTransliteratedChars,allowProlongedHatsuon,allowProlongedSokuon,replaceProlongedMarksFollowingAlnums - Example:
イ−ハト−ヴォ(with hyphen) →イーハトーヴォ(prolonged mark)
13. Radicals (radicals)
Converts CJK radical characters to their corresponding ideographs.
- Example:
⾔⾨⾷(Kangxi radicals) →言門食
14. Spaces (spaces)
Normalizes various Unicode space characters to standard ASCII space.
- Example:
A B(ideographic space) →A B
15. Roman Numerals (roman-numerals)
Converts Unicode Roman numeral characters to their ASCII letter equivalents.
- Example:
Ⅰ Ⅱ Ⅲ→I II III,ⅰ ⅱ ⅲ→i ii iii
16. Small Hirakatas (small-hirakatas)
Converts small hiragana and katakana characters to their ordinary-sized equivalents.
- Example:
ぁぃぅ→あいう,ァィゥ→アイウ
17. Archaic Hirakatas (archaic-hirakatas)
Converts archaic kana (hentaigana) to their modern hiragana or katakana equivalents.
- Example:
𛀁→え
18. Historical Hirakatas (historical-hirakatas)
Converts historical hiragana and katakana characters to their modern equivalents.
- Options:
hiraganas("simple", "decompose", or "skip"),katakanas("simple", "decompose", or "skip"),voicedKatakanas("decompose" or "skip") - Example:
ゐ→い(simple),ゐ→うぃ(decompose),ヰ→イ(simple)
Custom Transliterators
You can create custom transliterators by implementing the Transliterator protocol:
struct MyTransliterator: Transliterator {
func transliterate<S: Sequence>(_ chars: S) -> [TransliteratorChar]
where S.Element == TransliteratorChar {
// Your implementation here
}
}
// Use it with the configuration
let configs: [TransliteratorConfig] = [
.custom(MyTransliterator())
]
let transliterator = configs.makeTransliterator()Testing
Run the tests with:
swift testCode Generation
Some transliterators are automatically generated from JSON data files. To regenerate:
cd codegen
swift runLicense
See the main project LICENSE file.
Package Metadata
Repository: yosina-lib/yosina-swift
Default branch: main
README: README.md