Python - Transliterate German Umlauts to Diacritic
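I have a list of Unicode file paths in which I need to replace all umlauts with their English diacritic. For example, ü becomes ue, ä becomes ae, and so on. I have defined a dictionary of umlauts (keys) and their diacritics (values). I need to compare each key to each file path and, where the key is found, replace it with the value. This seems like it should be simple, but I can't get it to work. Does anyone out there have any ideas? Any feedback is greatly appreciated!
My code so far: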
    # -*- coding: utf-8 -*-
    import os

    def getfilepaths(directory):
        """
        This function generates all file names in a directory tree using os.walk.
        It returns a list of file paths.
        """
        file_paths = []
        for root, directories, files in os.walk(directory):
            for filename in files:
                filepath = os.path.join(root, filename)
                file_paths.append(filepath)
        return file_paths

    # dictionary of umlaut unicode representations (keys) and their replacements (values)
    umlautdictionary = {u'Ä': 'ae',
                        u'Ö': 'oe',
                        u'Ü': 'ue',
                        u'ä': 'ae',
                        u'ö': 'oe',
                        u'ü': 'ue'
                        }

    # get file paths in root directory and subfolders
    filepathslist = getfilepaths(u'c:\\scripts\\replace characters\\umlauts')
    for file in filepathslist:
        for key, value in umlautdictionary.iteritems():
            if key in file:
                file.replace(key, value)  # does not work -- umlauts still in file path!
        print file
The replace method returns a new string; it does not modify the original string. So you would need

    file = file.replace(key, value)

instead of file.replace(key, value).
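The difference can be seen in a small sketch. It is written for Python 3 (the question uses Python 2), and the path is a made-up example, not one from the question:

```python
# Strings are immutable: replace() returns a new string and leaves the
# original untouched.
path = "C:/scripts/Müller/Übung.txt"  # hypothetical example path

path.replace("ü", "ue")   # the returned string is discarded here
unchanged = path          # still contains the umlaut

fixed = path.replace("ü", "ue")  # keep the result by assigning it
```

Only `fixed` holds the transliterated text; `path` keeps its umlaut because the first call's return value was thrown away.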
Note that you could use the translate method to do all the replacements at once, instead of using a for-loop:
    In [20]: umap = {ord(key):unicode(val) for key, val in umlautdictionary.items()}

    In [21]: umap
    Out[21]: {196: u'ae', 214: u'oe', 220: u'ue', 228: u'ae', 246: u'oe', 252: u'ue'}

    In [22]: print(u'ÄÖ'.translate(umap))
    aeoe
So you could use
    umap = {ord(key):unicode(val) for key, val in umlautdictionary.items()}
    for filename in filepathslist:
        filename = filename.translate(umap)
        print(filename)
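For readers on Python 3, where `unicode` and `iteritems` no longer exist, the same idea might look roughly like the sketch below. The names `umlaut_dictionary`, `transliterate_paths`, and `sample_paths` are illustrative, and the sample paths are invented; only the mapping itself comes from the question.

```python
# Python 3 sketch of the translate approach; the answer above targets Python 2.
umlaut_dictionary = {
    "Ä": "ae", "Ö": "oe", "Ü": "ue",
    "ä": "ae", "ö": "oe", "ü": "ue",
}

# str.maketrans converts the single-character keys into the code-point
# mapping that str.translate expects.
umap = str.maketrans(umlaut_dictionary)

def transliterate_paths(paths):
    """Return a new list in which every umlaut in every path is replaced."""
    return [path.translate(umap) for path in paths]

# Hypothetical sample paths, for illustration only.
sample_paths = ["C:/scripts/Übung/käse.txt", "C:/scripts/plain.txt"]
converted = transliterate_paths(sample_paths)
```

The function returns a new list because rebinding the loop variable inside a for-loop does not change the entries of the original list, so the converted paths have to be collected somewhere if they are needed later.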