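Python 3 represents strings (the str type) as sequences of Unicode code points.
The default encoding for Python 3 source code is UTF-8.
Python 3 also supports non-ASCII Unicode characters in identifiers, so a variable name may be written in Cyrillic, for example.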
низ = "This is a normal Python string :ছ 𝄞 ☕"
print(низ)
# This is a normal Python string :ছ 𝄞 ☕
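Because a str is a sequence of code points, len() counts code points rather than bytes or screen glyphs, and str.isidentifier() can confirm that a non-ASCII name is a legal identifier. A minimal sketch (the sample strings are arbitrary):

```python
# A str is a sequence of code points, so len() counts code points, including
# characters outside the Basic Multilingual Plane such as 𝄞 (U+1D11E).
s = "ছ𝄞☕"
print(len(s))                  # 3

# str.isidentifier() checks whether a string is a valid Python identifier.
print("низ".isidentifier())    # True
print("2низ".isidentifier())   # False: identifiers cannot start with a digit
```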
ord() and chr() functions
ord(char) - returns an integer representing the Unicode code point of the given character char.
chr(i) - returns the string representing the character whose Unicode code point is the integer i (valid range: 0 to 0x10FFFF).
print(ord('я'))
# 1103
print(chr(1103))
# я
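Since ord() and chr() are inverses of each other, a word can be turned into a list of code points and back. The sketch below also shows the two common failure cases; the sample word is chosen only for illustration:

```python
# ord() and chr() are inverses over the valid code point range 0..0x10FFFF.
word = "низ"
codes = [ord(c) for c in word]
print(codes)                            # [1085, 1080, 1079]
print("".join(chr(c) for c in codes))   # низ
print(hex(ord("я")))                    # 0x44f

# chr() rejects values outside the Unicode range.
try:
    chr(0x110000)
except ValueError as exc:
    print("ValueError:", exc)

# ord() requires a string of exactly one character.
try:
    ord("ab")
except TypeError as exc:
    print("TypeError:", exc)
```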
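# Unicode symbol typed directly in the string:
print("Ѣ")
# Using the character name:
print("\N{Cyrillic Capital Letter Yat}")
# Using a 16-bit hex code point (\u followed by exactly 4 hex digits):
print("\u0462")
# Using a 32-bit hex code point (\U followed by exactly 8 hex digits):
print("\U00000462")
# Each of the four calls prints: Ѣ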
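str.encode() - syntax
str.encode(encoding="utf-8", errors="strict")
Returns the string encoded as a bytes object. Both parameters are optional: the default encoding is UTF-8, and errors="strict" raises a UnicodeEncodeError for any character that cannot be encoded.
str.encode() - example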
string = "123абв"
str_in_utf = string.encode()
print("Byte object:", str_in_utf)
print("Type: ",type(str_in_utf) )
print("Length:",len(str_in_utf) )
#Byte object: b'123\xd0\xb0\xd0\xb1\xd0\xb2'
#Type: <class 'bytes'>
#Length: 9
Note that len() of a bytes object returns the number of bytes, not the number of encoded characters: in UTF-8 the three digits take one byte each, while each of the three Cyrillic letters takes two.
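The byte count depends on the chosen encoding, and the errors parameter decides what happens to characters the target encoding cannot represent. A small sketch comparing UTF-8 with cp1251 (a single-byte Cyrillic code page) and ASCII, using the same sample string:

```python
text = "123абв"

# The same text takes a different number of bytes in different encodings.
print(len(text))                         # 6 characters
print(len(text.encode("utf-8")))         # 9
print(len(text.encode("cp1251")))        # 6: cp1251 uses one byte per character

# errors="strict" (the default) raises when a character cannot be encoded.
try:
    text.encode("ascii")
except UnicodeEncodeError as exc:
    print("UnicodeEncodeError:", exc.reason)

# Other error handlers substitute or drop the unencodable characters.
print(text.encode("ascii", errors="replace"))            # b'123???'
print(text.encode("ascii", errors="ignore"))             # b'123'
print(text.encode("ascii", errors="xmlcharrefreplace"))  # b'123&#1072;&#1073;&#1074;'
```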
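bytes.decode() - syntax
bytes.decode(encoding="utf-8", errors="strict")
Returns the bytes decoded into a str object. Both parameters are optional: the default encoding is UTF-8, and errors="strict" raises a UnicodeDecodeError for byte sequences that are invalid in that encoding.
bytes.decode() - example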
str_in_bytes = b'1\xd0\xb02\xd0\xb13\xd0\xb2'
str_in_utf8 = str_in_bytes.decode()   # same as str_in_bytes.decode("utf-8")
print("String object:", str_in_utf8)
print("Type:", type(str_in_utf8))
print("Length:", len(str_in_utf8))
# String object: 1а2б3в
# Type: <class 'str'>
# Length: 6
Note that here len() is applied to a str object, so it returns the number of characters (code points): the 9 input bytes decode to a 6-character string.
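Decoding only works if you know the encoding the bytes were written in. The sketch below, with an arbitrary sample word, shows what happens when cp1251 bytes are decoded with the right codec, with UTF-8 (an error), and with latin-1 (garbled text, often called mojibake):

```python
data = "абв".encode("cp1251")
print(data)                      # b'\xe0\xe1\xe2'
print(data.decode("cp1251"))     # абв

# Decoding with the wrong codec under errors="strict" raises an exception ...
try:
    data.decode("utf-8")
except UnicodeDecodeError as exc:
    print("UnicodeDecodeError at byte", exc.start)

# ... while a single-byte codec such as latin-1 "succeeds" and yields mojibake.
print(data.decode("latin-1"))    # àáâ

# errors="replace" substitutes U+FFFD REPLACEMENT CHARACTER for invalid bytes.
print(data.decode("utf-8", errors="replace"))
```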
Write a script cp1251_to_utf8.py that receives an input file name as an argument, reads the file as cp1251-encoded text and creates a UTF-8 encoded copy with the same name, but with the suffix "_utf8_" added before the extension.
.
├── cp1251_to_utf8.py
└── Silicon.Valley.sampleBGsubs.srt
$ python cp1251_to_utf8.py Silicon.Valley.sampleBGsubs.srt
.
├── cp1251_to_utf8.py
├── Silicon.Valley.sampleBGsubs.srt
└── Silicon.Valley.sampleBGsubs_utf8_.srt
Make sure that Silicon.Valley.sampleBGsubs_utf8_.srt is properly converted and readable!
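One possible sketch of the exercise, assuming the input file really is cp1251-encoded and that the suffix goes before the extension as in the directory listing. The helper names utf8_name and convert are hypothetical, chosen only for illustration; this is not the reference solution.

```python
# cp1251_to_utf8.py: a possible sketch, assuming the input file is cp1251-encoded.
import sys
from pathlib import Path


def utf8_name(path: Path) -> Path:
    # Hypothetical helper: insert "_utf8_" before the extension, e.g.
    # Silicon.Valley.sampleBGsubs.srt -> Silicon.Valley.sampleBGsubs_utf8_.srt
    return path.with_name(path.stem + "_utf8_" + path.suffix)


def convert(src: Path) -> Path:
    # Hypothetical helper: decode the input as cp1251 and write it out as UTF-8.
    # newline="" disables newline translation, so \r\n line endings survive as-is.
    dst = utf8_name(src)
    with open(src, encoding="cp1251", newline="") as fin, \
            open(dst, "w", encoding="utf-8", newline="") as fout:
        fout.write(fin.read())
    return dst


if __name__ == "__main__":
    if len(sys.argv) == 2:
        print("Created:", convert(Path(sys.argv[1])))
    else:
        print(f"Usage: python {Path(sys.argv[0]).name} <input-file>", file=sys.stderr)
```

Reading the whole file at once keeps the sketch short and is fine for subtitle files; for very large inputs you could copy in chunks instead. Any byte that has no mapping in cp1251 makes the read fail with a UnicodeDecodeError, which is a useful signal that the input was not cp1251 after all.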
These slides are based on a customised version of the … framework.