Repeat after me: each Unicode encoding (UTF-8, UTF-7, UTF-16, UTF-32, etc.) maps sequences of bytes to Unicode code points in its own way, so the same sequence of bytes can decode to different code points under different encodings. A code point is a number that maps to a particular abstract character (roughly a grapheme, although a single grapheme can be built from several code points).
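To make that concrete, here is a minimal sketch (assuming Python 3) that sends one code point through several encodings and then reads one byte sequence back under two of them. The helper `code_points` is a name made up for this illustration.

```python
def code_points(text):
    """Return the code points of a string as U+XXXX labels."""
    return ["U+%04X" % ord(ch) for ch in text]

# One code point, three different byte sequences.
e_acute = "\u00e9"
utf8_bytes = e_acute.encode("utf-8")       # b'\xc3\xa9'
utf16_bytes = e_acute.encode("utf-16-le")  # b'\xe9\x00'
utf32_bytes = e_acute.encode("utf-32-le")  # b'\xe9\x00\x00\x00'

# One byte sequence, two different results depending on the encoding.
raw = b"\xc3\xa9"
as_utf8 = code_points(raw.decode("utf-8"))       # ['U+00E9']
as_utf16 = code_points(raw.decode("utf-16-le"))  # ['U+A9C3']

print(utf8_bytes, utf16_bytes, utf32_bytes)
print(as_utf8, as_utf16)
```

The bytes carry no label saying which encoding produced them, so the caller has to know it from somewhere else.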
The unicode type stores an abstract sequence of code points.
The str type is for strings of bytes. These are very similar in nature to how strings are handled in C.
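A short sketch of the difference between the two types. It is written so that it should run on both Python 2 and Python 3.3+, keeping in mind that Python 3 renamed the pair: the text type is called `str` there and the byte type is called `bytes`. The sample word is only an illustration.

```python
# Text: a sequence of code points (Python 2: unicode, Python 3: str).
text = u"na\u00efve"

# Bytes: what you get after choosing an encoding (Python 2: str, Python 3: bytes).
data = text.encode("utf-8")

text_length = len(text)  # 5 code points
data_length = len(data)  # 6 bytes, because U+00EF takes two bytes in UTF-8

# Decoding with the same encoding gives the original text back.
round_trip = data.decode("utf-8")

print(text_length, data_length, round_trip == text)
```

Lengths and indexes mean different things for the two types: one counts code points, the other counts bytes.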
Important: Python 3. It matters not only for guidelines but also for clarity. Moreover, many “solutions” floating around the web are written for Python 3 (for example, open() has both an encoding and a newline parameter in Python 3). Speaking of Python 2:
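import os

# Binary mode writes the byte exactly as given.
with open("file.txt", "wb") as out:
    out.write(b"\n")

# Text mode translates "\n" to the platform's line ending.
with open("file2.txt", "w") as out:
    out.write("\n")

print(os.stat("file.txt").st_size)   # 1
print(os.stat("file2.txt").st_size)  # 2 on Windows ("\r\n"); 1 on Unix-like systems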
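For comparison, a minimal Python 3 sketch of the same experiment using the encoding and newline parameters of open(). The temporary directory and the file name `file3.txt` are assumptions made for this illustration. Passing `newline="\n"` turns off newline translation on write, so the size should be the same on every platform.

```python
import os
import tempfile

# Hypothetical scratch location, used only for this example.
path = os.path.join(tempfile.mkdtemp(), "file3.txt")

# encoding picks how code points become bytes;
# newline="\n" writes "\n" unchanged instead of os.linesep.
with open(path, "w", encoding="utf-8", newline="\n") as out:
    out.write("\u00e9\n")

# Two bytes for U+00E9 in UTF-8 plus one byte for the newline.
size = os.stat(path).st_size
print(size)  # 3
```

Leaving both parameters at their defaults makes the result depend on the platform and locale, which is one reason snippets copied from the web behave differently from machine to machine.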