I want to compare two strings that represent the same name but are encoded differently:
G%C3%A9rard Depardieu
and Gérard Depardieu
I need to make many such comparisons between two lists, and this is where I got stuck. List A
is full of names in URL-encoded form (at least I think that is what it is), while list B
holds the names as plain text, with accents and all kinds of special characters. I'm not sure how to URL-encode the accented characters so that I can compare the two.
name1 = 'G%C3%A9rard Depardieu'
name2 = ''
# Read the single line from the file, dropping the trailing newline
with open('gerard.txt', 'r', encoding='utf-8') as arq:
    for a in arq:
        name2 = a.rstrip('\n')
print(name1 == name2)  # False
Also, printing the name with print(name2)
gives the following error:
Out[11]: ---------------------------------------------------------------------------
UnicodeEncodeError Traceback (most recent call last)
<ipython-input-11-e1187e69ab52> in <module>()
----> 1 name2
~/.virtualenv/crawler/lib/python3.5/site-packages/IPython/core/displayhook.py in __call__(self, result)
259 self.fill_exec_result(result)
260 if format_dict:
--> 261 self.write_format_data(format_dict, md_dict)
262 self.log_output(format_dict)
263 self.finish_displayhook()
~/.virtualenv/crawler/lib/python3.5/site-packages/IPython/core/displayhook.py in write_format_data(self, format_dict, md_dict)
188 result_repr = '\n' + result_repr
189
--> 190 print(result_repr)
191
192 def update_user_ns(self, result):
UnicodeEncodeError: 'ascii' codec can't encode character '\xe9' in position 2: ordinal not in range(128)
But my goal is not to print the names but to make comparisons.
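The traceback above shows the 'ascii' codec being used for output, which suggests the terminal or locale encoding is ASCII rather than UTF-8. That is a guess based on the error message only. A minimal sketch for checking this and for inspecting the value without triggering the error could look like the following; `safe_repr` is a hypothetical helper name, and the name is written with a `\u00e9` escape so the snippet itself stays ASCII-only.

```python
import sys

def safe_repr(text):
    """Return an ASCII-only representation; non-ASCII characters become escapes."""
    return ascii(text)

# Same content as the line in gerard.txt, written with an escape for the accent
name2 = 'G\u00e9rard Depardieu'

# Shows which codec print() will use for standard output
print(sys.stdout.encoding)

# Safe to print even when the output encoding is ASCII
print(safe_repr(name2))
```

If the first line reports an ASCII encoding, setting the PYTHONIOENCODING environment variable to utf-8 before starting Python is one common way to change it. This affects only display, not the comparison.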
The file gerard.txt has only one line: Gérard Depardieu
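One way to approach the comparison, sketched under the assumption that list A really is percent-encoded UTF-8, is to decode both sides to plain text with `urllib.parse.unquote` and compare the results, rather than encoding list B. The helper names `normalize_name` and `same_name` are hypothetical. The NFC normalization step is an extra precaution for names that store the accent as a separate combining character.

```python
import unicodedata
from urllib.parse import quote, unquote

def normalize_name(raw):
    """Percent-decode as UTF-8, then normalize to NFC so that composed and
    decomposed accents compare equal."""
    return unicodedata.normalize('NFC', unquote(raw, encoding='utf-8'))

def same_name(a, b):
    """Compare two names after bringing both to the same plain-text form."""
    return normalize_name(a) == normalize_name(b)

name1 = 'G%C3%A9rard Depardieu'   # as in list A (URL-encoded)
name2 = 'G\u00e9rard Depardieu'   # as in list B (accent written as an escape)

print(same_name(name1, name2))            # True

# Going the other way: encode the plain name, keeping spaces unescaped
print(quote(name2, safe=' ') == name1)    # True
```

Decoding is usually the more forgiving direction, because encoders differ in which characters they escape (spaces, for example, may appear as a literal space, `%20`, or `+`). One caveat: a name in list B that contains a literal `%` followed by two hex digits would also be decoded by `normalize_name`.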