Setting
I need to implement a file change check between two points in my application. *¹
- Point 1 - Server - I have a folder containing product images;
- Point 2 - Mobile Device - I have a catalog app that downloads these images from the Server to a specific folder on the device’s SD card;
Problem
From time to time I would like to compare the images on the Device with the images on the Server, check whether any of them has been modified and, if so, download the modified image again;
Requirements
- Synchronization happens over the internet, so the amount of data sent over the network must be taken into account;
Technologies
The technologies I’m using are as follows:
- The Mobile App is on Android;
- The Webservice that checks and returns the image to the application is in C# (MVC Web API);
Question
One of the options I’ve found for implementing this is comparing hashes. So I wonder: is generating the hash of the file on the Device and comparing it with the hash of the file on the Server efficient for this case? Or is there a better and more efficient option? (Keep in mind that the Server may receive several hash generation requests simultaneously; is this a lightweight operation for the Server?)
*¹ - the only changes that matter are the ones made in the Server folder.
Note: When I say "efficiency", I mean better reliability (I accept the 99.999% of the hash, as mentioned by @Miguelangelo in the comments) and performance (time and resources, whether in processing or in network traffic).
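To make the hash comparison described in the question concrete, here is a minimal sketch of the Device side in Java. It is only an illustration under assumptions: the class name FileHasher, the method names, and the idea that the Server sends its hash as a hex string are hypothetical, and SHA-256 is my choice (the comments mention MD5, which can be selected by passing "MD5" to MessageDigest.getInstance instead).

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, not part of any existing app or library.
final class FileHasher {

    // Streams the file through SHA-256 so the whole image is never held in memory.
    static String sha256Hex(File file) throws IOException, NoSuchAlgorithmException {
        MessageDigest digest = MessageDigest.getInstance("SHA-256");
        try (InputStream in = new FileInputStream(file)) {
            byte[] buffer = new byte[8192];
            int read;
            while ((read = in.read(buffer)) != -1) {
                digest.update(buffer, 0, read);
            }
        }
        // Convert the raw digest bytes to a lowercase hex string.
        StringBuilder hex = new StringBuilder();
        for (byte b : digest.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    // Returns true when the local copy is missing or its hash differs from
    // the hash reported by the Server (serverHashHex is an assumed input).
    static boolean needsDownload(File localFile, String serverHashHex)
            throws IOException, NoSuchAlgorithmException {
        if (!localFile.exists()) {
            return true;
        }
        return !sha256Hex(localFile).equalsIgnoreCase(serverHashHex);
    }
}
```

On the Server side, one possible way to keep the operation light under simultaneous requests would be to compute each hash once and cache it until the file changes, rather than rehashing on every request; that part is not shown here.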
I don’t know if it fits your scenario, but wouldn’t it be better to associate each file with a version ID? For example: ~/Docs/test.png is at version 14 and on the mobile device it is at version 13 (update required). Are the files managed by the application?
– Leonardo Bosquett
Hash algorithms are efficient and, in general, "light" (in quotes because the definition of light is subjective). As far as I know, many applications do this. There are also related algorithms, called checksums, that are routinely applied to files to check whether they have changed (among other things).
– Oralista de Sistemas
@Leonardobosquett and what if a user changes the version ID on purpose?
– Oralista de Sistemas
Two identical hashes do not guarantee that the files are equal; to be certain, all bytes would have to be compared. You could, however, rely on the low collision probability of some hash algorithms, such as MD5, and accept as "right" (99.999% chance) that equal hashes indicate equal files.
– Miguel Angelo
@Leonardobosquett, the problem is that the folder and the images can be modified freely, and a file may be deleted and replaced by another with the same name, which makes this type of control unreliable and vulnerable. Just to illustrate: until now I was doing this check using the files’ creation and modification dates, but it all fell apart when the original file was deleted and another, older one was copied into its place.
– Fernando Leal
Understood, versioning does not apply. As for the hash, it is as Miguel said: you only need to be careful with collisions. Another suggestion is to use the FileSystemWatcher class, which monitors changes in a directory; in that case you could go back to your old idea, because you would know the modification date as seen from the folder.
– Leonardo Bosquett
@Miguelangelo, I understand your point, but although it is possible, the probability of it happening is really very remote; the only way to eliminate it would be to compare all the bytes, hehe
– Fernando Leal
@Leonardobosquett, I’ve even run some tests with FileSystemWatcher and presented it as an option here at the company, but since we are third parties, the idea of maintaining a service was dropped: a user can simply stop the service, the whole control then counts for nothing, and it might not be noticed in time to restart it before bigger problems arise. (Users with privileges are a pain.)
– Fernando Leal
@Fernando just one point: I once did something similar, checking by byte array. Would that be efficient in your case? In mine it was.
– user6026
@Harrypotter, do you mean sending all the bytes of the image to the server and checking whether they match? If so, it is impractical, because the traffic would be very large: think of 100 images of 200 KB each; every cycle would transfer 20000 KB (~20 MB), plus the return of the images that differ. In that case it would be more efficient to simply download them all again each cycle, without any verification.
– Fernando Leal
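To put the figures from the comment above next to the hash approach, the arithmetic can be sketched as follows. This is only a rough illustration: the class name SyncTraffic is hypothetical, 1 KB is taken as 1000 bytes, the digest sizes are those of MD5 (16 bytes) and SHA-256 (32 bytes), and file names, HTTP headers and any JSON or hex encoding overhead are ignored.

```java
// Hypothetical helper for estimating per-cycle traffic; not from the original thread.
final class SyncTraffic {

    // Bytes transferred if every image is sent over the network on each cycle.
    static long fullTransferBytes(int imageCount, long bytesPerImage) {
        return imageCount * bytesPerImage;
    }

    // Bytes transferred if only one fixed-size digest per image is exchanged.
    static long digestListBytes(int imageCount, int digestBytes) {
        return (long) imageCount * digestBytes;
    }
}
```

With the numbers from the comment, fullTransferBytes(100, 200_000) gives 20,000,000 bytes, while digestListBytes(100, 16) gives 1,600 bytes for MD5 and digestListBytes(100, 32) gives 3,200 bytes for SHA-256; only the images whose hashes differ would then need to be downloaded.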
@Fernando understood, it really is complicated. I would use a database, so that every changed image triggers an update on the device. From what I understand, it only goes from the server to the device...
– user6026
Wouldn’t checking the file size (client)/(server) solve the problem?
– PauloHDSousa
@Paulohdsousa, that would not be very reliable, since a file could easily be edited and still keep the same size, right?
– Fernando Leal
You can have a look here: http://msdn.microsoft.com/en-us/library/ms379571(v=vs.80).aspx and search the page for the title 'Collision Resolution in the Hashtable Class'; maybe it will help you
– Leonardo Bosquett
@Fernando I currently use CRC32, size, and filesystem modification date and time, in a file synchronization system of my own that runs between machines.
– Bacco
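As a rough illustration of the CRC32-plus-size combination mentioned in the last comment, a fingerprint could be built in Java along these lines. The class name QuickFingerprint and the "size-crc" string format are hypothetical, and the modification date is deliberately left out because the question's author reported that dates were unreliable in this scenario. Note that CRC32 detects accidental changes well but is much easier to collide than MD5 or SHA-256.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.CRC32;

// Hypothetical helper, not the commenter's actual implementation.
final class QuickFingerprint {

    // Builds "<size in bytes>-<CRC32 as 8 hex digits>" for the given file.
    static String of(File file) throws IOException {
        CRC32 crc = new CRC32();
        try (InputStream in = new FileInputStream(file)) {
            byte[] buffer = new byte[8192];
            int read;
            while ((read = in.read(buffer)) != -1) {
                crc.update(buffer, 0, read);
            }
        }
        return file.length() + "-" + String.format("%08x", crc.getValue());
    }
}
```

Two files with different fingerprints are certainly different; two files with the same fingerprint are very likely, but not guaranteed, to be equal.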