[...] My first thought was to use JSON, but from what I have heard and read, a large share of a server's processing ends up going to parsing, so as an alternative I started considering other integration options, such as sockets or RMI. [...]
This reasoning is fallacious for several reasons:
You’re mixing things up. A socket is a communication channel that carries bytes back and forth. JSON is a data structuring format. To make a real-world analogy, suppose you want to travel from City A to City B: the socket would be the garage gate or the train platform, and JSON would be the luggage you carry with you. It makes no sense to say you choose one over the other.
Even if you use a socket to carry something other than JSON, you still have to assemble packets with that data, send them back and forth and interpret them correctly. You will probably end up inventing your own parser for this, and that parser may turn out simpler or more complex than just using JSON.
RMI is a technology that allows you to work faster on the integration and makes the details of the transport of objects more transparent. However, the RMI protocol is significantly complex and cumbersome, far more complicated than HTTP.
There are also other aspects to consider:
RMI is not very firewall friendly.
Inspecting the content of traffic messages via RMI, logging and measuring traffic are complicated tasks.
Serialization is a real headache with RMI, especially when you have clients running different versions of the application that use mutually incompatible classes and the server has to support all of them. This scenario is much easier to handle using REST + JSON.
All of this ultimately uses sockets. If you want to use sockets directly, a lot will still depend on what you send through them. Without more details about the format of the data being exchanged, it is difficult to evaluate this solution.
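To make the point about raw sockets concrete, here is a minimal sketch of the kind of framing code you end up writing yourself once you drop JSON over HTTP. It assumes a made-up wire format (a 4-byte length followed by UTF-8 text); the class and method names are hypothetical, and in-memory streams stand in for `socket.getOutputStream()` and `socket.getInputStream()`.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: a tiny length-prefixed framing for text messages.
final class LengthPrefixedFraming {

    // Writes a 4-byte big-endian length followed by the UTF-8 payload.
    static void writeMessage(OutputStream out, String message) {
        try {
            byte[] payload = message.getBytes(StandardCharsets.UTF_8);
            DataOutputStream data = new DataOutputStream(out);
            data.writeInt(payload.length);
            data.write(payload);
            data.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads exactly one message: first the length, then that many bytes.
    static String readMessage(InputStream in) {
        try {
            DataInputStream data = new DataInputStream(in);
            int length = data.readInt();
            if (length < 0) {
                throw new IOException("negative message length: " + length);
            }
            byte[] payload = new byte[length];
            data.readFully(payload);
            return new String(payload, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Sends all messages through an in-memory "wire" and reads them back.
    static String[] roundTrip(String... messages) {
        ByteArrayOutputStream wire = new ByteArrayOutputStream();
        for (String message : messages) {
            writeMessage(wire, message);
        }
        InputStream in = new ByteArrayInputStream(wire.toByteArray());
        String[] received = new String[messages.length];
        for (int i = 0; i < received.length; i++) {
            received[i] = readMessage(in);
        }
        return received;
    }

    public static void main(String[] args) {
        for (String message : roundTrip("sale;42;3", "ok")) {
            System.out.println(message);
        }
    }
}
```

Note that this only solves message boundaries. The content of each message (`sale;42;3` here) still needs its own parser, versioning and error handling, which is exactly the work JSON libraries already do for you.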
There are other alternatives besides these. Nothing prevents you from sending via HTTP, FTP, SOAP, email, or whatever, a lot of bytes with the structure you prefer: XML, JSON, images, binary, TXT or the format you want.
Also, performance is not always that important, nor is it always simple to measure. For example, let’s look at the following scenarios:
1. You have a distributed computational grid crunching numbers to run quantum simulations of particle behavior.
2. You will provide high-resolution live video streaming to tens of thousands of people.
3. You have to immediately process thousands of simultaneous purchase orders coming from store networks with thousands of branches.
4. You have a crawler that scans the internet or social platforms to find, sort and filter information.
5. You have a social network where multiple customers/users interact with each other, posting and receiving a lot of content.
Each of these scenarios is optimized in a different way. A solution considered high-performance in one of these scenarios may be completely inadequate in another.
Your scenario is closest to scenario 3, and I think what you’ll want to optimize is the request response time. That is different from the classic notion of performance, which is just the number of instructions processed per unit of time (and makes more sense in scenario 1). For this, you will have to measure or estimate:
(A) How long it takes to receive the request.
(B) How long it takes to parse the request data.
(C) How long it takes to do the actual work with this data (transform it into other data, save it to the database, send it to another service, etc).
(D) How much memory is spent on this process.
If by measuring or estimating this process, the time spent is dominated by anything other than B, then optimizing parse time will not bring you many real gains, as that’s probably not where your bottleneck would be. In most real cases I see, the biggest bottleneck is in item C, not item B. In fact, in the case of JSON, which has a very simple structure, almost always item B ends up being insignificant.
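A minimal sketch of how items A, B and C could be timed separately, using only `System.nanoTime()`. The `PhaseTimer` class, the phase labels and the stand-in steps (a hard-coded string for the request, a `split` for the parse, a sleep for the database write) are all assumptions for illustration; item D (memory) is not covered here.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical helper that records how long each named phase of a request takes.
final class PhaseTimer {
    private final Map<String, Long> nanosByPhase = new LinkedHashMap<>();

    // Runs the step, adds its elapsed time to the phase total, returns the step's result.
    <T> T time(String phase, Supplier<T> step) {
        long start = System.nanoTime();
        try {
            return step.get();
        } finally {
            nanosByPhase.merge(phase, System.nanoTime() - start, Long::sum);
        }
    }

    Map<String, Long> nanosByPhase() {
        return nanosByPhase;
    }

    // Name of the phase with the largest accumulated time, or null if nothing was timed.
    String dominantPhase() {
        String dominant = null;
        long dominantNanos = -1;
        for (Map.Entry<String, Long> entry : nanosByPhase.entrySet()) {
            if (entry.getValue() > dominantNanos) {
                dominant = entry.getKey();
                dominantNanos = entry.getValue();
            }
        }
        return dominant;
    }

    // Stand-in for slow work such as a database write.
    static void sleepMillis(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        PhaseTimer timer = new PhaseTimer();
        // A: stand-in for reading the request body from the network.
        String raw = timer.time("A receive", () -> "42,3,19.90");
        // B: stand-in for parsing (a real system would call its JSON parser here).
        String[] fields = timer.time("B parse", () -> raw.split(","));
        // C: stand-in for the real work, simulated with a short sleep.
        Integer stored = timer.time("C service", () -> {
            sleepMillis(20);
            return fields.length;
        });
        System.out.println("fields stored: " + stored);
        System.out.println("nanoseconds per phase: " + timer.nanosByPhase());
        System.out.println("dominant phase: " + timer.dominantPhase());
    }
}
```

A single run like this is only a rough indication. For a real decision you would time many requests under realistic load and look at the distribution, not one sample.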
Another important aspect is whether you will use synchronous or asynchronous requests. The asynchronous model is much more scalable, and is one of the reasons behind the success of sites like Amazon, Twitter and Facebook, which deal with a monstrous volume of requests. The idea is that as soon as step A above is completed, the received data is stored in memory, on disk or in the database with as little processing as possible, so that the client gets an immediate response. This reply does not say whether the request succeeded or not; it just says that it was received and will be processed later. With this, you can move the received information to other servers and process it there, including item B and most of items C and D, without having to keep the client connection open. Later, the client asks the server whether processing has finished and whether it succeeded or failed; if it has finished, the server picks up the result, already prepared and properly formatted, and delivers it.
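The "acknowledge now, process later" flow described above can be sketched in a single process as follows. This is only an in-memory illustration under assumed names (`DeferredProcessor`, `accept`, `status`, and the status strings are all hypothetical); a real deployment would put a durable queue or database between the receiving step and the workers.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Function;

// Hypothetical in-memory sketch of "acknowledge now, process later".
final class DeferredProcessor {
    private final ExecutorService workers = Executors.newFixedThreadPool(2);
    private final Map<Long, String> results = new ConcurrentHashMap<>();
    private final AtomicLong nextTicket = new AtomicLong(1);
    private final Function<String, String> handler;

    // The handler stands for parsing plus the real work (items B and C).
    DeferredProcessor(Function<String, String> handler) {
        this.handler = handler;
    }

    // Step A only: keep the raw payload for later and return a ticket immediately.
    long accept(String rawPayload) {
        long ticket = nextTicket.getAndIncrement();
        results.put(ticket, "PENDING");
        workers.execute(() -> {
            try {
                results.put(ticket, "DONE: " + handler.apply(rawPayload));
            } catch (RuntimeException e) {
                results.put(ticket, "FAILED: " + e.getMessage());
            }
        });
        return ticket;
    }

    // What the client polls later to learn the outcome.
    String status(long ticket) {
        return results.getOrDefault(ticket, "UNKNOWN");
    }

    // Stops accepting new work and waits for queued work to finish.
    boolean shutdownAndWait(long timeoutMillis) {
        workers.shutdown();
        try {
            return workers.awaitTermination(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        DeferredProcessor processor = new DeferredProcessor(raw -> raw.trim().toUpperCase());
        long ticket = processor.accept("  order 17  ");
        // May show PENDING or the final result, depending on timing.
        System.out.println(processor.status(ticket));
        processor.shutdownAndWait(1000);
        System.out.println(processor.status(ticket));
    }
}
```

The important property is that `accept` does almost nothing before returning, so the response time the client sees no longer depends on how long parsing or the actual processing takes.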
Finally, you must separate two things: it is one thing to specify the format of the data, and quite another to specify how the data leaves the source and arrives at the destination. Technologies such as RMI, CORBA and EJB bundle the two together, but the more modern trends in software architecture are going the other way.
Either way, I would already have ruled out RMI. RMI is hard to manage, and it only works well over a network with high availability, high reliability and high speed, which is not your case. RMI would be a viable solution for case 4 outlined above and perhaps case 1, but for case 3 it does not work as well. RMI might work well only if you have a set of servers forming a grid, and even then only if it stays restricted to the server environment. In any case, I don’t see RMI as a good alternative for talking to clients.
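One way to picture this separation in code is to keep the format behind one interface and the transport behind another, so that either can be swapped without touching the other. Everything below is a hypothetical sketch: the `Codec` and `Transport` interfaces, the `Order` class and the two toy formats are invented for illustration, and the in-memory transport stands in for HTTP, a socket or anything else.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;
import java.util.Queue;

// Application code depends only on the two interfaces, not on a concrete format or channel.
final class OrderChannel {
    static Order sendAndReceive(Order order, Codec<Order> codec, Transport transport) {
        transport.send(codec.encode(order));
        return codec.decode(transport.receive());
    }

    public static void main(String[] args) {
        Order order = new Order(7, 3);
        Order viaText = sendAndReceive(order, new TextOrderCodec(), new InMemoryTransport());
        Order viaBinary = sendAndReceive(order, new BinaryOrderCodec(), new InMemoryTransport());
        System.out.println(viaText.id + " x" + viaText.quantity);
        System.out.println(viaBinary.id + " x" + viaBinary.quantity);
    }
}

// The data format: how a value becomes bytes and back.
interface Codec<T> {
    byte[] encode(T value);
    T decode(byte[] bytes);
}

// The transport: how bytes leave the source and reach the destination.
interface Transport {
    void send(byte[] bytes);
    byte[] receive();
}

final class Order {
    final int id;
    final int quantity;

    Order(int id, int quantity) {
        this.id = id;
        this.quantity = quantity;
    }
}

// Toy text format "id;quantity" (a JSON codec would slot in the same way).
final class TextOrderCodec implements Codec<Order> {
    public byte[] encode(Order order) {
        return (order.id + ";" + order.quantity).getBytes(StandardCharsets.UTF_8);
    }

    public Order decode(byte[] bytes) {
        String[] parts = new String(bytes, StandardCharsets.UTF_8).split(";");
        return new Order(Integer.parseInt(parts[0]), Integer.parseInt(parts[1]));
    }
}

// Toy binary format: two 4-byte big-endian integers.
final class BinaryOrderCodec implements Codec<Order> {
    public byte[] encode(Order order) {
        return ByteBuffer.allocate(8).putInt(order.id).putInt(order.quantity).array();
    }

    public Order decode(byte[] bytes) {
        ByteBuffer buffer = ByteBuffer.wrap(bytes);
        return new Order(buffer.getInt(), buffer.getInt());
    }
}

// Stand-in transport that just queues the bytes in memory.
final class InMemoryTransport implements Transport {
    private final Queue<byte[]> queue = new ArrayDeque<>();

    public void send(byte[] bytes) {
        queue.add(bytes);
    }

    public byte[] receive() {
        return queue.remove();
    }
}
```

With this shape, switching from one format to another, or from one channel to another, is a change at the point where the objects are wired together, not in the business logic.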
What kind of data do you need to integrate? Will there be a lot of communication? A lot of data? Do any other systems need to integrate with the cloud server? Answering these questions may make it easier to help you.
– Murillo Goulart
Welcome to Stack Overflow! Reinforcing what Murillo said above, it is almost impossible to name the best format for an integration without some notion of the type, size and frequency of the transmissions and the expected response time. Please edit your question with these details, with examples if possible. Cheers!
– utluiz
Just to put it in context and give you an idea, the time spent parsing JSON is completely insignificant in the overwhelming majority of cases, except for large volumes of data or perhaps for "real-time" systems, where you can gain something by using binary or more compact formats that are not human-readable.
– utluiz
@Murillogoulart sorry for the lack of information in the question. I have added the integration scenario and some more information.
– melpin
@utluiz Thank you very much! I have already made the changes and described more or less what the scenario will be. My concern about parsing the JSON came from the objects having other objects nested in them: a Sale, for example, has a list of products sold, each item in that list has a product, and that product carries several other pieces of information... That is why I thought about alternatives such as RMI and sockets.
– melpin
I just wanted to say that it is a common mistake to design a system around the fact that its data will be sent over a certain protocol and in a certain format. Instead, design your system to solve the problem at hand, and assume the data can be sent over any protocol and in any format. The company where I currently work suffers from not having respected this principle...
– Bruno Costa
Hello @Brunocost, the system that will run in the cloud is already designed; all that is missing is integrating the data that lives on the clients with the server in the cloud, and the remaining doubt is which of these technologies to adopt.
– melpin
Did any of the answers resolve your question? Does anything still need to be improved?
– Murillo Goulart
@Murillogoulart I believe my question has been answered with the help of all the answers. Thank you.
– melpin