I have seen many examples of how to export data from Google Analytics to other formats, and of how to keep the exported data up to date, but so far none of them works for exporting the data to MySQL, given the following problems:
- The query limits end up being exceeded;
- The process takes hours to keep the records up to date;
- The process becomes unviable beyond a certain number of records, because a day only has 24 hours.
So far my implementation follows this flow:
┌─────────────────────────────────────────────────────────┐
│          For each record in the local database          │
└────────────────────────────┬────────────────────────────┘
                             │
┌────────────────────────────┴────────────────────────────┐
│              Send URL to Google Analytics               │
└────────────────────────────┬────────────────────────────┘
                             │
┌────────────────────────────┴────────────────────────────┐
│ With the statistics received, update the local database │
└─────────────────────────────────────────────────────────┘
This works well, but to work around the problems listed above it is necessary to pause between iterations:
while ($registoLocal = $sth->fetch()) {
    if (time_nanosleep(0, 500000000) === true) {
        // talk to Google Analytics
    }
}
This translates into a more or less fixed time per $registoLocal, currently around 2 seconds, which leaves us with a maximum of 43,200 queries per day (86,400 seconds in a day / 2).
On top of that, there are the Google Analytics API limits:
- 50,000 requests per project per day
- 10 queries per second per IP
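For reference, and just to make the arithmetic above explicit, here is a minimal sketch comparing the ceiling imposed by the per-record pause with the daily API quota (the 2-second figure is the one quoted above; the variable names are only illustrative):

/* Back-of-the-envelope throughput check (illustrative only). */
$secondsPerRecord = 2;                                 // pause + processing per record
$localCeiling     = floor(86400 / $secondsPerRecord);  // 43200 records per day
$apiDailyQuota    = 50000;                             // requests per project per day

// With these numbers the local pause, not the quota, is the bottleneck.
echo min($localCeiling, $apiDailyQuota); // 43200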
Code in use
Include dependencies and instantiate the API:
/* API dependencies
*/
require_once('caminho/para/Google_Client.php');
require_once('caminho/para/contrib/Google_AnalyticsService.php');
/* Create client object and set app name
*/
$client = new Google_Client();
$client->setApplicationName('O nome da minha aplicação'); // app name
/* Set assertion credentials
*/
$client->setAssertionCredentials(
    new Google_AssertionCredentials(
        '[email protected]', // API Service Account Email Address
        array('https://www.googleapis.com/auth/analytics.readonly'),
        file_get_contents('caminho/para/o/ficheiro/O nome da minha aplicação.p12') // downloaded keyfile
    )
);
/* Other settings
* Access the Google Cloud Console and view the Web App-project details:
* https://cloud.google.com/console
*/
$client->setClientId('outro-mega-super-grande-email-da-google@developer.gserviceaccount.com'); // Client ID for web application Email Address
$client->setAccessType('offline');
$client->setUseObjects(true);
Connect to the local database:
/* Connection access details
 */
$dbcon = array(
    "host"     => "localhost",
    "dbname"   => "nomeBaseDados",
    "username" => "utilizador",
    "password" => "password"
);
/* Connect to the database
 */
$dbh = new PDO(
    'mysql:host='.$dbcon['host'].';dbname='.$dbcon['dbname'].';',
    $dbcon['username'],
    $dbcon['password'],
    array(
        PDO::ATTR_PERSISTENT => false,
        PDO::MYSQL_ATTR_USE_BUFFERED_QUERY => true,
        PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION,
        PDO::MYSQL_ATTR_INIT_COMMAND => "SET NAMES utf8"
    )
);
/* Prepare query for Insert
*/
$sqlInsert = "
INSERT INTO press_statistic_analytics
(id, press_id, ga_profile_id, ga_sessions, ga_pageviews, ga_percentNewVisits, date_created, date_updated)
VALUES
(null, :press_id, :ga_profile_id, :ga_sessions, :ga_pageviews, :ga_percentNewVisits, :date_created, :date_updated)";
$queryInsert = $dbh->prepare($sqlInsert);
/* Prepare query for Update
*/
$sqlUpdate = "
UPDATE press_statistic_analytics SET
ga_sessions = :ga_sessions,
ga_pageviews = :ga_pageviews,
ga_percentNewVisits = :ga_percentNewVisits,
date_updated = :date_updated
WHERE press_id = :press_id
AND ga_profile_id = :ga_profile_id ";
$queryUpdate = $dbh->prepare($sqlUpdate);
/* Get all Articles
*/
$sth = $dbh->query("
SELECT
press_i18n.press_id,
press_i18n.title,
press_i18n.slug
FROM press_i18n
ORDER BY press_i18n.id DESC");
$sth->setFetchMode(PDO::FETCH_OBJ);
For each local record, communicate with Google Analytics and collect information:
while ($article = $sth->fetch()) {
    if (time_nanosleep(0, 500000000) === true) {
        /* Create service and get data
         */
        $gaProfileID = 'ga:11111111';
        $startDate   = '2000-01-01';
        $endDate     = date("Y-m-d", time());
        $metrics     = 'ga:sessions,ga:pageviews,ga:percentNewVisits';
        $optParams   = array(
            "filters" => "ga:pagePath==/".$article->slug
        );
        $service = new Google_AnalyticsService($client);

        /* Query Google and process results
         * to a more usable array.
         */
        try {
            // Get data
            $results = $service->data_ga->get($gaProfileID, $startDate, $endDate, $metrics, $optParams);

            $googleAnalyticsArr = array();
            if (isset($results) && is_object($results) && count($results->getRows()) > 0) {
                $headersArr = (Array)$results->getColumnHeaders();

                foreach ($results->getRows() as $row) {
                    $i = 0; // reset the header index for each row
                    foreach ($row as $cell) {
                        $googleAnalyticsArr[$headersArr[$i]->name] = htmlspecialchars($cell, ENT_NOQUOTES);
                        $i++;
                    }
                }
            }

            /* Update or insert the collected data
             */
            $sthSelectExistent = $dbh->query("
                SELECT *
                FROM press_statistic_analytics
                WHERE press_id = ".$article->press_id."
                AND ga_profile_id = '".$gaProfileID."'
            ");
            $sthSelectExistent->setFetchMode(PDO::FETCH_OBJ);
            $rowSelectExistent = $sthSelectExistent->fetch();

            /* Prepare data
             */
            $ga_sessions         = isset($googleAnalyticsArr["ga:sessions"]) ? $googleAnalyticsArr["ga:sessions"] : 0;
            $ga_pageviews        = isset($googleAnalyticsArr["ga:pageviews"]) ? $googleAnalyticsArr["ga:pageviews"] : 0;
            $ga_percentNewVisits = isset($googleAnalyticsArr["ga:percentNewVisits"]) ? $googleAnalyticsArr["ga:percentNewVisits"] : 0.00;

            if (is_object($rowSelectExistent)) {
                $queryUpdate->execute(
                    array(
                        "ga_sessions"         => $ga_sessions,
                        "ga_pageviews"        => $ga_pageviews,
                        "ga_percentNewVisits" => $ga_percentNewVisits,
                        "date_updated"        => date("Y-m-d H:i:s"),
                        "press_id"            => $article->press_id,
                        "ga_profile_id"       => $gaProfileID
                    )
                );
            }
            else {
                $queryInsert->execute(
                    array(
                        "press_id"            => $article->press_id,
                        "ga_profile_id"       => $gaProfileID,
                        "ga_sessions"         => $ga_sessions,
                        "ga_pageviews"        => $ga_pageviews,
                        "ga_percentNewVisits" => $ga_percentNewVisits,
                        "date_created"        => date("Y-m-d H:i:s"),
                        "date_updated"        => "0000-00-00 00:00:00"
                    )
                );
            }
        }
        catch (apiServiceException $e) {
            // Handle API service exceptions.
            //$error = $e->getMessage();
            var_dump($e);
        }
        catch (Google_ServiceException $e) {
            // Handle API service exceptions.
            //$error = $e->getMessage();
            var_dump($e);
        }
    }
}
Question
How can this process of collecting the statistics of each URL present in Google Analytics into the local MySQL database be optimized, so that the forced pauses in the script can be avoided without exceeding the limits currently imposed by Google?
Do you have all the pages tracked by Analytics in your database? Wouldn't it be possible to make a single request to the service, without filtering per page, and then do the matching loop against your database?
– bfavaretto
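What bfavaretto suggests above could look roughly like this with the same old Google_AnalyticsService client already instantiated in the question; this is only a sketch under that assumption, reusing the profile ID, dates and metrics shown earlier, and the matching against press_i18n.slug is merely illustrative:

/* Single request for all pages: ask for the metrics broken down
 * by the ga:pagePath dimension instead of filtering one path at a time.
 */
$service = new Google_AnalyticsService($client);

$results = $service->data_ga->get(
    'ga:11111111',                                   // same profile as above
    '2000-01-01',
    date('Y-m-d'),
    'ga:sessions,ga:pageviews,ga:percentNewVisits',
    array(
        'dimensions'  => 'ga:pagePath',
        'max-results' => 1000                        // API cap per request
    )
);

/* Index the rows by page path so each local record can be matched in memory. */
$statsByPath = array();
foreach ($results->getRows() as $row) {
    // Row layout: ga:pagePath, ga:sessions, ga:pageviews, ga:percentNewVisits
    $statsByPath[$row[0]] = array(
        'ga:sessions'         => $row[1],
        'ga:pageviews'        => $row[2],
        'ga:percentNewVisits' => $row[3]
    );
}

/* Matching loop against the local database, with no further API calls. */
while ($article = $sth->fetch()) {
    $path  = '/'.$article->slug;
    $stats = isset($statsByPath[$path]) ? $statsByPath[$path] : null;
    // ...reuse the existing insert/update logic with $stats
}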
That is what I thought at the time of the first implementation, but there is a limit of 1000 results per query, which made me give up on this method! Since everything does not come at once, it would be a matter of implementing some control to fetch 999 results, then another 999, and so on... It may be a solution; it needs more testing to assess its viability.
– Zuul
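Regarding the 1000-results cap mentioned by Zuul, the Core Reporting API also accepts a start-index parameter, so the bulk query sketched above could be paged; again only a rough sketch, where the parameter names are the API's and the loop itself is illustrative:

/* Page through the report 1000 rows at a time (illustrative sketch). */
$allRows    = array();
$startIndex = 1;     // start-index is 1-based
$maxResults = 1000;  // maximum allowed per request

do {
    $results = $service->data_ga->get(
        'ga:11111111',
        '2000-01-01',
        date('Y-m-d'),
        'ga:sessions,ga:pageviews,ga:percentNewVisits',
        array(
            'dimensions'  => 'ga:pagePath',
            'start-index' => $startIndex,
            'max-results' => $maxResults
        )
    );

    $rows = $results->getRows();
    if (is_array($rows)) {
        $allRows = array_merge($allRows, $rows);
    }

    $startIndex += $maxResults;
} while (is_array($rows) && count($rows) === $maxResults);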
Fifteenth? :D
– brasofilo
@brasofilo Hmmm... bounties are good, but better to spend the rep on an answer that actually solves the problem! A bounty will attract "this should work" answers and the rep will be spent without a viable solution... But... if you have a solution, let me know, and I'll put as much bounty as possible on this topic ;)
– Zuul
Ah, that's part of the gamble, a good one may show up, or it may not :) I found the problem very interesting and thought about putting one up, but then I saw whose question it was, hehehe
– brasofilo
@Zuul I think you can adapt this script to use the cron implementation I made for sending e-mail through SES http://answall.com/questions/9958/enviar-email-em-massa-usando-amazon-ses but what would that be for? I mean, what tool do you want to end up with?
– marcusagm
I understand that you go URL by URL because of the limit of 1000 records per query; in that case, what guarantees that a single URL would not have more than 1000 records? Is that taken into account?
– Ricardo
Yeah, it looks like this bounty is going to go unclaimed anyway. :D Or is @Ricardo going to post an answer in the 45th minute of the second half? :)
– gustavox
The thing is, @gustavox, my comment was as follows: if there are X URLs to be processed (impossible to change that) and Google stipulates an interval of Y seconds, then the minimum time is (X*Y) + (the processing itself). Now, if the fact that a URL can have more than 1000 records is taken into account and handled, then we can start thinking about reducing the number of URLs... Or even contact Google to see if they have any answer for this.
– Ricardo
Well, analyzing your problem I have a few questions: Is it possible to separate the fetch-and-write code onto a separate page, without the while, running just once? If yes to the previous question, why not schedule that page to be called by your server's task scheduler? Another thing: have you considered using multi cURL instead of the while? I believe the code would look much better.
– Jhonatan Simões
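The multi cURL idea Jhonatan mentions would rest on PHP's curl_multi_* functions; a generic sketch follows, not wired to the Analytics client used in the question ($urls is a placeholder, and real Analytics requests would still need the OAuth access token on each handle):

/* Generic curl_multi sketch: fetch several URLs in parallel. */
$urls = array('https://example.com/a', 'https://example.com/b'); // placeholder URLs

$mh      = curl_multi_init();
$handles = array();

foreach ($urls as $key => $url) {
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_multi_add_handle($mh, $ch);
    $handles[$key] = $ch;
}

// Run all handles until every transfer has finished.
do {
    $status = curl_multi_exec($mh, $active);
    if ($active) {
        curl_multi_select($mh);
    }
} while ($active && $status === CURLM_OK);

// Collect the responses and release the handles.
$responses = array();
foreach ($handles as $key => $ch) {
    $responses[$key] = curl_multi_getcontent($ch);
    curl_multi_remove_handle($mh, $ch);
    curl_close($ch);
}
curl_multi_close($mh);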
So @Ricardo, I really don't understand enough to give an opinion here... It's a shame Zuul hasn't shown up to give you feedback, he's a top user here at SOpt, so I guess he hasn't seen the comments yet, because otherwise I'm sure he would answer... I offered the bounty mainly because whenever I go into the unanswered questions area I find it strange that such a well-written, well-rated question has no answer, because I think a solution here may still be useful to me in the future, and for the learning itself, which is why I always ask questions like this...
– gustavox
Since I think no one is really going to risk an answer, and this bounty is really going to be lost (which does NOT mean I will give the bounty to whatever answer shows up now, all the more so because I won't have time to see how the community evaluates it...), I'll do it differently: I'm leaving here a guaranteed 100-point bounty for any answer that Zuul accepts and/or that gets more than 10 upvotes. I will grant this bounty after the answer has been accepted or well voted (accepting it is even suggested by the system, it should be said).
– gustavox