Collect Google Analytics data into MySQL tables


I have seen many examples of how to export data from Google Analytics to other formats, as well as how to keep the exported data updated, but so far none of them works for exporting the data to MySQL, given the following problems:

  • Exceeding the query limits;
  • The process takes hours to keep the records up to date;
  • The process becomes unviable beyond a certain number of records, because the day only has 24 hours.

So far my implementation is in this format:

         ┌───────────────────────────────────────┐
         │ For each record in the local database │
         └───────────────────┬───────────────────┘
                             │
            ┌────────────────┴─────────────────┐
            │ Send the URL to Google Analytics │
            └────────────────┬─────────────────┘
                             │
┌────────────────────────────┴────────────────────────────┐
│ With the statistics received, update the local database │
└──────────────────────────────────────────────────────────┘

This works well, but to stay within the limits listed above, the script has to pause between iterations:

while ($registoLocal = $sth->fetch()) {

    if (time_nanosleep(0, 500000000) === true) {
        // talk to Google Analytics
    }
}

This results in a more or less fixed time per $registoLocal, currently around 2 seconds, which leaves us with a maximum of 86,400 s / 2 s = 43,200 queries per day.

On top of that, there are the limits of the Google Analytics API:

  • 50,000 requests per project per day
  • 10 queries per second per IP
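
One way to get closer to the daily quota than a fixed half-second sleep is to pace only the remainder of the per-request budget, so the script's own processing time counts toward the interval instead of being added on top. A minimal sketch, not part of the question's code (the paceRequest helper is hypothetical; 1.73 s ≈ 86,400 s / 50,000 requests):

/* Request pacer (sketch): sleeps only for what is left of the
 * per-request budget since the previous API call.
 */
function paceRequest(&$lastRequestAt, $minInterval = 1.73)
{
    $elapsed = microtime(true) - $lastRequestAt;

    if ($elapsed < $minInterval) {
        usleep((int) round(($minInterval - $elapsed) * 1000000));
    }

    $lastRequestAt = microtime(true);
}

$lastRequestAt = 0.0;

while ($registoLocal = $sth->fetch()) {

    paceRequest($lastRequestAt); // ~1.73 s between API calls

    // talk to Google Analytics
}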

Code in use

Include dependencies and instantiate the API:

/* API dependencies
 */
require_once('caminho/para/Google_Client.php');
require_once('caminho/para/contrib/Google_AnalyticsService.php');

/* Create client object and set app name
 */
$client = new Google_Client();
$client->setApplicationName('O nome da minha aplicação'); // app name

/* Set assertion credentials
 */
$client->setAssertionCredentials(
    new Google_AssertionCredentials(
        '[email protected]', // API Service Account Email Address
        array('https://www.googleapis.com/auth/analytics.readonly'),
        file_get_contents('caminho/para/o/ficheiro/O nome da minha aplicação.p12')  // downloaded keyfile
    )
);

/* Other settings
 * Access the Google Cloud Console and view the Web App-project details:
 * https://cloud.google.com/console
 */
$client->setClientId('outro-mega-super-grande-email-da-google@developer.gserviceaccount.com'); // Client ID for web application Email Address
$client->setAccessType('offline');
$client->setUseObjects(true);

Connect to the local database:

/* Connection access details
 */
$dbcon = array(
    "host"     => "localhost",
    "dbname"   => "nomeBaseDados",
    "username" => "utilizador",
    "password" => "password"
);

/* Connect to the database
 */
$dbh = new PDO(
    'mysql:host='.$dbcon['host'].';dbname='.$dbcon['dbname'].';',
    $dbcon['username'],
    $dbcon['password'],
    array(
        PDO::ATTR_PERSISTENT               => false,
        PDO::MYSQL_ATTR_USE_BUFFERED_QUERY => true,
        PDO::ATTR_ERRMODE                  => PDO::ERRMODE_EXCEPTION,
        PDO::MYSQL_ATTR_INIT_COMMAND       => "SET NAMES utf8"
    )
);

/* Prepare query for Insert
 */
$sqlInsert = "
INSERT INTO press_statistic_analytics
    (id, press_id, ga_profile_id, ga_sessions, ga_pageviews, ga_percentNewVisits, date_created, date_updated)
VALUES
    (null, :press_id, :ga_profile_id, :ga_sessions, :ga_pageviews, :ga_percentNewVisits, :date_created, :date_updated)";

$queryInsert = $dbh->prepare($sqlInsert);

/* Prepare query for Update
 */
$sqlUpdate = "
UPDATE press_statistic_analytics SET
    ga_sessions = :ga_sessions,
    ga_pageviews = :ga_pageviews,
    ga_percentNewVisits = :ga_percentNewVisits,
    date_updated = :date_updated
WHERE press_id = :press_id
AND ga_profile_id = :ga_profile_id ";

$queryUpdate = $dbh->prepare($sqlUpdate);

/* Get all Articles
 */
$sth = $dbh->query("
SELECT
    press_i18n.press_id,
    press_i18n.title,
    press_i18n.slug
FROM press_i18n
ORDER BY press_i18n.id DESC");

$sth->setFetchMode(PDO::FETCH_OBJ);

For each local record, communicate with Google Analytics and collect information:

while ($article = $sth->fetch()) {

    if (time_nanosleep(0, 500000000) === true) {

        /* Create service and get data
         */
        $gaProfileID = 'ga:11111111';
        $startDate = '2000-01-01';
        $endDate = date("Y-m-d", time());
        $metrics = 'ga:sessions,ga:pageviews,ga:percentNewVisits';
        $optParams = array(
            "filters" => "ga:pagePath==/".$article->slug
        );

        $service = new Google_AnalyticsService($client);

        /* Query Google and process results
         * to a more usable array.
         */
        try {

            // Get data
            $results = $service->data_ga->get($gaProfileID, $startDate, $endDate, $metrics, $optParams);

            $googleAnalyticsArr = array();

            if (isset($results) && is_object($results) && count($results->getRows()) > 0) {

                $headersArr = (Array)$results->getColumnHeaders();

                $i=0;

                foreach ($results->getRows() as $row) {

                    foreach ($row as $cell) {

                        $googleAnalyticsArr[$headersArr[$i]->name] = htmlspecialchars($cell, ENT_NOQUOTES);

                        $i++;
                    }
                }
            }

            /* Update or insert the collected data
             */
            $sthSelectExistent = $dbh->query("
                SELECT *
                FROM press_statistic_analytics
                WHERE press_id = ".$article->press_id."
                AND ga_profile_id = '".$gaProfileID."'
            ");
            $sthSelectExistent->setFetchMode(PDO::FETCH_OBJ);
            $rowSelectExistent = $sthSelectExistent->fetch();

            /* Prepare data
             */
            $ga_sessions = isset($googleAnalyticsArr["ga:sessions"]) ? $googleAnalyticsArr["ga:sessions"] : 0;
            $ga_pageviews = isset($googleAnalyticsArr["ga:pageviews"]) ? $googleAnalyticsArr["ga:pageviews"] : 0;
            $ga_percentNewVisits = isset($googleAnalyticsArr["ga:percentNewVisits"]) ? $googleAnalyticsArr["ga:percentNewVisits"] : 0.00;

            if (is_object($rowSelectExistent)) {

                $queryUpdate->execute(
                    array(
                        "ga_sessions"         => $ga_sessions,
                        "ga_pageviews"        => $ga_pageviews,
                        "ga_percentNewVisits" => $ga_percentNewVisits,
                        "date_updated"        => date("Y-m-d H:i:s"),
                        "press_id"            => $article->press_id,
                        "ga_profile_id"       => $gaProfileID
                    )
                );
            }
            else {
                $queryInsert->execute(
                    array(
                        "press_id"            => $article->press_id,
                        "ga_profile_id"       => $gaProfileID,
                        "ga_sessions"         => $ga_sessions,
                        "ga_pageviews"        => $ga_pageviews,
                        "ga_percentNewVisits" => $ga_percentNewVisits,
                        "date_created"        => date("Y-m-d H:i:s"),
                        "date_updated"        => "0000-00-00 00:00:00"
                    )
                );
            }
        }
        catch (apiServiceException $e) {

            // Handle API service exceptions.
            //$error = $e->getMessage();
            var_dump($e);
        }
        catch (Google_ServiceException $e) {

            // Handle API service exceptions.
            //$error = $e->getMessage();
            var_dump($e);
        }
    }
}

Question

How can I optimize this process of collecting, into the local MySQL database, the statistics of each URL present in Google Analytics, so as to avoid the forced pauses in the script and without exceeding the limits currently imposed by Google?

  • Do you have in your database all the pages tracked by Analytics? Wouldn't it be possible to make a single request to the service, without filtering per page, and then do the matching loop against your database?

  • That is what I thought of for the first implementation, but there is a limit of 1,000 results per query, which made me give up on that method! Since not everything comes back at once, it would mean implementing some control to fetch 999 results, then 999 more, and so on (see the paging sketch after these comments)... It may be a solution; it needs more tests to assess its viability.

  • @brasofilo Hmmm... a bounty is good, but only if the rep is spent on an answer that solves the problem! A bounty will attract "this should work" answers and the rep will be spent without a viable solution... But... if you have a solution, let me know, and I will put as big a bounty as possible on this topic ;)

  • Ah, that's part of the gamble: a good one may appear, or it may not :) I found the problem very interesting and thought about putting one up, but then I saw whose question it was, hehe

  • @Zuul I think you can adapt this script to use the cron implementation I made for sending e-mail through SES http://answall.com/questions/9958/enviar-email-em-massa-usando-amazon-ses but what would you use it for? I mean, what tool do you want to end up with?

  • I understand that you go URL by URL because of the 1,000-records-per-query limitation; in that case, what guarantees that a single URL would not have more than 1,000 records? Is that taken into account?

  • Yeah, it looks like this bounty is going to go unclaimed anyway :D Or is @Ricardo going to post an answer in the last 45 seconds of the second half? :)

  • That's why, @gustavox, my comment was as follows: if there are X URLs to be processed (impossible to change this) and Google stipulates an interval of Y seconds, then the minimum time is (X*Y) + (the processing itself). Now, if the case of a URL having more than 1,000 records is taken into account and handled, then we can start thinking about reducing the URLs... Or even contacting Google to see if they have any answer to this.

  • Well, analyzing your problem I have a few questions: Is it possible to separate the fetching code onto a separate page, without the while, running just once? If so, why not schedule that page to be called by your server's task scheduler? Another thing: have you considered using multi cURL instead of the while? I believe the code would look much better.

  • So @Ricardo, I really don't understand enough to give an opinion here... It's a shame Zuul hasn't shown up to give you feedback; he's a top user here at SOpt, so I guess he hasn't seen the comments yet, because otherwise I'm sure he'd answer... I offered the bounty mainly because whenever I browse the unanswered questions it bothers me that such a well-regarded question has no answer, because I think a solution here may still be useful to me in the future, and for the learning itself, which is why I always ask questions like this...

  • Since I think no one is really going to risk an answer, and this bounty is really going to be lost (which is NOT to say I will hand it to whatever answer shows up now, all the more since I won't have time to see how the community evaluates it...), I'll do it differently: I leave here a guaranteed 100-point bounty for any answer that Zuul accepts and/or that gets more than 10 upvotes. I will grant this bounty after the answer (accepting one is even suggested by the system, it should be said) has been accepted or voted up accordingly.
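
To illustrate the batching idea from the first comments: the Core Reporting API accepts max-results and start-index parameters, so the metrics for every page can be pulled in blocks of up to 1,000 rows and matched locally. A minimal sketch, assuming the $service, $gaProfileID, $startDate, $endDate and $metrics variables already set up in the question:

/* Sketch: pull all pages' metrics in batches of 1000 using the
 * ga:pagePath dimension, instead of one filtered request per article.
 */
$allRows = array();
$startIndex = 1;

do {
    $results = $service->data_ga->get(
        $gaProfileID,
        $startDate,
        $endDate,
        $metrics,
        array(
            'dimensions'  => 'ga:pagePath', // one row per URL
            'max-results' => 1000,
            'start-index' => $startIndex
        )
    );

    $rows = $results->getRows();

    if (is_array($rows)) {
        $allRows = array_merge($allRows, $rows);
    }

    $startIndex += 1000;

} while ($startIndex <= $results->getTotalResults());

// Each entry of $allRows is array(pagePath, sessions, pageviews,
// percentNewVisits) and can be matched against press_i18n.slug in a
// single local loop: a handful of API requests instead of thousands.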


2 answers

2

Proposed solution:

1. Create a column holding the date the article's data was last fetched from Google

ALTER TABLE press_i18n ADD COLUMN date_ga DATE NULL

2. Change the script that fetches the articles so it picks one record whose date has not been updated yet

$sth = $dbh->query("
SELECT
    press_i18n.press_id,
    press_i18n.title,
    press_i18n.slug
FROM press_i18n
WHERE date_ga < DATE(NOW()) OR date_ga IS NULL
ORDER BY press_i18n.id DESC LIMIT 1");

3. UPDATE the date_ga of the record fetched by the query, right below

// Fetch the article
$article = $sth->fetch();

if (!$article) {
    echo "No article found";
    exit; // break is invalid outside a loop; stop the script instead
}

echo "Article: " . $article->press_id;

$sthUpdate = "UPDATE press_i18n SET date_ga = DATE(NOW()) WHERE press_i18n.press_id = :press_id";
$queryUpdateGa = $dbh->prepare($sthUpdate);

$queryUpdateGa->execute(
    array(
        "press_id" => $article->press_id
    )
);
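
One caveat with steps 2 and 3: if two overlapping invocations fetch the same stale article before either of them marks it, that record gets processed twice. A minimal sketch of a guard, assuming InnoDB tables and the same $dbh connection, is to wrap the fetch and the mark in a transaction with a row lock:

/* Claim one stale article atomically, so overlapping runs
 * never pick up the same record.
 */
$dbh->beginTransaction();

$sth = $dbh->query("
    SELECT press_id, title, slug
    FROM press_i18n
    WHERE date_ga < DATE(NOW()) OR date_ga IS NULL
    ORDER BY id DESC
    LIMIT 1
    FOR UPDATE");
$sth->setFetchMode(PDO::FETCH_OBJ);
$article = $sth->fetch();

if ($article) {
    $queryUpdateGa = $dbh->prepare(
        "UPDATE press_i18n SET date_ga = DATE(NOW()) WHERE press_id = :press_id");
    $queryUpdateGa->execute(array("press_id" => $article->press_id));
}

$dbh->commit();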

4. Change the Google communication code so it always processes a single article record at a time

$gaProfileID = 'ga:11111111';
$startDate = '2000-01-01';
$endDate = date("Y-m-d", time());
$metrics = 'ga:sessions,ga:pageviews,ga:percentNewVisits';
$optParams = array(
    "filters" => "ga:pagePath==/".$article->slug
);

$service = new Google_AnalyticsService($client);

try {

    // Get data
    $results = $service->data_ga->get($gaProfileID, $startDate, $endDate, $metrics, $optParams);

    $googleAnalyticsArr = array();

    if (isset($results) && is_object($results) && count($results->getRows()) > 0) {

        $headersArr = (Array)$results->getColumnHeaders();

        $i=0;

        foreach ($results->getRows() as $row) {

            foreach ($row as $cell) {

                $googleAnalyticsArr[$headersArr[$i]->name] = htmlspecialchars($cell, ENT_NOQUOTES);

                $i++;
            }
        }
    }

    /* Update or insert the collected data
     */
    $sthSelectExistent = $dbh->query("
        SELECT *
        FROM press_statistic_analytics
        WHERE press_id = ".$article->press_id."
        AND ga_profile_id = '".$gaProfileID."'
    ");
    $sthSelectExistent->setFetchMode(PDO::FETCH_OBJ);
    $rowSelectExistent = $sthSelectExistent->fetch();

    /* Prepare data
     */
    $ga_sessions = isset($googleAnalyticsArr["ga:sessions"]) ? $googleAnalyticsArr["ga:sessions"] : 0;
    $ga_pageviews = isset($googleAnalyticsArr["ga:pageviews"]) ? $googleAnalyticsArr["ga:pageviews"] : 0;
    $ga_percentNewVisits = isset($googleAnalyticsArr["ga:percentNewVisits"]) ? $googleAnalyticsArr["ga:percentNewVisits"] : 0.00;

    if (is_object($rowSelectExistent)) {

        $queryUpdate->execute(
            array(
                "ga_sessions"         => $ga_sessions,
                "ga_pageviews"        => $ga_pageviews,
                "ga_percentNewVisits" => $ga_percentNewVisits,
                "date_updated"        => date("Y-m-d H:i:s"),
                "press_id"            => $article->press_id,
                "ga_profile_id"       => $gaProfileID
            )
        );
    }
    else {
        $queryInsert->execute(
            array(
                "press_id"            => $article->press_id,
                "ga_profile_id"       => $gaProfileID,
                "ga_sessions"         => $ga_sessions,
                "ga_pageviews"        => $ga_pageviews,
                "ga_percentNewVisits" => $ga_percentNewVisits,
                "date_created"        => date("Y-m-d H:i:s"),
                "date_updated"        => "0000-00-00 00:00:00"
            )
        );
    }
}
catch (apiServiceException $e) {

    // Handle API service exceptions.
    //$error = $e->getMessage();
    var_dump($e);
}
catch (Google_ServiceException $e) {

    // Handle API service exceptions.
    //$error = $e->getMessage();
    var_dump($e);
}

5. Assuming this page has a name (and can be accessed externally), we can create a PHP page that calls it, or schedule it in cron so the page is called every X amount of time. Since I don't know what your server looks like, I'll create a page that calls the Google communication page every 1 second; in theory, this lets us double the number of requests to Google.

<meta http-equiv="refresh" content="1">

<?php 
    @file_get_contents( "http://seuservidor/pasta/arquivo_que_comunica_com_o_google.php" );     
?>
  • The solution was not tested because I don't have the question's database, but I run a similar process here at my company, driven by crontab, at approximately 150 thousand requests/day.
  • I believe that using CRONTAB you can increase the number of requests to Google.
  • If in doubt, I'm available in the chat.
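
For the crontab approach mentioned above, a minimal sketch of a runner (the run_ga_sync.php file name is a placeholder; the URL is the same one from the meta-refresh page): cron's finest granularity is one minute, so to approximate the one-request-per-second pace, a single invocation can process up to 60 articles, sleeping 1 s between calls:

<?php
/* run_ga_sync.php (hypothetical name), scheduled from cron once a minute:
 *
 *     * * * * * /usr/bin/php /caminho/para/run_ga_sync.php
 *
 * Each call to the page below processes a single article, as in step 4.
 */
for ($i = 0; $i < 60; $i++) {
    @file_get_contents("http://seuservidor/pasta/arquivo_que_comunica_com_o_google.php");
    sleep(1);
}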

1

This is not a limitation of the tool due to capacity, but due to price ( https://developers.google.com/analytics/devguides/config/mgmt/v3/limits-quotas ).

They set the quota at roughly one request every 1.7 seconds (86,400 s per day / 50,000 requests ≈ 1.73 s), which would be a good pace for a routine. So working around this is going to be tricky if you're not going to pay for the API. AdWords works the same way, and DSPs pay good money to use the exchanges.

In the API console, billing can be linked to increase this quota. And remember that on the free account you still have the data-sampling problem in the API (on the paid one too, but they say you don't). Note also that with Premium you can use BigQuery, which gives you the data in a much more pre-digested form.

Some tools you can use instead:

  • Webtrends;
  • Piwik (open-source analytics);
  • AT Internet.

I know that's not exactly the answer you were hoping for, but the limit is there so they can monetize the data they host. Try Piwik: you will have full control of the data for FREE, and you can still deploy it with JavaScript via a tag manager in the same way.


I forgot to mention: aside from real time, the other reports take up to 24 hours (no SLA) in the free version and 4 hours in the Premium version. You can query every hour to catch when there is a difference in the data.
