
For building an AGC or a grabber, which method is your favorite?

Discussion in 'Pemrograman Web' (Web Programming) started by mp3online, Dec 7, 2012.

  1. mp3online

    An AGC (auto-generated content) site or a grabber can be built with cURL, the Snoopy class, or file_get_contents.
    Of those three methods, which one is your favorite?
    Which one is faster?
    Here I just want to give a rough picture of how fast each method is. You can check it yourself with this script:
    PHP:
    <?php
    # increase the amount of time available for the functions to run.
    set_time_limit(1234);

    # Snoopy is a third-party class; Snoopy.class.php must sit next to this script.
    include 'Snoopy.class.php';

    # initialise variables
    $contents = '';
    $times = array();

    $url = 'http://php.net/';

    # fetch the URL 51 times ($i = 0..50) with each method
    for ($i = 0; $i <= 50; ++$i) {

        ### curl
        $start = microtime(true); // current time as float seconds
        $ch = curl_init();
        $user_agent = "Mozilla/5.0";
        curl_setopt($ch, CURLOPT_URL, $url);
        curl_setopt($ch, CURLOPT_USERAGENT, $user_agent);
        curl_setopt($ch, CURLOPT_HEADER, 1);         // include response headers in $contents
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); // return the page instead of printing it
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1); // follow redirects
        curl_setopt($ch, CURLOPT_TIMEOUT, 120);
        $contents = curl_exec($ch);
        curl_close($ch);
        $times['curl'][] = microtime(true) - $start;

        ### snoopy
        $start = microtime(true);
        $snoopy = new Snoopy();
        $snoopy->agent = "Mozilla/5.0";
        $snoopy->fetch($url);
        $contents = $snoopy->results;
        $times['snoopy'][] = microtime(true) - $start;

        ### file_get_contents
        $start = microtime(true);
        $contents = file_get_contents($url);
        $times['file_get_contents'][] = microtime(true) - $start;
    }

    # sort the times so the first element is the min and the last is the max
    sort($times['curl']);
    sort($times['snoopy']);
    sort($times['file_get_contents']);

    # calculate stats for times
    foreach ($times as $method => $time) {
        echo '<b>' . $method . '</b><br/>average = ' . (array_sum($time) / count($time))
            . '<br/>min = ' . $time[0] . '<br/>max = ' . $time[count($time) - 1] . '<br/>';
    }
    ?>
    The results look roughly like this (times in seconds), though they won't always be the same; they also depend on the server and the internet connection:
    curl
    average = 1.6519486109416
    min = 1.4760458469391
    max = 2.1774718761444
    snoopy
    average = 1.5967303571247
    min = 1.3639419078827
    max = 1.9960110187531
    file_get_contents
    average = 1.7033786319551
    min = 1.3668169975281
    max = 2.8225719928741
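The timing pattern in the benchmark script (record the start time, run the fetch, subtract, then sort to get min and max) can be pulled into a reusable helper. This is a minimal sketch; `bench()` is a hypothetical name, and the closure passed in decides which fetch method is measured.

```php
<?php
// Hypothetical helper: runs $fn $runs times and returns timing stats in seconds.
function bench(callable $fn, int $runs = 51): array
{
    if ($runs < 1) {
        throw new InvalidArgumentException('runs must be at least 1');
    }
    $times = array();
    for ($i = 0; $i < $runs; ++$i) {
        $start = microtime(true); // float seconds
        $fn();
        $times[] = microtime(true) - $start;
    }
    sort($times); // ascending, so min is first and max is last
    return array(
        'average' => array_sum($times) / count($times),
        'min'     => $times[0],
        'max'     => $times[count($times) - 1],
        'runs'    => count($times),
    );
}

// Usage sketch (needs network access and allow_url_fopen=On):
// $stats = bench(function () { return file_get_contents('http://php.net/'); });
// printf("avg %.3f s, min %.3f s, max %.3f s\n", $stats['average'], $stats['min'], $stats['max']);
```

Passing each method as a closure keeps the measurement code identical for all three, so differences in the numbers come from the fetch itself rather than from the timing code.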
     
  2. suriemie

    I mostly just use cURL because it's easy; no need to include another file :D
    I used to use Snoopy a lot, but then I ran into a problem (I forget what it was) that messed up my scraping, so I switched to cURL. Maybe it could have been fixed by tweaking Snoopy's settings, but at the time plain cURL solved it right away, so I've stayed with cURL ever since.
     
  3. yuniico

    I'm staying loyal to file_get_contents. Hehehe
     
  4. nicefirework

    I still like using cURL; the difference isn't significant anyway.
     
  5. nekaters

    Same here, bro. cURL too.
     
  6. chiman

    For me:

    curl + simple_html_dom

    That already covers almost all my scraping needs.
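simple_html_dom is a third-party library. For comparison, the parsing half of a "fetch with cURL, then parse" setup can also be done with PHP's built-in DOM extension (DOMDocument plus DOMXPath), which needs no extra include. This sketch swaps in DOMXPath rather than simple_html_dom; the function name, sample HTML, and XPath query are illustrative assumptions.

```php
<?php
// Sketch: return text and href of every element matched by an XPath query.
// Uses PHP's built-in DOM extension, not simple_html_dom.
function extract_links(string $html, string $xpathQuery): array
{
    $doc = new DOMDocument();
    // Scraped HTML is often invalid; collect parser warnings instead of printing them.
    $prev = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($prev);

    $xpath = new DOMXPath($doc);
    $links = array();
    foreach ($xpath->query($xpathQuery) as $node) {
        $links[] = array(
            'text' => trim($node->textContent),
            'href' => $node->getAttribute('href'),
        );
    }
    return $links;
}

// In practice $html would be the string returned by curl_exec().
$html = '<html><body><h2 class="title"><a href="/a">First</a></h2>'
      . '<h2 class="title"><a href="/b">Second</a></h2>'
      . '<p><a href="/x">skip</a></p></body></html>';
$links = extract_links($html, '//h2[@class="title"]/a');
// $links[0] is array('text' => 'First', 'href' => '/a'); the link inside <p> is not matched.
```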
     
  7. aryanta12345

    I don't get it, bro... ;(
     
  8. dimasku

    Whatever the fetch method... I still use simple_html_dom.
     
  9. tsaniy

    Loyal to cURL... because I don't understand the others..!!
     
  10. Gambreng_12

    As for me, just file_get_contents.
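For those sticking with file_get_contents, request headers such as a User-Agent or a Cookie can still be sent by passing a stream context as the third argument. A minimal sketch; `make_http_context()` is a hypothetical helper, and the user agent and cookie values are placeholders, not taken from the thread. URL-encoding the cookie values is a choice made here, not a requirement of the http wrapper.

```php
<?php
// Sketch: build an http stream context carrying custom headers for file_get_contents.
// Returns a stream context resource.
function make_http_context(string $userAgent, array $cookies = array())
{
    $headers = array('User-Agent: ' . $userAgent);
    if ($cookies) {
        $pairs = array();
        foreach ($cookies as $name => $value) {
            $pairs[] = $name . '=' . rawurlencode($value); // encode spaces etc.
        }
        $headers[] = 'Cookie: ' . implode('; ', $pairs);
    }
    return stream_context_create(array(
        'http' => array(
            'method'          => 'GET',
            'header'          => implode("\r\n", $headers),
            'follow_location' => 1,   // follow redirects (also the wrapper's default)
            'timeout'         => 120, // read timeout in seconds
        ),
    ));
}

// Usage (needs network access and allow_url_fopen=On):
// $ctx  = make_http_context('Mozilla/5.0', array('session' => 'abc123'));
// $html = file_get_contents('http://php.net/', false, $ctx);
// Afterwards $http_response_header holds the response headers, including any Set-Cookie lines.
```

This is still more manual than a cURL cookie jar, since cookies from responses have to be parsed and passed back in by hand.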
     
  11. brojomusti99

    Wow, I don't understand this language... is everyone here a programmer?...
    :confused:

     
  12. mp3online

    Each one has its pros and cons :)
    If the page being scraped is small, cURL is faster.
    For manipulating HTTP headers, Snoopy is easier.
    On free hosting, cURL often can't follow redirects (CURLOPT_FOLLOWLOCATION), but you can write your own function to replace FOLLOWLOCATION.
    file_get_contents is the hardest if you want to manipulate headers, and it's also awkward for working with cookies.
    For scraping multiple pages in a single run, cURL is easier.
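The FOLLOWLOCATION replacement mentioned in that post can be sketched as a loop that requests each URL with automatic redirects off, reads the Location header itself, and tries again. On such hosts the usual cause was that older PHP versions refused to enable CURLOPT_FOLLOWLOCATION when safe_mode or open_basedir was set. This is an illustrative sketch, not mp3online's function; the helper names and the 5-redirect cap are assumptions.

```php
<?php
// Sketch of a manual redirect follower. Helper names are hypothetical.

// Pull the Location header out of a raw header block (null if absent).
function location_from_headers(string $rawHeaders): ?string
{
    $location = null;
    foreach (preg_split('/\r?\n/', $rawHeaders) as $line) {
        if (stripos($line, 'Location:') === 0) {
            $location = trim(substr($line, strlen('Location:')));
        }
    }
    return $location;
}

// Resolve a possibly relative Location against the URL that sent it.
function resolve_location(string $base, string $location): string
{
    if (preg_match('#^https?://#i', $location)) {
        return $location;                        // already absolute
    }
    $p = parse_url($base);
    if (strpos($location, '//') === 0) {
        return $p['scheme'] . ':' . $location;   // protocol-relative
    }
    $origin = $p['scheme'] . '://' . $p['host'] . (isset($p['port']) ? ':' . $p['port'] : '');
    if (strpos($location, '/') === 0) {
        return $origin . $location;              // root-relative
    }
    // path-relative: keep the directory part of the base path
    $dir = isset($p['path']) ? substr($p['path'], 0, strrpos($p['path'], '/') + 1) : '/';
    return $origin . $dir . $location;
}

// Fetch $url with cURL, following up to $maxRedirects 3xx responses by hand.
// Returns the final body, or false on error / too many redirects.
function fetch_following(string $url, int $maxRedirects = 5)
{
    for ($i = 0; $i <= $maxRedirects; ++$i) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
        curl_setopt($ch, CURLOPT_HEADER, 1);     // keep headers so Location can be read
        curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0');
        curl_setopt($ch, CURLOPT_TIMEOUT, 120);
        $response = curl_exec($ch);
        if ($response === false) {
            curl_close($ch);
            return false;
        }
        $status     = curl_getinfo($ch, CURLINFO_HTTP_CODE);
        $headerSize = curl_getinfo($ch, CURLINFO_HEADER_SIZE);
        curl_close($ch);

        $location = location_from_headers(substr($response, 0, $headerSize));
        if ($status >= 300 && $status < 400 && $location !== null) {
            $url = resolve_location($url, $location); // try again at the new URL
            continue;
        }
        return substr($response, $headerSize);   // body only
    }
    return false;                                // too many redirects
}
```

Capping the number of hops keeps a redirect loop from running until the script's time limit.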
     
  13. PenjualBayaran

    I use cURL + phpQuery, bro.
     
  14. go.dre.am

    Running a poll, bro? I'll vote: I like using cURL.
     
  15. deradja

    I don't have a favorite, bro... I've never played with AGC and don't understand it... :pusing: Would you teach me??
     
  16. netrix

    Loyal to cURL, bro... it's easy to use, in my opinion.

     
  17. mp3online

    Just learn from the scripts already shared on this forum, boss. Back when I knew nothing about PHP, I built mine by calling explode over and over, and I even wrote it on my phone, hehe.
    These days, unless I'm really stuck, I refuse to edit on a phone; it gives me a headache, hehe.
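The "explode over and over" approach usually means cutting the page at a start marker and then at an end marker. A minimal sketch of that idea as a hypothetical helper, next to the same extraction done with one regular expression; both return null when a marker is missing instead of producing garbage.

```php
<?php
// The explode style: take the text between the first $start and the next $end.
// Hypothetical helper; returns null if either marker is missing.
function between_explode(string $haystack, string $start, string $end): ?string
{
    $parts = explode($start, $haystack, 2);
    if (count($parts) < 2) {
        return null; // start marker not found
    }
    $rest = explode($end, $parts[1], 2);
    if (count($rest) < 2) {
        return null; // end marker not found
    }
    return $rest[0];
}

// Same extraction with a single lazy regex; preg_quote escapes the markers.
function between_regex(string $haystack, string $start, string $end): ?string
{
    $pattern = '/' . preg_quote($start, '/') . '(.*?)' . preg_quote($end, '/') . '/s';
    return preg_match($pattern, $haystack, $m) ? $m[1] : null;
}

$html = '<title>PHP: Hypertext Preprocessor</title>';
$a = between_explode($html, '<title>', '</title>');
$b = between_regex($html, '<title>', '</title>');
// Both $a and $b are 'PHP: Hypertext Preprocessor'.
```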
     
  18. janganan

    I use drupal_http_request, since I'm on Drupal, and it uses a socket connection to fetch the data.
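Fetching "over a socket connection" means writing the HTTP request by hand to a TCP connection and reading the raw response back. The sketch below shows that general approach with fsockopen; it is NOT drupal_http_request's code, the function names are hypothetical, and it only handles plain HTTP on port 80 (no TLS, no redirects). Using HTTP/1.0 avoids having to decode chunked responses.

```php
<?php
// Build a minimal HTTP/1.0 GET request.
function build_get_request(string $host, string $path, string $userAgent = 'Mozilla/5.0'): string
{
    return "GET " . $path . " HTTP/1.0\r\n"
         . "Host: " . $host . "\r\n"
         . "User-Agent: " . $userAgent . "\r\n"
         . "Connection: close\r\n\r\n"; // blank line ends the headers
}

// Split a raw response into status code, header block and body.
function split_response(string $raw): array
{
    $parts  = explode("\r\n\r\n", $raw, 2);
    $head   = $parts[0];
    $body   = isset($parts[1]) ? $parts[1] : '';
    $status = preg_match('#^HTTP/\d\.\d (\d{3})#', $head, $m) ? (int) $m[1] : 0;
    return array('status' => $status, 'headers' => $head, 'body' => $body);
}

// Open a TCP connection, send the request, and read until the server closes it.
function socket_get(string $host, string $path = '/', int $timeout = 30)
{
    $fp = fsockopen($host, 80, $errno, $errstr, $timeout); // plain TCP, no TLS
    if (!$fp) {
        return false; // $errstr describes the connection failure
    }
    stream_set_timeout($fp, $timeout);    // read timeout in seconds
    fwrite($fp, build_get_request($host, $path));
    $raw = stream_get_contents($fp);      // reads to EOF (server closes: Connection: close)
    fclose($fp);
    return $raw === false ? false : split_response($raw);
}

// Usage (needs network access): $r = socket_get('php.net', '/'); then $r['status'], $r['body'].
```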
     
  19. zackerby

    Ugh, I can't keep up, bro; this is god-tier language.
    Looking at all the numbers and letters in the masters' code here makes my head spin, hihihi
     
  20. gembel-intelek

    I usually use cURL myself, rarely file_get_contents, and I've never used Snoopy. Maybe I'll try it sometime :D

    When scraping, I usually start from the smallest response size: if an API covers what I need, I use the API; if not, I look for the mobile version of the site... and only if that doesn't exist either do I scrape the full web page.
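The "smallest source first" order described above (API, then mobile page, then full page) can be expressed as a fallback chain that tries each source in turn and stops at the first one that returns data. A minimal sketch; `fetch_smallest_first()` is a hypothetical name and the sources are stand-in closures, where a real version would call cURL or file_get_contents.

```php
<?php
// Try each source in the given order; return the first non-empty result.
// Each source is a callable returning a string, or false on failure.
function fetch_smallest_first(array $sources): ?array
{
    foreach ($sources as $name => $fetch) {
        $data = $fetch();
        if ($data !== false && $data !== '') {
            return array('source' => $name, 'data' => $data);
        }
    }
    return null; // every source failed
}

// Stand-in closures; the order of the array is the order of preference.
$result = fetch_smallest_first(array(
    'api'    => function () { return false; },                // e.g. no usable API
    'mobile' => function () { return '<p>mobile page</p>'; },  // smaller HTML
    'full'   => function () { return '<html>full page</html>'; },
));
// $result['source'] is 'mobile'; the full page is never requested.
```

Because later sources are only called when earlier ones fail, the heavier full-page request is skipped whenever a lighter source already has the data.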
     
