author | Daniel Stenberg <daniel@haxx.se> | 2004-07-14 15:33:27 +0000 |
---|---|---|
committer | Daniel Stenberg <daniel@haxx.se> | 2004-07-14 15:33:27 +0000 |
commit | 15a403a98da6fb36e4ebd500f83df50820df01eb (patch) | |
tree | fa83486c628bbdbbcde05f3e55a6ec2b7c14a885 /docs | |
parent | a92b7c1b16ace4bd238b1208f4baeb82482be42f (diff) |
now known as libcurl-tutorial.3
Diffstat (limited to 'docs')
-rw-r--r-- | docs/libcurl-the-guide | 1191 |
1 files changed, 0 insertions, 1191 deletions
diff --git a/docs/libcurl-the-guide b/docs/libcurl-the-guide deleted file mode 100644 index ba2fb9d2f..000000000 --- a/docs/libcurl-the-guide +++ /dev/null @@ -1,1191 +0,0 @@ -$Id$ - _ _ ____ _ - ___| | | | _ \| | - / __| | | | |_) | | - | (__| |_| | _ <| |___ - \___|\___/|_| \_\_____| - -PROGRAMMING WITH LIBCURL - -About this Document - - This document attempts to describe the general principles and some basic - approaches to consider when programming with libcurl. The text will focus - mainly on the C interface but might apply fairly well on other interfaces as - well as they usually follow the C one pretty closely. - - This document will refer to 'the user' as the person writing the source code - that uses libcurl. That would probably be you or someone in your position. - What will be generally referred to as 'the program' will be the collected - source code that you write that is using libcurl for transfers. The program - is outside libcurl and libcurl is outside of the program. - - To get the more details on all options and functions described herein, please - refer to their respective man pages. - -Building - - There are many different ways to build C programs. This chapter will assume a - unix-style build process. If you use a different build system, you can still - read this to get general information that may apply to your environment as - well. - - Compiling the Program - - Your compiler needs to know where the libcurl headers are - located. Therefore you must set your compiler's include path to point to - the directory where you installed them. The 'curl-config'[3] tool can be - used to get this information: - - $ curl-config --cflags - - Linking the Program with libcurl - - When having compiled the program, you need to link your object files to - create a single executable. For that to succeed, you need to link with - libcurl and possibly also with other libraries that libcurl itself depends - on. 
Like OpenSSL libraries, but even some standard OS libraries may be - needed on the command line. To figure out which flags to use, once again - the 'curl-config' tool comes to the rescue: - - $ curl-config --libs - - SSL or Not - - libcurl can be built and customized in many ways. One of the things that - varies from different libraries and builds is the support for SSL-based - transfers, like HTTPS and FTPS. If OpenSSL was detected properly at - build-time, libcurl will be built with SSL support. To figure out if an - installed libcurl has been built with SSL support enabled, use - 'curl-config' like this: - - $ curl-config --feature - - And if SSL is supported, the keyword 'SSL' will be written to stdout, - possibly together with a few other features that can be on and off on - different libcurls. - - See also the "Features libcurl Provides" further down. - - -Portable Code in a Portable World - - The people behind libcurl have put a considerable effort to make libcurl work - on a large amount of different operating systems and environments. - - You program libcurl the same way on all platforms that libcurl runs on. There - are only very few minor considerations that differs. If you just make sure to - write your code portable enough, you may very well create yourself a very - portable program. libcurl shouldn't stop you from that. - - -Global Preparation - - The program must initialize some of the libcurl functionality globally. That - means it should be done exactly once, no matter how many times you intend to - use the library. Once for your program's entire life time. This is done using - - curl_global_init() - - and it takes one parameter which is a bit pattern that tells libcurl what to - initialize. Using CURL_GLOBAL_ALL will make it initialize all known internal - sub modules, and might be a good default option. The current two bits that - are specified are: - - CURL_GLOBAL_WIN32 which only does anything on Windows machines. 
When used on - a Windows machine, it'll make libcurl initialize the win32 socket - stuff. Without having that initialized properly, your program cannot use - sockets properly. You should only do this once for each application, so if - your program already does this or if another library in use does it, you - should not tell libcurl to do this as well. - - CURL_GLOBAL_SSL which only does anything on libcurls compiled and built - SSL-enabled. On these systems, this will make libcurl initialize OpenSSL - properly for this application. This only needs to be done once for each - application so if your program or another library already does this, this - bit should not be needed. - - libcurl has a default protection mechanism that detects if curl_global_init() - hasn't been called by the time curl_easy_perform() is called and if that is - the case, libcurl runs the function itself with a guessed bit pattern. Please - note that depending solely on this is not considered nice nor very good. - - When the program no longer uses libcurl, it should call - curl_global_cleanup(), which is the opposite of the init call. It will then - do the reversed operations to clean up the resources the curl_global_init() - call initialized. - - Repeated calls to curl_global_init() and curl_global_cleanup() should be - avoided. They should only be called once each. - - -Features libcurl Provides - - It is considered best-practice to determine libcurl features at run-time - rather than at build-time (if possible of course). By calling - curl_version_info() and checking out the details of the returned struct, - your program can figure out exactly what the currently running libcurl - supports. - - -Handle the Easy libcurl - - libcurl first introduced the so-called easy interface. All operations in the - easy interface are prefixed with 'curl_easy'. - - Recent libcurl versions also offer the multi interface. 
More about that - interface, what it is targeted for and how to use it is detailed in a - separate chapter further down. You still need to understand the easy - interface first, so please continue reading for better understanding. - - To use the easy interface, you must first create yourself an easy handle. You - need one handle for each easy session you want to perform. Basically, you - should use one handle for every thread you plan to use for transferring. You - must never share the same handle in multiple threads. - - Get an easy handle with - - easyhandle = curl_easy_init(); - - It returns an easy handle. Using that you proceed to the next step: setting - up your preferred actions. A handle is just a logic entity for the upcoming - transfer or series of transfers. - - You set properties and options for this handle using curl_easy_setopt(). They - control how the subsequent transfer or transfers will be made. Options remain - set in the handle until set again to something different. Alas, multiple - requests using the same handle will use the same options. - - Many of the options you set in libcurl are "strings", pointers to data - terminated with a zero byte. Keep in mind that when you set strings with - curl_easy_setopt(), libcurl will not copy the data. It will merely point to - the data. You MUST make sure that the data remains available for libcurl to - use until finished or until you use the same option again to point to - something else. - - One of the most basic properties to set in the handle is the URL. You set - your preferred URL to transfer with CURLOPT_URL in a manner similar to: - - curl_easy_setopt(easyhandle, CURLOPT_URL, "http://curl.haxx.se/"); - - Let's assume for a while that you want to receive data as the URL identifies - a remote resource you want to get here. 
Since you write a sort of application - that needs this transfer, I assume that you would like to get the data passed - to you directly instead of simply getting it passed to stdout. So, you write - your own function that matches this prototype: - - size_t write_data(void *buffer, size_t size, size_t nmemb, void *userp); - - You tell libcurl to pass all data to this function by issuing a call - similar to this: - - curl_easy_setopt(easyhandle, CURLOPT_WRITEFUNCTION, write_data); - - You can control what data your function gets in the fourth argument by - setting another property: - - curl_easy_setopt(easyhandle, CURLOPT_FILE, &internal_struct); - - Using that property, you can easily pass local data between your application - and the function that gets invoked by libcurl. libcurl itself won't touch the - data you pass with CURLOPT_FILE. - - libcurl offers its own default internal callback that'll take care of the - data if you don't set the callback with CURLOPT_WRITEFUNCTION. It will then - simply output the received data to stdout. You can have the default callback - write the data to a different file handle by passing a 'FILE *' to a file - opened for writing with the CURLOPT_FILE option. - - Now, we need to take a step back and take a deep breath. Here's one of those - rare platform-dependent nitpicks. Did you spot it? On some platforms[2], - libcurl won't be able to operate on files opened by the program. Thus, if you - use the default callback and pass in an open file with CURLOPT_FILE, it - will crash. You should therefore avoid this to make your program run fine - virtually everywhere. - - There are of course many more options you can set, and we'll get back to a - few of them later. Let's instead continue to the actual transfer: - - success = curl_easy_perform(easyhandle); - - curl_easy_perform() will connect to the remote site, do the necessary - commands and receive the transfer. 
Whenever it receives data, it calls the - callback function we previously set. The function may get one byte at a time, - or it may get many kilobytes at once. libcurl delivers as much as possible as - often as possible. Your callback function should return the number of bytes - it "took care of". If that is not the exact same number of bytes that was - passed to it, libcurl will abort the operation and return with an error code. - - When the transfer is complete, the function returns a return code that - informs you if it succeeded in its mission or not. If a return code isn't - enough for you, you can use the CURLOPT_ERRORBUFFER to point libcurl to a - buffer of yours where it'll store a human-readable error message as well. - - If you then want to transfer another file, the handle is ready to be used - again. Mind you, it is even preferred that you re-use an existing handle if - you intend to make another transfer. libcurl will then attempt to re-use the - previous - - -Multi-threading issues - - libcurl is completely thread-safe, except for two issues: signals and alarm - handlers. Signals are needed for a SIGPIPE handler, and the alarm() syscall - is used to catch timeouts (mostly during DNS lookup). - - If you are accessing HTTPS or FTPS URLs in a multi-threaded manner, you are - then of course using OpenSSL multi-threaded and it has itself a few - requirements on this. Basically, you need to provide one or two functions to - allow it to function properly. For all details, see this: - - http://www.openssl.org/docs/crypto/threads.html#DESCRIPTION - - When using multiple threads you should set the CURLOPT_NOSIGNAL option to - TRUE for all handles. Everything will work fine except that timeouts are not - honored during the DNS lookup - which you can work around by building libcurl - with c-ares support. c-ares is a library that provides asynchronous name - resolves. Unfortunately, c-ares does not yet support IPv6. 
- - Also, note that CURLOPT_DNS_USE_GLOBAL_CACHE is not thread-safe. - -When It Doesn't Work - - There will always be times when the transfer fails for some reason. You might - have set the wrong libcurl option or misunderstood what the libcurl option - actually does, or the remote server might return non-standard replies that - confuse the library which then confuses your program. - - There's one golden rule when these things occur: set the CURLOPT_VERBOSE - option to TRUE. It'll cause the library to spew out the entire protocol - details it sends, some internal info and some received protocol data as well - (especially when using FTP). If you're using HTTP, adding the headers in the - received output to study is also a clever way to get a better understanding - why the server behaves the way it does. Include headers in the normal body - output with CURLOPT_HEADER set TRUE. - - Of course there are bugs left. We need to get to know about them to be able - to fix them, so we're quite dependent on your bug reports! When you do report - suspected bugs in libcurl, please include as much details you possibly can: a - protocol dump that CURLOPT_VERBOSE produces, library version, as much as - possible of your code that uses libcurl, operating system name and version, - compiler name and version etc. - - If CURLOPT_VERBOSE is not enough, you increase the level of debug data your - application receive by using the CURLOPT_DEBUGFUNCTION. - - Getting some in-depth knowledge about the protocols involved is never wrong, - and if you're trying to do funny things, you might very well understand - libcurl and how to use it better if you study the appropriate RFC documents - at least briefly. - - -Upload Data to a Remote Site - - libcurl tries to keep a protocol independent approach to most transfers, thus - uploading to a remote FTP site is very similar to uploading data to a HTTP - server with a PUT request. 
- - Of course, first you either create an easy handle or you re-use one existing - one. Then you set the URL to operate on just like before. This is the remote - URL, that we now will upload. - - Since we write an application, we most likely want libcurl to get the upload - data by asking us for it. To make it do that, we set the read callback and - the custom pointer libcurl will pass to our read callback. The read callback - should have a prototype similar to: - - size_t function(char *bufptr, size_t size, size_t nitems, void *userp); - - Where bufptr is the pointer to a buffer we fill in with data to upload and - size*nitems is the size of the buffer and therefore also the maximum amount - of data we can return to libcurl in this call. The 'userp' pointer is the - custom pointer we set to point to a struct of ours to pass private data - between the application and the callback. - - curl_easy_setopt(easyhandle, CURLOPT_READFUNCTION, read_function); - - curl_easy_setopt(easyhandle, CURLOPT_INFILE, &filedata); - - Tell libcurl that we want to upload: - - curl_easy_setopt(easyhandle, CURLOPT_UPLOAD, TRUE); - - A few protocols won't behave properly when uploads are done without any prior - knowledge of the expected file size. So, set the upload file size using the - CURLOPT_INFILESIZE_LARGE for all known file sizes like this[1]: - - /* in this example, file_size must be an off_t variable */ - curl_easy_setopt(easyhandle, CURLOPT_INFILESIZE_LARGE, file_size); - - When you call curl_easy_perform() this time, it'll perform all the necessary - operations and when it has invoked the upload it'll call your supplied - callback to get the data to upload. The program should return as much data as - possible in every invoke, as that is likely to make the upload perform as - fast as possible. The callback should return the number of bytes it wrote in - the buffer. Returning 0 will signal the end of the upload. 
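As an illustration of the read callback described above, here is a minimal sketch that feeds libcurl from a block of memory. The struct upload_source type and the name read_from_memory are made up for this example and are not part of libcurl; only the callback prototype comes from the text above. The code does not call libcurl itself, so it can be tried on its own.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper type (not part of libcurl): remembers how much of an
   in-memory buffer has already been handed out for upload. */
struct upload_source {
  const char *data;  /* start of the data to upload */
  size_t len;        /* total number of bytes in data */
  size_t pos;        /* number of bytes already handed out */
};

/* Read callback following the prototype shown above. It copies at most
   size*nitems bytes into bufptr and returns the number of bytes copied. */
size_t read_from_memory(char *bufptr, size_t size, size_t nitems, void *userp)
{
  struct upload_source *src = (struct upload_source *)userp;
  size_t room, left, n;

  assert(bufptr != NULL && src != NULL);

  room = size * nitems;        /* the most we may write in this call */
  left = src->len - src->pos;  /* bytes not yet handed out */
  n = left < room ? left : room;

  if(n > 0) {
    memcpy(bufptr, src->data + src->pos, n);
    src->pos += n;
  }
  return n; /* becomes 0 once everything is handed out: end of upload */
}
```

Under these assumptions, the program would pass read_from_memory with CURLOPT_READFUNCTION and a pointer to a filled-in struct upload_source with CURLOPT_INFILE, in the same manner as the setopt calls shown above. Filling as much of the buffer as possible on each call follows the advice to return as much data as possible in every invoke.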
- - -Passwords - - Many protocols use or even require that user name and password are provided - to be able to download or upload the data of your choice. libcurl offers - several ways to specify them. - - Most protocols support that you specify the name and password in the URL - itself. libcurl will detect this and use them accordingly. This is written - like this: - - protocol://user:password@example.com/path/ - - If you need any odd letters in your user name or password, you should enter - them URL-encoded, as %XX where XX is a two-digit hexadecimal number. - - libcurl also provides options to set various passwords. The user name and - password as shown embedded in the URL can instead get set with the - CURLOPT_USERPWD option. The argument passed to libcurl should be a char * to - a string in the format "user:password". In a manner like this: - - curl_easy_setopt(easyhandle, CURLOPT_USERPWD, "myname:thesecret"); - - Another case where name and password might be needed at times is for those - users who need to authenticate themselves to a proxy they use. libcurl offers - another option for this, the CURLOPT_PROXYUSERPWD. It is used much like - the CURLOPT_USERPWD option, like this: - - curl_easy_setopt(easyhandle, CURLOPT_PROXYUSERPWD, "myname:thesecret"); - - There's a long-standing unix "standard" way of storing ftp user names and - passwords, namely in the $HOME/.netrc file. The file should be made private - so that only the user may read it (see also the "Security Considerations" - chapter), as it might contain the password in plain text. libcurl has the - ability to use this file to figure out what set of user name and password to - use for a particular host. As an extension to the normal functionality, - libcurl also supports this file for non-FTP protocols such as HTTP. 
To make - curl use this file, use the CURLOPT_NETRC option: - - curl_easy_setopt(easyhandle, CURLOPT_NETRC, TRUE); - - And a very basic example of how such a .netrc file may look like: - - machine myhost.mydomain.com - login userlogin - password secretword - - All these examples have been cases where the password has been optional, or - at least you could leave it out and have libcurl attempt to do its job - without it. There are times when the password isn't optional, like when - you're using an SSL private key for secure transfers. - - To pass the known private key password to libcurl: - - curl_easy_setopt(easyhandle, CURLOPT_SSLKEYPASSWD, "keypassword"); - - -HTTP Authentication - - The previous chapter showed how to set user name and password for getting - URLs that require authentication. When using the HTTP protocol, there are - many different ways a client can provide those credentials to the server and - you can control what way libcurl will (attempt to) use. The default HTTP - authentication method is called 'Basic', which is sending the name and - password in clear-text in the HTTP request, base64-encoded. This is insecure. - - At the time of this writing libcurl can be built to use: Basic, Digest, NTLM, - Negotiate, GSS-Negotiate and SPNEGO. You can tell libcurl which one to use - with CURLOPT_HTTPAUTH as in: - - curl_easy_setopt(easyhandle, CURLOPT_HTTPAUTH, CURLAUTH_DIGEST); - - And when you send authentication to a proxy, you can also set authentication - type the same way but instead with CURLOPT_PROXYAUTH: - - curl_easy_setopt(easyhandle, CURLOPT_PROXYAUTH, CURLAUTH_NTLM); - - Both these options allow you to set multiple types (by ORing them together), - to make libcurl pick the most secure one out of the types the server/proxy - claims to support. 
This method does however add a round-trip since libcurl - must first ask the server what it supports: - - curl_easy_setopt(easyhandle, CURLOPT_HTTPAUTH, - CURLAUTH_DIGEST|CURLAUTH_BASIC); - - For convenience, you can use the 'CURLAUTH_ANY' define (instead of a list - with specific types) which allows libcurl to use whatever method it wants. - - When asking for multiple types, libcurl will pick the available one it - considers "best" in its own internal order of preference. - - -HTTP POSTing - - We get many questions regarding how to issue HTTP POSTs with libcurl the - proper way. This chapter will thus include examples using both different - versions of HTTP POST that libcurl supports. - - The first version is the simple POST, the most common version, that most HTML - pages using the <form> tag uses. We provide a pointer to the data and tell - libcurl to post it all to the remote site: - - char *data="name=daniel&project=curl"; - curl_easy_setopt(easyhandle, CURLOPT_POSTFIELDS, data); - curl_easy_setopt(easyhandle, CURLOPT_URL, "http://posthere.com/"); - - curl_easy_perform(easyhandle); /* post away! */ - - Simple enough, huh? Since you set the POST options with the - CURLOPT_POSTFIELDS, this automatically switches the handle to use POST in the - upcoming request. - - Ok, so what if you want to post binary data that also requires you to set the - Content-Type: header of the post? Well, binary posts prevents libcurl from - being able to do strlen() on the data to figure out the size, so therefore we - must tell libcurl the size of the post data. Setting headers in libcurl - requests are done in a generic way, by building a list of our own headers and - then passing that list to libcurl. 
- - struct curl_slist *headers=NULL; - headers = curl_slist_append(headers, "Content-Type: text/xml"); - - /* post binary data */ - curl_easy_setopt(easyhandle, CURLOPT_POSTFIELDS, binaryptr); - - /* set the size of the postfields data */ - curl_easy_setopt(easyhandle, CURLOPT_POSTFIELDSIZE, 23); - - /* pass our list of custom made headers */ - curl_easy_setopt(easyhandle, CURLOPT_HTTPHEADER, headers); - - curl_easy_perform(easyhandle); /* post away! */ - - curl_slist_free_all(headers); /* free the header list */ - - While the simple examples above cover the majority of all cases where HTTP - POST operations are required, they don't do multi-part formposts. Multi-part - formposts were introduced as a better way to post (possibly large) binary - data and was first documented in the RFC1867. They're called multi-part - because they're built by a chain of parts, each being a single unit. Each - part has its own name and contents. You can in fact create and post a - multi-part formpost with the regular libcurl POST support described above, but - that would require that you build a formpost yourself and provide to - libcurl. To make that easier, libcurl provides curl_formadd(). Using this - function, you add parts to the form. When you're done adding parts, you post - the whole form. - - The following example sets two simple text parts with plain textual contents, - and then a file with binary contents and upload the whole thing. - - struct curl_httppost *post=NULL; - struct curl_httppost *last=NULL; - curl_formadd(&post, &last, - CURLFORM_COPYNAME, "name", - CURLFORM_COPYCONTENTS, "daniel", CURLFORM_END); - curl_formadd(&post, &last, - CURLFORM_COPYNAME, "project", - CURLFORM_COPYCONTENTS, "curl", CURLFORM_END); - curl_formadd(&post, &last, - CURLFORM_COPYNAME, "logotype-image", - CURLFORM_FILECONTENT, "curl.png", CURLFORM_END); - - /* Set the form info */ - curl_easy_setopt(easyhandle, CURLOPT_HTTPPOST, post); - - curl_easy_perform(easyhandle); /* post away! 
*/ - - /* free the post data again */ - curl_formfree(post); - - Multipart formposts are chains of parts using MIME-style separators and - headers. It means that each one of these separate parts gets a few headers set - that describe the individual content-type, size etc. To enable your - application to handicraft this formpost even more, libcurl allows you to - supply your own set of custom headers to such an individual form part. You - can of course supply headers to as many parts as you like, but this little - example will show how you set headers to one specific part when you add that - to the post handle: - - struct curl_slist *headers=NULL; - headers = curl_slist_append(headers, "Content-Type: text/xml"); - - curl_formadd(&post, &last, - CURLFORM_COPYNAME, "logotype-image", - CURLFORM_FILECONTENT, "curl.xml", - CURLFORM_CONTENTHEADER, headers, - CURLFORM_END); - - curl_easy_perform(easyhandle); /* post away! */ - - curl_formfree(post); /* free post */ - curl_slist_free_all(headers); /* free custom header list */ - - All options on an easyhandle are "sticky": they remain the same until - changed, even if you do call curl_easy_perform(). You may therefore need to - tell curl to go back to a plain GET request if you intend to do one as your - next request. You force an easyhandle to go back to GET by using the - CURLOPT_HTTPGET option: - - curl_easy_setopt(easyhandle, CURLOPT_HTTPGET, TRUE); - - Just setting CURLOPT_POSTFIELDS to "" or NULL will *not* stop libcurl from - doing a POST. It will just make it POST without any data to send! - - -Showing Progress - - For historical and traditional reasons, libcurl has a built-in progress meter - that can be switched on to present a progress meter in your - terminal. - - Switch on the progress meter by, oddly enough, setting CURLOPT_NOPROGRESS to - FALSE. This option is set to TRUE by default. 
- - For most applications however, the built-in progress meter is useless and - what instead is interesting is the ability to specify a progress - callback. The function pointer you pass to libcurl will then be called on - irregular intervals with information about the current transfer. - - Set the progress callback by using CURLOPT_PROGRESSFUNCTION. And pass a - pointer to a function that matches this prototype: - - int progress_callback(void *clientp, - double dltotal, - double dlnow, - double ultotal, - double ulnow); - - If any of the input arguments is unknown, a 0 will be passed. The first - argument, the 'clientp' is the pointer you pass to libcurl with - CURLOPT_PROGRESSDATA. libcurl won't touch it. - - -libcurl with C++ - - There's basically only one thing to keep in mind when using C++ instead of C - when interfacing libcurl: - - "The Callbacks Must Be Plain C" - - So if you want a write callback set in libcurl, you should put it within - 'extern'. Similar to this: - - extern "C" { - size_t write_data(void *ptr, size_t size, size_t nmemb, - void *ourpointer) - { - /* do what you want with the data */ - } - } - - This will of course effectively turn the callback code into C. There won't be - any "this" pointer available etc. - - -Proxies - - What "proxy" means according to Merriam-Webster: "a person authorized to act - for another" but also "the agency, function, or office of a deputy who acts - as a substitute for another". - - Proxies are exceedingly common these days. Companies often only offer - Internet access to employees through their HTTP proxies. Network clients or - user-agents ask the proxy for documents, the proxy does the actual request - and then it returns them. - - libcurl has full support for HTTP proxies, so when a given URL is wanted, - libcurl will ask the proxy for it instead of trying to connect to the actual - host identified in the URL. 
- - The fact that the proxy is a HTTP proxy puts certain restrictions on what can - actually happen. A requested URL that might not be a HTTP URL will be still - be passed to the HTTP proxy to deliver back to libcurl. This happens - transparently, and an application may not need to know. I say "may", because - at times it is very important to understand that all operations over a HTTP - proxy is using the HTTP protocol. For example, you can't invoke your own - custom FTP commands or even proper FTP directory listings. - - Proxy Options - - To tell libcurl to use a proxy at a given port number: - - curl_easy_setopt(easyhandle, CURLOPT_PROXY, "proxy-host.com:8080"); - - Some proxies require user authentication before allowing a request, and - you pass that information similar to this: - - curl_easy_setopt(easyhandle, CURLOPT_PROXYUSERPWD, "user:password"); - - If you want to, you can specify the host name only in the CURLOPT_PROXY - option, and set the port number separately with CURLOPT_PROXYPORT. - - Environment Variables - - libcurl automatically checks and uses a set of environment variables to - know what proxies to use for certain protocols. The names of the variables - are following an ancient de facto standard and are built up as - "[protocol]_proxy" (note the lower casing). Which makes the variable - 'http_proxy' checked for a name of a proxy to use when the input URL is - HTTP. Following the same rule, the variable named 'ftp_proxy' is checked - for FTP URLs. Again, the proxies are always HTTP proxies, the different - names of the variables simply allows different HTTP proxies to be used. - - The proxy environment variable contents should be in the format - "[protocol://][user:password]machine[:port]". Where the protocol:// part - is simply ignored if present (so http://proxy and bluerk://proxy will do - the same) and the optional port number specifies on which port the proxy - operates on the host. 
If not specified, the internal default port number - will be used and that is most likely *not* the one you would like it to - be. - - There are two special environment variables. 'all_proxy' is what sets - the proxy for any URL in case the protocol-specific variable wasn't set, and - 'no_proxy' defines a list of hosts that should not use a proxy even though - a variable may say so. If 'no_proxy' is a plain asterisk ("*") it matches - all hosts. - - SSL and Proxies - - SSL is for secure point-to-point connections. This involves strong - encryption and similar things, which effectively makes it impossible for a - proxy to operate as a "man in between", which is the proxy's task, as - previously discussed. Instead, the only way to have SSL work over a HTTP - proxy is to ask the proxy to tunnel everything through without being able - to check or fiddle with the traffic. - - Opening an SSL connection over a HTTP proxy is therefore a matter of asking - the proxy for a straight connection to the target host on a specified - port. This is done with the HTTP request CONNECT ("please mr proxy, - connect me to that remote host"). - - Because of the nature of this operation, where the proxy has no idea what - kind of data is passed in and out through this tunnel, this breaks - some of the very few advantages that come from using a proxy, such as - caching. Many organizations prevent this kind of tunneling to other - destination port numbers than 443 (which is the default HTTPS port - number). - - Tunneling Through Proxy - - As explained above, tunneling is required for SSL to work and often even - restricted to the operation intended for SSL; HTTPS. - - This is however not the only time proxy-tunneling might offer benefits to - you or your application. - - As tunneling opens a direct connection from your application to the remote - machine, it suddenly also re-introduces the ability to do non-HTTP - operations over a HTTP proxy. 
You can in fact use things such as FTP - upload or FTP custom commands this way. - - Again, this is often prevented by the administrators of proxies and is - rarely allowed. - - Tell libcurl to use proxy tunneling like this: - - curl_easy_setopt(easyhandle, CURLOPT_HTTPPROXYTUNNEL, TRUE); - - In fact, there might even be times when you want to do plain HTTP - operations using a tunnel like this, as it then enables you to operate on - the remote server instead of asking the proxy to do so. libcurl will not - stand in the way of such innovative actions either! - - Proxy Auto-Config - - Netscape first came up with this. It is basically a web page (usually - using a .pac extension) with a javascript that, when executed by the - browser with the requested URL as input, returns information to the - browser on how to connect to the URL. The returned information might be - "DIRECT" (which means no proxy should be used), "PROXY host:port" (to tell - the browser where the proxy for this particular URL is) or "SOCKS - host:port" (to direct the browser to a SOCKS proxy). - - libcurl has no means to interpret or evaluate javascript and thus it - doesn't support this. If you get yourself in a position where you face - this nasty invention, the following advice has been mentioned and used in - the past: - - - Depending on the javascript complexity, write up a script that - translates it to another language and execute that. - - - Read the javascript code and rewrite the same logic in another language. - - - Implement a javascript interpreter; people have successfully used the - Mozilla javascript engine in the past. - - - Ask your admins to stop this, in favor of a static proxy setup or similar. - - -Persistence Is The Way to Happiness - - Re-cycling the same easy handle several times when doing multiple requests is - the way to go. - - After each single curl_easy_perform() operation, libcurl will keep the - connection alive and open. 
A subsequent request using the same easy handle to - the same host might just be able to use the already open connection! This - reduces network impact a lot. - - Even if the connection is dropped, all later SSL connections to the same - host will benefit from libcurl's session ID cache that drastically - reduces re-connection time. - - FTP connections that are kept alive save a lot of time, as the - command-response round-trips are skipped, and also you don't risk getting blocked - without permission to log in again, as can happen on the many FTP servers that only allow N - persons to be logged in at the same time. - - libcurl caches DNS name resolving results, to make lookups of a previously - looked up name a lot faster. - - Other interesting details that improve performance for subsequent requests - may also be added in the future. - - Each easy handle will attempt to keep the last few connections alive for a - while in case they are to be used again. You can set the size of this "cache" - with the CURLOPT_MAXCONNECTS option. Default is 5. There is very seldom any - point in changing this value, and if you think of changing this it is often - just a matter of thinking again. - - When the connection cache gets filled, libcurl must close an existing - connection in order to get room for the new one. To know which connection to - close, libcurl uses a "close policy" that you can affect with the - CURLOPT_CLOSEPOLICY option. There are only two policies implemented as of this - writing (libcurl 7.9.4) and they are: - - CURLCLOSEPOLICY_LEAST_RECENTLY_USED simply closes the one that hasn't been - used for the longest time. This is the default behavior. - - CURLCLOSEPOLICY_OLDEST closes the oldest connection, the one that was - created the longest time ago.
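To make the difference between the two policies concrete, here is a minimal sketch in C of how a cache could pick the connection to close. It is an illustration only and not libcurl's internal code: the struct conn type, its created and last_used fields, the enum close_policy values and the pick_victim() helper are all hypothetical names made up for this example.

```c
#include <stddef.h>

/* Hypothetical cache entry; libcurl's real internals look different. */
struct conn {
  long created;   /* tick at which the connection was opened */
  long last_used; /* tick at which the connection was last used */
};

/* Hypothetical stand-ins for the two close policies described above. */
enum close_policy { POLICY_LEAST_RECENTLY_USED, POLICY_OLDEST };

/* The timestamp a given policy compares on. */
static long policy_key(const struct conn *c, enum close_policy p)
{
  return (p == POLICY_OLDEST) ? c->created : c->last_used;
}

/* Return the index of the connection to close, or -1 for an empty cache.
   The entry with the smallest timestamp wins; on a tie the first one does. */
int pick_victim(const struct conn *cache, size_t n, enum close_policy p)
{
  size_t i;
  size_t victim = 0;

  if(n == 0)
    return -1;

  for(i = 1; i < n; i++) {
    if(policy_key(&cache[i], p) < policy_key(&cache[victim], p))
      victim = i;
  }
  return (int)victim;
}
```

With three cached connections created at ticks 1, 5 and 9 and last used at ticks 90, 10 and 50, the least-recently-used policy picks the second entry while the oldest policy picks the first.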
- - There are, or at least were, plans to support a close policy that would call - a user-specified callback to let the user decide which connection - to dump when this is necessary, and therefore CURLOPT_CLOSEFUNCTION is - still an existing option today. Nothing ever uses this though and this will not - be used within the foreseeable future either. - - To force your upcoming request to not use an already existing connection (it - will even close one first if there happens to be one alive to the same host - you're about to operate on), set CURLOPT_FRESH_CONNECT - to TRUE. In a similar spirit, you can also forbid the connection used by the upcoming request from - "lying" around and possibly getting re-used after the request by setting - CURLOPT_FORBID_REUSE to TRUE. - - -HTTP Headers Used by libcurl - - When you use libcurl to do HTTP requests, it'll pass along a series of - headers automatically. It might be good for you to know and understand these - ones. - - Host - - This header is required by HTTP 1.1 and even many 1.0 servers and should - be the name of the server we want to talk to. This includes the port - number if anything but default. - - Pragma - - "no-cache". Tells a possible proxy to not grab a copy from the cache but - to fetch a fresh one. - - Accept: - - "image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, */*". Cloned from a - browser once a hundred years ago. - - Expect: - - When doing multi-part formposts, libcurl will set this header to - "100-continue" to ask the server for an "OK" message before it proceeds - with sending the data part of the post. - - -Customizing Operations - - There is an ongoing development today where more and more protocols are built - upon HTTP for transport. This has obvious benefits as HTTP is a tested and - reliable protocol that is widely deployed and has excellent proxy-support.
- - When you use one of these protocols, and even when doing other kinds of - programming, you may need to change the traditional HTTP (or FTP or...) - manners. You may need to change words, headers or various data. - - libcurl is your friend here too. - - CUSTOMREQUEST - - If just changing the actual HTTP request keyword is what you want, like - when GET, HEAD or POST is not good enough for you, CURLOPT_CUSTOMREQUEST - is there for you. It is very simple to use: - - curl_easy_setopt(easyhandle, CURLOPT_CUSTOMREQUEST, "MYOWNREQUEST"); - - When using the custom request, you change the request keyword of the - actual request you are performing. Thus, by default you make a GET request - but you can also make a POST operation (as described before) and then - replace the POST keyword if you want to. You're the boss. - - Modify Headers - - HTTP-like protocols pass a series of headers to the server when doing the - request, and you're free to pass any number of extra headers that you - think fit. Adding headers is this easy: - - struct curl_slist *headers=NULL; /* init to NULL is important */ - - headers = curl_slist_append(headers, "Hey-server-hey: how are you?"); - headers = curl_slist_append(headers, "X-silly-content: yes"); - - /* pass our list of custom made headers */ - curl_easy_setopt(easyhandle, CURLOPT_HTTPHEADER, headers); - - curl_easy_perform(easyhandle); /* transfer http */ - - curl_slist_free_all(headers); /* free the header list */ - - ... and if you think some of the internally generated headers, such as - Accept: or Host: don't contain the data you want them to contain, you can - replace them by simply setting them too: - - headers = curl_slist_append(headers, "Accept: Agent-007"); - headers = curl_slist_append(headers, "Host: munged.host.line"); - - Delete Headers - - If you replace an existing header with one with no contents, you will - prevent the header from being sent.
For example, if you want to completely prevent - the "Accept:" header from being sent, you can disable it with code similar to - this: - - headers = curl_slist_append(headers, "Accept:"); - - Both replacing and canceling internal headers should be done with careful - consideration and you should be aware that you may violate the HTTP - protocol when doing so. - - Enforcing chunked transfer-encoding - - If you make sure a request uses the custom header "Transfer-Encoding: - chunked" when doing a non-GET HTTP operation, libcurl will switch over to - "chunked" upload, even though the size of the data to upload might be - known. By default, libcurl usually switches over to chunked upload - automatically if the upload data size is unknown. - - HTTP Version - - There's only one aspect left in the HTTP requests that we haven't yet - mentioned how to modify: the version field. All HTTP requests include the - version number to tell the server which version we support. libcurl speaks - HTTP 1.1 by default. Some very old servers don't like getting 1.1-requests - and when dealing with stubborn old things like that, you can tell libcurl - to use 1.0 instead by doing something like this: - - curl_easy_setopt(easyhandle, CURLOPT_HTTP_VERSION, - CURL_HTTP_VERSION_1_0); - - FTP Custom Commands - - Not all protocols are HTTP-like, and thus the above may not help you when - you want to make, for example, your FTP transfers behave differently. - - Sending custom commands to an FTP server means that you need to send the - commands exactly as the FTP server expects them (RFC959 is a good guide - here), and you can only use commands that work on the control-connection - alone. All kinds of commands that require data interchange and thus need - a data-connection must be left to libcurl's own judgment.
Also be aware - that libcurl will do its very best to change directory to the target - directory before doing any transfer, so if you change directory (with CWD - or similar) you might confuse libcurl and then it might not attempt to - transfer the file in the correct remote directory. - - A little example that deletes a given file before an operation: - - headers = curl_slist_append(headers, "DELE file-to-remove"); - - /* pass the list of custom commands to the handle */ - curl_easy_setopt(easyhandle, CURLOPT_QUOTE, headers); - - curl_easy_perform(easyhandle); /* transfer ftp data! */ - - curl_slist_free_all(headers); /* free the header list */ - - If you instead want this operation (or chain of operations) to - happen _after_ the data transfer has taken place, the option to pass to - curl_easy_setopt() is CURLOPT_POSTQUOTE, used in the - exact same way. - - The custom FTP commands will be issued to the server in the same order they - are added to the list, and if a command gets an error code returned back - from the server, no more commands will be issued and libcurl will bail out - with an error code (CURLE_FTP_QUOTE_ERROR). Note that if you use - CURLOPT_QUOTE to send commands before a transfer, no transfer will - actually take place when a quote command has failed. - - If you set the CURLOPT_HEADER to true, you will tell libcurl to get - information about the target file and output "headers" about it. The - headers will be in "HTTP-style", looking like they do in HTTP. - - The option to enable headers or to run custom FTP commands may be useful - to combine with CURLOPT_NOBODY. If this option is set, no actual file - content transfer will be performed. - - FTP Custom CUSTOMREQUEST - - If you do want to list the contents of an FTP directory using your own defined - FTP command, CURLOPT_CUSTOMREQUEST will do just that. "NLST" is the - default one for listing directories but you're free to pass in your idea - of a good alternative.
- -Cookies Without Chocolate Chips - - In the HTTP sense, a cookie is a name with an associated value. A server - sends the name and value to the client, and expects it to get sent back on - every subsequent request to the server that matches the particular conditions - set. The conditions include that the domain name and path match and that the - cookie hasn't become too old. - - In real-world cases, servers send new cookies to replace existing ones to - update them. Servers use cookies to "track" users and to keep "sessions". - - Cookies are sent from server to clients with the header Set-Cookie: and - they're sent from clients to servers with the Cookie: header. - - To just send whatever cookie you want to a server, you can use CURLOPT_COOKIE - to set a cookie string like this: - - curl_easy_setopt(easyhandle, CURLOPT_COOKIE, "name1=var1; name2=var2;"); - - In many cases, that is not enough. You might want to dynamically save - whatever cookies the remote server passes to you, and make sure those cookies - are then used accordingly on later requests. - - One way to do this is to save all headers you receive in a plain file and - when you make a request, you tell libcurl to read the previous headers to - figure out which cookies to use. Set the header file to read cookies from with - CURLOPT_COOKIEFILE. - - The CURLOPT_COOKIEFILE option also automatically enables the cookie parser in - libcurl. Until the cookie parser is enabled, libcurl will not parse or - understand incoming cookies and they will just be ignored. However, when the - parser is enabled the cookies will be understood and the cookies will be kept - in memory and used properly in subsequent requests when the same handle is - used. Many times this is enough, and you may not have to save the cookies to - disk at all.
Note that the file you specify to CURLOPT_COOKIEFILE doesn't - have to exist to enable the parser, so a common way to just enable the parser - and not read any cookies might be to use a file name you know doesn't exist. - - If you would rather use existing cookies that you've previously received with your - Netscape or Mozilla browsers, you can make libcurl use that cookie file as - input. The CURLOPT_COOKIEFILE is used for that too, as libcurl will - automatically find out what kind of file it is and act accordingly. - - Perhaps the most advanced cookie operation libcurl offers is saving the - entire internal cookie state back into a Netscape/Mozilla formatted cookie - file. We call that the cookie-jar. When you set a file name with - CURLOPT_COOKIEJAR, that file will be created and all received cookies - will be stored in it when curl_easy_cleanup() is called. This enables cookies - to get passed on properly between multiple handles without any information - getting lost. - - -FTP Peculiarities We Need - - FTP transfers use a second TCP/IP connection for the data transfer. This is - usually a fact you can forget and ignore but at times this fact will come - back to haunt you. libcurl offers several different ways to customize how the - second connection is being made. - - libcurl can either connect to the server a second time or tell the server to - connect back to it. The first option is the default and it is also what works - best for all the people behind firewalls, NATs or IP-masquerading setups. - libcurl then tells the server to open up a new port and wait for a second - connection. This is by default attempted with EPSV first, and if that doesn't - work it tries PASV instead. (EPSV is an extension to the original FTP spec - and does not exist nor work on all FTP servers.) - - You can prevent libcurl from first trying the EPSV command by setting - CURLOPT_FTP_USE_EPSV to FALSE.
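To show what "open up a new port and wait for a second connection" looks like on the wire: an FTP server answers PASV with a 227 reply carrying six comma-separated numbers, four for the IPv4 address and two for the port, where the port is p1 * 256 + p2 (RFC959). The sketch below parses such a reply. It is a standalone illustration and not libcurl's own parser; struct pasv_addr and parse_pasv_reply() are hypothetical names made up for this example.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical result type for this illustration. */
struct pasv_addr {
  unsigned char ip[4]; /* IPv4 address the server listens on */
  unsigned int port;   /* TCP port for the data connection */
};

/* Parse a reply such as "227 Entering Passive Mode (h1,h2,h3,h4,p1,p2)".
   Returns 0 on success, -1 if the reply is not a well-formed 227 reply. */
int parse_pasv_reply(const char *reply, struct pasv_addr *out)
{
  unsigned int v[6];
  const char *p;
  int i;

  if(strncmp(reply, "227", 3) != 0)
    return -1;

  /* this sketch expects the numbers within parentheses */
  p = strchr(reply, '(');
  if(!p)
    return -1;

  if(sscanf(p, "(%u,%u,%u,%u,%u,%u",
            &v[0], &v[1], &v[2], &v[3], &v[4], &v[5]) != 6)
    return -1;

  /* every field is a single byte */
  for(i = 0; i < 6; i++) {
    if(v[i] > 255)
      return -1;
  }

  for(i = 0; i < 4; i++)
    out->ip[i] = (unsigned char)v[i];

  /* the port is sent as its high byte followed by its low byte */
  out->port = v[4] * 256 + v[5];
  return 0;
}
```

For the reply "227 Entering Passive Mode (192,168,0,7,19,137)" this yields the address 192.168.0.7 and port 19 * 256 + 137 = 5001, which is where the client then opens the second connection.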
- - In some cases, you will prefer to have the server connect back to you for the - second connection. This might be when the server is perhaps behind a firewall - or something and only allows connections on a single port. libcurl then - informs the remote server which IP address and port number to connect to. - This is made with the CURLOPT_FTPPORT option. If you set it to "-", libcurl - will use your system's "default IP address". If you want to use a particular - IP, you can set the full IP address, a host name to resolve to an IP address - or even a local network interface name that libcurl will get the IP address - from. - - When doing the "PORT" approach, libcurl will attempt to use the EPRT and the - LPRT before trying PORT, as they work with more protocols. You can disable - this behavior by setting CURLOPT_FTP_USE_EPRT to FALSE. - - -Headers Equal Fun - - Some protocols provide "headers", meta-data separated from the normal - data. These headers are by default not included in the normal data stream, - but you can make them appear in the data stream by setting CURLOPT_HEADER to - TRUE. - - What might be even more useful, is libcurl's ability to separate the headers - from the data and thus make the callbacks differ. You can for example set a - different pointer to pass to the ordinary write callback by setting - CURLOPT_WRITEHEADER. - - Or, you can set an entirely separate function to receive the headers, by - using CURLOPT_HEADERFUNCTION. - - The headers are passed to the callback function one by one, and you can - depend on that fact. It makes it easier for you to add custom header parsers - etc. - - "Headers" for FTP transfers equal all the FTP server responses. They aren't - actually true headers, but in this case we pretend they are! ;-) - - -Post Transfer Information - - [ curl_easy_getinfo ] - - -Security Considerations - - libcurl is in itself not insecure. If used the right way, you can use libcurl - to transfer data pretty safely. 
- - There are of course many things to consider that may loosen up this - situation: - - Command Lines - - If you use a command line tool (such as curl) that uses libcurl, and you - give options to the tool on the command line, those options can very likely - get read by other users of your system when they use 'ps' or other tools - to list currently running processes. - - To avoid this problem, never feed sensitive things to programs using - command line options. - - .netrc - - .netrc is a pretty handy file/feature that allows you to log in quickly and - automatically to frequently visited sites. The file contains passwords in - clear text and is a real security risk. In some cases, your .netrc is also - stored in a home directory that is NFS mounted or used on another network - based file system, so the clear text password will fly through your - network every time anyone reads that file! - - To avoid this problem, don't use .netrc files and never store passwords in - plain text anywhere. - - Clear Text Passwords - - Many of the protocols libcurl supports send name and password unencrypted - as clear text (HTTP Basic authentication, FTP, TELNET etc). It is very - easy for anyone on your network or a network near yours to just fire up - a network analyzer tool and eavesdrop on your passwords. Don't let the - fact that HTTP Basic authentication uses base64 encoded passwords fool you. They may not look - readable at a first glance, but they are very easily "deciphered" by anyone - within seconds. - - To avoid this problem, use protocols that don't let snoopers see your - password: HTTPS, FTPS and FTP-kerberos are a few examples. HTTP Digest - authentication allows this too, but isn't supported by libcurl as of this - writing. - - Showing What You Do - - On a related issue, be aware that even in situations like when you have - problems with libcurl and ask someone for help, everything you reveal in - order to get the best possible help might also impose certain security related - risks.
Host names, user names, paths, operating system specifics etc (not - to mention passwords of course) may in fact be used by intruders to gain - additional information about a potential target. - - To avoid this problem, you must of course use your common sense. Often, - you can just edit out the sensitive data or just search/replace your true - information with faked data. - - -Multiple Transfers Using the multi Interface - - The easy interface as described in detail in this document is a synchronous - interface that transfers one file at a time and doesn't return until it's - done. - - The multi interface, on the other hand, allows your program to transfer - multiple files in both directions at the same time, without forcing you to - use multiple threads. - - To use this interface, you are better off if you first understand the basics - of how to use the easy interface. The multi interface is simply a way to make - multiple transfers at the same time, by adding multiple easy handles into - a "multi stack". - - You create the easy handles you want and you set all the options just like - you have been told above, and then you create a multi handle with - curl_multi_init() and add all those easy handles to that multi handle with - curl_multi_add_handle(). - - When you've added the handles you have for the moment (you can still add new - ones at any time), you start the transfers by calling curl_multi_perform(). - - curl_multi_perform() is asynchronous. It will only execute as little as - possible and then return control to your program. It is designed to - never block. If it returns CURLM_CALL_MULTI_PERFORM you had better call it again - soon, as that is a signal that it still has local data to send or remote data - to receive. - - The best usage of this interface is when you do a select() on all possible - file descriptors or sockets to know when to call libcurl again.
This also - makes it easy for you to wait and respond to actions on your own - application's sockets/handles. You figure out what to select() for by using - curl_multi_fdset(), which fills in a set of fd_set variables for you with the - particular file descriptors libcurl uses for the moment. - - When you then call select(), it'll return when one of the file handles signals - action and you then call curl_multi_perform() to allow libcurl to do what it - wants to do. Take note that libcurl does also feature some time-out code so - we advise you to never use very long timeouts on select() before you call - curl_multi_perform(), which thus should be called unconditionally every now - and then even if none of its file descriptors have signaled ready. Another - precaution you should use: always call curl_multi_fdset() immediately before - the select() call since the current set of file descriptors may change when - calling a curl function. - - If you want to stop the transfer of one of the easy handles in the stack, you - can use curl_multi_remove_handle() to remove individual easy - handles. Remember that easy handles should be curl_easy_cleanup()ed. - - When a transfer within the multi stack has finished, the counter of running - transfers (as filled in by curl_multi_perform()) will decrease. When the - number reaches zero, all transfers are done. - - curl_multi_info_read() can be used to get information about completed - transfers. It then returns the CURLcode for each easy transfer, to allow you - to figure out success on each individual transfer. - - -SSL, Certificates and Other Tricks - - [ seeding, passwords, keys, certificates, ENGINE, ca certs ] - - -Sharing Data Between Easy Handles - - [ fill in ] - ------ -Footnotes: - -[1] = libcurl 7.10.3 and later have the ability to switch over to chunked - Transfer-Encoding in cases where HTTP uploads are done with data of an - unknown size.
- -[2] = This happens on Windows machines when libcurl is built and used as a - DLL. However, you can still do this on Windows if you link with a static - library. - -[3] = The curl-config tool is generated at build-time (on unix-like systems) - and should be installed with the 'make install' or similar instruction - that installs the library, header files, man pages etc. |