author Daniel Stenberg <daniel@haxx.se> 2016-08-09 12:01:47 +0200
committer Daniel Stenberg <daniel@haxx.se> 2016-08-09 12:01:47 +0200
commit d263e83079722586c3a491dd11bf403fdc6fd707
tree ad0180a381b6148a9f61ae3ca5b6171e77cf842a
parent e24597dc8309e5d388bde4dfa79d1941ccc0e7db
INTERNALS.md: use markdown extension for markdown content
Diffstat (limited to 'docs/INTERNALS')
-rw-r--r-- docs/INTERNALS 1094
1 files changed, 0 insertions, 1094 deletions
diff --git a/docs/INTERNALS b/docs/INTERNALS
deleted file mode 100644
index 565d9df6c..000000000
--- a/docs/INTERNALS
+++ /dev/null
@@ -1,1094 +0,0 @@
-Table of Contents
-=================
-
- - [Intro](#intro)
- - [git](#git)
- - [Portability](#Portability)
- - [Windows vs Unix](#winvsunix)
- - [Library](#Library)
- - [`Curl_connect`](#Curl_connect)
- - [`Curl_do`](#Curl_do)
- - [`Curl_readwrite`](#Curl_readwrite)
- - [`Curl_done`](#Curl_done)
- - [`Curl_disconnect`](#Curl_disconnect)
- - [HTTP(S)](#http)
- - [FTP](#ftp)
- - [Kerberos](#kerberos)
- - [TELNET](#telnet)
- - [FILE](#file)
- - [SMB](#smb)
- - [LDAP](#ldap)
- - [E-mail](#email)
- - [General](#general)
- - [Persistent Connections](#persistent)
- - [multi interface/non-blocking](#multi)
- - [SSL libraries](#ssl)
- - [Library Symbols](#symbols)
- - [Return Codes and Informationals](#returncodes)
- - [API/ABI](#abi)
- - [Client](#client)
- - [Memory Debugging](#memorydebug)
- - [Test Suite](#test)
- - [Asynchronous name resolves](#asyncdns)
- - [c-ares](#cares)
- - [`curl_off_t`](#curl_off_t)
- - [curlx](#curlx)
- - [Content Encoding](#contentencoding)
- - [hostip.c explained](#hostip)
- - [Track Down Memory Leaks](#memoryleak)
- - [`multi_socket`](#multi_socket)
- - [Structs in libcurl](#structs)
-
-<a name="intro"></a>
-curl internals
-==============
-
- This project is split in two. The library and the client. The client part
- uses the library, but the library is designed to allow other applications to
- use it.
-
- The largest amount of code and complexity is in the library part.
-
-
-<a name="git"></a>
-git
-===
-
- All changes to the sources are committed to the git repository as soon as
- they're somewhat verified to work. Changes shall be committed as independently
- as possible so that individual changes can be easier spotted and tracked
- afterwards.
-
- Tagging shall be used extensively, and by the time we release new archives we
- should tag the sources with a name similar to the released version number.
-
-<a name="Portability"></a>
-Portability
-===========
-
- We write curl and libcurl to compile with C89 compilers, on 32-bit and larger
- machines. Most of libcurl assumes more or less POSIX compliance, but that's
- not a requirement.
-
- We write libcurl to build and work with lots of third party tools, and we
- want it to remain functional and buildable with these and later versions
- (older versions may still work, but that is not what we work hard to maintain):
-
-Dependencies
-------------
-
- - OpenSSL 0.9.7
- - GnuTLS 1.2
- - zlib 1.1.4
- - libssh2 0.16
- - c-ares 1.6.0
- - libidn 0.4.1
- - cyassl 2.0.0
- - openldap 2.0
- - MIT Kerberos 1.2.4
- - GSKit V5R3M0
- - NSS 3.14.x
- - axTLS 1.2.7
- - PolarSSL 1.3.0
- - Heimdal ?
- - nghttp2 1.0.0
-
-Operating Systems
------------------
-
- On systems where configure runs, we aim at working on them all - if they have
- a suitable C compiler. On systems that don't run configure, we strive to keep
- curl running fine on:
-
- - Windows 98
- - AS/400 V5R3M0
- - Symbian 9.1
- - Windows CE ?
- - TPF ?
-
-Build tools
------------
-
- When writing code (mostly for generating stuff included in release tarballs)
- we use a few "build tools" and we make sure that we remain functional with
- these versions:
-
- - GNU Libtool 1.4.2
- - GNU Autoconf 2.57
- - GNU Automake 1.7
- - GNU M4 1.4
- - perl 5.004
- - roffit 0.5
- - groff ? (any version that supports "groff -Tps -man [in] [out]")
- - ps2pdf (gs) ?
-
-<a name="winvsunix"></a>
-Windows vs Unix
-===============
-
- There are a few differences in how to program curl the unix way compared to
- the Windows way. The four perhaps most notable details are:
-
- 1. Different function names for socket operations.
-
- In curl, this is solved with defines and macros, so that the source looks
- the same at all places except for the header file that defines them. The
- macros in use are sclose(), sread() and swrite().
-
- 2. Windows requires a couple of init calls for the socket stuff.
-
- That's taken care of by the `curl_global_init()` call, but if other libs
- also do it etc there might be reasons for applications to alter that
- behaviour.
-
- 3. The file descriptors for network communication and file operations are
- not easily interchangeable as in unix.
-
- We avoid this by not trying any funny tricks on file descriptors.
-
- 4. When writing data to stdout, Windows makes end-of-lines the DOS way, thus
- destroying binary data, although you do want that conversion if it is
- text coming through... (sigh)
-
- We set stdout to binary under Windows.
-
- Inside the source code, we make an effort to avoid `#ifdef [Your OS]`. All
- conditionals that deal with features *should* instead be in the format
- `#ifdef HAVE_THAT_WEIRD_FUNCTION`. Since Windows can't run configure scripts,
- we maintain a `curl_config-win32.h` file in the lib directory that is supposed
- to look exactly like a `curl_config.h` file would have looked on a Windows
- machine!
-
- Generally speaking: always remember that this will be compiled on dozens of
- operating systems. Don't walk on the edge.
-
-<a name="Library"></a>
-Library
-=======
-
- (See [Structs in libcurl](#structs) for the separate section describing all
- major internal structs and their purposes.)
-
- There are plenty of entry points to the library, namely each publicly defined
- function that libcurl offers to applications. All of those functions are
- rather small and easy-to-follow. All the ones prefixed with `curl_easy` are
- put in the lib/easy.c file.
-
- `curl_global_init()` and `curl_global_cleanup()` should be called by the
- application to initialize and clean up global stuff in the library. As of
- today, it can handle the global SSL initing if SSL is enabled and it can init
- the socket layer on Windows machines. libcurl itself has no "global" scope.
-
- All printf()-style functions use the supplied clones in lib/mprintf.c. This
- makes sure we stay absolutely platform independent.
-
- [`curl_easy_init()`][2] allocates an internal struct and makes some
- initializations. The returned handle does not reveal internals. This is the
- 'Curl_easy' struct which works as an "anchor" struct for all `curl_easy`
- functions. All connections performed will get connect-specific data allocated
- that should be used for things related to particular connections/requests.
-
- [`curl_easy_setopt()`][1] takes three arguments, where the option stuff must
- be passed in pairs: the parameter-ID and the parameter-value. The list of
- options is documented in the man page. This function mainly sets things in
- the 'Curl_easy' struct.
-
- `curl_easy_perform()` is just a wrapper function that makes use of the multi
- API. It basically calls `curl_multi_init()`, `curl_multi_add_handle()`,
- `curl_multi_wait()`, and `curl_multi_perform()` until the transfer is done
- and then returns.
-
- Some of the most important key functions in url.c are called from multi.c
- when certain key steps are to be made in the transfer operation.
-
-<a name="Curl_connect"></a>
-Curl_connect()
---------------
-
- Analyzes the URL, separates its different components and connects to the
- remote host. This may involve using a proxy and/or using SSL. The
- `Curl_resolv()` function in lib/hostip.c is used for looking up host names
- (it does then use the proper underlying method, which may vary between
- platforms and builds).
-
- When `Curl_connect` is done, we are connected to the remote site. Then it
- is time to tell the server to get a document/file. `Curl_do()` arranges
- this.
-
- This function makes sure there's an allocated and initiated 'connectdata'
- struct that is used for this particular connection only (although there may
- be several requests performed on the same connect). A bunch of things are
- inited/inherited from the Curl_easy struct.
-
-<a name="Curl_do"></a>
-Curl_do()
----------
-
- `Curl_do()` makes sure the proper protocol-specific function is called. The
- functions are named after the protocols they handle.
-
- The protocol-specific functions of course deal with protocol-specific
- negotiations and setup. They have access to the `Curl_sendf()` (from
- lib/sendf.c) function to send printf-style formatted data to the remote
- host and when they're ready to make the actual file transfer they call the
- `Curl_setup_transfer()` function (in lib/transfer.c) to set up the transfer
- and return.
-
- If this DO function fails and the connection is being re-used, libcurl will
- then close this connection, set up a new connection and re-issue the DO
- request on that. This is because there is no way to be perfectly sure that
- we have discovered a dead connection before the DO function and thus we
- might wrongly be re-using a connection that was closed by the remote peer.
-
- Some time during the DO function, the `Curl_setup_transfer()` function must
- be called with some basic info about the upcoming transfer: what socket(s)
- to read/write and the expected file transfer sizes (if known).
-
-<a name="Curl_readwrite"></a>
-Curl_readwrite()
-----------------
-
- Called during the transfer of the actual protocol payload.
-
- During transfer, the progress functions in lib/progress.c are called at a
- frequent interval (or at the user's choice, a specified callback might get
- called). The speedcheck functions in lib/speedcheck.c are also used to
- verify that the transfer is as fast as required.
-
-<a name="Curl_done"></a>
-Curl_done()
------------
-
- Called after a transfer is done. This function takes care of everything
- that has to be done after a transfer. This function attempts to leave
- matters in a state so that `Curl_do()` should be possible to call again on
- the same connection (in a persistent connection case). It might also soon
- be closed with `Curl_disconnect()`.
-
-<a name="Curl_disconnect"></a>
-Curl_disconnect()
------------------
-
- When doing normal connections and transfers, no one ever tries to close any
- connections so this is not normally called when `curl_easy_perform()` is
- used. This function is only used when we are certain that no more transfers
- are going to be made on the connection. A connection can also be closed by
- force, or this function can be called to make sure that libcurl doesn't keep
- too many connections alive at the same time.
-
- This function cleans up all resources that are associated with a single
- connection.
-
-<a name="http"></a>
-HTTP(S)
-=======
-
- HTTP offers a lot and is the protocol in curl that uses the most lines of
- code. There is a special file (lib/formdata.c) that offers all the multipart
- post functions.
-
- base64 functions for user+password stuff (and more) are in lib/base64.c and
- all functions for parsing and sending cookies are found in lib/cookie.c.
-
- HTTPS uses almost exactly the same procedure as HTTP, with only two
- exceptions: the connect procedure is different and the function used to read
- or write from the socket is different, although the latter fact is hidden in
- the source by the use of `Curl_read()` for reading and `Curl_write()` for
- writing data to the remote server.
-
- `http_chunks.c` contains functions that understand HTTP 1.1 chunked transfer
- encoding.
-
- An interesting detail with the HTTP(S) request, is the `Curl_add_buffer()`
- series of functions we use. They append data to one single buffer, and when
- the building is done the entire request is sent off in one single write. This
- is done this way to overcome problems with flawed firewalls and lame servers.
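
The single-buffer idea described above can be sketched as follows. This is a minimal, self-contained illustration and not libcurl's actual `Curl_add_buffer()` code: the `reqbuf` struct and the `reqbuf_*` function names are hypothetical and exist only for this example.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable request buffer (not libcurl's real struct). */
struct reqbuf {
  char *data;    /* allocated storage, NUL-terminated when non-NULL */
  size_t used;   /* bytes stored, excluding the terminating NUL */
  size_t alloc;  /* bytes allocated */
};

/* Append len bytes to the buffer, growing it when needed.
   Returns 0 on success, -1 if memory runs out. */
int reqbuf_add(struct reqbuf *b, const char *src, size_t len)
{
  assert(b && src);
  if(b->used + len + 1 > b->alloc) {
    size_t newsize = (b->used + len + 1) * 2;
    char *p = realloc(b->data, newsize);
    if(!p)
      return -1;
    b->data = p;
    b->alloc = newsize;
  }
  memcpy(b->data + b->used, src, len);
  b->used += len;
  b->data[b->used] = '\0';
  return 0;
}

/* Convenience wrapper for NUL-terminated strings. */
int reqbuf_adds(struct reqbuf *b, const char *s)
{
  return reqbuf_add(b, s, strlen(s));
}

/* Release the storage and reset the buffer to its empty state. */
void reqbuf_free(struct reqbuf *b)
{
  free(b->data);
  b->data = NULL;
  b->used = 0;
  b->alloc = 0;
}
```

A caller would build the complete request with repeated `reqbuf_adds()` calls and then hand `data` and `used` to one single send call, so the request leaves in one write.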
-
-<a name="ftp"></a>
-FTP
-===
-
- The `Curl_if2ip()` function can be used for getting the IP number of a
- specified network interface, and it resides in lib/if2ip.c.
-
- `Curl_ftpsendf()` is used for sending FTP commands to the remote server. It
- was made a separate function to prevent us programmers from forgetting that
- they must be CRLF terminated. They must also be sent in one single write() to
- make firewalls and similar happy.
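
Why a dedicated send function helps can be shown with a small sketch. The helper below is hypothetical (it is not `Curl_ftpsendf()` and its name and signature are assumptions made for this example); it formats a command and always appends CRLF, so the caller gets one complete buffer to pass to a single write.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Format an FTP command into out and append CRLF.
   Returns the number of bytes to send, or -1 if the command does not fit.
   Hypothetical helper, not libcurl's Curl_ftpsendf(). */
int ftp_format_cmd(char *out, size_t outsize, const char *fmt, ...)
{
  va_list ap;
  int n;

  assert(out && fmt);
  if(outsize < 3)
    return -1;               /* no room for CRLF plus the NUL */

  va_start(ap, fmt);
  n = vsnprintf(out, outsize - 2, fmt, ap);  /* keep two bytes for CRLF */
  va_end(ap);

  if(n < 0 || (size_t)n >= outsize - 2)
    return -1;               /* formatting error or truncated command */

  out[n] = '\r';
  out[n + 1] = '\n';
  out[n + 2] = '\0';
  return n + 2;
}
```

Because the terminator is added in one place, no call site can forget it, and the returned length covers the command and its CRLF together.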
-
-<a name="kerberos"></a>
-Kerberos
---------
-
- Kerberos support is mainly in lib/krb5.c and lib/security.c but also
- `curl_sasl_sspi.c` and `curl_sasl_gssapi.c` for the email protocols and
- `socks_gssapi.c` and `socks_sspi.c` for SOCKS5 proxy specifics.
-
-<a name="telnet"></a>
-TELNET
-======
-
- Telnet is implemented in lib/telnet.c.
-
-<a name="file"></a>
-FILE
-====
-
- The file:// protocol is dealt with in lib/file.c.
-
-<a name="smb"></a>
-SMB
-===
-
- The smb:// protocol is dealt with in lib/smb.c.
-
-<a name="ldap"></a>
-LDAP
-====
-
- Everything LDAP is in lib/ldap.c and lib/openldap.c.
-
-<a name="email"></a>
-E-mail
-======
-
- The e-mail related source code is in lib/imap.c, lib/pop3.c and lib/smtp.c.
-
-<a name="general"></a>
-General
-=======
-
- URL encoding and decoding, called escaping and unescaping in the source code,
- is found in lib/escape.c.
-
- While transferring data in Transfer() a few functions might get used.
- `curl_getdate()` in lib/parsedate.c is for HTTP date comparisons (and more).
-
- lib/getenv.c offers `curl_getenv()` which is for reading environment
- variables in a neat platform independent way. That's used in the client, but
- also in lib/url.c when checking the proxy environment variables. Note that
- contrary to the normal unix getenv(), this returns an allocated buffer that
- must be free()ed after use.
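
The ownership difference from plain getenv() is easy to miss, so here is a minimal sketch of the idea. `dup_getenv()` is a hypothetical name used only for this example and is not the code in lib/getenv.c; it only shows the "returns an allocated copy that the caller frees" contract.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Return a heap-allocated copy of the variable's value, or NULL if the
   variable is unset or memory runs out. The caller must free() it.
   Hypothetical helper, not the code in lib/getenv.c. */
char *dup_getenv(const char *name)
{
  const char *val;
  char *copy;
  size_t len;

  assert(name);
  val = getenv(name);
  if(!val)
    return NULL;

  len = strlen(val) + 1;     /* include the terminating NUL */
  copy = malloc(len);
  if(copy)
    memcpy(copy, val, len);
  return copy;
}
```

The returned pointer stays valid even if the environment changes later, which is the reason for handing out a copy, and it must be passed to free() exactly once.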
-
- lib/netrc.c holds the .netrc parser.
-
- lib/timeval.c features replacement functions for systems that don't have
- gettimeofday() and a few support functions for timeval conversions.
-
- A function named `curl_version()` that returns the full curl version string
- is found in lib/version.c.
-
-<a name="persistent"></a>
-Persistent Connections
-======================
-
- The persistent connection support in libcurl requires some considerations on
- how to do things inside of the library.
-
- - The 'Curl_easy' struct returned in the [`curl_easy_init()`][2] call
- must never hold connection-oriented data. It is meant to hold the root data
- as well as all the options etc that the library-user may choose.
-
- - The 'Curl_easy' struct holds the "connection cache" (an array of
- pointers to 'connectdata' structs).
-
- - This enables the 'curl handle' to be reused on subsequent transfers.
-
- - When libcurl is told to perform a transfer, it first checks for an already
- existing connection in the cache that we can use. Otherwise it creates a
- new one and adds that to the cache. If the cache is already full when a new
- connection is added, it will first close the oldest unused one.
-
- - When the transfer operation is complete, the connection is left
- open. Particular options may tell libcurl not to, and protocols may signal
- closure on connections and then they won't be kept open of course.
-
- - When `curl_easy_cleanup()` is called, we close all still opened connections,
- unless of course the multi interface "owns" the connections.
-
- The curl handle must be re-used in order for the persistent connections to
- work.
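
The "close the oldest unused connection when the cache is full" rule from the list above can be sketched like this. Everything here is hypothetical and simplified for illustration: the `conn` and `conncache` structs, the fixed `CACHE_SIZE` and the logical clock are assumptions, not libcurl's real data structures.

```c
#include <assert.h>
#include <stddef.h>

#define CACHE_SIZE 3  /* tiny on purpose, for illustration */

/* Hypothetical stand-in for a cached connection. */
struct conn {
  int id;                   /* 0 means the slot is empty */
  int in_use;               /* nonzero while a transfer uses it */
  unsigned long last_used;  /* logical timestamp of the last use */
};

struct conncache {
  struct conn slot[CACHE_SIZE];
  unsigned long clock;      /* increases on every add */
};

/* Mark every slot as empty. */
void cache_init(struct conncache *c)
{
  size_t i;
  assert(c);
  c->clock = 0;
  for(i = 0; i < CACHE_SIZE; i++) {
    c->slot[i].id = 0;
    c->slot[i].in_use = 0;
    c->slot[i].last_used = 0;
  }
}

/* Store a new connection. If the cache is full, the oldest connection
   that is not in use is "closed" (overwritten) first. Returns the id of
   the evicted connection, 0 if nothing was evicted, or -1 if the cache
   is full and every connection is in use. */
int cache_add(struct conncache *c, int id)
{
  struct conn *victim = NULL;
  int evicted;
  size_t i;

  assert(c && id > 0);
  for(i = 0; i < CACHE_SIZE; i++) {
    struct conn *s = &c->slot[i];
    if(s->id == 0) {        /* free slot, no eviction needed */
      victim = s;
      break;
    }
    if(!s->in_use && (!victim || s->last_used < victim->last_used))
      victim = s;           /* oldest unused candidate so far */
  }
  if(!victim)
    return -1;
  evicted = victim->id;
  victim->id = id;
  victim->in_use = 0;
  victim->last_used = ++c->clock;
  return evicted;
}
```

Connections that are busy with a transfer are never chosen as victims, which mirrors the "oldest unused" wording above.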
-
-<a name="multi"></a>
-multi interface/non-blocking
-============================
-
- The multi interface is a non-blocking interface to the library. To make that
- interface work as well as possible, no low-level functions within libcurl
- may be written to work in a blocking manner. (There are still a few spots
- violating this rule.)
-
- One of the primary reasons we introduced c-ares support was to allow the name
- resolve phase to be perfectly non-blocking as well.
-
- The FTP and the SFTP/SCP protocols are examples of how we adapt and adjust
- the code to allow non-blocking operations even on multi-stage command-
- response protocols. They are built around state machines that return when
- they would otherwise block waiting for data. The DICT, LDAP and TELNET
- protocols are crappy examples and they are subject for rewrite in the future
- to better fit the libcurl protocol family.
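
A state machine that returns instead of blocking can be sketched in a few lines. This is an illustration under stated assumptions, not the FTP or SFTP code: the states, the `login` struct and the `response_ready` flag (which stands in for a non-blocking read) are all hypothetical.

```c
#include <assert.h>

/* Hypothetical states for a tiny command-response login exchange. */
enum login_state {
  ST_WAIT_GREETING,
  ST_WAIT_USER_REPLY,
  ST_WAIT_PASS_REPLY,
  ST_DONE
};

struct login {
  enum login_state state;
  int commands_sent;    /* counts the commands "sent" so far */
};

/* Take one step without blocking. response_ready tells whether a server
   response is available right now (a real implementation would attempt a
   non-blocking read instead). Returns 1 when the login is complete, 0 when
   the caller should come back once the socket is readable again. */
int login_step(struct login *l, int response_ready)
{
  assert(l);
  if(l->state == ST_DONE)
    return 1;
  if(!response_ready)
    return 0;                 /* would block: return to the caller */

  switch(l->state) {
  case ST_WAIT_GREETING:
    l->commands_sent++;       /* pretend to send USER */
    l->state = ST_WAIT_USER_REPLY;
    break;
  case ST_WAIT_USER_REPLY:
    l->commands_sent++;       /* pretend to send PASS */
    l->state = ST_WAIT_PASS_REPLY;
    break;
  case ST_WAIT_PASS_REPLY:
    l->state = ST_DONE;
    break;
  case ST_DONE:
    break;
  }
  return l->state == ST_DONE;
}
```

All progress is stored in the struct, so the function can be called again at any later time and continues where it left off.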
-
-<a name="ssl"></a>
-SSL libraries
-=============
-
- Originally libcurl supported SSLeay for SSL/TLS transports. That support was
- then extended to its successor OpenSSL and has since been extended to several
- other SSL/TLS libraries, and we expect and hope to further extend the support
- in future libcurl versions.
-
- To deal with this internally in the best way possible, we have a generic SSL
- function API as provided by the vtls/vtls.[ch] system, and they are the only
- SSL functions we must use from within libcurl. vtls is then crafted to use
- the appropriate lower-level function calls for whatever SSL library is in
- use. For example vtls/openssl.[ch] for the OpenSSL library.
-
-<a name="symbols"></a>
-Library Symbols
-===============
-
- All symbols used internally in libcurl must use a `Curl_` prefix if they're
- used in more than a single file. Single-file symbols must be made static.
- Public ("exported") symbols must use a `curl_` prefix. (There are exceptions,
- but they are to be changed to follow this pattern in future versions.) Public
- API functions are marked with `CURL_EXTERN` in the public header files so
- that all others can be hidden on platforms where this is possible.
-
-<a name="returncodes"></a>
-Return Codes and Informationals
-===============================
-
- I've made things simple. Almost every function in libcurl returns a CURLcode,
- that must be `CURLE_OK` if everything is OK or otherwise a suitable error
- code as the curl/curl.h include file defines. The very spot that detects an
- error must use the `Curl_failf()` function to set the human-readable error
- description.
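
The convention of "return a code, set the message where the error is detected" can be sketched as below. The names are hypothetical stand-ins chosen for this example (`democode`, `demo_handle`, `demo_failf()`, `demo_connect()`); the real codes are defined in curl/curl.h and the real function is `Curl_failf()`.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins; the real codes live in curl/curl.h. */
typedef enum { DEMO_OK = 0, DEMO_COULDNT_CONNECT } democode;

struct demo_handle {
  char errbuf[128];   /* human-readable description of the last error */
};

/* Store a printf-style error message in the handle. */
void demo_failf(struct demo_handle *h, const char *fmt, ...)
{
  va_list ap;
  assert(h && fmt);
  va_start(ap, fmt);
  vsnprintf(h->errbuf, sizeof(h->errbuf), fmt, ap);
  va_end(ap);
}

/* The spot that detects the problem sets the message and returns a code;
   callers further up only propagate the code. */
democode demo_connect(struct demo_handle *h, const char *host, int port)
{
  assert(h && host);
  if(port <= 0 || port > 65535) {
    demo_failf(h, "Bad port %d for host %s", port, host);
    return DEMO_COULDNT_CONNECT;
  }
  return DEMO_OK;
}
```

Setting the text at the point of detection keeps the most specific information (here the offending port) available, while the return value stays a simple code.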
-
- In aiding the user to understand what's happening and to debug curl usage, we
- must supply a fair amount of informational messages by using the
- `Curl_infof()` function. Those messages are only displayed when the user
- explicitly asks for them. They are best used when revealing information that
- isn't otherwise obvious.
-
-<a name="abi"></a>
-API/ABI
-=======
-
- We make an effort to not export or show internals or how internals work, as
- that makes it easier to keep a solid API/ABI over time. See docs/libcurl/ABI
- for our promise to users.
-
-<a name="client"></a>
-Client
-======
-
- main() resides in `src/tool_main.c`.
-
- `src/tool_hugehelp.c` is automatically generated by the mkhelp.pl perl script
- to display the complete "manual" and the src/tool_urlglob.c file holds the
- functions used for the URL-"globbing" support. Globbing in the sense that the
- {} and [] expansion stuff is there.
-
- The client mostly messes around to setup its 'config' struct properly, then
- it calls the `curl_easy_*()` functions of the library and when it gets back
- control after the `curl_easy_perform()` it cleans up the library, checks
- status and exits.
-
- When the operation is done, the ourWriteOut() function in src/writeout.c may
- be called to report about the operation. That function is using the
- `curl_easy_getinfo()` function to extract useful information from the curl
- session.
-
- It may loop and do all this several times if many URLs were specified on the
- command line or config file.
-
-<a name="memorydebug"></a>
-Memory Debugging
-================
-
- The file lib/memdebug.c contains debug-versions of a few functions: those
- such as malloc, free, fopen, fclose, etc that somehow deal with resources
- that might give us problems if we "leak" them. The functions in the memdebug
- system do nothing fancy, they do their normal function and then log
- information about what they just did. The logged data can then be analyzed
- after a complete session.
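
The "do the normal job, then log it" pattern can be sketched as follows. This is not the code in lib/memdebug.c: the `demo_*` names, the log line layout and the extra counter of live allocations are assumptions made for this example (the text above describes analysis of the log afterwards, the counter is only here to make the sketch easy to check).

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

static FILE *demo_log;   /* NULL means: do not write log lines */
static long demo_live;   /* allocations that have not been freed yet */

/* Choose where log lines go; pass NULL to switch logging off. */
void demo_memdebug(FILE *log)
{
  demo_log = log;
}

/* Number of allocations that are still outstanding. */
long demo_outstanding(void)
{
  return demo_live;
}

/* malloc() wrapper: perform the allocation, then log what happened. */
void *demo_dbg_malloc(size_t size, int line, const char *source)
{
  void *mem;
  assert(source);
  mem = malloc(size);
  if(mem)
    demo_live++;
  if(demo_log)
    fprintf(demo_log, "MEM %s:%d malloc(%lu) = %p\n",
            source, line, (unsigned long)size, mem);
  return mem;
}

/* free() wrapper: log the call, then release the memory. */
void demo_dbg_free(void *ptr, int line, const char *source)
{
  assert(source);
  if(demo_log)
    fprintf(demo_log, "MEM %s:%d free(%p)\n", source, line, ptr);
  if(ptr)
    demo_live--;
  free(ptr);
}
```

In a debug build, macros would typically map the plain function names to such wrappers with `__LINE__` and `__FILE__` filled in, so every call site is identified in the log.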
-
- memanalyze.pl is the perl script present in tests/ that analyzes a log file
- generated by the memory tracking system. It detects if resources are
- allocated but never freed and other kinds of errors related to resource
- management.
-
- Internally, definition of preprocessor symbol DEBUGBUILD restricts code which
- is only compiled for debug enabled builds. And symbol CURLDEBUG is used to
- differentiate code which is _only_ used for memory tracking/debugging.
-
- Use -DCURLDEBUG when compiling to enable memory debugging, this is also
- switched on by running configure with --enable-curldebug. Use -DDEBUGBUILD
- when compiling to enable a debug build or run configure with --enable-debug.
-
- curl --version will list 'Debug' feature for debug enabled builds, and
- will list 'TrackMemory' feature for curl debug memory tracking capable
- builds. These features are independent and can be controlled when running
- the configure script. When --enable-debug is given both features will be
- enabled, unless some restriction prevents memory tracking from being used.
-
-<a name="test"></a>
-Test Suite
-==========
-
- The test suite is placed in its own subdirectory directly off the root in the
- curl archive tree, and it contains a bunch of scripts and a lot of test case
- data.
-
- The main test script is runtests.pl that will invoke test servers like
- httpserver.pl and ftpserver.pl before all the test cases are performed. The
- test suite currently only runs on unix-like platforms.
-
- You'll find a description of the test suite in the tests/README file, and the
- test case data file format is described in the tests/FILEFORMAT file.
-
- The test suite automatically detects if curl was built with the memory
- debugging enabled, and if it was it will detect memory leaks, too.
-
-<a name="asyncdns"></a>
-Asynchronous name resolves
-==========================
-
- libcurl can be built to do name resolves asynchronously, using either the
- normal resolver in a threaded manner or by using c-ares.
-
-<a name="cares"></a>
-[c-ares][3]
-------
-
-### Build libcurl to use c-ares
-
-1. ./configure --enable-ares=/path/to/ares/install
-2. make
-
-### c-ares on win32
-
- First I compiled c-ares. I changed the default C runtime library to be the
- single-threaded rather than the multi-threaded (this seems to be required to
- prevent linking errors later on). Then I simply build the areslib project
- (the other projects adig/ahost seem to fail under MSVC).
-
- Next was libcurl. I opened lib/config-win32.h and I added a:
- `#define USE_ARES 1`
-
- Next thing I did was I added the path for the ares includes to the include
- path, and the libares.lib to the libraries.
-
- Lastly, I also changed libcurl to be single-threaded rather than
- multi-threaded, again this was to prevent some duplicate symbol errors. I'm
- not sure why I needed to change everything to single-threaded, but when I
- didn't I got redefinition errors for several CRT functions (malloc, stricmp,
- etc.)
-
-<a name="curl_off_t"></a>
-`curl_off_t`
-==========
-
- curl_off_t is a data type provided by the external libcurl include
- headers. It is the type meant to be used for the [`curl_easy_setopt()`][1]
- options that end with LARGE. The type is 64bit large on most modern
- platforms.
-
-<a name="curlx"></a>
-curlx
-=====
-
- The libcurl source code offers a few functions by source only. They are not
- part of the official libcurl API, but the source files might be useful for
- others so apps can optionally compile/build with these sources to gain
- additional functions.
-
- We provide them through a single header file for easy access for apps:
- "curlx.h"
-
-`curlx_strtoofft()`
--------------------
- A macro that converts a string containing a number to a curl_off_t number.
- This might use the curlx_strtoll() function which is provided as source
- code in strtoofft.c. Note that the function is only provided if no
- strtoll() (or equivalent) function exists on your platform. If curl_off_t
- is only a 32 bit number on your platform, this macro uses strtol().
-
-`curlx_tvnow()`
----------------
- returns a struct timeval for the current time.
-
-`curlx_tvdiff()`
---------------
- returns the difference between two timeval structs, in number of
- milliseconds.
-
-`curlx_tvdiff_secs()`
----------------------
- returns the same as curlx_tvdiff but with full usec resolution (as a
- double)
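
What the two difference functions compute can be shown with a short sketch. The struct and function names below are hypothetical (a local `demo_timeval` is used instead of the system `struct timeval` to keep the example portable), and the code is an illustration of the arithmetic, not the curlx sources.

```c
#include <assert.h>

/* Local stand-in for struct timeval, to stay portable in this sketch. */
struct demo_timeval {
  long tv_sec;    /* seconds */
  long tv_usec;   /* microseconds, 0..999999 */
};

/* Difference newer - older in whole milliseconds. */
long demo_tvdiff_ms(struct demo_timeval newer, struct demo_timeval older)
{
  return (newer.tv_sec - older.tv_sec) * 1000 +
         (newer.tv_usec - older.tv_usec) / 1000;
}

/* The same difference in seconds, keeping microsecond resolution. */
double demo_tvdiff_secs(struct demo_timeval newer, struct demo_timeval older)
{
  return (double)(newer.tv_sec - older.tv_sec) +
         (double)(newer.tv_usec - older.tv_usec) / 1000000.0;
}
```

For example, with newer = 5.250000 s and older = 3.750000 s the millisecond variant gives 1500 and the seconds variant gives 1.5.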
-
-Future
-------
-
- Several functions will be removed from the public curl_ name space in a
- future libcurl release. They will then only become available as curlx_
- functions instead. To make the transition easier, we already today provide
- these functions with the curlx_ prefix to allow sources to get built properly
- with the new function names. The functions this concerns are:
-
- - `curlx_getenv`
- - `curlx_strequal`
- - `curlx_strnequal`
- - `curlx_mvsnprintf`
- - `curlx_msnprintf`
- - `curlx_maprintf`
- - `curlx_mvaprintf`
- - `curlx_msprintf`
- - `curlx_mprintf`
- - `curlx_mfprintf`
- - `curlx_mvsprintf`
- - `curlx_mvprintf`
- - `curlx_mvfprintf`
-
-<a name="contentencoding"></a>
-Content Encoding
-================
-
-## About content encodings
-
- [HTTP/1.1][4] specifies that a client may request that a server encode its
- response. This is usually used to compress a response using one of a set of
- commonly available compression techniques. These schemes are 'deflate' (the
- zlib algorithm), 'gzip' and 'compress'. A client requests that the server
- perform an encoding by including an Accept-Encoding header in the request
- document. The value of the header should be one of the recognized tokens
- 'deflate', ... (there's a way to register new schemes/tokens, see sec 3.5 of
- the spec). A server MAY honor the client's encoding request. When a response
- is encoded, the server includes a Content-Encoding header in the
- response. The value of the Content-Encoding header indicates which scheme was
- used to encode the data.
-
- A client may tell a server that it can understand several different encoding
- schemes. In this case the server may choose any one of those and use it to
- encode the response (indicating which one using the Content-Encoding header).
- It's also possible for a client to attach priorities to different schemes so
- that the server knows which it prefers. See sec 14.3 of RFC 2616 for more
- information on the Accept-Encoding header.
-
-## Supported content encodings
-
- The 'deflate' and 'gzip' content encodings are supported by libcurl. Both
- regular and chunked transfers work fine. The zlib library is required for
- this feature.
-
-## The libcurl interface
-
- To cause libcurl to request a content encoding use:
-
- [`curl_easy_setopt`][1](curl, [`CURLOPT_ACCEPT_ENCODING`][5], string)
-
- where string is the intended value of the Accept-Encoding header.
-
- Currently, libcurl only understands how to process responses that use the
- "deflate" or "gzip" Content-Encoding, so the only values for
- [`CURLOPT_ACCEPT_ENCODING`][5] that will work (besides "identity", which does
- nothing) are "deflate" and "gzip". If a response is encoded using the
- "compress" method or any other unsupported method, libcurl will return an
- error indicating that the response could not be decoded. If <string> is NULL
- no Accept-Encoding header is generated. If <string> is a zero-length string,
- then an Accept-Encoding header containing all supported encodings will be
- generated.
-
- The [`CURLOPT_ACCEPT_ENCODING`][5] must be set to any non-NULL value for
- content to be automatically decoded. If it is not set and the server still
- sends encoded content (despite not having been asked), the data is returned
- in its raw form and the Content-Encoding type is not checked.
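
The three cases for the option value (NULL, zero-length string, explicit string) can be sketched with a small helper. `build_accept_encoding()` and the `ALL_SUPPORTED` list are hypothetical and only model the rules stated above; this is not how libcurl builds the header internally.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Encodings this sketch pretends to support. */
#define ALL_SUPPORTED "deflate, gzip"

/* Build the request header line for a given option value.
   Returns 0 and leaves out empty when no header should be sent,
   1 when a header was written, -1 if out is too small.
   Hypothetical helper, not libcurl code. */
int build_accept_encoding(const char *opt, char *out, size_t outsize)
{
  int n;

  assert(out && outsize > 0);
  out[0] = '\0';
  if(!opt)
    return 0;                  /* NULL: no Accept-Encoding header */
  if(opt[0] == '\0')
    opt = ALL_SUPPORTED;       /* "": list everything supported */

  n = snprintf(out, outsize, "Accept-Encoding: %s\r\n", opt);
  if(n < 0 || (size_t)n >= outsize) {
    out[0] = '\0';
    return -1;                 /* header did not fit */
  }
  return 1;
}
```

In an application the corresponding call would be `curl_easy_setopt(curl, CURLOPT_ACCEPT_ENCODING, "")` to ask for every encoding the library supports.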
-
-## The curl interface
-
- Use the [--compressed][6] option with curl to cause it to ask servers to
- compress responses using any format supported by curl.
-
-<a name="hostip"></a>
-hostip.c explained
-==================
-
- The main compile-time defines to keep in mind when reading the host*.c source
- files are these:
-
-## `CURLRES_IPV6`
-
- this host has getaddrinfo() and family, and thus we use that. The host may
- not be able to resolve IPv6, but we don't really have to take that into
- account. Hosts that aren't IPv6-enabled have CURLRES_IPV4 defined.
-
-## `CURLRES_ARES`
-
- is defined if libcurl is built to use c-ares for asynchronous name
- resolves. This can be Windows or *nix.
-
-## `CURLRES_THREADED`
-
- is defined if libcurl is built to use threading for asynchronous name
- resolves. The name resolve will be done in a new thread, and the supported
- asynch API will be the same as for ares-builds. This is the default under
- (native) Windows.
-
- If any of the two previous are defined, `CURLRES_ASYNCH` is defined too. If
- libcurl is not built to use an asynchronous resolver, `CURLRES_SYNCH` is
- defined.
-
-## host*.c sources
-
- The host*.c source files are split up like this:
-
- - hostip.c - method-independent resolver functions and utility functions
- - hostasyn.c - functions for asynchronous name resolves
- - hostsyn.c - functions for synchronous name resolves
- - asyn-ares.c - functions for asynchronous name resolves using c-ares
- - asyn-thread.c - functions for asynchronous name resolves using threads
- - hostip4.c - IPv4 specific functions
- - hostip6.c - IPv6 specific functions
-
- The hostip.h is the single united header file for all this. It defines the
- `CURLRES_*` defines based on the config*.h and curl_setup.h defines.
-
-<a name="memoryleak"></a>
-Track Down Memory Leaks
-=======================
-
-## Single-threaded
-
- Please note that this memory leak system is not adjusted to work in more
- than one thread. If you want/need to use it in a multi-threaded app, please
- adjust accordingly.
-
-
-## Build
-
- Rebuild libcurl with -DCURLDEBUG (usually, rerunning configure with
- --enable-debug fixes this). 'make clean' first, then 'make' so that all
- files actually are rebuilt properly. It will also make sense to build
- libcurl with the debug option (usually -g to the compiler) so that debugging
- it will be easier if you actually do find a leak in the library.
-
- This will create a library that has memory debugging enabled.
-
-## Modify Your Application
-
- Add a line in your application code:
-
- `curl_memdebug("dump");`
-
- This will make the malloc debug system output a full trace of all resource
- using functions to the given file name. Make sure you rebuild your program
- and that you link with the same libcurl you built for this purpose as
- described above.
-
-## Run Your Application
-
- Run your program as usual. Watch the specified memory trace file grow.
-
- Make your program exit and use the proper libcurl cleanup functions etc, so
- that all non-leaks are returned/freed properly.
-
-## Analyze the Flow
-
- Use the tests/memanalyze.pl perl script to analyze the dump file:
-
- tests/memanalyze.pl dump
-
- This now outputs a report on which resources were allocated but never
- freed etc. This report is very fine for posting to the list!
-
- If this doesn't produce any output, no leak was detected in libcurl. Then
- the leak is most likely to be in your code.
-
-<a name="multi_socket"></a>
-`multi_socket`
-==============
-
- Implementation of the `curl_multi_socket` API
-
- The main ideas of this API are simply:
-
- 1 - The application can use whatever event system it likes as it gets info
- from libcurl about what file descriptors libcurl waits for what action
- on. (The previous API returns `fd_sets` which is very select()-centric).
-
- 2 - When the application discovers action on a single socket, it calls
- libcurl and informs that there was action on this particular socket and
- libcurl can then act on that socket/transfer only and not care about
- any other transfers. (The previous API always had to scan through all
- the existing transfers.)
-
- The idea is that [`curl_multi_socket_action()`][7] calls a given callback
- with information about what socket to wait for what action on, and the
- callback only gets called if the status of that socket has changed.
-
- We also added a timer callback that makes libcurl call the application when
- the timeout value changes, and you set that with [`curl_multi_setopt()`][9]
- and the [`CURLMOPT_TIMERFUNCTION`][10] option. To get this to work,
- internally there's a struct added to each easy handle in which we store an
- "expire time" (if any). The structs are then "splay sorted" so that we can
- add and remove times from the tree and yet somewhat swiftly figure out both
- how long there is until the nearest timer expires and which timer (handle)
- we should take care of now. Of course, the upside of all this is that we
- get a [`curl_multi_timeout()`][8] that should also work with old-style
- applications that use [`curl_multi_perform()`][11].
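The timer bookkeeping described above can be illustrated with a minimal sketch. It is not libcurl's code: a plain sorted linked list stands in for the splay-sorted structure, and every name here (`timer_node`, `timer_add`, `timer_remove`, `timer_next_timeout`, `handle_id`) is hypothetical. It only shows how keeping the expire times ordered answers both questions at once: how long until the nearest timer expires, and which handle it belongs to.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-handle timer record; not a libcurl struct. */
struct timer_node {
  long expire_ms;            /* absolute expire time in milliseconds */
  int handle_id;             /* made-up identifier of the owning handle */
  struct timer_node *next;
};

/* Insert a node so the list stays sorted by ascending expire time. */
void timer_add(struct timer_node **head, struct timer_node *node)
{
  assert(head != NULL && node != NULL);
  while(*head && (*head)->expire_ms <= node->expire_ms)
    head = &(*head)->next;
  node->next = *head;
  *head = node;
}

/* Unlink a node from the list, if it is present. */
void timer_remove(struct timer_node **head, struct timer_node *node)
{
  assert(head != NULL && node != NULL);
  while(*head && *head != node)
    head = &(*head)->next;
  if(*head)
    *head = node->next;
}

/* Milliseconds until the nearest timer expires: -1 when no timer is set,
   0 when the nearest one has already expired. The head of the list is the
   timer (handle) to take care of first. */
long timer_next_timeout(const struct timer_node *head, long now_ms)
{
  if(!head)
    return -1;
  if(head->expire_ms <= now_ms)
    return 0;
  return head->expire_ms - now_ms;
}
```

A sorted list makes insertion linear in the number of timers, which is the cost a splay tree is meant to reduce; the sketch trades that away for brevity.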
-
- We created an internal "socket to easy handles" hash table that, given
- a socket (file descriptor), returns the easy handle that waits for action on
- that socket. This hash is made using the already existing hash code
- (previously only used for the DNS cache).
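As an illustration of the "socket to easy handle" lookup, here is a small self-contained sketch. It does not use libcurl's existing hash code; the chained hash table, the bucket count and all names (`easy_stub`, `sockhash_*`) are assumptions made for this example only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define SOCKHASH_BUCKETS 97   /* hypothetical table size for this sketch */

struct easy_stub {            /* hypothetical stand-in for an easy handle */
  int id;
};

struct sock_entry {
  int sockfd;                 /* key: the socket descriptor */
  struct easy_stub *easy;     /* value: the handle waiting on that socket */
  struct sock_entry *next;    /* chain of entries sharing a bucket */
};

struct sockhash {
  struct sock_entry *bucket[SOCKHASH_BUCKETS];
};

static size_t sockhash_index(int sockfd)
{
  return (size_t)((unsigned int)sockfd % SOCKHASH_BUCKETS);
}

void sockhash_init(struct sockhash *h)
{
  size_t i;
  assert(h != NULL);
  for(i = 0; i < SOCKHASH_BUCKETS; i++)
    h->bucket[i] = NULL;
}

/* Associate a socket with a handle. Returns 0 on success, -1 when out of
   memory. An existing entry for the same socket is overwritten. */
int sockhash_add(struct sockhash *h, int sockfd, struct easy_stub *easy)
{
  size_t i = sockhash_index(sockfd);
  struct sock_entry *e;
  for(e = h->bucket[i]; e; e = e->next) {
    if(e->sockfd == sockfd) {
      e->easy = easy;
      return 0;
    }
  }
  e = malloc(sizeof(*e));
  if(!e)
    return -1;
  e->sockfd = sockfd;
  e->easy = easy;
  e->next = h->bucket[i];
  h->bucket[i] = e;
  return 0;
}

/* Return the handle waiting on the socket, or NULL if there is none. */
struct easy_stub *sockhash_lookup(const struct sockhash *h, int sockfd)
{
  const struct sock_entry *e;
  for(e = h->bucket[sockhash_index(sockfd)]; e; e = e->next)
    if(e->sockfd == sockfd)
      return e->easy;
  return NULL;
}

/* Drop the entry for a socket, if present. */
void sockhash_remove(struct sockhash *h, int sockfd)
{
  struct sock_entry **pp = &h->bucket[sockhash_index(sockfd)];
  while(*pp && (*pp)->sockfd != sockfd)
    pp = &(*pp)->next;
  if(*pp) {
    struct sock_entry *dead = *pp;
    *pp = dead->next;
    free(dead);
  }
}
```

With such a table, activity reported on one descriptor leads straight to the one transfer that cares about it, without scanning the others.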
-
- To make libcurl able to report plain sockets in the socket callback, we had
- to re-organize the internals of the [`curl_multi_fdset()`][12] etc so that
- the conversion from sockets to `fd_sets` for that function is only done in
- the last step before the data is returned. I also had to extend c-ares to
- get a function that can return plain sockets, as that library too returned
- only `fd_sets` and that is no longer good enough. The changes done to c-ares
- are available in c-ares 1.3.1 and later.
-
-<a name="structs"></a>
-Structs in libcurl
-==================
-
-This section should cover 7.32.0 pretty accurately, but will make sense even
-for older and later versions as things don't change drastically that often.
-
-## Curl_easy
-
- The Curl_easy struct is the one returned to the outside in the external API
- as a "CURL *". This is usually known as an easy handle in API documentation
- and examples.
-
- Information and state that is related to the actual connection is in the
- 'connectdata' struct. When a transfer is about to be made, libcurl will
- either create a new connection or re-use an existing one. The particular
- connectdata that is used by this handle is pointed out by
- Curl_easy->easy_conn.
-
- Data and information regarding this particular single transfer are put in
- the SingleRequest sub-struct.
-
- When the Curl_easy struct is added to a multi handle, as it must be in order
- to do any transfer, the ->multi member will point to the `Curl_multi` struct
- it belongs to. The ->prev and ->next members will then be used by the multi
- code to keep a linked list of Curl_easy structs that are added to that same
- multi handle. libcurl always uses multi so ->multi *will* point to a
- `Curl_multi` when a transfer is in progress.
-
- ->mstate is the multi state of this particular Curl_easy. When
- `multi_runsingle()` is called, it will act on this handle according to which
- state it is in. The mstate is also what tells which sockets to return for a
- specific Curl_easy when [`curl_multi_fdset()`][12] is called etc.
-
- The libcurl source code generally uses the name 'data' for the variable that
- points to the Curl_easy.
-
- When doing multiplexed HTTP/2 transfers, each Curl_easy is associated with
- an individual stream, sharing the same connectdata struct. Multiplexing
- makes it even more important to keep things associated with the right thing!
-
-## connectdata
-
- A general idea in libcurl is to keep connections around in a connection
- "cache" after they have been used in case they will be used again, and then
- re-use an existing one instead of creating a new one, as that gives a
- significant performance boost.
-
- Each 'connectdata' identifies a single physical connection to a server. If
- the connection can't be kept alive, the connection will be closed after use
- and then this struct can be removed from the cache and freed.
-
- Thus, the same Curl_easy can be used multiple times and each time select
- another connectdata struct to use for the connection. Keep this in mind, as
- it is then important to consider if options or choices are based on the
- connection or the Curl_easy.
-
- Functions in libcurl will assume that connectdata->data points to the
- Curl_easy that uses this connection (for the moment).
-
- As a special complexity, some protocols supported by libcurl require a
- special disconnect procedure that is more than just shutting down the
- socket. It can involve sending one or more commands to the server before
- doing so. Since connections are kept in the connection cache after use, the
- original Curl_easy may no longer be around when the time comes to shut down
- a particular connection. For this purpose, libcurl holds a special dummy
- `closure_handle` Curl_easy in the `Curl_multi` struct to use when needed.
-
- FTP uses two TCP connections for a typical transfer but it keeps both in
- this single struct and thus can be considered a single connection for most
- internal concerns.
-
- The libcurl source code generally uses the name 'conn' for the variable that
- points to the connectdata.
-
-## Curl_multi
-
- Internally, the easy interface is implemented as a wrapper around multi
- interface functions. This makes everything multi interface.
-
- `Curl_multi` is the multi handle struct exposed as "CURLM *" in external APIs.
-
- This struct holds a list of Curl_easy structs that have been added to this
- handle with [`curl_multi_add_handle()`][13]. The start of the list is
- ->easyp and ->num_easy is a counter of added Curl_easys.
-
- ->msglist is a linked list of messages to send back when
- [`curl_multi_info_read()`][14] is called. Basically a node is added to that
- list when an individual Curl_easy's transfer has completed.
-
- ->hostcache points to the name cache. It is a hash table for looking up name
- to IP. The nodes have a limited lifetime in there and this cache is meant
- to reduce the lookup time when the same name is wanted again within a short
- period of time.
-
- ->timetree points to a tree of Curl_easys, sorted by the remaining time
- until it should be checked - normally some sort of timeout. Each Curl_easy
- has one node in the tree.
-
- ->sockhash is a hash table to allow fast lookups from a socket descriptor to
- the Curl_easy that uses that descriptor. This is necessary for the
- `multi_socket` API.
-
- ->conn_cache points to the connection cache. It keeps track of all
- connections that are kept after use. The cache has a maximum size.
-
- ->closure_handle is described in the 'connectdata' section.
-
- The libcurl source code generally uses the name 'multi' for the variable that
- points to the Curl_multi struct.
-
-## Curl_handler
-
- Each unique protocol that is supported by libcurl needs to provide at least
- one `Curl_handler` struct. It defines what the protocol is called and what
- functions the main code should call to deal with protocol specific issues.
- In general, there's a source file named [protocol].c in which there's a
- "struct `Curl_handler` `Curl_handler_[protocol]`" declared. In url.c there's
- then the main array with all individual `Curl_handler` structs pointed to
- from a single array which is scanned through when a URL is given to libcurl
- to work with.
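The scan over the handler array can be sketched as follows. This is a simplified illustration, not the code in url.c: `handler_stub`, its two fields, the three sample entries and `find_handler` are hypothetical stand-ins, and the case-insensitive comparison is an assumption of this sketch.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical, heavily reduced stand-in for a protocol handler struct. */
struct handler_stub {
  const char *scheme;         /* URL scheme name, spelled in uppercase */
  long defport;               /* default port for the protocol */
};

static const struct handler_stub handler_http  = { "HTTP",  80 };
static const struct handler_stub handler_https = { "HTTPS", 443 };
static const struct handler_stub handler_ftp   = { "FTP",   21 };

/* One NULL-terminated array pointing to all individual handlers. */
static const struct handler_stub * const handlers[] = {
  &handler_http,
  &handler_https,
  &handler_ftp,
  NULL
};

/* Compare two scheme names without regard to ASCII case. */
static int scheme_equal(const char *a, const char *b)
{
  while(*a && *b) {
    if(toupper((unsigned char)*a) != toupper((unsigned char)*b))
      return 0;
    a++;
    b++;
  }
  return *a == *b;            /* equal only if both strings ended here */
}

/* Scan the array for the handler that matches the scheme of a URL.
   Returns NULL when no handler supports the scheme. */
const struct handler_stub *find_handler(const char *scheme)
{
  const struct handler_stub * const *p;
  assert(scheme != NULL);
  for(p = handlers; *p; p++)
    if(scheme_equal((*p)->scheme, scheme))
      return *p;
  return NULL;
}
```

In this sketch HTTPS has its own entry next to HTTP, so the lookup needs no special case for the SSL variant of a protocol.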
-
- ->scheme is the URL scheme name, usually spelled out in uppercase. That's
- "HTTP" or "FTP" etc. SSL versions of the protocol need their own
- `Curl_handler` setup, so HTTPS is separate from HTTP.
-
- ->setup_connection is called to allow the protocol code to allocate protocol
- specific data that then gets associated with that Curl_easy for the rest of
- this transfer. It gets freed again at the end of the transfer. It will be
- called before the 'connectdata' for the transfer has been selected/created.
- Most protocols will allocate their private 'struct [PROTOCOL]' here and
- assign Curl_easy->req.protop to point to it.
-
- ->connect_it allows a protocol to do some specific actions after the TCP
- connect is done, that can still be considered part of the connection phase.
-
- Some protocols will alter the connectdata->recv[] and connectdata->send[]
- function pointers in this function.
-
- ->connecting is similarly a function that keeps getting called as long as the
- protocol considers itself still in the connecting phase.
-
- ->do_it is the function called to issue the transfer request, what we call
- the DO action internally. If the DO is not enough and more work is needed
- for the entire DO sequence to complete, ->doing is then usually also
- provided. Each protocol that needs to do multiple commands or similar for
- do/doing needs to implement its own state machine (see SCP, SFTP, FTP). Some
- protocols (only FTP and only due to historical reasons) have a separate
- piece of the DO state called `DO_MORE`.
-
- ->doing keeps getting called while issuing the transfer request command(s).
-
- ->done gets called when the transfer is complete and DONE. That's after the
- main data has been transferred.
-
- ->do_more gets called during the `DO_MORE` state. The FTP protocol uses this
- state when setting up the second connection.
-
- ->`proto_getsock`
- ->`doing_getsock`
- ->`domore_getsock`
- ->`perform_getsock`
- Functions that return socket information. Which socket(s) to wait for which
- action(s) during the particular multi state.
-
- ->disconnect is called immediately before the TCP connection is shut down.
-
- ->readwrite gets called during transfer to allow the protocol to do extra
- reads/writes.
-
- ->defport is the default TCP or UDP port this protocol uses.
-
- ->protocol is one or more bits in the `CURLPROTO_*` set. The SSL versions
- have their "base" protocol set and then the SSL variation. Like
- "HTTP|HTTPS".
-
- ->flags is a bitmask with additional information about the protocol that will
- make it get treated differently by the generic engine:
-
- - `PROTOPT_SSL` - will make it connect and negotiate SSL
-
- - `PROTOPT_DUAL` - this protocol uses two connections
-
- - `PROTOPT_CLOSEACTION` - this protocol has actions to do before closing the
- connection. This flag is no longer used by code, yet still set for a bunch
- of protocol handlers.
-
- - `PROTOPT_DIRLOCK` - "direction lock". The SSH protocols set this bit to
- limit which "direction" of socket actions the main engine will concern
- itself with.
-
- - `PROTOPT_NONETWORK` - a protocol that doesn't use network (read file:)
-
- - `PROTOPT_NEEDSPWD` - this protocol needs a password and will use a default
- one unless one is provided
-
- - `PROTOPT_NOURLQUERY` - this protocol can't handle a query part on the URL
- (?foo=bar)
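How a generic engine might consult such a bitmask is sketched below. The `DEMO_` macros and their bit values are illustrative assumptions, not the values libcurl assigns to the `PROTOPT_*` flags, and both helper functions are made up for this example.

```c
#include <assert.h>

/* Illustrative flag bits; names and values are assumptions of this sketch. */
#define DEMO_PROTOPT_SSL        (1u<<0)  /* connect and negotiate SSL */
#define DEMO_PROTOPT_DUAL       (1u<<1)  /* protocol uses two connections */
#define DEMO_PROTOPT_NONETWORK  (1u<<2)  /* protocol doesn't use network */
#define DEMO_PROTOPT_NEEDSPWD   (1u<<3)  /* protocol needs a password */

/* Non-zero if the given bit is set in the handler's flags. */
int demo_has_flag(unsigned int flags, unsigned int bit)
{
  assert(bit != 0);
  return (flags & bit) != 0;
}

/* How many network connections a transfer with these flags would use in
   this sketch: none without network, two for dual protocols, else one. */
int demo_connection_count(unsigned int flags)
{
  if(demo_has_flag(flags, DEMO_PROTOPT_NONETWORK))
    return 0;
  if(demo_has_flag(flags, DEMO_PROTOPT_DUAL))
    return 2;
  return 1;
}
```

A handler combines bits with `|`, for example `DEMO_PROTOPT_SSL | DEMO_PROTOPT_DUAL | DEMO_PROTOPT_NEEDSPWD`, and the generic code only tests the bits it cares about.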
-
-## conncache
-
- This is a hash table with connections for later re-use. Each Curl_easy has a
- pointer to its connection cache. Each multi handle sets up a connection
- cache that all added Curl_easys share by default.
-
-## Curl_share
-
- The libcurl share API allocates a `Curl_share` struct, exposed to the
- external API as "CURLSH *".
-
- The idea is that the struct can have its own set of caches and pools, and
- then by providing this struct in the `CURLOPT_SHARE` option, those specific
- Curl_easys will use the caches/pools that this share handle holds.
-
- Then individual Curl_easy structs can be made to share specific things
- that they otherwise wouldn't, such as cookies.
-
- The `Curl_share` struct can currently hold cookies, DNS cache and the SSL
- session cache.
-
-## CookieInfo
-
- This is the main cookie struct. It holds all known cookies and related
- information. Each Curl_easy has its own private CookieInfo even when
- it is added to a multi handle. Handles can be made to share cookies by using
- the share API.
-
-
-[1]: https://curl.haxx.se/libcurl/c/curl_easy_setopt.html
-[2]: https://curl.haxx.se/libcurl/c/curl_easy_init.html
-[3]: http://c-ares.haxx.se/
-[4]: https://tools.ietf.org/html/rfc7230 "RFC 7230"
-[5]: https://curl.haxx.se/libcurl/c/CURLOPT_ACCEPT_ENCODING.html
-[6]: https://curl.haxx.se/docs/manpage.html#--compressed
-[7]: https://curl.haxx.se/libcurl/c/curl_multi_socket_action.html
-[8]: https://curl.haxx.se/libcurl/c/curl_multi_timeout.html
-[9]: https://curl.haxx.se/libcurl/c/curl_multi_setopt.html
-[10]: https://curl.haxx.se/libcurl/c/CURLMOPT_TIMERFUNCTION.html
-[11]: https://curl.haxx.se/libcurl/c/curl_multi_perform.html
-[12]: https://curl.haxx.se/libcurl/c/curl_multi_fdset.html
-[13]: https://curl.haxx.se/libcurl/c/curl_multi_add_handle.html
-[14]: https://curl.haxx.se/libcurl/c/curl_multi_info_read.html