aboutsummaryrefslogtreecommitdiff
diff options
context:
space:
mode:
-rw-r--r--docs/INTERNALS234
-rw-r--r--docs/LIBCURL-STRUCTS248
-rw-r--r--docs/Makefile.am2
3 files changed, 234 insertions, 250 deletions
diff --git a/docs/INTERNALS b/docs/INTERNALS
index b134260c3..4cd63b4e2 100644
--- a/docs/INTERNALS
+++ b/docs/INTERNALS
@@ -37,6 +37,7 @@ Table of Contents
- [hostip.c explained](#hostip)
- [Track Down Memory Leaks](#memoryleak)
- [`multi_socket`](#multi_socket)
+ - [Structs in libcurl](#structs)
<a name="intro"></a>
curl internals
@@ -848,6 +849,235 @@ Track Down Memory Leaks
only `fd_sets` and that is no longer good enough. The changes done to c-ares
are available in c-ares 1.3.1 and later.
+<a name="structs"></a>
+Structs in libcurl
+==================
+
+This section should cover 7.32.0 pretty accurately, but will make sense even
+for older and later versions as things don't change drastically that often.
+
+## SessionHandle
+
+ The SessionHandle handle struct is the one returned to the outside in the
+ external API as a "CURL *". This is usually known as an easy handle in API
+ documentations and examples.
+
+ Information and state that is related to the actual connection is in the
+ 'connectdata' struct. When a transfer is about to be made, libcurl will
+ either create a new connection or re-use an existing one. The particular
+ connectdata that is used by this handle is pointed out by
+ SessionHandle->easy_conn.
+
+ Data and information that regard this particular single transfer is put in
+ the SingleRequest sub-struct.
+
+ When the SessionHandle struct is added to a multi handle, as it must be in
+ order to do any transfer, the ->multi member will point to the `Curl_multi`
+ struct it belongs to. The ->prev and ->next members will then be used by the
+ multi code to keep a linked list of SessionHandle structs that are added to
+ that same multi handle. libcurl always uses multi so ->multi *will* point to
+ a `Curl_multi` when a transfer is in progress.
+
+ ->mstate is the multi state of this particular SessionHandle. When
+ `multi_runsingle()` is called, it will act on this handle according to which
+ state it is in. The mstate is also what tells which sockets to return for a
+ specific SessionHandle when [`curl_multi_fdset()`][12] is called etc.
+
+ The libcurl source code generally use the name 'data' for the variable that
+ points to the SessionHandle.
+
+ When doing multiplexed HTTP/2 transfers, each SessionHandle is associated
+ with an individual stream, sharing the same connectdata struct. Multiplexing
+ makes it even more important to keep things associated with the right thing!
+
+## connectdata
+
+ A general idea in libcurl is to keep connections around in a connection
+ "cache" after they have been used in case they will be used again and then
+ re-use an existing one instead of creating a new as it creates a significant
+ performance boost.
+
+ Each 'connectdata' identifies a single physical connection to a server. If
+ the connection can't be kept alive, the connection will be closed after use
+ and then this struct can be removed from the cache and freed.
+
+ Thus, the same SessionHandle can be used multiple times and each time select
+ another connectdata struct to use for the connection. Keep this in mind, as
+ it is then important to consider if options or choices are based on the
+ connection or the SessionHandle.
+
+ Functions in libcurl will assume that connectdata->data points to the
+ SessionHandle that uses this connection (for the moment).
+
+ As a special complexity, some protocols supported by libcurl require a
+ special disconnect procedure that is more than just shutting down the
+ socket. It can involve sending one or more commands to the server before
+ doing so. Since connections are kept in the connection cache after use, the
+ original SessionHandle may no longer be around when the time comes to shut
+ down a particular connection. For this purpose, libcurl holds a special
+ dummy `closure_handle` SessionHandle in the `Curl_multi` struct to use when
+ needed.
+
+ FTP uses two TCP connections for a typical transfer but it keeps both in
+ this single struct and thus can be considered a single connection for most
+ internal concerns.
+
+ The libcurl source code generally use the name 'conn' for the variable that
+ points to the connectdata.
+
+## Curl_multi
+
+ Internally, the easy interface is implemented as a wrapper around multi
+ interface functions. This makes everything multi interface.
+
+ `Curl_multi` is the multi handle struct exposed as "CURLM *" in external APIs.
+
+ This struct holds a list of SessionHandle structs that have been added to
+ this handle with [`curl_multi_add_handle()`][13]. The start of the list is
+ ->easyp and ->num_easy is a counter of added SessionHandles.
+
+ ->msglist is a linked list of messages to send back when
+ [`curl_multi_info_read()`][14] is called. Basically a node is added to that
+ list when an individual SessionHandle's transfer has completed.
+
+ ->hostcache points to the name cache. It is a hash table for looking up name
+ to IP. The nodes have a limited life time in there and this cache is meant
+ to reduce the time for when the same name is wanted within a short period of
+ time.
+
+ ->timetree points to a tree of SessionHandles, sorted by the remaining time
+ until it should be checked - normally some sort of timeout. Each
+ SessionHandle has one node in the tree.
+
+ ->sockhash is a hash table to allow fast lookups of socket descriptor to
+ which SessionHandle that uses that descriptor. This is necessary for the
+ `multi_socket` API.
+
+ ->conn_cache points to the connection cache. It keeps track of all
+ connections that are kept after use. The cache has a maximum size.
+
+ ->closure_handle is described in the 'connectdata' section.
+
+ The libcurl source code generally use the name 'multi' for the variable that
+ points to the Curl_multi struct.
+
+## Curl_handler
+
+ Each unique protocol that is supported by libcurl needs to provide at least
+ one `Curl_handler` struct. It defines what the protocol is called and what
+ functions the main code should call to deal with protocol specific issues.
+ In general, there's a source file named [protocol].c in which there's a
+ "struct `Curl_handler` `Curl_handler_[protocol]`" declared. In url.c there's
+ then the main array with all individual `Curl_handler` structs pointed to
+ from a single array which is scanned through when a URL is given to libcurl
+ to work with.
+
+ ->scheme is the URL scheme name, usually spelled out in uppercase. That's
+ "HTTP" or "FTP" etc. SSL versions of the protcol need its own `Curl_handler`
+ setup so HTTPS separate from HTTP.
+
+ ->setup_connection is called to allow the protocol code to allocate protocol
+ specific data that then gets associated with that SessionHandle for the rest
+ of this transfer. It gets freed again at the end of the transfer. It will be
+ called before the 'connectdata' for the transfer has been selected/created.
+ Most protocols will allocate its private 'struct [PROTOCOL]' here and assign
+ SessionHandle->req.protop to point to it.
+
+ ->connect_it allows a protocol to do some specific actions after the TCP
+ connect is done, that can still be considered part of the connection phase.
+
+ Some protocols will alter the connectdata->recv[] and connectdata->send[]
+ function pointers in this function.
+
+ ->connecting is similarly a function that keeps getting called as long as the
+ protocol considers itself still in the connecting phase.
+
+ ->do_it is the function called to issue the transfer request. What we call
+ the DO action internally. If the DO is not enough and things need to be kept
+ getting done for the entire DO sequence to complete, ->doing is then usually
+ also provided. Each protocol that needs to do multiple commands or similar
+ for do/doing need to implement their own state machines (see SCP, SFTP,
+ FTP). Some protocols (only FTP and only due to historical reasons) has a
+ separate piece of the DO state called `DO_MORE`.
+
+ ->doing keeps getting called while issuing the transfer request command(s)
+
+ ->done gets called when the transfer is complete and DONE. That's after the
+ main data has been transferred.
+
+ ->do_more gets called during the `DO_MORE` state. The FTP protocol uses this
+ state when setting up the second connection.
+
+ ->`proto_getsock`
+ ->`doing_getsock`
+ ->`domore_getsock`
+ ->`perform_getsock`
+ Functions that return socket information. Which socket(s) to wait for which
+ action(s) during the particular multi state.
+
+ ->disconnect is called immediately before the TCP connection is shutdown.
+
+ ->readwrite gets called during transfer to allow the protocol to do extra
+ reads/writes
+
+ ->defport is the default report TCP or UDP port this protocol uses
+
+ ->protocol is one or more bits in the `CURLPROTO_*` set. The SSL versions
+ have their "base" protocol set and then the SSL variation. Like
+ "HTTP|HTTPS".
+
+ ->flags is a bitmask with additional information about the protocol that will
+ make it get treated differently by the generic engine:
+
+ - `PROTOPT_SSL` - will make it connect and negotiate SSL
+
+ - `PROTOPT_DUAL` - this protocol uses two connections
+
+ - `PROTOPT_CLOSEACTION` - this protocol has actions to do before closing the
+ connection. This flag is no longer used by code, yet still set for a bunch
+ protocol handlers.
+
+ - `PROTOPT_DIRLOCK` - "direction lock". The SSH protocols set this bit to
+ limit which "direction" of socket actions that the main engine will
+ concern itself about.
+
+ - `PROTOPT_NONETWORK` - a protocol that doesn't use network (read file:)
+
+ - `PROTOPT_NEEDSPWD` - this protocol needs a password and will use a default
+ one unless one is provided
+
+ - `PROTOPT_NOURLQUERY` - this protocol can't handle a query part on the URL
+ (?foo=bar)
+
+## conncache
+
+ Is a hash table with connections for later re-use. Each SessionHandle has
+ a pointer to its connection cache. Each multi handle sets up a connection
+ cache that all added SessionHandles share by default.
+
+## Curl_share
+
+ The libcurl share API allocates a `Curl_share` struct, exposed to the
+ external API as "CURLSH *".
+
+ The idea is that the struct can have a set of own versions of caches and
+ pools and then by providing this struct in the `CURLOPT_SHARE` option, those
+ specific SessionHandles will use the caches/pools that this share handle
+ holds.
+
+ Then individual SessionHandle structs can be made to share specific things
+ that they otherwise wouldn't, such as cookies.
+
+ The `Curl_share` struct can currently hold cookies, DNS cache and the SSL
+ session cache.
+
+## CookieInfo
+
+ This is the main cookie struct. It holds all known cookies and related
+ information. Each SessionHandle has its own private CookieInfo even when
+ they are added to a multi handle. They can be made to share cookies by using
+ the share API.
+
[1]: http://curl.haxx.se/libcurl/c/curl_easy_setopt.html
[2]: http://curl.haxx.se/libcurl/c/curl_easy_init.html
@@ -860,4 +1090,6 @@ Track Down Memory Leaks
[9]: http://curl.haxx.se/libcurl/c/curl_multi_setopt.html
[10]: http://curl.haxx.se/libcurl/c/CURLMOPT_TIMERFUNCTION.html
[11]: http://curl.haxx.se/libcurl/c/curl_multi_perform.html
-[12]: http://curl.haxx.se/libcurl/c/curl_multi_fdset.html \ No newline at end of file
+[12]: http://curl.haxx.se/libcurl/c/curl_multi_fdset.html
+[13]: http://curl.haxx.se/libcurl/c/curl_multi_add_handle.html
+[14]: http://curl.haxx.se/libcurl/c/curl_multi_info_read.html
diff --git a/docs/LIBCURL-STRUCTS b/docs/LIBCURL-STRUCTS
deleted file mode 100644
index 11dee8539..000000000
--- a/docs/LIBCURL-STRUCTS
+++ /dev/null
@@ -1,248 +0,0 @@
- _ _ ____ _
- ___| | | | _ \| |
- / __| | | | |_) | |
- | (__| |_| | _ <| |___
- \___|\___/|_| \_\_____|
-
-Structs in libcurl
-
-This document should cover 7.32.0 pretty accurately, but will make sense even
-for older and later versions as things don't change drastically that often.
-
- 1. The main structs in libcurl
- 1.1 SessionHandle
- 1.2 connectdata
- 1.3 Curl_multi
- 1.4 Curl_handler
- 1.5 conncache
- 1.6 Curl_share
- 1.7 CookieInfo
-
-==============================================================================
-
-1. The main structs in libcurl
-
- 1.1 SessionHandle
-
- The SessionHandle handle struct is the one returned to the outside in the
- external API as a "CURL *". This is usually known as an easy handle in API
- documentations and examples.
-
- Information and state that is related to the actual connection is in the
- 'connectdata' struct. When a transfer is about to be made, libcurl will
- either create a new connection or re-use an existing one. The particular
- connectdata that is used by this handle is pointed out by
- SessionHandle->easy_conn.
-
- Data and information that regard this particular single transfer is put in
- the SingleRequest sub-struct.
-
- When the SessionHandle struct is added to a multi handle, as it must be in
- order to do any transfer, the ->multi member will point to the Curl_multi
- struct it belongs to. The ->prev and ->next members will then be used by the
- multi code to keep a linked list of SessionHandle structs that are added to
- that same multi handle. libcurl always uses multi so ->multi *will* point to
- a Curl_multi when a transfer is in progress.
-
- ->mstate is the multi state of this particular SessionHandle. When
- multi_runsingle() is called, it will act on this handle according to which
- state it is in. The mstate is also what tells which sockets to return for a
- specific SessionHandle when curl_multi_fdset() is called etc.
-
- The libcurl source code generally use the name 'data' for the variable that
- points to the SessionHandle.
-
- When doing multiplexed HTTP/2 transfers, each SessionHandle is associated
- with an individual stream, sharing the same connectdata struct. Multiplexing
- makes it even more important to keep things associated with the right thing!
-
- 1.2 connectdata
-
- A general idea in libcurl is to keep connections around in a connection
- "cache" after they have been used in case they will be used again and then
- re-use an existing one instead of creating a new as it creates a significant
- performance boost.
-
- Each 'connectdata' identifies a single physical connection to a server. If
- the connection can't be kept alive, the connection will be closed after use
- and then this struct can be removed from the cache and freed.
-
- Thus, the same SessionHandle can be used multiple times and each time select
- another connectdata struct to use for the connection. Keep this in mind, as
- it is then important to consider if options or choices are based on the
- connection or the SessionHandle.
-
- Functions in libcurl will assume that connectdata->data points to the
- SessionHandle that uses this connection (for the moment).
-
- As a special complexity, some protocols supported by libcurl require a
- special disconnect procedure that is more than just shutting down the
- socket. It can involve sending one or more commands to the server before
- doing so. Since connections are kept in the connection cache after use, the
- original SessionHandle may no longer be around when the time comes to shut
- down a particular connection. For this purpose, libcurl holds a special
- dummy 'closure_handle' SessionHandle in the Curl_multi struct to
-
- FTP uses two TCP connections for a typical transfer but it keeps both in
- this single struct and thus can be considered a single connection for most
- internal concerns.
-
- The libcurl source code generally use the name 'conn' for the variable that
- points to the connectdata.
-
-
- 1.3 Curl_multi
-
- Internally, the easy interface is implemented as a wrapper around multi
- interface functions. This makes everything multi interface.
-
- Curl_multi is the multi handle struct exposed as "CURLM *" in external APIs.
-
- This struct holds a list of SessionHandle structs that have been added to
- this handle with curl_multi_add_handle(). The start of the list is ->easyp
- and ->num_easy is a counter of added SessionHandles.
-
- ->msglist is a linked list of messages to send back when
- curl_multi_info_read() is called. Basically a node is added to that list
- when an individual SessionHandle's transfer has completed.
-
- ->hostcache points to the name cache. It is a hash table for looking up name
- to IP. The nodes have a limited life time in there and this cache is meant
- to reduce the time for when the same name is wanted within a short period of
- time.
-
- ->timetree points to a tree of SessionHandles, sorted by the remaining time
- until it should be checked - normally some sort of timeout. Each
- SessionHandle has one node in the tree.
-
- ->sockhash is a hash table to allow fast lookups of socket descriptor to
- which SessionHandle that uses that descriptor. This is necessary for the
- multi_socket API.
-
- ->conn_cache points to the connection cache. It keeps track of all
- connections that are kept after use. The cache has a maximum size.
-
- ->closure_handle is described in the 'connectdata' section.
-
- The libcurl source code generally use the name 'multi' for the variable that
- points to the Curl_multi struct.
-
-
- 1.4 Curl_handler
-
- Each unique protocol that is supported by libcurl needs to provide at least
- one Curl_handler struct. It defines what the protocol is called and what
- functions the main code should call to deal with protocol specific issues.
- In general, there's a source file named [protocol].c in which there's a
- "struct Curl_handler Curl_handler_[protocol]" declared. In url.c there's
- then the main array with all individual Curl_handler structs pointed to from
- a single array which is scanned through when a URL is given to libcurl to
- work with.
-
- ->scheme is the URL scheme name, usually spelled out in uppercase. That's
- "HTTP" or "FTP" etc. SSL versions of the protcol need its own Curl_handler
- setup so HTTPS separate from HTTP.
-
- ->setup_connection is called to allow the protocol code to allocate protocol
- specific data that then gets associated with that SessionHandle for the rest
- of this transfer. It gets freed again at the end of the transfer. It will be
- called before the 'connectdata' for the transfer has been selected/created.
- Most protocols will allocate its private 'struct [PROTOCOL]' here and assign
- SessionHandle->req.protop to point to it.
-
- ->connect_it allows a protocol to do some specific actions after the TCP
- connect is done, that can still be considered part of the connection phase.
-
- Some protocols will alter the connectdata->recv[] and connectdata->send[]
- function pointers in this function.
-
- ->connecting is similarly a function that keeps getting called as long as the
- protocol considers itself still in the connecting phase.
-
- ->do_it is the function called to issue the transfer request. What we call
- the DO action internally. If the DO is not enough and things need to be kept
- getting done for the entire DO sequence to complete, ->doing is then usually
- also provided. Each protocol that needs to do multiple commands or similar
- for do/doing need to implement their own state machines (see SCP, SFTP,
- FTP). Some protocols (only FTP and only due to historical reasons) has a
- separate piece of the DO state called DO_MORE.
-
- ->doing keeps getting called while issuing the transfer request command(s)
-
- ->done gets called when the transfer is complete and DONE. That's after the
- main data has been transferred.
-
- ->do_more gets called during the DO_MORE state. The FTP protocol uses this
- state when setting up the second connection.
-
- ->proto_getsock
- ->doing_getsock
- ->domore_getsock
- ->perform_getsock
- Functions that return socket information. Which socket(s) to wait for which
- action(s) during the particular multi state.
-
- ->disconnect is called immediately before the TCP connection is shutdown.
-
- ->readwrite gets called during transfer to allow the protocol to do extra
- reads/writes
-
- ->defport is the default report TCP or UDP port this protocol uses
-
- ->protocol is one or more bits in the CURLPROTO_* set. The SSL versions have
- their "base" protocol set and then the SSL variation. Like "HTTP|HTTPS".
-
- ->flags is a bitmask with additional information about the protocol that will
- make it get treated differently by the generic engine:
-
- PROTOPT_SSL - will make it connect and negotiate SSL
-
- PROTOPT_DUAL - this protocol uses two connections
-
- PROTOPT_CLOSEACTION - this protocol has actions to do before closing the
- connection. This flag is no longer used by code, yet still set for a bunch
- protocol handlers.
-
- PROTOPT_DIRLOCK - "direction lock". The SSH protocols set this bit to
- limit which "direction" of socket actions that the main engine will
- concern itself about.
-
- PROTOPT_NONETWORK - a protocol that doesn't use network (read file:)
-
- PROTOPT_NEEDSPWD - this protocol needs a password and will use a default
- one unless one is provided
-
- PROTOPT_NOURLQUERY - this protocol can't handle a query part on the URL
- (?foo=bar)
-
-
- 1.5 conncache
-
- Is a hash table with connections for later re-use. Each SessionHandle has
- a pointer to its connection cache. Each multi handle sets up a connection
- cache that all added SessionHandles share by default.
-
-
- 1.6 Curl_share
-
- The libcurl share API allocates a Curl_share struct, exposed to the external
- API as "CURLSH *".
-
- The idea is that the struct can have a set of own versions of caches and
- pools and then by providing this struct in the CURLOPT_SHARE option, those
- specific SessionHandles will use the caches/pools that this share handle
- holds.
-
- Then individual SessionHandle structs can be made to share specific things
- that they otherwise wouldn't, such as cookies.
-
- The Curl_share struct can currently hold cookies, DNS cache and the SSL
- session cache.
-
-
- 1.7 CookieInfo
-
- This is the main cookie struct. It holds all known cookies and related
- information. Each SessionHandle has its own private CookieInfo even when
- they are added to a multi handle. They can be made to share cookies by using
- the share API.
diff --git a/docs/Makefile.am b/docs/Makefile.am
index d4a083274..e3e27d333 100644
--- a/docs/Makefile.am
+++ b/docs/Makefile.am
@@ -37,7 +37,7 @@ EXTRA_DIST = MANUAL BUGS CONTRIBUTE FAQ FEATURES INTERNALS SSLCERTS \
README.win32 RESOURCES TODO TheArtOfHttpScripting THANKS VERSIONS \
KNOWN_BUGS BINDINGS $(man_MANS) $(HTMLPAGES) HISTORY INSTALL \
$(PDFPAGES) LICENSE-MIXING README.netware DISTRO-DILEMMA INSTALL.devcpp \
- MAIL-ETIQUETTE HTTP-COOKIES LIBCURL-STRUCTS SECURITY RELEASE-PROCEDURE \
+ MAIL-ETIQUETTE HTTP-COOKIES SECURITY RELEASE-PROCEDURE \
SSL-PROBLEMS HTTP2.md ROADMAP.md
MAN2HTML= roffit < $< >$@