[PATCH 3/4] uclient-fetch: Use HEAD for --spider

Sergey Ponomarev stokito at gmail.com
Mon May 9 14:59:22 PDT 2022


From: Sungbo Eo <mans0n at gorani.run>

In GNU wget, --spider [1] first issues a HEAD request [2] and, if HEAD fails, falls back to a GET request [3].
uclient-fetch currently sends only a GET request, although common web servers, including uhttpd and BusyBox httpd, support HEAD.
This patch switches --spider to HEAD so that, e.g., the file size can be checked without downloading the file.
This is still not fully compatible with GNU wget, because it does not retry with GET if HEAD fails.
Someone may be using --spider to call a GET-only API and would be affected.
However, that is incorrect usage, while other users may expect --spider to send HEAD and not download anything.

For testing, use a CGI script /www/cgi-bin/echo.sh:

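#!/bin/sh
# Read the request body (empty when the request has none).
CONTENT=$(cat -)
# Response headers; the CGI variables are echoed back as extra headers
# so the client can see what the server received.
printf 'Content-Length: %s\r\n' "${#CONTENT}"
printf 'Content-Type: text/html\r\n'
printf 'REQUEST_METHOD: %s\r\n' "$REQUEST_METHOD"
printf 'CONTENT_TYPE: %s\r\n' "$CONTENT_TYPE"
printf 'CONTENT_LENGTH: %s\r\n' "$CONTENT_LENGTH"
printf '\r\n'
# Body: print the data as-is, not as a printf format string.
printf '%s' "$CONTENT"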

Then call it:

$ uclient-fetch -O - -q --spider http://localhost:8080/cgi-bin/echo.sh
HTTP/1.0 200 OK
Content-Length: 0
Content-Type: text/html
REQUEST_METHOD: HEAD
CONTENT_TYPE:
CONTENT_LENGTH:

When both --post-data and --spider are given, GNU wget behaves confusingly; see bug [4].
It sets Content-Type: application/x-www-form-urlencoded as it would for --post-data, but still sends a HEAD request:

$ wget -O - -S -q --post-data="trololo" --spider http://localhost:8080/cgi-bin/echo.sh
HTTP/1.0 200 OK
Content-Length: 7
Content-Type: text/html
REQUEST_METHOD: HEAD
CONTENT_TYPE: application/x-www-form-urlencoded
CONTENT_LENGTH:

With this patch, uclient-fetch instead sends the request as POST but still skips the response body:

$ uclient-fetch -O - -q --post-data="trololo" --spider http://localhost:8080/cgi-bin/echo.sh
HTTP/1.0 200 OK
Content-Length: 7
Content-Type: text/html
REQUEST_METHOD: POST
CONTENT_TYPE: application/x-www-form-urlencoded
CONTENT_LENGTH:

This can be useful for heavy API calls, but we should wait to see what the GNU wget maintainers say.
We may change this behaviour later.

[1] https://www.gnu.org/software/wget/manual/wget.html#index-spider
[2] https://httpwg.org/specs/rfc7231.html#HEAD
[3] https://git.savannah.gnu.org/cgit/wget.git/tree/src/http.c#n4304
[4] https://savannah.gnu.org/bugs/index.php?56808

Signed-off-by: Sergey Ponomarev <stokito at gmail.com>
---
 uclient-fetch.c | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/uclient-fetch.c b/uclient-fetch.c
index f05d6d6..ade40eb 100644
--- a/uclient-fetch.c
+++ b/uclient-fetch.c
@@ -42,6 +42,7 @@
 #endif
 
 static const char *user_agent = "uclient-fetch";
+static const char *method = NULL;
 static const char *post_data;
 static const char *post_file;
 static bool opt_post = false; // if --post-data or --post-file are specified
@@ -338,7 +339,7 @@ static int init_request(struct uclient *cl)
 
 	msg_connecting(cl);
 
-	rc = uclient_http_set_request_type(cl, opt_post ? "POST" : "GET");
+	rc = uclient_http_set_request_type(cl, method);
 	if (rc)
 		return rc;
 
@@ -715,6 +716,15 @@ int main(int argc, char **argv)
 		}
 	}
 
+	if (opt_post) {
+		method = "POST";
+	} else if (no_output) {
+		/* Note: GNU wget --spider sends a HEAD and, if it fails, retries with a GET */
+		method = "HEAD";
+	} else {
+		method = "GET";
+	}
+
 	argv += optind;
 	argc -= optind;
 
-- 
2.34.1




More information about the openwrt-devel mailing list