Session Cookie Management: Maintaining Auth Across Requests
Most authenticated web scraping depends on cookies. Once a server accepts your credentials, it hands back a small token – a session cookie – that proves you are logged in. Send that cookie with every subsequent request and you never need to log in again for the lifetime of that session. Lose it, forget to send it, or let it expire without noticing, and every request after the login page will bounce you right back to a login form. The Python requests library makes cookie management straightforward through its Session object, but there are enough edge cases around persistence, expiration, CSRF tokens, and multi-domain auth that it is worth walking through the full picture. For a comparison of when to use requests versus a full browser, see Python requests vs Selenium for speed and performance.
How Session Cookies Work
At the HTTP level, cookies are nothing more than headers. After a successful login, the server includes a Set-Cookie header in the response. The browser (or your HTTP client) stores this cookie and attaches it to every future request to the same domain via the Cookie header.
POST /login HTTP/1.1
Host: example.com
Content-Type: application/x-www-form-urlencoded

username=admin&password=secret123

HTTP/1.1 302 Found
Set-Cookie: session_id=a7f3c9e1b2d4; Path=/; HttpOnly; Secure
Location: /dashboard
From this point forward, every request to example.com includes:
GET /dashboard HTTP/1.1
Host: example.com
Cookie: session_id=a7f3c9e1b2d4
The server looks up a7f3c9e1b2d4 in its session store, finds the associated user record, and serves the protected content. No cookie, no access.
The Login Flow
The full cookie exchange follows a predictable pattern that repeats across nearly every web application.
flowchart TD
A[Client sends POST<br>with credentials] --> B[Server validates<br>username + password]
B --> C{Credentials<br>valid?}
C -->|Yes| D[Server creates session<br>in session store]
D --> E[Server sends response<br>with Set-Cookie header]
E --> F[Client stores<br>session cookie]
F --> G[Client sends GET<br>to protected page]
G --> H[Cookie header<br>included automatically]
H --> I[Server looks up<br>session by cookie value]
I --> J[Server returns<br>protected content]
C -->|No| K[Server returns<br>401 or login page]
The critical insight for scraping is that your HTTP client must behave like a browser: store the cookie after login, then attach it to every subsequent request. If you use raw requests.get() and requests.post() calls without a session, each request starts with a blank cookie jar and the server has no idea you already authenticated.
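You can observe this behavior offline by preparing requests locally instead of sending them; a session whose jar holds a cookie attaches it automatically, while a bare request carries no Cookie header at all (the cookie name and value below are illustrative):

```python
import requests

# A session with a cookie in its jar attaches it to every prepared request.
session = requests.Session()
session.cookies.set('session_id', 'a7f3c9e1b2d4', domain='example.com', path='/')

with_session = session.prepare_request(
    requests.Request('GET', 'https://example.com/dashboard'))
print(with_session.headers.get('Cookie'))   # session_id=a7f3c9e1b2d4

# A request prepared outside any session starts with a blank cookie jar.
bare = requests.Request('GET', 'https://example.com/dashboard').prepare()
print(bare.headers.get('Cookie'))           # None
```

No network traffic happens here; `prepare_request` only builds the headers that would be sent.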
Python requests.Session: Automatic Cookie Persistence
The requests.Session object is the simplest way to handle this. It maintains a cookie jar internally, captures any Set-Cookie headers from responses, and automatically includes stored cookies in future requests.
import requests

session = requests.Session()

# Step 1: Log in
login_data = {
    'username': 'admin',
    'password': 'secret123'
}
response = session.post('https://example.com/login', data=login_data)

# Step 2: Access protected content -- cookie is sent automatically
dashboard = session.get('https://example.com/dashboard')
print(dashboard.status_code)  # 200 if login succeeded

# Step 3: Keep scraping -- session cookie persists
profile = session.get('https://example.com/profile')
orders = session.get('https://example.com/orders')
There is no manual cookie extraction or header manipulation. The session object handles the entire lifecycle. It also persists other state across requests, including custom headers, auth tokens, and connection pools, which makes it faster than creating a new connection for every request.
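A quick sketch of that shared state, inspected locally via `prepare_request` (the header and cookie values are illustrative):

```python
import requests

session = requests.Session()

# Headers set once on the session ride along with every request,
# just like cookies do.
session.headers.update({
    'User-Agent': 'my-scraper/1.0',
    'Accept-Language': 'en-US',
})
session.cookies.set('session_id', 'a7f3c9e1b2d4', domain='example.com')

prep = session.prepare_request(
    requests.Request('GET', 'https://example.com/profile'))
print(prep.headers['User-Agent'])    # my-scraper/1.0
print(prep.headers.get('Cookie'))    # session_id=a7f3c9e1b2d4
```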
The Login and Scrape Pattern
A practical authenticated scraper follows this skeleton:
import requests
from bs4 import BeautifulSoup

def create_authenticated_session(login_url, credentials):
    """Log in and return a session with valid cookies."""
    session = requests.Session()

    # Some sites require you to visit the login page first
    # to pick up initial cookies (e.g., CSRF tokens)
    session.get(login_url)

    response = session.post(login_url, data=credentials)

    # Verify login succeeded
    if response.url == login_url:
        raise Exception('Login failed -- still on login page')

    return session

def scrape_protected_page(session, url):
    """Fetch and parse a protected page using an existing session."""
    response = session.get(url)
    response.raise_for_status()
    return BeautifulSoup(response.text, 'html.parser')

# Usage
session = create_authenticated_session(
    'https://example.com/login',
    {'username': 'admin', 'password': 'secret123'}
)
soup = scrape_protected_page(session, 'https://example.com/dashboard')

items = soup.select('.data-row')
for item in items:
    print(item.text.strip())
The key pattern is: create the session once, log in once, then reuse the session object for all subsequent requests. The session carries your authentication state everywhere it goes.

Inspecting Cookies
Sometimes you need to see what cookies the server set, debug authentication issues, or extract a specific cookie value. The session.cookies attribute is a RequestsCookieJar that supports dictionary-like access and iteration.
import requests

session = requests.Session()
session.post('https://example.com/login', data={
    'username': 'admin',
    'password': 'secret123'
})

# View all cookies
for cookie in session.cookies:
    print(f'{cookie.name}: {cookie.value}')
    print(f'  Domain: {cookie.domain}')
    print(f'  Path: {cookie.path}')
    print(f'  Secure: {cookie.secure}')
    print(f'  Expires: {cookie.expires}')
    print()

# Get a specific cookie by name
session_id = session.cookies.get('session_id')
print(f'Session ID: {session_id}')

# Get a cookie for a specific domain
token = session.cookies.get('token', domain='api.example.com')

# Check how many cookies are stored
print(f'Total cookies: {len(session.cookies)}')
You can also convert the cookie jar to a plain dictionary with requests.utils.dict_from_cookiejar(session.cookies) when you need to pass cookies to another tool. If you work with browser-based tools, the same principles apply to Selenium session management for saving cookies and localStorage.
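For instance, a round trip between a jar and a plain dict looks like this (cookie names are made up; note that dict_from_cookiejar keeps only name/value pairs and drops domain and path metadata):

```python
import requests

session = requests.Session()
session.cookies.set('session_id', 'a7f3c9e1b2d4', domain='example.com')
session.cookies.set('theme', 'dark', domain='example.com')

# Flatten the jar to a plain dict: name -> value only
plain = requests.utils.dict_from_cookiejar(session.cookies)
print(plain)  # e.g. {'session_id': 'a7f3c9e1b2d4', 'theme': 'dark'}

# ...and rebuild a jar from a dict when going the other way
jar = requests.utils.cookiejar_from_dict(plain)
print(jar.get('theme'))  # dark
```

Because the domain and path are lost in the dict form, prefer the full per-cookie serialization shown in the next section when you need to restore a session faithfully.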
Saving Cookies to Disk
If you are scraping on a schedule, logging in every single run is wasteful and potentially suspicious. Saving cookies to disk lets you restore a valid session without hitting the login endpoint again.
You can use pickle.dump(session.cookies, f) for the simplest approach, but pickle files are not human-readable and unpickling untrusted data is a security risk. JSON is the better choice for a portable, inspectable format.
import json
import requests

session = requests.Session()
session.post('https://example.com/login', data={
    'username': 'admin',
    'password': 'secret123'
})

# Save cookies as JSON
cookies_list = []
for cookie in session.cookies:
    cookies_list.append({
        'name': cookie.name,
        'value': cookie.value,
        'domain': cookie.domain,
        'path': cookie.path,
        'secure': cookie.secure,
        'expires': cookie.expires
    })

with open('cookies.json', 'w') as f:
    json.dump(cookies_list, f, indent=2)

# Load cookies from JSON
session2 = requests.Session()

with open('cookies.json', 'r') as f:
    cookies_list = json.load(f)

for cookie_data in cookies_list:
    session2.cookies.set(
        cookie_data['name'],
        cookie_data['value'],
        domain=cookie_data['domain'],
        path=cookie_data['path']
    )

response = session2.get('https://example.com/dashboard')
JSON files are easy to inspect, version-control, and share across different tools or languages.
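One caveat when restoring saved cookies: entries may have gone stale since they were written. A small filter, shown here against made-up sample data, drops expired entries before loading; cookies saved with expires set to None are session-only, so the server has to be the judge of those:

```python
import time

def fresh_cookies(cookies_list):
    """Filter a saved cookie list, dropping entries that have expired.

    Entries with expires=None are session-only cookies; keep them and
    let the server decide whether they are still valid.
    """
    now = time.time()
    return [c for c in cookies_list
            if c.get('expires') is None or c['expires'] > now]

# Sample data: one live cookie, one expired, one session-only
saved = [
    {'name': 'session_id', 'value': 'a7f3', 'expires': time.time() + 3600},
    {'name': 'old_token', 'value': 'dead', 'expires': time.time() - 60},
    {'name': 'flash', 'value': '1', 'expires': None},
]
for c in fresh_cookies(saved):
    print(c['name'])  # session_id, flash
```

Run this filter over the list loaded from cookies.json before calling session.cookies.set() on each entry.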
Loading Saved Cookies to Skip Login
The complete pattern for cookie-based session resumption checks whether saved cookies exist and are still valid before falling back to a fresh login.
import json
import os
import requests

COOKIE_FILE = 'cookies.json'
LOGIN_URL = 'https://example.com/login'
CHECK_URL = 'https://example.com/dashboard'

def save_cookies(session):
    cookies_list = []
    for cookie in session.cookies:
        cookies_list.append({
            'name': cookie.name,
            'value': cookie.value,
            'domain': cookie.domain,
            'path': cookie.path,
            'secure': cookie.secure,
            'expires': cookie.expires
        })
    with open(COOKIE_FILE, 'w') as f:
        json.dump(cookies_list, f, indent=2)

def load_cookies(session):
    if not os.path.exists(COOKIE_FILE):
        return False
    with open(COOKIE_FILE, 'r') as f:
        cookies_list = json.load(f)
    for c in cookies_list:
        session.cookies.set(c['name'], c['value'],
                            domain=c['domain'], path=c['path'])
    return True

def is_logged_in(session):
    """Check if the current session is still authenticated."""
    response = session.get(CHECK_URL, allow_redirects=False)
    # A redirect to the login page means the session expired
    return response.status_code == 200

def get_authenticated_session():
    session = requests.Session()

    # Try to load existing cookies
    if load_cookies(session) and is_logged_in(session):
        print('Resumed session from saved cookies')
        return session

    # Fall back to fresh login
    print('Logging in...')
    session.cookies.clear()
    session.post(LOGIN_URL, data={
        'username': 'admin',
        'password': 'secret123'
    })
    save_cookies(session)
    return session

session = get_authenticated_session()
page = session.get('https://example.com/protected-data')
print(page.status_code)
This approach reduces login frequency, which is polite to the server and less likely to trigger rate limiting or account lockouts.

Cookie Expiration and Re-Authentication
Session cookies do not last forever. Servers assign expiration times, and some cookies are session-only, meaning they vanish when the “browser” closes (which for a script means when the process exits). Your scraper needs to handle the moment a session becomes invalid.
flowchart TD
A[Start scraping run] --> B{Saved cookies<br>exist?}
B -->|No| C[Perform login]
B -->|Yes| D[Load cookies<br>into session]
D --> E[Test request to<br>protected page]
E --> F{Response<br>status?}
F -->|200 OK| G[Continue scraping<br>with loaded cookies]
F -->|302 Redirect<br>to login| H[Cookies expired]
F -->|401 / 403| H
H --> I[Clear old cookies]
I --> C
C --> J[Save new cookies<br>to disk]
J --> G
A robust implementation wraps requests with automatic re-authentication:
import requests

class AuthenticatedScraper:
    def __init__(self, login_url, credentials, max_retries=2):
        self.login_url = login_url
        self.credentials = credentials
        self.max_retries = max_retries
        self.session = requests.Session()
        self._login()

    def _login(self):
        response = self.session.post(self.login_url, data=self.credentials)
        if response.url == self.login_url:
            raise Exception('Login failed')
        print(f'Logged in, cookies: {len(self.session.cookies)}')

    def _is_auth_failure(self, response):
        """Detect expired sessions."""
        if response.status_code in (401, 403):
            return True
        if response.status_code == 302:
            location = response.headers.get('Location', '')
            if 'login' in location.lower():
                return True
        return False

    def get(self, url, **kwargs):
        """GET with automatic re-authentication on session expiry."""
        kwargs.setdefault('allow_redirects', False)
        for attempt in range(self.max_retries):
            response = self.session.get(url, **kwargs)
            if not self._is_auth_failure(response):
                return response
            print(f'Session expired, re-authenticating (attempt {attempt + 1})')
            self.session.cookies.clear()
            self._login()
        raise Exception(f'Failed to access {url} after {self.max_retries} re-auth attempts')

# Usage
scraper = AuthenticatedScraper(
    'https://example.com/login',
    {'username': 'admin', 'password': 'secret123'}
)

# If the session expires mid-scrape, re-auth happens transparently
for page_num in range(1, 50):
    response = scraper.get(f'https://example.com/data?page={page_num}')
    print(f'Page {page_num}: {response.status_code}')
CSRF Tokens
Many modern web applications include a Cross-Site Request Forgery (CSRF) token in their login forms. The server generates a unique token, embeds it in a hidden form field, and expects it back with the POST request. If your scraper skips this step, the login will fail even with correct credentials.
import requests
from bs4 import BeautifulSoup

session = requests.Session()

# Step 1: GET the login page to retrieve the CSRF token
login_page = session.get('https://example.com/login')
soup = BeautifulSoup(login_page.text, 'html.parser')

# The token is usually in a hidden input field
csrf_token = soup.select_one('input[name="csrf_token"]')['value']
print(f'CSRF token: {csrf_token}')

# Step 2: POST with both credentials and the CSRF token
response = session.post('https://example.com/login', data={
    'username': 'admin',
    'password': 'secret123',
    'csrf_token': csrf_token
})

# Step 3: Now you can scrape authenticated pages
dashboard = session.get('https://example.com/dashboard')
Some frameworks (like Django) also store the CSRF token in a cookie. In that case you can pull it from the cookie jar instead of parsing HTML:
import requests

session = requests.Session()
session.get('https://example.com/login')

# Django stores the CSRF token in a cookie called csrftoken;
# the form field that carries it back is csrfmiddlewaretoken
csrf_token = session.cookies.get('csrftoken')

# Django also expects the token in a custom header for AJAX requests
session.headers.update({
    'X-CSRFToken': csrf_token,
    'Referer': 'https://example.com/login'
})

response = session.post('https://example.com/login', data={
    'username': 'admin',
    'password': 'secret123',
    'csrfmiddlewaretoken': csrf_token
})
The important detail is that the GET request to the login page and the POST request must share the same session. The server ties the CSRF token to the session cookie it issued on the GET, so using a different session for each will always fail.
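The extraction step itself is easy to exercise offline against a saved copy of a login page; the HTML below is a made-up sample:

```python
from bs4 import BeautifulSoup

# A saved copy of a typical login form (made-up sample HTML)
login_html = '''
<form method="post" action="/login">
  <input type="hidden" name="csrf_token" value="tok-8f2a91c3">
  <input type="text" name="username">
  <input type="password" name="password">
</form>
'''

soup = BeautifulSoup(login_html, 'html.parser')
field = soup.select_one('input[name="csrf_token"]')
token = field['value'] if field else None
print(token)  # tok-8f2a91c3
```

Guarding against a missing field, as above, avoids a confusing TypeError when a site renames its token input.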
Multi-Domain Auth
Some applications split their services across multiple domains. You might log in at auth.example.com but scrape data from api.example.com and app.example.com. Cookies are scoped to domains, so a cookie set for .example.com will be sent to all subdomains, but a cookie set for auth.example.com will only be sent to that exact subdomain.
import requests

session = requests.Session()

# Log in at the auth domain
session.post('https://auth.example.com/login', data={
    'username': 'admin',
    'password': 'secret123'
})

# Check which cookies we got and their domains
for cookie in session.cookies:
    print(f'{cookie.name} -> {cookie.domain}')
# session_id -> .example.com (shared across subdomains)
# auth_flag -> auth.example.com (auth domain only)

# Requests to api.example.com will include session_id
# but NOT auth_flag
api_data = session.get('https://api.example.com/v1/data')
app_page = session.get('https://app.example.com/dashboard')
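This scoping can be verified without touching the network by preparing requests and inspecting the Cookie header that would be sent; a sketch, with cookie names mirroring the illustration above:

```python
import requests

session = requests.Session()
# Parent-domain cookie: sent to every subdomain of example.com
session.cookies.set('session_id', 'a7f3', domain='.example.com', path='/')
# Host-scoped cookie: sent only to auth.example.com
session.cookies.set('auth_flag', '1', domain='auth.example.com', path='/')

def cookie_header(url):
    """Return the Cookie header a GET to this URL would carry."""
    prep = session.prepare_request(requests.Request('GET', url))
    return prep.headers.get('Cookie', '')

print(cookie_header('https://auth.example.com/account'))
# carries both session_id and auth_flag
print(cookie_header('https://api.example.com/v1/data'))
# carries session_id only -- auth_flag is out of scope
```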
For browser-level cookie handling, Playwright’s cookie management at the HTTP level offers similar domain-scoping controls. If the application uses a separate API domain that does not share the cookie scope, you may need to extract a token from the login response and set it manually:
# Extract a token from the login response for a different domain
response = session.post('https://auth.example.com/login', data=credentials)
token = response.json().get('api_token')

# Option 1: Set it as a cookie for the other domain
session.cookies.set('api_token', token, domain='api.otherdomain.com')

# Option 2: Use an Authorization header instead
session.headers.update({'Authorization': f'Bearer {token}'})
Complete Example: Authenticated Scraper with Cookie Persistence
Here is a complete, production-ready scraper that ties together every technique from this post: CSRF handling, cookie persistence, expiration detection, and automatic re-authentication.
import json
import os
import time

import requests
from bs4 import BeautifulSoup

class SessionCookieScraper:
    """Authenticated scraper with cookie persistence and auto re-auth."""

    def __init__(self, config):
        self.login_url = config['login_url']
        self.credentials = config['credentials']
        self.cookie_file = config.get('cookie_file', 'session_cookies.json')
        self.check_url = config.get('check_url', self.login_url)
        self.session = requests.Session()

    def _save_cookies(self):
        """Persist the cookie jar using the JSON pattern shown above."""
        cookies_list = [{
            'name': c.name, 'value': c.value, 'domain': c.domain,
            'path': c.path, 'secure': c.secure, 'expires': c.expires
        } for c in self.session.cookies]
        with open(self.cookie_file, 'w') as f:
            json.dump(cookies_list, f, indent=2)

    def _load_cookies(self):
        if not os.path.exists(self.cookie_file):
            return False
        with open(self.cookie_file) as f:
            for c in json.load(f):
                self.session.cookies.set(c['name'], c['value'],
                                         domain=c['domain'], path=c['path'])
        return True

    def _is_authenticated(self):
        response = self.session.get(self.check_url, allow_redirects=False)
        return response.status_code == 200

    def _extract_csrf_token(self):
        """GET the login page and extract a CSRF token if present."""
        response = self.session.get(self.login_url)
        soup = BeautifulSoup(response.text, 'html.parser')
        for name in ('csrf_token', 'csrfmiddlewaretoken', '_token',
                     'authenticity_token'):
            field = soup.select_one(f'input[name="{name}"]')
            if field:
                return name, field['value']
        csrf_cookie = self.session.cookies.get('csrftoken')
        if csrf_cookie:
            return 'csrfmiddlewaretoken', csrf_cookie
        return None, None

    def _login(self):
        csrf_name, csrf_value = self._extract_csrf_token()
        post_data = dict(self.credentials)
        if csrf_name:
            post_data[csrf_name] = csrf_value
        self.session.post(self.login_url, data=post_data)
        self._save_cookies()

    def ensure_authenticated(self):
        if self._load_cookies() and self._is_authenticated():
            return
        self.session.cookies.clear()
        self._login()

    def get(self, url, **kwargs):
        """GET with automatic re-authentication on 401/403."""
        response = self.session.get(url, **kwargs)
        if response.status_code in (401, 403):
            self.session.cookies.clear()
            self._login()
            response = self.session.get(url, **kwargs)
        return response

# Usage
scraper = SessionCookieScraper({
    'login_url': 'https://example.com/login',
    'credentials': {'username': 'admin', 'password': 'secret123'},
    'check_url': 'https://example.com/dashboard',
})
scraper.ensure_authenticated()

for page in range(1, 11):
    response = scraper.get(f'https://example.com/data?page={page}')
    soup = BeautifulSoup(response.text, 'html.parser')
    for item in soup.select('.result-item'):
        print(item.text.strip())
    time.sleep(1)
Key Takeaways
Session cookie management is not complicated, but it does require attention to a few things that are easy to overlook:
- Always use requests.Session() instead of bare requests.get() and requests.post() calls when dealing with authenticated pages. The session object handles cookie storage and transmission automatically.
- Visit the login page with a GET before POSTing credentials. This picks up initial cookies and CSRF tokens that the server expects. Our guide on automating web form filling covers more complex login flows that require multi-step interaction.
- Save cookies to disk between runs. JSON is portable and inspectable. Pickle is simpler but less transparent.
- Check for cookie expiration before relying on saved cookies. A single test request to a protected page tells you whether the session is still valid.
- Build re-authentication into your scraper. Sessions expire, servers restart, and cookies get invalidated. A scraper that can detect this and log in again without human intervention is far more reliable than one that crashes.
- Pay attention to cookie domains in multi-domain setups. A cookie scoped to auth.example.com will not be sent to api.example.com.
Cookies are a simple mechanism, but they underpin almost all session management on the web. Handle them correctly in your scraper and you eliminate an entire class of “why is my scraper seeing the login page” bugs.

