Ollama Format Benchmark: JSON vs Flat vs YAML Frontmatter

A benchmark comparing three output formats for extracting metadata from HTML articles with Ollama.
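To make the three formats concrete, here is a minimal parsing sketch. It is not the repository's parser: the function names and the example field names (`title`, `category`) are assumptions for illustration, and the frontmatter parser handles only scalar `key: value` lines, so a real YAML parser would be needed for lists or nesting.

```python
import json
import re


def parse_json(text: str) -> dict:
    """Parse a JSON object; raises ValueError on malformed output."""
    data = json.loads(text)  # json.JSONDecodeError is a ValueError subclass
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    return data


def parse_flat(text: str) -> dict:
    """Parse flat, non-nested tags such as <title>...</title>."""
    pairs = re.findall(r"<(\w+)>(.*?)</\1>", text, flags=re.DOTALL)
    if not pairs:
        raise ValueError("no tags found")
    return {name: value.strip() for name, value in pairs}


def parse_frontmatter(text: str) -> dict:
    """Parse scalar 'key: value' lines between two '---' delimiter lines."""
    lines = text.strip().splitlines()
    if len(lines) < 2 or lines[0].strip() != "---":
        raise ValueError("missing opening '---'")
    try:
        end = next(i for i, line in enumerate(lines[1:], start=1)
                   if line.strip() == "---")
    except StopIteration:
        raise ValueError("missing closing '---'") from None
    result = {}
    for line in lines[1:end]:
        if ":" not in line:
            continue  # skip lines that are not key/value pairs
        key, _, value = line.partition(":")
        result[key.strip()] = value.strip()
    return result
```

All three functions return a plain dict or raise `ValueError`, so a parse failure can be counted the same way regardless of format.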

Goal

Measure which output format (JSON, flat XML tags, YAML frontmatter) gives the best balance of:

Speed: tokens/s, wall time, TTFT (time to first token)
Quality: parse correctness, field completeness, category accuracy, number of entities
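As a rough illustration of how such metrics can be derived, the sketch below computes speed figures from the final object of an Ollama generate response, whose duration fields (`eval_duration`, `total_duration`, `load_duration`, `prompt_eval_duration`) are reported in nanoseconds. The function names are hypothetical. The TTFT value is only an approximation from server-side timings; an exact TTFT would have to be timed on the client while streaming. `field_completeness` is one possible definition of completeness, not necessarily the one this benchmark uses.

```python
NS_PER_S = 1_000_000_000  # Ollama reports durations in nanoseconds


def speed_metrics(final: dict) -> dict:
    """Derive speed figures from the final Ollama response object."""
    eval_count = final.get("eval_count", 0)
    eval_ns = final.get("eval_duration", 0)
    return {
        # generated tokens per second of generation time
        "tokens_per_s": eval_count * NS_PER_S / eval_ns if eval_ns else 0.0,
        # total server-side time for the request
        "total_s": final.get("total_duration", 0) / NS_PER_S,
        # approximation: model load plus prompt evaluation time
        "approx_ttft_s": (final.get("load_duration", 0)
                          + final.get("prompt_eval_duration", 0)) / NS_PER_S,
    }


def field_completeness(parsed: dict, expected_fields: list) -> float:
    """Fraction of expected fields that are present and non-empty."""
    if not expected_fields:
        return 1.0
    filled = sum(1 for name in expected_fields if parsed.get(name))
    return filled / len(expected_fields)
```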

Models

Model	Ollama name	Architecture	Size
GLM-4.7-Flash	`glm-4.7-flash:latest`	MoE Transformer+MTP	~19 GB
gpt-oss-20b	`gpt-oss:20b`	MoE MXFP4	~13 GB
Nemotron-3-Nano	`nemotron-3-nano:latest`	Hybrid SSM+MoE	~24 GB
Gemma-4-26B	`gemma4:26b`	MoE Transformer	~17 GB

Quick start

# 1. Make sure Ollama is running and the models are pulled
ollama list

# 2. Run the benchmark (all articles, 4 models, 3 formats, 4 parallel requests)
python -u scripts/benchmark_run.py --concurrency 4

# 3. Dashboard
streamlit run app/main.py
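The run in step 2 covers every combination of model, format, and article with a bounded number of parallel requests. A minimal sketch of that pattern follows. It is not the code in `scripts/benchmark_run.py`: `build_tasks`, `run_concurrently`, and the `worker` callable are hypothetical names, and a thread pool is just one reasonable way to cap concurrency for I/O-bound HTTP calls.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def build_tasks(models, formats, article_ids):
    """Every (model, format, article_id) combination, in a stable order."""
    return list(product(models, formats, article_ids))


def run_concurrently(tasks, worker, concurrency=4):
    """Run worker(task) for each task with at most `concurrency` in flight.

    Results come back in the same order as `tasks`.
    """
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        return list(pool.map(worker, tasks))
```

In a real run, `worker` would send one request to Ollama and return the recorded metrics for that task.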

Results structure

benchmark_results/TIMESTAMP/
├── articles.json          # article metadata
├── summary.md             # summary across all formats
├── json/                  # results for the JSON format
│   ├── metrics.csv
│   ├── raw_results.jsonl
│   └── responses/MODEL/ARTICLE_ID.txt
├── flat/
└── frontmatter/
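Given the layout above, the results for one format can be read with the standard library alone. This is a sketch: `load_format_results` is a hypothetical helper, and since the column names of `metrics.csv` and the keys in `raw_results.jsonl` are not specified here, rows are returned as generic dicts.

```python
import csv
import json
from pathlib import Path


def load_format_results(run_dir, fmt):
    """Load metrics.csv and raw_results.jsonl for one format directory."""
    fmt_dir = Path(run_dir) / fmt
    with open(fmt_dir / "metrics.csv", newline="", encoding="utf-8") as f:
        metrics = list(csv.DictReader(f))  # one dict per CSV row
    with open(fmt_dir / "raw_results.jsonl", encoding="utf-8") as f:
        raw = [json.loads(line) for line in f if line.strip()]  # skip blank lines
    return metrics, raw
```

For example, `load_format_results("benchmark_results/TIMESTAMP", "json")` would return the rows and raw records for the JSON format of that run.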

Key flags

Flag	Description
`--limit N`	Number of articles (0 = all)
`--concurrency N`	Parallel requests to Ollama
`--models M [M...]`	Specific models
`--formats F [F...]`	json, flat, frontmatter
`--resume [DIR]`	Resume a run (defaults to the latest)
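The flag shapes in the table map naturally onto `argparse`. The sketch below shows one way to declare them. It is not the parser in `scripts/benchmark_run.py`: the defaults for `--concurrency`, `--models`, and `--formats`, and the `"latest"` sentinel for a bare `--resume`, are assumptions.

```python
import argparse

FORMATS = ["json", "flat", "frontmatter"]


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Ollama format benchmark (sketch)")
    parser.add_argument("--limit", type=int, default=0, metavar="N",
                        help="number of articles (0 = all)")
    # default of 1 is an assumption
    parser.add_argument("--concurrency", type=int, default=1, metavar="N",
                        help="parallel requests to Ollama")
    # None is taken to mean "all configured models" (assumption)
    parser.add_argument("--models", nargs="+", default=None, metavar="M",
                        help="specific models")
    parser.add_argument("--formats", nargs="+", choices=FORMATS, default=FORMATS,
                        metavar="F", help="json, flat, frontmatter")
    # nargs="?" makes DIR optional: a bare --resume yields the "latest" sentinel
    parser.add_argument("--resume", nargs="?", const="latest", default=None,
                        metavar="DIR", help="resume a run (defaults to the latest)")
    return parser
```

With `nargs="?"`, omitting the flag gives `None`, a bare `--resume` gives `"latest"`, and `--resume DIR` gives the directory, which matches the optional `[DIR]` in the table.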

Running in tmux

Benchmarks can take hours. Run them in tmux:

# New session (stays alive after the benchmark finishes)
tmux new-session -d -s benchmark \
  'python3 -u scripts/benchmark_run.py --concurrency 4 2>&1 | tee benchmark.log; exec bash'

# Live view
tmux attach -t benchmark

# Detach (benchmark keeps running): Ctrl+B, then D

# Check the log without attaching to tmux
tail -f benchmark.log

Resuming after an interruption:

tmux new-session -d -s benchmark \
  'python3 -u scripts/benchmark_run.py --resume --concurrency 4 2>&1 | tee benchmark.log; exec bash'
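How `--resume` decides what to skip is not described here. One plausible approach, sketched below purely as an assumption, is to read the existing `raw_results.jsonl`, collect the tasks that already have a record, and run only the rest. The record keys `model` and `article_id` and both function names are hypothetical.

```python
import json


def completed_keys(jsonl_lines):
    """Collect (model, article_id) pairs that already have a result record."""
    done = set()
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # tolerate a last line truncated by the interruption
        if isinstance(record, dict):
            done.add((record.get("model"), record.get("article_id")))
    return done


def pending_tasks(tasks, done):
    """Keep only the tasks that have no recorded result yet."""
    return [task for task in tasks if task not in done]
```

Usage would be along the lines of opening the format's `raw_results.jsonl`, passing the file object to `completed_keys`, and filtering the full task list with `pending_tasks`.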

Dashboard

streamlit run app/main.py
# → http://localhost:8501

The dashboard automatically loads the most recent directory from benchmark_results/. Views: Leaderboard, Performance, Format Comparison, Quality, Raw Output.
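Picking the most recent run could look like the sketch below. It assumes that the TIMESTAMP directory names sort chronologically when compared as strings (for example an ISO-like `YYYY-MM-DD_HH-MM-SS` pattern). The actual naming scheme is not stated here, and `latest_run_dir` is a hypothetical helper, not the dashboard's code.

```python
from pathlib import Path


def latest_run_dir(root="benchmark_results"):
    """Return the run directory with the greatest name, or None if there is none."""
    root = Path(root)
    if not root.is_dir():
        return None
    runs = [path for path in root.iterdir() if path.is_dir()]
    # assumes timestamp-style names that sort lexicographically by time
    return max(runs, key=lambda path: path.name, default=None)
```

If the names did not sort chronologically, comparing `path.stat().st_mtime` instead of `path.name` would be an alternative.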

Requirements

Ollama with the models pulled
Python 3.11+
Dependencies: pip install -r requirements.txt