🤖 Model Arena Evaluation

Compare two model simulations and choose the better one!

Situation

  • This scenario assumes that a patient has called the hospital's administrative office with an outpatient inquiry.
  • Some patients may already know their condition because they have diagnostic records from a smaller clinic they visited before. Others are visiting the hospital for the first time and may know only their symptoms.

Procedure and Explanation

  • First step: Arena! Of the two simulations, choose the one you think is better, meaning the one that more closely resembles a real conversation with hospital administrative staff.
  • Second step: Rate! After you make your choice, a scoring panel will appear. Please rate each simulation on a scale of 1 to 5.
  • Regardless of which one you selected in step 1, please rate each simulation independently based on the criteria below.
  • There's no required number! Just do as many as you feel like, and hit the submit button before you leave. You can always come back and do a few more later if you're bored!

Evaluation Criteria

If the simulation satisfies all four criteria below, please give it 5 points. Deduct 1 point for each criterion that is not met. If none of the criteria are satisfied, assign a score of 1 point.

  • Patient: The patient expresses symptoms naturally without using excessive medical jargon.
  • Staff: The staff member does not diagnose like a doctor or provide treatment based on previous medical records. Instead, they ask appropriate questions within the scope of symptom checking, registration, and guidance. Their tone is empathetic and polite.
  • Flow: The conversation proceeds naturally in the following order: greeting → patient information collection → symptom collection → department assignment, and each stage achieves its intended purpose.
  • Overall: The conversation feels realistic overall and resembles an actual hospital reception exchange (sentences are concise and the closing sounds natural).
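The scoring rule is simple arithmetic: a simulation starts at 5 and loses 1 point for each unmet criterion, so failing all four lands exactly on the minimum of 1. Here is a minimal Python sketch of that rule. It assumes each criterion is judged as a plain yes/no. The function name `rubric_score` and the dictionary keys are hypothetical and are not part of the evaluation tool.

```python
# Hypothetical helper illustrating the 1-5 rubric; not part of the arena tool.
CRITERIA = ("patient", "staff", "flow", "overall")


def rubric_score(met: dict[str, bool]) -> int:
    """Start at 5 and deduct 1 point for each criterion that is not met."""
    missing = [c for c in CRITERIA if c not in met]
    if missing:
        raise ValueError(f"missing criteria: {missing}")
    unmet = sum(1 for c in CRITERIA if not met[c])
    # With exactly four criteria, 5 - 4 = 1, so failing every criterion
    # yields the minimum score of 1 without any extra clamping.
    return 5 - unmet


if __name__ == "__main__":
    # Example: every criterion met except Flow.
    print(rubric_score({"patient": True, "staff": True, "flow": False, "overall": True}))
```

In this example, a simulation that meets every criterion except Flow scores 4.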

Situation

  • This scenario assumes that a patient has called the hospital's administrative office to inquire about outpatient care.
  • Depending on the patient, some already know their condition because they have diagnostic records from a smaller clinic they visited previously. Others are visiting the hospital for the first time and know only their symptoms.

Procedure and Explanation

  • First step: Arena! Of the two simulations, choose the one whose conversation with hospital administrative staff is more realistic.
  • Second step: Rate! Once you've made your choice, a scoring panel will appear. Give each simulation a score from 1 to 5. See the section below for the evaluation criteria!
  • Regardless of which one you chose in step 1, look at both simulations and score each according to the evaluation criteria below.
  • The number doesn't matter (though we'd appreciate at least 10 or so). Just do as many as you like, and be sure to press the submit button before you leave! If you get bored later, feel free to come back and do a few more here and there.

Evaluation Criteria

If a simulation satisfies all 4 criteria below, give it 5 points, and deduct 1 point for each criterion it does not meet. If it meets none of the criteria, give it 1 point.

  • Patient: Are the symptom complaints natural, without excessive medical jargon?
  • Staff: Did the staff avoid diagnosing like a doctor or treating the patient based on previous diagnostic records? Did they ask questions well within the scope of symptom checking, registration, and guidance? Was their tone empathetic and kind?
  • Flow: Did the conversation proceed naturally in the order greeting → patient information collection → symptom collection and whether prior diagnostic records exist → department assignment, with the purpose of each stage achieved?
  • Overall: Does it feel like an actual hospital reception situation overall (for example, are the sentences concise and does the conversation close naturally)?

Notes

For convenience, we've added an LLM-based translation feature (Change language), but translation may take some time. It uses the GPT-5 Nano model and may be inaccurate, so please treat the translations as a reference only.

  • For example, the sentence 'I am sorry to hear that' may be rendered awkwardly as '듣기에 죄송합니다' (roughly, "I apologize for hearing that"). If something reads oddly while you are viewing a translation, briefly switch to English to check the original. Such oddities are translation errors, so please take that into account when scoring.